Allow immediate run in pending queue with deploy #1538

Merged
merged 7 commits into from Jun 12, 2017

2 participants
PtrTeixeira (Contributor) commented May 10, 2017

Give immediate run requests in the pending queue a separate key, so they
are permitted to be in the pending queue at the same time as a new
deploy for the same requestID / deployID. This means that a pending
deploy will not block an immediate run from being enqueued.

/cc @ssalinas
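A minimal sketch of the separate-key idea, assuming the pending queue can be modelled as a map from node name to request. `PendingType`, `PendingRequest`, `PendingQueue` and the `requestId-deployId` separator are hypothetical stand-ins, not Singularity's actual classes or `SingularityDeployKey.toString()` format. It follows the later commit in this thread that keeps the timestamp suffix for immediate and one-off requests only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical subset of pending request types, named after those in this thread.
enum PendingType { IMMEDIATE, ONEOFF, NEW_DEPLOY, TASK_DONE }

// Hypothetical stand-in for SingularityPendingRequest.
record PendingRequest(String requestId, String deployId, long timestamp, PendingType type) {}

final class PendingQueue {
  // Node name -> request; a stand-in for the children of the ZK pending path.
  private final Map<String, PendingRequest> nodes = new LinkedHashMap<>();

  static String nodeName(PendingRequest r) {
    // Assumed deploy-key format; the real SingularityDeployKey may differ.
    String deployKey = r.requestId() + "-" + r.deployId();
    // Immediate and one-off runs get a timestamp suffix, so they never
    // collide with a NEW_DEPLOY (or other) entry for the same deploy.
    if (r.type() == PendingType.IMMEDIATE || r.type() == PendingType.ONEOFF) {
      return deployKey + r.timestamp();
    }
    return deployKey;
  }

  // Returns true if the request was added, false if a node with that name already existed.
  boolean enqueue(PendingRequest r) {
    return nodes.putIfAbsent(nodeName(r), r) == null;
  }

  int size() {
    return nodes.size();
  }
}
```

With this keying, a `NEW_DEPLOY` and an `IMMEDIATE` request for the same deploy can sit in the queue together, while a second non-immediate request for that deploy is still rejected.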

PtrTeixeira added some commits May 10, 2017

Allow immediate run in pending queue with deploy
Give immediate run requests in the pending queue a separate key, so they
are permitted to be in the pending queue at the same time as a new
deploy for the same requestID / deployID. This means that a pending
deploy will not block an immediate run from being enqueued.
Add test that immediate replaces scheduled
Add test to validate that an immediate (run-now) request replaces a
scheduled task in the pending list.

ssalinas modified the milestone: 0.16.0 May 10, 2017

ssalinas (Member) commented May 10, 2017

Thanks for the additional test, let's give this a go in staging 👍

Consistently delete immediate run requests
In particular, if attempting to delete an immediate run node from ZK
indicates that the node wasn't there, try deleting it from the other
location instead.

/cc @ssalinas
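A sketch of the fallback delete this commit describes, with a `Map` standing in for ZooKeeper and both candidate node names supplied by the caller. `FallbackDelete`, `Result` and `deleteImmediate` are hypothetical names. The two deletes are separate operations, which is exactly the non-atomic window raised in the review that follows.

```java
import java.util.HashMap;
import java.util.Map;

final class FallbackDelete {
  enum Result { DELETED, DIDNT_EXIST }

  // Stand-in for the ZK pending path: node name -> serialized request.
  private final Map<String, String> nodes = new HashMap<>();

  void put(String nodeName, String data) {
    nodes.put(nodeName, data);
  }

  boolean contains(String nodeName) {
    return nodes.containsKey(nodeName);
  }

  private Result delete(String nodeName) {
    return nodes.remove(nodeName) != null ? Result.DELETED : Result.DIDNT_EXIST;
  }

  // Try the primary location first; if the node was not there, try the other one.
  // The two deletes are independent, so a request written between them can be missed.
  Result deleteImmediate(String primaryNode, String alternateNode) {
    Result first = delete(primaryNode);
    if (first == Result.DIDNT_EXIST) {
      return delete(alternateNode);
    }
    return first;
  }
}
```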
...tyService/src/main/java/com/hubspot/singularity/data/RequestManager.java
(existingRequest.get().getPendingType() == PendingType.TASK_DONE
|| existingRequest.get().getPendingType() == PendingType.NEW_DEPLOY)) {
boolean markImmediate = forceImmediate
|| (existingRequest.isPresent()

ssalinas (Member) commented May 30, 2017

I think this may still be causing us some issues. I've seen a scheduled request or two pop through that never got a pending task in place after a successful deploy. I think this is due to the two possible deletes and the fact that a request could be enqueued between those attempts. We also have a weird bit going on with the getPendingPath(String requestId, String deployId) implementation, which returns a different version of the path when we call getPendingRequest.

I'm wondering about the consequences of returning String.format("%s%s", deployKey, pendingRequest.getTimestamp()) for all cases. Since we sort the pending requests by timestamp, we will still be able to put multiple in the queue. The work then falls to the scheduler to de-dupe and process appropriately. We should check where else we might be relying on the EXISTED response to see if this is possible.
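To see what happens to the EXISTED response under this proposal, here is a hypothetical save that reports CREATED or EXISTED depending on whether the node name is already taken. If every node name carries the timestamp, two requests for the same deploy only collide when their timestamps are identical, so callers that relied on EXISTED to detect duplicates lose that signal and de-duplication has to move to the scheduler. `TimestampedPendingStore`, `SaveResult` and the `requestId-deployId` key format are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class TimestampedPendingStore {
  enum SaveResult { CREATED, EXISTED }

  // Stand-in for the ZK pending path: node name -> timestamp.
  private final Map<String, Long> nodes = new HashMap<>();

  // Hypothetical key: deploy key followed by the request timestamp, for every request type.
  static String key(String requestId, String deployId, long timestamp) {
    return String.format("%s-%s%s", requestId, deployId, timestamp);
  }

  SaveResult save(String requestId, String deployId, long timestamp) {
    Long previous = nodes.putIfAbsent(key(requestId, deployId, timestamp), timestamp);
    return previous == null ? SaveResult.CREATED : SaveResult.EXISTED;
  }

  int size() {
    return nodes.size();
  }
}
```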

Defer duplicate request processing to scheduler
Defer checking duplicate requests from ZK into the scheduler.  This
allows slightly more control over what kinds of requests are allowed to
be in the pending queue simultaneously, and avoids the kind of edge
cases that we were running into previously by attempting to use
nodenames in ZK to resolve the same problem.

/cc @ssalinas
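A sketch of what deferring de-duplication to the scheduler implies: the scheduler reads every pending request, groups them by request and deploy, and walks each group oldest first, deciding per request what to do. `Pending` and `PendingGrouping.byDeploy` are hypothetical; the timestamp sort mirrors the `Comparator.comparingLong(SingularityPendingRequest::getTimestamp)` call in the SingularityScheduler diff further down.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SingularityPendingRequest.
record Pending(String requestId, String deployId, long timestamp, String type) {}

final class PendingGrouping {
  // Group pending requests by "requestId/deployId", each group sorted oldest first.
  static Map<String, List<Pending>> byDeploy(List<Pending> all) {
    Map<String, List<Pending>> groups = new LinkedHashMap<>();
    for (Pending p : all) {
      groups.computeIfAbsent(p.requestId() + "/" + p.deployId(), k -> new ArrayList<>()).add(p);
    }
    for (List<Pending> group : groups.values()) {
      group.sort(Comparator.comparingLong(Pending::timestamp));
    }
    return groups;
  }
}
```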
...tyService/src/main/java/com/hubspot/singularity/data/RequestManager.java
return ZKPaths.makePath(PENDING_PATH_ROOT, nodeName);
}
private String pendingQueueKey(SingularityPendingRequest pendingRequest) {
SingularityDeployKey deployKey = new SingularityDeployKey(pendingRequest.getRequestId(), pendingRequest.getDeployId());
return String.format("%s%s", deployKey.toString(), pendingRequest.getTimestamp());

ssalinas (Member) commented May 31, 2017

The only immediate thing I see here is that we'd need to handle any pending requests that are already in the queue when Singularity starts. Otherwise we will not delete them correctly, since we will calculate the wrong path, but we will keep processing them, since it's just a getChildren call. We should be able to write a simple zk migration for this that reads them all, writes them to the correct path, and deletes the old path on startup.
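A minimal sketch of the startup migration suggested here, with a `Map` standing in for the pending path and a hypothetical `newKey` function that computes the new node name from a stored request. Each entry is written under its new name before the old node is removed, so an interruption mid-migration can leave a duplicate but should not lose a request. `PendingPathMigration` is an illustrative name, not the migration class added in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class PendingPathMigration {
  // nodes: node name -> stored request (stand-in for the ZK children of the pending path).
  // newKey: computes the node name under the new format (hypothetical).
  // Returns the number of nodes moved.
  static <T> int migrate(Map<String, T> nodes, Function<T, String> newKey) {
    int moved = 0;
    // Snapshot the names so the map is not modified while iterating over it.
    List<String> oldNames = new ArrayList<>(nodes.keySet());
    for (String oldName : oldNames) {
      T payload = nodes.get(oldName);
      String newName = newKey.apply(payload);
      if (newName.equals(oldName)) {
        continue; // already in the new format
      }
      nodes.put(newName, payload); // write to the correct path first...
      nodes.remove(oldName);       // ...then delete the old path
      moved++;
    }
    return moved;
  }
}
```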

PtrTeixeira (Contributor) commented Jun 1, 2017

If only immediate / one-off requests have a different pending path now, do we still need the zk migration?

ssalinas (Member) commented Jun 1, 2017

We most likely do. This will still change the path of anything that is in the pending queue on startup, so we'd need to rewrite those to their new path.

...rc/main/java/com/hubspot/singularity/scheduler/SingularityScheduler.java
List<SingularityPendingRequest> effectivePendingRequests = new ArrayList<>();
pendingRequestsForDeploy.sort(Comparator.comparingLong(SingularityPendingRequest::getTimestamp));
for (SingularityPendingRequest pendingRequest : pendingRequestsForDeploy) {
final SingularityRequest updatedRequest = updatedRequest(maybePendingDeploy, pendingRequest, maybeRequest.get());

ssalinas (Member) commented Jun 1, 2017

We should move this earlier and make sure all methods that use the SingularityRequest object are using this one; e.g. in the old code getMatchingTaskIds was also called with the updated one instead of maybeRequest.get().getRequest(). Maybe put this up right after the isRequestActive check?

PtrTeixeira (Contributor) commented Jun 1, 2017

The getMatchingTaskIds method really only cares about the requestId and whether the request is long-running, which I don't think can change per pending request. The reason that I'm pushing back here is that if it needs to be run on the updated request, then it needs to be run in the inner loop, rather than the outer one.

ssalinas (Member) commented Jun 1, 2017

Ah that's right, we've got the inner loop for the individual pending requests now 👍

...rc/main/java/com/hubspot/singularity/scheduler/SingularityScheduler.java
LOG.trace("Holding pending request {} because it is scheduled and has an active task", pendingRequest);
heldForScheduledActiveTask++;
continue;
requestManager.deletePendingRequest(pendingRequest);

ssalinas (Member) commented Jun 1, 2017

For any pending requests that we are scheduling tasks for, the delete should come after the scheduleTasks call. If we are interrupted between here and scheduleTasks, we lose a pending request and could end up in an inconsistent state.
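A sketch of the ordering being asked for, using hypothetical `Scheduler` and `PendingStore` interfaces in place of the real scheduler and RequestManager: each pending request is deleted only after `scheduleTasks` returns. If scheduling fails or the process dies in between, the request is still in the queue and will be picked up again on the next pass. The trade-off is that the next pass may see a request whose tasks were already scheduled, so it has to tolerate that, but no request is silently lost.

```java
import java.util.List;

final class ScheduleThenDelete {
  interface Scheduler { int scheduleTasks(String pendingRequest); }
  interface PendingStore { void delete(String pendingRequest); }

  // Processes pending requests, deleting each one only after its tasks are scheduled.
  // Returns the total number of scheduled tasks.
  static int process(List<String> pending, Scheduler scheduler, PendingStore store) {
    int numScheduledTasks = 0;
    for (String request : pending) {
      numScheduledTasks += scheduler.scheduleTasks(request); // if this throws, the request stays queued
      store.delete(request);                                 // only reached after scheduling succeeded
    }
    return numScheduledTasks;
  }
}
```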

ssalinas (Member) commented Jun 1, 2017

A second thing on this: if we hit the else case there, should we really be deleting the pending request? In this particular case, if we had a BOUNCE and a NEW_DEPLOY in the queue, we'd take the bounce and end up deleting the new deploy.

...rc/main/java/com/hubspot/singularity/scheduler/SingularityScheduler.java
requestManager.deletePendingRequest(pendingRequest);
totalNewScheduledTasks += scheduledInstances;

ssalinas (Member) commented Jun 1, 2017

Let's keep the variable as numScheduledTasks, since we call it tasks, not instances, everywhere else in this class.

...rc/main/java/com/hubspot/singularity/scheduler/SingularityScheduler.java
@@ -398,8 +416,30 @@ private void deleteScheduledTasks(final Collection<SingularityPendingTask> sched
}
}
private int scheduleTasks(SingularitySchedulerStateCache stateCache, SingularityRequest request, RequestState state, SingularityDeployStatistics deployStatistics,
SingularityPendingRequest pendingRequest, List<SingularityTaskId> matchingTaskIds, Optional<SingularityPendingDeploy> maybePendingDeploy) {
private List<SingularityTaskId> getMatchingTaskIds(SingularitySchedulerStateCache stateCache, SingularityRequest request, SingularityDeployKey deployKey) {

ssalinas (Member) commented Jun 1, 2017

Is there a reason you duplicated this method?

PtrTeixeira added some commits Jun 1, 2017

Respond to PR comments
In particular
- Revert to only using the timestamp format for immediate & one-off
  requests
- Correct the order of schedule / delete in the scheduler to
  guarantee that the scheduler is not left in an inconsistent
  state.
- Rename some variables for consistency with the rest of the
  class
Add ZK migration for pending requests
Add a ZK migration that will rewrite pending requests onto the new
format, where they are timestamped to allow multiples in the pending
queue at a time.
ssalinas (Member) commented Jun 1, 2017

👍 looks good, let's give this another go with the zk migration in there now

ssalinas (Member) commented Jun 12, 2017

This has looked good with the new updates, merging. Thanks @PtrTeixeira

ssalinas merged commit dd72a31 into master Jun 12, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

ssalinas deleted the add-items-in-pending-queue branch Jun 12, 2017
