-
Notifications
You must be signed in to change notification settings - Fork 481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reintroduce worker race condition fix from #98 that was lost during the migration #187
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Unlike the legacy JobManager, I think we may be able to keep the JobStore interface as is and free of biz logic per my comments below.
@@ -142,6 +118,11 @@ public void updateJob(UUID jobId, PortabilityJob job) throws IOException { | |||
throw new IOException("Could not find record for jobId: " + jobId); | |||
} | |||
|
|||
if (previousState != null) { | |||
PortabilityJob previousJob = PortabilityJob.fromMap(getProperties(previousEntity)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a future PR: Maybe if we json serialize the job, we won't have to go from entity -> map -> job object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, @jimmarino brought up the same thing in Slack while working on the Azure impl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added issue #191
// instance polled the same job, and already claimed it, it will have updated the job's state | ||
// to CREDS_ENCRYPTION_KEY_GENERATED. | ||
try { | ||
store.updateJob(jobId, updatedJob, JobAuthorization.State.CREDS_AVAILABLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can ensure here that the existingJob state is CREDS_AVAILABLE rather than pass it into the JobStore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use Validator
@@ -124,13 +124,13 @@ private void handleWorkerAssignmentFlow(HttpExchange exchange, UUID jobId) throw | |||
Preconditions.checkArgument( | |||
!Strings.isNullOrEmpty(importAuthCookieValue), "Import auth cookie required"); | |||
|
|||
// We have the data, now update to 'pending worker assignment' so a worker may be assigned | |||
// We have the data, now update to 'CREDS_AVAILABLE' so a worker may poll the job. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have the existing job here and can validate it was INITIAL before setting the new state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added TODO since this is the LegacyPortabilityJob impl
* @throws IllegalStateException if state validation was requested ({@code previousState} was | ||
* non-null) and the job's state was not the expected {@code previousState} | ||
*/ | ||
void updateJob(UUID jobId, PortabilityJob job, JobAuthorization.State previousState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm weary of adding the third param simply for validation since this class mostly just cares about storage. Since we usually have the existing job object before updateJob is called, can we validate before the updateJob method is called rather than having it leak into the JobStore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed we should minimize business logic in here, but if we do the validation before, I think we'd still get the race condition:
Worker 1 polls job, validates state CREDS_AVAILABLE
Worker 2 polls job, validates state CREDS_AVAILABLE
Worker 1 updates with its public key
Worker 2 updates with its public key too
both workers start processing the same job
With the current approach, the race condition is handled b/c:
Worker 1 polls job
Worker 2 polls job
Worker 1 starts transaction
Worker 2 starts transaction - this will wait to look up job until previous transaction is closed
Worker 1 transaction: validate state CREDS_AVAILABLE, update with new state & public key - succeeds
Worker 2 transaction: validate state CREDS_AVAILABLE - fails, polls a new job
One way to avoid adding the new param, we could add a check for if you are setting public key, validate it was previously empty. That is still business logic inside this class but I think that's OK since it's a "Job Store" so can have some job-related logic.
The benefit to passing in the Authorization State is we can use that to validate other state changes like the comment you added about INITIAL -> CREDS_AVAILABLE. It would be slightly cleaner to pass a flag 'validateStateChange' that would validate the previous auth state. I don't know of any other race conditions now with state changes but would be good to prevent them in general.
The other approach I had a while ago was to expose the db transaction outside the jobstore and have the worker manage the transaction, but I think that would be messier.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good description of the issue. Ok, let's move forward with this.
It's up to you if you want to do this now or later, but maybe we can pass in an JobUpdateValidator interface that the client can provide to wrap this logic.
void updateJob(UUID jobId, PortabilityJob job, JobUpdateValidator validator)
Something like
inteface JobUpdateValidator {
/** Validates before update. */
void validate(PortabilityJob previous, PortabilityJob updated);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like that solution, I went with that
try { | ||
store.updateJob(jobId, updatedJob, JobAuthorization.State.CREDS_AVAILABLE); | ||
} catch (IllegalStateException e) { | ||
throw new IOException("Could not 'claim' job " + jobId + ". It was probably already " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed your comment about business logic in the job stores. Let me know what you think, thanks!
@@ -142,6 +118,11 @@ public void updateJob(UUID jobId, PortabilityJob job) throws IOException { | |||
throw new IOException("Could not find record for jobId: " + jobId); | |||
} | |||
|
|||
if (previousState != null) { | |||
PortabilityJob previousJob = PortabilityJob.fromMap(getProperties(previousEntity)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, @jimmarino brought up the same thing in Slack while working on the Azure impl
* @throws IllegalStateException if state validation was requested ({@code previousState} was | ||
* non-null) and the job's state was not the expected {@code previousState} | ||
*/ | ||
void updateJob(UUID jobId, PortabilityJob job, JobAuthorization.State previousState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed we should minimize business logic in here, but if we do the validation before, I think we'd still get the race condition:
Worker 1 polls job, validates state CREDS_AVAILABLE
Worker 2 polls job, validates state CREDS_AVAILABLE
Worker 1 updates with its public key
Worker 2 updates with its public key too
both workers start processing the same job
With the current approach, the race condition is handled b/c:
Worker 1 polls job
Worker 2 polls job
Worker 1 starts transaction
Worker 2 starts transaction - this will wait to look up job until previous transaction is closed
Worker 1 transaction: validate state CREDS_AVAILABLE, update with new state & public key - succeeds
Worker 2 transaction: validate state CREDS_AVAILABLE - fails, polls a new job
One way to avoid adding the new param, we could add a check for if you are setting public key, validate it was previously empty. That is still business logic inside this class but I think that's OK since it's a "Job Store" so can have some job-related logic.
The benefit to passing in the Authorization State is we can use that to validate other state changes like the comment you added about INITIAL -> CREDS_AVAILABLE. It would be slightly cleaner to pass a flag 'validateStateChange' that would validate the previous auth state. I don't know of any other race conditions now with state changes but would be good to prevent them in general.
The other approach I had a while ago was to expose the db transaction outside the jobstore and have the worker manage the transaction, but I think that would be messier.
WDYT?
5cb73c4
to
2fa7458
Compare
I'll take a look at how to handle this in Cosmos DB. It claims to be transactional so it shold be OK. |
* Validation to do as part of an atomic update. Implementers should throw an | ||
* {@code IllegalStateException} if the validation fails. | ||
*/ | ||
void validate(PortabilityJob previous, PortabilityJob updated); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So clients can see it in the contract of the method signature, let's explicitly declare throws IllegalStateException or a custom ValidationException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IllegalStateException is unchecked, and this is the same convention as Preconditions.checkState (i.e. that also documents @throws IllegalStateException but does not explicitly declare in method signature)
I think it's probably unrecoverable if validate fails, so should stick with an unchecked exception, depending on what we decide for #176. Can we revisit this with #176?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'm thinking more about defining explicitly how a client handles exceptions to the validation to avoid a mishmash of inconsistent exception handling.
If we go with this approach, I put a placeholder in the Cosmos Store to implement this