Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reintroduce worker race condition fix from #98 that was lost during the migration #187

Merged
merged 3 commits into from
Mar 8, 2018

Conversation

rtannenbaum
Copy link
Contributor

If we go with this approach, I put a placeholder in the Cosmos Store to implement this

Copy link
Collaborator

@holachuy holachuy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Unlike the legacy JobManager, I think we may be able to keep the JobStore interface as is and free of biz logic per my comments below.

@@ -142,6 +118,11 @@ public void updateJob(UUID jobId, PortabilityJob job) throws IOException {
throw new IOException("Could not find record for jobId: " + jobId);
}

if (previousState != null) {
PortabilityJob previousJob = PortabilityJob.fromMap(getProperties(previousEntity));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a future PR: Maybe if we json serialize the job, we won't have to go from entity -> map -> job object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, @jimmarino brought up the same thing in Slack while working on the Azure impl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added issue #191

// instance polled the same job, and already claimed it, it will have updated the job's state
// to CREDS_ENCRYPTION_KEY_GENERATED.
try {
store.updateJob(jobId, updatedJob, JobAuthorization.State.CREDS_AVAILABLE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can ensure here that the existingJob state is CREDS_AVAILABLE rather than pass it into the JobStore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use Validator

@@ -124,13 +124,13 @@ private void handleWorkerAssignmentFlow(HttpExchange exchange, UUID jobId) throw
Preconditions.checkArgument(
!Strings.isNullOrEmpty(importAuthCookieValue), "Import auth cookie required");

// We have the data, now update to 'pending worker assignment' so a worker may be assigned
// We have the data, now update to 'CREDS_AVAILABLE' so a worker may poll the job.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have the existing job here and can validate it was INITIAL before setting the new state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TODO since this is the LegacyPortabilityJob impl

* @throws IllegalStateException if state validation was requested ({@code previousState} was
* non-null) and the job's state was not the expected {@code previousState}
*/
void updateJob(UUID jobId, PortabilityJob job, JobAuthorization.State previousState)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm weary of adding the third param simply for validation since this class mostly just cares about storage. Since we usually have the existing job object before updateJob is called, can we validate before the updateJob method is called rather than having it leak into the JobStore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed we should minimize business logic in here, but if we do the validation before, I think we'd still get the race condition:
Worker 1 polls job, validates state CREDS_AVAILABLE
Worker 2 polls job, validates state CREDS_AVAILABLE
Worker 1 updates with its public key
Worker 2 updates with its public key too
both workers start processing the same job

With the current approach, the race condition is handled b/c:
Worker 1 polls job
Worker 2 polls job
Worker 1 starts transaction
Worker 2 starts transaction - this will wait to look up job until previous transaction is closed
Worker 1 transaction: validate state CREDS_AVAILABLE, update with new state & public key - succeeds
Worker 2 transaction: validate state CREDS_AVAILABLE - fails, polls a new job

One way to avoid adding the new param, we could add a check for if you are setting public key, validate it was previously empty. That is still business logic inside this class but I think that's OK since it's a "Job Store" so can have some job-related logic.

The benefit to passing in the Authorization State is we can use that to validate other state changes like the comment you added about INITIAL -> CREDS_AVAILABLE. It would be slightly cleaner to pass a flag 'validateStateChange' that would validate the previous auth state. I don't know of any other race conditions now with state changes but would be good to prevent them in general.

The other approach I had a while ago was to expose the db transaction outside the jobstore and have the worker manage the transaction, but I think that would be messier.

WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good description of the issue. Ok, let's move forward with this.

It's up to you if you want to do this now or later, but maybe we can pass in an JobUpdateValidator interface that the client can provide to wrap this logic.

void updateJob(UUID jobId, PortabilityJob job, JobUpdateValidator validator)

Something like
inteface JobUpdateValidator {
/** Validates before update. */
void validate(PortabilityJob previous, PortabilityJob updated);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like that solution, I went with that

try {
store.updateJob(jobId, updatedJob, JobAuthorization.State.CREDS_AVAILABLE);
} catch (IllegalStateException e) {
throw new IOException("Could not 'claim' job " + jobId + ". It was probably already "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

Copy link
Contributor Author

@rtannenbaum rtannenbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed your comment about business logic in the job stores. Let me know what you think, thanks!

@@ -142,6 +118,11 @@ public void updateJob(UUID jobId, PortabilityJob job) throws IOException {
throw new IOException("Could not find record for jobId: " + jobId);
}

if (previousState != null) {
PortabilityJob previousJob = PortabilityJob.fromMap(getProperties(previousEntity));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, @jimmarino brought up the same thing in Slack while working on the Azure impl

* @throws IllegalStateException if state validation was requested ({@code previousState} was
* non-null) and the job's state was not the expected {@code previousState}
*/
void updateJob(UUID jobId, PortabilityJob job, JobAuthorization.State previousState)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed we should minimize business logic in here, but if we do the validation before, I think we'd still get the race condition:
Worker 1 polls job, validates state CREDS_AVAILABLE
Worker 2 polls job, validates state CREDS_AVAILABLE
Worker 1 updates with its public key
Worker 2 updates with its public key too
both workers start processing the same job

With the current approach, the race condition is handled b/c:
Worker 1 polls job
Worker 2 polls job
Worker 1 starts transaction
Worker 2 starts transaction - this will wait to look up job until previous transaction is closed
Worker 1 transaction: validate state CREDS_AVAILABLE, update with new state & public key - succeeds
Worker 2 transaction: validate state CREDS_AVAILABLE - fails, polls a new job

One way to avoid adding the new param, we could add a check for if you are setting public key, validate it was previously empty. That is still business logic inside this class but I think that's OK since it's a "Job Store" so can have some job-related logic.

The benefit to passing in the Authorization State is we can use that to validate other state changes like the comment you added about INITIAL -> CREDS_AVAILABLE. It would be slightly cleaner to pass a flag 'validateStateChange' that would validate the previous auth state. I don't know of any other race conditions now with state changes but would be good to prevent them in general.

The other approach I had a while ago was to expose the db transaction outside the jobstore and have the worker manage the transaction, but I think that would be messier.

WDYT?

@rtannenbaum rtannenbaum merged commit e6b1f15 into master Mar 8, 2018
@rtannenbaum rtannenbaum deleted the worker-authstate branch March 8, 2018 17:34
@jimmarino
Copy link
Collaborator

I'll take a look at how to handle this in Cosmos DB. It claims to be transactional so it shold be OK.

* Validation to do as part of an atomic update. Implementers should throw an
* {@code IllegalStateException} if the validation fails.
*/
void validate(PortabilityJob previous, PortabilityJob updated);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So clients can see it in the contract of the method signature, let's explicitly declare throws IllegalStateException or a custom ValidationException

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IllegalStateException is unchecked, and this is the same convention as Preconditions.checkState (i.e. that also documents @throws IllegalStateException but does not explicitly declare in method signature)

I think it's probably unrecoverable if validate fails, so should stick with an unchecked exception, depending on what we decide for #176. Can we revisit this with #176?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'm thinking more about defining explicitly how a client handles exceptions to the validation to avoid a mishmash of inconsistent exception handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants