Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changingEnsemble should be negated before calling unset success #1857

Merged
merged 1 commit into from
Dec 6, 2018

Conversation

ivankelly
Copy link
Contributor

If the first pending add op is completed, but does not have the
replaced bookie in its write set, callbacks are triggered straight
away.

Previously this would then hang forever, as the changingEnsemble would
be true. This patch sets changingEnsemble to false before calling
unsetSuccessAndSendWriteRequest so that if callbacks are triggered
straight away, they can actually complete. It also moves the call to
unsetSuccessAndSendWriteRequest outside of the metadataLock so that
the callbacks don't run inside the lock.

If the first pending add op is completed, but does not have the
replaced bookie in its write set, callbacks are triggered straight
away.

Previously this would then hang forever, as the changingEnsemble would
be true. This patch sets changingEnsemble to false before calling
unsetSuccessAndSendWriteRequest so that if callbacks are triggered
straight away, they can actually complete. It also moves the call to
unsetSuccessAndSendWriteRequest outside of the metadataLock so that
the callbacks don't run inside the lock.
@ivankelly
Copy link
Contributor Author

@athanatos actually flagged this in original review: #1646 (comment)

blockEnsembleChange.complete(null); // allow ensemble change to continue

log.info("e2 should complete");
e2.get(10, TimeUnit.SECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can use 'get()'. The test will fail due to timeout if hanging endlessly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eolivelli I'm trying with raw get() now, not timing out even after 6 minutes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the default timeout is 30 minutes, which is too much :/

@sijie sijie merged commit 4c64b3c into apache:master Dec 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants