Fix three bugs with segment publishing. #6155
Conversation
1. In AppenderatorImpl: always use a unique path if requested, even if the segment was already pushed. This is important because if we don't do this, it causes the issue mentioned in apache#6124 (KafkaIndexTask can delete published segments on restart).
2. In IndexerSQLMetadataStorageCoordinator: fix a bug that could cause it to return a "not published" result instead of throwing an exception when there was one metadata update failure followed by some random exception. This is done by resetting the AtomicBoolean that tracks which case we're in each time the callback runs.
3. In BaseAppenderatorDriver: only kill segments if we get an affirmative false publish result. Skip killing if we just got some exception, since we want to avoid killing segments that are in an unknown state.

Two other changes to clarify the contracts a bit and hopefully prevent future bugs:

1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it more similar to announceHistoricalSegments.
2. Make it explicit, at multiple levels of javadocs, that a "false" publish result must indicate that the publish _definitely_ did not happen. Unknown states must be exceptions. This helps BaseAppenderatorDriver do the right thing (see the sketch below).
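To make that contract concrete, here is a minimal, self-contained sketch of the driver-side logic. All class, method, and log names are illustrative stand-ins (this is not the actual Druid code); only the three-way outcome handling (success, affirmative failure, exception) mirrors what the PR describes.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.Callable;

public class PublishContractSketch
{
  static final class SegmentPublishResult
  {
    final Set<String> segments;
    final boolean success;

    private SegmentPublishResult(Set<String> segments, boolean success)
    {
      this.segments = segments;
      this.success = success;
    }

    static SegmentPublishResult ok(Set<String> segments)
    {
      return new SegmentPublishResult(segments, true);
    }

    // fail() means the publish DEFINITELY did not happen. Unknown
    // outcomes must surface as exceptions, never as a false result.
    static SegmentPublishResult fail()
    {
      return new SegmentPublishResult(Collections.emptySet(), false);
    }
  }

  static void handlePublish(Set<String> segments, Callable<SegmentPublishResult> publish) throws Exception
  {
    final SegmentPublishResult result;
    try {
      result = publish.call();
    }
    catch (Exception e) {
      // Unknown state: the publish may or may not have gone through,
      // so do NOT kill the segments (bug fix 3).
      System.out.println("Failed publish, not removing segments: " + segments);
      throw e;
    }

    if (result.success) {
      System.out.println("Published, awaiting handoff: " + result.segments);
    } else {
      // Affirmative "definitely not published": safe to kill what we pushed.
      System.out.println("Publish definitely failed, killing segments: " + segments);
    }
  }

  public static void main(String[] args) throws Exception
  {
    handlePublish(Collections.singleton("segment-1"), SegmentPublishResult::fail);
  }
}
```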
Makes sense to me 👍
👍
LGTM. +1 after travis.
Please check this:
@@ -323,6 +323,8 @@ public SegmentPublishResult inTransaction(
     final TransactionStatus transactionStatus
 ) throws Exception
 {
+    definitelyNotUpdated.set(false);
Would you add a comment that this overwrites definitelyNotUpdated on retrying?
Added.
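To show why this reset (and the requested comment) matters, here is a self-contained toy version of the failure mode from bug 2. The flag name matches the diff above; everything else is illustrative, not the actual IndexerSQLMetadataStorageCoordinator code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RetryFlagSketch
{
  public static void main(String[] args)
  {
    final AtomicBoolean definitelyNotUpdated = new AtomicBoolean(false);

    for (int attempt = 1; attempt <= 2; attempt++) {
      // Set definitelyNotUpdated back to false upon retrying, so a stale
      // "true" from an earlier attempt cannot leak into this attempt.
      definitelyNotUpdated.set(false);

      try {
        if (attempt == 1) {
          // Attempt 1: the metadata compare-and-swap loses the race.
          definitelyNotUpdated.set(true);
          throw new RuntimeException("metadata CAS failure (retryable)");
        } else {
          // Attempt 2: some random, unrelated exception.
          throw new RuntimeException("some random exception");
        }
      }
      catch (RuntimeException e) {
        System.out.println("attempt " + attempt + ": " + e.getMessage()
                           + ", definitelyNotUpdated=" + definitelyNotUpdated.get());
      }
    }

    // With the reset, the flag ends up false, so the caller throws
    // (unknown state) instead of wrongly reporting "not published".
    System.out.println("final flag: " + definitelyNotUpdated.get());
  }
}
```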
     log.info("Our segments really do exist, awaiting handoff.");
 } else {
-    throw new ISE("Failed to publish segments[%s]", segmentsAndMetadata.getSegments());
+    throw new ISE("Failed to publish segments.");
Is this change meant to avoid overly large log messages? I feel this log sometimes helps.
I was thinking it's not necessary, since they will get logged in the catch statement via:
log.warn(e, "Failed publish, not removing segments: %s", segmentsAndMetadata.getSegments());
-    if (txnFailure.get()) {
-      return new SegmentPublishResult(ImmutableSet.of(), false);
+    if (definitelyNotUpdated.get()) {
+      return SegmentPublishResult.fail();
What do you think about adding an exception to SegmentPublishResult on failure, so that callers can figure out why it failed?
I think it's not necessary, since there is supposed to be only one reason: compare-and-swap failure with the metadata update.
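Purely for illustration, here is one shape the reviewer's suggestion could take, with the failure reason carried on the result instead of thrown away. None of this is in the PR; the class and method names are hypothetical.

```java
// Hypothetical variant of SegmentPublishResult that carries a failure reason.
public class SegmentPublishResultWithReason
{
  private final boolean success;
  private final String errorMsg;  // null on success

  private SegmentPublishResultWithReason(boolean success, String errorMsg)
  {
    this.success = success;
    this.errorMsg = errorMsg;
  }

  public static SegmentPublishResultWithReason ok()
  {
    return new SegmentPublishResultWithReason(true, null);
  }

  // Callers could log or act on the reason instead of just seeing "false".
  public static SegmentPublishResultWithReason fail(String errorMsg)
  {
    return new SegmentPublishResultWithReason(false, errorMsg);
  }

  public boolean isSuccess()
  {
    return success;
  }

  public String getErrorMsg()
  {
    return errorMsg;
  }
}
```

In the PR as discussed here, the plain fail() was kept, on the reasoning above that a compare-and-swap loss on the metadata update is the only expected failure mode.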
This is happening because I referenced it in a javadoc. Apparently that's not good enough for the plugin. I removed the reference.
@gianm please check the build failure.
@jihoonson thanks, I pushed again.
👍
There are still some build failures.
OMG, sorry, I'll check more thoroughly before I push again.
Hmm, now some unit tests are failing, and they look legitimate.
@gianm thanks for the quick fix!
Commits:
* Fix three bugs with segment publishing.
* Remove javadoc-only import.
* Updates.
* Fix test.
* Fix tests.
[Backport] Fix three bugs with segment publishing. (apache#6155) (apache#6187) * Fix KafkaIndexTask