KafkaIndexTask can delete published segments on restart #6124
This doesn't seem right: there is code specifically to handle the case where a task tries to publish a segment that some task already published. It happens all the time with replicas, and they just ignore that segment and move on to the next one. I wonder if the real reason for publish failure is that the startMetadata doesn't match up. I bet it wouldn't match up: it sounds like the task is trying to publish from the point it originally started from rather than from the point it last published. It sounds like there are two separate problems here:
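The startMetadata mismatch described above boils down to a compare-and-swap on the committed metadata. A minimal sketch of the idea (class and method names are hypothetical, not Druid's actual API):

```java
import java.util.Objects;

public class TransactionalPublishSketch {
    // The metastore's single committed metadata value (e.g. Kafka offsets).
    private static String storedMetadata = "offsets:0";

    // Publish succeeds only when the caller's startMetadata matches what is
    // currently stored: a compare-and-swap. A task that restarts and retries
    // from its ORIGINAL start point, rather than from the point it last
    // published, therefore gets an affirmative "false".
    static synchronized boolean publish(String startMetadata, String endMetadata) {
        if (!Objects.equals(storedMetadata, startMetadata)) {
            return false;
        }
        storedMetadata = endMetadata;
        return true;
    }
}
```

Under this model, a replica that retries the exact same publish also gets `false` the second time, which is why there is a separate "did someone else beat us to it?" check rather than treating `false` as fatal.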
Yes, correct. The task doesn't fail at this point, but it just fails to update the metastore.
Here is the code where this error happens:

```java
final boolean published = publisher.publishSegments(
    ImmutableSet.copyOf(segmentsAndMetadata.getSegments()),
    metadata == null ? null : ((AppenderatorDriverMetadata) metadata).getCallerMetadata()
);

if (published) {
  log.info("Published segments.");
} else {
  log.info("Transaction failure while publishing segments, checking if someone else beat us to it.");
  final Set<SegmentIdentifier> segmentsIdentifiers = segmentsAndMetadata
      .getSegments()
      .stream()
      .map(SegmentIdentifier::fromDataSegment)
      .collect(Collectors.toSet());
  if (usedSegmentChecker.findUsedSegments(segmentsIdentifiers)
                        .equals(Sets.newHashSet(segmentsAndMetadata.getSegments()))) {
    log.info(
        "Removing our segments from deep storage because someone else already published them: %s",
        segmentsAndMetadata.getSegments()
    );
    segmentsAndMetadata.getSegments().forEach(dataSegmentKiller::killQuietly);
    log.info("Our segments really do exist, awaiting handoff.");
  } else {
    throw new ISE("Failed to publish segments[%s]", segmentsAndMetadata.getSegments());
  }
}
```
I didn't see any logs related to metadata mismatch.
Not sure about this. This can make segment publishing non-atomic and leave potential garbage segments behind. Probably solving (3) would be enough, since this should never happen?
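Solving (3) amounts to distinguishing a definite "false" from an unknown outcome when deciding whether to kill segments. A minimal sketch, with illustrative names that are not Druid's actual classes:

```java
import java.util.Set;

public class PublishResultSketch {
    // Stand-in for the stricter contract: "false" must mean the publish
    // definitely did not happen; an unknown outcome must be an exception.
    static String decide(Boolean publishResult, Set<String> segments) {
        if (publishResult == null) {
            // The publish call threw, so the outcome is unknown: never kill
            // segments here, because they may actually have been committed.
            throw new IllegalStateException("Publish outcome unknown for " + segments);
        }
        if (publishResult) {
            return "await handoff";
        }
        // Affirmative false: safe to check whether a replica already
        // published the same segments, and kill ours if so.
        return "check used segments, maybe kill";
    }
}
```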
Yes, I think we should keep the task state on local disk, representing what the task was doing.
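Keeping task state on local disk usually means writing it atomically, so a crash mid-write cannot leave a torn state file. A hedged sketch of that pattern (file names and the plain-string format are illustrative, not what Druid actually persists):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TaskStateSketch {
    // Write the state to a temp file, then atomically rename it into place,
    // so readers only ever see either the old state or the complete new one.
    static void persist(Path dir, String state) {
        try {
            Path tmp = dir.resolve("state.tmp");
            Files.write(tmp, state.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, dir.resolve("state"),
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the last persisted state, or null if the task never persisted.
    static String restore(Path dir) {
        try {
            Path file = dir.resolve("state");
            return Files.exists(file)
                ? new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
                : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A restored task would call `restore` on startup and resume from that point instead of its original starting offsets.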
Fix three bugs with segment publishing:

1. In AppenderatorImpl: always use a unique path if requested, even if the segment was already pushed. This is important because if we don't do this, it causes the issue mentioned in apache#6124.
2. In IndexerSQLMetadataStorageCoordinator: fix a bug that could cause it to return a "not published" result, instead of throwing an exception, when there was one metadata update failure followed by some random exception. This is done by resetting the AtomicBoolean that tracks which case we're in each time the callback runs.
3. In BaseAppenderatorDriver: only kill segments if we get an affirmative false publish result; skip killing if we just got some exception. The reason for this is that we want to avoid killing segments that are in an unknown state.

Two other changes to clarify the contracts a bit and hopefully prevent future bugs:

1. Return SegmentPublishResult from TransactionalSegmentPublisher, to make it more similar to announceHistoricalSegments.
2. Make it explicit, at multiple levels of javadocs, that a "false" publish result must indicate that the publish _definitely_ did not happen. Unknown states must be exceptions. This helps BaseAppenderatorDriver do the right thing.
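Fix (2) is easy to get wrong in any retried transaction callback: a sticky "definitely failed" flag left over from an earlier attempt can turn a later, unrelated exception into a bogus "not published" result. A sketch of the corrected pattern (names are hypothetical, not Druid's actual code):

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

public class RetryFlagSketch {
    // The attempt may record, via definiteFailure, that the metadata update
    // DEFINITELY failed (as opposed to throwing some transient error).
    static String publishWithRetries(
        Supplier<String> attempt, AtomicBoolean definiteFailure, int maxTries) {
        for (int i = 0; i < maxTries; i++) {
            definiteFailure.set(false); // the fix: reset on every attempt
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                if (definiteFailure.get()) {
                    return "not published"; // affirmative false; caller may clean up
                }
                // Transient/unknown error: fall through and retry.
            }
        }
        // Retries exhausted without a definite answer: surface an exception
        // rather than a misleading "not published".
        throw new IllegalStateException("publish outcome unknown after retries");
    }
}
```

Without the per-attempt reset, a transient exception after a single real metadata failure would incorrectly report "not published", which is exactly the state (3) relies on being trustworthy.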
I think this may be fixed by #6155, but would like to double-check that there are no lingering issues due to the fact that the sequences persist file and the appenderator persist file are separate files.
To be clear, #6155 fixes Kafka tasks so they no longer delete published segments. In the same scenario, the restored Kafka tasks would fail, and the supervisor would spawn new tasks to finish the job.
Got it, I think we can close this then.
@gianm I think it makes sense to close this issue, since the title is about Kafka tasks deleting pushed segments. However, the tasks would still fail in the same scenario, which could confuse users. Do you think we should file another issue for that?
This can happen in the following scenario.