Fix replica task failures with metadata inconsistency while running concurrent append replace #16614
Conversation
server/src/main/java/org/apache/druid/segment/realtime/appenderator/BaseAppenderatorDriver.java
Have a few questions about the approach
@AmatyaAvadhanula, I have added a new task action to fetch segments by ID. I have also reverted all the refactoring changes so that the PR is easier to review. The refactoring changes can be made later.
The overall approach looks good to me. However, could you please add benchmarks for segmentListById?
Could you please also add details about any cluster testing that has been done with this patch?
@AmatyaAvadhanula, I have added cluster testing details in the PR description, hope this suffices. Update: Added benchmarking details too.
LGTM!
Have minor suggestions about naming
@@ -38,9 +38,8 @@
@JsonSubTypes.Type(name = "segmentTransactionalInsert", value = SegmentTransactionalInsertAction.class),
@JsonSubTypes.Type(name = "segmentTransactionalAppend", value = SegmentTransactionalAppendAction.class),
@JsonSubTypes.Type(name = "segmentTransactionalReplace", value = SegmentTransactionalReplaceAction.class),
// Type name doesn't correspond to the name of the class for backward compatibility.
@JsonSubTypes.Type(name = "segmentListById", value = RetrieveSegmentsByIdAction.class),
Nit: Please rename the type name to "retrieveSegmentsById", since there is no need to maintain backward compatibility for this action.
I named it segmentListById to be consistent with segmentListUsed and segmentListUnused. If you feel strongly about this, I can include the rename in my follow-up PR.
The names are supposed to be consistent with the task action's. The comment that was moved indicates that certain actions have a key different from the class name only because of backward compatibility.
> The names are supposed to be consistent with the task action's.

Sure, this is preferable, but not a requirement. In this case, it made more sense to me to adhere to the nomenclature that we are now going to support forever, i.e. segmentListXXX. But I agree that it is better to stick to the convention used by all the other task actions rather than the 2 bad ones. Thanks for calling this out!
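For readers outside this thread: the name registered in @JsonSubTypes.Type is the value of the "type" discriminator written into the serialized JSON, which is why it must remain stable once released even if the class is renamed. A minimal, self-contained Jackson sketch (the TaskActionExample types below are hypothetical, not Druid classes):

import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TypeNameDemo
{
  // The wire format records the registered name, not the Java class name,
  // so the class can be renamed freely while the name must stay fixed.
  @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
  @JsonSubTypes({
      @JsonSubTypes.Type(name = "segmentListById", value = RetrieveSegmentsByIdExample.class)
  })
  interface TaskActionExample {}

  static class RetrieveSegmentsByIdExample implements TaskActionExample
  {
    public String dataSource = "wiki";
  }

  public static void main(String[] args) throws Exception
  {
    // Prints: {"type":"segmentListById","dataSource":"wiki"}
    System.out.println(new ObjectMapper().writeValueAsString(new RetrieveSegmentsByIdExample()));
  }
}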
@@ -274,7 +275,7 @@ Stream<SegmentsOfInterval> getAllSegmentsOfInterval()
{
this.appenderator = Preconditions.checkNotNull(appenderator, "appenderator");
this.segmentAllocator = Preconditions.checkNotNull(segmentAllocator, "segmentAllocator");
this.usedSegmentChecker = Preconditions.checkNotNull(usedSegmentChecker, "usedSegmentChecker");
this.usedSegmentChecker = Preconditions.checkNotNull(usedSegmentChecker, "segmentRetriever");
Please rename this variable to segmentRetriever
Will do this in a follow-up PR that renames the other things too. Don't want to trigger CI for this.
We hit this issue. I guess we can expect this in the next release of Druid...
Description
A streaming ingestion with multiple replicas may sometimes run into replica task failures when publishing segments to an interval. This can happen only when a concurrent REPLACE task has recently committed higher-version segments to that same interval. This error does not cause any data loss, but it is an operational overhead since it leads to unnecessary task failures.
The situation typically plays out as below:
- Replica tasks of a streaming ingestion job publish segments to an interval I.
- Each replica tries to publish the appended segment S(v0, p1), i.e. segment partition 1 on version 0.
- A concurrent REPLACE task commits segments of a higher version v1 in interval I, say segments S(v1, p0) and S(v1, p1).
- Version v0 is now completely overshadowed by version v1. This includes the appended segment S(v0, p1), which is upgraded by the overlord to S(v1, p2).
- One replica publishes S(v0, p1) successfully. The other replica also tries to publish S(v0, p1) but fails because higher offsets have already been committed.
- The failed replica then checks whether the segment was already published by its sibling. Since the segment belongs to the overshadowed version v0, the second replica is not able to find it in the set of "used and visible" segments, and thus it fails.
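To make the overshadowing step concrete, below is a tiny self-contained illustration (plain Java modelling, not Druid code; the SegmentKey record is hypothetical). Druid segment versions are strings that compare lexicographically, so a newer timestamp version sorts higher:

public class OvershadowDemo
{
  // Hypothetical stand-in for a segment identifier: interval + version + partition.
  record SegmentKey(String interval, String version, int partition) {}

  public static void main(String[] args)
  {
    String interval = "1976-04-16T00:00:00.000Z/1976-04-16T01:00:00.000Z";
    SegmentKey appended = new SegmentKey(interval, "1970-01-01T00:00:00.000Z", 1); // S(v0, p1)
    SegmentKey replaced = new SegmentKey(interval, "2024-06-21T10:22:46.040Z", 0); // S(v1, p0)

    // Same interval, lexicographically smaller version: the appended segment is
    // overshadowed once the REPLACE commits a complete set at the higher version.
    boolean appendedIsOvershadowed = appended.interval().equals(replaced.interval())
        && appended.version().compareTo(replaced.version()) < 0;

    System.out.println(appendedIsOvershadowed); // true: no longer "used and visible"
  }
}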
Fix
While looking for published segments, the second replica should search not only visible segments, but also overshadowed and unused ones.
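As a minimal sketch of what this fallback amounts to (the Function parameter retrieveSegmentIdsById is a hypothetical stand-in for the new metadata lookup, not the actual Druid API):

import java.util.Set;
import java.util.function.Function;

// Sketch of the verification a replica performs when its publish attempt is rejected.
public class PublishVerificationSketch
{
  private final Function<Set<String>, Set<String>> retrieveSegmentIdsById;

  public PublishVerificationSketch(Function<Set<String>, Set<String>> retrieveSegmentIdsById)
  {
    this.retrieveSegmentIdsById = retrieveSegmentIdsById;
  }

  // Returns true if every segment of the rejected transaction already exists in the
  // metadata store, in which case the replica can finish instead of failing.
  public boolean alreadyPublished(Set<String> segmentIds)
  {
    // Before the fix, the lookup covered only "used and visible" segments, so a segment
    // overshadowed by a concurrent REPLACE appeared to have never been published.
    // Fetching by ID also finds overshadowed and unused segments.
    return retrieveSegmentIdsById.apply(segmentIds).containsAll(segmentIds);
  }
}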
Changes
- Added a new task action, RetrieveSegmentsByIdAction, to fetch published segments by ID regardless of whether they are used, unused, or overshadowed.
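A rough, illustrative skeleton of such an action is sketched below; the field names and accessors are assumptions for illustration, not the exact signature of the committed class:

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import java.util.Set;

// Illustrative skeleton of a fetch-by-ID task action; not the exact Druid interface.
public class RetrieveSegmentsByIdActionSketch
{
  private final String dataSource;
  private final Set<String> segmentIds;

  @JsonCreator
  public RetrieveSegmentsByIdActionSketch(
      @JsonProperty("dataSource") String dataSource,
      @JsonProperty("segmentIds") Set<String> segmentIds
  )
  {
    this.dataSource = dataSource;
    this.segmentIds = segmentIds;
  }

  @JsonProperty
  public String getDataSource()
  {
    return dataSource;
  }

  @JsonProperty
  public Set<String> getSegmentIds()
  {
    return segmentIds;
  }

  // perform() would query the metadata store for these IDs regardless of whether
  // the segments are currently used, unused, or overshadowed.
}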
Testing
Setup
- Streaming ingestion with multiple replicas, running alongside concurrent REPLACE tasks (compaction with skipOffsetFromLatest = PT0S).
Observation
All the replicas finish successfully even if they are not able to publish segments due to a concurrent replace task.
Replica 1
- Published its segments successfully. The appended segment was upgraded to kafka_super_1976-04-16T00:00:00.000Z_1976-04-16T01:00:00.000Z_2024-06-21T10:22:46.040Z_1, which overshadowed the published segment kafka_super_1976-04-16T00:00:00.000Z_1976-04-16T01:00:00.000Z_1970-01-01T00:00:00.000Z_1.
Replica 2
- Could not publish because of the concurrent replace task, but finished successfully after finding the already published (now overshadowed) segment kafka_super_1976-04-16T00:00:00.000Z_1976-04-16T01:00:00.000Z_1970-01-01T00:00:00.000Z_1.
Benchmarking
Setup
- Used the new segmentListById action to fetch 500 segments (250 used, 250 unused).
- Measured the action run time using the task/action/run/time metric.
Observation
Run times across three runs were 26 ms, 16 ms, and 17 ms.