Allow reordered segment allocation in kafka indexing service #5805
Conversation
void setAppendingSegment(SegmentWithState appendingSegment)
{
  // There should be only one appending segment at any time
  Preconditions.checkState(this.appendingSegment == null);
Please include an error message here. (Probably a "WTF?!" message if it should never happen.)
Added.
  this.appendingSegment = appendingSegment;
}
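To illustrate the reviewer's request for an error message, here is a hedged standalone sketch (class and field names are hypothetical stand-ins, not Druid's actual code, and a plain `IllegalStateException` replaces Guava's `Preconditions.checkState` for self-containment):

```java
public class AppendingSegmentHolder {
    private Object appendingSegment;

    void setAppendingSegment(Object appendingSegment) {
        // There should be only one appending segment at any time;
        // fail loudly with a message if that invariant is ever broken.
        if (this.appendingSegment != null) {
            throw new IllegalStateException(
                "WTF?! Current appending segment should be null"
            );
        }
        this.appendingSegment = appendingSegment;
    }

    public static void main(String[] args) {
        AppendingSegmentHolder holder = new AppendingSegmentHolder();
        holder.setAppendingSegment(new Object()); // first call succeeds
        try {
            holder.setAppendingSegment(new Object()); // second call violates the invariant
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```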
void addAppendFinishedSegment(SegmentWithState appendFinishedSegment) |
Is this only supposed to be used during bootstrapping (startJob)? It doesn't seem like it would make sense otherwise. It could be clearer if this was made into a constructor instead: something that takes a list of initial segments. (Up to you though - this is just a suggestion)
Sounds good. Fixed.
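The reviewer's suggestion can be sketched as follows (a minimal illustrative class, not Druid's actual `SegmentsOfInterval`; `String` stands in for the real segment types):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: instead of a mutator called repeatedly during bootstrapping,
// the append-finished segments are supplied once, via the constructor.
public class SegmentsOfIntervalSketch {
    private final List<String> appendFinishedSegments;
    private String appendingSegment; // at most one at any time

    SegmentsOfIntervalSketch(List<String> initialAppendFinishedSegments) {
        // Defensive copy so later external mutation cannot leak in.
        this.appendFinishedSegments = new ArrayList<>(initialAppendFinishedSegments);
    }

    List<String> getAppendFinishedSegments() {
        return appendFinishedSegments;
    }
}
```

This keeps the bootstrap-only nature of the operation explicit in the API rather than relying on callers knowing not to invoke a setter later.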
// UNIQUE key for the row, ensuring sequences do not fork in two directions.
// Using a single column instead of (sequence_name, sequence_prev_id) as some MySQL storage engines
// have difficulty with large unique keys (see https://github.com/druid-io/druid/issues/2319)
final String sequenceNamePrevIdSha1 = BaseEncoding.base16().encode(
There's a bit too much code duplication here. Please share some more code between this method and the other similar one. I know it is slightly different, but it seems close enough that it could be shared. Perhaps take a string for the secondary key and have that either be the previousId (in one path) or the interval (in another path).
Refactored.
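The shape of the suggested refactoring could look like this (a hedged sketch: the helper name is hypothetical, and JDK `MessageDigest` stands in for Guava's `Hashing`/`BaseEncoding` so the example is self-contained):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SequenceKeyHash {
    // One shared helper: the caller passes the secondary key, which is the
    // previous segment id in one path and the interval in the other.
    static String sequenceKeySha1(String sequenceName, String secondaryKey) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        sha1.update(sequenceName.getBytes(StandardCharsets.UTF_8));
        sha1.update((byte) 0xff); // separator so ("ab","c") and ("a","bc") differ
        sha1.update(secondaryKey.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02X", b));
        }
        // 40 hex chars: a fixed, small UNIQUE column regardless of key length.
        return hex.toString();
    }
}
```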
// Avoiding ON DUPLICATE KEY since it's not portable.
// Avoiding try/catch since it may cause inadvertent transaction-splitting.

// UNIQUE key for the row, ensuring sequences do not fork in two directions.
This comment is no longer accurate: the key's purpose is no longer "ensuring sequences do not fork in two directions"; it now ensures we don't have more than one segment per sequence per interval.
Fixed.
    .asBytes()
);

handle.createStatement(
This code seems shareable too.
Done.
@@ -103,7 +103,7 @@ public AppenderatorDriverAddResult add(
     String sequenceName
 ) throws IOException
 {
-  return append(row, sequenceName, null, false, true);
+  return append(row, sequenceName, null, true, true);
Why is the BatchAppenderatorDriver skipping the lineage check now? I thought it could still make more than one segment per interval if it's running in non-incremental-publishing mode.
My bad. Thanks.
    .filter(segmentWithState -> segmentWithState.getState() == SegmentState.APPENDING)
    .map(SegmentWithState::getSegmentIdentifier)
    .collect(Collectors.toList());
final Map<SegmentIdentifier, SegmentWithState> requestedSegmentIdsForSequences = getAppendingSegments(sequenceNames)
What is the reason for moving the creation of requestedSegmentIdsForSequences from after the push to before the push? Is it fixing something?
I don't think it fixes anything, but it is more reliable and understandable.
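The stream pipeline discussed above can be illustrated standalone (a hedged sketch: the enum and classes below are simplified stand-ins for Druid's SegmentWithState/SegmentIdentifier types):

```java
import java.util.List;
import java.util.stream.Collectors;

public class AppendingSegmentsDemo {
    enum SegmentState { APPENDING, APPEND_FINISHED, PUSHED_AND_DROPPED }

    static class SegmentWithState {
        final String segmentIdentifier;
        final SegmentState state;

        SegmentWithState(String segmentIdentifier, SegmentState state) {
            this.segmentIdentifier = segmentIdentifier;
            this.state = state;
        }
    }

    // Collect the ids of segments still being appended to, so the set of
    // requested segments is fixed before the push rather than derived after it.
    static List<String> appendingSegmentIds(List<SegmentWithState> segments) {
        return segments.stream()
                       .filter(s -> s.state == SegmentState.APPENDING)
                       .map(s -> s.segmentIdentifier)
                       .collect(Collectors.toList());
    }
}
```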
LGTM, thanks @jihoonson!
…5805) * Allow reordered segment allocation in kafka indexing service * address comments * fix a bug
Fix #5761.

Major changes are:

- Added SegmentsOfInterval
- Changed IndexerSQLMetadataStorageCoordinator.allocatePendingSegment() to respect skipSegmentLineageCheck, avoiding the unique constraint violation for sequence_name_prev_id_sha1. If skipSegmentLineageCheck is true, sequence_prev_id is always an empty string and sequence_name_prev_id_sha1 is created as below. skipSegmentLineageCheck can still be false for backward compatibility.