
Allow reordered segment allocation in kafka indexing service #5805

Merged: 3 commits into apache:master on Jul 2, 2018

Conversation

jihoonson (Contributor) commented on May 25, 2018:

Fix #5761.

Major changes are:

  • Enforce at most one active segment per sequence and per interval (see SegmentsOfInterval).
  • Fix IndexerSQLMetadataStorageCoordinator.allocatePendingSegment() to respect skipSegmentLineageCheck, avoiding the unique constraint violation on sequence_name_prev_id_sha1. When skipSegmentLineageCheck is true, sequence_prev_id is always an empty string and sequence_name_prev_id_sha1 is computed as below.
    final String sequenceNamePrevIdSha1 = BaseEncoding.base16().encode(
        Hashing.sha1()
               .newHasher()
               .putBytes(StringUtils.toUtf8(sequenceName))
               .putByte((byte) 0xff)
               .putLong(interval.getStartMillis())
               .putLong(interval.getEndMillis())
               .hash()
               .asBytes()
    );
  • skipSegmentLineageCheck can still be false for backward compatibility.
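As a hedged illustration of the key derivation above, here is a JDK-only sketch of the same idea. The class and method names are hypothetical stand-ins for the Guava-based snippet; byte-for-byte parity rests on two assumptions noted in the comments (Guava's Hasher writes longs little-endian, and BaseEncoding.base16() emits upper-case hex).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SequenceKeySketch
{
  // JDK-only sketch of the snippet above: SHA-1 over utf8(sequenceName), a 0xff
  // separator byte, then the interval's start and end millis, hex-encoded.
  public static String sequenceNamePrevIdSha1(String sequenceName, long startMillis, long endMillis)
  {
    try {
      MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
      sha1.update(sequenceName.getBytes(StandardCharsets.UTF_8));
      sha1.update((byte) 0xff); // separator so distinct (name, interval) pairs cannot collide by concatenation
      // Assumption: Guava's Hasher#putLong writes little-endian bytes, so mirror that here
      ByteBuffer intervalBytes = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
      intervalBytes.putLong(startMillis).putLong(endMillis);
      sha1.update(intervalBytes.array());
      StringBuilder hex = new StringBuilder();
      for (byte b : sha1.digest()) {
        hex.append(String.format("%02X", b & 0xff)); // BaseEncoding.base16() uses upper-case hex
      }
      return hex.toString();
    }
    catch (NoSuchAlgorithmException e) {
      throw new RuntimeException(e); // SHA-1 is guaranteed to be present in the JDK
    }
  }
}
```

Because the interval, rather than the previous segment id, feeds the hash, two allocations for the same sequence and interval map to the same unique-key value regardless of the order in which they arrive.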


void setAppendingSegment(SegmentWithState appendingSegment)
{
  // There should be only one appending segment at any time
  Preconditions.checkState(this.appendingSegment == null);
  this.appendingSegment = appendingSegment;
}

Reviewer (Contributor): Please include an error message here. (Probably a "WTF?!" message if it should never happen.)

jihoonson (Author): Added.
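The fix the reviewer asked for might look roughly like this. It is a self-contained sketch, not the actual Druid code: `AppendingSegmentHolder` is a hypothetical wrapper, `Object` stands in for SegmentWithState, a hand-rolled helper replaces Guava's Preconditions, and the message text is illustrative.

```java
public class AppendingSegmentHolder
{
  // Stand-in for SegmentWithState; the real class lives in Druid.
  private Object appendingSegment;

  public void setAppendingSegment(Object appendingSegment)
  {
    // There should be only one appending segment at any time
    checkState(
        this.appendingSegment == null,
        "WTF?! appendingSegment[" + this.appendingSegment + "] already exists"
    );
    this.appendingSegment = appendingSegment;
  }

  // Minimal equivalent of Guava's Preconditions.checkState(boolean, String)
  private static void checkState(boolean condition, String message)
  {
    if (!condition) {
      throw new IllegalStateException(message);
    }
  }
}
```

The message makes a violated invariant diagnosable from the stack trace alone, which matters for a condition that should never fire in production.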

void addAppendFinishedSegment(SegmentWithState appendFinishedSegment)

Reviewer (Contributor): Is this only supposed to be used during bootstrapping (startJob)? It doesn't seem like it would make sense otherwise. It could be clearer if this were made into a constructor instead: something that takes a list of initial segments. (Up to you though; this is just a suggestion.)

jihoonson (Author): Sounds good. Fixed.

// UNIQUE key for the row, ensuring sequences do not fork in two directions.
// Using a single column instead of (sequence_name, sequence_prev_id) as some MySQL storage engines
// have difficulty with large unique keys (see https://github.com/druid-io/druid/issues/2319)
final String sequenceNamePrevIdSha1 = BaseEncoding.base16().encode(
Reviewer (Contributor): There's a bit too much code duplication here. Please share some more code between this method and the other similar one. I know it is slightly different, but it seems close enough that it could be shared. Perhaps take a string for the secondary key and have that either be the previousId (in one path) or the interval (in another path).

jihoonson (Author): Refactored.
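One shape the suggested refactor could take, sketched with hypothetical names and JDK-only hashing in place of the Guava calls: a single helper keyed by sequenceName plus an opaque secondary key, where callers pass the previousSegmentId on the lineage-checked path or a string form of the interval on the lineage-skipping path.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SequenceKeyHelper
{
  // Hypothetical shared helper: one hash routine serves both allocation paths,
  // differing only in what the caller supplies as the secondary key.
  public static String hashSequenceKey(String sequenceName, String secondaryKey)
  {
    try {
      MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
      sha1.update(sequenceName.getBytes(StandardCharsets.UTF_8));
      sha1.update((byte) 0xff); // separator keeps ("ab", "c") distinct from ("a", "bc")
      sha1.update(secondaryKey.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : sha1.digest()) {
        hex.append(String.format("%02X", b & 0xff));
      }
      return hex.toString();
    }
    catch (NoSuchAlgorithmException e) {
      throw new RuntimeException(e); // SHA-1 is guaranteed to be present in the JDK
    }
  }
}
```

Collapsing the two near-identical hash blocks into one method removes the duplication the reviewer flagged while keeping the two key semantics explicit at the call sites.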

// Avoiding ON DUPLICATE KEY since it's not portable.
// Avoiding try/catch since it may cause inadvertent transaction-splitting.

// UNIQUE key for the row, ensuring sequences do not fork in two directions.
Reviewer (Contributor): This comment is no longer accurate: its purpose is no longer "ensuring sequences do not fork in two directions"; it is now to ensure we don't have more than one segment per sequence per interval.

jihoonson (Author): Fixed.

.asBytes()
);

handle.createStatement(
Reviewer (Contributor): This code seems shareable too.

jihoonson (Author): Done.

@@ -103,7 +103,7 @@ public AppenderatorDriverAddResult add(
String sequenceName
) throws IOException
{
-  return append(row, sequenceName, null, false, true);
+  return append(row, sequenceName, null, true, true);

Reviewer (Contributor): Why is the BatchAppenderatorDriver skipping the lineage check now? I thought it could still make more than one segment per interval if it's running in non-incremental-publishing mode.

jihoonson (Author): My bad. Thanks.

.filter(segmentWithState -> segmentWithState.getState() == SegmentState.APPENDING)
.map(SegmentWithState::getSegmentIdentifier)
.collect(Collectors.toList());
final Map<SegmentIdentifier, SegmentWithState> requestedSegmentIdsForSequences = getAppendingSegments(sequenceNames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for moving the creation of requestedSegmentIdsForSequences from after the push, to before the push? Is it fixing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it shouldn't fix anything, but is more reliable and understandable.

gianm (Contributor) left a review:

LGTM, thanks @jihoonson!

@gianm gianm merged commit b6c957b into apache:master Jul 2, 2018
@jihoonson jihoonson added this to the 0.12.2 milestone Jul 3, 2018
jihoonson added a commit to implydata/druid-public that referenced this pull request Jul 3, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
jihoonson added a commit to jihoonson/druid that referenced this pull request Jul 5, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
jihoonson added a commit to implydata/druid-public that referenced this pull request Jul 5, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
fjy pushed a commit that referenced this pull request Jul 5, 2018
…ce (#5943)

* Allow reordered segment allocation in kafka indexing service (#5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug

* commit remaining changes