
Allow reordered segment allocation in kafka indexing service #5805

Merged: 3 commits into apache:master on Jul 2, 2018

Conversation

jihoonson (Contributor) commented on May 25, 2018:

Fix #5761.

Major changes are:

  • Enforce at most one active segment per sequence and per interval (see SegmentsOfInterval).
  • Fix IndexerSQLMetadataStorageCoordinator.allocatePendingSegment() to respect skipSegmentLineageCheck, avoiding the unique constraint violation on sequence_name_prev_id_sha1. When skipSegmentLineageCheck is true, sequence_prev_id is always an empty string and sequence_name_prev_id_sha1 is computed as below.
    final String sequenceNamePrevIdSha1 = BaseEncoding.base16().encode(
        Hashing.sha1()
               .newHasher()
               .putBytes(StringUtils.toUtf8(sequenceName))
               .putByte((byte) 0xff)
               .putLong(interval.getStartMillis())
               .putLong(interval.getEndMillis())
               .hash()
               .asBytes()
    );
  • skipSegmentLineageCheck can still be false for backward compatibility.
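As a hedged illustration of the key derivation above, here is a JDK-only sketch of the same idea. The class and method names are hypothetical stand-ins for the Guava-based snippet; byte-for-byte parity rests on two assumptions noted in the comments (Guava's Hasher writes longs little-endian, and BaseEncoding.base16() emits upper-case hex).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SequenceKeySketch
{
  // JDK-only sketch of the snippet above: SHA-1 over utf8(sequenceName), a 0xff
  // separator byte, then the interval's start and end millis, hex-encoded.
  public static String sequenceNamePrevIdSha1(String sequenceName, long startMillis, long endMillis)
  {
    try {
      MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
      sha1.update(sequenceName.getBytes(StandardCharsets.UTF_8));
      sha1.update((byte) 0xff); // separator so distinct (name, interval) pairs cannot collide by concatenation
      // Assumption: Guava's Hasher#putLong writes little-endian bytes, so mirror that here
      ByteBuffer intervalBytes = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
      intervalBytes.putLong(startMillis).putLong(endMillis);
      sha1.update(intervalBytes.array());
      StringBuilder hex = new StringBuilder();
      for (byte b : sha1.digest()) {
        hex.append(String.format("%02X", b & 0xff)); // BaseEncoding.base16() uses upper-case hex
      }
      return hex.toString();
    }
    catch (NoSuchAlgorithmException e) {
      throw new RuntimeException(e); // SHA-1 is guaranteed to be present in the JDK
    }
  }
}
```

Because the interval, rather than the previous segment id, feeds the hash, two allocations for the same sequence and interval map to the same unique-key value regardless of the order in which they arrive.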


void setAppendingSegment(SegmentWithState appendingSegment)
{
  // There should be only one appending segment at any time
  Preconditions.checkState(this.appendingSegment == null);
  this.appendingSegment = appendingSegment;
}

Reviewer (Contributor): Please include an error message here. (Probably a "WTF?!" message if it should never happen.)

jihoonson (Author): Added.
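The fix the reviewer asked for might look roughly like this. It is a self-contained sketch, not the actual Druid code: `AppendingSegmentHolder` is a hypothetical wrapper, `Object` stands in for SegmentWithState, a hand-rolled helper replaces Guava's Preconditions, and the message text is illustrative.

```java
public class AppendingSegmentHolder
{
  // Stand-in for SegmentWithState; the real class lives in Druid.
  private Object appendingSegment;

  public void setAppendingSegment(Object appendingSegment)
  {
    // There should be only one appending segment at any time
    checkState(
        this.appendingSegment == null,
        "WTF?! appendingSegment[" + this.appendingSegment + "] already exists"
    );
    this.appendingSegment = appendingSegment;
  }

  // Minimal equivalent of Guava's Preconditions.checkState(boolean, String)
  private static void checkState(boolean condition, String message)
  {
    if (!condition) {
      throw new IllegalStateException(message);
    }
  }
}
```

The message makes a violated invariant diagnosable from the stack trace alone, which matters for a condition that should never fire in production.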

void addAppendFinishedSegment(SegmentWithState appendFinishedSegment)

Reviewer (Contributor): Is this only supposed to be used during bootstrapping (startJob)? It doesn't seem like it would make sense otherwise. It could be clearer if this were made into a constructor instead: something that takes a list of initial segments. (Up to you though; this is just a suggestion.)

jihoonson (Author): Sounds good. Fixed.

// UNIQUE key for the row, ensuring sequences do not fork in two directions.
// Using a single column instead of (sequence_name, sequence_prev_id) as some MySQL storage engines
// have difficulty with large unique keys (see https://github.com/druid-io/druid/issues/2319)
final String sequenceNamePrevIdSha1 = BaseEncoding.base16().encode(
Reviewer (Contributor): There's a bit too much code duplication here. Please share some more code between this method and the other similar one. I know it is slightly different, but it seems close enough that it could be shared. Perhaps take a string for the secondary key and have that either be the previousId (in one path) or the interval (in another path).

jihoonson (Author): Refactored.
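One shape the suggested refactor could take, sketched with hypothetical names and JDK-only hashing in place of the Guava calls: a single helper keyed by sequenceName plus an opaque secondary key, where callers pass the previousSegmentId on the lineage-checked path or a string form of the interval on the lineage-skipping path.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SequenceKeyHelper
{
  // Hypothetical shared helper: one hash routine serves both allocation paths,
  // differing only in what the caller supplies as the secondary key.
  public static String hashSequenceKey(String sequenceName, String secondaryKey)
  {
    try {
      MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
      sha1.update(sequenceName.getBytes(StandardCharsets.UTF_8));
      sha1.update((byte) 0xff); // separator keeps ("ab", "c") distinct from ("a", "bc")
      sha1.update(secondaryKey.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : sha1.digest()) {
        hex.append(String.format("%02X", b & 0xff));
      }
      return hex.toString();
    }
    catch (NoSuchAlgorithmException e) {
      throw new RuntimeException(e); // SHA-1 is guaranteed to be present in the JDK
    }
  }
}
```

Collapsing the two near-identical hash blocks into one method removes the duplication the reviewer flagged while keeping the two key semantics explicit at the call sites.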

// Avoiding ON DUPLICATE KEY since it's not portable.
// Avoiding try/catch since it may cause inadvertent transaction-splitting.

// UNIQUE key for the row, ensuring sequences do not fork in two directions.
Reviewer (Contributor): This comment is no longer accurate: its purpose is no longer "ensuring sequences do not fork in two directions"; it is now to ensure we don't have more than one segment per sequence per interval.

jihoonson (Author): Fixed.

.asBytes()
);

handle.createStatement(
Reviewer (Contributor): This code seems shareable too.

jihoonson (Author): Done.

@@ -103,7 +103,7 @@ public AppenderatorDriverAddResult add(
String sequenceName
) throws IOException
{
-  return append(row, sequenceName, null, false, true);
+  return append(row, sequenceName, null, true, true);

Reviewer (Contributor): Why is the BatchAppenderatorDriver skipping the lineage check now? I thought it could still make more than one segment per interval if it's running in non-incremental-publishing mode.

jihoonson (Author): My bad. Thanks.

.filter(segmentWithState -> segmentWithState.getState() == SegmentState.APPENDING)
.map(SegmentWithState::getSegmentIdentifier)
.collect(Collectors.toList());
final Map<SegmentIdentifier, SegmentWithState> requestedSegmentIdsForSequences = getAppendingSegments(sequenceNames)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for moving the creation of requestedSegmentIdsForSequences from after the push, to before the push? Is it fixing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it shouldn't fix anything, but is more reliable and understandable.

gianm (Contributor) left a review:

LGTM, thanks @jihoonson!

@gianm gianm merged commit b6c957b into apache:master Jul 2, 2018
@jihoonson jihoonson added this to the 0.12.2 milestone Jul 3, 2018
jihoonson added a commit to implydata/druid-public that referenced this pull request Jul 3, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
jihoonson added a commit to jihoonson/druid that referenced this pull request Jul 5, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
jihoonson added a commit to implydata/druid-public that referenced this pull request Jul 5, 2018
…5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug
fjy pushed a commit that referenced this pull request Jul 5, 2018
…ce (#5943)

* Allow reordered segment allocation in kafka indexing service (#5805)

* Allow reordered segment allocation in kafka indexing service

* address comments

* fix a bug

* commit remaining changes