Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reload by interval API #7490

Merged
merged 16 commits into from
Apr 26, 2019
Merged

Add reload by interval API #7490

merged 16 commits into from
Apr 26, 2019

Conversation

dampcake
Copy link
Contributor

Implements the reload proposal of #7439
Added tests and updated docs

Implements the reload proposal of #7439
Added tests and updated docs
Copy link

@surekhasaharan surekhasaharan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dampcake for adding this API and fine refactoring in SQLMetadataSegmentManager. I left some minor comments.

Use 404 with message when a segmentId is not found
Fix typo in doc
Return number of segments modified.
Restrict update call to only segments that need updating.

return true;
@Override
public int enableSegments(final String dataSource, final Interval interval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it'd be better to call this enableInterval

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also suggest adding some comments or javadoc to these enable* methods that explain that they only enable non-overshadowed segments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added documentation to the Interface. Though it made more sense there as any additional implementations (if there ever are any) should behave the same way.

Copy link
Contributor

@jon-wei jon-wei Apr 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addOverlappingEnabledSegmentsToVersionIntervalTimeline uses a potentially old, cached view of the currently enabled segments that's updated by polling (default 1 minute period), while buildVersionedIntervalTimeline(dataSource, interval); reads fresh from metadata. I think it would be better to do the enable based on a self-consistent view of the segments, so I'll suggest the following:

For the interval case:

  1. Create an empty timeline
  2. Query the metadata store for both the used status and payload of segments that overlap with the provided interval (SELECT used, payload...)
  3. Go through the used/segment pairs: If the segment is used, add the segment to the timeline. If the segment is not used, check that the segments interval is contained by the provided interval. If the unused segment is contained, add the segment to the timeline, and track this segment in a separate list.
  4. Go through the list of unused+contained segments in the list built in step 3, and enable the unused+contained segments for which isOvershadowed returns false.

For the segmentId case:

  1. Create an empty timeline
  2. Add the provided segments to the timeline
  3. Using JodaUtils.condenseIntervals(), build a condensed list of intervals from the intervals of the provided segment set
  4. In a single transaction, issue a used=true + interval overlaps query for each interval in the condensed intervals list constructed in 3 (could be a single query with an OR clause for each condensed interval), and add these used+overlapping segments to the timeline
  5. For each provided segment, enable it if isOvershadowed is false

Copy link
Contributor Author

@dampcake dampcake Apr 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the segmentId case:

  1. Create an empty timeline
  2. Add the provided segments to the timeline

This requires a query to convert the segmentId Strings to actual segments. Should this
query be in the same transaction as step 4?

  1. Using JodaUtils.condenseIntervals(), build a condensed list of intervals from the intervals of the provided segment set
  2. In a single transaction, issue a used=true + interval overlaps query for each interval in the condensed intervals list constructed in 3 (could be a single query with an OR clause for each condensed interval), and add these used+overlapping segments to the timeline
  3. For each provided segment, enable it if isOvershadowed is false

Copy link
Contributor

@jon-wei jon-wei Apr 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can actually skip the first query when adding those segments to the timeline:

  • The SegmentId class has methods for parsing the ID string, you can grab the intervals from the parsed objects
  • The timeline doesn't really need a full DataSegment object, the interval and version are what matter for determining overshadowing, and you can create a dummy partition chunk object, you can use public void add(final Interval interval, VersionType version, PartitionChunk<ObjectType> object) on VersionedIntervalTimeline

Copy link
Contributor

@jon-wei jon-wei Apr 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wouldn't be able to check for UnknownSegmentIdException case with that, but I think sacrificing that existence check to skip the query is worth it (the worst that would happen if you get a non-existent segment to enable is potentially wasted effort searching for overlapping segments, and an update call that will do nothing)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the churn, let's go back to issuing a query to get the DataSegment objects for the segment ID case.

In the future, the overshadowing decision could require more information beyond just version/interval (e.g. #7547), and so this API would need to retrieve the full objects.

@@ -230,6 +230,18 @@ Enables all segments of datasource which are not overshadowed by others.

Enables a segment of a datasource.

* `/druid/coordinator/v1/datasources/{dataSourceName}/markUsed`

Enables segments of a datasource by the interval or by an array of segment id's passed in the request payload.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the below two things:

  • The segments would be enabled when this API is called, but then can be disabled by the coordinator if no loadRules match or any dropRule matches.
  • If a kill task was running to kill same segments at the same time, the behavior is not defined. Some segments might be killed and others might be enabled. Please add a warning to not use the kill task and this API at the same time.

@jon-wei
Copy link
Contributor

jon-wei commented Apr 24, 2019

Can you add some tests to SQLMetadataSegmentManagerTest for the new enable APIs?

final DataSegment newSegment3 = new DataSegment(
datasource,
Intervals.of("2017-10-15T00:00:00.000/2017-10-16T00:00:00.000"),
"2017-10-15T20:19:12.565Z",
Copy link
Contributor

@jon-wei jon-wei Apr 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match what the test is trying to check, the version on newSegment3 needs to be lower than the versions on 1 and 2.

All of the segments here have the same version and interval currently (same iD), so the publish calls are really only publishing one segment three times.

This set matches iterateAllSegments right now ImmutableSet.of(segment1, segment2, newSegment1, newSegment2) because DataSegment is hashed by ID only and newSegment1 and newSegment2 are identical in that respect (so it's really a 3 item set).

Please adjust the version timestamps here and in other tests.

@jon-wei
Copy link
Contributor

jon-wei commented Apr 26, 2019

LGTM after CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants