-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add reload by interval API #7490
Conversation
Implements the reload proposal of #7439 Added tests and updated docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dampcake for adding this API and fine refactoring in SQLMetadataSegmentManager
. I left some minor comments.
server/src/main/java/org/apache/druid/server/http/DataSourcesResource.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/http/DataSourcesResource.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentManager.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentManager.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/server/http/DataSourcesResource.java
Outdated
Show resolved
Hide resolved
Use 404 with message when a segmentId is not found Fix typo in doc Return number of segments modified.
server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentManager.java
Outdated
Show resolved
Hide resolved
Restrict update call to only segments that need updating.
|
||
return true; | ||
@Override | ||
public int enableSegments(final String dataSource, final Interval interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think it'd be better to call this enableInterval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also suggest adding some comments or javadoc to these enable* methods that explain that they only enable non-overshadowed segments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added documentation to the Interface. Though it made more sense there as any additional implementations (if there ever are any) should behave the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addOverlappingEnabledSegmentsToVersionIntervalTimeline
uses a potentially old, cached view of the currently enabled segments that's updated by polling (default 1 minute period), while buildVersionedIntervalTimeline(dataSource, interval);
reads fresh from metadata. I think it would be better to do the enable based on a self-consistent view of the segments, so I'll suggest the following:
For the interval case:
- Create an empty timeline
- Query the metadata store for both the used status and payload of segments that overlap with the provided interval (SELECT used, payload...)
- Go through the used/segment pairs: If the segment is used, add the segment to the timeline. If the segment is not used, check that the segments interval is contained by the provided interval. If the unused segment is contained, add the segment to the timeline, and track this segment in a separate list.
- Go through the list of unused+contained segments in the list built in step 3, and enable the unused+contained segments for which isOvershadowed returns false.
For the segmentId case:
- Create an empty timeline
- Add the provided segments to the timeline
- Using JodaUtils.condenseIntervals(), build a condensed list of intervals from the intervals of the provided segment set
- In a single transaction, issue a used=true + interval overlaps query for each interval in the condensed intervals list constructed in 3 (could be a single query with an OR clause for each condensed interval), and add these used+overlapping segments to the timeline
- For each provided segment, enable it if isOvershadowed is false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the segmentId case:
- Create an empty timeline
- Add the provided segments to the timeline
This requires a query to convert the segmentId Strings to actual segments. Should this
query be in the same transaction as step 4?
- Using JodaUtils.condenseIntervals(), build a condensed list of intervals from the intervals of the provided segment set
- In a single transaction, issue a used=true + interval overlaps query for each interval in the condensed intervals list constructed in 3 (could be a single query with an OR clause for each condensed interval), and add these used+overlapping segments to the timeline
- For each provided segment, enable it if isOvershadowed is false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can actually skip the first query when adding those segments to the timeline:
- The
SegmentId
class has methods for parsing the ID string, you can grab the intervals from the parsed objects - The timeline doesn't really need a full DataSegment object, the interval and version are what matter for determining overshadowing, and you can create a dummy partition chunk object, you can use
public void add(final Interval interval, VersionType version, PartitionChunk<ObjectType> object)
onVersionedIntervalTimeline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You wouldn't be able to check for UnknownSegmentIdException
case with that, but I think sacrificing that existence check to skip the query is worth it (the worst that would happen if you get a non-existent segment to enable is potentially wasted effort searching for overlapping segments, and an update call that will do nothing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the churn, let's go back to issuing a query to get the DataSegment objects for the segment ID case.
In the future, the overshadowing decision could require more information beyond just version/interval (e.g. #7547), and so this API would need to retrieve the full objects.
server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentManager.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentManager.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentManager.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/metadata/SQLMetadataSegmentManager.java
Outdated
Show resolved
Hide resolved
@@ -230,6 +230,18 @@ Enables all segments of datasource which are not overshadowed by others. | |||
|
|||
Enables a segment of a datasource. | |||
|
|||
* `/druid/coordinator/v1/datasources/{dataSourceName}/markUsed` | |||
|
|||
Enables segments of a datasource by the interval or by an array of segment id's passed in the request payload. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the below two things:
- The segments would be enabled when this API is called, but then can be disabled by the coordinator if no loadRules match or any dropRule matches.
- If a kill task was running to kill same segments at the same time, the behavior is not defined. Some segments might be killed and others might be enabled. Please add a warning to not use the kill task and this API at the same time.
Can you add some tests to |
final DataSegment newSegment3 = new DataSegment( | ||
datasource, | ||
Intervals.of("2017-10-15T00:00:00.000/2017-10-16T00:00:00.000"), | ||
"2017-10-15T20:19:12.565Z", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To match what the test is trying to check, the version on newSegment3 needs to be lower than the versions on 1 and 2.
All of the segments here have the same version and interval currently (same iD), so the publish calls are really only publishing one segment three times.
This set matches iterateAllSegments right now ImmutableSet.of(segment1, segment2, newSegment1, newSegment2)
because DataSegment is hashed by ID only and newSegment1 and newSegment2 are identical in that respect (so it's really a 3 item set).
Please adjust the version timestamps here and in other tests.
LGTM after CI |
Implements the reload proposal of #7439
Added tests and updated docs