Add segment pruning based on secondary partition dimension #2982
@@ -65,6 +66,8 @@

  boolean hasFilters();

  DimFilter getFilter();
Suggestion: change the return type to Optional, since not all queries have filters.
IMO we have a long and proud history of `null` return for various Query things, and this is not the right PR to change those to Optional.
Also I'm not totally sure Guava Optional actually makes the code significantly better, it's missing stuff like `foreach` / `ifPresent`, `map`, and `flatMap` that make Scala Options and Java 8 Optionals really usable.
@acslk there are merge conflicts
👍
@@ -218,14 +222,28 @@ public CachingClusteredClient(
    // Let tool chest filter out unneeded segments
    final List<TimelineObjectHolder<String, ServerSelector>> filteredServersLookup =
        toolChest.filterSegments(query, serversLookup);
    Map<String, Optional<RangeSet<String>>> dimensionRangeMap = Maps.newHashMap();
Naming this `dimensionRangeCache` would be clearer
@acslk looking good so far! Just had some minor comments about naming, docs, and formatting.
👍 LGTM
Fixes #2940
Currently the CachingClusteredClient, which is the query runner for the broker node, puts all segments within the query interval into segment descriptors, which are then retrieved from cache or from other servers. This can be optimized by filtering out data segments whose SingleDimensionShardSpec does not match the query filter. For instance, a data segment partitioned on dimension "id" with values from "a" to "b" does not need to be retrieved for a query with the selector filter "id"="person".
This PR addresses this by calculating the range of possible values of the given dimension for the query filter, and intersecting the filter's range with the range of each data segment to determine whether or not to add the segment to the descriptors for retrieval.
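
To illustrate the idea, here is a minimal standalone sketch of the range check for the simplest case, an equality filter on the partition dimension. It is not the code in this PR: `SegmentPruningSketch`, `Shard`, `mayContain`, and `prune` are hypothetical names, and the sketch assumes each shard covers a half-open range `[start, end)` with `null` meaning unbounded. The PR itself works with Guava `RangeSet`s so that it can handle more general filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentPruningSketch {

    /** Hypothetical stand-in for a segment partitioned on a single dimension. */
    static final class Shard {
        final String dimension;
        final String start; // inclusive lower bound (assumption); null = unbounded
        final String end;   // exclusive upper bound (assumption); null = unbounded

        Shard(String dimension, String start, String end) {
            this.dimension = dimension;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns true if the shard could hold rows where dimension == value. */
    static boolean mayContain(Shard shard, String dimension, String value) {
        if (!shard.dimension.equals(dimension)) {
            // The filter is on a different dimension, so the shard cannot be pruned.
            return true;
        }
        boolean atOrAboveStart = shard.start == null || value.compareTo(shard.start) >= 0;
        boolean belowEnd = shard.end == null || value.compareTo(shard.end) < 0;
        return atOrAboveStart && belowEnd;
    }

    /** Keeps only the shards that might match the equality filter. */
    static List<Shard> prune(List<Shard> shards, String dimension, String value) {
        List<Shard> kept = new ArrayList<>();
        for (Shard shard : shards) {
            if (mayContain(shard, dimension, value)) {
                kept.add(shard);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Shard> shards = Arrays.asList(
            new Shard("id", "a", "b"),
            new Shard("id", "p", "q"),
            new Shard("id", "q", null)
        );
        for (Shard shard : prune(shards, "id", "person")) {
            System.out.println(shard.dimension + ": [" + shard.start + ", " + shard.end + ")");
        }
    }
}
```

With the filter "id"="person", only the shard covering "p" to "q" survives; the "a" to "b" shard from the example above is skipped. A filter on any other dimension keeps every shard, since the partition range says nothing about it.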