Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add segment pruning based on secondary partition dimension #2982

Merged
merged 9 commits into from Jun 24, 2016

Conversation

acslk
Copy link
Contributor

@acslk acslk commented May 18, 2016

Fixes #2940

Currently the CachingClusterClient, which is the queryRunner for broker node, puts all segments within the query interval into segment descriptors which is then retrieved from cache or other servers. This can be optimized by filtering out data segments with singleDimensionShardSpec that does not match the query filter. For instance, dataSegment with dimension "id" and value from "a" to "b" does not need to be retrieved for query with selection filter "id"="person".

This PR addresses this by calculating the range of possible values of the given dimension for the query filter, and intersect the range of filter with range of data segments to determine whether or not to add the segment to descriptors for retrieval.

@fjy fjy added this to the 0.9.2 milestone May 18, 2016
@@ -65,6 +66,8 @@

boolean hasFilters();

DimFilter getFilter();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion : change return type to Optional since not all queries have filters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we have a long and proud history of null return for various Query things, and this is not the right PR to change those to Optional.

Also I'm not totally sure Guava Optional actually makes the code significantly better, it's missing stuff like foreach / ifPresent, map, and flatMap that make Scala Options and Java 8 Optionals really usable.

@fjy
Copy link
Contributor

fjy commented Jun 22, 2016

@acslk there are merge conflicts

@fjy
Copy link
Contributor

fjy commented Jun 23, 2016

👍

@@ -218,14 +222,28 @@ public CachingClusteredClient(
// Let tool chest filter out unneeded segments
final List<TimelineObjectHolder<String, ServerSelector>> filteredServersLookup =
toolChest.filterSegments(query, serversLookup);
Map<String, Optional<RangeSet<String>>> dimensionRangeMap = Maps.newHashMap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming this dimensionRangeCache would be clearer

@gianm
Copy link
Contributor

gianm commented Jun 24, 2016

@acslk looking good so far! Just had some minor comments about naming, docs, and formatting.

@gianm
Copy link
Contributor

gianm commented Jun 24, 2016

👍 LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants