Add segment pruning based on secondary partition dimension #2982

acslk · 2016-05-18T00:58:01Z

Currently the CachingClusteredClient, which is the QueryRunner for the broker node, puts every segment within the query interval into segment descriptors, which are then retrieved from cache or from other servers. This can be optimized by filtering out data segments whose SingleDimensionShardSpec does not match the query filter. For instance, a data segment partitioned on dimension "id" with values from "a" to "b" does not need to be retrieved for a query with the selection filter "id" = "person".

This PR addresses this by calculating the range of possible values of the given dimension under the query filter, and intersecting that range with the range of each data segment to determine whether to add the segment to the descriptors for retrieval.
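The range check described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code from this PR: `KeyRange`, `Segment`, and `prune` are hypothetical names, the ranges are treated as closed on both ends, and the diff itself works with Guava's `RangeSet<String>` rather than a hand-written range class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SegmentPruningSketch {
    /** Hypothetical closed range [start, end]; a null bound means unbounded on that side. */
    static final class KeyRange {
        final String start;
        final String end;

        KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        /** Two closed ranges overlap when each one starts no later than the other ends. */
        boolean overlaps(KeyRange other) {
            boolean startsBeforeOtherEnds =
                start == null || other.end == null || start.compareTo(other.end) <= 0;
            boolean otherStartsBeforeThisEnds =
                other.start == null || end == null || other.start.compareTo(end) <= 0;
            return startsBeforeOtherEnds && otherStartsBeforeThisEnds;
        }
    }

    /** Hypothetical stand-in for a segment partitioned on a single dimension. */
    static final class Segment {
        final String id;
        final KeyRange range;

        Segment(String id, KeyRange range) {
            this.id = id;
            this.range = range;
        }
    }

    /** Keeps only the segments whose range overlaps at least one range allowed by the filter. */
    static List<String> prune(List<Segment> segments, List<KeyRange> filterRanges) {
        List<String> kept = new ArrayList<>();
        for (Segment segment : segments) {
            for (KeyRange filterRange : filterRanges) {
                if (segment.range.overlaps(filterRange)) {
                    kept.add(segment.id);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment("seg_a_b", new KeyRange("a", "b")),
            new Segment("seg_m_z", new KeyRange("m", "z")));
        // A filter "id" = "person" allows only the single-point range ["person", "person"].
        List<KeyRange> filterRanges = Arrays.asList(new KeyRange("person", "person"));
        System.out.println(prune(segments, filterRanges));
    }
}
```

With the example from the description, the segment covering "a" to "b" is dropped because "person" sorts after "b", while a segment covering "m" to "z" is kept.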

nishantmonu51 · 2016-05-18T09:04:51Z

processing/src/main/java/io/druid/query/Query.java

@@ -65,6 +66,8 @@

  boolean hasFilters();

+  DimFilter getFilter();


Suggestion: change the return type to Optional, since not all queries have filters.

gianm · 2016-05-18

IMO we have a long and proud history of null return for various Query things, and this is not the right PR to change those to Optional.

Also, I'm not totally sure Guava Optional actually makes the code significantly better; it's missing stuff like foreach / ifPresent, map, and flatMap that make Scala Options and Java 8 Optionals really usable.
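To make the trade-off in this thread concrete, here is a small sketch of the two calling styles. It is illustrative only: `HypotheticalQuery` is an invented stand-in rather than Druid's `Query` interface, the filter is modelled as a plain `String`, and the Optional variant uses Java 8's `java.util.Optional` (not the Guava `Optional` under discussion) precisely because it has the `map` method the comment refers to.

```java
import java.util.Optional;

public class FilterAccessSketch {
    /** Hypothetical stand-in for a query; the filter is modelled as a plain String. */
    static final class HypotheticalQuery {
        private final String filter; // null when the query has no filter

        HypotheticalQuery(String filter) {
            this.filter = filter;
        }

        /** Null-returning style, as in the getFilter() signature shown in the diff. */
        String getFilter() {
            return filter;
        }

        /** Optional-returning style, as proposed in the review suggestion. */
        Optional<String> getFilterOptional() {
            return Optional.ofNullable(filter);
        }
    }

    /** The caller must remember the null check. */
    static String describeWithNullCheck(HypotheticalQuery query) {
        String filter = query.getFilter();
        return filter == null ? "no filter" : "filter: " + filter;
    }

    /** The absent case is handled through map and orElse. */
    static String describeWithOptional(HypotheticalQuery query) {
        return query.getFilterOptional().map(f -> "filter: " + f).orElse("no filter");
    }

    public static void main(String[] args) {
        HypotheticalQuery filtered = new HypotheticalQuery("id = person");
        HypotheticalQuery unfiltered = new HypotheticalQuery(null);
        System.out.println(describeWithNullCheck(filtered));
        System.out.println(describeWithOptional(unfiltered));
    }
}
```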

fjy · 2016-06-22T21:26:35Z

@acslk there are merge conflicts

fjy · 2016-06-23T21:34:14Z

👍

gianm · 2016-06-24T15:17:18Z

server/src/main/java/io/druid/client/CachingClusteredClient.java

@@ -218,14 +222,28 @@ public CachingClusteredClient(
    // Let tool chest filter out unneeded segments
    final List<TimelineObjectHolder<String, ServerSelector>> filteredServersLookup =
        toolChest.filterSegments(query, serversLookup);
+    Map<String, Optional<RangeSet<String>>> dimensionRangeMap = Maps.newHashMap();


Naming this dimensionRangeCache would be clearer.
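The "cache" reading of that map can be sketched as below, under stated assumptions. The class and method names are hypothetical, the range is reduced to a descriptive `String`, and `java.util.Optional` is swapped in for the Guava `Optional` in the diff. The idea shown is that an empty Optional is still a stored value, so "already computed, filter places no restriction on this dimension" is distinguishable from "not computed yet", which is presumably why the map's values are wrapped in Optional.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class DimensionRangeCacheSketch {
    // Maps a dimension name to its computed range; Optional.empty() means
    // "computed, but the filter does not restrict this dimension".
    private final Map<String, Optional<String>> dimensionRangeCache = new HashMap<>();
    private int computations = 0;

    /** Returns the cached range for the dimension, computing it at most once. */
    Optional<String> rangeFor(String dimension, Function<String, Optional<String>> compute) {
        return dimensionRangeCache.computeIfAbsent(dimension, d -> {
            computations++;
            return compute.apply(d);
        });
    }

    int getComputations() {
        return computations;
    }

    public static void main(String[] args) {
        DimensionRangeCacheSketch cache = new DimensionRangeCacheSketch();
        // Hypothetical range computation: only "id" is restricted by the filter.
        Function<String, Optional<String>> compute =
            d -> d.equals("id") ? Optional.of("[person..person]") : Optional.empty();
        cache.rangeFor("id", compute);
        cache.rangeFor("id", compute);      // served from the cache
        cache.rangeFor("country", compute); // empty result is cached too
        cache.rangeFor("country", compute);
        System.out.println(cache.getComputations());
    }
}
```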

gianm · 2016-06-24T15:54:38Z

@acslk looking good so far! Just had some minor comments about naming, docs, and formatting.

gianm · 2016-06-24T21:51:08Z

👍 LGTM

fjy added this to the 0.9.2 milestone May 18, 2016

fjy added the Improvement label May 18, 2016

nishantmonu51 reviewed May 18, 2016

acslk added 6 commits June 23, 2016 10:59

add get dimension rangeset to filters

baef042

add get domain to ShardSpec and added chunk filter in caching clustered client

d6bbc5c

add null check and modified not filter, started with unit test

15d2517

add filter test with caching

e0e1d9f

refactor and some comments

c06e631

extract filtershard to helper function

7fa41d6

acslk force-pushed the feature-prun branch from 2f5d6b8 to 7fa41d6 Compare June 23, 2016 21:05

fjy closed this Jun 23, 2016

fjy reopened this Jun 23, 2016

fixup

75b11ca

gianm reviewed Jun 24, 2016

acslk added 2 commits June 24, 2016 12:03

minor changes

09f85f2

update javadoc

20aa962

fjy merged commit 8a08398 into apache:master Jun 24, 2016

acslk deleted the feature-prun branch July 5, 2016 23:46

acslk mentioned this pull request Jul 22, 2016

Add numeric StringComparator #3270

Merged

gianm mentioned this pull request Jul 26, 2016

Add slf4j requst logger #3146

Merged

This was referenced May 2, 2020

Add segment pruning for hash based partitioning #9809

Closed

Add segment pruning for hash based shard spec #9810

Merged

snyk-bot mentioned this pull request Jan 13, 2022

[Snyk] Security upgrade axios from 0.19.0 to 0.20.0 Accedian/incubator-druid#738

Open

snyk-bot mentioned this pull request Feb 10, 2022

[Snyk] Security upgrade axios from 0.19.0 to 0.20.0 Accedian/incubator-druid#917

Open
