PHOENIX-4594: Perform binary search on guideposts during query compilation #347

binshi-bing · 2018-09-13T21:52:28Z

…instead of linear search for the first guide post in the first region of the targeted scan ranges. In detail:

1. The algorithm continuously decodes and loads guide posts in batches and performs a binary search in each batch, until it finds the start key of the first region in the scan ranges or its insertion position.
   There are two reasons to use a moving window to load guide posts in batches:
   a. We don't want to load all guide posts into memory, which would increase the memory footprint.
   b. We don't want to decode and load guide posts beyond the targeted scan ranges, which would add system overhead.
2. Added config parameter STATS_GUIDEPOST_MOVING_WINDOW_SIZE to denote the size of the moving window used for loading guide posts.

twdsilva · 2018-09-17T18:26:40Z

@karanmehta93 Can you also review this?

…instead of linear search for the first guide post in the first region of the targeted scan ranges. In detail:

1. The algorithm continuously decodes and loads guide posts in batches (a moving window). For each window, it first compares the search key with the last element to see whether the search key falls within the current window. If it does, it performs a binary search in the window; otherwise, it moves to the next window and repeats these steps until it finds the start key of the first region in the scan ranges or its insertion position.
   There are two reasons to use a moving window to load guide posts in batches:
   a. We don't want to load all guide posts into memory, which would increase the memory footprint.
   b. We don't want to decode and load guide posts beyond the targeted scan ranges, which would add system overhead.
2. Added config parameter STATS_GUIDEPOST_MOVING_WINDOW_SIZE to denote the size of the moving window used for loading guide posts.
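For readers following the description above, here is a minimal sketch of the moving-window search, not the code in this PR. It assumes guide posts arrive as a sorted stream of distinct `Integer` keys (the real guide posts are encoded row keys that must be decoded), and the class name `GuidePostWindowSearch`, the method `findFirstGuidePost`, and its parameters are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration; guide posts are modeled as sorted, distinct Integers.
public class GuidePostWindowSearch {

    /**
     * Returns the index of the first guide post >= searchKey (the match position
     * or the insertion position), holding at most windowSize guide posts in memory.
     */
    static int findFirstGuidePost(Iterator<Integer> guidePosts, int windowSize, int searchKey) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        int base = 0; // number of guide posts in windows that were already skipped
        List<Integer> window = new ArrayList<>(windowSize);
        while (guidePosts.hasNext()) {
            // Load ("decode") the next batch of guide posts into the window.
            window.clear();
            while (window.size() < windowSize && guidePosts.hasNext()) {
                window.add(guidePosts.next());
            }
            // Compare with the last element first: if the key is not beyond it,
            // the answer lies in this window and a binary search finds it.
            if (searchKey <= window.get(window.size() - 1)) {
                int pos = Collections.binarySearch(window, searchKey);
                return base + (pos >= 0 ? pos : -(pos + 1));
            }
            // Otherwise discard this window and move on to the next one.
            base += window.size();
        }
        return base; // the key is greater than every guide post
    }
}
```

Because the loop returns as soon as the key falls inside a window, guide posts after that window are never read, which corresponds to reason (b) above; reason (a) corresponds to the window being the only buffer kept in memory.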

karanmehta93 · 2018-09-18T21:09:10Z

phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java

+                    if (firstRegionStartKey.getLength() > 0 && this.gpsMovingWindowSize > 0) {
+                        // Continuously decode and load guide posts in batches (moving window). For each moving window,
+                        // firstly compare the searching key with the last element to see whether the searching key is
+                        // in the current window. If it isn't, perform binary search in the window; otherwise, move to


nit: If it is

twdsilva

Nice work @BinShi-SecularBird! Have you tested gpsMovingWindowSize with large values to see the trade-off between memory footprint and speedup?

twdsilva · 2018-09-28T05:14:45Z

phoenix-core/src/it/java/org/apache/phoenix/end2end/ExplainPlanWithStatsEnabledIT.java

+            assertEquals((Long) 6L, info.getEstimatedRows());
+            assertEquals((Long) 460L, info.getEstimatedBytes());
+            // TODO: the original code before this change will hit the following assertion. Need to investigate it.
+            // assertTrue(info.getEstimateInfoTs() > 0);


Why does this assert fail now?

binshi-bing (Oct 5, 2018): As the TODO comment says, this is a new test case I wanted to add in this change, but it failed. I then reverted all of my changes except this new test case, and it still failed, which means the problem exists in the current code base independent of my changes. I commented out the assertion here and opened a separate JIRA, PHOENIX-4914, to track the original problem.

twdsilva · 2018-10-01T21:25:46Z

phoenix-core/src/it/java/org/apache/phoenix/end2end/ExplainPlanWithStatsEnabledIT.java

+                    "CREATE TABLE " + tableName + " (k INTEGER PRIMARY KEY, a bigint, b bigint)"
+                            + " GUIDE_POSTS_WIDTH=" + guidePostWidth
+                            + ", USE_STATS_FOR_PARALLELIZATION=true" + " SPLIT ON (102, 105, 108)";
+            conn.createStatement().execute(ddl);


refactor the common init code in initDataAndStats and reuse it here.

binshi-bing (Oct 5, 2018): I'll hold this change as discussed in PHOENIX-4594.


binshi-bing changed the title ~~In BaseResultIterators.getParallelScans(...), performa binary search …~~ PHOENIX-4594: In BaseResultIterators.getParallelScans(...), performa binary search … Sep 17, 2018

binshi-bing changed the title ~~PHOENIX-4594: In BaseResultIterators.getParallelScans(...), performa binary search …~~ PHOENIX-4594: Perform binary search on guideposts during query compilation Sep 17, 2018

karanmehta93 reviewed Sep 18, 2018

twdsilva reviewed Oct 1, 2018

jpisaac pushed a commit to jpisaac/phoenix that referenced this pull request Oct 1, 2020

Merge pull request apache#347 from xyan/dev/4.x-sfdc-1-phoenix-ttl

49e0a25

PHOENIX-5592 MapReduce job to asynchronously delete rows where the VI…


binshi-bing commented Sep 13, 2018

twdsilva commented Sep 17, 2018

karanmehta93 Sep 18, 2018

twdsilva left a comment

twdsilva Sep 28, 2018

binshi-bing Oct 5, 2018

twdsilva Oct 1, 2018

binshi-bing Oct 5, 2018
