Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PHOENIX-4594: Perform binary search on guideposts during query compilation #347

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

binshi-bing
Copy link
Contributor

…instead of linear search for the first guide post in the first region of the targeted scan ranges. In details:

1. The aglorithm continuously decodes and loads guide posts in batches and perform binary search in each batch, until it finds the start key of the first region in the scan ranges or its insertion position.
   There are two reasons to use moving windows to load guide posts in batches.
   a. Firstly, we don't want to load all guide posts in the memory to increase memory footprint.
   b. Secondly, we don't want to decode and load the guide posts beyond the targed scan ranges to increase the system overhead.
2. Added config parameter STATS_GUIDEPOST_MOVING_WINDOW_SIZE to denote the size of moving window used for loading guid posts.

@twdsilva
Copy link
Contributor

@karanmehta93 Can you also review this?

@binshi-bing binshi-bing changed the title In BaseResultIterators.getParallelScans(...), performa binary search … PHOENIX-4594: In BaseResultIterators.getParallelScans(...), performa binary search … Sep 17, 2018
@binshi-bing binshi-bing changed the title PHOENIX-4594: In BaseResultIterators.getParallelScans(...), performa binary search … PHOENIX-4594: Perform binary search on guideposts during query compilation Sep 17, 2018
…instead of linear search for the first guide post in the first region of the targeted scan ranges. In details:

    1. The aglorithm continuously decode and load guide posts in batches (moving window). For each moving window, firstly compare the searching key with the last element to see whether the searching key is in the current window. If it isn't, perform binary search in the window; otherwise, move to the next window and repeat the above steps until it finds the start key of the first region in the scan ranges or its insertion position
       There are two reasons to use moving windows to load guide posts in batches.
       a. Firstly, we don't want to load all guide posts in the memory to increase memory footprint.
       b. Secondly, we don't want to decode and load the guide posts beyond the targed scan ranges to increase the system overhead.
    2. Added config parameter STATS_GUIDEPOST_MOVING_WINDOW_SIZE to denote the size of moving window used for loading guid posts.
if (firstRegionStartKey.getLength() > 0 && this.gpsMovingWindowSize > 0) {
// Continuously decode and load guide posts in batches (moving window). For each moving window,
// firstly compare the searching key with the last element to see whether the searching key is
// in the current window. If it isn't, perform binary search in the window; otherwise, move to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If it is

Copy link
Contributor

@twdsilva twdsilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @BinShi-SecularBird ! Have you tested out the gpsMovingWindowSize with large values to see the memory footprint vs speed up performance?

assertEquals((Long) 6L, info.getEstimatedRows());
assertEquals((Long) 460L, info.getEstimatedBytes());
// TODO: the original code before this change will hit the following assertion. Need to investigate it.
// assertTrue(info.getEstimateInfoTs() > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this assert fail now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the "TODO" in the comment said, this is a new test case I wanted to add in this change, but it failed. So I reverted all the changes except this new test case, but it still failed, which means there is problem in the current code base without any of my change. I commented out the assertion here, but I opened another JIRA PHOENIX-4914 to track the original problem in the code base.

"CREATE TABLE " + tableName + " (k INTEGER PRIMARY KEY, a bigint, b bigint)"
+ " GUIDE_POSTS_WIDTH=" + guidePostWidth
+ ", USE_STATS_FOR_PARALLELIZATION=true" + " SPLIT ON (102, 105, 108)";
conn.createStatement().execute(ddl);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactor the common init code in initDataAndStats and reuse it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll hold this change as discussed in PHOENIX-4594.

jpisaac pushed a commit to jpisaac/phoenix that referenced this pull request Oct 1, 2020
PHOENIX-5592 MapReduce job to asynchronously delete rows where the VI…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants