New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHOENIX-4594: Perform binary search on guideposts during query compilation #347
base: master
Are you sure you want to change the base?
Conversation
@karanmehta93 Can you also review this? |
…instead of linear search for the first guide post in the first region of the targeted scan ranges. In details: 1. The aglorithm continuously decode and load guide posts in batches (moving window). For each moving window, firstly compare the searching key with the last element to see whether the searching key is in the current window. If it isn't, perform binary search in the window; otherwise, move to the next window and repeat the above steps until it finds the start key of the first region in the scan ranges or its insertion position There are two reasons to use moving windows to load guide posts in batches. a. Firstly, we don't want to load all guide posts in the memory to increase memory footprint. b. Secondly, we don't want to decode and load the guide posts beyond the targed scan ranges to increase the system overhead. 2. Added config parameter STATS_GUIDEPOST_MOVING_WINDOW_SIZE to denote the size of moving window used for loading guid posts.
if (firstRegionStartKey.getLength() > 0 && this.gpsMovingWindowSize > 0) { | ||
// Continuously decode and load guide posts in batches (moving window). For each moving window, | ||
// firstly compare the searching key with the last element to see whether the searching key is | ||
// in the current window. If it isn't, perform binary search in the window; otherwise, move to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: If it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @BinShi-SecularBird ! Have you tested out the gpsMovingWindowSize with large values to see the memory footprint vs speed up performance?
assertEquals((Long) 6L, info.getEstimatedRows()); | ||
assertEquals((Long) 460L, info.getEstimatedBytes()); | ||
// TODO: the original code before this change will hit the following assertion. Need to investigate it. | ||
// assertTrue(info.getEstimateInfoTs() > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this assert fail now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the "TODO" in the comment said, this is a new test case I wanted to add in this change, but it failed. So I reverted all the changes except this new test case, but it still failed, which means there is problem in the current code base without any of my change. I commented out the assertion here, but I opened another JIRA PHOENIX-4914 to track the original problem in the code base.
"CREATE TABLE " + tableName + " (k INTEGER PRIMARY KEY, a bigint, b bigint)" | ||
+ " GUIDE_POSTS_WIDTH=" + guidePostWidth | ||
+ ", USE_STATS_FOR_PARALLELIZATION=true" + " SPLIT ON (102, 105, 108)"; | ||
conn.createStatement().execute(ddl); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactor the common init code in initDataAndStats and reuse it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll hold this change as discussed in PHOENIX-4594.
PHOENIX-5592 MapReduce job to asynchronously delete rows where the VI…
…instead of linear search for the first guide post in the first region of the targeted scan ranges. In details: