-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve bitmap vector offset to report contiguous groups #11039
improve bitmap vector offset to report contiguous groups #11039
Conversation
@@ -85,6 +87,14 @@ public void advance() | |||
|
|||
currentVectorSize = to; | |||
} | |||
|
|||
if (currentVectorSize > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not do this branch if currentVectorSize == 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could, and I considered it, but ultimately I decided that 1 didn't really "feel" like it met the threshold for me to consider it really contiguous. I also figured it didn't really make a lot of difference between which read method it would end up in because there is only 1 offset either way. But I don't have strong opinions on it though since either seems valid, do you think it logically should be considered contiguous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it doesn't really matter. I thought I might have been missing some reason why it wasn't ok to do it for the currentVectorSize == 1
case.
@@ -114,6 +124,9 @@ public int getCurrentVectorSize() | |||
@Override | |||
public int getStartOffset() | |||
{ | |||
if (isContiguous) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the interface, getOffsets
is supposed to throw an exception if isContiguous is true, so we should check it there too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added, and tests
Description
This PR is a follow-up to #11004, which improves
BitmapVectorOffset
to report the current set of offsets as "contiguous" when they are, in order to take advantage of any optimizations available when processing contiguous vectorized gets.In #11004, I noticed the improvements to vectorized decoding of long values in my column scan benchmarks only seemed to impact a full scan. Looking closer, it was due to how the benchmark uses a
BitmapVectorOffset
to simulate filtering rows, which prior to this pr would never report the current set of offsets as contiguous, even if it was. Since #11004 primarily improved performance of contiguous vectorized gets, when using aBitmapVectorOffset
we were missing out on these improvements entirely.To help measure the improvements of allowing contiguous blocks to be processed with the contiguous get methods, I improved the benchmarks added in #11004 to now have several different filter "distributions" to simulate different filtering patterns, instead of previously only using a random distribution. These include:
BitmapVectorOffset
, bitmap is set randomly for desired number of rowsNoFilterVectorOffset
from starting offset 0 to desired number of rowsBitmapVectorOffset
, bitmap set from 0 to desired number of rowscontiguous-bitmap-start
BitmapVectorOffset
, position chosen randomly and up to 1000 contiguous values set until chosen number of rows is setThe results of this change is a pretty decent improvement to vectorized column read speed when there are contiguous vectors to read:
before:
after:
This PR has: