-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
faster metric scans #7920
faster metric scans #7920
Conversation
Codecov Report
@@ Coverage Diff @@
## master #7920 +/- ##
============================================
- Coverage 71.37% 71.25% -0.12%
- Complexity 4193 4210 +17
============================================
Files 1595 1594 -1
Lines 82514 82613 +99
Branches 12304 12316 +12
============================================
- Hits 58895 58870 -25
- Misses 19643 19768 +125
+ Partials 3976 3975 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
a999be7
to
e71d582
Compare
...he/pinot/segment/local/segment/index/readers/forward/FixedByteChunkSVForwardIndexReader.java
Outdated
Show resolved
Hide resolved
...he/pinot/segment/local/segment/index/readers/forward/FixedByteChunkSVForwardIndexReader.java
Outdated
Show resolved
Hide resolved
d20f123
to
9368ab2
Compare
I added a benchmark to show where the difference comes from, but the benchmark also hints at the minimum cost of query vectorization: an array copy. This cost could be eliminated for reductions.
|
@@ -91,4 +264,8 @@ public double getDouble(int docId, ChunkReaderContext context) { | |||
return _rawData.getDouble(docId * Double.BYTES); | |||
} | |||
} | |||
|
|||
private boolean isContiguousRange(int[] docIds, int length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it'd be helpful to comment a bit when the range can be contiguous and when not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? That doesn't seem like a responsibility of this class, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think I'm lack of some context to understand this new check. Not necessarily a comment to the code. Would appreciate it if you could shed some light in the conversion here. Just looking at this check method, I assume it'd require the docIds to be unique and sorted in asc. So when that happens and when not. This may help me understand when the optimization can kick in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, got it. However, I think that kind of info would be better off in a readme about the how the query and storage layers interact. I can do that (because understanding this area of the code has been hard for me) but not in this PR. I don't think the comment belongs here, and would be superfluous after reading a decent readme.
@siddharthteotia to review before merging. |
Yes, I will make sure to go through this by EOD. Thank you for waiting |
pinot-core/src/main/java/org/apache/pinot/core/common/DataFetcher.java
Outdated
Show resolved
Hide resolved
a988aa7
to
b3a7772
Compare
This feels like a fairly low risk change for a decent improvement to me. Are there any blockers here? |
...he/pinot/segment/local/segment/index/readers/forward/FixedByteChunkSVForwardIndexReader.java
Outdated
Show resolved
Hide resolved
b3a7772
to
3a96ce8
Compare
This change is motivated by a profile where a user performed a summation over a raw column in a group by query, and a significant amount of time was spent in bounds checks:
This change adds methods
fillValues
to theForwardIndexReader
interface which can avoid bounds checks and perform vectorized copies when the range of docIds is contiguous. There is no way to avoid bounds checks with the current APIs otherwise as there is no way for the compiler to infer that the docIds array is monotonic.With
This speeds up
select platform, sum(clicks) from complexWebsite group by platform
on a 7.5M row segment by ~25%: 60ms down to 45ms.Summation is not the best case for the change because it requires conversion from
long
todouble
- accumulating the sum as along
would amplify the effect.