track group by evaluation statistics #8683
4b4aa9b to 83cb5d5
/**
 * Sets the upper bound for the number of groups in a group by evaluation, along with the actual number of groups.
 * The ratio of these two quantities indicates how efficient the group by was, and if truncation has occurred.
Do we want to record something similar at the combine level as well? There we have trimToSize and trimThreshold, and may have spent time resizing (map to PQ conversion and back), etc.
Perhaps. Can I add that as a follow-up if the data tracked here turns out not to be enough?
Fine by me
Codecov Report
@@              Coverage Diff              @@
##             master    #8683       +/-   ##
=============================================
- Coverage     69.66%   26.94%   -42.73%
+ Complexity     4575        1     -4574
=============================================
  Files          1721     1710       -11
  Lines         89824    89483      -341
  Branches      13319    13282       -37
=============================================
- Hits          62577    24112    -38465
- Misses        22921    63105    +40184
+ Partials       4326     2266     -2060
Tracks statistics about how efficient group by evaluation is, since the estimated upper bound on the number of groups appears to be too high when there is a selective filter.
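To make the idea concrete, here is a minimal sketch of what tracking this ratio could look like. It is not the code in this PR: `GroupByStats`, `record`, `utilization`, and the other names are hypothetical, and treating "actual groups reached the upper bound" as a sign of possible truncation is an assumption made for illustration.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, not taken from the PR: every name here is illustrative.
final class GroupByStats {
    private final LongAdder evaluations = new LongAdder();
    private final LongAdder possiblyTruncated = new LongAdder();
    private final LongAdder upperBoundSum = new LongAdder();
    private final LongAdder actualSum = new LongAdder();

    // Fraction of the estimated upper bound that was actually used.
    // A non-positive bound is reported as 0.0 to avoid dividing by zero.
    static double utilization(long upperBound, long actualGroups) {
        return upperBound <= 0 ? 0.0 : (double) actualGroups / (double) upperBound;
    }

    // Records one group by evaluation.
    void record(long upperBound, long actualGroups) {
        evaluations.increment();
        upperBoundSum.add(upperBound);
        actualSum.add(actualGroups);
        // Assumption: reaching the bound suggests groups may have been dropped.
        if (upperBound > 0 && actualGroups >= upperBound) {
            possiblyTruncated.increment();
        }
    }

    // Utilization aggregated over all recorded evaluations.
    double overallUtilization() {
        return utilization(upperBoundSum.sum(), actualSum.sum());
    }

    long evaluations() {
        return evaluations.sum();
    }

    long possiblyTruncated() {
        return possiblyTruncated.sum();
    }
}
```

Under these assumptions, a consistently low utilization would point to an overestimated upper bound, which is the situation described above for selective filters, while a utilization at or near 1 would hint at possible truncation.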