-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Support number of centroids in approx_percentile_cont #3146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| })? | ||
| .value(); | ||
| let max_size = match lit { | ||
| ScalarValue::UInt8(Some(q)) => *q as usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now i think we can not input unsigned int from sql 🤔
So need these Int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we should file a ticket to allow creating unsigned int types from SQL (like in CREATE TABLE) statements.
I ran into this limitation while I was fixing #3167 -- https://github.com/apache/arrow-datafusion/pull/3167/files#r946145318
Codecov Report
@@ Coverage Diff @@
## master #3146 +/- ##
==========================================
- Coverage 85.95% 85.85% -0.10%
==========================================
Files 291 291
Lines 52382 52844 +462
==========================================
+ Hits 45025 45370 +345
- Misses 7357 7474 +117
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
|
@alamb PTAL😊 |
|
We should update the documentation at https://github.com/apache/arrow-datafusion/blob/master/docs/source/user-guide/sql/aggregate_functions.md as part of this PR |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Ted-Jiang -- @domodwyer / @realno I wonder if you have any interest in reviewing this PR.
Thanks your remind @andygrove add in 7d67a2f |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work @Ted-Jiang -- I think the code and tests look quite nice.
Sorry for the delay in review
| got | ||
| ))) | ||
| }; | ||
| let percentile = validate_input_percentile_expr(&expr[1])?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
| })? | ||
| .value(); | ||
| let max_size = match lit { | ||
| ScalarValue::UInt8(Some(q)) => *q as usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think we should file a ticket to allow creating unsigned int types from SQL (like in CREATE TABLE) statements.
I ran into this limitation while I was fixing #3167 -- https://github.com/apache/arrow-datafusion/pull/3167/files#r946145318
| let sql = "SELECT c1, approx_percentile_cont(c3, 0.95, 200) AS c3_p95 FROM aggregate_test_100 GROUP BY 1 ORDER BY 1"; | ||
| let actual = execute_to_batches(&ctx, sql).await; | ||
| let expected = vec![ | ||
| "+----+--------+", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a statistics expert but I verified that this is different than the output generated by SELECT c1, approx_percentile_cont_with_weight(c3, c2, 0.95) AS c3_p95 FROM aggregate_test_100 GROUP BY 1 ORDER BY 1 a few lines above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb Thanks for your review 😊
this sql should have with the same result as
SELECT c1, approx_percentile_cont(c3, 0.95) AS c3_p95 FROM aggregate_test_100 GROUP BY 1 ORDER BY 1
in line 867
|
Benchmark runs are scheduled for baseline = 2aa0a98 and contender = 929eb6d. 929eb6d is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #3142
Closes #3145
Now we support approx_percentile_cont(col_name, percentile value, number of centroids)
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?