-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-43986][SQL] Create error classes for HyperLogLog function call failures #41486
Conversation
Hi @MaxGekk @RyanBerti can you please help review this PR to improve error messages and add more test coverage? 🙏 |
fetch from master
sql/core/src/test/resources/sql-tests/analyzer-results/hll.sql.out
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dtenedor Thank you for this PR. Inline comments are mostly about naming and phrasing.
My high level comment is that we have one more category of errors and this is related to passing BINARY values to hll_union
, hll_union_agg
, and hll_sketch_estimate
that are not valid representations of sketches. At this point we rely on the DataSketches library to throw an exception for these cases, and we let that exception surface. My take on this is that we should add one more error class for these (along the lines of an illegal argument exception/error). What do you think?
.../main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-tests/analyzer-results/hll.sql.out
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
Outdated
Show resolved
Hide resolved
.../main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
Outdated
Show resolved
Hide resolved
c303f6f
to
af78726
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dtenedor Thank you for the fixes.
Posting a couple of minor comments.
sql/core/src/test/resources/sql-tests/analyzer-results/hll.sql.out
Outdated
Show resolved
Hide resolved
@dtenedor What about the above? |
respond to code review comments respond to code review comments
Sorry for missing this before, I made this suggested update. It was a good idea, I think the errors are clearer now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mkaravel for the review! I implemented your suggestions, please take another look.
sql/core/src/test/resources/sql-tests/analyzer-results/hll.sql.out
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@dtenedor Thank you for putting the time to work on this and for addressing my comments!
I have a few nits regarding the function names we pass to some error classes (case needs to be lowercase and basically, the function names we pass need to match the pretty name of the function).
.../main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
Outdated
Show resolved
Hide resolved
...alyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datasketchesExpressions.scala
Outdated
Show resolved
Hide resolved
Hi @MaxGekk can we trouble you to help us merge this please when you have a moment :) |
.../main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
Outdated
Show resolved
Hide resolved
.../main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
Outdated
Show resolved
Hide resolved
...alyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datasketchesExpressions.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @MaxGekk, thanks for your careful review. I responded to your comments, please look again :)
.../main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
Outdated
Show resolved
Hide resolved
.../main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
Outdated
Show resolved
Hide resolved
...alyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datasketchesExpressions.scala
Outdated
Show resolved
Hide resolved
.../main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala
Show resolved
Hide resolved
fetch latest from master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the failure below is not related to PR's changes:
systemctl restart cri-docker: Process exited with status 1
stdout:
stderr:
Job for cri-docker.service failed because the control process exited with error code.
See "systemctl status cri-docker.service" and "journalctl -xe" for details.
*
╭─────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ * If the above advice does not help, please let us know: │
│ https://github.com/kubernetes/minikube/issues/new/choose │
│ │
│ * Please run `minikube logs --file=logs.txt` and attach logs.txt to the GitHub issue. │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────╯
Error: Process completed with exit code 90.
Thanks @MaxGekk for your reviews and merging!! |
What changes were proposed in this pull request?
This PR creates error classes for HyperLogLog function call failures.
Why are the changes needed?
These replace previous Java exceptions or other cases, in order to improve the user experience and bring consistency with other parts of Spark.
Does this PR introduce any user-facing change?
Yes, error messages change slightly.
How was this patch tested?
This PR also adds SQL query test files for the HLL functions.