Improved UDF/UDAF interfaces #2163
Comments
Would really appreciate increased functionality in UDAFs to perform basic operations such as mean, standard deviation, etc. For comparison, here is a sample Spark UDAF that calculates the geometric mean: https://docs.databricks.com/spark/latest/spark-sql/udaf-scala.html It has two extra features which enable calculation of mean/standard_deviation for aggregates
|
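To make the request above concrete, here is a minimal sketch of why mean and standard deviation are hard to express with a single running value. It assumes an aggregate that keeps an intermediate state separate from its final result and that can merge partial states. It is not KSQL's or Spark's API: `StatsState`, `update`, `merge`, and `evaluate` are hypothetical names chosen for illustration, and the update rule is Welford's online algorithm.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StatsState:
    """Hypothetical intermediate state; not part of any KSQL or Spark API."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean


def update(state: StatsState, value: float) -> StatsState:
    """Fold one input value into the state (Welford's online update)."""
    count = state.count + 1
    delta = value - state.mean
    mean = state.mean + delta / count
    m2 = state.m2 + delta * (value - mean)
    return StatsState(count, mean, m2)


def merge(a: StatsState, b: StatsState) -> StatsState:
    """Combine two partial states, e.g. from two partitions."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    count = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count
    return StatsState(count, mean, m2)


def evaluate(state: StatsState):
    """Turn the intermediate state into (mean, population std deviation)."""
    if state.count == 0:
        return None, None
    return state.mean, math.sqrt(state.m2 / state.count)


# Usage: aggregate one group of values.
state = StatsState()
for v in [2, 4, 4, 4, 5, 5, 7, 9]:
    state = update(state, v)
mean, std = evaluate(state)
```

The point of the sketch is that the state type (count, mean, m2) differs from the output type (mean, standard deviation), so a UDAF interface needs both a separate intermediate type and a final evaluation step to support these statistics.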
Thanks for the feedback @ankitchiplunkar ! If you don't mind, could you also give your upvote to this feature request by giving a 👍 reaction to the first message in this thread? This allows us to automatically track number of upvotes per feature. In addition to what you said above, which specific UDAFs are you interested in? Beyond allowing users to provide their own (and making it easier to implement these, which is what this feature request covers), I am asking because some commonly requested UDAFs we can also support out of the box. |
@miguno we wanted to perform basic stats on a grouped data stream, and standard deviation is the first use case. We have just open-sourced a UDF which performs basic math operations and in turn makes it possible to calculate the standard deviation. |
@ankitchiplunkar: Feel free to send us a PR to contribute any such UDFs to this repo, so they are included out of the box. |
@miguno for sure, will create a PR containing these UDFs |
for the sake of being explicit, we should call out here that UDAFs would also greatly benefit from some of the capabilities called out for UDFs in the issue description above.
|
one more that we somehow missed off the original list:
|
+1 |
+1 |
🎉 I believe these are all addressed in the upcoming release of KSQL, and more! Thanks to a huge team effort with @vpapavas and @purplefox contributing some features around this as well. I'm going to close out this ticket as it's been mostly addressed, but feel free to open any specific more targeted feature requests/bug reports if you encounter them! |
Background
KSQL supports UDFs and UDAFs today. However, certain limitations reduce their usefulness in practice, especially for UDAFs. This ticket is about addressing the most commonly reported limitations.
UDF limitations
Related:
UDAF limitations
UDAFs in particular are currently very limited, and we should consider prioritizing work on better UDAFs over UDFs.