-
Notifications
You must be signed in to change notification settings - Fork 837
[docs] Add how-to-write-aggregate-functions #1459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for the contribution! Please review the labels and make any necessary changes. |
|
Hello @sundy-li, 🎉 Thank you for opening the pull request! 🎉 |
|
awesome! |
|
|
||
| ## How to write aggregate functions | ||
|
|
||
| Datafuse allows you to write custom aggregate functions through rust codes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codes --> code?
| ## How to write aggregate functions | ||
|
|
||
| Datafuse allows you to write custom aggregate functions through rust codes. | ||
| It's not an easy way because you need to be rustacean first. Datafuse has a plan to support writing UDAF in other languages(like js, web assembly) in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*a rustacean
*UDAFs
|
|
||
| ## AggregateFunction trait introduction | ||
|
|
||
| All aggregate functions implement `AggregateFunction` trait, and we register them into a global static factory named `FactoryFuncRef`, the factory is just an index map and the keys are names of aggregate functions, noted that the function's name in datafuse are all case insensitive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noted that a function's name in datafuse is case-insensitive
| - The function `return_type` indicates the return type of the function, it may vary with different arguments, such as `sum(int8)` -> `int64`, `sum(uint8)` -> `uint64`, `sum(float64)` -> `float64`. | ||
| - The function `nullable` indicates whether the `return_type` is nullable. | ||
|
|
||
| Before we start to introduce function `init_state`, let's ask a question: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the function
init_state
|
|
||
| > what's aggregate function state? | ||
|
|
||
| It indicates the temporary result of the aggregate function. Because aggregate function accumulates block by block and there will results from different query nodes after the accumulations. The state must be mergeable, serializable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*It indicates the temporary results of an aggregate function.
*Because an aggregate function accumulates data in a column block by block and there will be results after the aggregation. Therefore, the state must be ...
| } | ||
| ``` | ||
|
|
||
| - The function `init_state` wants us to initialize the aggregate function state, we ensure the memory is already allocated, we just need to initial the state with the initial value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wants us to initialize --> initializes
we just --> and we just
initial --> initialize
|
|
||
| - The function `init_state` wants us to initialize the aggregate function state, we ensure the memory is already allocated, we just need to initial the state with the initial value. | ||
| - The function `state_layout` indicates the memory layout of the state. | ||
| - The function `accumulate`, this function is used in aggregation with single batch, which means the whole block can be aggregated in a single state, no more other keys. A SQL query which applys aggregation without group-by columns will hit this function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe remove ", this function"
*a single batch
*no other
*A SQL query, which applies aggregation without group-by columns, will hit this function.
PsiACE
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if this could be explained with a specific example.
|
|
||
| Noted that the argument `_arrays` is the function arguments, we can safely get the array by index without index bound check because we must validate the argument numbers and types in function constructor. | ||
|
|
||
| The `_input_rows` is the rows of current block, it may be useful when the `_arrays` is empty, eg: `count()` function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*the current block
*and it may
*e.g., the count() function
| The `_input_rows` is the rows of current block, it may be useful when the `_arrays` is empty, eg: `count()` function. | ||
|
|
||
|
|
||
| - The function `accumulate_keys`, similar to `accumulate` but we must take into consideration of the `keys` and `offset`, each key reprsents an unique memory address named `place`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function accumulate_keys is similar to accumulate, but we must take into consideration of the keys and offsets, for which each key represents a unique memory address named place.
|
|
||
| ## Refer to other examples | ||
| As you see, adding a new aggregate function in datafuse is not as hard as you think. | ||
| Before we start to add one, please refer to other aggregate function examples, such like `min`, `count`, `sum`, `avg`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we --> you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*such as
| Before we start to add one, please refer to other aggregate function examples, such like `min`, `count`, `sum`, `avg`. | ||
|
|
||
| ## Summary | ||
| Welcome all community users to contribute more powerful functions into datafuse. If you found any problems, feel free to open an issue in Github, we will try our best to help you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We welcome all community users to contribute more powerful functions to datafuse. If you find any problems, feel free to open an issue in Github, we will use our best efforts to help you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, sorry for my poor english :)
|
|
||
|
|
||
| - The function `accumulate_keys`, similar to `accumulate` but we must take into consideration of the `keys` and `offset`, each key reprsents an unique memory address named `place`. | ||
| - The function `serialize`, we used to serialize state into binary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*The function serialize serializes state into binary.
The same the two sentences below
Codecov Report
@@ Coverage Diff @@
## master #1459 +/- ##
======================================
Coverage 72% 72%
======================================
Files 495 495
Lines 29286 29349 +63
======================================
+ Hits 21361 21418 +57
- Misses 7925 7931 +6
Continue to review full report at Codecov.
|
BohuTANG
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
CI Passed |
I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/
Summary
Summary about this PR
Changelog
Related Issues
Fixes #644
Test Plan
Unit Tests