Skip to content

Conversation

@sundy-li
Copy link
Member

I hereby agree to the terms of the CLA available at: https://datafuse.rs/policies/cla/

Summary

Summary about this PR

Changelog

  • Documentation

Related Issues

Fixes #644

Test Plan

Unit Tests

@databend-bot
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@databend-bot
Copy link
Member

Hello @sundy-li, 🎉 Thank you for opening the pull request! 🎉
Your pull request state is not in Draft, please add Reviewers or Re-request review again!
FuseQuery: @BohuTANG @sundy-li @zhang2014
FuseStore: @drmingdrmer @dantengsky
Or visit datafuse roadmap for some clues.

@ZhiHanZ
Copy link
Collaborator

ZhiHanZ commented Aug 13, 2021

awesome!


## How to write aggregate functions

Datafuse allows you to write custom aggregate functions through rust codes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codes --> code?

## How to write aggregate functions

Datafuse allows you to write custom aggregate functions through rust codes.
It's not an easy way because you need to be rustacean first. Datafuse has a plan to support writing UDAF in other languages(like js, web assembly) in the future.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*a rustacean
*UDAFs


## AggregateFunction trait introduction

All aggregate functions implement `AggregateFunction` trait, and we register them into a global static factory named `FactoryFuncRef`, the factory is just an index map and the keys are names of aggregate functions, noted that the function's name in datafuse are all case insensitive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noted that a function's name in datafuse is case-insensitive

- The function `return_type` indicates the return type of the function, it may vary with different arguments, such as `sum(int8)` -> `int64`, `sum(uint8)` -> `uint64`, `sum(float64)` -> `float64`.
- The function `nullable` indicates whether the `return_type` is nullable.

Before we start to introduce function `init_state`, let's ask a question:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • the function init_state


> what's aggregate function state?

It indicates the temporary result of the aggregate function. Because aggregate function accumulates block by block and there will results from different query nodes after the accumulations. The state must be mergeable, serializable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*It indicates the temporary results of an aggregate function.
*Because an aggregate function accumulates data in a column block by block and there will be results after the aggregation. Therefore, the state must be ...

}
```

- The function `init_state` wants us to initialize the aggregate function state, we ensure the memory is already allocated, we just need to initial the state with the initial value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wants us to initialize --> initializes

we just --> and we just

initial --> initialize


- The function `init_state` wants us to initialize the aggregate function state, we ensure the memory is already allocated, we just need to initial the state with the initial value.
- The function `state_layout` indicates the memory layout of the state.
- The function `accumulate`, this function is used in aggregation with single batch, which means the whole block can be aggregated in a single state, no more other keys. A SQL query which applys aggregation without group-by columns will hit this function.
Copy link
Contributor

@jyizheng jyizheng Aug 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe remove ", this function"

*a single batch
*no other

*A SQL query, which applies aggregation without group-by columns, will hit this function.

Copy link
Contributor

@PsiACE PsiACE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this could be explained with a specific example.


Noted that the argument `_arrays` is the function arguments, we can safely get the array by index without index bound check because we must validate the argument numbers and types in function constructor.

The `_input_rows` is the rows of current block, it may be useful when the `_arrays` is empty, eg: `count()` function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*the current block
*and it may
*e.g., the count() function

The `_input_rows` is the rows of current block, it may be useful when the `_arrays` is empty, eg: `count()` function.


- The function `accumulate_keys`, similar to `accumulate` but we must take into consideration of the `keys` and `offset`, each key reprsents an unique memory address named `place`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function accumulate_keys is similar to accumulate, but we must take into consideration of the keys and offsets, for which each key represents a unique memory address named place.


## Refer to other examples
As you see, adding a new aggregate function in datafuse is not as hard as you think.
Before we start to add one, please refer to other aggregate function examples, such like `min`, `count`, `sum`, `avg`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we --> you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*such as

Before we start to add one, please refer to other aggregate function examples, such like `min`, `count`, `sum`, `avg`.

## Summary
Welcome all community users to contribute more powerful functions into datafuse. If you found any problems, feel free to open an issue in Github, we will try our best to help you.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We welcome all community users to contribute more powerful functions to datafuse. If you find any problems, feel free to open an issue in Github, we will use our best efforts to help you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, sorry for my poor english :)



- The function `accumulate_keys`, similar to `accumulate` but we must take into consideration of the `keys` and `offset`, each key reprsents an unique memory address named `place`.
- The function `serialize`, we used to serialize state into binary.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*The function serialize serializes state into binary.

The same the two sentences below

@codecov-commenter
Copy link

codecov-commenter commented Aug 13, 2021

Codecov Report

Merging #1459 (4b20627) into master (ae352ca) will increase coverage by 0%.
The diff coverage is n/a.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1459   +/-   ##
======================================
  Coverage      72%     72%           
======================================
  Files         495     495           
  Lines       29286   29349   +63     
======================================
+ Hits        21361   21418   +57     
- Misses       7925    7931    +6     
Impacted Files Coverage Δ
store/src/meta_service/raft_state_test.rs 97% <0%> (-2%) ⬇️
store/src/meta_service/raftmeta_test.rs 94% <0%> (-1%) ⬇️
store/src/fs/mod.rs 71% <0%> (ø)
store/src/meta_service/raft_state.rs 91% <0%> (+1%) ⬆️
store/src/meta_service/raft_state_kv.rs 65% <0%> (+1%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae352ca...4b20627. Read the comment docs.

Copy link
Member

@BohuTANG BohuTANG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@databend-bot databend-bot merged commit f3780e1 into databendlabs:master Aug 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[refactor] Generic aggregate functions

7 participants