Update documentation for approx_histogram_k, skew, and kurtosis #764

Closed
30 changes: 16 additions & 14 deletions docs/source/authoring_features/GroupBy.md
@@ -46,8 +46,8 @@ Chronon will look for a `ts` column from the input source.
## Sketching Aggregations

Sketching algorithms are used to approximate the values of an exact aggregation when the aggregation itself is not
-scalable. `unique_count` and `percentile` aggregations are examples where getting exact value requires storing all raw
-values, and hence not-scalable. `approx_unique_count` and `approx_percentile` aggregations utilize a bounded amount of
+scalable. `unique_count`, `percentile`, and `histogram` aggregations are examples where getting an exact value requires storing all raw
+values, and are hence not scalable. `approx_unique_count`, `approx_percentile`, and `approx_histogram_k` aggregations utilize a bounded amount of
memory to estimate the value of the exact aggregation. We allow users to tune this trade-off between memory and accuracy
as a parameter to the `Aggregation`. Chronon as a policy doesn't encourage use of un-scalable aggregations.
`unique_count` and `histogram` are supported but discouraged due to lack of `scalability`.
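
To illustrate the memory/accuracy trade-off described above, here is a minimal sketch contrasting an exact histogram with a bounded-memory one. It uses the Misra-Gries frequent-items summary purely as an illustration; the function names are hypothetical, and this is not claimed to be the algorithm Chronon uses for `approx_histogram_k`.

```python
from collections import Counter


def exact_histogram(values):
    """Exact counts: memory grows with the number of distinct values."""
    return dict(Counter(values))


def misra_gries_histogram(values, k):
    """Hypothetical bounded-memory histogram (Misra-Gries summary).

    Keeps at most k counters no matter how many distinct values are seen.
    Reported counts never exceed the true counts, and each undercounts by
    at most len(values) / (k + 1).
    """
    counters = {}
    for v in values:
        if v in counters:
            counters[v] += 1
        elif len(counters) < k:
            counters[v] = 1
        else:
            # No free slot: decrement every counter, dropping any that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


events = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e"]
exact = exact_histogram(events)            # one entry per distinct value
approx = misra_gries_histogram(events, 2)  # never more than 2 entries
```

A larger `k` tightens the error bound at the cost of more counters, which is the kind of tuning the `k` parameter exposes.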
@@ -133,18 +133,20 @@ Limitations:

## Table of properties for aggregations

-|aggregation | input type | nesting allowed? | output type | reversible | parameters | bounded memory |
-|-----------------------|------------------|------------------|-------------------|------------|--------------------|----------------|
-| count | all types | list, map | long | yes | | yes |
-| min, max | primitive types | list, map | input | no | | yes |
-| top_k, bottom_k | primitive types | list, map | list<input,> | no | k | yes |
-| first, last | all types | NO | input | no | | yes |
-| first_k, last_k | all types | NO | list<input,> | no | k | yes |
-| average, variance | numeric types | list, map | double | yes | | yes |
-| histogram | string | list, map | map<string, long> | yes | k=inf | no |
-| approx_unique_count | primitive types | list, map | long | no | k=8 | yes |
-| approx_percentile | primitive types | list, map | list<input,> | no | k=128, percentiles | yes |
-| unique_count | primitive types | list, map | long | no | | no |
+| aggregation | input type | nesting allowed? | output type | reversible | parameters | bounded memory |
+|-----------------------------------|-----------------|------------------|-------------------|------------|--------------------|----------------|
+| count | all types | list, map | long | yes | | yes |
+| min, max | primitive types | list, map | input | no | | yes |
+| top_k, bottom_k | primitive types | list, map | list<input,> | no | k | yes |
+| first, last | all types | NO | input | no | | yes |
+| first_k, last_k | all types | NO | list<input,> | no | k | yes |
+| average | numeric types | list, map | double | yes | | yes |
+| variance, skew, kurtosis | numeric types | list, map | double | no | | yes |
+| histogram | string | list, map | map<string, long> | yes | k=inf | no |
+| approx_histogram_k | primitive types | list, map | map<string, long> | yes | k=inf | yes |
+| approx_unique_count | primitive types | list, map | long | no | k=8 | yes |
+| approx_percentile | primitive types | list, map | list<input,> | no | k=128, percentiles | yes |
+| unique_count | primitive types | list, map | long | no | | no |
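
For readers unfamiliar with the `skew` and `kurtosis` rows, the sketch below shows what these statistics measure, computed from standardized central moments. It assumes population (biased) moments and non-excess kurtosis; the source does not state which convention Chronon follows, and the helper names are hypothetical.

```python
def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skew(values):
    """Third standardized moment: 0 for a symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Fourth standardized moment (non-excess, so a normal distribution gives 3)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


sample = [1.0, 2.0, 3.0, 4.0, 5.0]
sample_skew = skew(sample)
sample_kurtosis = kurtosis(sample)
```

Both functions take numeric input and return a single double, matching the input and output types listed in the table.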


## Accuracy