Aggregation function that respects the order of data #62777
Comments
This can be done in several ways.
SELECT arrayReduce('functionA', arraySort(groupArray(x))) FROM R_table;
WITH t AS MATERIALIZED (SELECT x FROM R_table ORDER BY x) SELECT functionA(x) FROM t;
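To make the idea behind the first workaround concrete, here is a minimal Python sketch of "collect, sort, then apply the function once". It is not ClickHouse code: `function_a` is a hypothetical stand-in for `functionA`, and the `(key, value)` row layout is an assumption made only for illustration.

```python
from functools import reduce

def function_a(values):
    """Hypothetical stand-in for functionA: an order-sensitive fold over an array."""
    return reduce(lambda state, v: state * 2 + v, values, 0)

def ordered_aggregate(rows):
    """Collect (key, value) rows in arrival order, sort by key, then fold once."""
    collected = list(rows)                  # plays the role of groupArray(...)
    collected.sort(key=lambda kv: kv[0])    # plays the role of arraySort(...)
    values = [v for _, v in collected]
    return function_a(values)               # plays the role of arrayReduce('functionA', ...)

# The same rows arriving in two different orders give the same result.
first = ordered_aggregate([(2, 20), (1, 10), (3, 30)])
second = ordered_aggregate([(3, 30), (2, 20), (1, 10)])
```

The trade-off in this sketch is that the whole group has to be held in memory before the function runs, because sorting needs every value up front.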
@Alex-Cheng which aggregate function? https://fiddle.clickhouse.com/88a7eba5-01f5-41d9-9577-4865ca007def The modern optimizer removes a redundant ORDER BY if the result of the query does not depend on it.
CH is not designed to do this, and you will face multiple problems and changes in the future because of this:
You can try the suggested workarounds, but they might stop working at any time. The only thing similar to what you are describing is window functions.
Yes, the approaches you suggested were used to resolve the problem, e.g. Is my requirement too special? Maybe I should transform the aggregation function into a regular function that accepts an Array as input, as mentioned by @canhld94.
The approach suggested by @den-crane (thank you, it is a very skillful workaround) works and requires the least effort. It can serve as a temporary solution for a while, but eventually I need to come up with an alternative algorithm that does not depend on the order of the input data.
@Alex-Cheng check the code of
With the important exception of GROUP BY in external memory, AFAIK.
The documentation says, "Creates an array of argument values. Values can be added to the array in any (indeterminate) order." It also explains some cases where you can still rely on the order of execution, but that requires that "the subquery result is small enough."
Use case
There is a task that consumes and aggregates a bunch of rows in order, i.e. given rows R = {r1, r2, ..., r_n} and a stateful function a(x) (similar to an aggregation), I need to invoke a(r1), a(r2), ..., a(r_n) and then produce a final aggregation result. The result depends on the order of the invocations. To get a stable result, I have to make sure that the invocations a(r1), a(r2), ..., a(r_n) happen strictly in order.
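As a small illustration of why the invocation order matters, the Python sketch below feeds the same rows to a stateful update in two different orders. The update rule `step` (a polynomial rolling hash) is a hypothetical example chosen only because it is order-sensitive; it is not the function from this issue.

```python
from functools import reduce

MASK = (1 << 64) - 1  # keep the state within 64 bits, like a UInt64 accumulator

def step(state: int, row: int) -> int:
    """Hypothetical order-sensitive update: a polynomial rolling hash."""
    return (state * 31 + row) & MASK

def aggregate(rows):
    """Invoke step on each row in arrival order and return the final state."""
    return reduce(step, rows, 0)

in_order = aggregate([1, 2, 3])    # a(r1), a(r2), a(r3)
reordered = aggregate([3, 1, 2])   # same rows, different arrival order
print(in_order, reordered)         # prints: 1026 2916
```

Any aggregation that is not commutative behaves this way, which is why the result is only stable if the engine guarantees the order in which rows reach the function.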
Describe the solution you'd like
I may need a setting that forces AggregationTransform to work on strictly ordered data. Actually, I had achieved this with an SQL pattern like:
However, this solution no longer works in the latest ClickHouse.