Draft: Regular functions for time series analysis #64240
base: master
This is an automatic comment. The PR description does not match the template. Please edit it accordingly. The error is: More than one changelog category specified: 'New Feature', '### Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):'
This is an automatic comment. The PR description does not match the template. Please edit it accordingly. The error is: Changelog category is empty
This is an automated comment for commit 43ec161 with description of existing statuses. It's updated for the latest CI running ❌
@LordVoldebug A silly question to begin with: exponential smoothing needs to "see" the entire time series to produce a forecast (let's leave aside the point that with an increasing smoothing factor, historical values become less relevant). Similarly, window-based functions need to see the last N values. Since query processing in ClickHouse is based on chunks, and the chunks are processed in random order (not consecutively), how is this problem addressed in this PR?
This was one of the first questions I faced myself while working on this. The implemented functions are regular functions that accept arrays, not aggregate functions. As far as I understood from the codebase and common sense, once we are given an array its order is fixed (if it were not, many existing functions would not work). When we need to extract data from a table into an Array, we can use functions like groupArray. The groupArray documentation has a note on this: https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/grouparray
So it seems to be OK for most cases. Of course, these functions could have been implemented as aggregate functions, but I thought that the fewer functions with non-strict guarantees, the better.
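To make the ordering concern concrete, here is a minimal Python sketch (not code from this PR) of simple exponential smoothing. The function name `exponential_smoothing` and the parameter `alpha` are hypothetical, chosen for illustration. It shows that the result depends on element order, which is why the functions rely on the array already being in series order.

```python
def exponential_smoothing(series, alpha):
    """Return the final smoothed level of `series` (illustrative only)."""
    level = series[0]
    for value in series[1:]:
        # Each step blends the new value with the previous level,
        # so the result depends on the order of the elements.
        level = alpha * value + (1 - alpha) * level
    return level


ordered = [1.0, 2.0, 3.0, 4.0]
shuffled = [4.0, 1.0, 3.0, 2.0]  # same values, different order

print(exponential_smoothing(ordered, 0.5))   # 3.125
print(exponential_smoothing(shuffled, 0.5))  # 2.375
```

The two inputs contain the same values but produce different levels, so any order guarantee has to come from how the array is built before it reaches the function.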
Okay, that makes sense, thanks. There were similar considerations in earlier PRs that implement time series functions as regular functions, e.g. here and here. It is still not clear to me what the best way to implement this is. One could argue that "horizontalizing" the data into arrays is an unnecessary step. But that's something to discuss separately; it does not take away from your work (thanks for pushing the PR).
I actually think that implementing them as window (or aggregate) functions makes sense. BTW
This applies to all aggregate functions, not only groupArray. Another story is that people are waiting for different modes of prediction to be implemented for
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Implementation of several regular functions for time series analysis.
Forecasters:
Signature:
methodName(series, number_to_predict, params, [fill_gaps]) -> Array[values]
methodName(series, times, times_to_predict, params, [fill_gaps]) -> Array[values]
seriesHolt
seriesAdditiveDamped
seriesMultiplicativeDamped
seriesHoltWintersMultiplicative
seriesHoltWintersAdditive
seriesHoltWintersDamped
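As an illustration of the first signature, here is a minimal Python sketch of Holt's linear trend method. It is not the PR's implementation. The name `holt_forecast`, the way `params` is split into `alpha` and `beta`, and the initialisation of level and trend are all assumptions made for this example.

```python
def holt_forecast(series, number_to_predict, alpha, beta):
    """Forecast `number_to_predict` values with Holt's linear trend method.

    alpha smooths the level and beta smooths the trend; both are
    hypothetical stand-ins for the PR's `params` argument.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations to estimate a trend")

    # Assumed initialisation: first value as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]

    for value in series[1:]:
        prev_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend

    # The h-step-ahead forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, number_to_predict + 1)]


print(holt_forecast([1.0, 2.0, 3.0, 4.0, 5.0], 3, alpha=0.5, beta=0.5))
# [6.0, 7.0, 8.0]
```

On a perfectly linear series the level and trend track the data exactly, so the forecast simply continues the line.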
Stationarity test:
seriesKPSS(series) -> KPSS value
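For orientation, the level-stationarity KPSS statistic can be sketched in Python as below. This is a simplified variant that assumes zero lags in the long-run variance estimate. Practical implementations usually add Bartlett-weighted autocovariance terms, and the PR description does not say which variant seriesKPSS uses. The name `kpss_statistic` is hypothetical.

```python
def kpss_statistic(series):
    """KPSS statistic for level stationarity, using a zero-lag variance estimate."""
    n = len(series)
    mean = sum(series) / n
    residuals = [value - mean for value in series]

    # Partial sums of the residuals.
    partial_sums = []
    running = 0.0
    for r in residuals:
        running += r
        partial_sums.append(running)

    # Assumption: plain residual variance stands in for the long-run variance.
    variance = sum(r * r for r in residuals) / n
    return sum(s * s for s in partial_sums) / (n * n * variance)


print(kpss_statistic([1.0, 2.0, 3.0, 4.0]))
```

Larger values indicate stronger evidence against stationarity. The critical values to compare against depend on the test variant and are not covered here.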
Smoothing functions:
Signature: f(Array) -> Array
seriesEMA
seriesKaufmansAMA
seriesWindowMax
seriesWindowMin
seriesWindowSum
seriesWindowStandardDeviation
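The window-based functions can be pictured with the following Python sketch. It is not the PR's code. The explicit `window` parameter and the handling of the first elements (a shorter, partial window) are assumptions, since the description only gives the shape f(Array) -> Array.

```python
from collections import deque


def window_sum(series, window):
    """Sum over the last `window` elements at each position."""
    out = []
    running = 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]  # drop the element leaving the window
        out.append(running)
    return out


def window_max(series, window):
    """Maximum over the last `window` elements at each position."""
    out = []
    candidates = deque()  # indices whose values are in decreasing order
    for i, value in enumerate(series):
        # Smaller or equal older values can never be the maximum again.
        while candidates and series[candidates[-1]] <= value:
            candidates.pop()
        candidates.append(i)
        if candidates[0] <= i - window:
            candidates.popleft()  # front index fell out of the window
        out.append(series[candidates[0]])
    return out


data = [1.0, 3.0, 2.0, 5.0, 4.0]
print(window_sum(data, 2))  # [1.0, 4.0, 5.0, 7.0, 9.0]
print(window_max(data, 2))  # [1.0, 3.0, 3.0, 5.0, 5.0]
```

Both functions make a single pass over the array and, like the smoothing example in the conversation, assume the array is already in series order.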