rolling functions, rolling aggregates, sliding window, moving average #2778

Open
16 of 28 tasks
jangorecki opened this issue Apr 21, 2018 · 42 comments
Labels
feature request froll top request One of our most-requested issues

Comments

@jangorecki
Member

jangorecki commented Apr 21, 2018

To gather requirements in a single place and refresh discussions that are ~4 years old, I am creating this issue to cover the rolling functions feature (also known as rolling aggregates, sliding window, or moving average/moving aggregates).

rolling functions

features

@jangorecki
Member Author

jangorecki commented Apr 27, 2018

@mattdowle answering questions from PR

> Why are we doing this inside data.table? Why are we integrating it instead of contributing to existing packages and using them from data.table?

  1. Three different issues were created asking for this functionality in data.table, and there are multiple Stack Overflow questions tagged data.table. Users expect it to be in scope of data.table.
  2. data.table is a perfect fit for time-series data, and rolling aggregates are a very useful statistic there.

> my guess is it comes down to syntax (features only possible or convenient if built into data.table; e.g. inside [...] and optimized) and building data.table internals into the rolling function at C level; e.g. froll* should be aware and use data.table indices and key. If so, more specifics on that are needed; e.g. a simple short example.

For me personally it is about speed and avoiding a chain of dependencies, which is not easy to achieve nowadays.
Key/indices could be useful for frollmin/frollmax, but it is unlikely that a user will create an index on a measure variable; also, we haven't made this optimization for min/max yet. I don't see much sense in GForce optimization, because the allocated memory is not released after a roll* call but returned as the answer (as opposed to non-rolling mean, sum, etc.).

> If there is no convincing argument for integrating, then we should contribute to the other packages instead.

I listed some above. If you are not convinced, I recommend putting the question to data.table users (ask on Twitter, etc.) and checking the response. This feature has been requested for a long time and by many users. If the response doesn't convince you, then you can close this issue.

jangorecki added a commit that referenced this issue May 19, 2018
jangorecki added a commit that referenced this issue May 29, 2018

@jangorecki
Member Author

jangorecki commented Aug 22, 2019

An example of how vectorized x/n arguments can impact performance.
AdrianAntico/AutoQuant@d837071#r34769837
Fewer loops, code that is easier to read, and much faster: code calling frollmean in a loop vs. passing lists/vectors to frollmean, resulting in a 10x-36x speedup.
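
To illustrate the difference between the two styles, here is a minimal sketch (not the code from the linked commit). The table `dt`, its columns `a` and `b`, and the window sizes are hypothetical; only `frollmean` itself comes from data.table.

```r
library(data.table)

set.seed(1)
dt <- data.table(a = rnorm(1000), b = rnorm(1000))  # hypothetical data
windows <- c(5L, 20L)                               # hypothetical window sizes

# Loop style: one frollmean call per column/window pair
loop_res <- list()
for (col in c("a", "b")) {
  for (n in windows) {
    loop_res[[length(loop_res) + 1L]] <- frollmean(dt[[col]], n)
  }
}

# Vectorized style: a single call that takes all columns and all windows;
# the result is a list with one element per column/window pair,
# ordered by column first, then by window within each column
vec_res <- frollmean(dt[, .(a, b)], windows)

length(vec_res)  # one result vector per column/window pair
```

The single call lets data.table handle the iteration over columns and windows internally, which is presumably where the reported speedup comes from.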

@jangorecki
Member Author

frollapply ready: #3600

    ### fun             mean     sum  median
    # rollfun          8.815   5.151  60.175
    # zoo::rollapply  34.373  27.837  88.552
    # zoo::roll[fun]   0.215   0.185      NA
    # frollapply       5.404   1.419  56.475
    # froll[fun]       0.003   0.002      NA
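
For readers unfamiliar with the two rows being compared, a small sketch of the difference between the generic `frollapply` and a specialized `froll[fun]` function may help. The input vector is made up for illustration, and this is not the benchmark code behind the table above.

```r
library(data.table)

x <- c(1, 5, 2, 8, 3, 9)  # hypothetical input

# Specialized function: the rolling mean is computed in C
spec_mean <- frollmean(x, 3)

# Generic function: an arbitrary R function is evaluated on every window,
# which is more flexible but slower
gen_mean <- frollapply(x, 3, mean)
gen_median <- frollapply(x, 3, median)

all.equal(spec_mean, gen_mean)  # both approaches agree on the mean
gen_median                      # NA NA 2 5 3 8
```

The first `n - 1` positions are filled with `NA` because the default alignment is to the right and those windows are incomplete.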

@jangorecki
Member Author

jangorecki commented Aug 30, 2023

rollcor
rollcov
rollrank
rollunqn
rolllm

went out of scope for the time being. All of them can work using frollapply (not on the master branch, but in PRs), just not super fast. We could consider adding them to the scope in the future. For now, the following set feels fine and complete to me: sum, mean, prod, min, max, sd, var, median.
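
As a rough illustration of the frollapply route, a single-column rolling count of unique values (what rollunqn would compute) could be sketched as below. The helper name `roll_unqn` and the input are hypothetical, and two-input statistics such as rollcor are not covered by this sketch.

```r
library(data.table)

# Hypothetical helper: number of distinct values in each window of size n
roll_unqn <- function(x, n) {
  frollapply(x, n, function(w) as.numeric(length(unique(w))))
}

x <- c(1, 1, 2, 3, 3, 3)  # hypothetical input
roll_unqn(x, 3)           # NA NA 2 3 2 1
```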

@MichaelChirico MichaelChirico added the top request One of our most-requested issues label Apr 14, 2024