
rolling functions, rolling aggregates, sliding window, moving average #2778

Open
16 of 27 tasks
jangorecki opened this issue Apr 21, 2018 · 40 comments
@jangorecki (Member) commented Apr 21, 2018

To gather requirements in a single place and refresh the ~4-year-old discussions, I am creating this issue to cover the rolling functions feature (also known as rolling aggregates, sliding windows, or moving averages/moving aggregates).

rolling functions

  • rollmean
  • rollsum
  • rollmin
  • rollmax
  • rollmedian
  • rollprod
  • rollsd
  • rollvar
  • rollrank
  • rollapply (user provided FUN)
  • rollregression (highly demanded)
  • rollcor?
  • rollcov?

features

  • multiple columns at once
  • multiple windows at once
  • multiple columns and multiple windows at once
  • atomic vectors input and single window returns atomic vectors
  • various length list of vectors
  • align: left/center/right
  • handling NAs
  • fill constant
  • long vector support
  • partial window support (if needed can be found in ea766f2)
  • adaptive window support
  • use openmp to parallelize calculation of multiple columns/windows
  • rounding error correction
  • timing in verbose mode from parallel region (blocked by #3422, #3423)
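Several of the features listed above can be illustrated with the froll* API as it was eventually merged; a small sketch (assuming the current frollmean signature):

```r
library(data.table)
DT = data.table(v1 = 1:5, v2 = 6:10)
# multiple columns and multiple windows at once:
# returns a list of length(x) * length(n) vectors
frollmean(DT[, .(v1, v2)], n = c(2, 3))
# atomic vector input with a single window returns an atomic vector
frollmean(1:5, 2)  # NA 1.5 2.5 3.5 4.5
```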
@jangorecki (Member, Author) commented Apr 21, 2018

Proposed rollmean implementation, simplified.

x = data.table(v1=1:5, v2=1:5)
k = c(2, 3)
# i - single column
# j - single window width
# m - int referring to single row
# w - running sum of the current rolling window
# r - answer for each i, j
for i in x
  for j in k
    r = vector of NA_real_, length(i)
    w = 0
    for m in 1:length(i)
      w = w + i[m]
      if m > j: w = w - i[m-j]
      if m >= j: r[m] = w / j
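The pseudocode above translates to a small runnable pure-R sketch (function name hypothetical; the real implementation is in C):

```r
# Online rolling mean: O(1) work per row by adding the entering
# element and subtracting the leaving one.
rollmean_sketch <- function(x, k) {
  r <- rep(NA_real_, length(x))   # incomplete windows stay NA
  w <- 0
  for (m in seq_along(x)) {
    w <- w + x[m]
    if (m > k) w <- w - x[m - k]  # drop element leaving the window
    if (m >= k) r[m] <- w / k
  }
  r
}
rollmean_sketch(1:5, 2)  # NA 1.5 2.5 3.5 4.5
```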

@MichaelChirico (Member) commented Apr 21, 2018

@st-pasha (Contributor) commented Apr 21, 2018

I always envisioned rolling window functionality as grouping the dataset into multiple overlapping groups (windows). Then the API would look something like this:

DT[i, j,
   by = roll(width=5, align="center")]

Then if j contains, say, mean(A), we can internally replace it with rollmean(A) -- exactly like we are doing with gmean() right now. Or j can contain an arbitrarily complicated functionality (say, run a regression for each window), in which case we'd supply .SD data.table to it -- exactly like we do with groups right now.

This way there's no need to introduce 10+ new functions, just one. And it feels data.table-y in spirit too.

@MichaelChirico (Member) commented Apr 21, 2018

@jangorecki (Member, Author) commented Apr 21, 2018

@st-pasha interesting idea, and it looks data.table-y in spirit, but it would impose many limitations, and it isn't really appropriate for this category of functions.


  • how to perform rollmean by group
DT[, rollmean(V1, 3), by=V2]
  • how to calculate different window sizes for different columns
DT[, .(rollmean(V1, 3), rollmean(V2, 100))]
  • how to calculate rollmean outside of [.data.table as we now allow for shift
rollmean(rnorm(10), 3)
  • how to support queries like
DT[, .(rollmean(list(V1, V2), c(5, 20)), rollmean(list(V2, V3), c(10, 30)))]
  • how to call mean and rollmean in same j call
DT[, .(rollmean(V1, 3), mean(V1)), by=V2]

Usually when using by we aggregate data to a smaller number of rows, while rolling functions always return a vector of the same length as the input. This type of function has the following API style in SQL:

SELECT AVG(value) OVER (ROWS BETWEEN 99 PRECEDING AND CURRENT ROW)
FROM tablename;

You can still combine it with grouping, using PARTITION BY inside OVER:

SELECT AVG(value) OVER (PARTITION BY group_columns ROWS BETWEEN 99 PRECEDING AND CURRENT ROW)
FROM tablename;

So in SQL those functions stay in SELECT, which corresponds to j in DT.
In DT we could achieve the same with:

DT[, rollmean(value, 100)]
DT[, rollmean(value, 100), group_columns]

Rolling functions fall into the same category as shift, which also returns the same number of rows as it received on input.
Shift in SQL looks like:

SELECT LAG(value, 1) OVER ()
FROM tablename;

mean and rollmean are not just different functions, they are different categories of functions: one is meant to aggregate according to a group, the other not to aggregate at all. This is easily visible in SQL, where we don't use GROUP BY for rolling-type functions but do need GROUP BY for aggregates like mean (eventually getting the grand total when the grouping clause is not present).
I don't see strong reasoning for applying the same optimization rules as we do for mean, especially when it doesn't really fit the use case, and all that just for the sake of data.table-y spirit. The current proposal is data.table-y in spirit too: it can easily be combined with :=, same as shift. It just adds a set of new functions currently not available in data.table.
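For instance, the := combination would look like this (using frollmean, the name under which the feature was later merged):

```r
library(data.table)
DT = data.table(grp = rep(1:2, each = 5), value = 1:10)
# per-group 3-row moving average, added by reference;
# the result has the same number of rows as the input
DT[, ma3 := frollmean(value, 3), by = grp]
```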

@st-pasha (Contributor) commented Apr 22, 2018

@jangorecki Thanks, these are all valid considerations. Of course different people have different experiences, and different views as to what should be considered "natural".

It is possible to perform rollmean by group: this is just a 2-level grouping: DT[, mean(V1), by=.(V2, roll(3))]. However I don't see how to make different window sizes on different columns with my syntax...

I must admit I have never seen SQL syntax for rolling joins before. It's interesting that they use a standard aggregator such as AVG yet apply the windowing specification to it. Looking at the Transact-SQL documentation, there are some interesting ideas there, for example the distinction between logical/physical row selection. They do allow different "OVER" operators on different columns; however, in all the examples they give, it is the same OVER clause repeated multiple times. This suggests that this use case is much more common, and hence using a single roll() group would result in less repetition.

Also, this SO question provides an interesting insight why the OVER syntax was introduced in SQL at all:

You can use GROUP BY SalesOrderID. The difference is, with GROUP BY you can only have the aggregated values for the columns that are not included in GROUP BY. In contrast, using windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values. That is, although you are not doing that in your example query, you could retrieve both individual OrderQty values and their sums, counts, averages etc. over groups of same SalesOrderIDs.

So it appears that the syntax is designed to circumvent the limitation of standard SQL where group-by results could not be combined with unaggregated values (i.e. selecting both A and mean(A) in the same expression). However data.table does not have such a limitation, so it has more freedom in its choice of syntax.


Now, if we want to really get ahead of the curve, we need to think in a broader perspective: what are the "rolling" functions, what are they used for, how they can be extended, etc. Here's my take this, coming from a statistician's point-of-view:

The "rolling mean" function is used to smooth some noisy input. Say you have observations over time and you want some notion of "average quantity" that nevertheless varies over time, albeit very slowly. In this case "rolling mean over the last 100 observations" or "rolling mean over all previous observations" can be considered. Similarly, if you observe a certain quantity over a range of inputs, you may smooth it out by applying a "rolling mean over ±50 observations".

  • So, the first extension is to look at "smooth windows": imagine a mean over past observations where the further an observation in the past, the less its contribution is. Or an average of nearby observations over a Gaussian kernel.
  • Second are adaptive windows. For example, if you have a noisy input defined over an interval [0, 1], then smoothing it using a ±N window produces a biased result near the edges. An unbiased estimator would adapt the shape of the window based on the distance from the edge.
  • Resample smoothing: the restriction to produce exactly as many observations as in the source data is too limiting. If you think of your data as noisy observations of some underlying function, then it is perfectly reasonable to compute the smoothed values of that function on a mesh that is coarser/finer than the original input. Or perhaps the original data is spaced irregularly and you want to resample it onto a regular grid.
  • Jackknife: for each observation you want to compute average/regression over all observations except the current.
  • K-fold split: view data as multiple groups where each group excludes only a small portion of the observations.

All of these can be implemented as extended grouping operators, with rolling windows being just one of the elements on this list. That being said, I don't see why we can't have it both ways.

@jangorecki (Member, Author) commented Apr 23, 2018

I must admit I have never seen SQL syntax for rolling joins before.

I assume you mean rolling functions; this issue has nothing to do with rolling joins.

They do allow different "OVER" operators on different columns; however, in all the examples they give, it is the same OVER clause repeated multiple times. This suggests that this use case is much more common, and hence using a single roll() group would result in less repetition.

It is just a matter of use case: if you are calling the same OVER() many times, you may find it more performant to use GROUP BY, build a lookup table, and re-use it in other queries. Whatever the examples show, repeating OVER() is required to retain the locality of each measure provided. My use cases from Data Warehouses were not as simple as those in the Microsoft docs.

In contrast, using windowed aggregate functions instead of GROUP BY, you can retrieve both aggregated and non-aggregated values.

In data.table we combine := with by in one query to achieve this.

So it appears that the syntax is designed to circumvent the limitation of standard SQL where group-by results could not be combined with unaggregated values (i.e. selecting both A and mean(A) in the same expression). However data.table does not have such a limitation, so it has more freedom in its choice of syntax.

It isn't so much a limitation of SQL as the design of GROUP BY: it aggregates, the same way our by aggregates. A new API was required to cover the new window functionality. Grouping for a SQL window function can be provided per function call using FUN() OVER (PARTITION BY ...), where PARTITION BY is like local grouping for a single measure. So to achieve the flexibility of SQL we would need to use j = mean(V1, roll=5) or j = over(mean(V1), roll=5), keeping that API in j. Still, this approach would not support all the use cases mentioned above.

you may smooth it out by applying "rolling mean over ±50 observations".

This is what the align argument is for.

So, the first extension is to look at "smooth windows": imagine a mean over past observations where the further an observation in the past, the less its contribution is. Or an average of nearby observations over a Gaussian kernel.

There are many variants (a virtually unlimited number) of moving averages; the most common smoothing window function (other than rollmean/SMA) is the exponential moving average (EMA). Which should be included and which not is not trivial to decide, and it is best to make that decision according to feature requests that come from users; so far, nothing like this has been requested.

All of these can be implemented as extended grouping operators, with rolling windows being just one of the elements on this list.

Surely they can, but if you look at SO and the issues created in our repo, you will see that the few rolling functions here are responsible for 95+% of user requests. I am happy to work on EMA and other MAs (although I am not sure data.table is the best place for those), but as a separate issue. Some users, me included, have been waiting for just a simple moving average in data.table for 4 years already.

Here's my take this, coming from a statistician's point-of-view

My point of view comes from Data Warehousing (where I used window functions at least once a week) and price trend analysis (where I used tens of different moving averages).

@jangorecki (Member, Author) commented Apr 24, 2018

A rollmean draft is pushed to the roll branch. I found that most other packages implementing rolling mean do not deal well with na.rm=FALSE when NAs are present in the input. This implementation handles NA consistently with mean, which imposes some extra overhead because of ISNAN calls. We could allow a faster but less safe version in the API for users who are sure there are no NAs in the input.
PR is in #2795
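To illustrate the NA handling consistent with mean() (using the frollmean API as eventually merged):

```r
library(data.table)
x = c(1, 2, NA, 4, 5)
frollmean(x, 2)               # NA propagates, like mean(): NA 1.5 NA NA 4.5
frollmean(x, 2, na.rm = TRUE) # NA 1.5 2.0 4.0 4.5
```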

@jangorecki (Member, Author) commented Apr 27, 2018

@mattdowle answering questions from PR

Why are we doing this inside data.table? Why are we integrating it instead of contributing to existing packages and using them from data.table?

  1. There were 3 different issues asking for this functionality in data.table, plus multiple SO questions tagged data.table. Users expect it to be in the scope of data.table.
  2. data.table fits time-series data perfectly, and rolling aggregates are a pretty useful statistic there.

my guess is it comes down to syntax (features only possible or convenient if built into data.table; e.g. inside [...] and optimized) and building data.table internals into the rolling function at C level; e.g. froll* should be aware and use data.table indices and key. If so, more specifics on that are needed; e.g. a simple short example.

For me personally it is about speed and the lack of a chain of dependencies, nowadays not easy to achieve.
Key/indices could be useful for frollmin/frollmax, but it is unlikely that a user will create an index on a measure variable, and we haven't made this optimization for min/max yet. I don't see much sense in GForce optimization because the allocated memory is not released after the froll* call but returned as the answer (as opposed to non-rolling mean, sum, etc.).

If there is no convincing argument for integrating, then we should contribute to the other packages instead.

I listed some above; if you are not convinced, I recommend putting the question to data.table users, asking on Twitter, etc., to check the response. This feature has been requested for a long time and by many users. If the response doesn't convince you, then you can close this issue.

jangorecki added a commit that referenced this issue May 19, 2018
jangorecki added a commit that referenced this issue May 29, 2018
@harryprince commented Aug 14, 2018

I found sparklyr can support rolling functions very well on very large datasets.

@jangorecki (Member, Author) commented Aug 15, 2018

@harryprince could you shed a little more light by providing example code of how you do it in sparklyr?
According to "Window functions" dplyr vignette

Rolling aggregates operate in a fixed width window. You won’t find them in base R or in dplyr, but there are many implementations in other packages, such as RcppRoll.

AFAIU you use a custom Spark API via sparklyr, for which the dplyr interface is not implemented, correct?

This issue is about rolling aggregates; other "types" of window functions have been available in data.table for a long time.

@MichaelChirico (Member) commented Aug 15, 2018

Providing some example so we can compare (in-memory) performance vs sparklyr/SparkR would also be helpful.

@st-pasha (Contributor) commented Aug 15, 2018

It just occurred to me that this question:

how to calculate different window sizes for different columns?

has in fact a broader scope, and does not apply to rolling functions only.

For example, it seems perfectly reasonable to ask how to select the average product price by date, then by week, and then maybe by week+category, all within the same query. If we were ever to implement such functionality, the natural syntax for it could be

DT[, .( mean(price, by=date), 
        mean(price, by=week), 
        mean(price, by=c(week, category)) )]

Now, if this functionality were already implemented, it would be a simple leap from there to rolling means:

DT[, .( mean(price, roll=5), 
        mean(price, roll=20), 
        mean(price, roll=100) )]

Not saying that this is unequivocally better than rollmean(price, 5) -- just throwing in some alternatives to consider...

@jangorecki (Member, Author) commented Aug 15, 2018

@st-pasha

how to select the average product price by date, and then by week, and then maybe by week+category -- all within the same query.

AFAIU this is already possible using ?groupingsets, but not hooked into [.data.table yet.

@jangorecki (Member, Author) commented Jul 1, 2019

@randomgambit I would say it is out of scope, unless there is high demand for it. It wouldn't be very difficult to make it faster than base R/zoo just by handling the nested loop in C. But we should try to implement it using an "online" algorithm, to avoid the nested loop. This is more tricky, and we could eventually do it for any statistic, so we have to cut off those statistics at some point.
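A sketch of what "online" means here, for rolling sd: maintain running sums instead of recomputing each window from scratch (hypothetical pure-R illustration; the rounding-error concern listed in the features above applies to this formula):

```r
# O(n) rolling sd via running sum and sum of squares,
# versus the O(n*k) nested loop sd(x[(m-k+1):m]) per row.
roll_sd_online <- function(x, k) {
  r <- rep(NA_real_, length(x))
  s <- 0; s2 <- 0
  for (m in seq_along(x)) {
    s <- s + x[m]; s2 <- s2 + x[m]^2
    if (m > k) { s <- s - x[m - k]; s2 <- s2 - x[m - k]^2 }
    if (m >= k) r[m] <- sqrt((s2 - s^2 / k) / (k - 1))
  }
  r
}
```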

@randomgambit commented Jul 1, 2019

@jangorecki interesting thanks. That means I will keep using tsibble to embed... DATA.TABLES in a tibble! mind blown :D

@MichaelChirico (Member) commented Aug 7, 2019

Tried to use frollmean to calculate a nonparametric "logistic curve" showing P[y | x] for binary y using nearest neighbors of x. I was surprised that y stored as logical was not automatically cast to integer:

DT = data.table(x = rnorm(1000), y = runif(1000) > .5)
DT[order(x), .(x, p_y = frollmean(y, 50L))]

Error in froll(fun = "mean", x = x, n = n, fill = fill, algo = algo, align = align, :
x must be of type numeric
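A workaround is to cast the logical column explicitly before rolling, e.g.:

```r
library(data.table)
DT = data.table(x = rnorm(1000), y = runif(1000) > .5)
# as.numeric() (or unary +) converts the logical to numeric for frollmean
DT[order(x), .(x, p_y = frollmean(as.numeric(y), 50L))]
```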

@jangorecki (Member, Author) commented Aug 22, 2019

An example of how vectorized x/n arguments can impact performance:
AdrianAntico/RemixAutoML@d837071#r34784427
fewer loops, easier-to-read code, much faster (10x-36x speedup).

@jangorecki (Member, Author) commented Sep 1, 2019

frollapply ready: #3600

    ### fun             mean     sum  median
    # rollfun          8.815   5.151  60.175
    # zoo::rollapply  34.373  27.837  88.552
    # zoo::roll[fun]   0.215   0.185      NA
    # frollapply       5.404   1.419  56.475
    # froll[fun]       0.003   0.002      NA

@jerryfuyu0104 commented Oct 28, 2019

hi guys, will FUN (user defined) passed to frollapply be changed so it can return an R object or data.frame (data.table)? Could the x passed to frollapply be a data.table with character columns, not coerced to numeric, so that FUN could operate on labels and frollapply return a list? Then we could do rolling regression, or rolling testing such as Benford's test or summaries on labels.

@jangorecki (Member, Author) commented Oct 28, 2019

It is always useful to provide a reproducible example. To clarify... in such a scenario you would like frollapply(dt, 3, FUN) to return a list of length nrow(dt) where each list element is the data.table returned by FUN(dt[window])?
frollapply(x=dt, n=3, FUN=FUN)[[3]] equals FUN(dt[1:3])
frollapply(x=dt, n=3, FUN=FUN)[[4]] equals FUN(dt[2:4])
is that correct? @jerryfuyu0104

Currently we support multiple columns passed to the first argument, but we process them separately, looping. We would probably need an extra argument multi.var=FALSE; when set to TRUE it would not loop over x (as it does now: list(FUN(x[[1]]), FUN(x[[2]]))) but pass all columns at once: FUN(x).
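The requested multi.var=TRUE behaviour could be sketched in plain R as follows (function name and semantics hypothetical, for illustration only):

```r
# Roll FUN over n-row windows of a data.table, passing all columns at once;
# elements before the first complete window remain NULL.
roll_apply_multivar <- function(dt, n, FUN) {
  out <- vector("list", nrow(dt))
  for (m in seq_len(nrow(dt))) {
    if (m >= n) out[[m]] <- FUN(dt[(m - n + 1):m])
  }
  out
}
```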

@waynelapierre commented Jan 30, 2020

any update on this?

@eliocamp (Contributor) commented Mar 3, 2020

I second that previous request.

Furthermore, would it be possible to support a "partial" argument to allow for partial windows?

@MichaelChirico (Member) commented Mar 3, 2020

@eliocamp could you elaborate on what a partial window is?

@jangorecki (Member, Author) commented Mar 3, 2020

@eliocamp it would be possible to support a "partial" argument. You may know this already, but support for this functionality is already there via the adaptive=TRUE argument; see the examples for details.
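For example, partial windows at the start of a series can be emulated today with adaptive=TRUE by growing the window width near the edge:

```r
library(data.table)
x = 1:10
# widths 1,2,3,4,5,5,... so early rows use all available observations
frollmean(x, n = pmin(seq_along(x), 5L), adaptive = TRUE)
```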

@eliocamp (Contributor) commented Mar 3, 2020

It would mean computing the function from the beginning through the end, instead of from the half-window point.
For example, for a rolling mean of width 11, the first element returned would be the mean of elements 1 through 6. The second, the mean of the 1st through 7th, and so on.

@eliocamp (Contributor) commented Mar 3, 2020

@jangorecki oh, thanks, I didn't know that! I'll check it out.

@jangorecki (Member, Author) commented Mar 3, 2020

Agree, we need a partial argument, not just for convenience but also for speed; adaptive=TRUE adds overhead.
And yes, we also need rolling regression, i.e. supplying multiple variables and rolling over them at once, not each one separately.
There is no update on the status of those.

@eliocamp (Contributor) commented Mar 3, 2020

I'd love to help but my C++ skills are utterly non-existent. 😓 Do you think it might be suitable for complete newbies?

@jangorecki (Member, Author) commented Mar 3, 2020

We don't code in C++ but in C. Yes, it is a good place to start. I did exactly that with frollmean.

@eliocamp (Contributor) commented Mar 10, 2020

I looked at the code and it seems daunting. But I'll keep you posted in any case.

But now, for yet another request: frollmean(.SD) should preserve names. More generally, froll* should preserve names if the input is a list-like with names.

@ywhcuhk commented Jun 20, 2020

As a frequent user of data.table, I find it extremely useful to have "time aware" features such as those currently offered in the tsibble package. Unfortunately that package is developed around dplyr. I wonder if a data.table implementation would be possible. The window functions proposed in this issue are a subset of those features.

@jangorecki (Member, Author) commented Jun 20, 2020

@ywhcuhk Thanks for the feedback. I was actually thinking this issue was already asking for too much. Most of that is well covered by the still-lightweight package roll, which is very fast. As for the other features, I suggest creating a new issue for each feature you are interested in, so that whether we want to implement/maintain it can be decided separately for each. Just from looking at the readme of tsibble, I don't see anything new it offers...
Its title is "Tidy Temporal Data Frames" but it doesn't even seem to offer temporal joins.

@ywhcuhk commented Jun 20, 2020

Thank you @jangorecki for the response. Maybe it's a context-dependent issue. The data structure I deal with most frequently is known as "panel data", with an ID and a time. If the program is "aware" of this data feature, a lot of operations, especially time-series operations, become very easy. For someone who knows Stata, these are the operations based on tsset and xtset, such as lead, lag, fill gap, etc. I think the "index" in data.table could be enhanced in some way to enable such operations.

Of course, these operations can be done with data.table functions like shift and by. I just thought the index in data.table has a lot of potential to be explored. I agree this should belong in a different issue, but I don't know how to move it without losing the above discussions...

@AdrianAntico commented Aug 26, 2021

@jangorecki @st-pasha

Hey guys, I'm bringing up a possible feature request. For ML and forecasting, I use the frollmean and shift functions quite a bit to generate useful features. In a scoring environment I typically only need to generate those rolling-stat features for a handful of records in the data.table. I already created some functions for recreating rolling stats on subsets of a data.table using a bunch of lags and row means outside the data.table package. However, I began testing whether I could generate them faster using shift and frollmean with a subset in i. When testing it out, I realized that I have to include in i all the rows needed to compute the lags and rolling means, and I'm not sure if that is the intended way to do so.

I have a few examples below where I try to create a lag column and a 2-period moving average for a single record in the data.table. In each example, I first use the subset in i how I would like to use it, and then show that if I include the other rows used in the lag and rolling mean calculation I get what I want. It would be more ideal for me if I only had to specify the rows I want the lags and rolling stats for, without having to include the other rows in i.

@st-pasha I included you in this because I know you have frollmean on the roadmap for the python version and you haven't gotten to it yet.

################################################################################
# Create fake data
################################################################################

N = 25116
data <- data.table::data.table(
  DateTime = as.Date(Sys.time()),
  Target = stats::filter(
    rnorm(N, mean = 50, sd = 20),
    filter=rep(1,10),
    circular=TRUE))
data[, temp := seq(1:N)][, DateTime := DateTime - temp]
data <- data[order(DateTime)]

         DateTime   Target  temp
1:     1952-11-20 511.1355 25116
2:     1952-11-21 497.5900 25115
3:     1952-11-22 467.2040 25114
4:     1952-11-23 446.4739 25113
5:     1952-11-24 436.8124 25112
---                          
25112: 2021-08-21 631.6011     5
25113: 2021-08-22 598.5684     4
25114: 2021-08-23 570.2574     3
25115: 2021-08-24 561.8330     2
25116: 2021-08-25 527.9720     1

################################################################################
# Goal: Generate a 1-period lag for a single record in a data.table (temp == 1)
################################################################################

# Shift with i
data[temp %in% c(1), newval := data.table::shift(x = .SD, n = 1, fill = NA, type = 'lag'), .SDcols = "Target"]

         DateTime   Target  temp newval
1:     1952-11-20 511.1355 25116     NA
2:     1952-11-21 497.5900 25115     NA
3:     1952-11-22 467.2040 25114     NA
4:     1952-11-23 446.4739 25113     NA
5:     1952-11-24 436.8124 25112     NA
---                                 
25112: 2021-08-21 631.6011     5     NA
25113: 2021-08-22 598.5684     4     NA
25114: 2021-08-23 570.2574     3     NA
25115: 2021-08-24 561.8330     2     NA
25116: 2021-08-25 527.9720     1     NA

data[temp %in% c(1,2), newval := data.table::shift(x = .SD, n = 1, fill = NA, type = 'lag'), .SDcols = "Target"]

         DateTime   Target  temp  newval
1:     1952-11-20 511.1355 25116      NA
2:     1952-11-21 497.5900 25115      NA
3:     1952-11-22 467.2040 25114      NA
4:     1952-11-23 446.4739 25113      NA
5:     1952-11-24 436.8124 25112      NA
---                                  
25112: 2021-08-21 631.6011     5      NA
25113: 2021-08-22 598.5684     4      NA
25114: 2021-08-23 570.2574     3      NA
25115: 2021-08-24 561.8330     2      NA
25116: 2021-08-25 527.9720     1 561.833

################################################################################
# Goal: Generate a 2-period moving average for a single record in a data.table (temp == 1)
################################################################################

# Create fake data
N = 25116
data <- data.table::data.table(
  DateTime = as.Date(Sys.time()),
  Target = stats::filter(
    rnorm(N, mean = 50, sd = 20),
    filter=rep(1,10),
    circular=TRUE))
data[, temp := seq(1:N)][, DateTime := DateTime - temp]
data <- data[order(DateTime)]

# frollmean with i
data[temp %in% c(1), newval := data.table::frollmean(x = .SD, n = 2), .SDcols = "Target"]

         DateTime   Target  temp newval
1:     1952-11-20 524.4159 25116     NA
2:     1952-11-21 497.6071 25115     NA
3:     1952-11-22 527.2184 25114     NA
4:     1952-11-23 486.7455 25113     NA
5:     1952-11-24 488.6396 25112     NA
---                                 
25112: 2021-08-21 474.2944     5     NA
25113: 2021-08-22 511.5723     4     NA
25114: 2021-08-23 535.1824     3     NA
25115: 2021-08-24 536.3908     2     NA
25116: 2021-08-25 536.3070     1     NA

data[temp %in% c(1,2), newval := data.table::frollmean(x = .SD, n = 2), .SDcols = "Target"]

         DateTime   Target  temp   newval
1:     1952-11-20 524.4159 25116       NA
2:     1952-11-21 497.6071 25115       NA
3:     1952-11-22 527.2184 25114       NA
4:     1952-11-23 486.7455 25113       NA
5:     1952-11-24 488.6396 25112       NA
---                                   
25112: 2021-08-21 474.2944     5       NA
25113: 2021-08-22 511.5723     4       NA
25114: 2021-08-23 535.1824     3       NA
25115: 2021-08-24 536.3908     2       NA
25116: 2021-08-25 536.3070     1 536.3489
