zoo::na.locf equivalent #489
Comments
Nope, please use |
In collapse you can also use |
I didn't know setnafill/nafill had a locf mode. |
Hello, sorry to reopen the thread, but I too would have loved to see a collapse-style fast grouped LOCF NA fill. In case it's of interest, I recently wrote a method in the development version of timeplyr (on GitHub) that does just that. Feel free to check it out if it's useful. @SebKrantz Thanks again for the amazing package; I use it so often that it has become a daily part of my workflow. |
Thanks @NicChr. I can think about it, but for that it would be good to know which functionality you are lacking from the data.table implementation, and how you would like to see it added to collapse. Currently, I don't really plan on adding new functions to collapse, so it would have to be an argument to |
It could be that I'm not using the correct code, but the data.table version doesn't seem to be very fast when applied to many groups. Consider the example below. I ran each expression twice to get a more accurate memory allocation figure.
library(timeplyr)
library(data.table)
library(bench)
x <- sample.int(10^2, 10^5, TRUE)
x[sample.int(10^5, round(10^5/3))] <- NA
groups <- sample.int(10^4, 10^5, TRUE)
dt <- data.table(x, groups)
## No groups
mark(timeplyr = dt[, filled1 := .roll_na_fill(x)][]$filled1,
     timeplyr2 = dt[, filled2 := .roll_na_fill(x)][]$filled2,
     data.table = dt[, filled3 := data.table::nafill(x, type = "locf")][]$filled3,
     data.table2 = dt[, filled4 := data.table::nafill(x, type = "locf")][]$filled4)
#> # A tibble: 4 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 timeplyr       694µs    865µs     1079.    2.23MB     10.7
#> 2 timeplyr2      695µs    860µs     1089.  423.19KB     10.9
#> 3 data.table     671µs    945µs      998.   844.3KB     20.4
#> 4 data.table2    669µs    969µs      990.  829.95KB     22.9
## With groups
mark(timeplyr = dt[, filled1 := roll_na_fill(x, g = groups)][]$filled1,
     timeplyr2 = dt[, filled2 := roll_na_fill(x, g = groups)][]$filled2,
     data.table = dt[, filled3 := data.table::nafill(x, type = "locf"),
                     by = groups][]$filled3,
     data.table2 = dt[, filled4 := data.table::nafill(x, type = "locf"),
                      by = groups][]$filled4)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 4 × 6
#>   expression       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 timeplyr      1.71ms   2.08ms    423.      6.61MB     12.0
#> 2 timeplyr2     1.66ms   2.08ms    425.    852.97KB     16.0
#> 3 data.table  297.79ms 358.07ms      2.79  157.73MB     12.6
#> 4 data.table2 308.45ms 309.95ms      3.23   157.7MB     14.5
Created on 2023-11-12 with reprex v2.0.2
My method essentially uses the order of the groups and the sorted group sizes to perform a fast locf na fill. In any case, I think the benchmark demonstrates that there is potential for a significant improvement in performance, at least when there are large numbers of groups. |
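To make the idea of filling via the group order more concrete, here is a minimal base R sketch. It is not timeplyr's actual implementation (the internals of `roll_na_fill` are not shown in this thread), and the function name `grouped_locf_sketch` is hypothetical. It assumes `g` contains no missing values and keeps leading NAs within each group as NA. Instead of using sorted group sizes, it marks where each group's run starts after sorting, which serves the same purpose of stopping values from leaking across groups.

```r
# Hypothetical helper for illustration only; not part of timeplyr, collapse, or data.table.
grouped_locf_sketch <- function(x, g) {
  o <- order(g)                 # stable sort: rows keep their original order within a group
  xo <- x[o]
  go <- g[o]
  n <- length(xo)
  pos <- seq_len(n)

  # TRUE at the first row of each group's contiguous run in sorted order
  is_start <- c(TRUE, go[-1L] != go[-n])
  # Position where the current group's run begins
  group_start <- cummax(pos * is_start)
  # Position of the most recent non-NA value seen so far (0 if none yet)
  last_obs <- cummax(pos * !is.na(xo))
  # If that value lies before the current group's start, it belongs to another group
  last_obs[last_obs < group_start] <- NA

  out <- x
  out[o] <- xo[last_obs]        # map the filled values back to the original row order
  out
}

x <- c(NA, 1, NA, 2, NA, NA, 3, NA)
g <- c(1, 2, 1, 2, 1, 2, 1, 2)
grouped_locf_sketch(x, g)
```

Everything here is vectorised over the whole column, so the cost does not grow with the number of groups the way a per-group function call does, which is the likely source of the gap seen in the grouped benchmark.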
Thanks again, I will implement an option |
I added a basic |
Small update: in addition to the above, I have made available baseline C implementations with |
Is there an equivalent of zoo::na.locf in collapse? If not, it would be cool to implement a fast version.
Thanks for that great package.
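For reference, "last observation carried forward" (LOCF) replaces each NA with the most recent non-missing value before it. A minimal base R sketch of the ungrouped behaviour follows. The name `locf_sketch` is hypothetical, and unlike the default of `zoo::na.locf`, which drops leading NAs, this version keeps them as NA.

```r
# Hypothetical helper for illustration only; not a function from zoo, collapse, or data.table.
locf_sketch <- function(x) {
  # Index of the most recent non-NA element at each position (0 if none seen yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0L] <- NA          # nothing to carry forward yet
  x[idx]
}

locf_sketch(c(NA, 1, NA, NA, 2, NA))
#> [1] NA  1  1  1  2  2
```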