When using ".SDcols" and "by" in one call, and the function call produces NA for one group, but not the other, you get an error #5341
Comments
Maybe you can just return double type NA from your function instead of logical NA? Otherwise your function is not type stable.
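As a minimal sketch of what this suggestion could look like: the helper name `sumNA_stable` is hypothetical and not from the thread, and the sketch assumes the input columns are double, as in the reproduction in the issue body.

```r
library(data.table)

# Hypothetical type-stable variant: for double input, both branches return
# a double, including the all-NA case (NA_real_ instead of logical NA).
sumNA_stable <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

dt <- data.table(
  col1 = c(rep(NA, 2), 1, 2),
  col2 = rep(c(1, 2), each = 2),
  groupby = c(1, 1, 2, 2)
)

# Every group now yields a double for each column, so the types are consistent.
res <- dt[, lapply(.SD, sumNA_stable), .SDcols = c("col1", "col2"), by = "groupby"]
print(res)
```

For integer input columns this version would still mix double NA with integer sums, so it only covers the double case.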
Yes, I am aware of workarounds that make it work, and I have implemented one in my use case. I was just wondering whether this is something that could (and maybe should) be done by data.table implicitly, in particular because we see this type of dynamic type conversion pretty much everywhere else in R. One example is if I for example use
I am aware that this is not a huge issue, and with some programming knowledge it should be easy to fix, but I find the very strong typing slightly peculiar, and it might throw off users with less coding experience.
From the discussions on Github, automatic optimization of
My terminology might be inaccurate for some of the described operations, but I hope this is helpful.
Instead of coercing to double, you can always use NA_real_. This will not make it work well for mixed input types, as in the example you asked about. The best-practice way to address this case is to have a generic function with two methods, one for doubles and one for integers. Handling it in fifelse is another option. Handling it in the [ group-by call is probably not feasible, because for the result of every group we would have to check whether the types match and then adjust them accordingly. For low-cardinality grouping that would not be a big deal, but with many small groups it would definitely slow down the aggregation.
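The generic-plus-two-methods idea could be sketched with S3 dispatch roughly as follows. The names `sumNA_typed`, `dt2`, `i`, `d`, and `g` are hypothetical and chosen for illustration only. The sketch relies on R dispatching on the implicit class of a bare vector ("double" or "integer").

```r
library(data.table)

# Hypothetical S3 generic with one method per numeric storage type.
sumNA_typed <- function(x) UseMethod("sumNA_typed")

# Double input: return a double NA when every value is missing.
sumNA_typed.double <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Integer input: return an integer NA so the result type matches sum().
sumNA_typed.integer <- function(x) {
  if (all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE)
}

# Illustrative data with one integer and one double column.
dt2 <- data.table(
  i = c(NA_integer_, NA_integer_, 1L, 2L),
  d = c(NA, NA, 1.5, 2.5),
  g = c(1, 1, 2, 2)
)

# Each column keeps a single type across groups: integer for i, double for d.
res2 <- dt2[, lapply(.SD, sumNA_typed), .SDcols = c("i", "d"), by = "g"]
print(res2)
```

The type decision is made once per call by dispatch, so no per-group type reconciliation is needed inside the grouping step.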
Alright, thank you both for your input! Then I will simply keep type stability better in mind in future functions and switch to fifelse where possible :)
The easiest option is probably to make a cast, e.g.
While, as Jan pointed out, it would be possible to do type checks and bump types if necessary, I think it would not really help users in the long run. Being aware of the different types of NA is something that useRs should learn, imo.
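The example that followed "e.g." in this comment is not preserved here. As one possible illustration of a cast (an assumption, not necessarily the commenter's code), the original `ifelse` call can be wrapped in `as.numeric()`. The name `sumNA_cast` is hypothetical.

```r
library(data.table)

# Hypothetical variant: keep the original ifelse() logic but cast the result,
# so the logical NA from the all-NA branch becomes a double NA.
sumNA_cast <- function(x) {
  as.numeric(ifelse(all(is.na(x)), NA, sum(x, na.rm = TRUE)))
}

dt3 <- data.table(
  col1 = c(rep(NA, 2), 1, 2),
  col2 = rep(c(1, 2), each = 2),
  groupby = c(1, 1, 2, 2)
)

res3 <- dt3[, lapply(.SD, sumNA_cast), .SDcols = c("col1", "col2"), by = "groupby"]
print(res3)
```

Note that this always returns a double, so an integer column would come back as double as well.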
Closing.
I have run into the following situation (simplified for reproducibility):
library(data.table)
sumNA <- function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm = TRUE))
dt <- data.table(col1 = c(rep(NA, 2), 1, 2), col2 = rep(c(1, 2), each = 2), groupby = c(1, 1, 2, 2))
# Errors: for col1, group 1 returns a logical NA while group 2 returns a double
dt[, lapply(.SD, sumNA), .SDcols = c("col1", "col2"), by = "groupby"]
Output of sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.1 cli_3.1.0 tools_4.1.1 data.table_1.14.2 rlang_1.0.