Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

un-necessary warning for dcast on int columns with fun.aggregate=min/max #5512

Closed
tdhock opened this issue Nov 3, 2022 · 3 comments · Fixed by #5549
Closed

un-necessary warning for dcast on int columns with fun.aggregate=min/max #5512

tdhock opened this issue Nov 3, 2022 · 3 comments · Fixed by #5549
Labels
reshape dcast melt

Comments

@tdhock
Copy link
Member

tdhock commented Nov 3, 2022

Hi! I noticed a warning which I think is potentially confusing and should be removed:

> library(data.table)
data.table 1.14.5 IN DEVELOPMENT built 2022-11-01 15:48:48 UTC; root using 6 threads (see ?getDTthreads).  Latest news: r-datatable.com
> DT <- data.table(chr=c("a","b","b"), int=1:3)[, num := as.numeric(int)]
> dcast(DT, chr ~ ., list(min,max), value.var="int")
Key: <chr>
      chr int_min int_max
   <char>   <int>   <int>
1:      a       1       1
2:      b       2       3
Warning messages:
1: In dcast.data.table(DT, chr ~ ., list(min, max), value.var = "int") :
  NAs introduced by coercion to integer range
2: In dcast.data.table(DT, chr ~ ., list(min, max), value.var = "int") :
  NAs introduced by coercion to integer range

The result above is as expected, so I would suggest removing this warning to avoid confusion. (because of this warning, I mistakenly thought that the result was not computed correctly, but in fact the result is fine)

It seems that this issue is specific to when value.var is type integer, and fun.aggregate is min or max. No warning happens when we do min/max on numeric column:

> dcast(DT, chr ~ ., list(min,max), value.var="num")
Key: <chr>
      chr num_min num_max
   <char>   <num>   <num>
1:      a       1       1
2:      b       2       3

And no warning happens when we do mean/sd on int column:

> dcast(DT, chr ~ ., list(mean,sd), value.var="int")
Key: <chr>
      chr int_mean    int_sd
   <char>    <num>     <num>
1:      a      1.0        NA
2:      b      2.5 0.7071068

I searched the tests for this warning message and I found:

froll.Rraw:414:test(6000.114, frollmean(1:5, Inf), error="n must be positive integer values", warning="NAs introduced by coercion*")
froll.Rraw:416:test(6000.115, frollmean(1:5, c(5, Inf)), error="n must be positive integer values", warning="NAs introduced by coercion*")
nafill.Rraw:125:test(3.07, nafill(x, fill="asd"), x, warning=c("Coercing.*character.*integer","NAs introduced by coercion"))
@tdhock tdhock added the reshape dcast melt label Nov 3, 2022
@jangorecki
Copy link
Member

jangorecki commented Nov 3, 2022

Tests you found are different use cases, where this warning is good. They come from base R rather than data.table. I don't see exactly how we call R from dcast that those warnings are raised, that should be checked, but to suppress the warning we probably have to catch warnings, and re-raise any, excluding this particular one.

@shrektan
Copy link
Member

shrektan commented Nov 29, 2022

You may avoid the warning by explicitly providing the fill argument, e.g., dcast(DT, chr ~ ., list(min, max), value.var="int", fill = NA_integer_).

The reason for this warning is that data.table is guessing what should be filled using DT[0][, max(col)], which returns Inf, a double type. However, min/max applies on an integer actually gets integer type. So the C-level code will coerce the double Inf into an integer, which raises the warning.

The root cause of this is that, min or max returns different types for integer() and any non-zero-length integer, including NA_integer_.

Probably we should suppress the warning here:

data.table/src/fcast.c

Lines 28 to 30 in cb8aeff

if (TYPEOF(thisfill) != thistype) {
thisfill = PROTECT(coerceVector(thisfill, thistype)); nprotect++;
}

@tdhock
Copy link
Member Author

tdhock commented Nov 30, 2022

this is related to #5390 where there is an example with fill=NULL, where dcast uses fun.aggregate to compute a fill value, but the fill value is actually not used/needed.

For example, below the fill value is actually needed and used in three places, so the warning is reasonable.

> dcast(DT, num ~ chr, min, value.var="int")
Key: <num>
     num     a     b
   <num> <int> <int>
1:     1     1    NA
2:     2    NA     2
3:     3    NA     3
Warning message:
In dcast.data.table(DT, num ~ chr, min, value.var = "int") :
  NAs introduced by coercion to integer range

However in the code below the fill value is not actually used, so getting this warning is confusing, because the fill value is computed but not used:

> dcast(DT, . ~ chr, min, value.var="int")
Key: <.>
        .     a     b
   <char> <int> <int>
1:      .     1     2
Warning message:
In dcast.data.table(DT, . ~ chr, min, value.var = "int") :
  NAs introduced by coercion to integer range

Therefore I would say that the fix should be to update dcast so that it only computes the default fill value if it is actually needed in the output.

tdhock added a commit that referenced this issue Dec 1, 2022
MichaelChirico added a commit that referenced this issue Mar 14, 2024
* delete old commented code

* new test for no warning fails

* only compute default fill if missing cells present

* any_NA_int helper

* bugfix #5512

* Update src/fcast.c

Co-authored-by: Xianying Tan <shrektan@126.com>

* Update src/fcast.c

Co-authored-by: Xianying Tan <shrektan@126.com>

* mention warning text

* const int args

* add back ithiscol

* get pointer before for loop

* add test case from Michael

* test min(dbl) and no warning when fill specified

* Revert "delete old commented code"

This reverts commit 2886c4f.

* use suggestions from Michael

* rm inline any_NA_int since that causes install to fail

* clarify comment

* link 5390

* mymin test fails

* compute some_fill using anyNA in R then pass to C

* Update R/fcast.R

Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>

* Update R/fcast.R

Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>

* dat_for_default_fill is zero-row dt

* !length instead of length==0

* new dcast tests with fill=character

* dat_for_default_fill is dat again, not 0-row, because that causes some test failure

---------

Co-authored-by: Xianying Tan <shrektan@126.com>
Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reshape dcast melt
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants