un-necessary warning for dcast on int columns with fun.aggregate=min/max #5512

tdhock · 2022-11-03T00:19:04Z

Hi! I noticed a warning which I think is potentially confusing and should be removed:

> library(data.table)
data.table 1.14.5 IN DEVELOPMENT built 2022-11-01 15:48:48 UTC; root using 6 threads (see ?getDTthreads).  Latest news: r-datatable.com
> DT <- data.table(chr=c("a","b","b"), int=1:3)[, num := as.numeric(int)]
> dcast(DT, chr ~ ., list(min,max), value.var="int")
Key: <chr>
      chr int_min int_max
   <char>   <int>   <int>
1:      a       1       1
2:      b       2       3
Warning messages:
1: In dcast.data.table(DT, chr ~ ., list(min, max), value.var = "int") :
  NAs introduced by coercion to integer range
2: In dcast.data.table(DT, chr ~ ., list(min, max), value.var = "int") :
  NAs introduced by coercion to integer range

The result above is as expected, so I would suggest removing this warning to avoid confusion. (because of this warning, I mistakenly thought that the result was not computed correctly, but in fact the result is fine)

It seems that this issue is specific to when value.var is type integer, and fun.aggregate is min or max. No warning happens when we do min/max on numeric column:

> dcast(DT, chr ~ ., list(min,max), value.var="num")
Key: <chr>
      chr num_min num_max
   <char>   <num>   <num>
1:      a       1       1
2:      b       2       3

And no warning happens when we do mean/sd on int column:

> dcast(DT, chr ~ ., list(mean,sd), value.var="int")
Key: <chr>
      chr int_mean    int_sd
   <char>    <num>     <num>
1:      a      1.0        NA
2:      b      2.5 0.7071068

I searched the tests for this warning message and I found:

froll.Rraw:414:test(6000.114, frollmean(1:5, Inf), error="n must be positive integer values", warning="NAs introduced by coercion*")
froll.Rraw:416:test(6000.115, frollmean(1:5, c(5, Inf)), error="n must be positive integer values", warning="NAs introduced by coercion*")
nafill.Rraw:125:test(3.07, nafill(x, fill="asd"), x, warning=c("Coercing.*character.*integer","NAs introduced by coercion"))

The text was updated successfully, but these errors were encountered:

jangorecki · 2022-11-03T12:36:54Z

Tests you found are different use cases, where this warning is good. They come from base R rather than data.table. I don't see exactly how we call R from dcast that those warnings are raised, that should be checked, but to suppress the warning we probably have to catch warnings, and re-raise any, excluding this particular one.

shrektan · 2022-11-29T05:08:31Z

You may avoid the warning by explicitly providing the fill argument, e.g., dcast(DT, chr ~ ., list(min, max), value.var="int", fill = NA_integer_).

The reason for this warning is that data.table is guessing what should be filled using DT[0][, max(col)], which returns Inf, a double type. However, min/max applies on an integer actually gets integer type. So the C-level code will coerce the double Inf into an integer, which raises the warning.

The root cause of this is that, min or max returns different types for integer() and any non-zero-length integer, including NA_integer_.

Probably we should suppress the warning here:

data.table/src/fcast.c

Lines 28 to 30 in cb8aeff

    
           if (TYPEOF(thisfill) != thistype) { 
        
             thisfill = PROTECT(coerceVector(thisfill, thistype)); nprotect++; 
        
           }

tdhock · 2022-11-30T17:51:33Z

this is related to #5390 where there is an example with fill=NULL, where dcast uses fun.aggregate to compute a fill value, but the fill value is actually not used/needed.

For example, below the fill value is actually needed and used in three places, so the warning is reasonable.

> dcast(DT, num ~ chr, min, value.var="int")
Key: <num>
     num     a     b
   <num> <int> <int>
1:     1     1    NA
2:     2    NA     2
3:     3    NA     3
Warning message:
In dcast.data.table(DT, num ~ chr, min, value.var = "int") :
  NAs introduced by coercion to integer range

However in the code below the fill value is not actually used, so getting this warning is confusing, because the fill value is computed but not used:

> dcast(DT, . ~ chr, min, value.var="int")
Key: <.>
        .     a     b
   <char> <int> <int>
1:      .     1     2
Warning message:
In dcast.data.table(DT, . ~ chr, min, value.var = "int") :
  NAs introduced by coercion to integer range

Therefore I would say that the fix should be to update dcast so that it only computes the default fill value if it is actually needed in the output.

* delete old commented code * new test for no warning fails * only compute default fill if missing cells present * any_NA_int helper * bugfix #5512 * Update src/fcast.c Co-authored-by: Xianying Tan <shrektan@126.com> * Update src/fcast.c Co-authored-by: Xianying Tan <shrektan@126.com> * mention warning text * const int args * add back ithiscol * get pointer before for loop * add test case from Michael * test min(dbl) and no warning when fill specified * Revert "delete old commented code" This reverts commit 2886c4f. * use suggestions from Michael * rm inline any_NA_int since that causes install to fail * clarify comment * link 5390 * mymin test fails * compute some_fill using anyNA in R then pass to C * Update R/fcast.R Co-authored-by: Michael Chirico <michaelchirico4@gmail.com> * Update R/fcast.R Co-authored-by: Michael Chirico <michaelchirico4@gmail.com> * dat_for_default_fill is zero-row dt * !length instead of length==0 * new dcast tests with fill=character * dat_for_default_fill is dat again, not 0-row, because that causes some test failure --------- Co-authored-by: Xianying Tan <shrektan@126.com> Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>

tdhock added the reshape dcast melt label Nov 3, 2022

tdhock mentioned this issue Nov 30, 2022

dcast only computes default fill if necessary #5549

Merged

tdhock added a commit that referenced this issue Dec 1, 2022

bugfix #5512

258befb

MichaelChirico closed this as completed in #5549 Mar 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

un-necessary warning for dcast on int columns with fun.aggregate=min/max #5512

un-necessary warning for dcast on int columns with fun.aggregate=min/max #5512

tdhock commented Nov 3, 2022

jangorecki commented Nov 3, 2022 •

edited

Loading

shrektan commented Nov 29, 2022 •

edited by tdhock

Loading

tdhock commented Nov 30, 2022 •

edited

Loading

un-necessary warning for dcast on int columns with fun.aggregate=min/max #5512

un-necessary warning for dcast on int columns with fun.aggregate=min/max #5512

Comments

tdhock commented Nov 3, 2022

jangorecki commented Nov 3, 2022 • edited Loading

shrektan commented Nov 29, 2022 • edited by tdhock Loading

tdhock commented Nov 30, 2022 • edited Loading

jangorecki commented Nov 3, 2022 •

edited

Loading

shrektan commented Nov 29, 2022 •

edited by tdhock

Loading

tdhock commented Nov 30, 2022 •

edited

Loading