`first` function not working as `fun.aggregate` for `dcast` if `fill` argument is not provided #5390

iago-pssjd · 2022-05-23T10:55:11Z

I would expect I could do:

set.seed(1)
DT <- data.table(v1 = rep(1:2, each = 6),
                 v2 = rep(rep(1:3, 2), each = 2),
                 v3 = rep(1:2, 6),
                 v4 = rnorm(6))
dcast(DT, v1 + v2 ~ ., value.var = "v4", fun.aggregate = data.table::first)
Error: Aggregating function(s) should take vector inputs and return a single value (length=1). However, function(s) returns length!=1. This value will have to be used to fill any missing combinations, and therefore must be length=1. Either override by setting the 'fill' argument explicitly or modify your function to handle this case appropriately.

However I get an error instead of

dcast(DT, v1 + v2 ~ ., value.var = "v4", fun.aggregate = \(.).[1])
   v1 v2          .
1:  1  1 -0.6264538
2:  1  2 -0.8356286
3:  1  3  0.3295078
4:  2  1 -0.6264538
5:  2  2 -0.8356286
6:  2  3  0.3295078

Is it expected or a bug?

tdhock · 2022-05-25T20:17:55Z

Maybe the issue title could be misleading because first function works as fun.aggregate in dcast, as long as you provide your own fill argument:

> dcast(DT, v1 + v2  ~ ., value.var = "v4", fun.aggregate = data.table::first, fill=NA)
Key: <v1, v2>
      v1    v2          .
   <int> <int>      <num>
1:     1     1 -0.6264538
2:     1     2 -0.8356286
3:     1     3  0.3295078
4:     2     1 -0.6264538
5:     2     2 -0.8356286
6:     2     3  0.3295078

dcast is behaving as documented, although documentation could be clarified to indicate that only if fill=NULL then fun.aggregate is called on a 0-length vector to determine fill value. Current docs say:

    fill: Value with which to fill missing cells. If 'fun.aggregate' is
          present, takes the value by applying the function on a
          0-length vector.

In the example above it seems that since fill=NULL dcast uses fun.aggregate to compute a fill value, but the fill value is actually not used/needed. That may be a bug in dcast? I suggest updating dcast so that fun.aggregate is only used when needed:

    fill: Value with which to fill missing cells. If fill=NULL and 
         missing cells are present, then 'fun.aggregate' is
         used on a 0-length vector to obtain a fill value.

Another example would be where the fill argument is actually used/needed:

> (sub.DT <- DT[1:7])
      v1    v2    v3         v4
   <int> <int> <int>      <num>
1:     1     1     1 -0.6264538
2:     1     1     2  0.1836433
3:     1     2     1 -0.8356286
4:     1     2     2  1.5952808
5:     1     3     1  0.3295078
6:     1     3     2 -0.8204684
7:     2     1     1 -0.6264538
> dcast(sub.DT, v1 ~ v2, value.var = "v4", fun.aggregate = data.table::first, fill=NA)
Key: <v1>
      v1          1          2         3
   <int>      <num>      <num>     <num>
1:     1 -0.6264538 -0.8356286 0.3295078
2:     2 -0.6264538         NA        NA
> dcast(sub.DT, v1 ~ v2, value.var = "v4", fun.aggregate = data.table::first)
Error: Aggregating function(s) should take vector inputs and return a single value (length=1). However, function(s) returns length!=1. This value will have to be used to fill any missing combinations, and therefore must be length=1. Either override by setting the 'fill' argument explicitly or modify your function to handle this case appropriately.

I would argue that stopping with an error is reasonable in this case, since the user asked to aggregate using the first item, but there is no first item for two combinations.

The data.table::first docs do not mention what happens when the input has length=0, so this is probably an un-anticipated use case. We could fix by either (1) documenting the current behavior,

> data.table::first(numeric(0))
numeric(0)

or (2) changing first to return NA instead of numeric(0) in this case.

iago-pssjd · 2022-05-25T20:41:36Z

Thanks! @tdhock, I modified the issue title.

tdhock · 2022-08-08T04:00:45Z

#5168 adds na.rm=T support to first/last, but does not fix this issue: numeric(0) is still returned, but to fix this issue NA would have to be returned.

tdhock · 2022-11-30T19:13:50Z

hey @ben-schwen do you think it makes sense for first/last to always return a vector with length=1, even in the case of an empty (length=0) input vector? If so, do you think it makes sense to add that change to #5168 or to create a new PR?

ben-schwen · 2022-11-30T20:36:24Z

We could add an extra argument for that behavior, although, I'm not quite convinced since first/last are similar to head/tail and those also return fewer values if not enough are available.

Moreover, I wouldn't add it to #5168, since I don't know what the return value for first(integer(0), na.rm=TRUE) should be :D

tdhock · 2022-11-30T20:42:23Z

ok.
according to the discussion in this issue, the return value of first(integer(0), na.rm=TRUE) should be NA_integer_.

tdhock added bug reshape dcast melt labels May 25, 2022

iago-pssjd changed the title ~~first function not allowed as fun.aggregate for dcast~~ first function not working as fun.aggregate for dcast if fill argument is not provided May 25, 2022

tdhock mentioned this issue Nov 30, 2022

un-necessary warning for dcast on int columns with fun.aggregate=min/max #5512

Closed

tdhock mentioned this issue Nov 23, 2023

dcast only computes default fill if necessary #5549

Merged

MichaelChirico closed this as completed in #5549 Mar 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`first` function not working as `fun.aggregate` for `dcast` if `fill` argument is not provided #5390

`first` function not working as `fun.aggregate` for `dcast` if `fill` argument is not provided #5390

iago-pssjd commented May 23, 2022 •

edited

Loading

tdhock commented May 25, 2022 •

edited

Loading

iago-pssjd commented May 25, 2022

tdhock commented Aug 8, 2022 •

edited

Loading

tdhock commented Nov 30, 2022

ben-schwen commented Nov 30, 2022

tdhock commented Nov 30, 2022

first function not working as fun.aggregate for dcast if fill argument is not provided #5390

first function not working as fun.aggregate for dcast if fill argument is not provided #5390

Comments

iago-pssjd commented May 23, 2022 • edited Loading

tdhock commented May 25, 2022 • edited Loading

iago-pssjd commented May 25, 2022

tdhock commented Aug 8, 2022 • edited Loading

tdhock commented Nov 30, 2022

ben-schwen commented Nov 30, 2022

tdhock commented Nov 30, 2022

`first` function not working as `fun.aggregate` for `dcast` if `fill` argument is not provided #5390

`first` function not working as `fun.aggregate` for `dcast` if `fill` argument is not provided #5390

iago-pssjd commented May 23, 2022 •

edited

Loading

tdhock commented May 25, 2022 •

edited

Loading

tdhock commented Aug 8, 2022 •

edited

Loading