-
Notifications
You must be signed in to change notification settings - Fork 985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
first
function not working as fun.aggregate
for dcast
if fill
argument is not provided
#5390
Comments
Maybe the issue title could be misleading because first function works as fun.aggregate in dcast, as long as you provide your own fill argument: > dcast(DT, v1 + v2 ~ ., value.var = "v4", fun.aggregate = data.table::first, fill=NA)
Key: <v1, v2>
v1 v2 .
<int> <int> <num>
1: 1 1 -0.6264538
2: 1 2 -0.8356286
3: 1 3 0.3295078
4: 2 1 -0.6264538
5: 2 2 -0.8356286
6: 2 3 0.3295078 dcast is behaving as documented, although documentation could be clarified to indicate that only if fill=NULL then fun.aggregate is called on a 0-length vector to determine fill value. Current docs say: fill: Value with which to fill missing cells. If 'fun.aggregate' is
present, takes the value by applying the function on a
0-length vector. In the example above it seems that since fill=NULL dcast uses fun.aggregate to compute a fill value, but the fill value is actually not used/needed. That may be a bug in dcast? I suggest updating dcast so that fun.aggregate is only used when needed: fill: Value with which to fill missing cells. If fill=NULL and
missing cells are present, then 'fun.aggregate' is
used on a 0-length vector to obtain a fill value. Another example would be where the fill argument is actually used/needed: > (sub.DT <- DT[1:7])
v1 v2 v3 v4
<int> <int> <int> <num>
1: 1 1 1 -0.6264538
2: 1 1 2 0.1836433
3: 1 2 1 -0.8356286
4: 1 2 2 1.5952808
5: 1 3 1 0.3295078
6: 1 3 2 -0.8204684
7: 2 1 1 -0.6264538
> dcast(sub.DT, v1 ~ v2, value.var = "v4", fun.aggregate = data.table::first, fill=NA)
Key: <v1>
v1 1 2 3
<int> <num> <num> <num>
1: 1 -0.6264538 -0.8356286 0.3295078
2: 2 -0.6264538 NA NA
> dcast(sub.DT, v1 ~ v2, value.var = "v4", fun.aggregate = data.table::first)
Error: Aggregating function(s) should take vector inputs and return a single value (length=1). However, function(s) returns length!=1. This value will have to be used to fill any missing combinations, and therefore must be length=1. Either override by setting the 'fill' argument explicitly or modify your function to handle this case appropriately. I would argue that stopping with an error is reasonable in this case, since the user asked to aggregate using the first item, but there is no first item for two combinations. The data.table::first docs do not mention what happens when the input has length=0, so this is probably an un-anticipated use case. We could fix by either (1) documenting the current behavior, > data.table::first(numeric(0))
numeric(0) or (2) changing first to return NA instead of numeric(0) in this case. |
first
function not allowed as fun.aggregate
for dcast
first
function not working as fun.aggregate
for dcast
if fill
argument is not provided
Thanks! @tdhock, I modified the issue title. |
#5168 adds na.rm=T support to first/last, but does not fix this issue: numeric(0) is still returned, but to fix this issue NA would have to be returned. |
hey @ben-schwen do you think it makes sense for first/last to always return a vector with length=1, even in the case of an empty (length=0) input vector? If so, do you think it makes sense to add that change to #5168 or to create a new PR? |
We could add an extra argument for that behavior, although, I'm not quite convinced since Moreover, I wouldn't add it to #5168, since I don't know what the return value for |
ok. |
I would expect I could do:
However I get an error instead of
Is it expected or a bug?
The text was updated successfully, but these errors were encountered: