min/max fails on ordered factors when using by #1947

Closed
mcieslik-mctp opened this issue Dec 1, 2016 · 6 comments

mcieslik-mctp commented Dec 1, 2016

This seems to be a regression in the latest version (data.table_1.9.8), as it worked before.

library(data.table)
test <- data.table(V1=factor(rep(c("a","b"), 10), levels=c("a", "b"), ordered=TRUE), 
V2=rep(c("c","d", "e", "f"), 5))
test[,min(V1)]           # (1)
test[,min(V1),by=V2]     # (2)

(1) works correctly, but (2) returns an error:
Error in gmin(V1) : min is not meaningful for factors.

@mattdowle mattdowle added this to the v1.10.0 milestone Dec 1, 2016

mattdowle commented Dec 2, 2016

(1) dispatches to base R. (2) dispatches to GForce grouping.
(You can pass verbose=TRUE to queries to get more insight.)
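
As a rough sketch of the verbose check suggested above, the grouped query from the report can be rerun with verbose=TRUE. The tryCatch wrapper is an addition for illustration, not part of the original report; it only keeps the script running on versions that still raise the gmin error. The exact wording of the verbose messages varies between data.table versions, so no output is shown here.

```r
library(data.table)

# The table from the original report.
test <- data.table(
  V1 = factor(rep(c("a", "b"), 10), levels = c("a", "b"), ordered = TRUE),
  V2 = rep(c("c", "d", "e", "f"), 5)
)

# verbose = TRUE prints how j is evaluated, including whether GForce rewrote it.
# On affected versions (such as 1.9.8) the call errors, so the message is
# captured instead of aborting the script.
res <- tryCatch(
  test[, min(V1), by = V2, verbose = TRUE],
  error = function(e) conditionMessage(e)
)
print(res)
```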

GForce was changed here (a1b1c08) but I don't see any notes in NEWS or any tests added.

R's min treats ordered factors and non-ordered factors differently, as you nicely showed.

R --vanilla
> x = factor(letters)
> min(x)
Error in Summary.factor(1:26, na.rm = FALSE) : 
  ‘min’ not meaningful for factors
> x = factor(letters, ordered=TRUE)
> min(x)
[1] a
26 Levels: a < b < c < d < e < f < g < h < i < j < k < l < m < n < o < ... < z

Yes, I guess we should be in line with base R in this regard. Thanks for highlighting.
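
For illustration only, here is a base R sketch of why a grouped min is well defined for ordered factors: the minimum level is the one with the smallest integer code. This is not a claim about how data.table implements or fixed gmin; the variable names (f, g, codes, grouped_min) are made up for the example.

```r
# Same data as in the report, as plain vectors.
f <- factor(rep(c("a", "b"), 10), levels = c("a", "b"), ordered = TRUE)
g <- rep(c("c", "d", "e", "f"), 5)

# Take the per-group minimum of the underlying integer codes ...
codes <- tapply(as.integer(f), g, min)

# ... and map the codes back onto the original ordered levels.
grouped_min <- factor(levels(f)[codes], levels = levels(f), ordered = TRUE)
names(grouped_min) <- names(codes)

# Reference result: base R min() applied to each group separately.
reference <- sapply(split(f, g), function(x) as.character(min(x)))

print(grouped_min)
```

Both routes agree on this data, which is the behaviour the grouped query would need to match to be in line with base R.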

@mattdowle mattdowle modified the milestones: v1.10.2, v1.10.0 Dec 2, 2016
@mattdowle mattdowle removed this from the Candidate milestone May 10, 2018

mbacou commented Jun 18, 2018

Is there a status update on data.table's support for summarizing ordered factors? Will it not be implemented? Thanks.

MichaelChirico commented Jun 20, 2018

@mbacou as luck would have it, I think the fix was trivial (#2944). Thanks for the impetus!

In any case, you can always set options(datatable.optimize = 0) to prevent GForce from failing on ordered factors; it will just be slower.
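
A minimal sketch of that workaround, assuming the table from the original report. Saving and restoring the previous option value is an addition for tidiness, not something the thread prescribes.

```r
library(data.table)

test <- data.table(
  V1 = factor(rep(c("a", "b"), 10), levels = c("a", "b"), ordered = TRUE),
  V2 = rep(c("c", "d", "e", "f"), 5)
)

# Turn off data.table's query optimizations (including GForce) and keep the
# previous setting so it can be restored afterwards.
old <- options(datatable.optimize = 0)

# With optimization off, j is evaluated per group with base R's min().
res <- test[, min(V1), by = V2]

# Restore the previous optimization level.
options(old)

print(res)
```

Because the option is global, restoring it right after the query limits the slowdown to the one call that needs it.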

mbacou commented Jun 20, 2018

Excellent, thanks! I was surprised to see that feature work well in, e.g., PostgreSQL but not in data.table. Glad that was an easy fix.

MichaelChirico commented Jun 20, 2018 via email
