subset.data.table return wrong cols when there are duplicated column names #891

Closed
jjzz opened this issue Oct 15, 2014 · 1 comment

jjzz commented Oct 15, 2014

library(data.table)
# data.table 1.9.4  For help type: ?data.table
d=data.table(rep(3,3))
d=data.table(rep(1,3),rep(2,3), d)
d
#    V1 V2 V1
#1:  1  2  3
#2:  1  2  3
#3:  1  2  3
subset(d,T,c(3,2))
#    V1 V2
#1:  1  2
#2:  1  2
#3:  1  2

With subset(d,T,c(3,2)) I expect to get the 3rd and 2nd columns, but subset() returns the 1st and 2nd columns instead. This seems to be caused by the duplicated column name V1 in data.table d.

I don't know the internal logic for handling duplicated column names, but I would expect that when column positions are supplied, the column names (even duplicated ones) should not matter. Is that right?

Or maybe there should be some kind of warning when there are duplicated column names?

@arunsrinivasan
Member

In data.table, providing column numbers should return the right columns even when there are duplicate column names. Providing a column name will always return the first column with that name (by order of occurrence) if more than one column shares it.

Example:

d[, c(3,2), with=FALSE]
#    V1 V2
# 1:  3  2
# 2:  3  2
# 3:  3  2

So this'd be a bug. Thanks for the report.
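
Until a fix lands, one possible workaround is to detect the duplicated names and make them unique before selecting by name. This is only a sketch, not the fix for this issue: the names d2, has_dups and res are made up for illustration, and it assumes that renaming the columns on a copy is acceptable.

```r
library(data.table)

# Rebuild the table from the report; it ends up with two columns named V1.
d <- data.table(rep(3, 3))
d <- data.table(rep(1, 3), rep(2, 3), d)

# Base R check for duplicated column names (anyDuplicated() returns 0 if none).
has_dups <- anyDuplicated(names(d)) > 0

# Work on a copy so the original table is left untouched.
d2 <- copy(d)
if (has_dups) {
  # make.unique() appends a suffix to repeated names, e.g. the second "V1"
  # becomes "V1.1".
  setnames(d2, make.unique(names(d2)))
}

# With unique names, the intended columns can be requested by name.
res <- d2[, c("V1.1", "V2"), with = FALSE]
```

The same anyDuplicated(names(d)) check could also be used to raise a warning before calling subset(), along the lines suggested in the report.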

@arunsrinivasan arunsrinivasan self-assigned this Oct 15, 2014
@arunsrinivasan arunsrinivasan added this to the v1.9.6 milestone Oct 15, 2014