Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scope to include an argument on setcolorder to remove non-referenced columns #4030

Closed
KyleHaynes opened this issue Nov 7, 2019 · 4 comments

Comments

@KyleHaynes
Copy link
Contributor

setcolorder will currently append non-referenced variables to the end of the new defined ordering. This is great in some circumstances, but often I (and I assume other users(?)) need to reorder and drop non-referenced variables.

Is there potential scope to add an argument that allows this to occur with setcolorder?

If it's not clear what I mean...

dt = data.table(iris)

# Reorder and subset to just Species and Sepal.Width.
 dt[, c("Species", "Sepal.Width")]
#       Species Sepal.Width
#  1:    setosa         3.5
#  2:    setosa         3.0
#  3:    setosa         3.2
#  4:    setosa         3.1
# ....

# Suggested new argument (not sure on the best name ... obviously not the below).
setcolorder(dt, c("Species", "Sepal.Width"), new_argument_to_remove_non_referenced_vars = T)
dt
#       Species Sepal.Width
#  1:    setosa         3.5
#  2:    setosa         3.0
#  3:    setosa         3.2
#  4:    setosa         3.1
# ....
@KyleHaynes KyleHaynes changed the title Scope to include an argument on setcolorder to remove undefined columns Scope to include an argument on setcolorder to remove non-referenced columns Nov 7, 2019
@MichaelChirico
Copy link
Member

I don't follow why this belongs in setcolorder in particular, could you elaborate on that?

Since this is all done by reference, this approach seems natural and would be equally efficient:

keep = c('Species', 'Sepal.Width')
dt[ , setdiff(names(dt), keep) := NULL]
setcolorder(dt, keep)

Note that for efficiency we should drop first, otherwise setcolorder will have to do slightly more work.

Since it's setcolorder and not setorder, all that's happening is shuffling column pointers, so overall unless theres 10Ks of columns this will all be basically instant.

@KyleHaynes
Copy link
Contributor Author

@MichaelChirico, thanks for the quick response.

I guess my thought was that in setting the ordering of columns, it's not out-of-scope to think that only retaining a defined subset of columns could be an argument for setcolorder.

It's probably just me being daft, but, if it was the first time using the function and I hadn't read the help, I would have thought setcolorder would only retain the columns that were referenced (and not just append to the front, in that given order). (I am no way suggesting default behavior should be changed / or that you shouldn't read documentation!).

Your example is my current approach when I need to do this on large datasets, I guess the efficiency relates to potentially more concise code (i.e just an argument instead of having to write the setdiff step).

Happy for this issue to be closed.

Thanks for all your work on this package!

@MichaelChirico
Copy link
Member

Agree the setdiff part is a bit awkward; what do you think of the approach suggested in #4031 ?

@KyleHaynes
Copy link
Contributor Author

Sounds good and has more application. I'll close this issue. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants