patterns in .SDcols #1878

Closed
eantonya opened this issue Oct 14, 2016 · 8 comments

Comments

Contributor

@eantonya eantonya commented Oct 14, 2016

I thought I'd seen this FR before, but couldn't find it.

It would be nice if we could specify column names using regular expressions in .SDcols. Currently one has to do something like .SDcols = grep("mypattern", names(myDT)), which doesn't work in a chain (the intermediate result has no name to pass to names()) and is pretty fragile.

Perhaps the patterns function from melt can be reused here, making the syntax .SDcols = patterns("mypattern").
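
For illustration, here is a minimal sketch of the workaround described above next to the requested syntax. The table myDT and its columns are made up. The patterns() call in .SDcols assumes a data.table release that supports it (to my knowledge 1.12.0 and later), so treat that line as an assumption on older versions.

```r
library(data.table)

# Hypothetical example table; column names are invented for illustration.
myDT <- data.table(id = 1:3, val_a = c(1, 2, 3), val_b = c(10, 20, 30))

# Current workaround: match column names by hand and pass the positions.
res_grep <- myDT[, lapply(.SD, sum), .SDcols = grep("^val", names(myDT))]

# Requested syntax; assumes a data.table version that accepts
# patterns() in .SDcols.
res_pat <- myDT[, lapply(.SD, sum), .SDcols = patterns("^val")]
```

Both calls should select the same columns; the second does not need the table's name, which is what makes it usable in a chain.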


@ksavin ksavin commented Oct 26, 2016

I'd like to add that patterns would be super useful in j as well.

Selecting columns with grep is often needed, and the only way to do it is to refer to them via names(), e.g.
veryLongDataTableName[, grep('lag', names(veryLongDataTableName)), with = FALSE]

or to remove multiple columns, e.g.
dt[, (grep('lag', names(dt))) := NULL]

or to make new column names:
dt[, paste0(grep('^test', names(dt), value = TRUE), '_sqrt') := lapply(.SD, sqrt), .SDcols = grep('^test', names(dt), value = TRUE)]

These would be much shorter with patterns available in j as well:
veryLongDataTableName[, patterns('lag'), with = FALSE]
dt[, patterns('lag') := NULL]
dt[, paste0(patterns('^test'), '_sqrt') := lapply(.SD, sqrt), .SDcols = patterns('^test')]

Alternatively, it would be handy to have a special symbol for column names, selected in .SDcols, e.g. .NM
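
A runnable sketch of the three grep-based operations listed above, for anyone who wants to try them today. The table and its column names are hypothetical, and storing the matched names in a variable first is just one way to avoid repeating the grep call.

```r
library(data.table)

# Hypothetical table with two "test" columns and one "lag" column.
dt <- data.table(id = 1:3,
                 test_a = c(1, 4, 9),
                 test_b = c(16, 25, 36),
                 lag1 = c(0, 0, 0))

# Match the names once and reuse them.
lag_cols  <- grep("lag", names(dt), value = TRUE)
test_cols <- grep("^test", names(dt), value = TRUE)

# Select columns by pattern.
lag_only <- dt[, lag_cols, with = FALSE]

# Create new columns named after the matched ones.
dt[, paste0(test_cols, "_sqrt") := lapply(.SD, sqrt), .SDcols = test_cols]

# Remove the matched columns by reference.
dt[, (lag_cols) := NULL]
```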

Member

@MichaelChirico MichaelChirico commented Oct 26, 2016

names(.SD) should suffice...
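
A small sketch of what using names(.SD) might look like, assuming a hypothetical table dt. Inside j, names(.SD) gives the columns picked by .SDcols, so the pattern only has to be written once. Here the renamed columns are returned as a new table rather than assigned with :=.

```r
library(data.table)

# Hypothetical example table.
dt <- data.table(id = 1:3, test_a = c(1, 4, 9), test_b = c(16, 25, 36))

# names(.SD) is evaluated inside j, after .SDcols has selected the columns.
sqrt_cols <- dt[, setNames(lapply(.SD, sqrt), paste0(names(.SD), "_sqrt")),
                .SDcols = grep("^test", names(dt))]
```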



@ksavin ksavin commented Oct 26, 2016

Forgot I am in fact using names(.SD) for these cases :)
Still, way cleaner and shorter with patterns.

Member

@MichaelChirico MichaelChirico commented Oct 26, 2016

I regularly do things like grep(pattern, names(.SD)) (and maybe add value = TRUE)... Maybe I'm just used to this setup.


@mbacou mbacou commented Dec 8, 2016

Another one I tend to use is .SDcols=names(.SD) %like% "mypattern", a little ugly. Upvoting this FR as well.
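
A hedged sketch of the %like% variant on a made-up table. Note two deliberate changes from the one-liner above, both my own assumptions: it matches against names(dt) rather than names(.SD), and it wraps the logical result in which() so that .SDcols receives column positions.

```r
library(data.table)

# Hypothetical example table.
dt <- data.table(x1 = 1:2, x2 = 3:4, y = 5:6)

# %like% is data.table's grepl-style operator; which() turns the
# logical match vector into column positions.
x_cols <- which(names(dt) %like% "^x")

res <- dt[, lapply(.SD, sum), .SDcols = x_cols]
```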


@hannes101 hannes101 commented Mar 24, 2017

Just a reference to an SO question; please update it there as well if this gets implemented :-)
https://stackoverflow.com/questions/42999949/select-data-table-columns-with-grep-like-partial-matching

Member

@HughParsonage HughParsonage commented May 18, 2018

I wrote select_grep in package hutils before realizing this was an outstanding issue:

library(hutils)
library(data.table)
dt <- data.table(x1 = 1, x2 = 2, y = 0)
select_grep(dt, "x")
#>    x1 x2
#> 1:  1  2
select_grep(dt, "x", .and = "y")
#>    x1 x2 y
#> 1:  1  2 0
select_grep(dt, "x", .and = "y", .but.not = "x2")
#>    x1 y
#> 1:  1 0

Created on 2018-05-19 by the reprex package (v0.2.0).
