Return a data.table without the key #981
Comments
.SD and .SDcol will do the job.
R> head(dt)
                   ymd  london  pairs berlin
1: 1900-01-01 12:00:00 0.62158 0.8151 0.2893
2: 1900-01-02 12:00:00 0.09772 0.7228 0.5576
3: 1900-01-03 12:00:00 0.65804 0.8039 0.9895
4: 1900-01-04 12:00:00 0.87387 0.2731 0.1960
5: 1900-01-05 12:00:00 0.75414 0.4138 0.4678
6: 1900-01-06 12:00:00 0.60392 0.6056 0.2084
R> lm(london ~ ., dt[, .SD, .SDcol = -key(dt)])
Call:
lm(formula = london ~ ., data = dt[, .SD, .SDcol = -key(dt)])
Coefficients:
(Intercept)        pairs       berlin
    0.50158      0.00231     -0.00230
enjoy native data.table :) |
yi-tang I just saw your response, and thank you! However, I still think it would be useful (simpler and easier to read) to have a function that returns a data.table without the key; similar to the You make a great argument to simply rely on the native functionality, and this is probably a question of design. I think the coredata (or whatever) function would be nice, but I can see the other side here too. BUT, I would much prefer -Gene |
Gene, I've marked as FR, but at the moment, I don't see a reason "for". It seems reasonable to me to write your own function, as it's a very special case of a subset operation. Are there other compelling cases where you need this? |
@geneorama
keyless <- function(x) x[, .SD, .SDcol = -key(x)]
I understand there are cases where it is useful, but data.table is still more focused on providing a wide and efficient framework for tabular data manipulation than on direct functions for something as basic as the above. If you strongly believe it should be included in master, you can try a PR 👍 |
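A minimal sketch of how the helper above could be tried out. The table `dt` and its columns are made up for illustration, and this version uses `setdiff()` with `.SDcols` rather than the negated form, which is only one possible way to write it.

```r
library(data.table)

# Hypothetical example data, keyed on `id`
dt <- data.table(id = c("b", "a", "c"),
                 x  = c(2, 1, 3),
                 y  = c(20, 10, 30),
                 key = "id")

# Return every column that is not part of the key.
# If the table has no key, key(x) is NULL and all columns are kept.
keyless <- function(x) x[, .SD, .SDcols = setdiff(names(x), key(x))]

res <- keyless(dt)
names(res)  # "x" "y"
```

Written this way, the helper also passes an unkeyed table through unchanged, since `setdiff(names(x), NULL)` is just `names(x)`.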
After six months I seem to be the only one who thinks this is a good idea, so I'll just stick with a custom function |
It doesn't seem like a bad idea to me. No objection to adding it. Not sure the best name. Would we need to select the key columns only sometimes as well - what would that function be called? |
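For the question about selecting only the key columns, a counterpart could look like the sketch below. The name `keyonly` is hypothetical, as is the example table; nothing in data.table is assumed to provide this function.

```r
library(data.table)

# Hypothetical example data with a two-column key
dt <- data.table(id  = c("b", "a"),
                 grp = c("u", "v"),
                 x   = c(2, 1),
                 key = c("id", "grp"))

# Hypothetical helper: keep only the key columns
keyonly <- function(x) {
  stopifnot(haskey(x))              # fail early on an unkeyed table
  x[, .SD, .SDcols = key(x)]
}

names(keyonly(dt))  # "id" "grp"
```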
I was going to explain how I thought it was a bad idea... but I rechanged (?) my mind, and the example I worked out turned out to validate my original suggestion. I think it could be confusing with This is an example of a pretty typical workflow for me; EDIT: Also, I called it
library(data.table)
set.seed(1)
data_full <- data.table(mykey = letters,
                        group = c(rep("train", 10), rep("test", 16)),
                        x1 = rnorm(26), x2 = rnorm(26), x3 = rnorm(26), x4 = rnorm(26),
                        x5 = rnorm(26), x6 = rnorm(26), x7 = rnorm(26), x8 = rnorm(26),
                        y = sample(c(0, 1), 26, replace = TRUE),
                        key = c("mykey", "group"))
dekey <- function(x) x[, .SD, .SDcol = -key(x)]

## Regress on some different column subsets
## Perhaps create copies of the subsets for future plotting and analysis
d1 <- data_full[, list(x2, x4, x6, x8, y), keyby = list(mykey, group)]
d2 <- data_full[, list(x1, x3, x5, y), keyby = list(mykey, group)]
glm1 <- glm(y ~ ., data = dekey(d1[group == "test"]), family = "binomial")
glm2 <- glm(y ~ ., data = dekey(d2[group == "test"]), family = "binomial")

## To create a data.table of predictions the keys have to be added back,
## and we're relying on the data being in the same order
pred1 <- data.table(data_full[, list(mykey, group)],
                    yhat = predict(glm1, data_full),
                    key = c("mykey", "group"))
pred2 <- data.table(data_full[, list(mykey, group)],
                    yhat = predict(glm2, data_full),
                    key = c("mykey", "group"))

## Merge in predictions as needed
data_full[pred1]
data_full[pred2]

## Merge in predictions as needed e.g. for plotting
library(ggplot2)
ggplot(data_full[pred1]) + aes(x2, yhat, colour = group) + geom_point(size = 9)
ggplot(data_full[pred2]) + aes(x2, yhat, colour = group) + geom_point(size = 9)
|
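As a point of comparison with the workflow above, here is a rough sketch of one way to avoid both the key-dropping helper and the re-keying step: name the predictors in the formula and add the predictions by reference with `:=`. The miniature table `dt` is made up, and this is only an alternative under those assumptions, not what the thread settled on.

```r
library(data.table)

set.seed(1)
# Hypothetical miniature of the table above: one key column, two predictors
dt <- data.table(mykey = letters,
                 x1 = rnorm(26), x2 = rnorm(26),
                 y = rep(c(0, 1), 13),
                 key = "mykey")

# Name the predictors explicitly so the key column never enters the formula
fit <- glm(y ~ x1 + x2, data = dt, family = "binomial")

# Add predictions (link scale, the predict() default) by reference;
# rows stay aligned with their keys and no separate merge is needed
dt[, yhat := predict(fit, newdata = dt)]
```

The trade-off is that `y ~ .` is no longer available, which matters when there are many predictors; that is the case the helper was meant for.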
How about
|
Quick follow-up on this issue, does anyone have suggestions on how to best close this issue? |
I opened the issue to see what people thought, and a decade later I think it's safe to close the polls. |
It would be nice to be able to return a data.table on the fly without the key.
This would be useful for things like regressions where you might want to keep a key, but you don't want to include it in the regression. Of course I could use my own function, but I would prefer to use something standard.
Example function
Perhaps there's something more elegant / obvious?
Example usage:
I know that others have mentioned this, but I couldn't find an existing issue.
Thank you