Return a data.table without the key #981

geneorama · 2014-12-08T22:43:43Z

It would be nice to be able to return a data.table on the fly without the key.

This would be useful for things like regressions where you might want to keep a key, but you don't want to include it in the regression. Of course I could use my own function, but I would prefer to use something standard.

Example function

Perhaps there's something more elegant / obvious?

keyless <- function(x){
    x[ , -which(colnames(x) %in% key(x)), with=FALSE]
}

Example usage:

library(data.table)
## Example using the rock data, with an additional column ID which 
## in a real example may be used to join different data sets.
dt <- data.table(id=paste0("rock", sprintf("%02d", 1:48)), rock)
setkey(dt, id)

## View the structure:
str(dt)

# Classes ‘data.table’ and 'data.frame':  48 obs. of  5 variables:
#  $ id   : chr  "rock01" "rock02" "rock03" "rock04" ...
#  $ area : int  4990 7002 7558 7352 7943 7979 9333 8209 8393 6425 ...
#  $ peri : num  2792 3893 3931 3869 3949 ...
#  $ shape: num  0.0903 0.1486 0.1833 0.1171 0.1224 ...
#  $ perm : num  6.3 6.3 6.3 6.3 17.1 17.1 17.1 17.1 119 119 ...
#  - attr(*, ".internal.selfref")=<externalptr> 
#  - attr(*, "sorted")= chr "id"

## Define "keyless"
keyless <- function(x){
    x[ , -which(colnames(x) %in% key(x)), with=FALSE]
}
## Do a regression
## Obviously we want to exclude the column of identifiers, so we use keyless
lm(area~., keyless(dt))

# Call:
# lm(formula = area ~ ., data = keyless(dt))
# 
# Coefficients:
# (Intercept)         peri        shape         perm  
#    -407.069        2.193     2992.314        2.549

I know that others have mentioned this, but I couldn't find an existing issue.

Thank you

yitang · 2014-12-08T22:55:35Z

.SD and .SDcols will do the job.

R> head(dt)
                   ymd  london  pairs berlin
1: 1900-01-01 12:00:00 0.62158 0.8151 0.2893
2: 1900-01-02 12:00:00 0.09772 0.7228 0.5576
3: 1900-01-03 12:00:00 0.65804 0.8039 0.9895
4: 1900-01-04 12:00:00 0.87387 0.2731 0.1960
5: 1900-01-05 12:00:00 0.75414 0.4138 0.4678
6: 1900-01-06 12:00:00 0.60392 0.6056 0.2084

R> lm(london ~ ., dt[, .SD, .SDcol = -key(dt)])

Call:
lm(formula = london ~ ., data = dt[, .SD, .SDcol = -key(dt)])

Coefficients:
(Intercept)        pairs       berlin  
    0.50158      0.00231     -0.00230

enjoy native data.table :)
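For readers who want to run this suggestion, here is a minimal sketch of the same idea applied to the rock data from the opening post. It spells the argument with its full name, `.SDcols`; the table and variable names (`dt`, `no_key`, `fit`) are only illustrative.

```r
library(data.table)

# Rebuild the keyed table from the opening post.
dt <- data.table(id = paste0("rock", sprintf("%02d", 1:48)), rock)
setkey(dt, id)

# .SD holds the columns selected by .SDcols; the leading minus
# excludes the named columns, here the key column "id".
no_key <- dt[, .SD, .SDcols = -key(dt)]
names(no_key)    # "area" "peri" "shape" "perm"

# The identifier column no longer enters the regression.
fit <- lm(area ~ ., data = no_key)
names(coef(fit)) # "(Intercept)" "peri" "shape" "perm"
```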

geneorama · 2014-12-30T21:52:17Z

yi-tang

I just saw your response, and thank you! However, I still think it would be useful (simpler and easier to read) to have a function that returns a data.table without the key, similar to the coredata function in the package zoo.

You make a great argument to simply rely on the native functionality, and this is probably a question of design. I think the coredata (or whatever) function would be nice, but I can see the other side here too.

BUT, I would much prefer dt[, .SD, .SDcol = -key(dt)] over dt[ , -which(colnames(dt) %in% key(dt)), with=FALSE], so thanks for that! I'll definitely use that over the original (but I would still personally prefer coredata(dt) or even keyless(dt)).

-Gene

arunsrinivasan · 2015-01-03T20:45:34Z

Gene, I've marked as FR, but at the moment, I don't see a reason "for". It seems reasonable to me to write your own function, as it's a very special case of a subset operation. Are there other compelling cases where you need this?

jangorecki · 2015-06-06T20:07:24Z

@geneorama
What you suggest is a simple wrapper:

keyless <- function(x) x[, .SD, .SDcol = -key(x)]

I understand there are cases where it is useful, but data.table is still more focused on providing a broad and efficient framework for manipulating tabular data than on direct functions for something as basic as the above. If you strongly believe it should be included in master, you can try a PR 👍

geneorama · 2015-06-09T11:35:51Z

After six months I seem to be the only one who thinks this is a good idea, so I'll just stick with a custom function

mattdowle · 2015-06-09T19:47:27Z

It doesn't seem like a bad idea to me. No objection to adding it. Not sure the best name. Would we need to select the key columns only sometimes as well - what would that function be called? key() already used so maybe keycolumns() and valuecolumns(), or keydata() and valuedata()? Hm.
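To make the two-function idea concrete, here is a rough sketch of what such a pair could look like. The names `keycolumns()` and `valuecolumns()` are taken from the suggestion above and are hypothetical; nothing here is claimed to be part of data.table, and the behavior for unkeyed tables is an assumption made for the sketch.

```r
library(data.table)

# Hypothetical helpers; the names come from the suggestions in this thread
# and are not data.table functions.
valuecolumns <- function(x) {
  # setdiff() keeps every column name that is not in key(x); key(x) is NULL
  # for an unkeyed table, so in that case all columns are returned.
  x[, .SD, .SDcols = setdiff(names(x), key(x))]
}

keycolumns <- function(x) {
  stopifnot(haskey(x))  # an unkeyed table has no key columns to return
  x[, .SD, .SDcols = key(x)]
}

dt <- data.table(id = paste0("rock", sprintf("%02d", 1:48)), rock, key = "id")
names(valuecolumns(dt))  # "area" "peri" "shape" "perm"
names(keycolumns(dt))    # "id"
```

Whether an unkeyed table should raise an error or return an empty result in `keycolumns()` is a design choice the thread does not settle; the sketch stops with an error.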

geneorama · 2015-06-09T23:17:41Z

I was going to explain how I thought it was a bad idea... but I re-changed (?) my mind, and the example I worked out turned out to validate my original suggestion.

I think it could be confusing with .SDcols but it could be pretty useful otherwise.

This is an example of a pretty typical workflow for me:

EDIT: Also, I called it dekey... but I don't love that name either. You wouldn't want a devalue function, right? The zoo library uses coredata, which I don't like but can't beat.

library(data.table)
set.seed(1)
data_full <- data.table(mykey = letters,
                        group = c(rep("train", 10), rep("test", 16)),
                        x1 = rnorm(26), x2 = rnorm(26), x3 = rnorm(26), x4 = rnorm(26), 
                        x5 = rnorm(26), x6 = rnorm(26), x7 = rnorm(26), x8 = rnorm(26), 
                        y = sample(c(0, 1), 26, replace = TRUE), 
                        key = c("mykey", "group"))
dekey <- function(x) x[, .SD, .SDcol = -key(x)]

## Regress on some different column subsets
## Perhaps create copies of the subsets for future plotting and analysis 
d1 <- data_full[ , list(x2,x4,x6,x8,y), keyby=list(mykey, group)]
d2 <- data_full[ , list(x1,x3,x5,y), keyby=list(mykey, group)]

glm1 <- glm(y ~ ., data = dekey(d1[group=="test"]), family = "binomial")
glm2 <- glm(y ~ ., data = dekey(d2[group=="test"]), family = "binomial")

## To create a data.table of predictions the keys have to be added back,
## and we're relying on the data being in the same order
pred1 <- data.table(data_full[ , list(mykey, group)],
                    yhat = predict(glm1, data_full),
                    key = c("mykey", "group"))
pred2 <- data.table(data_full[ , list(mykey, group)],
                    yhat = predict(glm2, data_full),
                    key = c("mykey", "group"))

## Merge in predictions as needed
data_full[pred1]
data_full[pred2]
## Merge in predictions as needed e.g. for plotting
library(ggplot2)
ggplot(data_full[pred1]) + aes(x2, yhat, colour = group) + geom_point(size=9)
ggplot(data_full[pred2]) + aes(x2, yhat, colour = group) + geom_point(size=9)
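The comments in the workflow above note that rebuilding the prediction tables relies on row order. One possible alternative, sketched here on a smaller simulated table, is to assign the predictions as a new column by reference with `:=`, so each value stays on the row it was computed from and no join is needed. The table `dat`, its size, and the column name `yhat` are assumptions made for this sketch and are not taken from the workflow above.

```r
library(data.table)

set.seed(1)
n <- 100
# Simulated stand-in for data_full, with the same two key columns.
dat <- data.table(mykey = sprintf("k%03d", 1:n),
                  group = rep(c("train", "test"), each = n / 2),
                  x1 = rnorm(n), x2 = rnorm(n),
                  y = sample(c(0, 1), n, replace = TRUE),
                  key = c("mykey", "group"))

# Fit on the value columns of one group only; the key columns are dropped
# in the same call that subsets the rows.
fit <- glm(y ~ ., data = dat[group == "test", .SD, .SDcols = -key(dat)],
           family = "binomial")

# Assign predictions (on the link scale, the predict() default) by
# reference; the key of dat is left untouched.
dat[, yhat := predict(fit, newdata = dat)]
head(dat)
```

This keeps keys and predictions in one table, at the cost of modifying `dat` in place; keeping separate prediction tables, as in the workflow above, avoids that side effect.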

raneameya · 2019-02-28T00:02:00Z

How about getDT? Would it be a good idea to have one function with the following arguments -

x: The data.table.
i: Rows to be subset, NULL by default indicating all rows.
j: Can be a character vector of column names or integer vector of column positions or one of "key" or "value".
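As a rough illustration of the proposal above, the sketch below implements a `getDT()` with the three listed arguments. The function is hypothetical and not part of data.table; the default for `j` (all columns) and the error on unkeyed input for `j = "key"` are assumptions added for the sketch.

```r
library(data.table)

# Hypothetical helper following the argument list proposed above.
getDT <- function(x, i = NULL, j = names(x)) {
  stopifnot(is.data.table(x))
  # Resolve the special values "key" and "value" to column names.
  cols <- if (identical(j, "key")) {
    stopifnot(haskey(x))
    key(x)
  } else if (identical(j, "value")) {
    setdiff(names(x), key(x))
  } else {
    j  # column names or positions, passed through unchanged
  }
  # NULL means "all rows", as in the proposal.
  if (is.null(i)) x[, .SD, .SDcols = cols] else x[i, .SD, .SDcols = cols]
}

dt <- data.table(id = paste0("rock", sprintf("%02d", 1:48)), rock, key = "id")
names(getDT(dt, j = "value"))       # "area" "peri" "shape" "perm"
dim(getDT(dt, i = 1:3, j = "key"))  # 3 1
names(getDT(dt, j = c(2L, 3L)))     # "area" "peri"
```

One ambiguity such a design would have to resolve: a table with a column literally named "key" or "value" could not have that single column selected by name through `j`.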

joshhwuu · 2024-07-30T23:01:21Z

Quick follow-up on this issue: does anyone have suggestions on how best to close it?

geneorama · 2024-07-31T01:19:07Z

I opened the issue to see what people thought, and a decade later I think it's safe to close the polls.

arunsrinivasan added the feature request label Jan 3, 2015

geneorama closed this as completed Jun 9, 2015

mattdowle reopened this Jun 9, 2015

jangorecki added the beginner-task label Jan 29, 2019

jangorecki changed the title ~~[Request] Return a data.table without the key~~ Return a data.table without the key Apr 6, 2020

joshhwuu mentioned this issue May 27, 2024

Master List of data.table Issues for GSoC '24 (Josh) joshhwuu/gsoc-2024#1

Open

11 tasks

joshhwuu mentioned this issue Jun 10, 2024

Function wrappers to get a dt without its keys or just the keys #6175

Closed

1 task

geneorama closed this as completed Jul 31, 2024
