Keys

Stephanie Locke edited this page Apr 4, 2016 · 3 revisions

Glossary

  • KEY
  • COMPOSITE KEY

Key setting

Setting a key on a data.table facilitates joins, speeds up querying, and sorts your data. You can set a key as you create a data.table, with data.table() or setDT(), or afterwards with dedicated functions, chiefly setkey() and set2key().

The iris dataset will be used throughout.

library(data.table)
head(setDT(copy(iris)))
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1:          5.1         3.5          1.4         0.2  setosa
## 2:          4.9         3.0          1.4         0.2  setosa
## 3:          4.7         3.2          1.3         0.2  setosa
## 4:          4.6         3.1          1.5         0.2  setosa
## 5:          5.0         3.6          1.4         0.2  setosa
## 6:          5.4         3.9          1.7         0.4  setosa
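As a rough sketch of how a key facilitates a join, the example below keys two tables on Species and joins them with the X[Y] syntax. The `colours` lookup table and its `Colour` values are made up purely for illustration; they are not part of iris.

```r
library(data.table)

# Keyed table of measurements; Species is converted to character so both
# sides of the join have the same column type
flowers <- setDT(copy(iris))
flowers[, Species := as.character(Species)]
setkey(flowers, Species)

# Hypothetical lookup table (the Colour values are invented for this sketch)
colours <- data.table(
  Species = c("setosa", "versicolor", "virginica"),
  Colour  = c("purple", "blue", "violet"),
  key     = "Species"
)

# X[Y] joins on the keys: each row of colours is matched to the flowers rows
# that share its Species
joined <- flowers[colours]
head(joined)
```

Because both tables are keyed on Species, no on= or by= argument is needed to say which columns to match.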

data.table()

You can set a key at the moment you create a data.table, with either data.table() or setDT().

data.table()

data.table() accepts a key= argument, which sets the key as the data.table is created. The rows are sorted by the key column(s), just as setkey() would sort them.

irisDT<-data.table(iris, key="Sepal.Width")
head(irisDT)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1:          5.0         2.0          3.5         1.0 versicolor
## 2:          6.0         2.2          4.0         1.0 versicolor
## 3:          6.2         2.2          4.5         1.5 versicolor
## 4:          6.0         2.2          5.0         1.5  virginica
## 5:          4.5         2.3          1.3         0.3     setosa
## 6:          5.5         2.3          4.0         1.3 versicolor
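To confirm that the key was set, you can inspect the table with key() and haskey(). This is a minimal check and assumes nothing beyond the irisDT object created above.

```r
library(data.table)

# Create a keyed data.table in one step
irisDT <- data.table(iris, key = "Sepal.Width")

key(irisDT)     # name(s) of the key column(s)
haskey(irisDT)  # TRUE when a key is set

# The key column is stored in ascending order
is.unsorted(irisDT$Sepal.Width)
```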

setDT()

Alternatively, setDT(), which quickly converts a data.frame to a data.table by reference, also has a key= argument.

irisDT<-setDT(copy(iris), key="Sepal.Width")
head(irisDT)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1:          5.0         2.0          3.5         1.0 versicolor
## 2:          6.0         2.2          4.0         1.0 versicolor
## 3:          6.2         2.2          4.5         1.5 versicolor
## 4:          6.0         2.2          5.0         1.5  virginica
## 5:          4.5         2.3          1.3         0.3     setosa
## 6:          5.5         2.3          4.0         1.3 versicolor
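Since setDT() works by reference, it can also be called without assigning the result. The sketch below uses a throwaway variable named `df` (a name chosen here only for illustration) to show the object being converted and keyed in place.

```r
library(data.table)

# Work on a copy so the built-in iris dataset is left untouched
df <- copy(iris)
class(df)                       # a plain data.frame at this point

# Convert and key in place; no assignment is needed
setDT(df, key = "Sepal.Width")

class(df)                       # now includes "data.table"
key(df)                         # the key set during conversion
```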

setkey()

setkey() assigns a key and physically sorts the table, in place, by the key column(s).

irisDT<-setDT(copy(iris))
setkey(irisDT,Sepal.Length)
head(irisDT)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1:          4.3         3.0          1.1         0.1  setosa
## 2:          4.4         2.9          1.4         0.2  setosa
## 3:          4.4         3.0          1.3         0.2  setosa
## 4:          4.4         3.2          1.3         0.2  setosa
## 5:          4.5         2.3          1.3         0.3  setosa
## 6:          4.6         3.1          1.5         0.2  setosa
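Once a key is set, rows can be looked up by key value in the i argument. The following sketch wraps the value in .() so that it is matched against the key column; a bare number such as irisDT[5] would instead be read as a row number.

```r
library(data.table)

irisDT <- setDT(copy(iris))
setkey(irisDT, Sepal.Length)

# Keyed subset: all rows whose Sepal.Length equals 5.0
fiveRows <- irisDT[.(5.0)]
fiveRows

# The same rows found with an ordinary logical filter, for comparison
scanRows <- irisDT[Sepal.Length == 5.0]
nrow(fiveRows) == nrow(scanRows)
```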

Passing more than one column creates a composite key. Rows are sorted by the first column, and ties are then ordered by the second:

irisDT<-setDT(copy(iris))
setkey(irisDT, Sepal.Length, Sepal.Width)
head(irisDT)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1:          4.3         3.0          1.1         0.1  setosa
## 2:          4.4         2.9          1.4         0.2  setosa
## 3:          4.4         3.0          1.3         0.2  setosa
## 4:          4.4         3.2          1.3         0.2  setosa
## 5:          4.5         2.3          1.3         0.3  setosa
## 6:          4.6         3.1          1.5         0.2  setosa
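With a composite key, a keyed subset can supply a value for every key column, or only for the leading one. The values 4.4 and 3.0 below are taken from the output above.

```r
library(data.table)

irisDT <- setDT(copy(iris))
setkey(irisDT, Sepal.Length, Sepal.Width)

key(irisDT)  # both key columns, in the order they were given

# Match on both key columns
both <- irisDT[.(4.4, 3.0)]
both

# Match on the first key column only
firstOnly <- irisDT[.(4.4)]
firstOnly
```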

The setkey() function takes unquoted column names, but sometimes you may want to pass in column names dynamically, for example from a character vector. For this you can use the "v" variant setkeyv():

irisDT<-setDT(copy(iris))
key<-c("Sepal.Width","Sepal.Length")
setkeyv(irisDT, key)
head(irisDT)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1:          5.0         2.0          3.5         1.0 versicolor
## 2:          6.0         2.2          4.0         1.0 versicolor
## 3:          6.0         2.2          5.0         1.5  virginica
## 4:          6.2         2.2          4.5         1.5 versicolor
## 5:          4.5         2.3          1.3         0.3     setosa
## 6:          5.0         2.3          3.3         1.0 versicolor
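setkeyv() is most useful inside functions, where the column names arrive as an argument. The helper below, `keyedCopy()`, is a hypothetical name introduced for this sketch and is not part of data.table.

```r
library(data.table)

# Hypothetical helper: return a keyed data.table copy of a data.frame,
# with the key columns supplied as a character vector
keyedCopy <- function(df, cols) {
  dt <- setDT(copy(df))  # copy first so the caller's object is not modified
  setkeyv(dt, cols)      # character vector of column names
  dt
}

irisDT <- keyedCopy(iris, c("Species", "Petal.Length"))
key(irisDT)
head(irisDT)
```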

set2key()

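set2key() assigns a secondary key (an index) and does not physically sort the table, so the row order is unchanged. In later data.table releases this function was renamed setindex().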

irisDT<-setDT(copy(iris))
set2key(irisDT,Sepal.Length)
head(irisDT)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1:          5.1         3.5          1.4         0.2  setosa
## 2:          4.9         3.0          1.4         0.2  setosa
## 3:          4.7         3.2          1.3         0.2  setosa
## 4:          4.6         3.1          1.5         0.2  setosa
## 5:          5.0         3.6          1.4         0.2  setosa
## 6:          5.4         3.9          1.7         0.4  setosa
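In data.table releases after this page was written, set2key() was renamed setindex(), so set2key() may not be available in your installed version. The sketch below assumes such a newer release and uses setindex() and indices() to show that a secondary key leaves both the primary key and the row order alone.

```r
library(data.table)

irisDT <- setDT(copy(iris))

# Assumes a data.table release that provides setindex()
setindex(irisDT, Sepal.Length)

indices(irisDT)  # the secondary key (index) that was just set
key(irisDT)      # NULL: no primary key has been set

# Row order is the same as in the original data
identical(irisDT$Sepal.Length, iris$Sepal.Length)
```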

A composite secondary key can be set in the same way, and again the row order is left unchanged:

irisDT<-setDT(copy(iris))
set2key(irisDT, Sepal.Length, Sepal.Width)
head(irisDT)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1:          5.1         3.5          1.4         0.2  setosa
## 2:          4.9         3.0          1.4         0.2  setosa
## 3:          4.7         3.2          1.3         0.2  setosa
## 4:          4.6         3.1          1.5         0.2  setosa
## 5:          5.0         3.6          1.4         0.2  setosa
## 6:          5.4         3.9          1.7         0.4  setosa

The set2key() function takes unquoted column names, but sometimes you may want to pass in column names dynamically, for example from a character vector. For this you can use the "v" variant set2keyv():

irisDT<-setDT(copy(iris))
key<-c("Sepal.Width","Sepal.Length")
set2keyv(irisDT, key)
head(irisDT)
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1:          5.1         3.5          1.4         0.2  setosa
## 2:          4.9         3.0          1.4         0.2  setosa
## 3:          4.7         3.2          1.3         0.2  setosa
## 4:          4.6         3.1          1.5         0.2  setosa
## 5:          5.0         3.6          1.4         0.2  setosa
## 6:          5.4         3.9          1.7         0.4  setosa
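The character-vector variant was likewise renamed, to setindexv(), in later releases. As a sketch under that assumption, the example below sets a composite index from a character vector (here called `cols`, a name chosen for illustration) and then subsets with on=.

```r
library(data.table)

irisDT <- setDT(copy(iris))
cols <- c("Sepal.Width", "Sepal.Length")

# Assumes a data.table release that provides setindexv()
setindexv(irisDT, cols)

indices(irisDT)  # one composite index covering both columns
key(irisDT)      # NULL: the primary key is untouched

# Subset on the indexed columns by naming them in on=
hits <- irisDT[.(3.0, 5.0), on = cols]
hits
```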
