data.table is one of the 9,800 add-on packages for the programming language R, which is popular in these fields. It provides a high-performance version of base R's
`data.frame` with syntax and feature enhancements for ease of use, convenience and programming speed.
Other features include:
- fast and friendly delimited file reader: `?fread`. It accepts system commands directly (such as `gunzip`), has other convenience features for small data and is now parallelized in dev (announced here).
- fast and parallelized file writer: `?fwrite`, announced here and on CRAN in Nov 2016.
- parallelized row subsets - See this benchmark for timings
- fast aggregation of large data; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
- fast add/update/delete columns by reference by group using no copies at all
- fast ordered joins; e.g. rolling forwards, backwards, nearest and limited staleness
- fast overlapping range joins; similar to the `findOverlaps` function from the IRanges/GenomicRanges Bioconductor packages, but not limited to genomic (integer) intervals.
- fast non-equi (or conditional) joins, i.e., joins using the operators `>`, `>=`, `<`, `<=` as well, available from v1.9.8+
- a fast primary ordered index
- automatic secondary indexing; e.g. `DT[col %in% vals,]`
- fast and memory-efficient combined join and group by
- fast reshape2 methods (dcast and melt) without needing reshape2 and its dependency chain installed or loaded
- group summary results may be many rows (e.g. first and last row by group) and each cell value may itself be a vector/object/function (e.g. unique ids by group as a list column of varying length vectors - this is pretty printed with commas)
- special symbols built in for convenience and raw speed by avoiding the overhead of function calls
- any R function from any R package can be used in queries, not just the subset of functions made available by a database backend
- has no dependencies at all other than base R itself, for simpler production/maintenance
- the R dependency is as old as possible for as long as possible and we test against that version; e.g., v1.9.8, released on 25-Nov-2016, bumped the dependency up from the 4.5-year-old R 2.14.0 to the 3-year-old R 3.0.0.
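
Several of the features listed above can be tried together in a short session. The following is a minimal sketch, not an official example: it assumes a reasonably recent data.table release (the `text` argument of `fread` and `nomatch = NULL` are newer than v1.9.8), and the tables `trades`, `quotes` and `windows` and all their columns are invented for illustration.

```r
library(data.table)

# Hypothetical trade data, parsed by fread from an in-memory string
trades <- fread(text = "sym,time,qty\nA,1,100\nA,5,200\nB,3,50\nB,7,150")

# Add a column by reference, computed within each group (trades is not copied)
trades[, share := qty / sum(qty), by = sym]

# Aggregate by group; .N is the built-in symbol for the group's row count
summary_dt <- trades[, .(n = .N, total = sum(qty)), by = sym]

# Write the summary with fwrite and read it back with fread
tmp <- tempfile(fileext = ".csv")
fwrite(summary_dt, tmp)
reread <- fread(tmp)

# Hypothetical quotes, keyed on (sym, time) as a primary ordered index
quotes <- data.table(sym  = c("A", "A", "B"),
                     time = c(0L, 4L, 2L),
                     px   = c(10, 11, 20))
setkey(quotes, sym, time)

# Rolling join: for each trade, take the latest quote at or before its time
priced <- quotes[trades, on = .(sym, time), roll = TRUE]
# priced$px is 10, 11, 20, 20

# Non-equi join: keep only trades whose time falls inside a per-symbol window
windows <- data.table(sym = c("A", "B"), start = c(0L, 5L), end = c(4L, 9L))
in_window <- trades[windows,
                    on = .(sym, time >= start, time <= end),
                    nomatch = NULL]
# in_window has two rows: the A trade at time 1 and the B trade at time 7

# Reshape without loading reshape2: wide to long with melt, back with dcast
long <- melt(summary_dt, id.vars = "sym", measure.vars = c("n", "total"))
wide <- dcast(long, sym ~ variable, value.var = "value")
```

Note that `:=` modifies `trades` in place, so the later joins see the `share` column too. In the non-equi join result, the `time` and `time.1` columns carry the window bounds from `windows` rather than the original trade times.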
Version 1.0 was released to CRAN in 2006. In June 2014 we moved from R-Forge to GitHub.
Guidelines for filing issues / pull requests: Contribution Guidelines.
As of 30 Dec 2016, data.table was the 3rd largest Stack Overflow tag about an R package, the 8th most starred R package on GitHub, had 321 CRAN and Bioconductor packages using it and was the #1 most directly downloaded R package on RStudio's CRAN mirror.