Current development version: v1.11.5
Latest news: NEWS
New presentations (July 2018): see Videos & Slides in the sidebar.
data.table is one of the 9,800 add-on packages for the programming language R, which is popular in many fields. It provides a high-performance version of base R's `data.frame` with syntax and feature enhancements for ease of use, convenience and programming speed.
Other features include:
- fast and friendly delimited file reader: `?fread`. It accepts system commands directly (such as `gunzip`), has other convenience features for small data, and is parallelized as of the May 2018 CRAN release (presented earlier).
- fast and parallelized file writer: `?fwrite`, on CRAN since Nov 2016.
- parallelized row subsets; see the benchmark for timings
- fast aggregation of large data; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
- fast add/update/delete of columns by reference, including by group, using no copies at all
- fast ordered joins; e.g. rolling forwards, backwards, nearest and limited staleness
- fast overlapping range joins (`?foverlaps`); similar to the `findOverlaps` function from the IRanges/GenomicRanges Bioconductor packages, but not limited to genomic (integer) intervals.
- fast non-equi (or conditional) joins, i.e., joins using the operators `>`, `>=`, `<` and `<=` as well, available from v1.9.8+
- a fast primary ordered index (set with `setkey()`) and automatic secondary indexing; e.g. `DT[col %in% vals,]`
- fast and memory-efficient combined join and group by (`by = .EACHI`)
- fast reshape2 methods (dcast and melt) without needing reshape2 and its dependency chain installed or loaded
- group summary results may be many rows (e.g. first and last row by group) and each cell value may itself be a vector/object/function (e.g. unique ids by group as a list column of varying length vectors - this is pretty printed with commas)
- special symbols built in (such as `.N`, `.SD`, `.I`, `.GRP` and `.BY`) for convenience and raw speed, avoiding the overhead of function calls
- any R function from any R package can be used in queries, not just the subset of functions made available by a database backend
- has no dependencies at all other than base R itself, for simpler production/maintenance
- the R dependency is kept as old as possible for as long as possible, and we test against that version; e.g., v1.9.8, released on 25-Nov-2016, bumped the dependency up from the 4.5-year-old R 2.14.0 to the 3-year-old R 3.0.0.
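The `fread`/`fwrite` items in the list can be illustrated with a minimal round trip. This is a sketch assuming a recent data.table from CRAN; the table contents and the temporary file are hypothetical, and the `cmd=` line (which assumes a version that has that argument) is commented out because it needs a real compressed file.

```r
library(data.table)

# A small example table (hypothetical data)
dt <- data.table(id = 1:3, name = c("a", "b", "c"), value = c(1.5, 2.5, 3.5))

# Write it to a temporary CSV file with the parallel writer
path <- tempfile(fileext = ".csv")
fwrite(dt, path)

# Read it back; fread detects the separator, header and column types
dt2 <- fread(path)
print(dt2)

# fread can also read the output of a shell command, e.g. a compressed file
# (the file name is hypothetical, so the line is left commented out):
# big <- fread(cmd = "gunzip -c data.csv.gz")
```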
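Grouped aggregation and add/update/delete by reference can be sketched on a small hypothetical table. `:=` modifies `dt` in place rather than returning a modified copy; the column names `grp_mean` and `centred` are illustrative only.

```r
library(data.table)

dt <- data.table(grp = c("x", "x", "y", "y", "y"), v = c(1, 2, 3, 4, 5))

# Aggregate: one row per group with the group size (.N) and the sum of v
agg <- dt[, .(n = .N, total = sum(v)), by = grp]
print(agg)

# Add a column by reference, computed within each group; dt is not copied
dt[, grp_mean := mean(v), by = grp]

# Add another column using the new one, then delete the helper column by reference
dt[, centred := v - grp_mean]
dt[, grp_mean := NULL]
print(dt)
```

Groups appear in `agg` in order of first appearance (`x`, then `y`).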
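The ordered joins listed above (rolling forwards, backwards, nearest and with limited staleness) can be sketched with the `roll=` argument. The quote and trade tables are hypothetical, and all trade times fall inside the range of quote times so that the roll-ends defaults do not affect the results.

```r
library(data.table)

# Hypothetical data: quotes at times 10, 50, 100 and trades at times 20, 50, 80
quotes <- data.table(time = c(10, 50, 100), price = c(100, 101, 102))
trades <- data.table(time = c(20, 50, 80))

# Roll forwards: each trade takes the last quote at or before its time (LOCF)
fwd <- quotes[trades, on = "time", roll = TRUE]

# Roll backwards: each trade takes the next quote at or after its time (NOCB)
bwd <- quotes[trades, on = "time", roll = -Inf]

# Nearest: each trade takes the quote closest in time
near <- quotes[trades, on = "time", roll = "nearest"]

# Limited staleness: roll forwards by at most 15 time units, otherwise NA
lim <- quotes[trades, on = "time", roll = 15]

print(fwd$price)   # 100 101 101
print(bwd$price)   # 101 101 102
print(near$price)  # 100 101 102
print(lim$price)   # 100 101 NA
```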
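A non-equi join can be sketched by classifying values into half-open bands `[lo, hi)`. The `bands` and `obs` tables are hypothetical; rows of `obs` with no matching band come back with `NA` under the default `nomatch` behaviour.

```r
library(data.table)

# Hypothetical bands [lo, hi) and observations to classify
bands <- data.table(band = c("low", "mid", "high"),
                    lo   = c(0, 10, 20),
                    hi   = c(10, 20, 30))
obs <- data.table(id = 1:4, x = c(5, 10, 25, 35))

# Non-equi join: for each row of obs, find the band with lo <= x and hi > x
res <- bands[obs, on = .(lo <= x, hi > x)]
print(res[, .(id, band)])
```

Here `x = 10` lands in `mid` because the upper bound is strict, and `x = 35` matches no band.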
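Overlapping range joins are provided by `foverlaps()`. This sketch assumes integer `start`/`end` columns and uses hypothetical `genes` and `regions` tables. The second table (`y`) must be keyed, with its interval columns as the last two key columns.

```r
library(data.table)

# Hypothetical intervals with integer [start, end] coordinates
genes   <- data.table(gene = c("g1", "g2"), start = c(1L, 20L), end = c(10L, 30L))
regions <- data.table(region = c("r1", "r2"), start = c(5L, 12L), end = c(8L, 15L))

# foverlaps() requires y to be keyed; the last two key columns are the interval
setkey(genes, start, end)

# For each region, return the genes whose interval overlaps it;
# nomatch = 0L drops regions with no overlapping gene (r2 here)
ov <- foverlaps(regions, genes, type = "any", nomatch = 0L)
print(ov)
```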
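A keyed subset, a secondary-index-friendly subset and a combined join-and-group-by (`by = .EACHI`) can be sketched together. The data and the `lookup` table are hypothetical. Whether an index is actually built for the `%in%` subset depends on the auto-indexing option, so the example relies only on the subset's result.

```r
library(data.table)

dt <- data.table(id = c("b", "a", "c", "a"), v = 1:4)

# Primary key: sorts dt by id by reference; keyed subsets then use binary search
setkey(dt, id)
a_rows <- dt["a"]              # all rows with id == "a"

# With auto-indexing enabled (the default), a subset like this can build an
# index on v on first use and reuse it on later calls
v_rows <- dt[v %in% c(1L, 3L)]

# Combined join and group by: for each row of lookup, sum v over its matches
lookup <- data.table(id = c("a", "c"))
sums <- dt[lookup, .(total = sum(v)), on = "id", by = .EACHI]
print(sums)
```

Using `by = .EACHI` evaluates `j` once per row of `lookup`, so the matching rows for each id are aggregated as part of the join.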
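The built-in `melt` and `dcast` methods can be sketched as a wide-to-long-to-wide round trip on a hypothetical sales table. Only data.table is loaded; reshape2 is not needed.

```r
library(data.table)

# Hypothetical wide table: one column per quarter
wide <- data.table(id = 1:2, q1 = c(10, 20), q2 = c(30, 40))

# Wide -> long: one row per (id, quarter)
long <- melt(wide, id.vars = "id", variable.name = "quarter", value.name = "sales")
print(long)

# Long -> wide again: one column per quarter
wide2 <- dcast(long, id ~ quarter, value.var = "sales")
print(wide2)
```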
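Group results that are list columns or span several rows per group can be sketched with `.N` and `.SD`. The table is hypothetical; `.SD` is the subset of data for the current group and `.N` is its row count.

```r
library(data.table)

dt <- data.table(grp = c("x", "x", "x", "y", "y"), id = 1:5)

# One row per group, where 'ids' is a list column holding a vector per group
per_grp <- dt[, .(n = .N, ids = list(id)), by = grp]
print(per_grp)   # the list column prints as comma-separated values

# A group result can have several rows: here the first and last row of each group
first_last <- dt[, .SD[c(1L, .N)], by = grp]
print(first_last)
```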
Version 1.0 was released to CRAN in 2006. In June 2014 we moved from R-Forge to GitHub.
Guidelines for filing issues / pull requests: Contribution Guidelines.
As of 30 Dec 2016, data.table was the 3rd largest Stack Overflow tag about an R package, the 8th most starred R package on GitHub, had 321 CRAN and Bioconductor packages using it and was the #1 most directly downloaded R package on RStudio's CRAN mirror.