Matt Dowle edited this page Jul 20, 2018 · 132 revisions

Current development version: v1.11.5
Latest news: NEWS
New presentations July 2018: see Videos & Slides in the sidebar.

data.table is one of the 9,800 add-on packages for the programming language R which is popular in these fields. It provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.


Queries take the general form DT[i, j, by] and can be chained together just by adding another one on the end: DT[...][...].
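As a minimal sketch of chaining, the example below builds a small table and appends a second query to the result of the first. The table `DT` and its columns `grp` and `val` are hypothetical, chosen only for illustration.

```r
library(data.table)

# Hypothetical example data; the column names are illustrative only.
DT <- data.table(
  grp = c("a", "a", "b", "b", "c"),
  val = c(1, 2, 3, 4, 5)
)

# First query: keep rows with val > 1, then sum val within each grp.
# Second query (chained on the end): sort the grouped result by total, descending.
res <- DT[val > 1, .(total = sum(val)), by = grp][order(-total)]
print(res)
```

The second pair of brackets operates on the data.table returned by the first, so no intermediate variable is needed.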
See data.table compared to dplyr on Stack Overflow and Quora.

Other features include:

  • fast and friendly delimited file reader: ?fread. It accepts system commands directly (such as grep and gunzip), has other convenience features for small data, and has been parallelized on CRAN since May 2018 (presented earlier here).
  • fast and parallelized file writer: ?fwrite, announced here and on CRAN since Nov 2016.
  • parallelized row subsets; see this benchmark for timings
  • fast aggregation of large data; e.g. 100GB in RAM (see benchmarks on up to two billion rows)
  • fast add/update/delete columns by reference by group using no copies at all
  • fast ordered joins; e.g. rolling forwards, backwards, nearest and limited staleness
  • fast overlapping range joins; similar to findOverlaps function from IRanges/GenomicRanges Bioconductor packages, but not limited to genomic (integer) intervals.
  • fast non-equi (or conditional) joins, i.e., joins using operators >, >=, <, <= as well, available from v1.9.8+
  • a fast primary ordered index; e.g. setkey(DT,col1,col2)
  • automatic secondary indexing; e.g. DT[col==val,] and DT[col %in% vals,]
  • fast and memory efficient combined join and group by; by=.EACHI
  • fast reshape2 methods (dcast and melt) without needing reshape2 and its dependency chain installed or loaded
  • group summary results may be many rows (e.g. first and last row by group) and each cell value may itself be a vector/object/function (e.g. unique ids by group as a list column of varying length vectors - this is pretty printed with commas)
  • special symbols built-in for convenience and raw speed by avoiding the overhead of function calls: .N, .SD, .I, .GRP and .BY
  • any R function from any R package can be used in queries not just the subset of functions made available by a database backend
  • has no dependencies at all other than base R itself, for simpler production/maintenance
  • the R dependency is as old as possible for as long as possible and we test against that version; e.g., v1.9.8 released on 25-Nov-2016 bumped the dependency up from 4.5-year-old R 2.14.0 to 3-year-old R 3.0.0.
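A few of the features listed above can be sketched on toy data. The tables `trades` and `quotes`, their columns, and all values are hypothetical and serve only as illustration; this is not a benchmark and says nothing about performance.

```r
library(data.table)

# Hypothetical data: table names, column names and values are illustrative only.
trades <- data.table(
  sym  = c("A", "A", "B", "B"),
  time = c(2L, 5L, 3L, 9L),
  qty  = c(10, 20, 30, 40)
)
quotes <- data.table(
  sym   = c("A", "A", "B"),
  time  = c(1L, 4L, 2L),
  price = c(100, 101, 200)
)

# File writer and reader: round-trip the table through a temporary CSV file.
tmp <- tempfile(fileext = ".csv")
fwrite(trades, tmp)
back <- fread(tmp)

# Add a column by reference, by group: each row's share of its group's qty.
trades[, share := qty / sum(qty), by = sym]

# Special symbol .N: the number of rows in each group.
counts <- trades[, .N, by = sym]

# Primary ordered index (key) on both tables.
setkey(trades, sym, time)
setkey(quotes, sym, time)

# Rolling join: for each trade, take the most recent quote at or before its time.
# The join columns are spelled out with `on` for clarity.
priced <- quotes[trades, on = .(sym, time), roll = TRUE]

# Combined join and group by: one summary row per row of i, using by = .EACHI.
# This form relies on the key set above (sym is the first key column).
totals <- trades[.(c("A", "B")), .(total_qty = sum(qty)), by = .EACHI]

print(priced)
print(totals)
```

In the rolling join, `roll = TRUE` applies to the last join column (`time` here), so a trade with no quote at exactly its own time picks up the prevailing earlier quote for the same `sym`.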

Version 1.0 was released to CRAN in 2006. In June 2014 we moved from R-Forge to GitHub.

Guidelines for filing issues / pull requests: Contribution Guidelines.

As of 30 Dec 2016, data.table was the 3rd largest Stack Overflow tag about an R package, the 8th most starred R package on GitHub, had 321 CRAN and Bioconductor packages using it and was the #1 most directly downloaded R package on RStudio's CRAN mirror.
