Matt Dowle edited this page Aug 7, 2018 · 219 revisions

These articles either focus on data.table (bold) or mention or use it, perhaps only briefly, in which case you may need to search the article for "data.table". They are ordered by date, newest first. If you know of an article that may be of interest to others, please add it here. You can also search all articles from the R blogosphere since c. 2009. There is no filter applied: if the article exists and mentions data.table, positively or negatively, it is included on this page.

Please watch out for benchmarks measured in microseconds on small data. Comparisons at such small scales often do not hold when scaled up to larger data, because they over-represent call overhead. A test repetition count (e.g. ntimes) of 100 or more is often a sign that the test data is too small. Also check that setkey() has been used and that its time is reported separately.
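As a rough illustration of the benchmarking advice above, here is a minimal sketch of timing setkey() separately from the query that follows it, on data large enough that a single run takes a measurable amount of time. The table size `n`, the column names `id` and `v`, and the aggregation itself are assumptions made for this example and are not taken from any of the listed articles.

```r
library(data.table)

set.seed(1)

# Assumed size: large enough that one run is not dominated by call overhead.
n <- 5e6
DT <- data.table(
  id = sample(1e5, n, replace = TRUE),  # hypothetical grouping column
  v  = runif(n)                         # hypothetical value column
)

# Time the one-off sort done by setkey() on its own ...
t_key <- system.time(setkey(DT, id))

# ... and time a single run of the grouped aggregation separately.
t_agg <- system.time(res <- DT[, .(total = sum(v)), by = id])

print(t_key)
print(t_agg)
```

Reporting the two timings side by side makes it visible how much of the total cost comes from keying the table and how much from the query. A single run is used here on purpose: if one run is too fast to measure, that suggests the test data is too small.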

Date Title Author
2018.08 Meta-packages, nails in CRAN’s coffin John Mount
2018.07 EARL London interviews – Patrik Punco, NOZ Medien Mango Solutions
2018.07 Speed up your R Work John Mount
2018.06 R and Data – When Should we Use Relational Databases? Claude Seidman
2018.06 Re-referencing factor levels to estimate standard errors when there is interaction turns out to be a really simple solution Keith Goldfeld
2018.06 Most Starred R Packages on GitHub Steven Mortimer
2018.06 Melt and Cast The Shape of Your Data-Frame: Exercises sindri
2018.06 Sharpening The Knives in The data.table Toolbox: [Exercises] [Solutions] sindri
2018.06 rqdatatable: rquery Powered by data.table John Mount
2018.04 An R vlookup? Not so silly idea Hanjo Oden
2018.04 Benchmarking the six most used manipulations for data.tables in R Opremic
2018.04 Down the AUC Rabbit Hole and into Open Source: Part 2 Michael Frasco
2018.04 Down the AUC Rabbit Hole and into Open Source: Part 1 Michael Frasco
2018.03 pandas vs. data.table – A study of data-frames – Part 2 Tobias Krabel
2018.02 pandas vs. data.table – A study of data-frames Christian Moreau
2018.02 Julia vs R vs Python: string-sort performance + an unfinished journey to optimizing Julia's performance ZJ
2018.02 dplyr, (mc)lapply, for-loop and speed Mike Spencer
2018.02 Speeding up spatial analyses by integrating sf and data.table: a test case Lorenzo Busetto
2018.02 Packages for Getting Started with Time Series Analysis in R Abraham Mathew
2018.02 DataExplorer: Fast Data Exploration With Minimum Code Boxuan Cui
2018.01 Supercharge your R code with wrapr John Mount
2018.01 Tidyverse and data.table, sitting side by side… and then base R walks in Iñaki Úcar
2018.01 Tidyverse and data.table, sitting side by side (Part 1) Dirk Eddelbuettel
2018.01 Base R can be Fast John Mount
2018.01 Lightning fast serialization of datasets using the fst package Mark Klik
2018.01 rquery: Fast Data Manipulation in R John Mount
2017.12 A tour of the data.table package by creator Matt Dowle David Smith
2017.12 More Pipes in R John Mount
2017.12 Team Rtus wins Munich Re Datathon with mlr Jann Goschenhofer
2017.12 Correlated log-normal chain-ladder model Markus Gesmann
2017.11 How we built a Shiny App for 700 users Olga Mierzwa-Sulima
2017.11 Using data.table and Rcpp to scale geo-spatial analysis with sf Tim Appelhans
2017.11 Creating integer64 and nanotime vectors in C++ Dirk Eddelbuettel
2017.10 The Impressive Growth of R David Robinson
2017.10 Data.Table by Example – Part 3 atmathew
2017.09 Speed of data manipulations in Julia vs R ZJ
2017.09 Data.Table by Example – Part 2 atmathew
2017.09 Data.Table by Example – Part 1 atmathew
2017.09 Beyond the basics of data.table: Smooth data exploration Sindri
2017.09 Strategies to Speed-up R Code Selva Prabhakaran
2017.08 Is the Hadleyverse the only option? Billy Fung
2017.08 Basics of data.table: Smooth data exploration Sindri
2017.08 Polygenic Risks Scores with data.table in R Sahir Rai Bhatnagar
2017.08 July(ish) Update John MacKintosh
2017.08 R for System Adminstration Dirk Eddelbuettel
2017.06 data.table tutorial (with 50 examples) Deepanshu Bhalla
2017.06 The data.table R Package Cheat Sheet Karlijn Willems
2017.06 Data Manipulation with data.table (part 2) Biswarup Ghosh
2017.06 R in pRoduction: theRe be dRagons! Tim Sweetser and Kyle Schmaus
2017.06 Improving Zillow’s Zestimate with 36 Lines of Code Eduardo Ariño de la Rubia
2017.06 Data Manipulation with data.table (part 1) Biswarup Ghosh
2017.05 plotly 4.7.0 now on CRAN Carson Sievert
2017.05 R⁶ — Idiomatic (for the People) Bob Rudis
2017.05 Reading/writing biggish data, revisited Karl Broman
2017.05 dplyr in context John Mount
2017.05 Everyone knows that loops in R are to be avoided but vectorization is not always possible Keith Goldfeld
2017.04 R code to accompany Real-World Machine Learning (Chapter 6): Exploring NYC Taxi Data Paul Adamson
2017.04 Fast data loading from files to R Olga Mierzwa-Sulima
2017.03 Data Manipulation with Python Pandas and R Data.Table Fisseha Berhane
2017.03 Fast data lookups in R: dplyr vs data.table Marek Rogala
2017.02 Fitting logistic regression on 100gb dataset on a laptop Dmitriy Selivanov
2017.02 Large data, feature hashing and online learning Dmitriy Selivanov
2017.02 Moving largish data from R to H2O - spam detection with Enron emails Peter Ellis
2017.01 Discover your data (XGBoost vignette) Tianqi Chen, Tong He, Michaël Benesty, Yuan Tang
2017.01 fst: Fast serialization of R data frames David Smith
2017.01 fst: Lightning Fast Serialization of Data Frames Mark Klik
2017.01 R to the Rescue John Mackintosh
2016.12 Using R to prevent food poisoning in Chicago David Smith
2016.12 Behind the scenes of CRAN Matt Dowle
2016.12 nanotime 0.0.1: New package for Nanosecond Resolution Time for R Dirk Eddelbuettel
2016.12 Does replyr::let work with data.table? John Mount
2016.12 data.table: Where Have You Been All My Life? JoAnn Rudd Alvarez
2016.12 Organize your data manipulation in terms of “grouped ordered apply” John Mount
2016.12 Comparing a MySQL Query with a Data Table in R Douglas Rice
2016.11 data.table: squeeze the maximum speed when using data in R Stanislav Chistyakov
2016.10 Data Wrangling: Quick Guide for dplyr, data.table and R build-in data.frame Vincent Cao
2016.09 This Machine Learning Project on Imbalanced Data Can Add Value to Your Resume Manish Saraswat
2016.09 Rolling a join Will Rogers
2016.07 Winning approach of the Facebook V Kaggle competition Tom Van de Wiele
2016.07 New release of partools package Norm Matloff
2016.07 Bad Coder, Bad Coder! Norm Matloff
2016.06 Intro to the data.table package Steve Pittard
2016.06 Boost Your Data Munging with R Jan Gorecki
2016.06 Improving Season on Season James P. Curley
2016.06 Understanding data.table Rolling Joins Robert Norberg
2016.05 From a (set.)seed grows a mighty dataset Jonathan Carroll
2016.05 Feather: fast, interoperable data import/export for R David Smith
2016.05 Best packages for data manipulation in R Fisseha Berhane
2016.05 My Two favorite Packages for Data Manipulation in R Fisseha Berhane
2016.05 Use H2O and data.table to build models on large data sets in R Manish Saraswat
2016.05 The R Data I/O Shootout Eduardo Ariño de la Rubia
2016.05 Red herring bites Matt Dowle
2016.05 data.table() vs data.frame() – Learn to work on large data sets in R Manish Saraswat
2016.04 Feather: it's about metadata Wes McKinney
2016.04 Fast csv writing for R Matt Dowle
2016.04 I'll Keep Using R Michael Ekstrand
2016.04 data.table objects should not be considered data.frame instances in R [retracted] John Mount
2016.04 Learning R in Seven Simple Steps Martijn Theuwissen
2016.04 Collapsing lists of data.frames with data.table Steph Locke
2016.04 Working with databases in R Fisseha Berhane
2016.03 Data table exercises: keys and subsetting Han de Vries
2016.03 Performing SQL selects on R data frames Fisseha Berhane
2016.02 Read from hdfs with R. Brief overview of SparkR Dmitriy Selivanov
2016.02 Up to code? An algorithm is helping Chicago health officials predict restaurant safety violations (featured on TV at 06:40). [Tweet] [Code] PBS NewsHour
2016.01 Strategies to Speedup R Code Selva Prabhakaran
2015.12 Our R package roundup 2015 Christoph Safferling
2015.12 Who’s downloading the forecast package? Rob J Hyndman
2015.12 Solve common R problems efficiently with data.table Jan Gorecki
2015.11 Efficient aggregation (and more) using data.table David Kun
2015.11 Scaling data.table with index Jan Gorecki
2015.11 H2O World 2015 – Day 2 Highlights Anmol Rajpurohit, KDnuggets
2015.11 H2O World 2015 Joseph Rickert
2015.11 raises $20m series B to capitalize on rapid open source machine-learning growth Matt Aslett, 451 Research
2015.10 R and Impala: it's better to KISS than using Java Gergely Daroczi
2015.10 R: data.table – Finding the maximum row Mark Needham
2015.09 Querying a 20 million line CSV file – data.table vs data frame Mark Needham
2015.09 Data ergonomics with data.table, iHub Nairobi, with supporting materials Henk Harmsen
2015.09 R Stories from the Trenches [Video] [Slides] Szilard Pafka
2015.09 Advanced Tips and Tricks with data.table Andrew Brooks
2015.08 data.table cookbook Steph Locke
2015.07 Overlap joins in R: a speed comparison with packages sqldf and data.table Zev Ross
2015.06 Data Warehousing with R Jan Gorecki
2015.06 Auditing data transformation Jan Gorecki
2015.06 Back from R/Finance in Chicago Markus Gesmann
2015.05 Fast data munging in R Alexander Konduforov
2015.05 No THIS Is How You dplyr and data.table! Jeffrey Horner
2015.05 Comparing data frames, data.table and dplyr with random walks David Smith
2015.05 Working with "large" datasets, with dplyr and data.table Arthur Charpentier
2015.04 Comparing the execution time between foverlaps and findOverlaps [data.table vs GenomicRanges] Katarzyna Wręczycka
2015.04 Open Source Business Intelligence: Then and Now Steve Miller
2015.04 Mapping Flows in R with data.table and lattice Oscar Perpiñán Lamigueiro
2015.03 Need for Processing Speed: data.table OpenAnalytics
2015.03 Getting Data From An Online Source Robert Norberg
2015.02 A data.table R tutorial by DataCamp: intro to DT[i, j, by] DataCamp
2015.02 Minimal example for joining data.tables Markus Gesmann
2015.01 Using the microbenchmark package to compare the execution time of R expressions Stephen Turner
2015.01 Sessionizing Log Data Using data.table Randy Zwitch
2015.01 R in Business Intelligence Jan Gorecki
2014.12 dplyr and a very basic benchmark Szilard Pafka
2014.12 JOINing data in R using data.table Ronald Stalder
2014.12 Cheat Sheets for Data Science Steve Miller
2014.11 Partying R Style with Sqor Sports, R on Azure, and data.table Joseph Rickert
2014.11 The data.table Cheat Sheet DataCamp
2014.11 Some R Highlights from H20 World Joseph Rickert
2014.10 Complete data.table tutorial: data analysis the data.table way DataCamp
2014.10 data.table University Steve Miller
2014.10 Visualising the seasonality of Atlantic windstorms Markus Gesmann
2014.08 Scaling up data frames Ben Lorica
2014.08 data.table for R Grant Rettke
2014.08 MongoDB – State of the R Raffael Vogler
2014.08 VIDEO Matt Dowle's data.table talk from useR! 2014 Eduardo Ariño de la Rubia
2014.08 Pro Grammar and Devel Hoper Romain Francois
2014.08 Faster CSV Import with R Phill Clarke
2014.07 10 R Packages to Win Kaggle Competitions Xavier Conort
2014.07 R – Data.Table Rolling Joins Ben Gorman
2014.07 Dependencies of popular R packages Andrie de Vries
2014.07 2014 useR! conference, days 1-2 Karl Broman
2014.06 The joy of joining data.tables Markus Gesmann
2014.06 Concatenating a list of data frames Andrew
2014.05 R/Finance 2014 Steve Miller
2014.05 Working with large data sets in R - data.table and dcast Kamil Bartocha
2014.05 Reading large data tables in R Fabio Marroni
2014.04 Exploring US healthcare data Vik Paruchuri
2014.04 data.table vs dplyr in split apply combine style analysis Brodie G
2014.02 Dueling R and Python Followup Steve Miller
2014.02 Efficiency of Importing Large CSV Files in R statcompute
2014.01 Benchmark on baseball data: dplyr (0.1) and data.table (1.8.10) [tweet] Arun Srinivasan and Matt Dowle
2014.01 R: the good parts Jose Quesada
2014.01 Two of my favorite data.table features Brandon Le Beau
2014.01 When I use plyr/dplyr/data.table Educate-R
2013.12 Review: Kölner R Meeting 13 December 2013 Markus Gesmann
2013.09 A speed comparison of plyr, data.table and dplyr Jake Russ
2013.08 An R function like “order” from Stata Ananda Mahto
2013.07 Fig Data: 11 Tips on How to Handle Big Data in R (and 1 Bad Pun) Ulrich Atz
2013.07 A Bottom-up Start on Big Data Analytics Steve Miller
2013.06 Simulating Map-Reduce in R for Big Data Analysis Using Flights Data Jitender Aswani
2013.06 Improve The Efficiency in Joining Data with Index statcompute
2013.04 FasteR! HigheR! StrongeR! – A Guide to Speeding Up R Code for Busy People Noam Ross
2013.04 Using data.table for binning Oscar Perpiñán Lamigueiro
2013.03 RMark: data.table merge vs core merge Xachriel
2013.02 data.table or data.frame? DataParadigms
2013.01 Another Benchmark for Joining Two Data Frames statcompute
2013.01 Efficiecy of Extracting Rows from A Data Frame in R statcompute
2013.01 Efficiency in Joining Two Data Frames statcompute
2012.12 Surprising Performance of data.table in Data Aggregation Wensui Liu
2012.11 Data.table rocks! Data manipulation the fast way in R Markus Gesmann
2012.10 Generate a panel data.table or data.frame to fill with data Thiemo Fetzer
2012.06 Transforming subsets of data in R with by, ddply and data.table Markus Gesmann
2012.06 Access data quickly and easily: data.table package Anna Longari
2012.05 data.table 1.8.1 - Now allows numeric columns and big-number (via bit64) in keys! Branson Owen
2012.03 R code for Chapter 2 of Non-Life Insurance Pricing with GLM Allan Engelhardt
2012.02 Elegant & fast data manipulation with data.table Carl Boettiger
2012.01 Say it in R with "by", "apply" and friends Markus Gesmann
2011.08 Comparison of ave, ddply and data.table Paul Hiemstra
2011.04 Data Aggregation in R: plyr, sqldf and data.table Hayward Godwin
2011.03 Applying functions on groups: sqldf, plyr, doBy, aggregate or data.table ? altuna
2011.03 Fast(ish) extraction of exon locations from a BED12 file using data.table altuna
2011.03 data.table: an R package everyone should use Jason
2011.02 By-Group Processing, the R data.table and the Power of Open Source Steve Miller