Matt Dowle edited this page Dec 18, 2018 · 38 revisions

This page is experimental. Here we collect the fringe public comments related to data.table, in date order. Fringe in the sense of peripheral or extreme but also subtle. These can often go quietly viral and gently sway a community over time. For those for whom English is not their first language: sarcasm and jest are powerful tools sometimes on display here. We have always added all articles we are aware of to the articles page if they mention data.table (whether positive or negative) and will continue to do so. Even so, the sentiment of the articles page is overwhelmingly positive. The goal of this fringe page is to collect public comments (anything that is not an article, since that belongs on the articles page) with a bias towards the negative, to aid potential new users in their quest to build a full, unbiased picture of the data.table package.

Or, in other words, a problem shared is a problem halved.

18 Dec 2018

TFW when you first use 'bind_rows' instead of 'rbind' and the heavens open up and cherubs sing #rstats #tidyverse

Another example of RStudio/tidyverse getting the credit for something I did first. Why can't they credit data.table? I am known for speaking up on the question of lack of credit, but I replied anyway, expecting criticism:

[data.table::rbindlist] pre-dated bind_rows by years. But it's not in RStudio's interests to tell people that, is it. Most users don't care. I wasted my time.

This issue has built up over many years. It's continuing a pattern; a marketing strategy by tidyverse/RStudio which most users like and want.

13 Dec 2018 Oleksiy Anokhin

Dear @MattDowle , thanks for your attention. Again with all FULL RESPECT and ZERO INTENTION to undermine your great work! For many R users syntax is the key. IMHO data.table without more simple syntax reminds me a fantastic research report without good visualization. Warmly, OA

No comment really. Just logging it here, since the point of this fringe page is to collect negative opinions. He also made this interesting meme, which I think captures the situation quite well.

12 Oct 2018 Roxana Noelia

What option is your favourite for data manipulation? Why? #R #data #rstats #rstatses #dplyr #datatable

data.table lost this poll of 82 votes by a large margin: 74% dplyr, 21% data.table. To be fair, I retweeted it here rather than ignore it. Note that the wrong data.table tag was used (#datatable should be #rdatatable). But polls like this work. They sway a community. The data.table project simply doesn't have the amount of sustained public support that dplyr does. Popularity is the most important metric for most people when choosing which project to support, which is why popularity receives so much attention. Creators, contributors and users of data.table, by contrast, tend to be independently minded, perhaps. I point to the Stack Overflow comparison here, where data.table currently has 415 votes vs dplyr 318.

1 Sep 2018 Dorothy Bishop

the way R defaults to treating variables as factors causes me endless problems. just converted a numeric matrix to dataframe & all cols became factor. Are there any benefits of this, or is it just perversity of developers?

1 Sep 2018 Dorothy Bishop

There's also the way the default is to create the levels in alphabetic order. Whose bright idea was that?

I have things to say in this thread, but the derogatory language "perversity of developers" and "whose bright idea was that?" causes me to check out. This kind of spear-throwing language works, though. It sways a community and people like it. Quietly choosing a better default (stringsAsFactors=FALSE) and having respect for the authors of R (as I did and do) does not work.

She is a professor at Oxford with 30k followers. Is she using "perversity" and "bright idea" with respect to Prof Brian Ripley, also at Oxford? Or would she change her language if she i) knew the people she was referring to, and/or ii) subsequently discovered the reasons? I put my tin hat on and went with this reply.

11 May 2018 Matt Cowgill on Twitter

or do it in a slower, but more readable way with dplyr

8 May 2018 Matt Cowgill on Twitter

great to hear, that'll be simple to switch to dplyr::right_join() or whatever

Both were responses to Hugh Parsonage using data.table at Grattan Institute (see the threads on Twitter). The results were quoted several times in this Question Time article; i.e. a success for data.table that Matt Cowgill sought to shoot down. He has 7,600 followers at this time, more than twice mine. It is this kind of side remark that works very well and sways a community. Unless we spend more time demonstrating how a rolling join was used and showing 2 secs vs 30 mins, the community will believe the more popular people speaking out against data.table, like Matt Cowgill. I filed issue 2862 to write up this work and compare and contrast it with alternatives.

8 Mar 2018 Rachael Tatman on Twitter

Wickham: one of the reasons data.table is much faster than dplyr is because everything is modified in place, which gives me the heebie-jeebies

There is no hope if Wickham said that. What he says goes, for most people. Well, against my better judgement, I tried anyway and replied: part 1, part 2, part 3

23 Jan 2018 hrbrmstr on Twitter

I think the other bit is that data.table does (despite what some tribal members posit) whack some of the extract (i.e. [], [[]]) idioms. So some "data.frame" syntax does not work & causes confusion. That bit me when making widgets. I always normalize widget df input to data.frame

22 Jan 2018 hrbrmstr on Twitter

data.table is 👍 if you really need munging speed in R & can tolerate unreadable hieroglyphics. I’ll take tidyverse + ops in DBs any day.

To weigh any opinion, you first need to know what its author does, whether that is similar to what you do, and what their motivations are, before looking at the details of the claim. I have previously replied regarding the tribal adjective here on his article and here on Twitter. Since he knows this is troubling to me, he continues to use the word to taunt. The derogatory tribal adjective is an ad hominem attack.

28 Aug 2017 Jose Manuel Vera on Twitter

Use data.table for wrangling data without the ugly data.table syntax … #rstats

It doesn't feel good for your work to be called "ugly". My only response was to retweet it so that others see it and help/suggest. Engaging to disagree (of course I disagree strongly) would waste time and likely end badly. Not engaging leaves negative sentiment that others will find. I have no idea how to handle people who use such hateful words for any work, let alone work that is offered freely.

25 April 2017 Adam T. Austin on Twitter

Collaborating with someone who uses data.table. My loyalty to the #tidyverse grows greater by the second... cc @hadleywickham #rstats

This was copied to #rstats, retweeted by Hadley and intended to be widely seen. It is not pleasant to be denied the knowledge of why you're being criticized publicly. Unfortunately, this tactic does work and it does sway a community. Perhaps the tweeter or his collaborator has misunderstood something; we will never know. Replying to ask risks escalating and taking even more time. Time I can't make.

My response was to retweet this one from earlier in the month. I hadn't felt it was appropriate to retweet that before. Silly, isn't it.

8 April 2017 Thọ Duy Nguyễn on Twitter

back to data.table after a long time with dplyr #rstats

25 Dec 2014 Hadley Wickham on Hacker News

Data tables are extremely fast but I think their concision makes it harder to learn and code that uses it is harder to read after you've written it. It's very reminiscent of APL.

Our response: see the Hacker News item and comparing dplyr to data.table on Stack Overflow.
The word "reminiscent" was used to convey the notion of being of the past and is meant as criticism. Note that Hadley was responding to a positive post about data.table on Hacker News. The original item was:

Anyone doing R comparisons should use data.table instead of data.frame. More so for benchmarks. data.table is the best data structure/query language I have found in my career. It's leading the way in The R world, and in my way, in all the data-focused languages.

Hadley sought to shoot down this positive sentiment. His negative sentiment is what has stuck in the community rather than the original post which was positive. That's what works.

26 Jun 2014 Hadley Wickham on Stack Overflow

Also read.csv() reads everything into a big character matrix and then modifies that, does fread() do the same thing? In fastread we guess column types and then coerce as we go to avoid a complete copy of the df.

The Stack Overflow question is "Reason behind speed of fread in data.table package in R", which is an implicit compliment to data.table. That's the context. The comment is a subtle way to i) create doubt about fread and ii) announce his new fastread package, which had not been known before that. fastread subsequently became readr.