In [1]:
library(data.table)
library(magrittr)
library(ggplot2)
mydata = fread("https://raw.githubusercontent.com/arunsrinivasan/satrdays-workshop/master/flights_2014.csv")

## Examples for Practice

#### Q1. Calculate the total number of rows by month, then sort in descending order


In [5]:
mydata[, .N, by = month][order(-N)]

`.N` is a special symbol that holds the number of rows in the current group, so it is used here to get the count per month.
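As a rough illustration of how `.N` behaves, the sketch below counts rows per group on a small made-up table and then sorts the counts in descending order. The `toy` table and its values are assumptions for illustration only, not the flights data.

```r
library(data.table)

# Hypothetical toy data, not the flights dataset
toy <- data.table(month = c(1, 1, 2, 3, 3, 3))

# .N holds the number of rows in each group defined by `by`;
# the chained [order(-N)] sorts the resulting counts in descending order
counts <- toy[, .N, by = month][order(-N)]
print(counts)
```

The result column is named `N` automatically, which is why the chained call can refer to it in `order(-N)`.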

#### Q2. Find the top 3 months with the highest mean arrival delay

In [6]:

mydata[, .(mean_arr_delay = mean(arr_delay, na.rm = TRUE)), by = month][order(-mean_arr_delay)][1:3]

#### Q3. Find the origins of flights whose average total delay is greater than 20 minutes


In [7]:

mydata[, lapply(.SD, mean, na.rm = TRUE), .SDcols = c("arr_delay", "dep_delay"), by = origin][(arr_delay + dep_delay) > 20]


#### Q4. Extract the average arrival and departure delays for carrier == 'DL', grouped by the 'origin' and 'dest' variables



In [8]:
mydata[carrier == "DL", 
        lapply(.SD, mean, na.rm = TRUE),
        by = .(origin, dest),
        .SDcols = c("arr_delay", "dep_delay")]

#### Q5. Pull the first value of `air_time` by `origin`, then sum the returned values that are greater than 300

In [9]:
mydata[, .SD[1], .SDcols="air_time", by=origin][air_time > 300, sum(air_time)]
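To see what `.SD[1]` does in the expression above, here is a small sketch on made-up data. The `toy` table and its values are hypothetical; only the column names mirror the flights data.

```r
library(data.table)

# Hypothetical toy data with the same column names as the flights data
toy <- data.table(
  origin   = c("JFK", "JFK", "LGA", "LGA", "EWR"),
  air_time = c(350, 120, 310, 400, 90)
)

# .SD is the per-group subset of the data, restricted to .SDcols;
# .SD[1] keeps only the first row of each group
first_rows <- toy[, .SD[1], .SDcols = "air_time", by = origin]

# Keep the first values above 300 and add them up
total <- first_rows[air_time > 300, sum(air_time)]
print(first_rows)
print(total)
```

Splitting the chain into two steps like this can make it easier to inspect the intermediate table before aggregating.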

## Endnotes

The data.table package provides a one-stop solution for data wrangling in R. It offers two main benefits: less code and faster computation. However, it is not the first choice of some R programmers; some prefer the <b>dplyr</b> package for its simplicity. I would recommend learning both packages. Check out the <a href="http://www.listendata.com/2016/08/dplyr-tutorial.html" target="_blank"><b>dplyr tutorial</b></a>. If your data is smaller than 1 GB, you can use dplyr: it offers decent speed but is slower than data.table.</div>
</div>

### Gotchas with data.table()
data.tables (DTs) inherit from data.frame (DF)

* Not completely interchangeable with data frames
* Compatibility usually fixed by casting to a data frame
* Binary search relies on keys
* Proper use needs to be coded (e.g. with the `setkey()` function)
* Code breaks (1.8.x particularly notorious)
* New functionality detailed in NEWS file
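The points about data frame compatibility and keys can be sketched as follows. This is a minimal example on made-up data (`dt`, `id` and `value` are hypothetical names), intended only to show one place where the two classes differ and how a key is set.

```r
library(data.table)

# Hypothetical toy data
dt <- data.table(id = c("b", "a", "c"), value = c(2, 1, 3))

# A data.table is also a data.frame...
is_df <- inherits(dt, "data.frame")

# ...but single-argument indexing differs:
# dt[2] selects the second row, df[2] selects the second column
df   <- as.data.frame(dt)  # cast when code expects a plain data.frame
row2 <- dt[2]
col2 <- df[2]

# Setting a key sorts the table by reference and enables
# binary-search lookups on the key column
setkey(dt, id)
hit <- dt[.("a")]
print(key(dt))
print(hit)
```

Casting with `as.data.frame()` makes a copy, so it is usually reserved for the boundary where the data is handed to code that assumes data frame semantics.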



### How to Get Started

data.table page on R-forge (***datatable.r-forge.r-project.org***)

* Documentation biggest issue
* Read quick introduction vignette (bit lightweight)
* FAQ is main manual
* Good questions on StackOverflow (stackoverflow.com), tag `data.table`
* As of Feb 2015, proper vignettes being written (and look good)

