# Parallelism in R

R offers a number of alternative ways to exploit multiple cores and multiple machines.

In many ways, the support for parallel execution in R is richer and more flexible than in Python.

# R `parallel` package

For R users already using `apply`-type methods, the `parallel` package provides parallel counterparts that "just work".  These methods have all of the advantages (compact, terse syntax) and disadvantages (compact, terse syntax) of their standard library serial versions.
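As a minimal sketch of what such a parallel `apply`-type call can look like, the snippet below runs the same toy computation with `lapply` and with `mclapply` from `parallel`. The squaring function, the input `1:8`, and the choice of two cores are illustrative assumptions, not part of this notebook's data. Because `mclapply` relies on forking, the sketch falls back to a single core on Windows.

```r
library(parallel)

# Serial version: square each number with lapply
squares_serial <- lapply(1:8, function(x) x^2)

# mclapply forks worker processes, which Windows does not support,
# so use one core there (the choice of 2 cores elsewhere is arbitrary)
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Parallel version: same call shape, plus the mc.cores argument
squares_parallel <- mclapply(1:8, function(x) x^2, mc.cores = n_cores)

# Both calls return a list with one element per input
identical(squares_serial, squares_parallel)
```

The only change from the serial call is the function name and the `mc.cores` argument, which is what makes these methods easy to adopt.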

In [2]:
require(foreach)
require(parallel)
require(bit64)
require(data.table)

The following commands register a backend for `foreach`: either the simple multi-core (MC) backend or the more general parallel backend.

On Windows, use `doParallel`; on Linux or OS X, use `doMC`.

In [3]:
require(doMC)
registerDoMC()
#require(doParallel)
#registerDoParallel()

Loading required package: doMC
Loading required package: iterators
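To confirm which backend `foreach` will use, one option is to query it after registration. The sketch below assumes the `doParallel` package is installed and picks two workers arbitrarily. Note that calling a `registerDo*` function replaces any backend registered earlier in the session.

```r
library(foreach)
library(doParallel)

# Register the doParallel backend; cores = 2 is an arbitrary choice
registerDoParallel(cores = 2)

# Inspect what %dopar% will use
backend_registered <- getDoParRegistered()  # TRUE once a backend is set
backend_name <- getDoParName()              # name of the active backend
n_workers <- getDoParWorkers()              # number of workers available

print(backend_name)
print(n_workers)

# Release the implicit cluster that doParallel may have created
stopImplicitCluster()
```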


Read in a data table...

In [4]:
d = fread('dat/2005_2009_ver2_42065_synth_people.txt')

In [11]:
head(d)

p_id,hh_id,serialno,stcotrbg,age,sex,race,sporder,relate,school_id,workplace_id
416175660,261526469,2005000002176,420659503002,86,2,1,1,0,,
416175661,261526469,2005000002176,420659503002,92,1,1,2,1,,
416175676,261533970,2005000002176,420659508002,86,2,1,1,0,,
416175677,261533970,2005000002176,420659508002,92,1,1,2,1,,
416175678,261526897,2005000002176,420659503003,86,2,1,1,0,,
416175679,261526897,2005000002176,420659503003,92,1,1,2,1,,


In [6]:
sample(nrow(d),10)

# Serial example

A nice feature of `foreach` is that you can easily test in serial with `%do%` and then convert to parallel by substituting `%dopar%`.

In [21]:
foreach (i=seq(10)) %do% {
    mean(d[sample(nrow(d), 10)]$age)
}

# Parallel example

The default behavior of `foreach` is to return a `list`.  This can be changed by altering the `.combine` function (in this case, use the `c()` function to concatenate into a vector).

In [22]:
foreach (i=seq(10), .combine=c) %dopar% {
    mean(d[sample(nrow(d), 10)]$age)
}
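To make the effect of `.combine` concrete, here is a small sketch on toy inputs (the values `1:3` and the variable names are illustrative assumptions). It uses `%do%` so it runs without any registered backend; the same `.combine` arguments apply with `%dopar%`.

```r
library(foreach)

# Default: results come back as a list, one element per iteration
as_list <- foreach(i = 1:3) %do% i^2

# .combine = c concatenates the results into a vector
as_vector <- foreach(i = 1:3, .combine = c) %do% i^2

# .combine = rbind stacks each iteration's vector as a matrix row
as_matrix <- foreach(i = 1:3, .combine = rbind) %do% c(i, i^2)

# .combine = `+` reduces the results to a single sum
total <- foreach(i = 1:3, .combine = `+`) %do% i^2

str(as_list)
print(as_vector)
print(dim(as_matrix))
print(total)
```

Any function that takes two results and returns their combination can be passed as `.combine`, so the choice depends on the shape wanted downstream.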

Next: [Parallel interaction with system calls](Parallel interaction with system calls.ipynb)