# Accessing R from Google Cloud Datalab

[IPython/Jupyter](http://ipython.org/) versus [R](https://www.r-project.org/), that old chestnut. It's all about preference. I have to admit I prefer IPython, and I really like [Google Cloud Datalab](https://cloud.google.com/datalab/) because it's fully managed, so I spend my time writing code. However, R has some really useful libraries, especially for financial analysis, so I love the idea of accessing R on demand from IPython using [rpy2](http://rpy.sourceforge.net/). Here's my take on making that happen in Datalab.

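First, assume the VM running Datalab is Debian-based (the output below shows Debian jessie) and that the notebook can run `apt-get` as root. With that in place, bring the system packages up to date, then register the CRAN repository and its signing key in preparation for installing R and rpy2.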

In [1]:
!apt-get update
!apt-get upgrade -y

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  apt base-files dpkg dpkg-dev libapt-pkg4.12 libc-bin libdpkg-perl
  libfreetype6 libfreetype6-dev libgssapi-krb5-2 libicu52 libk5crypto3
  libkrb5-3 libkrb5support0 libldap-2.4-2 libpng12-0 libpng12-dev libsasl2-2
  libsasl2-modules-db libssl1.0.0 libsystemd0 libudev1 libxml2 libxml2-dev
  linux-libc-dev multiarch-support openssl systemd systemd-sysv tzdata udev
  unzip
32 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 26.5 MB of archives.
After this operation, 280 kB of additional disk space will be used.
Get:1 http://security.debian.org/ jessie/updates/main dpkg amd64 1.17.26 [2991 kB]
Get:2 http://httpredir.debian.org/debian/ jessie/main base-files amd64 8+deb8u2 [77.9 kB]
Get:3 http://httpredir.debian.org/debian/ jessie/main libc-bin amd64 2.19-18+deb8u1 [1284 kB]
Get:4 http://httpredir.debia

In [2]:
!if [ `grep 'cran' /etc/apt/sources.list | wc -l` = 0 ]; then echo "deb http://cran.rstudio.com/bin/linux/debian lenny-cran/" >> /etc/apt/sources.list; fi
!apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9

Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --homedir /tmp/tmp.coQPu1407i --no-auto-check-trustdb --trust-model always --keyring /etc/apt/trusted.gpg --primary-keyring /etc/apt/trusted.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-jessie-automatic.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-jessie-security-automatic.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-jessie-stable.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-squeeze-automatic.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-squeeze-stable.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-wheezy-automatic.gpg --keyring /etc/apt/trusted.gpg.d/debian-archive-wheezy-stable.gpg --keyserver keyserver.ubuntu.com --recv-keys E084DAB9
gpg: requesting key E084DAB9 from hkp server keyserver.ubuntu.com
gpg: key E084DAB9: "Michael Rutter <marutter@gmail.com>" not changed
gpg: Total number processed: 1
gpg:              unchanged: 1
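
The `grep`/`echo` guard in the cell above appends the CRAN line only when no matching entry exists, so re-running the notebook doesn't duplicate it. The same idempotent append can be sketched in Python. This is a minimal illustration: the helper name `ensure_line` and the temporary file standing in for `/etc/apt/sources.list` are assumptions made for the example, not part of Datalab or apt.

```python
import os
import tempfile


def ensure_line(path, line, marker):
    """Append `line` to `path` unless some existing line contains `marker`.

    Returns True if the line was appended, False if it was already there.
    """
    if os.path.exists(path):
        with open(path) as f:
            if any(marker in existing for existing in f):
                return False
    with open(path, "a") as f:
        f.write(line.rstrip("\n") + "\n")
    return True


# Demo against a throwaway file rather than the real /etc/apt/sources.list.
demo_path = os.path.join(tempfile.mkdtemp(), "sources.list")
cran_line = "deb http://cran.rstudio.com/bin/linux/debian lenny-cran/"
first = ensure_line(demo_path, cran_line, "cran")   # appended
second = ensure_line(demo_path, cran_line, "cran")  # skipped, already present
```

Pointing `path` at the real sources file would need root, just as the shell version does.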


Second, install R.

In [3]:
!apt-get install -y r-base r-base-dev

Reading package lists... Done
Building dependency tree       
Reading state information... Done
r-base is already the newest version.
r-base-dev is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.


Third, install rpy2.

In [4]:
!pip install rpy2

Cleaning up...
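
Before loading the cell magic it can help to confirm that both installs actually landed. Here's a minimal sketch using only the standard library, assuming a Python 3 kernel. The function name `check_r_setup` is made up for this example, and the check only tests that an `R` executable is on the `PATH` and that the `rpy2` package can be located, not that the two are compatible versions.

```python
import importlib.util
import shutil


def check_r_setup():
    """Report whether an R executable and the rpy2 package can be found."""
    return {
        "R": shutil.which("R") is not None,
        "rpy2": importlib.util.find_spec("rpy2") is not None,
    }


status = check_r_setup()
for name, found in status.items():
    print("{}: {}".format(name, "found" if found else "MISSING"))
```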


Fourth, and last, activate the R cell magic.

In [5]:
%load_ext rpy2.ipython

Now we're ready to start accessing R from Datalab.

We'll do something simple just to show the plumbing is in place...

In [15]:
import rpy2.interactive as r
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector

Install ggplot2 if required.

In [19]:
if not rpackages.isinstalled('ggplot2'):
  utils = rpackages.importr('utils')
  utils.chooseCRANmirror(ind=99)
  # StrVector expects a sequence of package names, so wrap the name in a list.
  utils.install_packages(StrVector(['ggplot2']))
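
The install-if-missing check generalises to several packages. The sketch below separates the pure-Python filtering from the rpy2 calls so the logic can be exercised without R. `missing_packages` and `install_missing` are hypothetical helper names, and the default mirror index passed to `chooseCRANmirror` is an arbitrary choice.

```python
def missing_packages(names, is_installed):
    """Return the names for which `is_installed(name)` is false, in order."""
    return [name for name in names if not is_installed(name)]


def install_missing(names, mirror_index=1):
    """Install any of `names` that R does not already have (requires rpy2)."""
    # Imported lazily so this code still loads where rpy2 is absent.
    import rpy2.robjects.packages as rpackages
    from rpy2.robjects.vectors import StrVector

    todo = missing_packages(names, rpackages.isinstalled)
    if todo:
        utils = rpackages.importr("utils")
        utils.chooseCRANmirror(ind=mirror_index)
        # StrVector takes a sequence of package names.
        utils.install_packages(StrVector(todo))
    return todo


# Exercise the filtering step with a stand-in for rpackages.isinstalled.
already_there = {"stats", "utils"}
todo = missing_packages(["ggplot2", "stats", "Hmisc"], already_there.__contains__)
```

In the notebook, a call such as `install_missing(['ggplot2'])` would do the same job as the cell above.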

In [20]:
%%R
library(ggplot2)
head(mtcars)


Error in library(ggplot2) : there is no package called 'ggplot2'

  library '/usr/lib/R/site-library' contains no packages

In [13]:
_ = %R --width 768 --height 768 plot(qplot(hp, mpg, color=factor(gear), data=mtcars, geom=c("point", "smooth")))


Error in plot(qplot(hp, mpg, color = factor(gear), data = mtcars, geom = c("point",  : 
  could not find function "qplot"

  installation of package 'ggplot2' had non-zero exit status
  installation of package 'Hmisc' had non-zero exit status


In [None]:
_ = %R --width 768 --height 768 plot(qplot(wt, mpg, color=factor(gear), data=mtcars, geom=c("point", "smooth")))

In [None]:
%%R
mtcars$am     <- as.factor(mtcars$am)
mtcars$cyl    <- as.factor(mtcars$cyl)
mtcars$vs     <- as.factor(mtcars$vs)
mtcars$gear   <- as.factor(mtcars$gear)

In [None]:
%%R
fit <- lm(mpg ~ ., data=mtcars)
summary(fit)

In [None]:
%%R
fit2 <- lm(mpg ~ hp + wt + gear, data=mtcars)
summary(fit2)

In [None]:
%%R
fit3 <- lm(mpg ~ hp + wt, data=mtcars)
summary(fit3)

In [None]:
%%R --width 768 --height 768
plot(fitted(fit3), resid(fit3))
abline(h = 0)
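
The three fits can also be compared from the Python side rather than by reading each `summary()` in turn. This is a rough sketch, assuming rpy2 is importable and that `robjects.r` evaluates in the same embedded R session as the `%%R` cells. `compare_models` and `adj_r_squared_via_r` are names invented for this example, and the scorer is injectable so the ranking logic can be tried without R.

```python
def adj_r_squared_via_r(formula):
    """Fit lm(formula, data=mtcars) in R and return its adjusted R-squared."""
    # Imported lazily: this function needs a working rpy2 + R install.
    import rpy2.robjects as robjects

    code = "summary(lm({}, data=mtcars))$adj.r.squared".format(formula)
    return float(robjects.r(code)[0])


def compare_models(formulas, score=adj_r_squared_via_r):
    """Return (formula, score) pairs sorted from highest to lowest score."""
    scored = [(formula, score(formula)) for formula in formulas]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Try the ranking with made-up placeholder scores (NOT real regression results).
fake_scores = {"mpg ~ .": 0.1, "mpg ~ hp + wt + gear": 0.3, "mpg ~ hp + wt": 0.2}
ranking = compare_models(list(fake_scores), score=fake_scores.get)
```

With R available, `compare_models(['mpg ~ .', 'mpg ~ hp + wt + gear', 'mpg ~ hp + wt'])` would rank the three formulas used above by adjusted R-squared.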