# R from Python with rpy2

In [1]:
import warnings
from rpy2.rinterface import RRuntimeWarning
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=RRuntimeWarning)

In [2]:
import numpy as np
import pandas as pd

from sklearn import datasets

The [rpy2](https://rpy2.bitbucket.io/) package exposes three interfaces:

- a low level interface
- a high level interface and
- a Jupyter (magic) interface.

Since many people run Python in a Jupyter Notebook, it makes sense to focus on the last option. Notes on the other two interfaces are included below.

In [17]:
# For (automatic) translation of Pandas objects to R.
#
from rpy2.robjects import pandas2ri
pandas2ri.activate()

First let's construct some data that will be used for illustrations.

In [4]:
age = np.random.uniform(0, 18, size = 50)
height = 22 + 8.6 * age + np.random.normal(size = 50, scale = 10)

growing = pd.DataFrame({'age': age, 
                        'height': height})

growing.head()

|   | age | height |
|---|-----|--------|
| 0 | 7.297125 | 84.366891 |
| 1 | 1.575068 | 42.962815 |
| 2 | 11.267092 | 110.430804 |
| 3 | 6.770082 | 76.587833 |
| 4 | 6.046341 | 84.075431 |


## Jupyter Magic

There are *magic* commands which make it easier to mix R and Python in a Jupyter Notebook.

In [5]:
# Enable R magic.
#
%load_ext rpy2.ipython
#
# %R  - line magic (return value as Python object)
# %%R - cell magic (no return value but can pass data in and out)
#
# Arguments:
#
# -i - input variable(s)
# -o - output variable(s)
# -h - height of plot
# -w - width of plot

We'll be looking at the following applications:

- simple regression
- plotting and
- generating synthetic data.

Let's start by accessing some builtin variables in R.

In [6]:
%R R.version$version.string
#
# Don't be confused by the ".": this is not the Python dot operator! It's just part of the variable name in R.

array(['R version 3.4.4 (2018-03-15)'], dtype='<U28')

Now create a (vector) variable in R and return it to Python (as an array).

In [7]:
%R heights = rnorm(20, 165, 10)

array([154.39743259, 156.54518395, 155.92501835, 168.92197258,
       167.20784727, 161.97035659, 180.34567222, 155.95823695,
       146.15362255, 171.51174064, 137.02324682, 165.58089877,
       153.31297452, 163.04505873, 172.10218105, 165.06395177,
       161.97982406, 168.51312253, 158.03684096, 183.64510819])

Vectors in R are translated into arrays in Python.
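
For readers less familiar with R: `rnorm(20, 165, 10)` draws 20 values from a normal distribution with mean 165 and standard deviation 10. A rough standard-library equivalent might look like the sketch below. The seed and the name `heights_py` are chosen here for illustration and are not part of the notebook.

```python
import random

random.seed(42)  # arbitrary seed, only so the sketch is repeatable

# rnorm(n, mean, sd) -> n independent draws from Normal(mean, sd)
heights_py = [random.gauss(165, 10) for _ in range(20)]

print(len(heights_py))
print(min(heights_py), max(heights_py))
```

The values will differ from the R output above because the two languages use separate random number generators.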

What about a `data.frame` in R? Does that translate into a Pandas `DataFrame`? Indeed it does!

In [8]:
%R head(mtcars)

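|   | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|-----|-----|------|----|------|----|------|----|----|------|------|
| 0 | 21.0 | 6.0 | 160.0 | 110.0 | 3.9 | 2.62 | 16.46 | 0.0 | 1.0 | 4.0 | 4.0 |
| 1 | 21.0 | 6.0 | 160.0 | 110.0 | 3.9 | 2.875 | 17.02 | 0.0 | 1.0 | 4.0 | 4.0 |
| 2 | 22.8 | 4.0 | 108.0 | 93.0 | 3.85 | 2.32 | 18.61 | 1.0 | 1.0 | 4.0 | 1.0 |
| 3 | 21.4 | 6.0 | 258.0 | 110.0 | 3.08 | 3.215 | 19.44 | 1.0 | 0.0 | 3.0 | 1.0 |
| 4 | 18.7 | 8.0 | 360.0 | 175.0 | 3.15 | 3.44 | 17.02 | 0.0 | 0.0 | 3.0 | 2.0 |
| 5 | 18.1 | 6.0 | 225.0 | 105.0 | 2.76 | 3.46 | 20.22 | 1.0 | 0.0 | 3.0 | 1.0 |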


In [9]:
# We can also store the result from %R in a Python variable.
#
mtcars = %R mtcars
#
# [!] Test access to this variable from Python.

In [10]:
%%R

R.version
#
# The cell magic (%%R) is probably most useful for code that has side effects.

               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          4.4                         
year           2018                        
month          03                          
day            15                          
svn rev        74408                       
language       R                           
version.string R version 3.4.4 (2018-03-15)
nickname       Someone to Lean On          


### Application: Simple Regression

Create a simple linear regression model using the `age` and `height` variables defined in Python.

In [21]:
%%R -i age,height -o coefficients

fit <- lm(height ~ age)
coefficients <- coef(fit)

Take a look at the coefficients.

In [22]:
print(coefficients)

(Intercept)         age 
  24.458804    8.508659 
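
As a sanity check on what `lm(height ~ age)` estimates, the intercept and slope of a simple linear regression have a closed form that can be computed in plain Python. The sketch below is not how R fits the model internally. The helper `ols_fit` and the simulated `sim_age` / `sim_height` data are assumptions made for illustration; the data follow the same generating model as the notebook (`height = 22 + 8.6 * age + noise`).

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the ratio of the x-y co-variation to the variation in x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


random.seed(1)  # arbitrary seed for repeatability
sim_age = [random.uniform(0, 18) for _ in range(50)]
sim_height = [22 + 8.6 * a + random.gauss(0, 10) for a in sim_age]

intercept, slope = ols_fit(sim_age, sim_height)
print(intercept, slope)
```

With 50 noisy points the estimates should land near, but not exactly on, the true values of 22 and 8.6, much like the coefficients reported by R above.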



In [23]:
type(coefficients)

rpy2.robjects.vectors.FloatVector

In [24]:
# Where is this object stored?
#
coefficients.__repr__
#
# Hypothesis: This is mapped directly to the underlying R object.

<bound method Vector.__repr__ of R object with classes: ('numeric',) mapped to:
<FloatVector - Python:0x7ffa0ec9f888 / R:0x53a7388>
[24.458804, 8.508659]>

That's really still an R object. We can convert it explicitly to Python.

In [25]:
# ri2py() is the rpy2 2.x name; rpy2 3.x renamed it to rpy2py().
coefficients = pandas2ri.ri2py(coefficients)
coefficients

array([24.45880438,  8.5086588 ])

In [26]:
# Where is this object stored?
#
coefficients.__repr__
#
# Hypothesis: This is a copy of the data.

<method-wrapper '__repr__' of numpy.ndarray object at 0x7ffa0ecaa4e0>

What about accessing the detailed model summary?

In [None]:
%%R

summary(fit)

### Application: Plotting

R has a range of builtin plotting capabilities. There are also a number of libraries which provide alternative plotting interfaces.

In [None]:
%%R

options(bitmapType="cairo") # Choose a specific graphics device (https://www.cairographics.org/).

A simple plot using base graphics in R.

In [None]:
%%R

par(cex = 1.25)
plot(1:10, col = 'red', pch = 19)

Generate some diagnostic plots for the linear regression model.

In [None]:
%%R -w 900 -h 450

par(mfrow = c(1, 2), cex = 1.25)
#
# Object oriented behaviour in R: plot() is a generic function.
#
plot(fit, which = 1:2)
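
For a fitted `lm` object, `which = 1:2` selects the "Residuals vs Fitted" and "Normal Q-Q" panels. To make the first panel concrete, here is a minimal pure-Python sketch of the two quantities it plots. The helper `fitted_and_residuals` and the toy data are made up for illustration and are not taken from the notebook.

```python
def fitted_and_residuals(x, y):
    """Fit y = a + b * x by least squares; return (fitted, residuals)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


# Made-up toy data, purely for illustration.
toy_age = [1, 4, 7, 10, 13, 16]
toy_height = [35, 52, 85, 104, 138, 157]

fitted, residuals = fitted_and_residuals(toy_age, toy_height)
for f, r in zip(fitted, residuals):
    print(f"fitted = {f:7.2f}   residual = {r:6.2f}")
```

When the model includes an intercept, the residuals sum to zero (up to floating-point error), which is why a well-behaved "Residuals vs Fitted" panel scatters around the horizontal zero line.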

Generate a scatter plot with a superimposed linear regression fit (and its confidence band).

In [None]:
%%R -w 900 -h 400 -i growing

library(ggplot2)

ggplot(growing, aes(x = age, y = height)) +
    geom_point(size = 3, color = "blue") +
    geom_smooth(method = "lm", color = "red", lty = "dashed") +
    theme_classic(base_size = 16)

### Application: Synthetic Data

In [None]:
%%R -o synthetic_people

library(charlatan)

# Generate synthetic data for some French people.
#
(synthetic_people <- ch_generate("name", "phone_number", "job", locale = "fr_FR"))
#
# [!] Use ch_company() to add an 'employer' column. Don't forget to use appropriate locale!

Let's take a look at the returned object.

In [None]:
synthetic_people

That looks *almost* like a Pandas `DataFrame`.

In [None]:
type(synthetic_people)

Almost but not quite. We can explicitly convert it though!

In [None]:
pandas2ri.ri2py_dataframe(synthetic_people)

## rpy2: Low Level Interface

It's possible to interact directly with the embedded R interpreter.

In [None]:
from rpy2 import rinterface

In [None]:
rinterface.R_VERSION_BUILD

But why make our lives difficult? There's a high level interface which makes the interactions between Python and R much simpler.

**Caveat:** the low level interface comes into its own when moving large objects back and forth between R and Python.

## rpy2: High Level Interface

In [None]:
# This will also kick off an embedded R session.
#
from rpy2 import robjects
from rpy2.robjects import r as R
from rpy2.robjects import packages as r_packages
#
# R() - run (string) in R
# R[] - retrieve from R environment

### Packages

One of the best aspects of R is its extensive ecosystem of packages.

In [None]:
# You can import any R package.
#
base     = r_packages.importr('base')
utils    = r_packages.importr('utils')
#
# What's happening behind the scenes:
#
# - importing package into embedded R session and
# - exposing R objects as Python objects.

In [None]:
base.sqrt

In [None]:
# Object names with a "." in R will be translated into a "_" in Python.
#
# For example, install.packages() becomes install_packages().
#
utils.install_packages
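
The renaming rule described in the comment above can be mimicked with a one-line string replacement. This is only an illustration of the rule as stated here. The helper `r_name_to_python` is hypothetical, and rpy2's own translation logic may handle additional cases.

```python
def r_name_to_python(r_name):
    """Illustrative only: mimic the '.' -> '_' renaming described above."""
    return r_name.replace(".", "_")


for r_name in ["install.packages", "read.csv", "sqrt"]:
    print(f"{r_name:>18} -> {r_name_to_python(r_name)}")
```

Names without a dot, such as `sqrt`, pass through unchanged.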

### Acquire Data

Brace yourself for something *new* and *exciting*: the iris data!

Why?

1. Because everybody knows iris (this is *not* about the data!).
2. Because we are just going to be using it to illustrate data transfer.

In [None]:
# Load the iris data using the datasets module in sklearn.
#
iris = datasets.load_iris()
#
iris = pd.DataFrame(iris.data, columns=iris.feature_names)

In [None]:
iris.head()

### Data: Python to R

Explicitly convert the Pandas `DataFrame` to an R `data.frame`.

In [None]:
r_iris = pandas2ri.py2ri(iris)

In [None]:
type(r_iris)

In [None]:
print(r_iris)

### Code: R in Python

In [None]:
R('3 + 3')

In [None]:
R('x <- 3')

### Data: R to Python

R has no scalar type: even a single number is a vector of length one.

In [None]:
R['x']

In [None]:
R['pi']

In [None]:
R['pi'][0]

What about accessing an R `data.frame`?

In [None]:
mtcars = R['mtcars']

The result is a Pandas `DataFrame`.

In [None]:
mtcars.head()

### Creating Vectors

Some functions for creating native R objects directly from Python.

In [None]:
names = robjects.StrVector(['Alice', 'Bob'])
#
# What is the underlying representation in R?
#
names.r_repr()

In [None]:
ages = robjects.IntVector([23, 24])
#
# Consecutive integers are shown using R's range shorthand (23:24).
#
ages.r_repr()

### Creating Functions in R

R functions can be accessed either as attributes of `R` or with dictionary-style lookup. Let's start with an existing function.

In [None]:
R.sum

In [None]:
r_sum = R['sum']

Call it with some (Python) data. This effectively looks and behaves like a normal Python function.

In [None]:
r_sum(ages)

Or we can take the more direct route and just use the attribute on `R`.

In [None]:
R.sum(ages)

Create a new function.

In [None]:
R('square <- function(x) x**2')

In [None]:
R.square(4)

### Models

One of the major motivations for integrating with R is gaining access to its modelling capabilities.

In [None]:
model = R.lm('height ~ age', data = growing)
print(R.summary(model).rx2('coefficients'))

The `rx()` and `rx2()` delegators correspond to R's `[` and `[[` operators respectively: `rx()` returns a sub-list, while `rx2()` extracts a single element.
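
The difference between `[` and `[[` can be hard to picture coming from Python. The sketch below is an analogy using a plain `dict`, not rpy2 code. The names `single_bracket` and `double_bracket` are hypothetical, and the dictionary contents are simplified placeholders based on the coefficient output earlier in the notebook.

```python
# A plain dict standing in for an R named list (not an rpy2 object).
model_summary = {
    "call": "lm(height ~ age)",
    "coefficients": [24.458804, 8.508659],
}


def single_bracket(named_list, name):
    """Analogue of R's `[`: keep the container, return a one-item sub-list."""
    return {name: named_list[name]}


def double_bracket(named_list, name):
    """Analogue of R's `[[`: drop the container, return the element itself."""
    return named_list[name]


print(single_bracket(model_summary, "coefficients"))
print(double_bracket(model_summary, "coefficients"))
```

The first call still returns a named container, while the second returns the bare element. That is why the cell above uses `rx2('coefficients')` to get at the coefficient table itself.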