# Interleaving Python and R

Python is an excellent language for building complex models and munging data. However, its libraries for standard statistical analyses are somewhat lacking compared to R's. I often prefer to run such analyses in R instead, and the "R magic" extension to IPython makes it incredibly easy to switch back and forth between Python and R.

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np

---

## Loading the R magic

To use the R magic, we need to have the `rpy2` library installed. In the notebook where we want to use the R magic, we also need to import a few things. This is the most annoying part of using the R magic: there are several lines of code to remember, and the order in which you run these imports matters! Having figured out the right incantation once, I now just copy and paste the following code whenever I want to use the R magic:

In [None]:
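import rpy2

# the following lines will allow us to convert between Pandas DataFrames and R DataFrames
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
pandas2ri.activate()
try:
    # rpy2 2.x name for the R-to-Python conversion function
    from rpy2.robjects.conversion import ri2py
except ImportError:
    # rpy2 >= 3.0 renamed it to rpy2py; alias it so the rest of the notebook works
    from rpy2.robjects.conversion import rpy2py as ri2py

# this loads the R magic extension
%load_ext rpy2.ipython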

Now, to actually use the R magic extension, we create a cell with `%%R` on its first line, followed by R code:

In [None]:
%%R

x <- c(1, 2)
y <- c(3, 4)
x + y

We can also use the "line magic" version, `%R`, which lets us use R inline with Python code. This is particularly useful when we want to call an R function and get its return value back in Python. The following code creates a vector in R and then saves it to `arr`, which is a Python variable!

In [None]:
arr = %R c(1, 2, 3)
arr

---

## Passing values to R

We can pass Python variables to R, and R variables back to Python, within reason. In particular, we can almost always translate between Pandas DataFrames and R DataFrames. This makes it super easy to do data munging in Python, pass the DataFrame to R for analysis, and then get the results back in Python.

Let's take another look at our bouncing ball dataset.

In [None]:
data = pd.read_csv("data/ball.csv")

# filter out extreme response times
lo, hi = np.percentile(data["rt"], [0.5, 99.5])
data = data.query("rt > {} and rt < {}".format(lo, hi))

data.head()
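Note that `np.percentile` takes percentages on a 0 to 100 scale, not fractions, so `[0.5, 99.5]` drops roughly the most extreme 1% of trials, about 0.5% from each tail. Here is a quick check of that arithmetic on synthetic, lognormally distributed response times; the data are made up for illustration and are not the real `ball.csv`:

```python
import numpy as np

# Hypothetical stand-in for the "rt" column: 10,000 lognormal response times.
rng = np.random.default_rng(0)
rt = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)

# Same trimming rule as above: percentiles are on a 0-100 scale.
lo, hi = np.percentile(rt, [0.5, 99.5])
kept = rt[(rt > lo) & (rt < hi)]

frac_kept = kept.size / rt.size
print(f"kept {kept.size} of {rt.size} trials ({frac_kept:.1%})")
```

Passing `[0.005, 0.995]` by mistake would instead trim only about 0.01% of the data, which is an easy bug to miss.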

Previously, we used statsmodels to do some basic analysis on this dataset. Now, let's do the same analysis in R instead! To pass in the DataFrame, we use the `-i` flag (for "***i***nput") when we invoke the R magic:

In [None]:
%%R -i data

model <- lm(log(rt) ~ hole_class * hole_width, data=data)
summary(model)
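In R's formula language, `hole_class * hole_width` is shorthand for `hole_class + hole_width + hole_class:hole_width`, that is, both main effects plus their interaction. To make that concrete, here is a minimal sketch of the same kind of least-squares fit written directly in NumPy. It assumes both predictors are numeric and uses made-up data and coefficients; if `hole_class` is a factor in the real dataset, R would expand it into dummy columns rather than a single column.

```python
import numpy as np

# Hypothetical numeric predictors standing in for hole_class and hole_width.
rng = np.random.default_rng(1)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)

# Made-up "true" coefficients for: intercept, a, b, a:b
true_beta = np.array([0.5, 0.2, -0.3, 0.1])

# Design matrix for y ~ a * b, i.e. y ~ 1 + a + b + a:b
X = np.column_stack([np.ones(n), a, b, a * b])
log_rt = X @ true_beta + rng.normal(scale=0.05, size=n)

# Ordinary least squares, the same estimator lm() fits
beta_hat, *_ = np.linalg.lstsq(X, log_rt, rcond=None)
print(dict(zip(["(Intercept)", "a", "b", "a:b"], beta_hat.round(3))))
```

The interaction is nothing more than an extra column holding the elementwise product of the two predictors.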

After R code has run, its results persist in the R session (just as variables persist in the Python kernel). So we can reuse the same model to run an ANOVA, for example:

In [None]:
%%R

anova(model)
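Calling `anova()` on a single `lm` fit reports sequential (Type I) sums of squares: each term's sum of squares is the drop in residual sum of squares when that term is added after the terms listed before it in the formula. The sketch below reproduces that definition in NumPy on synthetic data with two hypothetical numeric predictors, so each term has one degree of freedom. It illustrates the bookkeeping only and is not how R implements `anova()`.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Hypothetical data: two numeric predictors plus their interaction.
rng = np.random.default_rng(2)
n = 200
a = rng.normal(size=n)
b = rng.normal(size=n)
y = 0.5 + 0.2 * a - 0.3 * b + 0.1 * a * b + rng.normal(scale=0.5, size=n)

# Columns in formula order: intercept, a, b, a:b
cols = [np.ones(n), a, b, a * b]
names = ["a", "b", "a:b"]

# Residual SS after each successive column is added to the model
rss_steps = [rss(np.column_stack(cols[:k]), y) for k in range(1, len(cols) + 1)]

# Sequential SS: reduction in residual SS attributable to each new term
seq_ss = {name: rss_steps[i] - rss_steps[i + 1] for i, name in enumerate(names)}
residual_ss = rss_steps[-1]

# F statistic per term (1 numerator df each, since every term is one column here)
df_resid = n - len(cols)
f_stats = {name: ss / (residual_ss / df_resid) for name, ss in seq_ss.items()}
print(seq_ss, residual_ss, f_stats)
```

Because the terms enter one at a time, reordering them in the formula can change their sums of squares whenever the predictors are correlated.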

---

## Getting values out of R

Let's say we want to now get the ANOVA results back into Python. Just as we used the `-i` flag before, we can also use the `-o` flag (for "***o***utput"):

In [None]:
%%R -o result

result <- anova(model)

When we inspect the `result` variable, we see that it is an R object:

In [None]:
result

That's not very useful. But don't despair: this is where the `ri2py` function that we imported earlier comes in. It converts the R data frame back into a Pandas DataFrame:

In [None]:
ri2py(result)

---

## Plotting with R

I'm not going to go into detail on creating plots in R, except to demonstrate that you can also create plots inline in the notebook with R, just as you can with Python! For example, to create a Q-Q plot like we did previously (but now in R):

In [None]:
%%R

qqnorm(log(data$rt))
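For reference, `qqnorm` plots the sorted sample against standard normal quantiles evaluated at the plotting positions from R's `ppoints()`, which are documented as `(i - a) / (n + 1 - 2a)` with `a = 3/8` when `n <= 10` and `a = 1/2` otherwise. Here is a sketch of the same coordinates in Python, using the standard library's `statistics.NormalDist` for the normal quantile function; the function name `qq_normal_coords` is hypothetical and the data are synthetic stand-ins for `log(data$rt)`.

```python
from statistics import NormalDist

import numpy as np

def qq_normal_coords(sample):
    """Return (theoretical, observed) coordinates for a normal Q-Q plot.

    Uses the plotting positions documented for R's ppoints():
    p_i = (i - a) / (n + 1 - 2a), with a = 3/8 for n <= 10 and 1/2 otherwise.
    """
    y = np.sort(np.asarray(sample, dtype=float))
    n = y.size
    a = 3 / 8 if n <= 10 else 1 / 2
    i = np.arange(1, n + 1)
    p = (i - a) / (n + 1 - 2 * a)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = np.array([std_normal.inv_cdf(pi) for pi in p])
    return theoretical, y

# Hypothetical log response times, just to exercise the function.
rng = np.random.default_rng(3)
log_rt = rng.normal(loc=-0.5, scale=0.3, size=1000)
theoretical, observed = qq_normal_coords(log_rt)

# For roughly normal data the points fall near a straight line.
r = np.corrcoef(theoretical, observed)[0, 1]
print(f"Q-Q correlation: {r:.4f}")
```

With matplotlib loaded, `plt.scatter(theoretical, observed)` (after `import matplotlib.pyplot as plt`) would draw the plot itself; points close to a straight line suggest the log response times are approximately normal.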