# Rpy2 Tutorial

This tutorial shows how to use rpy2 functionality in python scripts. The tutorial focuses on the high-level interface of rpy2.

## Introduction

rpy2 is a communication layer between python and R. When importing rpy2, an embedded R instance is started.

In [1]:
import rpy2.robjects as robjects
from rpy2.robjects import r as R

Objects in R can be accessed by name using python's subscript syntax (the `__getitem__` method):

In [2]:
pi = R['pi']
print pi

[1] 3.141593



R functions can be called like python functions:

In [3]:
session_info = R.sessionInfo()
print session_info

R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.8 (Santiago)

locale:
[1] C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods  
[8] base     



R functions can accept arguments:

In [4]:
print R.sqrt(pi)

[1] 1.772454



Some function names in R contain '.', which is not accepted python syntax. To work around this, create a python variable holding the reference to the function:

In [5]:
is_numeric = R['is.numeric']
print is_numeric(pi)

[1] TRUE



## R objects versus python objects

Return values from rpy2 are python objects that represent R data structures held in R. As R typically returns vectors, the same will be true for the R-like python objects. This might lead to unexpected behaviour:

In [6]:
pi = R['pi']
print pi
print pi + 2

[1] 3.141593

[1] 3.141593 2.000000



The reason becomes clear when inspecting the type of the python object pi:

In [7]:
print type(pi)

<class 'rpy2.robjects.vectors.FloatVector'>


*pi* is an R-like object of type FloatVector. The `+ 2` was interpreted as a concatenation operation. To increment pi by 2, extract the element first:

In [8]:
print pi[0] + 2

5.14159265359
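
The concatenation follows python's own sequence convention, where `+` joins two sequences instead of adding numbers. A minimal sketch of the same pitfall, using a plain python list as a stand-in for the FloatVector so that it runs without rpy2 (the variable names are illustrative only; unlike the FloatVector above, a list requires the right-hand side to be a list as well):

```python
import math

# Plain list standing in for the one-element FloatVector returned by R['pi'].
pi_vector = [math.pi]

# '+' on sequences concatenates, which mirrors the surprising result above.
concatenated = pi_vector + [2]

# Extracting the element first yields an ordinary float that supports arithmetic.
incremented = pi_vector[0] + 2

print(len(concatenated))
print(incremented)
```

The first value printed is the length of the concatenated sequence (two elements), the second is the incremented number.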


Similarly, the return value of the sessionInfo() call is a nested list structure:

In [9]:
print type(session_info)

<class 'rpy2.robjects.vectors.ListVector'>


The python proxy objects representing R objects provide an interface that is very similar to that of the corresponding python types. Thus, printing produces familiar output, and the return values can be used in many python expressions such as sorted():

In [10]:
print sorted(R.c(4,3,2))

[2, 3, 4]


## Converting R objects into python objects

Conversion of R objects into python objects can be either manual or automatic.

### Manual conversion

By default, R return values are not converted into native python types. Instead, proxy objects are returned that reference data held within the R interpreter. Generally, the interface of the R-like objects is very similar to that of the python equivalents. In the rare cases where this fails, explicit conversions are generally straightforward:

In [11]:
c_list = R.c(2, 3, 4)
print "c_list=", type(c_list)
py_list = list(c_list)
print "py_list=", type(py_list)
print "py_list[0]=", type(py_list[0])

c_list= <class 'rpy2.robjects.vectors.IntVector'>
py_list= <type 'list'>
py_list[0]= <type 'int'>


For complex data types such as matrices and dataframes, rpy2 provides utility functions. For example, the interface of rpy2's R-like objects supports numpy directly:

In [12]:
import numpy
numpy_array = numpy.array(R.c(2,3,4))
print type(numpy_array)
print numpy_array

<type 'numpy.ndarray'>
[2 3 4]


Note that converting to an array will duplicate the memory used, whereas numpy.asarray() creates a view.
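
The difference between copying and viewing can be checked with numpy.shares_memory(). The sketch below uses an ordinary numpy array as a stand-in for the R vector, so it illustrates numpy's behaviour only and makes no claim about how rpy2 exposes its memory:

```python
import numpy

# Stand-in for data owned by R; here simply another numpy array.
source = numpy.array([2, 3, 4])

copied = numpy.array(source)    # numpy.array() copies by default
viewed = numpy.asarray(source)  # numpy.asarray() avoids the copy when it can

print(numpy.shares_memory(source, copied))
print(numpy.shares_memory(source, viewed))

# A write through the view is visible in the source; the copy is unaffected.
viewed[0] = 10
print(source[0])
print(copied[0])
```

When working with large R vectors, a view saves memory, but any write through it also changes the underlying data.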

In [13]:
numpy_matrix = numpy.matrix(R.matrix(R.c(1,2,3,4), nrow=2))
print type(numpy_matrix)
print numpy_matrix

<class 'numpy.matrixlib.defmatrix.matrix'>
[[1 3]
 [2 4]]


To convert rpy2's data frames into python dataframes, use pandas2ri:

In [14]:
from rpy2.robjects import pandas2ri
R.data('iris')
df_iris = pandas2ri.ri2py(R['iris'])
print type(df_iris)
print df_iris.head()

<class 'pandas.core.frame.DataFrame'>
   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
1           5.1          3.5           1.4          0.2  setosa
2           4.9          3.0           1.4          0.2  setosa
3           4.7          3.2           1.3          0.2  setosa
4           4.6          3.1           1.5          0.2  setosa
5           5.0          3.6           1.4          0.2  setosa


### Automatic conversion

It is possible to configure rpy2 to perform automatic conversion of return values. This involves setting custom mappings between R objects and python types. See the chapter on [Mapping rpy2 objects to arbitrary python objects](http://rpy2.readthedocs.io/en/version_2.8.x/robjects_convert.html#mapping-rpy2-objects-to-arbitrary-python-objects).
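
The general idea of such a mapping can be sketched as a registry that looks up a converter by the type of the incoming object. The sketch below is not rpy2's conversion API: FakeRVector, FakeRList, register() and convert() are hypothetical names invented for illustration, and the linked chapter remains the reference for the real mechanism.

```python
class FakeRVector(object):
    """Stand-in for an R vector proxy (hypothetical, not an rpy2 class)."""
    def __init__(self, values):
        self.values = list(values)


class FakeRList(object):
    """Stand-in for a named R list proxy (hypothetical, not an rpy2 class)."""
    def __init__(self, items):
        self.items = dict(items)


# Registry mapping proxy types to converter functions.
CONVERTERS = {}


def register(proxy_type):
    """Decorator that registers a converter for one proxy type."""
    def decorator(func):
        CONVERTERS[proxy_type] = func
        return func
    return decorator


def convert(obj):
    """Apply the registered converter, or return the object unchanged."""
    converter = CONVERTERS.get(type(obj))
    return converter(obj) if converter else obj


@register(FakeRVector)
def vector_to_list(obj):
    return list(obj.values)


@register(FakeRList)
def rlist_to_dict(obj):
    # Convert nested values recursively.
    return {key: convert(value) for key, value in obj.items.items()}


nested = FakeRList({'a': FakeRVector([1, 2]), 'b': 3})
result = convert(nested)
print(result == {'a': [1, 2], 'b': 3})
```

Objects without a registered converter pass through untouched, which keeps the automatic behaviour predictable.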

In my opinion, the default manual conversion is to be preferred. It follows the python principle that explicit is better than implicit. It will also save on unnecessary conversions being performed. As a downside, developers need to be aware of rpy2 as a communication layer.

## Converting python objects into R objects

The conversion of python objects into R-like objects works similarly to the other direction. In most cases, the shared API of python and R-like data types makes the conversion implicit.

In [15]:
print R.matrix([1,2,3,4], nrow=2)

     [,1] [,2]
[1,] 1    3   
[2,] 2    4   



In [16]:
import pandas
df = pandas.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C':[7,8,9]},
                      index=["one", "two", "three"])
print type(df)
pandas2ri.activate()
r_df = pandas2ri.py2ri(df)
pandas2ri.deactivate()
print type(r_df)
print r_df

<class 'pandas.core.frame.DataFrame'>
<class 'rpy2.robjects.vectors.DataFrame'>
      A B C
one   1 4 7
two   2 5 8
three 3 6 9



Note that pandas2ri needs to be activated for this to work. Presumably this is because the dataframe is converted as a collection of Series.

# Python and R Namespaces



In [17]:
py_var = R.c(1, 2, 3)
print py_var

[1] 1 2 3



The py_var variable exists only in python. It is not bound to a name in the R namespace, and thus the following check returns False:

In [18]:
'py_var' in robjects.globalenv

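False

Assigning the object to a name with R's assign() function makes it visible in the R global environment: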

In [19]:
R.assign('r_var', py_var)

<IntVector - Python:0x2b60d8017998 / R:0x351aaf8>
[       1,        2,        3]

In [20]:
print 'r_var' in robjects.globalenv
print R('''r_var''')

True
[1] 1 2 3

In this session, a change made through py_var is also visible in r_var:



In [21]:
py_var[1] = 5
print R('''r_var''')

[1] 1 5 3



An alternative way to transfer variables is to assign them to an R environment. Here, we assign to the global R environment. Note that this assignment creates a copy: the later change to py_var shows up in r_var but not in r_var2.

In [22]:
robjects.globalenv['r_var2'] = py_var
print R('''r_var2''')
py_var[1] = 4
print R('''r_var''')
print R('''r_var2''')

[1] 1 5 3

[1] 1 4 3

[1] 1 5 3
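
The difference between the two bindings is the difference between an alias and a copy. A minimal sketch of the same pattern, with plain python lists standing in for the R vectors (no rpy2 is involved, and the names mirror the cells above for illustration only):

```python
py_var = [1, 5, 3]

r_var = py_var          # alias: both names refer to the same data
r_var2 = list(py_var)   # copy: an independent snapshot of the current values

py_var[1] = 4

print(r_var)    # follows the change made through py_var
print(r_var2)   # keeps the values from the time of the copy
```

Knowing which kind of binding is in play matters whenever the python object is modified after the transfer.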



R objects that are held by python are anonymous. However, they do have an R-representation that can be used directly in R expressions.

In [23]:
py_var.r_repr()
R("""sum({})""".format(py_var.r_repr()))

<IntVector - Python:0x2b61735f3998 / R:0x45e8248>
[       8]

Note that this is a textual representation and thus not a very efficient way of transferring data between the python and R worlds.
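
To see why a textual route scales poorly, consider a rough sketch that builds R source text for an integer vector by hand. The helper to_r_literal() is hypothetical and only similar in spirit to r_repr(); it is not how rpy2 generates its representation.

```python
def to_r_literal(values):
    """Build R source text for an integer vector (hypothetical helper)."""
    return "c({})".format(", ".join("{}L".format(int(v)) for v in values))


small = to_r_literal([1, 4, 3])
large = to_r_literal(range(100000))

print(small)
# The text grows with the data and would have to be parsed again by R.
print(len(large))
```

Every element is formatted to text on the python side and would need to be parsed back on the R side, so the cost grows with the size of the data.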

# Element access in lists

Lists and arrays can be an issue as there are differences between python and R. For example, python lists start at element 0 while R vectors start at 1. Thus there are different ways to access elements in R-like vector objects.

## Python element access
The `__getitem__()` and `__setitem__()` methods and slicing work from python as expected:

In [24]:
r_vector = R.c(0,1,2,3,4)
print r_vector[0]
print r_vector[-1]
print r_vector[1:3]
r_vector[2] = 5
print r_vector

0
4
[1] 1 2

[1] 0 1 5 3 4



## R-like element access

To use R-like indexing, use the .rx and .rx2 accessor functions. .rx corresponds to the '[' operator in R, while .rx2 corresponds to the '[[' operator. Thus, to extract the first element using R-based indexing, type:

In [25]:
r_vector = R.c(0,1,2,3,4)
print r_vector.rx(1)
print r_vector.rx(-1)  # in R, -1 is element exclusion
print r_vector.rx(robjects.IntVector((2, 4, 2)))

[1] 0

[1] 1 2 3 4

[1] 1 3 1
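
The indexing rules used by .rx differ from python's in two ways that are visible above: positions start at 1, and negative positions exclude elements instead of counting from the end. A rough pure-python emulation of these two rules is sketched below. The helper r_bracket() is hypothetical and not part of rpy2; it handles only integers or sequences of integers and ignores R's treatment of zero, mixed-sign and out-of-range indices.

```python
def r_bracket(values, index):
    """Emulate R's '[' on a python list (hypothetical helper, not part of rpy2)."""
    indices = [index] if isinstance(index, int) else list(index)
    if all(i > 0 for i in indices):
        # Positive R indices are 1-based selections.
        return [values[i - 1] for i in indices]
    if all(i < 0 for i in indices):
        # Negative R indices exclude the given 1-based positions.
        excluded = set(-i for i in indices)
        return [v for pos, v in enumerate(values, start=1) if pos not in excluded]
    raise ValueError("this sketch does not handle zero or mixed-sign indices")


r_vector = [0, 1, 2, 3, 4]
print(r_bracket(r_vector, 1))          # first element, as with .rx(1)
print(r_bracket(r_vector, -1))         # everything except the first element
print(r_bracket(r_vector, (2, 4, 2)))  # repeated selection by position
print(r_vector[-1])                    # python: last element
```

The three r_bracket() calls reproduce the selections made by the .rx calls in the cell above, while the final line shows python's different reading of a negative index.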



In [26]:
r_list = R('''list(a=2, b=3, c=4)''')
print r_list.rx('a')
print r_list.rx2('a')

$a
[1] 2


[1] 2



Note: there are additional ways for python and R to interact and pass variables and data around.

# Todo

* Notes on threading/multiprocessing
* Notes on efficiency.
* Notes on safe development in global R namespace.
* Notes on why?