# R for Pythonistas  

If you know `Python`, `matplotlib`, `pandas`, etc. and are looking to learn the `R` equivalents, this notebook is for you. It shows some common use cases in `Python` and then the equivalents in `R`. You'll use the [rpy2](http://rpy.sourceforge.net/rpy2/doc-2.4/html/index.html) package and the `%%R` magic throughout to keep the Python and R examples right next to one another for easy study.

Install `rpy2` for Python now using `pip` if you haven't already.

In [None]:
%%bash
pip install rpy2

Now enable the `rpy2.ipython` extension so that you can use the `%%R` magic.

In [None]:
%load_ext rpy2.ipython

## Plotting 

In Python, [matplotlib](http://matplotlib.org/) is the de facto standard for plotting. Pandas wraps it to support easy plotting of DataFrames. The IPython notebook has a magic to enable inline plots from matplotlib.

In [None]:
%matplotlib inline

Pandas can also improve the default styling of plots in one line.

In [None]:
import pandas as pd
pd.options.display.mpl_style = 'default'  # older pandas only; newer releases dropped this option

On the R side, the examples in this notebook use `ggplot2` for plotting. Load the library with the `%%R` magic.

In [None]:
%%R
library("ggplot2")

### Scatter Plot Data Frame Columns

`pandas.DataFrame` can render a matplotlib scatter plot of multiple columns with the `plot` method.

In [None]:
import numpy as np
import pandas as pd
import string

# 10x10 grid of numbers from std normal distribution
mat = np.random.randn(10, 10)
# index by letters A-J, columns by letters K-T
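# string.ascii_uppercase works on both Python 2 and 3 (string.letters is Python 2 only)
df = pd.DataFrame(mat, columns=list(string.ascii_uppercase[10:20]), index=list(string.ascii_uppercase[:10]))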

ax = df.plot(kind='scatter', x='K', y='L', c='r', label='K vs L')
df.plot(kind='scatter', x='M', y='N', c='b', label='M vs N', ax=ax)
ax.set_title('Scatter Plot')
ax.set_xlabel('K and M')
ax.set_ylabel('L and N')

`ggplot2` renders scatter plots using the `geom_point` function. Call `ggplot` and pass the `data.frame` to initialize the plot. Add additional objects to it (`geom_point`, `xlab`, etc.) with `+` to create the layers and plot annotations. Use the `aes` function, short for *aesthetics*, to map columns of the data to visual properties such as position and colour, which also drives the legend.

**Note**: We'll ensure the same data appears in the R plot as in the Python plot by pushing the Python `numpy` matrix to R and building an R `data.frame` from it. We'll also pass additional arguments to the `%%R` magic to control the plot size and make it roughly equivalent to the Python version.

In [None]:
# share the Python variable mat with R so we can plot the same data
%Rpush mat

In [None]:
%%R -w 480 -h 300 -u px

df <- data.frame(mat, row.names=LETTERS[1:10])
colnames(df) <- LETTERS[11:20]

ggplot(df) +
    geom_point(aes(x=K, y=L, colour="K vs L")) +
    geom_point(aes(x=M, y=N, colour="M vs N")) +
    guides(col=guide_legend(title=NULL)) +
    xlab("K and M") +
    ylab("L and N") +
    ggtitle("Scatter Plot")
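One detail worth spelling out: the Python cell slices positions `[:10]` and `[10:20]`, while the R cell uses `LETTERS[1:10]` and `LETTERS[11:20]`, yet both pick out A-J and K-T. Python indexing is 0-based with an exclusive end; R indexing is 1-based with an inclusive end. The sketch below illustrates the translation. The helper `r_range` is hypothetical and written only for this comparison; it is not part of any library.

```python
import string


def r_range(seq, start, stop):
    """Return the items R would select with seq[start:stop].

    Hypothetical helper for illustration: R ranges are 1-based and
    include the end position, so shift only the start by one.
    """
    return seq[start - 1:stop]


letters = string.ascii_uppercase  # same contents as R's built-in LETTERS

index_labels = list(letters[:10])     # Python slice for A..J
column_labels = list(letters[10:20])  # Python slice for K..T

# LETTERS[1:10] and LETTERS[11:20] in R select the same letters
assert index_labels == list(r_range(letters, 1, 10))
assert column_labels == list(r_range(letters, 11, 20))

print(index_labels[0], index_labels[-1])    # A J
print(column_labels[0], column_labels[-1])  # K T
```

When porting index expressions between the two languages, subtracting one from the R start and leaving the R end unchanged gives the matching Python slice.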

### Line Plot Data Frame Columns

`pandas.DataFrame.plot` produces line plots by default for all columns in a DataFrame.

In [None]:
ax = df[['L', 'N']].plot(title='Line Plot')

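`ggplot2` renders line graphs using the `geom_line` function. Again, pass the `data.frame` to `ggplot` and add the return values of additional function calls (e.g., `geom_line`, `guides`, `ggtitle`, etc.) to it. Because the x values here are the row names, which are discrete, `group=1` tells `geom_line` to connect all of the points as a single line.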

In [None]:
%%R -w 480 -h 300 -u px

ggplot(df) + 
    geom_line(aes(x=rownames(df), y=L, group=1, col="L")) + 
    geom_line(aes(x=rownames(df), y=N, group=1, col="N")) +
    guides(col=guide_legend(title=NULL)) +
    xlab(NULL) +
    ylab(NULL) +
    ggtitle("Line Plot")
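The `ggplot(df) + geom_line(...) + ggtitle(...)` composition in the R cells can look unusual coming from Python, where plots are usually built by calling methods on an axes object. As a rough mental model only, the same pattern can be sketched in Python with operator overloading. The `Plot` and `Layer` classes below are toy stand-ins invented for this illustration; they do not reflect how `ggplot2` is implemented and draw nothing.

```python
class Layer:
    """Toy stand-in for the value returned by geom_line(), ggtitle(), etc."""

    def __init__(self, kind, **settings):
        self.kind = kind
        self.settings = settings


class Plot:
    """Toy stand-in for the object returned by ggplot()."""

    def __init__(self, data, layers=()):
        self.data = data
        self.layers = tuple(layers)

    def __add__(self, layer):
        # Return a new Plot instead of mutating, so a partial plot can be reused
        return Plot(self.data, self.layers + (layer,))

    def describe(self):
        return [layer.kind for layer in self.layers]


base = Plot(data={"L": [1, 2, 3], "N": [3, 2, 1]})
plot = base + Layer("line", y="L") + Layer("line", y="N") + Layer("title", text="Line Plot")

print(plot.describe())  # ['line', 'line', 'title']
print(base.describe())  # []
```

The point of the sketch is that each `+` yields a complete plot description, so a base plot can be stored in a variable and extended in different ways.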

### Plot Formatting

`pandas.DataFrame.plot` passes most styling arguments through to matplotlib.

In [None]:
ax = df[['L', 'N']].plot(color=['g','b'], style='--')
ax.set_title('Line Plot', fontdict=dict(fontsize=20))
ax.set_xlabel('X Axis', fontsize=16, color='r')
ax.set_ylabel('Y Axis', fontsize=16, color='m')

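`ggplot2` uses the [aes](http://docs.ggplot2.org/0.9.3.1/aes_linetype_size_shape.html) and [theme](http://docs.ggplot2.org/0.9.3.1/theme.html) functions to control the styling of a plot. Values placed inside `aes` are mapped through a scale and appear in the legend, while fixed values such as `linetype="dashed"` are passed to the geom outside of `aes`.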

In [None]:
%%R -w 480 -h 300 -u px

ggplot(df) + 
    geom_line(aes(x=rownames(df), y=L, group=1, col="L"), linetype="dashed") + 
    geom_line(aes(x=rownames(df), y=N, group=1, col="N"), linetype="dashed") +
    guides(col=guide_legend(title=NULL)) +
    theme(plot.title = element_text(size=20),
          axis.title.x = element_text(colour="red", size=16),
          axis.title.y = element_text(colour="magenta", size=16)) +
    xlab('X Axis') +
    ylab('Y Axis') +
    ggtitle("Line Plot")
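The two cells above express the same styling with different vocabularies: matplotlib shorthand such as `'r'`, `'m'`, and `'--'` on the Python side, and names such as `"red"`, `"magenta"`, and `"dashed"` on the R side. The lookup below is a minimal sketch of that correspondence. The helper `to_r_style` and both dictionaries are hypothetical, written for this comparison, and the translation is approximate: the exact shade behind a given name may differ between the two libraries.

```python
# Approximate translations from matplotlib shorthand to names R accepts.
MPL_TO_R_COLOUR = {
    "b": "blue", "g": "green", "r": "red", "c": "cyan",
    "m": "magenta", "y": "yellow", "k": "black", "w": "white",
}
MPL_TO_R_LINETYPE = {
    "-": "solid", "--": "dashed", "-.": "dotdash", ":": "dotted",
}


def to_r_style(color=None, style=None):
    """Translate matplotlib shorthand into R-style names (hypothetical helper)."""
    result = {}
    if color is not None:
        # Fall back to the input so full names such as "red" pass through
        result["colour"] = MPL_TO_R_COLOUR.get(color, color)
    if style is not None:
        result["linetype"] = MPL_TO_R_LINETYPE.get(style, style)
    return result


print(to_r_style(color="r"))              # {'colour': 'red'}
print(to_r_style(color="m", style="--"))  # {'colour': 'magenta', 'linetype': 'dashed'}
```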

## References

For more information about R, check out the [documentation](http://cran.r-project.org/manuals.html), this [Quick-R blog](http://www.statmethods.net/) from the author of _R in Action_, and this [ggplot2 blog](http://www.cookbook-r.com/Graphs/) from the author of _R Graphics Cookbook_.

Created by: <a href="https://bigdatauniversity.com/?utm_source=bducreatedbylink&utm_medium=dswb&utm_campaign=bdu">The Cognitive Class Team</a>