### Setting up R in a Python 3 Jupyter Notebook
##### R and Python are both powerful tools when working with data, so why not learn to use both?


__R__ is a "statistical programming language" with a large following in academic and research communities. R is based on an even older language named S, which was developed at Bell Laboratories in 1975 [(S's Wikipedia entry)](https://en.wikipedia.org/wiki/S_(programming_language)). S was developed only 2-3 years after the C programming language (which was also developed at Bell Labs)!

__Python__ is new to the scene compared with S. It was first released in 1991 by Guido van Rossum, and Python 3 followed in 2008 [(Python's Wikipedia entry)](https://en.wikipedia.org/wiki/Python_(programming_language)). Python quickly became a favorite in computer science and programming communities.

__There is, historically, a rift between these communities.__ The rift exists not necessarily because one language is better than the other, but because different people learned to do their jobs with different programming languages. As a result, various add-ons (like packages and libraries) were developed specifically for Python or for R. These gained traction within their communities because they were helpful to folks.

These community preferences and differences can be observed across the computational and data landscape. Some prefer SAS to SPSS, Mplus to HLM, or Power BI to Tableau. All of these tools can be helpful to data analysts, and what people use largely depends on what their employers, teammates, and collaborators use.

##### Modern data analysts must be able to flex between these tools and work across community preferences.

The tools will change because of new jobs, new advancements, and new problems. A decade from now, both R and Python may be obsolete. Already, new frameworks and languages are being developed and deployed.

### In the face of change, what should you learn?

Change is constant. You need to know a few tools; in fact, you need to be an _expert_ in a few tools. However, it is not worth chasing all of them or defining your career too narrowly. If you understand the principles that form the foundation of all data-driven decisions, the winds of change will blow, and you will help direct them.

This class lets you put on your resume, "Can use Python and R for data analysis and decision making." Let's make that statement true.

### Combining the environments

Python and R are non-trivially different. However, enough people use both that a group of programmers invested time and energy to build a translator to make R accessible in Python. The result of that work is the [`rpy2` library](https://rpy2.github.io/doc/v3.4.x/html/index.html). We will rely on this library to make R accessible through a Python 3 notebook.

#### First, you need to install R in your Anaconda environment.
You can run shell commands from a Jupyter Notebook by prefacing them with a `!`. Because of how we are accessing Jupyter Notebooks, I suggest you run the following commands from the Anaconda PowerShell Prompt (PC) or the Terminal (Mac) instead, where no `!` prefix is needed. Replace `envName` with the name of your own environment.

```
conda activate envName
conda install R
pip install rpy2
```
Then open Jupyter Notebook.
```
jupyter notebook
```
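
Before opening a notebook, it can help to confirm that the installs landed in the environment you expect. The following is a minimal sketch using only the Python standard library; `package_status` is a hypothetical helper name, not part of `rpy2` or conda. It looks a package up without importing it, and separately checks whether an `R` executable is on the PATH.

```python
import importlib.metadata
import importlib.util
import shutil


def package_status(name):
    """Report whether a Python package can be found, without importing it."""
    installed = importlib.util.find_spec(name) is not None
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None  # no installed distribution metadata under this name
    return {"name": name, "installed": installed, "version": version}


if __name__ == "__main__":
    print(package_status("rpy2"))
    # shutil.which returns None when no R executable is on the PATH.
    print("R executable:", shutil.which("R"))
```

If `installed` comes back `False`, the most likely cause is that the commands were run in a different environment from the one the notebook uses.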

#### Second, you need to specify where R is installed on your computer.
We are going to use the `os` library to define the file path for R. This path will depend on where you have installed Anaconda. If you have previously installed R, the path may be different.

After the path is set, you can import `rpy2` and specifically, the `robjects` module from `rpy2`.

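```
import os

# Set R_HOME before importing rpy2. This path points to R inside an Anaconda environment.
os.environ['R_HOME'] = r"D:/Anaconda/envs/teach/Lib/R"
# If R is installed outside Anaconda, use that location instead:
# os.environ['R_HOME'] = r"H:/R-4.0.3"
print(os.environ['R_HOME'])
```
```
D:/Anaconda/envs/teach/Lib/R
```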
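
Because the right path differs from machine to machine, one option is to try a few candidate locations and fall back to asking R itself. This is a minimal sketch, not the only way to do it: `first_existing_dir`, `r_home_from_cli`, and `configure_r_home` are hypothetical helper names. `R RHOME` is R's own command for printing its home directory, and the fallback only works if an `R` executable is on your PATH.

```python
import os
import shutil
import subprocess
from pathlib import Path


def first_existing_dir(candidates):
    """Return the first candidate path that is an existing directory, else None."""
    for candidate in candidates:
        if candidate and Path(candidate).is_dir():
            return str(candidate)
    return None


def r_home_from_cli():
    """Ask the R executable on the PATH for its home directory (`R RHOME`)."""
    r_exe = shutil.which("R")
    if r_exe is None:
        return None
    result = subprocess.run([r_exe, "RHOME"], capture_output=True, text=True)
    return result.stdout.strip() or None


def configure_r_home(candidates):
    """Set R_HOME from the first usable source. Call this before importing rpy2."""
    r_home = first_existing_dir(candidates) or r_home_from_cli()
    if r_home is not None:
        os.environ["R_HOME"] = r_home
    return r_home
```

For example, `configure_r_home([r"D:/Anaconda/envs/teach/Lib/R", r"H:/R-4.0.3"])` would try the two paths used in this notebook, in that order, before falling back to `R RHOME`.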


```
import rpy2
print(rpy2.__version__)
```
```
3.4.2
```


```
import rpy2.robjects as robjects
```

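The plotting cells below use R's `ggplot2` package. Install it from the terminal, with your environment activated:
```
conda install -c r r-ggplot2
```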

```
# Look up R's lm (linear model) function so it can be called from Python
r_lm = robjects.r["lm"]
```
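
`robjects.r["lm"]` hands back R's `lm` function as a Python callable. As a rough sketch of how it might be used, assuming R and `rpy2` are set up as described above, the function below fits a simple regression on R's built-in `mtcars` data and returns the coefficients as a Python dict. `fit_mpg_on_wt` is a hypothetical helper name, and the formula `mpg ~ wt` is only an illustrative choice. The `rpy2` imports sit inside the function so that defining it does not start R.

```python
def fit_mpg_on_wt():
    """Fit mpg ~ wt on R's built-in mtcars data using R's lm()."""
    # Imported here so R is only started when the function is called.
    from rpy2 import robjects
    from rpy2.robjects import Formula
    from rpy2.robjects.packages import importr, data

    datasets = importr("datasets")
    mtcars = data(datasets).fetch("mtcars")["mtcars"]

    r_lm = robjects.r["lm"]                       # R's lm as a Python callable
    fit = r_lm(Formula("mpg ~ wt"), data=mtcars)  # keyword args become R named args

    coefs = fit.rx2("coefficients")               # equivalent to fit$coefficients in R
    return dict(zip(coefs.names, coefs))
```

Calling `fit_mpg_on_wt()` in a notebook cell should return a dict with an `(Intercept)` entry and a `wt` entry.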

```
from rpy2 import robjects
from rpy2.robjects import Formula, Environment
from rpy2.robjects.vectors import IntVector, FloatVector
from rpy2.robjects.lib import grid
from rpy2.robjects.packages import importr, data
from rpy2.rinterface_lib.embedded import RRuntimeError
import warnings

# The R 'print' function
rprint = robjects.globalenv.find("print")

# Import R packages so their functions are available from Python
stats = importr('stats')
grdevices = importr('grDevices')
base = importr('base')
datasets = importr('datasets')

grid.activate()
```

```
import math, datetime
import rpy2.robjects.lib.ggplot2 as ggplot2
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
base = importr('base')

# Fetch the mtcars data frame from R's built-in 'datasets' package
mtcars = data(datasets).fetch('mtcars')['mtcars']
```



```
gp = ggplot2.ggplot(mtcars)

# pp = (gp +
#       ggplot2.aes_string(x='wt', y='mpg') +
#       ggplot2.geom_point())

# pp.plot()
```

```
mtcars
```

| mpg | cyl | disp | ... | am | gear | carb |
|---|---|---|---|---|---|---|
| 21.000000 | 6.000000 | 160.000000 | ... | 1.000000 | 4.000000 | 4.000000 |
| 21.000000 | 6.000000 | 160.000000 | ... | 1.000000 | 4.000000 | 4.000000 |
| 22.800000 | 4.000000 | 108.000000 | ... | 1.000000 | 4.000000 | 1.000000 |
| 21.400000 | 6.000000 | 258.000000 | ... | 0.000000 | 3.000000 | 1.000000 |
| ... | ... | ... | ... | ... | ... | ... |
| 15.800000 | 8.000000 | 351.000000 | ... | 1.000000 | 5.000000 | 4.000000 |
| 19.700000 | 6.000000 | 145.000000 | ... | 1.000000 | 5.000000 | 6.000000 |
| 15.000000 | 8.000000 | 301.000000 | ... | 1.000000 | 5.000000 | 8.000000 |
| 21.400000 | 4.000000 | 121.000000 | ... | 1.000000 | 4.000000 | 2.000000 |
