# Python and R Indexing Basics

Using the rpy2 library for Python, we can easily work with both R and Python in the same notebook.  This means we don't have to choose between R and Python as our data science language: we can take advantage of R's beautiful graphical packages or advanced statistical packages when we want, and use Keras and GPU computing with Python for heavy supervised learning tasks.  The main drawback to using both languages is that they have __different indexing__ conventions, which can make things unnecessarily difficult.

To clear up some of the confusion, here is a brief overview of the indexing basics for each language.

Important Points:
-  Use the '-i' flag to pass a variable from Python into R.
-  Use the '-o' flag to pass a variable from R back to Python.
-  Python's pandas has different indexing options than basic Python.

In [None]:
%load_ext rpy2.ipython

In [None]:
import pandas as pd

In [None]:
%R iris = iris #We will use the iris dataset for our indexing tasks today.

In [40]:
%R str(iris) #Notice we have 5 columns and 150 observations

A common task is to split a data frame into its independent and dependent variables.  In this case **Species** is our dependent variable.  Let's isolate it into an x and a y set using R.

In [None]:
%R train_y <- iris['Species'] #Using name indexing.

In [45]:
%R iris$Species #Using the dollar sign operator

[setosa, setosa, setosa, setosa, setosa, ..., virginica, virginica, virginica, virginica, virginica]
Length: 150
Categories (3, object): [setosa, versicolor, virginica]

In [4]:
%R train_x <- iris[-5] #Using negative numeric indexing grabs every column except the 5th.

Unnamed: 0,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
1,5.1,3.5,1.4,0.2
2,4.9,3.0,1.4,0.2
3,4.7,3.2,1.3,0.2
4,4.6,3.1,1.5,0.2
5,5.0,3.6,1.4,0.2
6,5.4,3.9,1.7,0.4
7,4.6,3.4,1.4,0.3
8,5.0,3.4,1.5,0.2
9,4.4,2.9,1.4,0.2
10,4.9,3.1,1.5,0.1


__Careful here!__ Try the same thing in Python and you will get a nice KeyError, unless you use .iloc, in which case you will retrieve the fifth row from the end.  In R a negative index **excludes** that position; in Python it counts backwards from the end, so R's behaviour does not carry over.
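The contrast can be sketched in plain pandas. The small `df` below is a hypothetical stand-in for iris (six rows of illustrative values, not the real dataset) so that the block runs without R; dropping the column by position is one possible way to mimic R's `iris[-5]`, not the only one.

```python
import pandas as pd

# Hypothetical stand-in for iris: 6 rows, 5 columns (illustrative values only).
df = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.3],
    "Sepal.Width": [3.5, 3.0, 3.2, 3.2, 3.2, 3.3],
    "Petal.Length": [1.4, 1.4, 1.3, 4.7, 4.5, 6.0],
    "Petal.Width": [0.2, 0.2, 0.2, 1.4, 1.5, 2.5],
    "Species": ["setosa", "setosa", "setosa",
                "versicolor", "versicolor", "virginica"],
})

# Plain brackets look up a column *label*, and no column is labelled -5.
try:
    df[-5]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# .iloc[-5] counts back from the end and returns a *row* (fifth from last).
fifth_from_end = df.iloc[-5]

# A rough equivalent of R's iris[-5]: drop the fifth column by position.
train_x = df.drop(columns=df.columns[4])

print(raised_key_error)
print(list(train_x.columns))
```

With six rows, `df.iloc[-5]` is the second row of the frame, which has nothing to do with excluding the Species column.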

In [5]:
%R -o iris #Kick the iris dataset to python

In [6]:
type(iris) #Take a look at the dataset.  It's in a pandas DataFrame!
iris.dtypes #Take a look at the datatypes.

pandas.core.frame.DataFrame

Remember that pandas has your regular Python indexing, but it also has some added features to assist in indexing:

-  Selection by position '.iloc', allows you to use the indices to select rows and columns.
-  Selection by label '.loc', allows you to use the labels of the row or column to select them. 
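A minimal sketch of the difference between the two, assuming a hypothetical three-row frame whose row labels start at 1 (as R row names do). The integer labels are an assumption for illustration; the labels rpy2 actually hands over may be of a different type.

```python
import pandas as pd

# Hypothetical frame with 1-based row labels, mimicking R row names.
df = pd.DataFrame(
    {"Sepal.Length": [5.1, 4.9, 4.7],
     "Species": ["setosa", "setosa", "setosa"]},
    index=[1, 2, 3],
)

by_label = df.loc[1, "Sepal.Length"]   # row *labelled* 1 -> the first row
by_position = df.iloc[1, 0]            # row at *position* 1 -> the second row

# Label slices include both endpoints; position slices exclude the stop.
label_slice = df.loc[1:2]              # rows labelled 1 and 2
position_slice = df.iloc[1:2]          # only the row at position 1

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

The same number therefore points at different rows depending on whether it is read as a label or as a position, which is easy to trip over with data that came from R.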

In [33]:
train_y = iris.loc[:,'Species'] #Using label indexing; you still need the ":," to select every row.
train_y.head() #Notice that pandas did not implicitly turn the species into categories.

1    setosa
2    setosa
3    setosa
4    setosa
5    setosa
Name: Species, dtype: object
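If a factor-like column is wanted on the Python side, one option is an explicit conversion with `astype("category")`. This is a sketch on a hypothetical four-element Series rather than the real `train_y`.

```python
import pandas as pd

# Hypothetical species column (illustrative values, not the full dataset).
species = pd.Series(
    ["setosa", "versicolor", "virginica", "setosa"], name="Species"
)

# Explicitly convert the strings to a categorical, similar to an R factor.
as_factor = species.astype("category")

levels = list(as_factor.cat.categories)   # analogous to levels() in R
codes = list(as_factor.cat.codes)         # integer codes, 0-based in pandas

print(as_factor.dtype)
print(levels)
print(codes)
```

Note that the integer codes start at 0 here, whereas the integer codes behind an R factor start at 1.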

In [44]:
iris.Species.head() #You can also access a column using the dot operator, in similar fashion to the $ in R

1    setosa
2    setosa
3    setosa
4    setosa
5    setosa
Name: Species, dtype: object
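One caveat about dot access: it only works when the column name is a valid Python identifier. The other iris columns contain a dot, so they need brackets. A small sketch on a hypothetical one-row frame:

```python
import pandas as pd

# Hypothetical one-row frame using two of the iris column names.
df = pd.DataFrame({"Species": ["setosa"], "Sepal.Length": [5.1]})

species = df.Species                    # fine: "Species" is a valid identifier

# "Sepal.Length" is parsed as (df.Sepal).Length, and there is no "Sepal".
try:
    df.Sepal.Length
    dot_access_worked = True
except AttributeError:
    dot_access_worked = False

sepal_length = df["Sepal.Length"]       # bracket access works for any label

print(dot_access_worked)
print(sepal_length.iloc[0])
```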

In [25]:
train_X = iris.iloc[:,0:4] #Using pandas position indexing to select from position 0 up to, but not including, position 4!

Unnamed: 0,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
1,5.1,3.5,1.4,0.2
2,4.9,3.0,1.4,0.2
3,4.7,3.2,1.3,0.2
4,4.6,3.1,1.5,0.2
5,5.0,3.6,1.4,0.2
6,5.4,3.9,1.7,0.4
7,4.6,3.4,1.4,0.3
8,5.0,3.4,1.5,0.2
9,4.4,2.9,1.4,0.2
10,4.9,3.1,1.5,0.1


This brings up an important difference between R and most other languages. R indexes start counting from 1 whereas most other languages start counting from 0.  I know! This is getting deep!
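The off-by-one translation can be sketched with a small helper. `r_range_to_python_slice` is a hypothetical name introduced here for illustration; it turns an inclusive, 1-based R range such as `1:4` into the equivalent half-open, 0-based Python slice.

```python
def r_range_to_python_slice(start, stop):
    """Convert an inclusive, 1-based R range start:stop to a Python slice."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    # Shift the start down by one; the stop is unchanged because Python's
    # exclusive stop cancels out R's inclusive one.
    return slice(start - 1, stop)


columns = ["Sepal.Length", "Sepal.Width", "Petal.Length",
           "Petal.Width", "Species"]

first_four = columns[r_range_to_python_slice(1, 4)]  # like R's 1:4
fifth = columns[5 - 1]                               # R's 5th element

print(first_four)
print(fifth)
```

This is why R's columns `1:4` and the `0:4` passed to `.iloc` above select the same four measurements.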

1         setosa
2         setosa
3         setosa
4         setosa
5         setosa
6         setosa
7         setosa
8         setosa
9         setosa
10        setosa
11        setosa
12        setosa
13        setosa
14        setosa
15        setosa
16        setosa
17        setosa
18        setosa
19        setosa
20        setosa
21        setosa
22        setosa
23        setosa
24        setosa
25        setosa
26        setosa
27        setosa
28        setosa
29        setosa
30        setosa
         ...    
121    virginica
122    virginica
123    virginica
124    virginica
125    virginica
126    virginica
127    virginica
128    virginica
129    virginica
130    virginica
131    virginica
132    virginica
133    virginica
134    virginica
135    virginica
136    virginica
137    virginica
138    virginica
139    virginica
140    virginica
141    virginica
142    virginica
143    virginica
144    virginica
145    virginica
146    virginica
147    virginica
148    virgini