# Connecting to the RCC

If you don't want to work locally on your laptop, you can connect to the RCC via ssh to use the command line, or via ThinLinc to use an IPython notebook:

[https://rcc.uchicago.edu/docs/connecting/index.html](https://rcc.uchicago.edu/docs/connecting/index.html)

```module avail``` shows the list of RCC "modules" (not to be confused with Python modules) that can be loaded.
```module load python/2.7-2015q1``` makes Python available.  You'll also need to run ```module load node/0.10.29``` for notebook conversion, which we'll do at the end.
```ipython notebook``` starts an IPython notebook server.

# Imports

A Python script typically begins by importing modules, much like calling the ```library``` function in R.  There are several ways to do this in Python.

In [None]:
import numpy # this is an import

All of NumPy's objects are then available with the ```numpy.``` prefix, like ```numpy.array```.

In [None]:
numpy.array([[1,2], [3,4]])

In [None]:
import numpy as np
import pandas as pd

All objects are then available with the shorter ```np.``` or ```pd.``` prefix, like ```np.array``` or ```pd.DataFrame```.  This will be our preferred method for these two modules specifically.

In [None]:
np.array([[1,2], [3,4]])

In [None]:
from numpy import array

This imports only the individual functions or objects that you care about, which are then available by name:

In [None]:
array([[1,2], [3,4]])

In [None]:
from numpy import *

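This imports all functions and objects from the module, available by name.  This is generally discouraged: it makes it unclear where a name came from, and it can silently shadow Python built-ins such as ```sum``` with NumPy's versions.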
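
To see why shadowing matters, here is a small sketch comparing the built-in ```sum``` with ```np.sum```. It calls them through the ```builtins``` module and the ```np``` alias rather than actually running ```from numpy import *```, and it assumes Python 3 (in Python 2.7 the module is named ```__builtin__```). The second argument means different things to each function: a start value for the built-in, an axis for NumPy.

```python
import builtins  # Python 3 module holding sum, len, etc.

import numpy as np

values = range(5)  # 0, 1, 2, 3, 4

# Built-in sum: the second argument is a start value added to the total.
builtin_total = builtins.sum(values, -1)

# NumPy's sum: the second argument is the axis to sum over.
numpy_total = np.sum(values, -1)

print(builtin_total, numpy_total)
```

After ```from numpy import *```, a bare ```sum(values, -1)``` would quietly switch from the first meaning to the second.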

In R, packages are installed with the ```install.packages``` function.  In Python, they're installed from the command line with ```pip install package```, or with ```conda install package``` if you're using Anaconda.  If you're working at the RCC, you shouldn't have to worry about this.

# Python Basics

In [None]:
my_list = [1,2,3,4]
my_list

In [None]:
list_2 = ["hello", 1, 5]
list_2

In [None]:
list_3 = [list_2, "string"]
list_3

In [None]:
type(my_list)

**Indexing starts at 0 in Python, not at 1 as in R!**

In [None]:
my_list[0] # R: a <- c(1, 2, 3, 4); a[1]
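
A few more indexing rules that commonly trip up R users, in a short sketch: negative indices count from the end of a Python list (in R they *drop* elements instead), and a slice ```start:stop``` stops just *before* ```stop```.

```python
my_list = [1, 2, 3, 4]

first = my_list[0]           # R: a[1]
last = my_list[-1]           # R: a[length(a)]; in R, a[-1] would drop the first element
first_two = my_list[0:2]     # R: a[1:2]; positions 0 and 1, stop index 2 excluded
all_but_first = my_list[1:]  # R: a[-1]

print(first, last, first_two, all_but_first)
```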

In [None]:
my_dictionary = {"a": 1, "b": 2, "c": 3}
my_dictionary

In [None]:
type(my_dictionary)

In [None]:
my_dictionary["a"]

In [None]:
my_dictionary["c"]

In [None]:
my_dictionary["z"] = my_list
my_dictionary

In [None]:
for element in my_dictionary: # iterating over a dictionary yields its keys
    print("%s is %s" % (element, my_dictionary[element]))
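
A sketch of a few common alternatives to looping over keys: ```.items()``` gives key/value pairs directly, ```.get()``` returns a default instead of raising ```KeyError``` for a missing key, and ```in``` tests whether a key exists. In Python 2.7 dictionaries have no guaranteed order (insertion order is only guaranteed from Python 3.7), so the loop above may print the keys in any order; sorting the items makes the output predictable.

```python
my_dictionary = {"a": 1, "b": 2, "c": 3}

# .items() yields (key, value) pairs; sorted() fixes the order.
lines = []
for key, value in sorted(my_dictionary.items()):
    lines.append("%s is %s" % (key, value))

# .get() returns the second argument when the key is missing.
missing = my_dictionary.get("z", "not found")

# "in" checks the keys, not the values.
has_a = "a" in my_dictionary
has_1 = 1 in my_dictionary

print(lines, missing, has_a, has_1)
```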

In [None]:
my_tuple = (my_list, my_dictionary)
my_tuple

In [None]:
my_tuple[0]

In [None]:
my_tuple[1]

Tuples are immutable: once a tuple is created, you can't reassign its elements, so the next cell raises a ```TypeError```.

In [None]:
my_tuple[0]="a"

In [None]:
type(my_tuple)
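
One subtlety worth a sketch: immutability applies to the tuple itself, not to the objects it holds. The list stored in ```my_tuple[0]``` can still be changed in place, and the change is visible through the original ```my_list```, because both names refer to the same list.

```python
my_list = [1, 2, 3, 4]
my_tuple = (my_list, {"a": 1})

# Reassigning a slot of the tuple fails...
try:
    my_tuple[0] = "a"
    error = None
except TypeError as exc:
    error = exc

# ...but mutating the list stored in that slot is allowed.
my_tuple[0].append(5)

print(error)
print(my_list)  # the original list changed too
```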

# Numpy

In [None]:
matrix = np.array([[1,2], [3,4]]) # m<-matrix(c(1,3, 2,4), nrow=2, ncol=2)
matrix

In [None]:
type(matrix)

In [None]:
matrix.shape # dim(m)

In [None]:
matrix[:, 0] # m[, 1]

In [None]:
matrix[0, :] # m[1, ]
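
The R comment on the ```matrix``` cell lists the elements as ```c(1,3, 2,4)``` because R fills matrices column by column, while ```np.array``` takes a list of rows. A sketch of building the same matrix from R's element order, using ```reshape``` with ```order="F"``` (Fortran, i.e. column-major, order):

```python
import numpy as np

# R: m <- matrix(c(1, 3, 2, 4), nrow=2, ncol=2) fills column by column.
from_r_order = np.array([1, 3, 2, 4]).reshape((2, 2), order="F")

# The row-by-row literal used above.
matrix = np.array([[1, 2], [3, 4]])

same = np.array_equal(from_r_order, matrix)
first_col = matrix[:, 0]  # R: m[, 1]
first_row = matrix[0, :]  # R: m[1, ]
print(same, first_col, first_row)
```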

In [None]:
r = np.random.rand(10, 10)
r

In [None]:
r.shape

In [None]:
?np.random.rand

In [None]:
r[5:8, :] # rows 5, 6 and 7; the end of a slice is excluded

In [None]:
r[:, :-2] # everything but the last two columns

In [None]:
r[:, -5:] # the last five columns
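
The slices above can be checked by their shapes. The sketch below uses a deterministic ```np.arange``` array instead of random numbers so the values are predictable. It also shows a difference from R: a basic NumPy slice is a *view* of the original array, so writing into it changes the original; use ```.copy()``` when you want an independent array.

```python
import numpy as np

r = np.arange(100).reshape(10, 10)  # values 0..99, filled row by row

rows_5_to_7 = r[5:8, :]       # rows 5, 6, 7
all_but_last_two = r[:, :-2]  # columns 0..7
last_five = r[:, -5:]         # columns 5..9

print(rows_5_to_7.shape, all_but_last_two.shape, last_five.shape)

# Slices are views: modifying one modifies r.
view = r[0, :]
view[0] = -1

# .copy() gives an independent array.
independent = r[1, :].copy()
independent[0] = -1

print(r[0, 0], r[1, 0])
```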

# Pandas

In [None]:
df = pd.DataFrame(r)

In [None]:
df.describe()

In [None]:
df.head() # R: head(df); pandas shows the first 5 rows by default, df.head(10) shows 10

In [None]:
df.tail() # R: tail(df); the last 5 rows by default

In [None]:
df.columns # colnames(df)
# to rename, assign a list with exactly one name per column (10 here), as in the next cells

In [None]:
df.index # rownames(df)

In [None]:
df.columns = ["col_%s" % s for s in range(10)] # you can also rename rows by assigning to df.index
df.head()

In [None]:
df = df.rename(columns={"col_2": "special_col"})
df.head()
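
Two details of renaming, sketched on a freshly built ```df```: assigning to ```df.columns``` needs exactly one name per column (otherwise pandas raises a ```ValueError```), and ```rename``` returns a new DataFrame rather than changing ```df```, which is why the cell above assigns the result back to ```df```.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 10))
df.columns = ["col_%s" % s for s in range(10)]

# Too few names for 10 columns: pandas refuses.
try:
    df.columns = ["col_0", "col_1", "col_3"]
    error = None
except ValueError as exc:
    error = exc

# rename() leaves df untouched unless the result is assigned back.
renamed = df.rename(columns={"col_2": "special_col"})

print(error)
print("col_2" in df.columns, "special_col" in renamed.columns)
```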

In [None]:
my_data = {"col_a": ["a", "b", "a", "c", "d"], "col_b": [1, 2, 3, 4, 5], "col_c": np.random.rand(5)}
my_data

In [None]:
df2 = pd.DataFrame(my_data)
df2.head()

In [None]:
len(df) # nrow(df)

In [None]:
df.shape

In [None]:
boston = pd.read_csv("./Boston.csv")
boston.head()

In [None]:
boston.describe()

In [None]:
boston[0:10] # the first 10 rows (positions 0 through 9)

In [None]:
boston["crim"] # will return a series

In [None]:
boston.crim # will return a series

In [None]:
boston[["crim", "zn"]] # will return a dataframe
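
Since ```Boston.csv``` may not be at hand, the following sketch uses a small made-up frame that borrows a couple of the Boston column names (the numbers are invented, and the ```pop``` column is hypothetical). Single brackets return a ```Series```, double brackets a ```DataFrame```. Attribute access like ```boston.crim``` only works when the column name is a valid Python identifier that doesn't clash with a DataFrame method: a column named ```pop```, for instance, has to be read as ```toy["pop"]```, since ```toy.pop``` is the method used later in this notebook.

```python
import pandas as pd

# Invented values, for illustration only; "pop" is a hypothetical column.
toy = pd.DataFrame({"crim": [0.1, 0.2, 0.3],
                    "zn": [18.0, 0.0, 12.5],
                    "pop": [100, 200, 300]})

as_series = toy["crim"]         # one column name -> Series
as_frame = toy[["crim", "zn"]]  # list of column names -> DataFrame
same_series = toy.crim          # attribute access, same as toy["crim"]

pop_column = toy["pop"]         # the column
pop_method = toy.pop            # the DataFrame.pop method, not the column

print(type(as_series).__name__, type(as_frame).__name__)
print(callable(pop_method))
```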

In [None]:
boston.loc[0:10, ["crim", "zn"]] # by label: the end label 10 is included, so 11 rows

In [None]:
boston.iloc[0:10, [0, 2]] # by position: rows 0-9 and the first and third columns
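
The two cells above look similar but count differently: ```.loc``` selects by *label* and includes the end of the slice, while ```.iloc``` selects by *position* and excludes it, like ordinary Python slicing. A sketch with a made-up 20-row frame (column names borrowed from the Boston data, values invented):

```python
import numpy as np
import pandas as pd

# 20 rows with the default integer labels 0..19; values are arbitrary.
toy = pd.DataFrame({"crim": np.arange(20) * 0.1,
                    "zn": np.arange(20),
                    "indus": np.arange(20) + 100},
                   columns=["crim", "zn", "indus"])  # pin the column order

by_label = toy.loc[0:10, ["crim", "zn"]]  # labels 0 through 10 inclusive
by_position = toy.iloc[0:10, [0, 2]]      # positions 0 through 9; columns 0 and 2

print(len(by_label), len(by_position))
print(list(by_position.columns))
```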

In [None]:
boston[boston.medv>10]

In [None]:
boston[(boston.medv>10) & (boston.tax<300)] # be careful with the parentheses

In [None]:
boston[(boston.medv>10) | (boston.dis<4)] # be careful with the parentheses
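
Why the parentheses matter: in Python, ```&``` and ```|``` bind more tightly than comparisons like ```>```, so ```boston.medv > 10 & boston.tax < 300``` is read as ```boston.medv > (10 & boston.tax) < 300```, a chained comparison that pandas can't evaluate and that raises an error. The sketch below shows this on a tiny made-up frame, along with ```DataFrame.query```, which accepts ```and```/```or``` inside a string expression.

```python
import pandas as pd

# Four invented rows; column names borrowed from the Boston data.
toy = pd.DataFrame({"medv": [5.0, 12.0, 20.0, 8.0],
                    "tax": [250, 350, 280, 500],
                    "dis": [4.5, 5.0, 6.0, 2.0]})

both = toy[(toy.medv > 10) & (toy.tax < 300)]  # row 2 only
either = toy[(toy.medv > 10) | (toy.dis < 4)]  # rows 1, 2 and 3

# Without parentheses, & is applied first and the expression breaks.
try:
    toy[toy.medv > 10 & toy.tax < 300]
    error = None
except (ValueError, TypeError) as exc:
    error = exc

# query() spells the same filter as a string.
queried = toy.query("medv > 10 and tax < 300")

print(list(both.index), list(either.index), list(queried.index))
print(type(error).__name__)
```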

In [None]:
boston.head()

In [None]:
boston.pop("nox") # removes the column from boston in place and returns it
boston.head()

In [None]:
%matplotlib inline

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
?plt.scatter
#plt.scatter(boston.medv, boston.dis)
#plt.show()

In [None]:
plt.scatter(boston.medv, boston.dis, s=boston.crim) # marker area scaled by the crime rate
plt.show()

In [None]:
plt.boxplot(boston.medv)
plt.show()

In [None]:
plt.hist(boston.medv)
plt.show()
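
When you're connected to the RCC over plain ssh rather than ThinLinc, there may be no display for ```plt.show()``` to draw on. A sketch of one alternative for a standalone script: select matplotlib's non-interactive ```Agg``` backend and write the figure out with ```savefig```, using the object-oriented interface (```fig, ax = plt.subplots()```). The data below are invented stand-ins for ```medv``` and ```dis```. Inside this notebook, keep ```%matplotlib inline``` instead, since switching the backend would turn off the inline plots.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render to files or memory only; no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(5, 50, 20)  # invented stand-in for medv
y = 12 - x / 5              # invented stand-in for dis

fig, ax = plt.subplots()
ax.scatter(x, y, s=20)      # s is the marker area in points squared
ax.set_xlabel("medv")
ax.set_ylabel("dis")

buffer = io.BytesIO()       # or fig.savefig("scatter.png") to write a file
fig.savefig(buffer, format="png")
plt.close(fig)

png_bytes = buffer.getvalue()
```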

In [None]:
axes = boston[["crim", "indus", "age", "rad", "tax", "medv"]].hist()

In [None]:
axes = pd.plotting.scatter_matrix(boston[["crim", "indus", "age", "rad", "tax", "medv"]]) # pandas >= 0.20; older versions: pd.tools.plotting.scatter_matrix

See [http://pandas.pydata.org/pandas-docs/version/0.15.0/visualization.html](http://pandas.pydata.org/pandas-docs/version/0.15.0/visualization.html) for more info on plotting in pandas.

# Saving Notebooks to HTML

**You can also use File => Download As from the IPython notebook menu.**

If you're working at the RCC, make sure that you've run ```module load node/0.10.29``` first.

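```ipython nbconvert --help``` lists the available options.

```ipython nbconvert lecture_1_intro.ipynb``` converts the notebook using the default format, which for IPython's nbconvert is HTML.

```ipython nbconvert --to html lecture_1_intro.ipynb``` does the same with the format spelled out.

```ipython nbconvert --to python lecture_1_intro.ipynb``` exports the code cells as a Python script.

```ipython nbconvert --to latex --post PDF lecture_1_intro.ipynb``` converts to LaTeX and then compiles a PDF, which requires a LaTeX installation.

In newer Jupyter installations the command is ```jupyter nbconvert``` instead, and ```--to pdf``` replaces ```--to latex --post PDF```.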

# More Resources

[https://github.com/addfor/tutorials](https://github.com/addfor/tutorials)

[http://synesthesiam.com/posts/an-introduction-to-pandas.html](http://synesthesiam.com/posts/an-introduction-to-pandas.html)

[http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/](http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/)

[https://github.com/savarin/pyconuk-introtutorial](https://github.com/savarin/pyconuk-introtutorial)