# Preface: Using Jupyter Notebooks

---

The name Jupyter is a nod to three languages, **Ju**lia, **Pyt**hon, and **R**, although kernels exist for many others. These notebooks are interactive environments designed to clearly communicate the process of scientific computing. They are **not** meant for large codebases, object-oriented programming, or anything other than scripting.

Really successful use cases:
   * Data cleaning
   * Presentation of data analysis
   * Educational purposes
   * Collaborative data science\*
   
\* This remains to be seen. But see <a href="https://colab.research.google.com/">Google's Colab</a>, <a href="http://jupyterhub.readthedocs.io/en/latest/">JupyterHub</a>, and <a href="https://aws.amazon.com/sagemaker/">AWS's SageMaker</a>.

Different coding environments can be run by specifying which kernels to run with Jupyter. The name of your current kernel can be viewed at the top right of your notebook, next to a circle. To see the list of all your available kernels, run:
```sh
jupyter-kernelspec list
```

You can generally run shell (`bash`) commands inside cells like normal code by prefixing them with `!`.

In [23]:
!ls   # list all items in your directory
!echo # newline
!pwd  # print current working directory

Numpy and Pandas.ipynb py0.html
py0.Rmd                python1.ipynb

/Users/codetesting/Desktop/Deep_Learning_Code/SUSA/crash-course/Python


There's a lot to learn about Jupyter notebooks, and they're still being actively developed, including here at Berkeley!

Now, onto the main topic.

# Numpy
---

@author = Ajay Raj

An informal introduction to your least-worst enemy in the realm of data science. Numpy is an optimized math library for Python. Most of the optimization occurs in C, and some neat Python backend tinkering allows us to interface with it in Python. The code is vectorized as much as possible, which means that there's a heavy focus on using arrays (treated as n-dimensional vectors) to do operations. This is a shift away from doing looped operations.

For example:
If you wanted to compute the dot product of two arrays [1, 2, 3, 4, 5] and [5, 4, 3, 2, 1], you could either loop through the arrays

```python
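arr1, arr2 = [1, 2, 3, 4, 5], [5, 4, 3, 2, 1]
total = 0  # named `total` to avoid shadowing the built-in sum()
for v1, v2 in zip(arr1, arr2):
    total += v1 * v2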
```

Or, you could perform all the multiplications at once, and then add them together. That's basically what Numpy does behind the scenes. So doing the dot product in numpy is very simple:

```python
import numpy as np

arr1 = np.array(arr1)  # casting lists to np arrays
arr2 = np.array(arr2)
dp = arr1.dot(arr2)    # one call replaces the whole loop
```

Now! Onwards! With our install!

## Installation

In [None]:
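!conda install -y numpy scipy matplotlib  # -y answers the confirmation prompt, which a notebook cell cannot do interactively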

In [24]:
import matplotlib.pyplot as plt

In [25]:
import numpy as np

## Basic Operations

NumPy is a Python library used to handle linear algebra operations. It does a couple of amazing things under the hood that make certain operations lightning fast and make large-scale data processing possible (as in Pandas, covered later).

NumPy holds data in **arrays**.

In [2]:
v = np.array([1, 2, 3, 4])
v

array([1, 2, 3, 4])

NumPy arrays function similarly to vanilla Python lists. For example, you can iterate through them one element at a time.

In [3]:
for x in np.nditer(v):
    print(x)

1
2
3
4


However, the real power of NumPy arrays comes through when you do math. NumPy overloads Python's arithmetic operators (such as `+` and `*`) so that we can add and multiply arrays as if they were numbers. This is a really nice feature that enhances the readability of Python/NumPy code.
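
As a small illustration of the point above (the values are made up for the example), the same `+` and `*` that concatenate or repeat Python lists act elementwise on NumPy arrays:

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

list_sum = py_list + py_list      # lists concatenate: [1, 2, 3, 1, 2, 3]
list_rep = py_list * 2            # lists repeat:      [1, 2, 3, 1, 2, 3]

arr_sum = np_arr + np_arr         # arrays add elementwise:      [2, 4, 6]
arr_prod = np_arr * np_arr        # arrays multiply elementwise: [1, 4, 9]

print(list_sum, list_rep)
print(arr_sum, arr_prod)
```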

Here we're going to compare computing a dot product with a Python loop against doing the same computation in NumPy.

In [4]:
from time import time

# Python test
start1 = time()
A = list(range(1000000))
B = list(range(1000000, 0, -1))
dotprod = 0
for a,b in zip(A,B):
    dotprod += a*b
t1 = time()-start1
print("Python gives us time of {}".format(t1))

# Numpy test
start2 = time()
A = np.arange(0,1000000,1)
B = np.arange(1000000,0,-1)
print(dotprod == A.dot(B))
t2 = time()-start2
print("Numpy gives us time of {}".format(t2))

print("Numpy is faster: {}".format(t2 < t1))
print("Speedup factor: {}".format(t1/t2))

Python gives us time of 0.26851582527160645
True
Numpy gives us time of 0.02663111686706543
Numpy is faster: True
Speedup factor: 10.082784984646237


### Indexing 2-D Arrays in Numpy

What is a 2-D array? It's an array of arrays, also referred to as a matrix.

This is what a 2D list looks like in vanilla Python.
```python
A = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    ]
```

Accessing the number 5 is a little clumsy, however. Plain Python has no built-in way to index two layers deep in one step, so you have to index into the nested lists one at a time:

```python
# getting the number 5 from A
# A[1] = [4, 5, 6]
# A[1][1] = 5
five = A[1][1]
```

When you store an array as an np.array, you are not only gaining a runtime speedup, you're also getting a speedup in writing your code because you now have advanced indexing! 

Now, we'll show how to index in a similar 4x4 array in numpy's array format.

In [14]:
A = np.random.randn(4,4)
A

array([[ 0.01359636, -1.11467875, -0.63780558,  0.18332114],
       [ 0.34182303,  1.17361324, -0.66276945,  1.51996386],
       [ 0.61256402, -0.78131902, -0.96756168, -1.6533884 ],
       [ 0.96244877, -2.83869135,  0.42021725,  1.20581674]])

In [15]:
A[2, 2] # element selection

-0.9675616772835679

In [16]:
A[1, :] # second row of the matrix

array([ 0.34182303,  1.17361324, -0.66276945,  1.51996386])

In [17]:
A[:, 2] # third column of the matrix

array([-0.63780558, -0.66276945, -0.96756168,  0.42021725])

In [18]:
A.shape

(4, 4)

In [19]:
A.reshape((8,2)) # will reshape and fill it in by rows

array([[ 0.01359636, -1.11467875],
       [-0.63780558,  0.18332114],
       [ 0.34182303,  1.17361324],
       [-0.66276945,  1.51996386],
       [ 0.61256402, -0.78131902],
       [-0.96756168, -1.6533884 ],
       [ 0.96244877, -2.83869135],
       [ 0.42021725,  1.20581674]])

In [21]:
A.reshape((2, 8))

array([[ 0.01359636, -1.11467875, -0.63780558,  0.18332114,  0.34182303,
         1.17361324, -0.66276945,  1.51996386],
       [ 0.61256402, -0.78131902, -0.96756168, -1.6533884 ,  0.96244877,
        -2.83869135,  0.42021725,  1.20581674]])
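
Because the matrix above is random, it can be hard to check the indexing by eye. Here is a minimal sketch on a deterministic array (`M` and the other names are made up for this example) showing a sub-block slice, negative indexing, and letting `reshape` infer one dimension with `-1`:

```python
import numpy as np

M = np.arange(16).reshape(4, 4)   # 0..15 laid out row by row

block = M[1:3, 1:3]               # rows 1-2 and columns 1-2: [5, 6] and [9, 10]
last_col = M[:, -1]               # negative indices count from the end: [3, 7, 11, 15]
wide = M.reshape(2, -1)           # -1 asks NumPy to infer that dimension (here 8)

print(block)
print(last_col)
print(wide.shape)
```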

## Broadcasting

The most important thing NumPy does is **broadcasting**, which means that it allows for arithmetic operations on arrays of different shapes.

In [11]:
# See https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html

a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b

array([ 2.,  4.,  6.])

In [12]:
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
a * b

array([ 2.,  4.,  6.])

The rule of thumb is that NumPy does arithmetic operations pairwise, but if a certain dimension is 1, then it will **broadcast** that effect across the dimension.

In [13]:
a = np.array([1.0, 2.0, 3.0])

B = np.zeros((3, 3)) # means a 3x3 matrix of all zeros

a + B

array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])

In [14]:
a = np.array([[1.0], [2.0], [3.0]])

B = np.zeros((3, 3))

a + B

array([[ 1.,  1.,  1.],
       [ 2.,  2.,  2.],
       [ 3.,  3.,  3.]])

In [15]:
1 + np.zeros((3, 3))

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
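
To make the rule of thumb concrete, here is a small sketch (the arrays are invented for illustration) that combines a column of shape `(3, 1)` with a row of shape `(3,)`. Each size-1 dimension is stretched to match the other operand, and shapes that cannot be aligned raise an error:

```python
import numpy as np

col = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1)
row = np.array([10.0, 20.0, 30.0])      # shape (3,), treated as (1, 3)

table = col + row                        # both are stretched to shape (3, 3)
print(table.shape)
print(table)

# Shapes that cannot be aligned raise a ValueError instead of guessing.
try:
    np.zeros((3, 3)) + np.zeros(4)
except ValueError as err:
    print("cannot broadcast:", err)
```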

## Linear Algebra

NumPy also supports many linear algebra operations.

In [16]:
A = 10 * np.random.rand(3, 3)
A = A.astype(int)
A

array([[8, 9, 4],
       [8, 2, 7],
       [2, 0, 6]])

In [17]:
A.T

array([[8, 8, 2],
       [9, 2, 0],
       [4, 7, 6]])

In [18]:
x = np.ones(3)
x

array([ 1.,  1.,  1.])

In [19]:
A @ x # Matrix vector multiplication

array([ 21.,  17.,   8.])

In [20]:
np.dot(A, x) # equivalent to above

array([ 21.,  17.,   8.])

In [21]:
A * x # elementwise multiply with broadcasting, NOT a matrix-vector product! see the broadcasting section

array([[ 8.,  9.,  4.],
       [ 8.,  2.,  7.],
       [ 2.,  0.,  6.]])

In [22]:
def generate_vector_in_subspace(A):
    return np.dot(A, np.random.rand(A.shape[1], 1))

In [23]:
b = generate_vector_in_subspace(A)
b

array([[ 14.73690914],
       [  9.74420931],
       [  2.81615397]])

In [24]:
np.linalg.solve(A, b)

array([[ 0.86042444],
       [ 0.79147891],
       [ 0.18255085]])

In [25]:
np.dot(np.linalg.inv(A), b)

array([[ 0.86042444],
       [ 0.79147891],
       [ 0.18255085]])
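
The last two cells give the same answer, but `np.linalg.solve` is generally preferred over forming the inverse explicitly, since it does less work and tends to be more accurate numerically. A quick sketch on a fixed 2x2 system (the numbers are made up so the result can be checked by hand):

```python
import numpy as np

A_small = np.array([[3.0, 1.0],
                    [1.0, 2.0]])
b_small = np.array([9.0, 8.0])

x_solve = np.linalg.solve(A_small, b_small)     # solves A x = b directly
x_inv = np.linalg.inv(A_small) @ b_small        # forms the inverse first

print(x_solve)                                  # close to [2, 3]
print(np.allclose(x_solve, x_inv))              # both routes agree
print(np.allclose(A_small @ x_solve, b_small))  # the solution satisfies A x = b
```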

## Conditions

In [26]:
A = np.arange(1, 10).reshape(3, 3) # arange is similar to range()

In [27]:
cond = (A < 5)
A[cond]

array([1, 2, 3, 4])

In [28]:
# np.random.rand generates a random matrix of some shape
B = np.random.rand(1, 9).reshape(3, 3)
B

array([[ 0.62031159,  0.93959651,  0.09240734],
       [ 0.90931146,  0.54706804,  0.61100535],
       [ 0.72553682,  0.60442829,  0.77494305]])

In [29]:
B[cond] # selects the elements of B where cond is True (here, the first four by row)

array([ 0.62031159,  0.93959651,  0.09240734,  0.90931146])
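
Boolean masks can also be combined, and `np.where` can use a mask to choose between two values. A minimal sketch (variable names are illustrative only):

```python
import numpy as np

M = np.arange(1, 10).reshape(3, 3)

mask = (M > 2) & (M % 2 == 0)      # combine conditions with & and |, not `and`/`or`
evens_above_two = M[mask]          # 1-D array of the matching values: [4, 6, 8]
clipped = np.where(M < 5, M, 0)    # keep values below 5, replace the rest with 0

print(evens_above_two)
print(clipped)
```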

## Other Operations

In [30]:
a = np.random.rand(100)

In [31]:
a.mean()

0.52788969202690161

In [32]:
a.sum()

52.788969202690161

In [33]:
np.median(a)

0.48371040961201928
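
The reductions above collapse a whole array to one number. On 2-D data, an `axis` argument reduces along one dimension only. A short sketch with a made-up matrix:

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

total = M.sum()              # 21.0, over every element
col_sums = M.sum(axis=0)     # [5, 7, 9], one value per column
row_means = M.mean(axis=1)   # [2, 5], one value per row

print(total, col_sums, row_means)
```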

## Exercises

### Broadcasting

In [None]:
x = np.array([1, 2])
y = np.array([[3], [4]])
x + y # what does this output

### Linear Algebra

In [34]:
x = np.arange(1000).reshape(1000, 1)
b = np.ones((1000, 1))
X = np.append(x, b, axis=1)

Y = 2 * x + 4 * b + np.random.random()  # keep x as a column so Y has shape (1000, 1), not (1000, 1000)

Use Least Squares Linear Regression to find $\hat{\theta}$, the weights on each column of $X$ such that $X\hat{\theta}$ models $Y$. Remember, the least-squares solution satisfies the normal equations: 

$$X^TX\hat{\theta} = X^TY$$
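
For reference, here is a brief sketch of where this equation comes from, assuming the loss is the sum of squared residuals (the exercise does not state the loss explicitly, so that is an assumption). Setting the gradient of the loss to zero gives the equation above:

$$
\begin{aligned}
L(\theta) &= \lVert Y - X\theta \rVert^2 = (Y - X\theta)^T (Y - X\theta) \\
\nabla_\theta L(\theta) &= -2 X^T (Y - X\theta) \\
\nabla_\theta L(\hat{\theta}) = 0 &\implies X^TX\hat{\theta} = X^TY
\end{aligned}
$$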

In [None]:
theta_hat = ...
theta_hat

Find the loss (the sum of squared residuals) of your model.

In [None]:
loss = ...
loss

## Final Notes on Numpy

NumPy makes scientific computing in Python possible. It's pretty fantastic. But there are many tiny details that might trip you up when using it in a practical setting. Sometimes it will have to do with using functions properly; other times it will be low-level mess-ups.

For a common one that often gets me annoyed (views versus copies), see <a href="http://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html">this link</a>.
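
The views-versus-copies issue covered at that link can be reproduced in a few lines. This is only a sketch with made-up values: basic slicing returns a view that shares memory with the original array, so writing to the slice also changes the original unless you call `.copy()`.

```python
import numpy as np

original = np.arange(5)          # [0, 1, 2, 3, 4]

view = original[1:4]             # basic slicing returns a view
view[0] = 100                    # ...so this also changes `original`

safe = original[1:4].copy()      # an explicit copy owns its own data
safe[1] = -1                     # `original` is untouched by this

print(original)                  # now holds 0, 100, 2, 3, 4
print(np.shares_memory(original, view), np.shares_memory(original, safe))
```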

For readers who are interested in speeding up these operations across multiple computers, or on graphics cards, see:
* <a href="https://github.com/cupy/cupy">CuPy</a> - Used in popular ML libraries
* <a href="https://github.com/enthought/distarray">DistArray</a> - I really don't know much about this

Also, whenever you cry for help with a NumPy function, remember you can always call `help(np.arange)` in a cell.

# Pandas

Pandas is a commonly used data processing library. 

Data is stored in **DataFrame** objects. A DataFrame is a collection of **Series** objects, each of which represents a column.
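
A tiny sketch of that relationship, using toy data invented for illustration: selecting a single column from a DataFrame gives back a Series.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "name": ["Ada", "Grace", "Alan"],
    "age": [36, 45, 41],
})

ages = df["age"]                 # selecting one column gives a Series
print(type(df).__name__)         # DataFrame
print(type(ages).__name__)       # Series
print(ages.max())                # Series carry their own methods, e.g. max()
```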

We'll go over an example EDA (exploratory data analysis) and feature engineering process on some data in Pandas.

In [35]:
import pandas as pd

titanic_train = pd.read_csv('data/titanic/train.csv')
titanic_test = pd.read_csv('data/titanic/test.csv')

First, let's look at the data itself.

In [36]:
titanic_train.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


Next, let's do some data cleaning. Are there any missing values?

In [37]:
titanic_train.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

The first column with missing values is **Age**. One way we can deal with missing *quantitative* data is **imputing** the missing values with the mean of the column.

We use the <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html">**.fillna**</a> method of Pandas to do this.

In [38]:
titanic_train['Age'] = titanic_train['Age'].fillna(titanic_train['Age'].mean())
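
To see what mean imputation does on a scale small enough to check by hand, here is a sketch on a toy Series (the values are made up):

```python
import numpy as np
import pandas as pd

ages = pd.Series([20.0, np.nan, 40.0, np.nan])   # toy values

# .mean() skips NaN by default, so the mean here is (20 + 40) / 2 = 30.
filled = ages.fillna(ages.mean())

print(filled.tolist())          # [20.0, 30.0, 40.0, 30.0]
print(ages.isnull().sum())      # 2: the original Series is unchanged
```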

The next column with missing values is **Cabin**. In general, the **Cabin** column is weird, so let's investigate it further. We use the <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html">.unique</a> method to look at the different values of the column.

In [39]:
titanic_train['Cabin'].unique()

array([nan, 'C85', 'C123', 'E46', 'G6', 'C103', 'D56', 'A6', 'C23 C25 C27',
       'B78', 'D33', 'B30', 'C52', 'B28', 'C83', 'F33', 'F G73', 'E31',
       'A5', 'D10 D12', 'D26', 'C110', 'B58 B60', 'E101', 'F E69', 'D47',
       'B86', 'F2', 'C2', 'E33', 'B19', 'A7', 'C49', 'F4', 'A32', 'B4',
       'B80', 'A31', 'D36', 'D15', 'C93', 'C78', 'D35', 'C87', 'B77',
       'E67', 'B94', 'C125', 'C99', 'C118', 'D7', 'A19', 'B49', 'D',
       'C22 C26', 'C106', 'C65', 'E36', 'C54', 'B57 B59 B63 B66', 'C7',
       'E34', 'C32', 'B18', 'C124', 'C91', 'E40', 'T', 'C128', 'D37',
       'B35', 'E50', 'C82', 'B96 B98', 'E10', 'E44', 'A34', 'C104', 'C111',
       'C92', 'E38', 'D21', 'E12', 'E63', 'A14', 'B37', 'C30', 'D20',
       'B79', 'E25', 'D46', 'B73', 'C95', 'B38', 'B39', 'B22', 'C86',
       'C70', 'A16', 'C101', 'C68', 'A10', 'E68', 'B41', 'A20', 'D19',
       'D50', 'D9', 'A23', 'B50', 'A26', 'D48', 'E58', 'C126', 'B71',
       'B51 B53 B55', 'D49', 'B5', 'B20', 'F G63', 'C62 C64', 'E24',

We can also look at the counts of each value in the column.

In [40]:
titanic_train['Cabin'].value_counts().head()

B96 B98        4
G6             4
C23 C25 C27    4
F2             3
C22 C26        3
Name: Cabin, dtype: int64

It seems that each entry has a floor letter and a room number. However, some entries list multiple cabins, and some are even more interesting: "T", "F E69". There are many ways to approach this data, but for now, let's just take the floor letter from each cabin and place it into a new column.

Note: this may not be the best way to use the Cabin column. If the goal is to predict whether a person survived, it may be important to keep not just the floor but also the cabin number. For example, if different people stayed in the same room, maybe they all survived or all died.

In [41]:
titanic_train['Floor'] = titanic_train['Cabin'].apply(lambda cabin: cabin[0] if isinstance(cabin, str) else cabin)  # missing cabins (NaN) stay NaN

In [42]:
titanic_train['Floor'].value_counts()

C    59
B    47
D    33
E    32
A    15
F    13
G     4
T     1
Name: Floor, dtype: int64

In [43]:
titanic_train['Floor'].unique()

array([nan, 'C', 'E', 'G', 'D', 'A', 'B', 'F', 'T'], dtype=object)
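
As an alternative sketch of the floor extraction (not the approach used in the cells above), the `.str` accessor works on a whole column at once and leaves missing values as NaN. The toy Series and the `n_cabins` feature are assumptions added for illustration:

```python
import numpy as np
import pandas as pd

cabins = pd.Series(["C85", np.nan, "F G73", "B57 B59 B63 B66", "T"])

floor = cabins.str[0]                    # first character; NaN stays NaN
n_cabins = cabins.str.split().str.len()  # number of cabins listed per entry

print(floor.tolist())
print(n_cabins.tolist())
```

Counting the cabins per entry is one possible way to keep some of the multi-cabin information that taking only the first letter throws away.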

Let's also take a look at the values in some of the remaining columns.

In [44]:
titanic_train['Sex'].unique()

array(['male', 'female'], dtype=object)

In [45]:
titanic_train['SibSp'].unique()

array([1, 0, 3, 4, 2, 5, 8])

In [46]:
titanic_train['Pclass'].unique()

array([3, 1, 2])

In [47]:
titanic_train['Parch'].unique()

array([0, 1, 2, 5, 3, 4, 6])

In [48]:
titanic_train['Embarked'].unique()

array(['S', 'C', 'Q', nan], dtype=object)

## Dropping Columns, Inplace

Above, when we wrote:

In [49]:
titanic_train['Age'] = titanic_train['Age'].fillna(titanic_train['Age'].mean())

We had to assign the result back to the column after calling **.fillna**. This is because almost all Pandas methods are **non-destructive** by default: an operation on a column returns a new column rather than replacing the old one.

For example, the <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html">**.drop**</a> method will not remove a column from a DataFrame; it will return a copy of the DataFrame without that column:

In [50]:
titanic_train['dummy'] = 1
titanic_train.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Floor | dummy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S | NaN | 1 |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | C | 1 |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S | NaN | 1 |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S | C | 1 |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S | NaN | 1 |


In [51]:
titanic_train.drop('dummy', axis=1).head() # axis=1 drops columns; to drop rows instead, use axis=0 (the default) and pass row index labels

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Floor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S | NaN |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S | NaN |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S | C |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S | NaN |


However, if we pass in **inplace=True**, then Pandas will delete the column in the original DataFrame. Many other methods in Pandas accept this argument.

In [52]:
titanic_train['dummy'] = 1
titanic_train.drop('dummy', inplace=True, axis=1)
titanic_train.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | Floor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S | NaN |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S | NaN |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S | C |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S | NaN |


Be warned: doing things inplace is dangerous! Say, for example, it took a really long time to load your dataset (maybe you had to do some web scraping, or you downloaded it directly from a URL and have since lost your Internet connection). If you do **drop** operations inplace, without saving the original state of the DataFrame, you could lose data.

In general, it is usually a good idea to save your DataFrame in states throughout your EDA.
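
A minimal sketch of that habit on a toy DataFrame (the column names and the `checkpoint` variable are made up for the example). It contrasts the default non-destructive `drop` with `inplace=True`, with a snapshot taken first:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "dummy": [0, 0]})   # toy data
checkpoint = df.copy()                  # snapshot before any destructive step

dropped = df.drop("dummy", axis=1)      # returns a new DataFrame
print(list(df.columns))                 # ['a', 'dummy']: original unchanged

df.drop("dummy", axis=1, inplace=True)  # modifies df itself and returns None
print(list(df.columns))                 # ['a']
print(list(checkpoint.columns))         # ['a', 'dummy']: snapshot still intact
```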

## One-hot encoding

A lot of the Titanic data is **categorical**. One way to prepare this kind of data for predictive modeling is **one-hot encoding**: a column with several distinct values is replaced by one 0/1 column per value. For example, "Pclass" takes the values 1, 2, 3, so it becomes 3 columns, and the value 2 turns into [0 1 0].

We use the <a href=https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html>**get_dummies**</a> function in Pandas.

Let's do this for some of the columns in the data.

In [53]:
titanic_train_copy = titanic_train.copy() # save the state of your DF

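def one_hot(df, columns):
    """Return a new DataFrame with each listed column replaced by its one-hot columns."""
    for column in columns:
        # one-hot encode the column; new columns are named e.g. Pclass_1, Pclass_2, Pclass_3
        col_onehot = pd.get_dummies(df[column], prefix=column)
        # non-destructive drop, so the DataFrame passed in by the caller is left untouched
        df = df.drop(column, axis=1).join(col_onehot)
    return df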

titanic_train_one_hot = one_hot(titanic_train_copy, ['Pclass', 'Sex', 'SibSp', 'Parch'])

In [54]:
titanic_train_one_hot.head()

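| | PassengerId | Survived | Name | Age | Ticket | Fare | Cabin | Embarked | Floor | Pclass_1 | ... | SibSp_4 | SibSp_5 | SibSp_8 | Parch_0 | Parch_1 | Parch_2 | Parch_3 | Parch_4 | Parch_5 | Parch_6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | Braund, Mr. Owen Harris | 22.0 | A/5 21171 | 7.25 | NaN | S | NaN | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | 38.0 | PC 17599 | 71.2833 | C85 | C | C | 1 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3 | 1 | Heikkinen, Miss. Laina | 26.0 | STON/O2. 3101282 | 7.925 | NaN | S | NaN | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | 35.0 | 113803 | 53.1 | C123 | S | C | 1 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 0 | Allen, Mr. William Henry | 35.0 | 373450 | 8.05 | NaN | S | NaN | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |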


NOTE: By default, `get_dummies` does not create a column for missing values, so a row with a missing value gets 0 in every one-hot column. When one-hot encoding a column with missing values, first replace them with a placeholder value, so that they become a category that `get_dummies` will create a column for.

In [55]:
titanic_train['Floor'] = titanic_train['Floor'].fillna('null')
pd.get_dummies(titanic_train['Floor'], prefix='Floor').head()

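| | Floor_A | Floor_B | Floor_C | Floor_D | Floor_E | Floor_F | Floor_G | Floor_T | Floor_null |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |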
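
As an alternative to the placeholder approach (a sketch, not what the cell above does), `get_dummies` also accepts a `dummy_na=True` argument that adds an indicator column for missing values. The toy Series below is invented for illustration:

```python
import numpy as np
import pandas as pd

floors = pd.Series(["C", np.nan, "E"], name="Floor")   # toy values

default = pd.get_dummies(floors, prefix="Floor")
with_na = pd.get_dummies(floors, prefix="Floor", dummy_na=True)

print(list(default.columns))           # ['Floor_C', 'Floor_E']
print(int(default.iloc[1].sum()))      # 0: the missing row matches no category
print(with_na.shape[1])                # 3: one extra indicator column for NaN
print(int(with_na.iloc[1, -1]))        # 1: the missing row is flagged in that column
```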


### Exercise

Clean the rest of the columns of the Titanic data set and use <a href="http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html">sklearn.linear_model.LogisticRegression</a> to create a model for the 'Survived' column.

# Answers to Exercises

### Broadcasting

In [56]:
x = np.array([1, 2])
y = np.array([[3], [4]])
x + y # what does this output

array([[4, 5],
       [5, 6]])

### Linear Algebra

In [57]:
x = np.arange(1000).reshape(1000, 1)
b = np.ones((1000, 1))
X = np.append(x, b, axis=1)

Y = 2 * np.arange(1000).reshape(1000, 1) + 4 + np.random.random()

In [58]:
theta_hat = np.linalg.solve(np.dot(X.T, X), np.dot(X.T, Y))
theta_hat

array([[ 2.        ],
       [ 4.41398437]])

In [59]:
loss = np.dot((Y - np.dot(X, theta_hat)).T, (Y - np.dot(X, theta_hat)))
loss

array([[  8.10227167e-19]])
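
As a cross-check on the normal-equations answer, `np.linalg.lstsq` solves the same least-squares problem without forming $X^TX$ explicitly. This sketch uses a fixed offset of 4.5 in place of the random one (an assumption made so the result is reproducible):

```python
import numpy as np

x = np.arange(1000).reshape(1000, 1)
X = np.append(x, np.ones((1000, 1)), axis=1)
Y = 2 * x + 4.5            # fixed offset instead of a random one

# lstsq returns the solution, the residuals, the rank of X, and its singular values
theta_lstsq, residuals, rank, _ = np.linalg.lstsq(X, Y, rcond=None)

print(theta_lstsq.ravel())   # approximately [2.0, 4.5]
print(rank)                  # 2: both columns of X are independent
```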