# Lecture 3. numpy, pandas, matplotlib

<center>
<img src="https://camo.githubusercontent.com/2eb0e1096ba504bf9e1446db83a6d1753c3af994/68747470733a2f2f6b6f7a696b6f772e66696c65732e776f726470726573732e636f6d2f323031362f30372f73637265656e73686f74312e706e673f773d31313430" alt="Drawing" style="width: 700px;"/>
</center>

<font color='orange'>
Seunghyeon Yu
</font>

# Overview

* intro
* numpy
* pandas
* matplotlib

## Why do we use packages?
> "Stand on the shoulders of giants"
> -Isaac Newton

* Time saving
* Stability (maintained and tested by many open source contributors on [GitHub](https://github.com/))
* Good documentation ([Read the Docs](https://readthedocs.org/))
* Have fun!

## Package Dependencies

<center>
<img src="https://qph.ec.quoracdn.net/main-qimg-fad2e6702b134e3853707daa53214314" alt="Drawing" style="width: 600px;"/>
</center>

## Package Structure
```
game/
    __init__.py
    sound/
        __init__.py
        echo.py
        wav.py
    graphic/
        __init__.py
        screen.py
        render.py
    play/
        __init__.py
        run.py
        test.py
```
* `__init__.py`: marks a directory as a (regular) Python package.
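
As a rough sketch of how such a tree is imported, the snippet below rebuilds a small part of the `game/` layout in a temporary directory and imports from it. The function `echo_test` and the file contents are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of part of the tree above in a temporary directory.
root = Path(tempfile.mkdtemp())
sound_dir = root / "game" / "sound"
sound_dir.mkdir(parents=True)
(root / "game" / "__init__.py").write_text("")
(sound_dir / "__init__.py").write_text("")
# Hypothetical module content, for illustration only.
(sound_dir / "echo.py").write_text("def echo_test():\n    return 'echo'\n")

sys.path.insert(0, str(root))      # make the temporary directory importable
importlib.invalidate_caches()

from game.sound import echo                 # import a module from a subpackage
from game.sound.echo import echo_test       # import one name from that module

result = echo_test()
print(echo.__name__, result)
```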

# NumPy

<center>
<img src="https://www.kdnuggets.com/wp-content/uploads/numpy-logo.jpg" alt="Drawing" style="width: 300px;"/>
</center>

* Similar to MATLAB
* N-dimensional array object (`ndarray`)
* Mathematical functions
* Fast, vectorized calculation
* Linear algebra, Fourier transform, random number generation, Einstein summation ...

In [None]:
import numpy as np

### Arrays
**1D Array**
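
A minimal sketch of creating and inspecting a 1D array; the values are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3])   # build a 1D array from a Python list
print(a.ndim)             # number of dimensions
print(a.shape)            # shape is a tuple with one entry
print(a[0], a[-1])        # first and last element
a[0] = 10                 # arrays are mutable
print(a)
```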

**2D Array**
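
A 2D array can be sketched the same way from nested lists; again the values are arbitrary.

```python
import numpy as np

b = np.array([[1, 2, 3],
              [4, 5, 6]])   # nested lists -> 2 rows, 3 columns
print(b.ndim, b.shape)      # dimensions and (rows, columns)
print(b[0, 1])              # element at row 0, column 1
print(b.dtype)              # element type inferred from the input
```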

NumPy also provides many functions to create arrays:
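
A few commonly used creation functions, as a non-exhaustive sketch (shapes and fill values are arbitrary):

```python
import numpy as np

zeros = np.zeros((2, 3))          # 2x3 array of 0.0
ones = np.ones((2, 2))            # 2x2 array of 1.0
full = np.full((2, 2), 7)         # 2x2 array filled with 7
eye = np.eye(3)                   # 3x3 identity matrix
steps = np.arange(0, 10, 2)       # 0, 2, 4, 6, 8
grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points from 0 to 1
rand = np.random.random((2, 2))   # uniform samples in [0, 1)

print(zeros, ones, full, eye, steps, grid, rand, sep="\n")
```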

### Array Indexing

In [None]:
A = np.array([[1, 2, 3, 4],  # ----> x-axis
              [5, 6, 7, 8],  # |
              [9,10,11,12]]) # v
                             #   y-axis
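
The usual indexing styles on `A`, as a short sketch. The first index selects the row (y-axis), the second the column (x-axis).

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

first = A[0, 0]             # single element
row = A[1]                  # second row
col = A[:, 1]               # second column
block = A[:2, 1:3]          # rows 0-1, columns 1-2
large = A[A > 6]            # boolean mask, returns a flat 1D array
picked = A[[0, 2], [1, 3]]  # integer-array indexing: A[0, 1] and A[2, 3]

print(first, row, col, block, large, picked, sep="\n")
```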

### Array Math

In [None]:
X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[5, 6],
              [7, 8]])

**Elementwise Math**
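
Arithmetic operators on arrays act element by element. A small sketch with the `X` and `Y` defined above:

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[5, 6],
              [7, 8]])

added = X + Y        # elementwise sum
product = X * Y      # elementwise product (NOT the matrix product)
ratio = X / Y        # elementwise division, result is float
squared = X ** 2     # elementwise power
roots = np.sqrt(X)   # NumPy functions also apply elementwise

print(added, product, ratio, squared, roots, sep="\n")
```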

**Matrix Operations**
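
Matrix operations use dedicated operators and functions. A minimal sketch with the same `X` and `Y`:

```python
import numpy as np

X = np.array([[1, 2],
              [3, 4]])
Y = np.array([[5, 6],
              [7, 8]])

matmul = X @ Y               # matrix product, same as np.dot(X, Y) for 2D arrays
transposed = X.T             # transpose
X_inv = np.linalg.inv(X)     # inverse (X must be square and non-singular)
det = np.linalg.det(X)       # determinant

print(matmul, transposed, X_inv, det, sep="\n")
```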

**[EX 1] Ordinary Least Squares**
$$
    \beta = (X'X)^{-1}X'y
$$

In [None]:
X = np.array([[1, 2, 1],
              [3, 4, 2],
              [1, 2, 3],
              [1, 0, 5]])
y = np.array([2, 4, 2, 0])
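
One possible solution sketch: translate the formula literally, then cross-check against `np.linalg.lstsq`, which is usually the numerically safer choice. In this data `y` equals the second column of `X`, so the fit is exact.

```python
import numpy as np

X = np.array([[1, 2, 1],
              [3, 4, 2],
              [1, 2, 3],
              [1, 0, 5]])
y = np.array([2, 4, 2, 0])

# Literal translation of beta = (X'X)^{-1} X'y
beta = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check with NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta)
print(beta_lstsq)
```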

**Random Number Generation**

In [None]:
u = np.array([1, 2, 3])    # mean vector
S = np.array([[7, 1, 3],   # covariance matrix
              [1, 5, 1],
              [3, 1, 4]])
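
A sketch of drawing from the multivariate normal distribution with mean `u` and covariance `S`. It uses the `np.random.default_rng` generator API; the seed `0` and the sample size are arbitrary choices for this illustration.

```python
import numpy as np

u = np.array([1, 2, 3])    # mean vector
S = np.array([[7, 1, 3],   # covariance matrix
              [1, 5, 1],
              [3, 1, 4]])

rng = np.random.default_rng(0)                       # seeded for reproducibility
samples = rng.multivariate_normal(u, S, size=10000)  # one draw per row

print(samples.shape)
print(samples.mean(axis=0))            # should be close to u
print(np.cov(samples, rowvar=False))   # should be close to S
```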

## Tensor Operations (Optional)

**[About Tensors](https://www.tensorflow.org/versions/r0.12/resources/dims_types)**

| Rank | Math entity            | Python example                             | Tensor notation |
|------|------------------------|--------------------------------------------|-----------------|
| 0    | Scalar (rank 0 tensor) | `s = 483`                                  | $T$             |
| 1    | Vector (rank 1 tensor) | `v = [1.1, 2.2, 3.3]`                      | $T_{i}$         |
| 2    | Matrix (rank 2 tensor) | `m = [[1, 2, 3], [4, 5, 6]]`               | $T_{ij}$        |
| 3    | Rank 3 tensor          | `t = [[[1, 2], [4, 5]], [[6, 7], [8, 9]]]` | $T_{ijk}$       |
| n    | Rank n tensor          | ....                                       | $T_{ijkl...}$   |

### einsum (Einstein Summation Convention) 


In [None]:
x, y = np.array([1,2,3,4]), np.array([1,2,1,2])

**vector sum**
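
A minimal sketch: an index that appears on the left of `->` but not on the right is summed over.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

total = np.einsum('i->', x)   # sum over i, same as x.sum()
print(total)
```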

**dot product**
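
As a sketch, a repeated index that is dropped from the output is multiplied and summed:

```python
import numpy as np

x, y = np.array([1, 2, 3, 4]), np.array([1, 2, 1, 2])

dot = np.einsum('i,i->', x, y)   # sum_i x_i * y_i, same as x @ y
print(dot)
```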

**elementwise product**
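
Keeping the repeated index in the output suppresses the summation, which can be sketched as:

```python
import numpy as np

x, y = np.array([1, 2, 3, 4]), np.array([1, 2, 1, 2])

prod = np.einsum('i,i->i', x, y)   # x_i * y_i for each i, same as x * y
print(prod)
```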

**outer product**
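
With two different indices that both survive, every pair of elements is multiplied. A short sketch:

```python
import numpy as np

x, y = np.array([1, 2, 3, 4]), np.array([1, 2, 1, 2])

outer = np.einsum('i,j->ij', x, y)   # outer[i, j] = x_i * y_j, same as np.outer(x, y)
print(outer)
```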

In [None]:
X = np.array([[0,1,1,1],
             [2,3,1,4], 
             [1,2,1,4], 
             [5,2,1,2]])
Y = np.array([[4,2,3,1],
              [1,4,2,3],
              [1,2,1,1],
              [2,1,4,2]])

$X\vec{y}$
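
A sketch of the matrix-vector product $(X\vec{y})_i = \sum_j X_{ij} y_j$ with the arrays defined above:

```python
import numpy as np

X = np.array([[0, 1, 1, 1],
              [2, 3, 1, 4],
              [1, 2, 1, 4],
              [5, 2, 1, 2]])
y = np.array([1, 2, 1, 2])

Xy = np.einsum('ij,j->i', X, y)   # sum over the column index j, same as X @ y
print(Xy)
```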

$\vec{y}^T X$
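
The vector-matrix product $(\vec{y}^T X)_j = \sum_i y_i X_{ij}$ follows the same pattern; a brief sketch:

```python
import numpy as np

X = np.array([[0, 1, 1, 1],
              [2, 3, 1, 4],
              [1, 2, 1, 4],
              [5, 2, 1, 2]])
y = np.array([1, 2, 1, 2])

yX = np.einsum('i,ij->j', y, X)   # sum over the row index i, same as y @ X
print(yX)
```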

$XY$
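
The matrix product $(XY)_{ik} = \sum_j X_{ij} Y_{jk}$ can be sketched as:

```python
import numpy as np

X = np.array([[0, 1, 1, 1],
              [2, 3, 1, 4],
              [1, 2, 1, 4],
              [5, 2, 1, 2]])
Y = np.array([[4, 2, 3, 1],
              [1, 4, 2, 3],
              [1, 2, 1, 1],
              [2, 1, 4, 2]])

XY = np.einsum('ij,jk->ik', X, Y)   # sum over the shared index j, same as X @ Y
print(XY)
```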

**Tensor Vector Product**

<center>
<img src="https://i.stack.imgur.com/5QsMD.png" alt="Drawing" style="width: 600px;"/>
</center>

For more details, see [here](http://venus.unive.it/r.casarin/PhDEco/Mat/Algebra.pdf).

In [None]:
T = np.random.randn(4,4,4)

$T_{ijk} y^i$

In [None]:
np.einsum('ijk,i->jk',T, y)

# Pandas

<center>
<img src="https://pandas.pydata.org/_static/pandas_logo.png" alt="Drawing" style="width: 600px;"/>
</center>

* Similar to R's data frames
* DataFrame object
* Apply functions
* Data merge, join, concatenate ...
* Data operations ...

In [None]:
import pandas as pd

## Series
A Series is a 1D array plus an index (labels, or keys).
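
A minimal sketch of building a Series; the values and labels are arbitrary.

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])
print(s)          # labels on the left, values on the right
print(s.values)   # the underlying 1D NumPy array
print(s.index)    # the labels
```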

**Index**
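
As a sketch, a Series without an explicit index gets integer labels, and the index can be replaced afterwards:

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3])     # no index given -> default labels 0..3
print(s.index)

s.index = ['a', 'b', 'c', 'd']   # relabel in place
print(s.index)
print('b' in s)                  # membership tests look at the labels, not the values
```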

**Indexing**
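
A short sketch of the main ways to select from a Series:

```python
import pandas as pd

s = pd.Series([4, 7, -5, 3], index=['a', 'b', 'c', 'd'])

by_label = s.loc['b']      # select by label
by_position = s.iloc[0]    # select by integer position
several = s[['a', 'c']]    # list of labels -> Series
sliced = s['b':'c']        # label slices include BOTH endpoints
positive = s[s > 0]        # boolean mask

print(by_label, by_position, several, sliced, positive, sep="\n")
```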

**Elementwise Math**
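
Arithmetic on Series is elementwise and aligned by label, not by position. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
s2 = pd.Series([10.0, 20.0, 30.0], index=['b', 'c', 'd'])

doubled = s1 * 2      # scalar broadcast
expd = np.exp(s1)     # NumPy functions keep the index
total = s1 + s2       # aligned by label; 'a' and 'd' have no partner -> NaN

print(doubled, expd, total, sep="\n")
```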

**Missing Value**
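
Missing values are stored as `NaN`. A sketch of detecting, dropping and filling them (the fill value `0` is an arbitrary choice):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, None], index=['a', 'b', 'c', 'd'])

missing = s.isnull()    # True where the value is missing (None is treated as NaN)
dropped = s.dropna()    # remove missing entries
filled = s.fillna(0)    # replace missing entries with 0
avg = s.mean()          # reductions skip NaN by default

print(missing, dropped, filled, avg, sep="\n")
```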

## DataFrame

In [None]:
data = {'state' : ['VA', 'VA', 'VA', 'MD', 'MD'],
        'year' : [2012, 2013, 2014, 2014, 2015],
        'popul' : [5.0, 5.1, 5.2, 4.0, 4.1]}

In [None]:
df = pd.DataFrame(data)                     # dict -> pd.DataFrame
df

In [None]:
pd.DataFrame([[5.0, 'VA', 2012],           # list of rows + columns -> pd.DataFrame
              [5.1, 'VA', 2013],
              [5.2, 'VA', 2014],
              [4.0, 'MD', 2014],
              [4.1, 'MD', 2015]],
             columns = ['popul', 'state', 'year'])

**Retrieving**
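
A sketch of retrieving columns, rows and single cells from the `df` built above:

```python
import pandas as pd

data = {'state': ['VA', 'VA', 'VA', 'MD', 'MD'],
        'year': [2012, 2013, 2014, 2014, 2015],
        'popul': [5.0, 5.1, 5.2, 4.0, 4.1]}
df = pd.DataFrame(data)

years = df['year']               # one column -> Series
sub = df[['state', 'popul']]     # list of columns -> DataFrame
first_row = df.loc[0]            # one row by index label -> Series
cell = df.loc[0, 'popul']        # single cell by (row label, column name)
middle = df.iloc[1:3]            # rows by integer position (end excluded)

print(years, sub, first_row, cell, middle, sep="\n")
```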

> **<font color='red'> Caution! </font>** **Not Recommended**

> Attribute access with a dot (`df.year`) can collide with existing DataFrame methods and attributes. Prefer `df['year']`.
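
A small sketch of such a collision, using a hypothetical column named `count`, which is also the name of a DataFrame method:

```python
import pandas as pd

# 'count' is a made-up column name chosen because DataFrame.count exists.
df_c = pd.DataFrame({'count': [1, 2, 3], 'year': [2012, 2013, 2014]})

print(df_c.year)              # works: 'year' does not clash with any attribute
print(callable(df_c.count))   # True: this is the DataFrame.count method, not the column
print(df_c['count'])          # bracket access always returns the column
```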

**Add, Delete Columns**
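
A sketch of adding and removing columns. The column names `debt` and `large` are made up for this illustration.

```python
import pandas as pd

data = {'state': ['VA', 'VA', 'VA', 'MD', 'MD'],
        'year': [2012, 2013, 2014, 2014, 2015],
        'popul': [5.0, 5.1, 5.2, 4.0, 4.1]}
df = pd.DataFrame(data)

df['debt'] = 0.0                      # a scalar is broadcast to every row
df['large'] = df['popul'] > 4.5       # derive a column from an existing one
cols_after_add = list(df.columns)

del df['debt']                        # delete in place
df = df.drop(columns=['large'])       # drop returns a new DataFrame
print(cols_after_add, list(df.columns))
```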

**Add, Delete Rows**
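
A sketch of adding and removing rows. The new rows contain illustrative values only, not real data.

```python
import pandas as pd

data = {'state': ['VA', 'VA', 'VA', 'MD', 'MD'],
        'year': [2012, 2013, 2014, 2014, 2015],
        'popul': [5.0, 5.1, 5.2, 4.0, 4.1]}
df = pd.DataFrame(data)

df.loc[5] = ['MD', 2016, 4.2]         # assigning to a new label appends a row

# Illustrative extra row; pd.concat stacks DataFrames vertically.
new_rows = pd.DataFrame({'state': ['DC'], 'year': [2016], 'popul': [0.7]})
df = pd.concat([df, new_rows], ignore_index=True)

df = df.drop(index=[0, 1])            # delete rows by index label
print(df)
```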

**Set Index**
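
A sketch of promoting one or more columns to the index and back:

```python
import pandas as pd

data = {'state': ['VA', 'VA', 'VA', 'MD', 'MD'],
        'year': [2012, 2013, 2014, 2014, 2015],
        'popul': [5.0, 5.1, 5.2, 4.0, 4.1]}
df = pd.DataFrame(data)

by_year = df.set_index('year')            # returns a new DataFrame
rows_2014 = by_year.loc[2014]             # the index may contain duplicates

multi = df.set_index(['state', 'year'])   # two columns -> hierarchical index
va_2013 = multi.loc[('VA', 2013), 'popul']

back = by_year.reset_index()              # move the index back into a column
print(rows_2014, va_2013, back, sep="\n")
```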

**Transpose**

**Selection, Filtering**
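
A sketch of filtering rows with boolean masks. Conditions are combined with `&` and `|`, and each condition needs its own parentheses.

```python
import pandas as pd

data = {'state': ['VA', 'VA', 'VA', 'MD', 'MD'],
        'year': [2012, 2013, 2014, 2014, 2015],
        'popul': [5.0, 5.1, 5.2, 4.0, 4.1]}
df = pd.DataFrame(data)

va = df[df['state'] == 'VA']                                  # single condition
recent_md = df[(df['state'] == 'MD') & (df['year'] >= 2015)]  # combined conditions
subset = df.loc[df['popul'] > 4.5, ['state', 'year']]         # filter rows, pick columns
late = df[df['year'].isin([2014, 2015])]                      # membership test

print(va, recent_md, subset, late, sep="\n")
```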

**Apply Functions**

<center>
<img src="https://i.stack.imgur.com/DL0iQ.jpg" alt="Drawing" style="width: 500px;"/>
</center>

In [None]:
def my_func(col):
    return np.mean(col)

mask = [i for i in df.columns if i != 'state']
df[mask].apply(my_func, axis=0)

In [None]:
df[mask].apply(lambda col: np.mean(col), axis=0)

**Elementwise Apply Function**

In [None]:
df[mask].applymap(lambda x: x/10 + 2)   # pandas >= 2.1: applymap is deprecated, use df[mask].map(...) instead

**Sorting**

In [None]:
df.sort_values(by=['year', 'popul'])

## Data Merge, Join

<center>
<img src="https://i.stack.imgur.com/hMKKt.jpg" alt="Drawing" style="width: 500px;"/>
</center>

See the [pandas merging documentation](https://pandas.pydata.org/pandas-docs/stable/merging.html).

**Concatenate**
<center>
<img src="https://pandas.pydata.org/pandas-docs/stable/_images/merging_concat_basic.png" alt="Drawing" style="width: 400px;"/>
</center>

In [None]:
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                     index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']},
                    index=[4, 5, 6, 7])

df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],     
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']},
                   index=[8, 9, 10, 11])

In [None]:
pd.concat([df1, df2, df3])      # axis=0

**Outer Join**
<center>
<img src="https://pandas.pydata.org/pandas-docs/stable/_images/merging_concat_axis1.png" alt="Drawing" style="width: 600px;"/>
</center>

In [None]:
df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
                    'D': ['D2', 'D3', 'D6', 'D7'],
                    'F': ['F2', 'F3', 'F6', 'F7']},
                   index=[2, 3, 6, 7])

In [None]:
pd.concat([df1, df4], axis=1) 

**Inner Join**
<center>
<img src="https://pandas.pydata.org/pandas-docs/stable/_images/merging_concat_axis1_inner.png" alt="Drawing" style="width: 600px;"/>
</center>

In [None]:
pd.concat([df1, df4], axis=1, join='inner') 

**Left Join**
<center>
<img src="https://pandas.pydata.org/pandas-docs/stable/_images/merging_concat_axis1_join_axes.png" alt="Drawing" style="width: 600px;"/>
</center>

In [None]:
pd.concat([df1, df4], axis=1).reindex(df1.index)   # join_axes was removed in pandas 1.0

**More about SQL-like structures**

| **Merge method** | **SQL join name**  | **Description**                           |
| :- | :- | :- |
|      `left`      |  `LEFT OUTER JOIN` |       Use keys from left frame only       |
|      `right`     | `RIGHT OUTER JOIN` |       Use keys from right frame only      |
|      `outer`     |  `FULL OUTER JOIN` |     Use union of keys from both frames    |
|      `inner`     |    `INNER JOIN`    | Use intersection of keys from both frames |

For details, see the [pandas merging documentation](https://pandas.pydata.org/pandas-docs/stable/merging.html).
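
The four merge methods in the table can be compared with `pd.merge` and its `how` parameter. This is a sketch on two small made-up frames that share a `key` column.

```python
import pandas as pd

# Hypothetical frames: K0 exists only on the left, K3 only on the right.
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K1', 'K2', 'K3'], 'B': ['B1', 'B2', 'B3']})

keys = {}
for how in ['left', 'right', 'outer', 'inner']:
    merged = pd.merge(left, right, on='key', how=how)
    keys[how] = merged['key'].tolist()   # which keys survive each join
    print(how, keys[how])
```

Keys without a partner in the other frame get `NaN` in the columns that come from that frame.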

# Matplotlib
<center>
<img src="https://matplotlib.org/_static/logo2.png" alt="Drawing" style="width: 500px;"/>
</center>

* 2D and 3D plotting
* Interactive mode in Jupyter

In [None]:
import matplotlib.pyplot as plt

## Basic Line Plot

In [None]:
x = 0.1 * np.arange(100)   # 0.0, 0.1, ..., 9.9
y = np.sin(x)
plt.plot(x, y)
plt.show()

In [None]:
plt.plot(x, y, 'r--')
plt.title('x vs y')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

## Scatter Plot

In [None]:
x = np.random.randn(100)
y = np.random.randn(100)
colors = np.random.randn(100)
area = np.pi * (15*np.random.rand(100))**2
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()

## Bar Chart

In [None]:
x = [1, 2, 3, 4]
y = [0.7, 0.3, 0.5, 0.2]
plt.figure(figsize=(10,5))
plt.bar(x, y, align='center', alpha=0.5)
plt.xticks(x, ['Value', 'Momentum', 'Low PER', 'High B/M'])
plt.ylabel('Sharpe Ratio')
plt.show()

## Distribution Plot

In [None]:
x = 0.5 + np.random.randn(10000)
plt.hist(x, alpha=0.5)
plt.show()

In [None]:
y = -0.5 + 0.2*np.random.randn(10000)
plt.hist(x, alpha=0.5, color='b', label='x', density=True, bins=30)   # density replaces the removed normed argument
plt.hist(y, alpha=0.5, color='r', label='y', density=True, bins=10)
plt.legend()
plt.show()

## 3D Plot

In [None]:
from mpl_toolkits.mplot3d import Axes3D
%matplotlib notebook

**3D Line**

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')   # fig.gca(projection=...) is no longer supported
theta = np.linspace(-4 * np.pi, 4 * np.pi, 100)
z = np.linspace(-2, 2, 100)
r = z**2 + 1
x = r * np.sin(theta)
y = r * np.cos(theta)
ax.plot(x, y, z)

**3D Scatter**

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
x = np.random.randn(30)
y = np.random.randn(30)
z = 10*np.random.randn(30)
ax.scatter(x, y, z, marker = '.')
plt.show()

**3D Surface**

In [None]:
from matplotlib import cm

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')   # fig.gca(projection=...) is no longer supported
x = np.arange(-5, 5, 0.1)
y = np.arange(-5, 5, 0.1)
x, y = np.meshgrid(x, y)
r = np.sqrt(x**2 + y**2)
z = np.sin(r)/r
surf = ax.plot_surface(x, y, z, linewidth=0, cmap=cm.coolwarm)
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()

## Useful Resources

* [Python Packages Binaries for Windows](https://www.lfd.uci.edu/~gohlke/pythonlibs/)
* [Data-science-notebooks](https://github.com/donnemartin/data-science-ipython-notebooks#deep-learning)

# END