![data-x](http://oi64.tinypic.com/o858n4.jpg)

---
# Data-X Crash Course


**Author:** Alexander Fred-Ojala


**License Agreement:** Feel free to do whatever you want with this code

___

# Outline

> ### 0. Note on tools
> ### 1. Intro to Jupyter Notebooks
> ### 2. Numpy, pandas and matplotlib
> ### 3. Scikit-learn and Machine Learning
> ### 4. Tensorflow / Keras for Deep Learning

-----

# 1. Jupyter Notebooks

1. Markdown cells to annotate work
2. Code cells to run code

----
### 1.1 Markdown cells

Format text as **bold**, _italic_, or `code`.

#### Latex:
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,\mathrm{d}x = 1$$

#### Images:
<img src='imgs/dx_logo.png' width='300px'>

#### Bullet lists:
- Unordered
- list

1. Ordered
2. list

<br><br>

**Find a lot of useful Markdown commands here:** https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet

---

### 1.2 Code cells

Run Python code sequentially

In [None]:
1+2
3+2

In [None]:
def add(x, y):
    return x + y

add(2,2)

In [None]:
import time
s = 'Data-X is the best UC Berkeley class!! '
l = len(s)

# Scrolling-text demo: loops forever, so interrupt the kernel to stop it
while True:
    for i in range(l):
        print(s[-i:]+s[:-i],end='\r',flush=True)
        time.sleep(.1)

##### For more info about Jupyter Notebooks and Python see: [The Intro Notebook](../01-intro/python-jupyter-basics_short_afo.ipynb)

---

## 2. Python's data science trinity:
### Numpy & Pandas (& Matplotlib)

In [None]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

## 2.1 Numpy
Python's numerical computing library. Well suited to multidimensional arrays, data manipulation, random data generation, and broadcasting.
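
Broadcasting is the least obvious item in that list, so here is a minimal sketch of it. The array names (`m`, `row`, `col`) are made up for illustration: NumPy stretches the smaller operand along the missing or length-1 dimension so the shapes match.

```python
import numpy as np

# A (2, 3) matrix and a (3,) row vector: the vector is applied to both rows
m = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
row_sum = m + row           # result has shape (2, 3)

# A (2, 1) column is stretched across the three columns instead
col = np.array([[100], [200]])
col_sum = m + col           # result has shape (2, 3)

print(row_sum)
print(col_sum)
```

No explicit loop or copy of the smaller array is needed; the shapes only have to be compatible dimension by dimension.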

In [None]:
np.array([[1,2,3],[4,5,6]])

In [None]:
np.random.randn(10).reshape(5,2)

In [None]:
np.eye(5)

In [None]:
x = np.linspace(0,3*np.pi,7)
x

In [None]:
np.round(np.sin(x),10)

In [None]:
x = np.linspace(0,10,100)
y = np.sin(x) + np.random.normal(0,.1,100)
plt.plot(x,y);

##### For more info about Numpy see: [The Numpy Notebook](../02-AI-stack/numpy.ipynb)

---

## 2.2 Pandas
Python's tabular data library (my favorite!)

In [None]:
df = pd.DataFrame(np.random.randn(120).reshape(20,6))
df.head()

In [None]:
dfg = pd.read_csv('./02-AI-stack/data/googl.csv').drop('Unnamed: 0',axis=1) # Google stock data

In [None]:
dfg.head()

In [None]:
dfg = dfg.set_index(pd.to_datetime(dfg['Date'])).drop('Date',axis=1)

In [None]:
dfg.head()

In [None]:
dfg.loc['2017-04'] # all rows from April 2017 (row selection by date string needs .loc in recent pandas)
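
Selecting a whole month with a string such as `'2017-04'` relies on partial string indexing of a `DatetimeIndex`. The sketch below reproduces the idea on synthetic data, so it runs without the CSV file. The frame `demo` and its values are invented for illustration and are not the Google prices. It uses `.loc`, which supports this kind of row selection across pandas versions.

```python
import numpy as np
import pandas as pd

# Synthetic daily data for Jan-Mar 2017 (made-up values, not real stock prices)
dates = pd.date_range('2017-01-01', '2017-03-31', freq='D')
base = np.arange(len(dates), dtype=float)
demo = pd.DataFrame({'Low': base, 'High': base + 2.0}, index=dates)

feb = demo.loc['2017-02']                    # every row dated February 2017
spread = (feb['High'] - feb['Low']).mean()   # average daily High-Low gap

print(len(feb), spread)
```

The same pattern extends to years (`'2017'`) and to slices such as `demo.loc['2017-01-15':'2017-02-15']`.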

In [None]:
dfg.describe()

In [None]:
feb = dfg.loc['2017-02'] # rows from February 2017
np.mean(feb['High']-feb['Low']) # average daily High-Low spread

In [None]:
dfg.loc['2017-02',['Low','High']].plot()
plt.title('Feb 2017 Low High for Google')
plt.xlabel('Date')
plt.ylabel('Price');

##### For more info about Pandas see: [The Pandas Notebook](../02-AI-stack/pandas.ipynb)

##### For more info about Matplotlib see: [The Matplotlib Notebook](../02-AI-stack/matplotlib.ipynb)

---

## 3. Machine Learning with Scikit-Learn

In [None]:
from sklearn import datasets

In [None]:
data = datasets.load_iris()

In [None]:
print(data.DESCR[:1050])

In [None]:
X = pd.DataFrame(data.data,columns=data.feature_names)
X.head()

In [None]:
Y = pd.Series(data.target,index=data.target_names[data.target]) # label each row with its species name
Y.value_counts()

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_val, Y_train, Y_val = train_test_split(X,Y,test_size=0.4,shuffle=True,random_state=1)

In [None]:
X_train.shape, Y_train.shape

In [None]:
X_val.shape, Y_val.shape

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

#### Logistic Regression (Softmax)

In [None]:
model = LogisticRegression() # instantiate
model.fit(X_train,Y_train) # train
print('Accuracy on validation:',np.round(model.score(X_val,Y_val)*100,2),'%') # predict and evaluate
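
For scikit-learn classifiers, `score` returns mean accuracy, i.e. the fraction of validation samples whose predicted label matches the true one. The sketch below checks this by hand on the same iris split settings. The variable names and `max_iter=1000` are assumptions made here, the latter only to give the solver room to converge.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_tr, X_va, y_tr, y_va = train_test_split(
    iris.data, iris.target, test_size=0.4, shuffle=True, random_state=1)

# max_iter=1000 is an assumption of this sketch, not a setting from the notebook
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Accuracy computed manually: share of correct predictions on the validation set
manual_acc = np.mean(clf.predict(X_va) == y_va)
print(manual_acc, clf.score(X_va, y_va))
```

The same check works for the other classifiers in this section, since they share the `fit` / `predict` / `score` interface.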

#### Random Forest

In [None]:
model = RandomForestClassifier() # instantiate
model.fit(X_train,Y_train) # train
print('Accuracy on validation:',np.round(model.score(X_val,Y_val)*100,2),'%') # predict and evaluate

#### Support Vector Machine

In [None]:
model = LinearSVC() # instantiate
model.fit(X_train,Y_train) # train
print('Accuracy on validation:',np.round(model.score(X_val,Y_val)*100,2),'%') # predict and evaluate

##### For more info about Scikit-learn & ML workflow see: [The Titanic Notebook](../04-tools-prediction-titanic/titanic.ipynb)

----

## 4. Neural Networks / Deep Learning: Keras (TensorFlow backend)

In [None]:
from keras.models import Sequential
from keras.layers import Dense

In [None]:
model = Sequential()
model.add(Dense(100,input_shape=[4],activation='relu'))
model.add(Dense(50,activation='relu'))
model.add(Dense(3,activation='softmax'))

In [None]:
model.summary()
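
The parameter counts in the summary can be reproduced by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. A small sketch, where the list `layer_sizes` is just a restatement of the 4 → 100 → 50 → 3 architecture:

```python
# Layer widths of the network: 4 inputs -> 100 -> 50 -> 3 outputs
layer_sizes = [4, 100, 50, 3]

# Each Dense layer has (inputs * units) weights plus one bias per unit
params = [n_in * n_out + n_out
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(params)

print(params, total)  # [500, 5050, 153] 5703
```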

In [None]:
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])

In [None]:
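# One-hot encode the labels; float dtype avoids passing boolean columns to Keras
Y_train = pd.get_dummies(Y_train,dtype='float32')
Y_val = pd.get_dummies(Y_val,dtype='float32')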
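
The one-hot step is needed because `categorical_crossentropy` compares a probability vector per sample against a target vector of the same length. Below is a rough NumPy sketch of that computation. The labels and the predicted probabilities are hypothetical values chosen for illustration, and Keras's own implementation adds numerical safeguards not shown here.

```python
import numpy as np

labels = np.array([0, 2, 1])      # integer class ids, like the iris targets
one_hot = np.eye(3)[labels]       # one row per sample, a single 1 per row

# Hypothetical softmax outputs for the three samples (each row sums to 1)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.2, 0.5, 0.3]])

# Categorical cross-entropy: mean over samples of -sum(y_true * log(y_pred))
loss = -np.mean(np.sum(one_hot * np.log(probs), axis=1))

print(one_hot)
print(loss)
```

Only the probability assigned to the true class contributes to each sample's loss, which is why confident correct predictions drive the loss toward zero.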

In [None]:
model.fit(X_train,Y_train,batch_size=10,epochs=100,verbose=0)

In [None]:
_,acc = model.evaluate(X_val,Y_val,verbose=0)
print('Accuracy',np.round(acc*100),'%')

##### For more info about Keras / TensorFlow see: [TensorFlow](../05-deep-learning/) and [Keras](../05-deep-learning/catsVSdogs)