# I. Outline of key points for neural networks

### Perceptron
1. One-hot encoding (for input)  
pd.get_dummies(data)  
sklearn.preprocessing.OneHotEncoder(sparse = False).fit_transform(data)
2. Normalization (for input)
3. Linear combination (for output)
4. Step function (discrete output)
5. Sigmoid function (continuous output)
$$S(t)=\frac{1}{1+e^{-t}}$$
$$S'(t)=S(t)(1-S(t))$$
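6. Softmax function (multi-class classification)
$$\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$$
7. Perceptron algorithm (weight adjustment)
8. Maximum likelihood (the log transformation turns the product of probabilities into a sum)
9. Cross-entropy (inversely related to the probability of the correct outcome: the higher that probability, the lower the cross-entropy)  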
For a binary (0-1) problem:
$$\text{Cross-Entropy}(y', y) = -\sum_{i=1}^m \left[ y'_i \ln(y_i) + (1-y'_i)\ln(1-y_i) \right]$$
where $y_i$ is the predicted probability that data point $i$ belongs to class 1, $y'_i$ is its true label (0 or 1), and $m$ is the number of data points.  
For a multi-class problem:
$$\text{Cross-Entropy} = -\sum_{i=1}^m \sum_{j=1}^n y'^{(j)}_i \ln(y^{(j)}_i)$$
where $n$ is the number of classes.
10. Error function
$$E(w, b) = -\frac{1}{m} \sum_{i=1}^m \left[ y'_i \ln(\sigma(wx_i+b)) + (1-y'_i)\ln(1-\sigma(wx_i+b)) \right]$$
11. Gradient Descent
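
The sigmoid, softmax, and cross-entropy formulas in the list above can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation; the function names and the small label and probability lists are made up for illustration.

```python
import numpy as np

def sigmoid(t):
    """Sigmoid S(t) = 1 / (1 + e^(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_prime(t):
    """Derivative S'(t) = S(t) * (1 - S(t))."""
    s = sigmoid(t)
    return s * (1.0 - s)

def softmax(x):
    """Softmax over a 1-D array of scores."""
    x = np.asarray(x, dtype=float)
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()

def binary_cross_entropy(y_true, y_pred):
    """Cross-entropy for 0-1 labels; y_pred holds P(label = 1) per point."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def multiclass_cross_entropy(y_true, y_pred):
    """Cross-entropy for one-hot labels; rows are points, columns are classes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return -np.sum(y_true * np.log(y_pred))

# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1]
mostly_right = [0.9, 0.1, 0.8]
mostly_wrong = [0.2, 0.7, 0.1]

print(sigmoid(0.0), sigmoid_prime(0.0))  # 0.5 0.25
print(softmax([2.0, 1.0, 0.1]))          # probabilities that sum to 1
print(binary_cross_entropy(y_true, mostly_right))
print(binary_cross_entropy(y_true, mostly_wrong))
```

The predictions that put high probability on the correct labels should give the smaller cross-entropy, which is the sense in which cross-entropy is inversely related to the probability of the correct outcome.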

### Neural Network
1. Feedforward Neural Network
2. Backpropagation
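
As a rough illustration of the two items above, here is a sketch of a network with one hidden layer: a feedforward pass followed by a backpropagation update. The layer sizes, the XOR toy data, the learning rate, and all function names are assumptions chosen for the example. It assumes sigmoid activations and the cross-entropy error function from the perceptron outline.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def feedforward(X, W1, b1, W2, b2):
    """Propagate inputs through one hidden layer to the output."""
    h = sigmoid(X @ W1 + b1)      # hidden activations, shape (m, hidden)
    y_hat = sigmoid(h @ W2 + b2)  # predictions, shape (m, 1)
    return h, y_hat

def error(y, y_hat):
    """Mean cross-entropy error over all points."""
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def backprop_step(X, y, W1, b1, W2, b2, learn_rate=0.1):
    """One gradient descent update using backpropagated errors."""
    m = len(X)
    h, y_hat = feedforward(X, W1, b1, W2, b2)
    # With a sigmoid output and cross-entropy error, the error term at the
    # output pre-activation simplifies to (y_hat - y).
    delta_out = (y_hat - y) / m
    # Propagate the error back through W2 and the hidden sigmoid.
    delta_hidden = (delta_out @ W2.T) * h * (1 - h)
    W2 = W2 - learn_rate * (h.T @ delta_out)
    b2 = b2 - learn_rate * delta_out.sum(axis=0)
    W1 = W1 - learn_rate * (X.T @ delta_hidden)
    b1 = b1 - learn_rate * delta_hidden.sum(axis=0)
    return W1, b1, W2, b2

# Hypothetical toy data (XOR) and a 2-3-1 network with random initial weights.
rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

initial_error = error(y, feedforward(X, W1, b1, W2, b2)[1])
for _ in range(200):
    W1, b1, W2, b2 = backprop_step(X, y, W1, b1, W2, b2)
h, y_hat = feedforward(X, W1, b1, W2, b2)
final_error = error(y, y_hat)
print(initial_error, final_error)
```

The error should go down over the updates. Whether the network fully learns XOR depends on the initial weights and the number of steps, so the sketch only prints the error before and after.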

In [37]:
# 1. One-hot encoding
# example 1:
import pandas as pd
from sklearn import preprocessing
data = pd.read_excel('data.xlsx')

method_pd = pd.get_dummies(data, columns = ['y'])
# Note: scikit-learn 1.2 renamed the `sparse` argument to `sparse_output`.
sklearn_onehot = preprocessing.OneHotEncoder(sparse = False)
method_sklearn = sklearn_onehot.fit_transform(data['y'].values.reshape(-1, 1))

# example 2:
print('Example 2:')
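from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
print(enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]))
# n_values_ and feature_indices_ exist only in older scikit-learn releases
# (removed in 0.22); newer releases expose enc.categories_ instead.
print(enc.n_values_)
print(enc.feature_indices_)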
print(enc.transform([[0, 1, 1]]).toarray())

Example 2:
OneHotEncoder(categorical_features='all', dtype=<class 'numpy.float64'>,
       handle_unknown='error', n_values='auto', sparse=True)
[2 3 4]
[0 2 5 9]
[[1. 0. 0. 1. 0. 0. 1. 0. 0.]]


### Problems when training neural networks
1. Overfitting & underfitting  
(1) Model complexity graph (used to choose the number of epochs; early stopping)  
(2) L1 regularization (tends to give sparse weights, useful for feature selection) and L2 regularization (the usual choice for training); both penalize large weights  
(3) Dropout: randomly turn off some nodes in each epoch  
For example, with a dropout probability of 0.2, each node is turned off with probability 0.2, so about 20% of the nodes are off in a given epoch.
2. Bias & variance
3. Gradient descent (local minima, vanishing gradients)  
(1) Activation functions (tanh, ReLU)  
(2) Stochastic gradient descent (mini-batches: each step uses a different subset of the data)  
(3) Learning rate (decrease the learning rate when the model is not improving)  
(4) Random restart  
(5) Momentum $\beta$ (a constant between 0 and 1, so earlier steps count less and less)
$$STEP(n) \leftarrow STEP(n)+\beta\, STEP(n-1)+\beta^2\, STEP(n-2)+\dots$$
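
The momentum sum does not need to be recomputed from scratch at every step: keeping a running value and multiplying it by $\beta$ each time produces the same weighted sum. The sketch below shows this; the function name `momentum_steps` and the example steps are hypothetical.

```python
def momentum_steps(raw_steps, beta):
    """Return the momentum-adjusted step for each raw gradient descent step.

    The n-th adjusted step equals
    raw[n] + beta * raw[n-1] + beta**2 * raw[n-2] + ...
    """
    adjusted = []
    running = 0.0
    for step in raw_steps:
        # Multiplying the running value by beta discounts all earlier steps.
        running = step + beta * running
        adjusted.append(running)
    return adjusted

# Three identical raw steps (illustrative values) with beta = 0.5.
print(momentum_steps([1.0, 1.0, 1.0], beta=0.5))  # [1.0, 1.5, 1.75]
```

Because consecutive steps in the same direction add up, momentum can help carry gradient descent over small bumps that would otherwise trap it in a local minimum.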


# II. Algorithms

### 1. Perceptron algorithm
* step 1: start with random weights: w1, ..., wn, b  
* step 2: for every misclassified point  
```
if prediction = 0 (the true label is 1):
    for i in range(n):
        change wi to wi + a*xi
    change b to b + a
if prediction = 1 (the true label is 0):
    for i in range(n):
        change wi to wi - a*xi
    change b to b - a
where a is the learning rate
```

In [1]:
import numpy as np
# Setting the random seed, feel free to change it and see different solutions.
np.random.seed(42)

def stepFunction(t):
    if t >= 0:
        return 1
    return 0

def prediction(X, W, b):
    return stepFunction((np.matmul(X,W)+b)[0])

# Perceptron trick: receives as inputs the data X, the labels y,
# the weights W (as an array), and the bias b,
# updates W and b according to the perceptron algorithm,
# and returns W and b.
def perceptronStep(X, y, W, b, learn_rate = 0.01):
    for i in range(len(X)):
        y_hat = prediction(X[i],W,b)
        if y[i]-y_hat == 1:
            W[0] += X[i][0]*learn_rate
            W[1] += X[i][1]*learn_rate
            b += learn_rate
        elif y[i]-y_hat == -1:
            W[0] -= X[i][0]*learn_rate
            W[1] -= X[i][1]*learn_rate
            b -= learn_rate
    return W, b
    
# This function runs the perceptron algorithm repeatedly on the dataset,
# and returns a few of the boundary lines obtained in the iterations,
# for plotting purposes.
# Feel free to play with the learning rate and the num_epochs,
# and see your results plotted below.
def trainPerceptronAlgorithm(X, y, learn_rate = 0.01, num_epochs = 25):
    x_min, x_max = min(X.T[0]), max(X.T[0])
    y_min, y_max = min(X.T[1]), max(X.T[1])
    W = np.random.rand(2, 1)
    b = np.random.rand(1)[0] + x_max
    # These are the solution lines that get plotted below.
    boundary_lines = []
    for i in range(num_epochs):
        # In each epoch, we apply the perceptron step.
        W, b = perceptronStep(X, y, W, b, learn_rate)
        boundary_lines.append((-W[0]/W[1], -b/W[1]))
    return boundary_lines

### 2. Gradient Descent
* step 1: start with random weights: w1, ..., wn, b  
* step 2: for every point $(x_1, x_2, ..., x_n)$  
```
for i in range(n):
    update wi to wi - a*partial(E)/partial(wi)
update b to b - a*partial(E)/partial(b)
where a is the learning rate
```
* Workshop: Implementing the Gradient Descent Algorithm
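
The update rule above can be sketched for a single sigmoid unit with the cross-entropy error. In that case the partial derivatives for one point reduce to $-(y - \hat{y})x_i$ for each weight and $-(y - \hat{y})$ for the bias. This is a minimal sketch, not the workshop solution; the toy data, the function names, the learning rate, and the number of epochs are all assumptions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def error(X, y, w, b):
    """Mean cross-entropy error E(w, b) over the dataset."""
    y_hat = sigmoid(X @ w + b)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def update_weights(x, y, w, b, learn_rate):
    """Gradient descent update for a single point (x, y)."""
    y_hat = sigmoid(np.dot(x, w) + b)
    # dE/dw_i = -(y - y_hat) * x_i and dE/db = -(y - y_hat) for this point,
    # so stepping against the gradient adds learn_rate * (y - y_hat) * x_i.
    w = w + learn_rate * (y - y_hat) * x
    b = b + learn_rate * (y - y_hat)
    return w, b

def train(X, y, epochs=200, learn_rate=0.1, seed=0):
    """Step 1: random weights. Step 2: update on every point, every epoch."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    b = 0.0
    errors = [error(X, y, w, b)]
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            w, b = update_weights(x_i, y_i, w, b, learn_rate)
        errors.append(error(X, y, w, b))
    return w, b, errors

# Hypothetical, well-separated toy data: class 0 near (0, 0), class 1 near (5, 5).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0], [5.0, 5.0]])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

w, b, errors = train(X, y)
print(errors[0], errors[-1])
```

Updating after each point, rather than after the whole dataset, keeps the code close to the step-by-step description. A batch version would average the same per-point gradients before applying a single update.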

### 3. Backpropagation
* Workshop: Student Admissions
* Reference:  
(1) https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/  
(2) http://neuralnetworksanddeeplearning.com/chap2.html