# Basic Neural Network Model
## Keras

In this lab, we are going to drill down into some Neural Network basics using the Keras package with the TensorFlow backend.

## Artificial neuron

Recall the concept of a [neuron](https://en.wikipedia.org/wiki/Artificial_neuron) based on its mathematical formula.

$$ y_k = \varphi \left( \sum_{j=0}^{m}{w_{kj}x_j} +b_k \right) $$

When the activation function $\varphi$ is the identity, this is a simple linear neuron.
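
As a minimal sketch of the formula above, the following NumPy snippet computes the output of a single neuron. The function names `neuron` and `sigmoid`, as well as the weights, bias, and inputs, are illustrative assumptions and not part of Keras.

```python
import numpy as np

def neuron(x, w, b, phi=lambda z: z):
    """Return phi(sum_j w_j * x_j + b) for a single neuron."""
    return phi(np.dot(w, x) + b)

def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])     # two input values (made up for illustration)
w = np.array([0.5, -0.25])   # illustrative weights
b = 0.1                      # illustrative bias

linear_out = neuron(x, w, b)             # identity activation: a linear neuron
sigmoid_out = neuron(x, w, b, sigmoid)   # same weighted sum, logistic activation
print(linear_out, sigmoid_out)
```

Only the activation `phi` differs between the two calls; the weighted sum is the same.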

Keras, like other NN packages, supports numerous types of neurons.
Typically, neurons are composed into layers, and a single layer has only a single type of neuron.

In this lab, we will first look at data that is easily separable, then at data that is less separable.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

import os, sys
import itertools
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import scale, LabelBinarizer
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Random seed for numpy
np.random.seed(18937)

## Consider data that is easy to divide

First, we will generate some data that is easily separated.
A single decision along the first axis is enough to separate it.

In [None]:
from sklearn.datasets import make_blobs  # the samples_generator submodule is no longer available in current scikit-learn
X, y = make_blobs(n_samples=300, centers=2, n_features=2)
X = scale(X, with_std=False)  # Center X (subtract the mean, leave the variance unchanged)
plt.figure(figsize=(7,7))
plt.scatter(X[:,0], X[:,1], c=y)

### Construct a neural network

Now we will construct a basic Neural Network with
 * One hidden layer fed by 2 input values
 * One output layer
 
##### Note: The summary will show that we have 5 total learnable parameters:
  * 3 for the hidden layer ($X_0$, $X_1$, and bias) 
  * 2 for the output layer (Hidden ($H_0$) and bias) 
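
The parameter counts in the note above follow from a simple rule for dense layers: one weight per input per neuron, plus one bias per neuron. A small sketch of that arithmetic, where `dense_param_count` is a hypothetical helper and not a Keras function:

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_param_count(n_inputs=2, n_units=1)  # X_0, X_1, and bias
output = dense_param_count(n_inputs=1, n_units=1)  # H_0 and bias
total = hidden + output
print(hidden, output, total)  # 3 2 5
```

The same helper can be used to predict the summary of the wider and deeper models later in this lab.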
  

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation

# Build a model that is composed of this list of layers
model = Sequential(
    [
          # This specifies a single neuron, and the input is 2 numbers.
    Dense(1, input_dim=2),  # a dense layer, every neuron is connected to all points from the lower layer (input)
    Activation('linear'),   # Specify the type of decision surface, i.e., simple linear regression
    Dense(1),               # another dense layer, input_dim is inferred from the previous layer's output
    Activation('sigmoid')   # Specify the type of decision surface, i.e., simple logistic regression
    ]
)
model.summary()

This number of trainable parameters highlights both the power and the cost of NN models.
Clearly, a two-parameter model should be sufficient here (think of a decision tree that specifies a feature and a decision point on that feature).
However, we are training 5 parameters, a 150% increase over the decision tree that would achieve the same result.
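
To make the two-parameter comparison concrete, here is a rough sketch of such a decision stump. The data is synthetic and generated only for this illustration (it is not the lab's `make_blobs` data), and the names `X_demo`, `y_demo`, `feature`, and `threshold` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated synthetic clusters, centered at x = -3 and x = +3
cluster_0 = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(50, 2))
cluster_1 = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(50, 2))
X_demo = np.vstack([cluster_0, cluster_1])
y_demo = np.array([0] * 50 + [1] * 50)

# The stump's only two parameters: which feature, and where to split it
feature, threshold = 0, 0.0
pred = (X_demo[:, feature] > threshold).astype(int)
stump_accuracy = (pred == y_demo).mean()

# Relative parameter overhead of the 5-parameter network
nn_params, stump_params = 5, 2
increase_pct = 100 * (nn_params - stump_params) / stump_params
print(stump_accuracy, increase_pct)
```

On data this cleanly separated, the hand-picked split is enough, which is the point of the comparison.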

In [None]:
# For a binary classification problem we are defining our loss as "binary" and the measurement as cross-entropy
model.compile(optimizer='rmsprop',  # this is an optimizer name, we will revisit this part later!
              loss='binary_crossentropy',
              metrics=['accuracy'])

In [None]:
# Train the model, iterating on the data in batches of 4 samples
model.fit(X, y, epochs=10, batch_size=4)

#  Recall, an epoch is a round of training in which the model sees all the training data one time
#  Epoch = all 300 training samples here
#  Batch is the number of training samples fed forward through the network before the
#          accumulated error is pushed back
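
The relationship between epochs and batches described in the comments above can be checked with a little arithmetic. This is a sketch using the values from this cell (300 samples, batches of 4, 10 epochs); the variable names are illustrative.

```python
import math

n_samples, batch_size, epochs = 300, 4, 10

# One weight update happens per batch; the last batch may be smaller
updates_per_epoch = math.ceil(n_samples / batch_size)
total_updates = updates_per_epoch * epochs
print(updates_per_epoch, total_updates)  # 75 750
```

A larger batch size would mean fewer, smoother updates per epoch; a smaller one means more, noisier updates.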


# Jiggle the data

Below we jiggle the data a little bit to create a test set.
This is done by generating some random noise and adding it to the existing data points.

In [None]:
X_test = X + np.random.normal(0.0, 0.5, X.size).reshape(300,2)
plt.figure(figsize=(7,7))
plt.scatter(X_test[:,0], X_test[:,1], c=y)

In [None]:
score = model.evaluate(X_test, y, batch_size=4, verbose=1)

### To understand what we get from the model evaluation, let's look at the function through help.

In [None]:
help(model.evaluate)

In [None]:
model.metrics_names

In [None]:
score

So, our loss was very small and the accuracy was 1.0.
We should have expected this!
The data was easy!!!
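
To see what the two numbers in `score` measure, here is a minimal NumPy sketch of binary cross-entropy and accuracy. The predicted probabilities are made up for illustration, and the helper functions are assumptions written for this example rather than the Keras implementations.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean negative log-likelihood of the true labels under probabilities p."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def accuracy(y_true, p, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    return float(np.mean((p > threshold).astype(int) == y_true))

y_true = np.array([0, 0, 1, 1])
p_confident = np.array([0.01, 0.02, 0.98, 0.99])  # made-up, confident predictions
p_unsure = np.array([0.40, 0.45, 0.55, 0.60])     # made-up, hesitant predictions

loss_confident = binary_crossentropy(y_true, p_confident)
loss_unsure = binary_crossentropy(y_true, p_unsure)
acc_confident = accuracy(y_true, p_confident)
acc_unsure = accuracy(y_true, p_unsure)
print(loss_confident, acc_confident)
print(loss_unsure, acc_unsure)
```

Both sets of predictions are fully accurate, but the confident ones have a much smaller loss. This is why loss and accuracy are reported separately.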

---
## Consider data that is less easy to divide

In [None]:
from sklearn.datasets import make_blobs  # the samples_generator submodule is no longer available in current scikit-learn
X, y = make_blobs(n_samples=1000, centers=2, n_features=2, random_state = 76533)
X = scale(X, with_std = False) # Center X
plt.figure(figsize=(7,7))
plt.scatter(X[:,0], X[:,1], c=y)

In [None]:
# Test data
X_test = X + np.random.normal(0.0, 0.5, X.size).reshape(int(X.size/2),2)
plt.figure(figsize=(7,7))
plt.scatter(X_test[:,0], X_test[:,1], c=y)

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation

modelV = Sequential(
    [
    Dense(1, input_dim=2),
    Activation('linear'),
    Dense(1),
    Activation('sigmoid')
    ]
)
# For a binary classification problem
modelV.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

In [None]:
# Train the model, iterating on the data in batches of 4 samples
modelV.fit(X, y, epochs=10, batch_size=4)

#### Notice the accuracy takes longer to get above 90%

In [None]:
score = modelV.evaluate(X_test, y, batch_size=4)
score

### Now let's adjust the hidden layer from 1 to 2 neurons

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation

modelW = Sequential(
    [
    Dense(2, input_dim=2),
    Activation('linear'),
    Dense(1),
    Activation('sigmoid')
    ]
)
# For a binary classification problem
modelW.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Train the model, iterating on the data in batches of 4 samples
modelW.fit(X, y, epochs=10, batch_size=4)

In [None]:
score = modelW.evaluate(X_test, y, batch_size=4)
score

### Now let's adjust the hidden layers from 1 layer with 2 neurons to 2 layers with 2 neurons

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation

modelY = Sequential(
    [
    Dense(2, input_dim=2),
    Activation('linear'),
        # Notice we are adding a new hidden layer
    Dense(2),  # input_dim is inferred from the previous layer's output
    Activation('linear'),
    Dense(1),
    Activation('sigmoid')
    ]
)
# For a binary classification problem
modelY.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Train the model, iterating on the data in batches of 4 samples
modelY.fit(X, y, epochs=10, batch_size=4)

In [None]:
score = modelY.evaluate(X_test, y, batch_size=4)
score
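
One thing worth keeping in mind when comparing these scores: hidden layers with a `linear` activation compose into a single linear map, so stacking them adds parameters without adding new kinds of decision surfaces. The NumPy sketch below illustrates this with random weights; the names `W1`, `b1`, `W2`, `b2` are illustrative and are not read from the trained Keras models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear "hidden layers" with 2 neurons each (random illustrative weights)
W1, b1 = rng.normal(size=(2, 2)), rng.normal(size=2)
W2, b2 = rng.normal(size=(2, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 2))  # five sample inputs with 2 features

# Forward pass through both layers, using output = x @ W + b
two_layers = (x @ W1 + b1) @ W2 + b2

# The equivalent single linear layer
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))
```

Swapping the hidden activations for a non-linear one (see the activation functions link below) is what lets extra layers represent more complex boundaries.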

---
# Keras API and helpful links

 * Layers: https://keras.io/layers/core/
 * Loss / Loss Functions : https://keras.io/losses/
 * Optimizers (learning algorithm) : https://keras.io/optimizers/
 * Neuron Activation Functions : https://keras.io/activations/
 
#### Now, look at using a customized optimizer:

We will specify the Stochastic Gradient Descent optimizer (vector calculus fun)
```
sgd = optimizers.SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
```
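
For intuition about what the `momentum` argument does, here is a rough NumPy sketch of the classical SGD-with-momentum update applied to the toy function $f(w) = w^2$. It is an illustration under simplifying assumptions (no Nesterov correction, no learning-rate decay) and is not the Keras implementation; `sgd_momentum_step` is a hypothetical helper.

```python
import numpy as np

def sgd_momentum_step(w, velocity, grad, lr=0.001, momentum=0.9):
    """One classical momentum update: blend the old velocity with the new gradient step."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Minimize f(w) = w^2, whose gradient is 2w, starting from w = 1
w, v = np.array([1.0]), np.zeros(1)
for _ in range(2000):
    w, v = sgd_momentum_step(w, v, grad=2 * w)
print(w)
```

With `momentum=0`, each step would depend on the current gradient alone; a higher momentum carries over more of the previous steps' direction.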

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import optimizers

modelZ = Sequential(
    [
    Dense(2, input_dim=2),
    Activation('linear'),
    Dense(1),
    Activation('sigmoid')
    ]
)

# Change the learning rate and momentum from their defaults
sgd = optimizers.SGD(lr=0.001, momentum=0.1)

# For a binary classification problem
modelZ.compile(optimizer=sgd,  # previously we used the string 'rmsprop', which gave us that optimizer with default values!
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Train the model, iterating on the data in batches of 4 samples
modelZ.fit(X, y, epochs=10, batch_size=4)

In [None]:
score = modelZ.evaluate(X_test, y, batch_size=4)
score

### Please restart the kernel and clear all output, then play around with parameters or add cells and create additional notebooks

# Save your notebook