# Lecture 14 Neural Network Example
__MATH 3480__ - Dr. Michael Olson

Reading:
* Geron, Chapter 10

## Building an ANN

To build an ANN, we first create our input and output layers.

### Input
How many values (features) go into the model for each example?

In the MNIST dataset, each image has a resolution of 28x28, for a total of 28*28=784 pixels. So, our input layer has a width of 784, one node for each pixel.
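
As a quick check of the arithmetic above, here is a minimal sketch of flattening one image into an input vector. The array `image` is a made-up stand-in for a real MNIST image, and NumPy is assumed to be installed.

```python
import numpy as np

# A stand-in for one 28x28 grayscale image (values are made up for illustration).
image = np.arange(28 * 28).reshape(28, 28)

# Flatten row by row into a single vector: one entry per pixel.
input_vector = image.reshape(-1)

print(image.shape)         # (28, 28)
print(input_vector.shape)  # (784,)
```

Each of the 784 entries feeds one node of the input layer.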

### Output
What is the desired output? How many nodes are needed to represent that output?

In the MNIST dataset, the images are of numerical digits. So, the output is going to be one of the 10 digits. If we create 10 nodes (one for each digit), then we can use a softmax activation function so that each node outputs a probability between 0 and 1 and the 10 probabilities sum to 1. So, our output layer will have a width of 10.
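
To make the softmax step concrete, here is a small sketch in NumPy. The function `softmax` and the values in `raw_scores` are illustrative assumptions, not output from a trained model.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical raw outputs of the 10 output nodes for one image.
raw_scores = np.array([1.0, 0.5, 0.2, 3.0, 0.1, 0.0, 0.3, 2.0, 0.4, 0.6])
probabilities = softmax(raw_scores)

print(probabilities.round(3))
print(probabilities.sum())            # 1 up to floating-point rounding
print(int(np.argmax(probabilities)))  # index of the most likely digit
```

The node with the largest raw score (index 3 here) also gets the largest probability, so the predicted digit is the index of the largest output.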

### Hidden layers
Now, we determine the hidden layers. We have to decide how many hidden layers are needed and the width of each hidden layer.

#### Number of hidden layers
A single hidden layer can be enough for simple problems, but deep neural networks (more than one hidden layer) have a higher parameter efficiency: for complex problems, they can model the same functions using exponentially fewer neurons than shallow networks. Loosely, the layers of a deep network form a hierarchy:
* Lower hidden layers (layers near the input) model low-level structures (e.g., line segments of various shapes and orientations)
* Intermediate hidden layers combine these low-level structures to model intermediate-level structures (e.g., squares, circles)
* Higher hidden layers (layers near the output) model high-level structures (e.g., faces)

*Transfer learning*: reusing the lower layers of one trained model in a new model for a related task. For example, the lower layers of the face-recognition network above already recognize line segments, so they can serve as the starting point for a network that identifies animals instead of faces.

#### Width of hidden layers
Early models used a pyramid structure: larger layers leading to gradually smaller layers. In practice, this does not perform any better than using the same number of neurons in every hidden layer. In fact, equally sized layers tend to perform slightly better than layers of decreasing size.

That said, it can help to make the first hidden layer larger than the rest and keep the remaining hidden layers roughly the same size.

In the past, practitioners would retrain a network several times, gradually increasing the number of neurons until the model started to overfit. A more modern approach is to start with more neurons than needed and use early stopping to keep the model from overfitting.
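
The idea behind early stopping can be sketched in a few lines of plain Python. This is only an illustration of the logic, not how Keras implements it: the helper `early_stopping_epoch`, its `patience` parameter, and the validation losses are all made up for this example. (In Keras, the `keras.callbacks.EarlyStopping` callback provides this behavior during `fit`.)

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. Epochs are counted from 0.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never ran out of patience: training used every epoch.
    return len(val_losses) - 1, best_epoch

# Made-up validation losses: improving at first, then rising (overfitting).
val_losses = [0.60, 0.45, 0.40, 0.38, 0.39, 0.41, 0.42, 0.45]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

Here training would stop at epoch 6, and the weights from epoch 3 (the lowest validation loss) are the ones worth keeping.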

## Example with Fashion MNIST dataset

* Create a new virtual environment for Tensorflow and Keras
* Install Tensorflow and Keras
* Check for the version of Tensorflow and Keras that you are using

In [None]:
import tensorflow as tf
from tensorflow import keras

print(tf.__version__)
print(keras.__version__)

#### Load the data

In [None]:
# Load the dataset
fashion_mnist = keras.datasets.fashion_mnist
(X, y), (X_test, y_test) = fashion_mnist.load_data()

# There are 60,000 images of size 28x28
X.shape

In [None]:
X_test.shape

In [None]:
# NOTE! Each pixel of the image is represented as an integer value from 0 to 255.
# Two problems:
   # We want values from 0 to 1
   # The values are integers, not floats
# To fix both, divide by 255.0


# We have a test set, but we need a validation set:
X_valid, X_train = X[:5000] / 255.0 , X[5000:] / 255.0
y_valid, y_train = y[:5000], y[5000:]
X_test = X_test / 255.0
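
The effect of dividing by 255.0 can be checked on a tiny array. This is a sketch with made-up pixel values; the array `raw` only mimics the `uint8` type of the raw Fashion MNIST pixels.

```python
import numpy as np

# Made-up 2x2 "image" stored as unsigned 8-bit integers, like the raw pixels.
raw = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Dividing by a float converts the result to floating point and rescales to [0, 1].
scaled = raw / 255.0

print(raw.dtype, raw.min(), raw.max())
print(scaled.dtype, scaled.min(), scaled.max())
print(scaled)
```

Both problems are fixed in one step: the values now lie between 0 and 1, and they are floats.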

In [None]:
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
[class_names[y_train[i]] for i in range(20) ]

#### Plot the training images

In [None]:
import matplotlib.pyplot as plt

# Show the first 20 training images side by side
f, ax = plt.subplots(1, 20, sharey=True, figsize=(25,6))
for i in range(20):
    ax[i].imshow(X_train[i], cmap='gray_r')              # Reversed grayscale colormap
    ax[i].axis('off')

#### Create the model

In [None]:
model = keras.models.Sequential()                        # Sequential model - a single stack of layers
model.add(keras.layers.Flatten(input_shape=[28,28]))     # Input layer - flattens each 28x28 image into a 1D array of 784 values
model.add(keras.layers.Dense(300, activation="relu"))    # Hidden layer
model.add(keras.layers.Dense(100, activation="relu"))    # Hidden layer
model.add(keras.layers.Dense(10, activation="softmax"))  # Output layer

In [None]:
# This cell builds the same model as the previous cell

from tensorflow.keras.layers import Dense
model = keras.models.Sequential([                        # Sequential model - a single stack of layers
    keras.layers.Flatten(input_shape=[28,28]),           # Input layer - flattens each 28x28 image into a 1D array of 784 values
    Dense(300, activation="relu"),                       # Hidden layer
    Dense(100, activation="relu"),                       # Hidden layer
    Dense(10, activation="softmax")                      # Output layer
])

In [None]:
model.summary()
# "Param #" is the number of weights and biases leading into that layer
# The Flatten layer passes 784 values to the first hidden layer, which has 300 neurons
# That is 784*300 connections, so 784*300 = 235200 weights plus 300 biases
# for a total of 235200 + 300 = 235500 parameters
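
The same count can be repeated for every layer. The helper below is a hypothetical convenience function (not part of Keras) that applies the "weights plus biases" rule to the layer widths used in this lecture.

```python
def dense_param_counts(layer_sizes):
    """Return the number of weights plus biases for each Dense layer.

    layer_sizes lists the width of every layer, starting with the input.
    """
    counts = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        counts.append(n_in * n_out + n_out)  # weights + one bias per neuron
    return counts

# Layer widths used above: 784 inputs, then Dense(300), Dense(100), Dense(10).
counts = dense_param_counts([784, 300, 100, 10])
print(counts)       # [235500, 30100, 1010]
print(sum(counts))  # 266610
```

These numbers should match the "Param #" column and the total reported by `model.summary()`.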

In [None]:
# Get a list of the layers
model.layers

In [None]:
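# Get the weights and biases leading into the first hidden layer
# (before training: the weights are randomly initialized and the biases start at 0)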
weights, biases = model.layers[1].get_weights()
weights

In [None]:
biases

#### Train the model

In [None]:
# Compile the model
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",                           # sgd = Stochastic Gradient Descent
              metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs = 30, validation_data=(X_valid, y_valid))
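
The loss name is worth unpacking: "sparse" means the labels are integer class indices (0 to 9) rather than one-hot vectors, and the cross-entropy is the average of -log(probability given to the correct class). The sketch below computes it by hand in NumPy. The function and the example probabilities are assumptions for illustration, and it skips details of the Keras version such as clipping probabilities away from 0.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean of -log(probability assigned to the correct class)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    # For each sample, pick the predicted probability of its true class.
    picked = y_prob[np.arange(len(y_true)), y_true]
    return float(-np.log(picked).mean())

# Made-up predictions over 3 classes for 2 samples.
example_prob = [[0.7, 0.2, 0.1],
                [0.1, 0.1, 0.8]]
example_labels = [0, 2]  # integer class indices, not one-hot vectors

loss = sparse_categorical_crossentropy(example_labels, example_prob)
print(round(loss, 4))
```

A confident, correct prediction contributes a loss near 0, while giving the true class a small probability is penalized heavily.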

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

pd.DataFrame(history.history).plot(figsize=(8,5))
plt.grid(True)
plt.gca().set_ylim(0,1)
plt.show()

In [None]:
model.evaluate(X_test, y_test)

#### Test the model

In [None]:
# Show the first 3 test images
f, ax = plt.subplots(1, 3, sharey=True, figsize=(6,6))
for i in range(3):
    ax[i].imshow(X_test[i], cmap='gray_r')               # Reversed grayscale colormap
    ax[i].axis('off')

In [None]:
### Classes:  
# ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
y_prob = model.predict(X_test[:3])
y_prob.round(2)
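
To turn a row of probabilities into a label, take the index of the largest value and look it up in the list of class names. The sketch below uses made-up probabilities in `example_prob` rather than real model output, so the predicted classes are illustrative only.

```python
import numpy as np

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Made-up probabilities for 3 images (each row sums to 1).
example_prob = np.array([
    [0.0, 0.0, 0.0,  0.0, 0.0,  0.03, 0.0,  0.02, 0.0, 0.95],
    [0.0, 0.0, 0.98, 0.0, 0.01, 0.0,  0.01, 0.0,  0.0, 0.0],
    [0.0, 1.0, 0.0,  0.0, 0.0,  0.0,  0.0,  0.0,  0.0, 0.0],
])

# argmax along axis 1 gives the most likely class index for each row.
predicted_ids = np.argmax(example_prob, axis=1)
predicted_names = [class_names[i] for i in predicted_ids]
print(predicted_names)  # ['Ankle boot', 'Pullover', 'Trouser']
```

The same two lines applied to `y_prob` give the predicted class names for the test images.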

In [None]:
# Get the weights and biases leading into the first hidden layer again
# (after training: compare with the initial values shown earlier)
weights, biases = model.layers[1].get_weights()
weights

In [None]:
biases

## Saving the model

In [None]:
# To save a model for future use
model.save("FashionMNIST.h5")

In [None]:
# To load a model that was previously saved
loaded_model = keras.models.load_model('FashionMNIST.h5')

# Then, use as normal
starting_image = 17
y_prob = loaded_model.predict(X_test[starting_image:starting_image+3])
print(y_prob.round(2))

f, ax = plt.subplots(1, 3, sharey=True, figsize=(6,6))
for i in range(3):
    ax[i].imshow(X_test[starting_image+i], cmap='gray_r')  # Reversed grayscale colormap
    ax[i].axis('off')

### Classes:  
# ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]