
# **Deep Neural Network - Classification Problem**
## MNIST

### **Import Packages**

In [173]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

### **Import Data**

In [174]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

### **Preprocess**

#### Split and Scale

In [175]:
#mnist has train and test datasets built in but no validation set
#must split manually
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test'] 
#train 60,000 test 10,000, take from train (larger set)

num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples #take 10% of train data for validation
num_validation_samples = tf.cast(num_validation_samples, tf.int64) #ensure value is integer

num_test_samples = mnist_info.splits['test'].num_examples
num_test_samples = tf.cast(num_test_samples, tf.int64) #ensure value is integer

#scale inputs
def scale(image, label):
  #make sure image is a float
  #then scale 0-255 grey shades to 0-1 (float)
  image = tf.cast(image, tf.float32)
  image /= 255.
  return image, label

scaled_train_and_validation_data = mnist_train.map(scale) #scale train and validation data
test_data = mnist_test.map(scale) #scale test data
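To make the effect of `scale` concrete, here is a minimal NumPy sketch of the same arithmetic. The array `raw_pixels` is a made-up 2x2 patch used only for illustration; it is not taken from MNIST.

```python
import numpy as np

# Hypothetical 2x2 grayscale patch with uint8 values in [0, 255] (not real MNIST data)
raw_pixels = np.array([[0, 51], [204, 255]], dtype=np.uint8)

# Same two steps as scale(): cast to float32 first, then divide by 255
scaled_pixels = raw_pixels.astype(np.float32) / 255.

print(scaled_pixels.dtype)                        # float32
print(scaled_pixels.min(), scaled_pixels.max())   # 0.0 1.0
```

Casting before dividing matters: the result stays a 32-bit float, which is the input type the network layers work with.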

#### Shuffle
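Shuffling randomizes the order of the samples so that each batch contains a mix of examples.  
Without it, batches of similar consecutive samples could bias the mini-batch gradient updates.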

In [176]:
BUFFER_SIZE = 10000 
#buffer_size = 1, no shuffling will actually happen
#buffer_size >= num_samples, the whole dataset is shuffled at once (uniformly)
#1 < buffer_size < num_samples, partial shuffling that limits memory use

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

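#note: shuffle() reshuffles on every pass over the data by default (reshuffle_each_iteration=True),
#so take() and skip() can select different samples each epoch and validation samples may appear in training;
#passing reshuffle_each_iteration=False to shuffle() keeps the split fixed
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)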
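The three buffer-size cases described in the comments can be illustrated with a small pure-Python stand-in for a shuffle buffer. This is a sketch of the general idea (fill a buffer, emit a random element, refill from the source); the function `buffered_shuffle` is hypothetical and is not TensorFlow's implementation.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Illustrative shuffle buffer; a sketch, not TensorFlow's implementation."""
    rng = random.Random(seed)
    source = iter(items)
    buffer = []
    output = []
    # Fill the buffer with the first buffer_size elements
    for item in source:
        buffer.append(item)
        if len(buffer) == buffer_size:
            break
    # Repeatedly emit a random buffered element and refill its slot from the source
    for item in source:
        index = rng.randrange(len(buffer))
        output.append(buffer[index])
        buffer[index] = item
    # Source exhausted: drain what is left in random order
    rng.shuffle(buffer)
    output.extend(buffer)
    return output

data = list(range(10))
print(buffered_shuffle(data, buffer_size=1))                    # order unchanged
print(sorted(buffered_shuffle(data, buffer_size=10)) == data)   # True: a full permutation
print(buffered_shuffle(range(100), buffer_size=3)[0])           # always one of 0, 1, 2
```

With a small buffer, an element can only move a limited distance from its original position, which is why a buffer much smaller than the dataset gives only partial shuffling.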

#### Batching
Mini-batch gradient descent (GD).

In [177]:
BATCH_SIZE = 100
#batch size = 1, stochastic gradient descent (SGD)
#batch size = # samples, single batch GD
#1 < batch size < # samples, mini-batch GD

train_data = train_data.batch(BATCH_SIZE)
#validation data does not need mini-batches because there is no backpropagation on it
#still batch it (as one single batch) because TF expects data to be in batch form
validation_data = validation_data.batch(num_validation_samples)
test_data = test_data.batch(num_test_samples)

#MNIST data is iterable and in 2-tuple format (as_supervised=True)
validation_inputs, validation_targets = next(iter(validation_data))
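As a quick check of the arithmetic behind the split and the batching, the sketch below recomputes the sample counts and the number of batches per epoch from the figures stated above (60,000 training examples, 10% held out, batch size 100). The variable names are illustrative.

```python
import math

num_train_total = 60000       # MNIST training split size, as noted above
validation_fraction = 0.1     # 10% of the training data is held out for validation
batch_size = 100

num_validation = int(validation_fraction * num_train_total)
num_train = num_train_total - num_validation
steps_per_epoch = math.ceil(num_train / batch_size)

print(num_validation, num_train, steps_per_epoch)  # 6000 54000 540
```

The 540 steps per epoch are the same `540/540` that Keras reports in the training log.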

### **Model**

#### Outline  
*   784 inputs (28x28 grid)
*   10 outputs (1 for each digit 0-9)
*   2 hidden layers (the number of nodes is arbitrary and should be tuned)

In [178]:
input_size = 784
output_size = 10
hidden_layer_size = 100 #assumption is that all hidden layers are the same size

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28,1)), #flatten each 28x28x1 image into a vector of 784 values
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), #rectified linear unit
    tf.keras.layers.Dense(hidden_layer_size, activation='tanh'),
    tf.keras.layers.Dense(output_size, activation='softmax') #outputs a probability for each of the 10 digits
])
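To see what this architecture amounts to numerically, the following NumPy sketch counts the trainable parameters and runs one forward pass with the same layer sizes and activations. The weights are random and the input batch is a random stand-in, so the outputs are meaningless as predictions; only the shapes and the parameter count carry over.

```python
import numpy as np

input_size, hidden_layer_size, output_size = 784, 100, 10
layer_sizes = [input_size, hidden_layer_size, hidden_layer_size, output_size]
layer_pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))

# Each Dense layer has (inputs x outputs) weights plus one bias per output
num_parameters = sum(n_in * n_out + n_out for n_in, n_out in layer_pairs)
print(num_parameters)  # 89610

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.05, size=(n_in, n_out)) for n_in, n_out in layer_pairs]
biases = [np.zeros(n_out) for _, n_out in layer_pairs]

def forward(images):
    x = images.reshape(len(images), -1)                # Flatten: (batch, 28, 28, 1) -> (batch, 784)
    x = np.maximum(0.0, x @ weights[0] + biases[0])    # first hidden layer, ReLU
    x = np.tanh(x @ weights[1] + biases[1])            # second hidden layer, tanh
    logits = x @ weights[2] + biases[2]                # output layer before softmax
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)        # softmax: each row sums to 1

batch = rng.random((5, 28, 28, 1))   # random stand-in images, not MNIST
probabilities = forward(batch)
print(probabilities.shape)  # (5, 10)
```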

#### Optimizer and Loss Function

In [179]:
#optimizer: adaptive moment estimation (Adam) is a widely used default choice
#loss function: cross-entropy is the usual choice for classification problems, 3 built-in variations
#binary_crossentropy, categorical_crossentropy, sparse_categorical_crossentropy
#sparse_categorical_crossentropy takes integer class labels, so the targets need no manual one-hot encoding
model.compile(
    optimizer='adam', 
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
    )
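The relationship between the sparse and the one-hot form of categorical cross-entropy can be checked by hand. The sketch below uses made-up softmax outputs for three samples and four classes and shows that both forms give the same loss; it reproduces the formula only, not Keras internals.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples over 4 classes (illustrative values)
probabilities = np.array([[0.7, 0.1, 0.1, 0.1],
                          [0.2, 0.5, 0.2, 0.1],
                          [0.1, 0.1, 0.2, 0.6]])
integer_labels = np.array([0, 1, 3])

# Sparse form: pick the predicted probability of the true class by index
rows = np.arange(len(integer_labels))
sparse_loss = -np.log(probabilities[rows, integer_labels]).mean()

# Categorical form: one-hot encode the labels, then sum over classes
one_hot = np.eye(probabilities.shape[1])[integer_labels]
categorical_loss = -(one_hot * np.log(probabilities)).sum(axis=1).mean()

print(np.isclose(sparse_loss, categorical_loss))  # True
```

Because the MNIST targets loaded here are integer digits, the sparse variant saves the explicit one-hot step.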


### **Training**


Each epoch:

1.   At the beginning of each epoch, the training loss is reset to 0
2.   The algorithm iterates over a preset number of batches, all from *train_data*
3.   The weights and biases are updated once per batch, so as many times as there are batches
4.   The value of the loss function is reported, indicating training progress
5.   The training accuracy is reported
6.   At the end of each epoch, the algorithm forward propagates the whole validation set
7.   Once the maximum number of epochs is reached, training is over
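The loop described above can be sketched in plain NumPy. This is a deliberately simplified illustration, not what `model.fit` does internally: it trains a single softmax layer with plain gradient descent instead of Adam, on synthetic two-class data instead of MNIST, and all names and hyperparameters are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic, linearly separable 2-class data (a stand-in for MNIST)
    x = rng.normal(size=(n, 2))
    y = (x[:, 0] + x[:, 1] > 0).astype(int)
    return x, y

def forward_loss(w, b, x, y):
    logits = x @ w + b
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(len(y)), y]).mean()
    return probs, loss

x_train, y_train = make_data(500)
x_val, y_val = make_data(100)
w, b = np.zeros((2, 2)), np.zeros(2)
batch_size, num_epochs, learning_rate = 50, 5, 0.5
num_batches = len(x_train) // batch_size
num_updates = 0
history = []

for epoch in range(num_epochs):
    epoch_loss = 0.0                                   # reset the running training loss
    order = rng.permutation(len(x_train))              # new sample order each epoch
    for start in range(0, len(x_train), batch_size):   # iterate over the training batches
        idx = order[start:start + batch_size]
        xb, yb = x_train[idx], y_train[idx]
        probs, loss = forward_loss(w, b, xb, yb)
        grad = probs.copy()
        grad[np.arange(len(yb)), yb] -= 1.0            # gradient of the loss w.r.t. the logits
        grad /= len(yb)
        w -= learning_rate * xb.T @ grad               # one weight update per batch
        b -= learning_rate * grad.sum(axis=0)          # one bias update per batch
        num_updates += 1
        epoch_loss += loss
    train_loss = epoch_loss / num_batches              # average training loss for the epoch
    _, val_loss = forward_loss(w, b, x_val, y_val)     # forward pass only, no update
    history.append((train_loss, val_loss))
    print(f"epoch {epoch + 1}: loss={train_loss:.4f} val_loss={val_loss:.4f}")
```

The validation set goes through the forward pass as one batch and never contributes to a weight update, which mirrors the role of `validation_data` in the notebook.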

In [180]:
NUM_EPOCHS = 10

model.fit(train_data, epochs = NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

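#loss: decreasing; no large jumps because the weights and biases are already updated 540 times (once per batch) during the first epoch
#accuracy: increasing; the share of cases where outputs = targets, averaged over the epoch
#val_loss: compare with loss to check whether the model is overfitting
#val_accuracy: accuracy on the held-out validation data; the final, unbiased estimate comes from the test set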

Epoch 1/10
540/540 - 9s - loss: 0.3222 - accuracy: 0.9071 - val_loss: 0.1465 - val_accuracy: 0.9543 - 9s/epoch - 17ms/step
Epoch 2/10
540/540 - 4s - loss: 0.1306 - accuracy: 0.9611 - val_loss: 0.1132 - val_accuracy: 0.9652 - 4s/epoch - 8ms/step
Epoch 3/10
540/540 - 4s - loss: 0.0921 - accuracy: 0.9720 - val_loss: 0.0821 - val_accuracy: 0.9748 - 4s/epoch - 8ms/step
Epoch 4/10
540/540 - 4s - loss: 0.0697 - accuracy: 0.9786 - val_loss: 0.0645 - val_accuracy: 0.9798 - 4s/epoch - 8ms/step
Epoch 5/10
540/540 - 4s - loss: 0.0555 - accuracy: 0.9825 - val_loss: 0.0585 - val_accuracy: 0.9828 - 4s/epoch - 8ms/step
Epoch 6/10
540/540 - 4s - loss: 0.0447 - accuracy: 0.9866 - val_loss: 0.0437 - val_accuracy: 0.9855 - 4s/epoch - 8ms/step
Epoch 7/10
540/540 - 5s - loss: 0.0372 - accuracy: 0.9880 - val_loss: 0.0413 - val_accuracy: 0.9875 - 5s/epoch - 9ms/step
Epoch 8/10
540/540 - 4s - loss: 0.0279 - accuracy: 0.9915 - val_loss: 0.0330 - val_accuracy: 0.9892 - 4s/epoch - 8ms/step
Epoch 9/10
540/540 - 4s

<keras.callbacks.History at 0x7f794f506650>
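The idea of watching `val_loss` for overfitting can be turned into a simple check. The helper below is a hypothetical sketch (the function name and the `patience` parameter are assumptions for illustration); it is applied to the `val_loss` values for epochs 1 to 8 copied from the log above, and to a made-up curve for contrast.

```python
def first_epoch_without_improvement(val_losses, patience=2):
    """Return the 1-based epoch at which val_loss has failed to improve
    for `patience` consecutive epochs, or None if that never happens."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# val_loss for epochs 1-8, copied from the training log above
logged_val_losses = [0.1465, 0.1132, 0.0821, 0.0645, 0.0585, 0.0437, 0.0413, 0.0330]
print(first_epoch_without_improvement(logged_val_losses))         # None

# Made-up curve that turns upward, for contrast
print(first_epoch_without_improvement([0.30, 0.20, 0.25, 0.27]))  # 4
```

For the logged epochs the validation loss improves every time, so the check does not fire. Keras offers a built-in callback for the same purpose, `tf.keras.callbacks.EarlyStopping`, which can be passed to `model.fit` through its `callbacks` argument.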

### **Testing**

In [181]:
test_loss, test_accuracy = model.evaluate(test_data)
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy * 100.))
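For reference, the accuracy metric itself is simple to compute by hand: take the class with the highest predicted probability and compare it with the target. The sketch below uses made-up probabilities and targets, not outputs of the trained model.

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples over 3 classes (illustrative values)
probabilities = np.array([[0.8, 0.1, 0.1],
                          [0.2, 0.7, 0.1],
                          [0.3, 0.3, 0.4],
                          [0.6, 0.3, 0.1]])
targets = np.array([0, 1, 2, 1])

predictions = probabilities.argmax(axis=1)    # most probable class per sample
accuracy = (predictions == targets).mean()    # share of correct predictions

print(predictions.tolist())  # [0, 1, 2, 0]
print(accuracy)              # 0.75
```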

