# Assignment 2: Using the TensorFlow library to build and optimize neural networks with MNIST

### Ph22 / Caltech / Spring 2024

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import gzip
import pandas as pd

from timeit import default_timer as timer

import tensorflow as tf
import tensorflow_datasets as tfds

# Part 1: Load and normalize the MNIST data set using TensorFlow Datasets
1. Use tfds.load() to get MNIST data; you can refer to https://www.tensorflow.org/datasets/api_docs/python/tfds/load for how tfds.load() works. In particular, make sure that tfds.load() returns `(img, label)` instead of a dictionary `{'image': img, 'label': label}`.

In [None]:
(ds_train, ds_test), ds_info = tfds.load(
    'mnist', 
    split=['train', 'test'],
    as_supervised=True,
    with_info=True
)

2. Use methods from https://www.tensorflow.org/api_docs/python/tf/data/Dataset to normalize the data and put it into the TF format. In particular, for the training data, apply the following transformations:
- tfds.load() provides images of type tf.uint8, while the model expects tf.float32, so you need to normalize the images. For this, first write a function normalize_img(image, label) that converts the image from `uint8` to `float32`.
- Since the dataset fits in memory, cache it before shuffling for better performance. For true randomness, set the shuffle buffer to the full dataset size.
- Batch the elements of the dataset after shuffling to get unique batches at each epoch.
- It is good practice to end the pipeline by prefetching for performance.

Do the same for the test data, except that it does not need to be shuffled.
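
To see why the shuffle buffer should cover the whole dataset, here is a minimal pure-Python sketch of a buffer-based shuffle. It follows the behaviour described in the tf.data documentation (fill a buffer, draw a random element, refill), but `buffered_shuffle` is a hypothetical helper written for illustration, not TensorFlow's actual implementation.

```python
import random

def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in shuffled order using a fixed-size buffer (illustrative)."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Buffer is full: emit one random element, which frees a slot.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # Input exhausted: drain whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer

data = list(range(10))
no_shuffle = list(buffered_shuffle(data, buffer_size=1))     # order unchanged
partial = list(buffered_shuffle(data, buffer_size=3))        # only local mixing
full = list(buffered_shuffle(data, buffer_size=len(data)))   # any order possible
```

With a buffer of size 3, the element emitted at position k can only come from the first k + 3 inputs, so late examples never reach early batches. Only a buffer as large as the dataset allows every permutation.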

In [None]:
def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  image = tf.image.convert_image_dtype(image, tf.float32) 
  return image, label

ds_train = ds_train.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()  
ds_train = ds_train.shuffle(buffer_size=ds_info.splits['train'].num_examples)  
ds_train = ds_train.batch(32)  
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)  


ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.cache()  
ds_test = ds_test.batch(32)  
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)
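
As a rough check on what the pipeline above does to the data, the following NumPy sketch mimics the normalization step on a made-up image and counts the batches per epoch. The random image is a stand-in for a real MNIST digit, and the 60,000 figure assumes the standard MNIST training split.

```python
import math
import numpy as np

# Hypothetical stand-in for one 28x28x1 grayscale image (random pixels).
rng = np.random.default_rng(0)
image_uint8 = rng.integers(0, 256, size=(28, 28, 1), dtype=np.uint8)

# uint8 -> float32, rescaling pixel values from [0, 255] to [0, 1].
# This mirrors what tf.image.convert_image_dtype does for this dtype pair.
image_float = image_uint8.astype(np.float32) / 255.0

# Number of batches per epoch for 60,000 training images and batch size 32.
steps_per_epoch = math.ceil(60000 / 32)
```

The shape of the image is unchanged; only the dtype and the value range differ.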

3. Learn about and describe (in a sentence or two) what tf.data.AUTOTUNE and prefetch do.


tf.data.AUTOTUNE lets TensorFlow adjust settings such as the degree of parallelism and the prefetch buffer size dynamically at runtime for optimal performance, while prefetch() prepares future batches while the current batch is being processed. Overlapping data preparation with training reduces idle time and can significantly improve training efficiency.
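
The idea behind prefetching can be sketched with a background thread and a bounded queue. This is a conceptual illustration only: the `prefetch` and `make_batches` helpers are hypothetical, and TensorFlow's own implementation is different and, with AUTOTUNE, chooses the buffer size itself.

```python
import queue
import threading

def prefetch(generator, buffer_size=1):
    """Run `generator` in a background thread, keeping up to `buffer_size` items ready."""
    q = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer():
        for item in generator:
            q.put(item)      # blocks while the buffer is full
        q.put(sentinel)      # signal that the input is exhausted

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()       # the consumer only waits if nothing is ready yet
        if item is sentinel:
            return
        yield item

def make_batches(n):
    for i in range(n):
        yield [i] * 4        # stand-in for a batch that is expensive to prepare

consumed = [batch for batch in prefetch(make_batches(5), buffer_size=2)]
```

The consumer sees the batches in their original order; the only difference is that the next batch is already being prepared while the current one is in use.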

# Part 2: Build and compile the model
1. Below is a neural network with a simple architecture in which the hidden layer is missing. Add code to implement a hidden layer with 128 neurons. How does this compare to the neural network you implemented in Assignment 1?

In [None]:
# this is the design architecture of the neural network
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='sigmoid'),
  tf.keras.layers.Dense(10)
])

# here we specify the optimizer and loss function, which can dramatically affect the training
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),  # learning rate
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)

# here we train the neural network 
history = model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)

# here we use Pandas to plot the training and validation
pd.DataFrame(history.history).plot(figsize=(8,5))
plt.yscale('log')
plt.grid()
plt.xlabel('Epoch #')
plt.show()

This network is built with TensorFlow, which handles the complexities of differentiating and optimizing the network, making it easy to experiment with different architectures and hyperparameters. Its high-level API simplifies many operations, such as layer creation, model compilation, and training.
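
To make concrete what the high-level API hides, here is a NumPy sketch of the forward pass and loss for the same layer shapes (Flatten, Dense(128, sigmoid), Dense(10) producing logits). The weights, inputs, and labels are random placeholders, and the helper names are hypothetical; this is not how Keras computes things internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_softmax_cross_entropy(logits, labels):
    """Mean cross-entropy computed from raw logits and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
# Randomly initialised weights with the same shapes as the Keras model.
W1, b1 = rng.normal(0, 0.05, (784, 128)), np.zeros(128)
W2, b2 = rng.normal(0, 0.05, (128, 10)), np.zeros(10)

x = rng.random((32, 28, 28))           # hypothetical batch of 32 images
y = rng.integers(0, 10, size=32)       # hypothetical integer labels

flat = x.reshape(len(x), -1)           # Flatten: (32, 784)
hidden = sigmoid(flat @ W1 + b1)       # Dense(128, activation='sigmoid')
logits = hidden @ W2 + b2              # Dense(10), no softmax applied
loss = sparse_softmax_cross_entropy(logits, y)
n_params = W1.size + b1.size + W2.size + b2.size
```

Because the last layer outputs logits rather than probabilities, the softmax is folded into the loss, which is the role of `from_logits=True` in the compile step. The backward pass, which TensorFlow derives automatically, is omitted here.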

2. Now simply convert the above code to work with fashion_mnist (https://www.tensorflow.org/datasets/catalog/fashion_mnist) instead of MNIST. 

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import gzip
import pandas as pd

from timeit import default_timer as timer

import tensorflow as tf
import tensorflow_datasets as tfds
(ds_train, ds_test), ds_info = tfds.load(
    'fashion_mnist', 
    split=['train', 'test'],
    as_supervised=True,
    with_info=True
)
def normalize_img(image, label):
    image = tf.image.convert_image_dtype(image, tf.float32) 
    return image, label

ds_train = ds_train.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(buffer_size=ds_info.splits['train'].num_examples)
ds_train = ds_train.batch(32)
ds_train = ds_train.prefetch(tf.data.AUTOTUNE)

ds_test = ds_test.map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
ds_test = ds_test.cache()
ds_test = ds_test.batch(32)
ds_test = ds_test.prefetch(tf.data.AUTOTUNE)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='sigmoid'),
    tf.keras.layers.Dense(10)  
])


model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
history = model.fit(
    ds_train,
    epochs=6,
    validation_data=ds_test,
)
pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.yscale('log')
plt.grid(True)
plt.xlabel('Epoch #')
plt.show()

3. If your training loss is getting much smaller than your validation loss, then your network is overfitting. To tackle this, go over https://www.comet.com/site/blog/dropout-regularization-with-tensorflow-keras/ to learn about dropout, a technique to prevent overfitting, and implement the above network with dropout. How does that change things? Change the size of the layers, the number of layers, the dropout rate, etc. to try to improve the training speed and the final validation accuracy.

The training loss is not getting much smaller than the validation loss.
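
For reference, the mechanism of dropout can be sketched in a few lines of NumPy. This assumes the "inverted" variant, in which surviving activations are rescaled by 1 / (1 - rate) during training so that nothing changes at inference time; the `dropout` function is a hypothetical helper, not the Keras layer itself.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x                                   # inference: pass through unchanged
    keep_mask = rng.random(x.shape) >= rate        # keep each unit with prob. 1 - rate
    return x * keep_mask / (1.0 - rate)            # rescale to preserve the mean

rng = np.random.default_rng(0)
activations = np.ones((1000, 128))                 # stand-in for hidden-layer outputs
train_out = dropout(activations, rate=0.2, training=True, rng=rng)
eval_out = dropout(activations, rate=0.2, training=False, rng=rng)
```

During training roughly 20% of the entries are zeroed and the rest become 1.25, so the average activation stays close to 1. In the model above, the equivalent step would be a dropout layer placed between the two Dense layers.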

4. Explore a few optimizers (https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) and cost functions (https://www.tensorflow.org/api_docs/python/tf/keras/losses). Which ones give the best performance? Try to come up with an intuitive explanation for why. 

I reached an accuracy of 97.73% in networkfashionmnist.py using Nadam as the optimizer. Nadam incorporates Nesterov momentum into Adam, which can lead to faster convergence in this setting and helps the model learn more quickly when gradients are sparse. I kept the same cost function because it was working well.
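
The difference between Adam and Nadam can be illustrated on a one-dimensional toy loss. The sketch below uses a commonly cited simplified form of the Nadam update; the Keras implementation differs in its details, and the function name, toy loss, and hyperparameter values are assumptions chosen for illustration.

```python
import math

def minimize(grad_fn, x0, steps=200, lr=0.1, nesterov=False,
             beta1=0.9, beta2=0.999, eps=1e-7):
    """Adam update; with nesterov=True, a simplified Nadam update."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g            # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g        # second-moment estimate
        m_hat = m / (1 - beta1 ** t)               # bias correction
        v_hat = v / (1 - beta2 ** t)
        if nesterov:
            # Nadam: look ahead by mixing the momentum with the current gradient.
            m_hat = beta1 * m_hat + (1 - beta1) * g / (1 - beta1 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def grad(x):
    return x                                       # gradient of f(x) = 0.5 * x**2

x_adam = minimize(grad, x0=5.0)
x_nadam = minimize(grad, x0=5.0, nesterov=True)
```

Both runs move towards the minimum at 0. The only change in the Nadam branch is that the current gradient gets extra weight in the step direction, which is the look-ahead that Nesterov momentum contributes. A toy problem like this says nothing about which optimizer is better on Fashion-MNIST.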

# (Optional) Part 3: Start the adventure of hyper-parameter tuning!

1. As you have seen, trying out different model features and properties is a tedious process, but this is where the "magic" lies: optimal model design is usually very specific to the task at hand (for a counterexample, see the field of transfer learning) and usually depends on domain knowledge. There are more systematic ways of tuning parameters than the trial and error we have used so far. For this part, let's fix the model architecture and, for simplicity, consider only the learning rate, since it is a one-dimensional parameter. Go over https://neptune.ai/blog/hyperparameter-tuning-in-python-complete-guide and implement one of the hyperparameter tuning methods described there.
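
One widely used approach is random search over a log-uniform range of learning rates. The sketch below shows the structure on a toy least-squares problem so that it runs in well under a second; `validation_loss` is a hypothetical objective, and for the assignment it would instead build and train the fixed Keras model with the given learning rate and return its validation loss.

```python
import numpy as np

def validation_loss(learning_rate, steps=50):
    """Hypothetical objective: gradient descent on a toy least-squares problem."""
    rng = np.random.default_rng(0)                 # fixed data so trials are comparable
    X = rng.normal(size=(200, 5))
    y = X @ rng.normal(size=5)
    X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]
    w = np.zeros(5)
    for _ in range(steps):
        grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= learning_rate * grad
    return float(np.mean((X_val @ w - y_val) ** 2))

def random_search(objective, low, high, n_trials, seed=0):
    """Evaluate `objective` at log-uniformly sampled learning rates."""
    rng = np.random.default_rng(seed)
    # Sample exponents uniformly so that every decade is equally likely.
    rates = 10 ** rng.uniform(np.log10(low), np.log10(high), size=n_trials)
    losses = [objective(lr) for lr in rates]
    best = int(np.argmin(losses))
    return rates[best], losses[best], list(zip(rates, losses))

best_lr, best_loss, trials = random_search(validation_loss, 1e-4, 1e-1, n_trials=20)
```

Sampling on a logarithmic scale matters because useful learning rates typically span several orders of magnitude. The search range and the number of trials here are arbitrary choices, not recommendations.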