## Part 0: Importing dependencies

In [1]:
import numpy as np
import datetime
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

### Show the installed version of TensorFlow

In [2]:
tf.__version__

'2.6.2'

## Part 1: Data preprocessing

### Load the Fashion MNIST dataset into train and test data

In [3]:
# Load the Fashion MNIST dataset
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


### Image normalization

Each pixel value in both the training and the test set is divided by the maximum pixel intensity (255).
This scales all values to the range [0, 1], which typically helps the network train faster.

In [5]:
X_train = X_train / 255.0
X_test = X_test / 255.0
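
As a quick sanity check, the scaling can be tried on a small synthetic array. The array `fake_images` below is a made-up stand-in for the real images, used only so the sketch runs without downloading the dataset.

```python
import numpy as np

# Hypothetical stand-in for the image data: 4 random 28x28 uint8 images
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Dividing by the maximum intensity (255) maps every pixel into [0, 1]
# and converts the integer array to floating point
scaled = fake_images / 255.0

print(scaled.dtype)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

The same check (`X_train.min()`, `X_train.max()`) can be run on the real arrays after the cell above.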

### Dataset reshaping

The data needs to be reshaped into a vector format in order to be used with the fully connected neural network.

In [6]:
# Each image is 28x28, so reshape the full dataset to [-1 (all images), height * width]
X_train = X_train.reshape(-1, 28*28)  # -1 lets NumPy infer the number of images
X_test = X_test.reshape(-1, 28*28)
X_train.shape

(60000, 784)
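
To see what the reshape does to a single image, here is a minimal sketch on a made-up batch of three images (the array `batch` is illustrative, not part of the dataset). It also shows that the flattening is reversible, so no pixel information is lost.

```python
import numpy as np

# Hypothetical mini-batch: 3 images of 28x28 pixels with distinct values
batch = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

# -1 lets NumPy infer the number of images; each image becomes one row of 784 values
flat = batch.reshape(-1, 28 * 28)
print(flat.shape)

# Rows are laid out in row-major order, so reshaping back recovers the images
restored = flat.reshape(-1, 28, 28)
print(np.array_equal(batch, restored))
```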

## Part 2: Build the Artificial Neural network
### Defining the model

For this exercise I use a simple sequential neural network.
More complex examples will come in later exercises.

In [8]:
model = tf.keras.models.Sequential()

### First layer (Dense layer)

I start with the following hyper-parameters:
- number of neurons: 128
- activation function: Rectified Linear Unit (ReLU)
- input shape: (784, ) (The shape of X_ )

In [9]:
model.add(tf.keras.layers.Dense(units=128, activation='relu', input_shape=(784, )))

### Add a Dropout layer

Dropout is a regularization technique: during training, a random fraction of a layer's outputs is set to zero at each step, so the dropped neurons do not contribute to the prediction and their weights receive no gradient from that example.
This makes the network less reliant on any single neuron and reduces the risk of overfitting.
For an example, see [the TensorFlow tutorial on overfitting and underfitting](https://www.tensorflow.org/tutorials/keras/overfit_and_underfit#add_dropout).
Following the TensorFlow tutorial, I set the dropout rate to 0.2.

In [10]:
model.add(tf.keras.layers.Dropout(0.2))
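
The mechanism can be sketched in plain NumPy. The helper `dropout` below is a hypothetical illustration, not the Keras implementation; it follows the "inverted dropout" scheme in which the surviving activations are scaled by 1 / (1 - rate) during training so that their expected value is unchanged, and nothing is dropped at inference time.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Illustrative inverted dropout: zero a random subset of x, rescale the rest."""
    if not training or rate == 0.0:
        # At inference time the layer passes its input through unchanged
        return x
    # Each element is kept with probability (1 - rate)
    keep_mask = rng.random(x.shape) >= rate
    # Scale the survivors so the expected activation stays the same
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 128))

dropped = dropout(activations, rate=0.2, rng=rng)
print((dropped == 0).mean())  # fraction of zeroed activations, roughly 0.2
print(dropped.mean())         # mean activation, roughly 1.0

# With training=False the input is returned as-is
print(np.array_equal(dropout(activations, 0.2, rng, training=False), activations))
```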

### Add the output layer

- number of neurons: 10 (10 possible classes in Fashion MNIST)
- activation function = 'softmax'

In [11]:
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))
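
Softmax turns the 10 raw outputs of this layer into a probability distribution over the classes. A minimal NumPy sketch, with a made-up `logits` vector standing in for the layer's pre-activation output:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the last axis."""
    # Subtracting the row maximum avoids overflow in exp() without changing the result
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Hypothetical raw scores for one image, one value per class
logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.5, 0.0, 3.0, -2.0, 1.5, 0.2]])
probs = softmax(logits)

print(probs.sum(axis=-1))     # each row sums to 1 (up to floating point error)
print(probs.argmax(axis=-1))  # index of the most likely class
```

The predicted class is simply the index with the highest probability, here the class with the largest raw score.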

### Compile the model

- Optimizer: Adam
- Loss function: Sparse categorical crossentropy

I went with the Adam optimizer, since its per-parameter adaptive learning rates often converge faster than plain SGD and need less tuning.
For more information see [this paper](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf) and [the original Adam paper](https://arxiv.org/abs/1412.6980).

In [12]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
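
"Sparse" means the labels are plain integer class indices (as in `y_train`) rather than one-hot vectors. The loss for one example is the negative log of the probability assigned to the true class. A minimal NumPy sketch of that idea follows; the function name and the sample values are illustrative, and the clipping constant `eps` is an assumption rather than the exact Keras behaviour.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Per-example loss: -log(probability of the true class)."""
    # Clip to avoid log(0); eps is an assumed small constant
    probs = np.clip(probs, eps, 1.0)
    # Pick, for every row, the probability at the index given by the integer label
    true_class_probs = probs[np.arange(len(y_true)), y_true]
    return -np.log(true_class_probs)

# Hypothetical predictions for two examples over three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
y_true = np.array([0, 2])  # integer labels, not one-hot

losses = sparse_categorical_crossentropy(y_true, probs)
print(losses)         # one loss value per example
print(losses.mean())  # the value reported as the batch loss is an average
```

A confident correct prediction (probability near 1) gives a loss near 0, while a confident wrong one is penalised heavily.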

### Show the model's architecture

In [13]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 128)               100480    
_________________________________________________________________
dropout (Dropout)            (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
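
The parameter counts in the summary can be verified by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout has no parameters. The helper `dense_params` is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 128)  # first Dense layer: 784 inputs, 128 units
output = dense_params(128, 10)   # output layer: 128 inputs, 10 units

print(hidden)           # 100480
print(output)           # 1290
print(hidden + output)  # 101770
```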


## Part 3: Train the model

### Fit the model for 5 epochs

In [14]:
model.fit(X_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7fb7e8e61b38>
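
Since no `batch_size` is passed to `fit`, Keras falls back to its default of 32 samples per batch for NumPy inputs. Under that assumption, the number of weight updates per epoch works out as follows:

```python
import math

n_samples = 60000  # size of the training set, from X_train.shape
batch_size = 32    # Keras default when batch_size is not given
epochs = 5

# The last batch may be smaller, hence the ceiling
steps_per_epoch = math.ceil(n_samples / batch_size)

print(steps_per_epoch)           # 1875
print(steps_per_epoch * epochs)  # 9375 updates over the whole run
```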

### Evaluate the network on the test data

In [19]:
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print("Test accuracy: {}".format(test_accuracy))

Test accuracy: 0.8686000108718872
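
The accuracy metric compares the class with the highest predicted probability against the integer label and averages the matches. A small NumPy sketch with made-up predictions (the arrays `probs` and `labels` are illustrative, not model output):

```python
import numpy as np

# Hypothetical softmax outputs for 4 examples over 3 classes
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
labels = np.array([1, 0, 1, 1])  # integer class labels

# Predicted class = index of the largest probability in each row
predicted = probs.argmax(axis=1)

# Accuracy = fraction of predictions that match the labels
accuracy = (predicted == labels).mean()

print(predicted)  # [1 0 2 1]
print(accuracy)   # 0.75
```

On the real model, the equivalent would be `model.predict(X_test).argmax(axis=1)` compared against `y_test`.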


## Part 4: Save the model

Since training can take a while for larger models or datasets, it is important to save the network, sometimes even during training.

### Save the architecture of the network as .json

In [17]:
model_json = model.to_json()
with open("fashion_model.json", "w") as json_file:
    json_file.write(model_json)

### Save the network weights

In [18]:
model.save_weights("fashion_model.h5")