## PART 1 - Deep Neural Networks

In [0]:
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os

print(tf.__version__)

In [0]:
fashion_mnist = keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

In [0]:
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

**Step 4. Preprocessing: Subtract the mean value of training images from every pixel in
every image in both train and test data to shift the total mean to 0.**

In [0]:
# Step 4: one scalar mean over every pixel of every TRAINING image
# (not a separate mean per image)
train_mean = np.mean(train_images)

# Subtract the training mean from both sets, so the test data receives
# exactly the same shift; cast first so uint8 arithmetic cannot wrap around
train_images = train_images.astype(np.float32) - train_mean
test_images = test_images.astype(np.float32) - train_mean
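A minimal NumPy sketch of the centering in Step 4, with small random arrays standing in for the Fashion-MNIST images (the data and shapes are illustrative assumptions). It checks that subtracting the single training-set mean moves the training mean to 0, while the test data gets the identical shift rather than being centered on its own statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for uint8 images: 100 training and 20 test images of 28x28 pixels
train = rng.integers(0, 256, size=(100, 28, 28)).astype(np.uint8)
test = rng.integers(0, 256, size=(20, 28, 28)).astype(np.uint8)

# One scalar: the mean over every pixel of every training image
train_mean = train.mean()

# Cast to float before subtracting so uint8 arithmetic cannot wrap around
train_centered = train.astype(np.float32) - train_mean
test_centered = test.astype(np.float32) - train_mean

print(float(train_centered.mean()))  # close to 0
print(float(test_centered.mean()))   # small, but not forced to be exactly 0
```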

**Step 6. Training & Validation: Split the training data into 30% validation and 70% training.**

The following cell, which splits the training data into training and validation sets, is used only for this step. From Step 7 on, ImageDataGenerator performs the split instead.

Uncomment it for Step 6. Because the reshape (Step 5) and scaling cells below run after this split, apply them to valid_images as well.

In [0]:
"""
from sklearn.model_selection import StratifiedShuffleSplit

# A single stratified 70/30 split: every class keeps the same proportion in both parts
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.3, random_state=0)

train_index, valid_index = next(sss.split(train_images, train_labels))

valid_images, valid_labels = train_images[valid_index], train_labels[valid_index]
train_images, train_labels = train_images[train_index], train_labels[train_index]

print(train_images.shape, valid_images.shape, test_images.shape)
"""
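What "stratified" means here, as a minimal NumPy sketch: each class is shuffled and cut separately, so the 70/30 ratio holds per class. The helper `stratified_split` is hypothetical (a hand-rolled stand-in for StratifiedShuffleSplit, not its implementation), and the balanced toy labels are an assumption that mirrors Fashion-MNIST's 6,000 training images per class.

```python
import numpy as np

def stratified_split(labels, valid_fraction=0.3, seed=0):
    """Return (train_idx, valid_idx) with each class split in the same ratio."""
    rng = np.random.default_rng(seed)
    train_idx, valid_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off the validation share
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_valid = int(round(len(idx) * valid_fraction))
        valid_idx.append(idx[:n_valid])
        train_idx.append(idx[n_valid:])
    return np.concatenate(train_idx), np.concatenate(valid_idx)

# Toy labels: 10 classes with 100 samples each
labels = np.repeat(np.arange(10), 100)
train_idx, valid_idx = stratified_split(labels)

print(len(train_idx), len(valid_idx))   # 700 300
print(np.bincount(labels[valid_idx]))   # 30 validation samples of every class
```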

**Step 5. Convolutional Layer: To add a convolutional layer, you need to reshape your
data to 4 dimensions: (60000, 28, 28) -> (60000, 28, 28, 1).**

In [0]:
train_images = np.expand_dims(train_images, axis=3)  
print("New shape of train_images is ", train_images.shape)
test_images = np.expand_dims(test_images, axis=3)
print("New shape of test_images is ", test_images.shape)
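To see why the channel axis is added and what a 3x3 filter does to the spatial size, here is a minimal NumPy sketch of one filter sliding over one image without padding (Keras Conv2D's default `padding='valid'`). Like Keras, it computes a cross-correlation. The random images, the averaging kernel and the helper `conv2d_valid` are illustrative assumptions; a real Conv2D layer learns 128 such kernels plus biases.

```python
import numpy as np

# A batch of 2 grayscale 28x28 images, as before the Step 5 reshape
images = np.random.default_rng(0).random((2, 28, 28))
images = np.expand_dims(images, axis=3)   # (2, 28, 28) -> (2, 28, 28, 1)

def conv2d_valid(image, kernel):
    """Single-channel, single-filter 2D cross-correlation with no padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1   # 28 - 3 + 1 = 26
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the kh x kw patch under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.ones((3, 3)) / 9.0            # a 3x3 averaging filter
feature_map = conv2d_valid(images[0, :, :, 0], kernel)
print(images.shape, feature_map.shape)   # (2, 28, 28, 1) (26, 26)
```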

In [0]:
train_images = train_images / 255.0

test_images = test_images / 255.0

**Step 3. Weight Initializer: Compare using random_normal and random_uniform as your kernel_initializer parameters in your Dense layer.** Here are the models that I have tried and their accuracies:

1. Default initializer (glorot_uniform) in the hidden layer, random_normal in the output layer:

       model = keras.Sequential([
           keras.layers.Flatten(input_shape=(28, 28)),
           keras.layers.Dense(128, activation=tf.nn.relu),
           keras.layers.Dense(10, activation=tf.nn.softmax, kernel_initializer="random_normal")
       ])

   Test accuracy: 0.8622

2. Default initializer (glorot_uniform) in the hidden layer, random_uniform in the output layer:

       model = keras.Sequential([
           keras.layers.Flatten(input_shape=(28, 28)),
           keras.layers.Dense(128, activation=tf.nn.relu),
           keras.layers.Dense(10, activation=tf.nn.softmax, kernel_initializer="random_uniform")
       ])

   Test accuracy: 0.8708

3. random_normal in both layers:

       model = keras.Sequential([
           keras.layers.Flatten(input_shape=(28, 28)),
           keras.layers.Dense(128, activation=tf.nn.relu, kernel_initializer="random_normal"),
           keras.layers.Dense(10, activation=tf.nn.softmax, kernel_initializer="random_normal")
       ])

   Test accuracy: 0.8749

4. random_uniform in both layers:

       model = keras.Sequential([
           keras.layers.Flatten(input_shape=(28, 28)),
           keras.layers.Dense(128, activation=tf.nn.relu, kernel_initializer="random_uniform"),
           keras.layers.Dense(10, activation=tf.nn.softmax, kernel_initializer="random_uniform")
       ])

   Test accuracy: 0.8718


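Model 3 is used from here on, since it has the highest test accuracy of the four. The spread across all four models is only about 1.3 percentage points, however, so the ranking may change between runs.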
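The two initializers mainly differ in how widely they spread the starting weights. A minimal NumPy sketch, assuming the standalone-Keras defaults (`random_normal`: mean 0, standard deviation 0.05; `random_uniform`: range [-0.05, 0.05]); some older tf.keras releases used different defaults, so check your version. It draws a 784x128 kernel, the size of the hidden Dense layer's weights, with each scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (784, 128)   # kernel of Flatten(28x28) -> Dense(128)

# Assumed defaults (check your Keras/TensorFlow version)
normal_w = rng.normal(loc=0.0, scale=0.05, size=shape)
uniform_w = rng.uniform(low=-0.05, high=0.05, size=shape)

# A uniform distribution on [-a, a] has standard deviation a / sqrt(3)
print(round(float(normal_w.std()), 3))    # about 0.05
print(round(float(uniform_w.std()), 3))   # about 0.05 / sqrt(3), i.e. 0.029
```

Under these assumptions, random_normal starts the weights roughly 1.7 times more spread out than random_uniform.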



---

**Step 5. Convolutional Layer: As the first layer add a new 3x3 convolutional layer with 128 filters.**


---



**Step 9. Regularization: Add Dropout layer(s) for regularization where it makes sense.**
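A minimal NumPy sketch of what a Dropout(0.25) layer does during training, assuming the "inverted dropout" formulation that Keras uses: each activation is zeroed with probability 0.25 and the survivors are scaled by 1 / (1 - 0.25), so the expected activation is unchanged. At inference time the layer passes its input through untouched. The helper name and the all-ones input are illustrative assumptions.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training:
        return x                          # no-op at inference time
    keep = rng.random(x.shape) >= rate    # keep each unit with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones((1000, 128))
y = dropout(x, rate=0.25, training=True, rng=rng)

print(round(float((y == 0).mean()), 2))   # fraction dropped, about 0.25
print(round(float(y.mean()), 2))          # mean stays about 1.0
```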


---



**Step 10. Batch Normalization: Add BatchNormalization layer(s) where it makes sense.**
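A minimal NumPy sketch of the BatchNormalization forward pass in training mode: each feature is normalized with the mean and variance of the current batch, then scaled by gamma and shifted by beta (both learned; here left at their initial values 1 and 0). Keras also tracks moving averages of the mean and variance for inference, which this sketch omits. Epsilon = 1e-3 is assumed as the Keras default, and the off-centre input batch is made up.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each column (feature) of x over the batch axis."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# A batch of 512 pre-activations for a Dense(128) layer, deliberately off-centre
x = rng.normal(loc=5.0, scale=3.0, size=(512, 128))
y = batch_norm_train(x, gamma=np.ones(128), beta=np.zeros(128))

print(float(np.abs(y.mean(axis=0)).max()))   # per-feature means are ~0
print(float(y.std(axis=0).mean()))           # per-feature std is ~1
```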

In [0]:
model = keras.Sequential([
    # Step 5: 3x3 convolution with 128 filters (no activation is set, so this layer is linear)
    keras.layers.Conv2D(filters=128, kernel_size=3, input_shape=(28, 28, 1)),
    keras.layers.Dropout(0.25),                                   # Step 9: regularization
    keras.layers.Flatten(),
    keras.layers.Dense(128, kernel_initializer="random_normal"),  # Step 3: model 3 initializer
    keras.layers.BatchNormalization(),                            # Step 10: normalize before ReLU
    keras.layers.Activation('relu'),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(10, kernel_initializer="random_normal"),
    keras.layers.BatchNormalization(),
    keras.layers.Activation('softmax')
])
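A quick size check of this model, worked out by hand in plain Python from the layer shapes (no Keras needed). A 3x3 convolution over a 28x28x1 input without padding gives 26x26x128 feature maps, so the first Dense layer dominates the parameter count. The BatchNormalization counts include the two non-trainable moving statistics per feature; assuming the layers build as listed, `model.summary()` should report the same totals.

```python
# Parameter counts for the Step 5/9/10 model, computed from the layer shapes
conv_out_h = conv_out_w = 28 - 3 + 1            # 'valid' padding: 26
conv_params = 3 * 3 * 1 * 128 + 128             # kernel weights + one bias per filter
flat_features = conv_out_h * conv_out_w * 128   # Flatten output: 26 * 26 * 128
dense1_params = flat_features * 128 + 128
bn1_params = 4 * 128                            # gamma, beta, moving mean, moving variance
dense2_params = 128 * 10 + 10
bn2_params = 4 * 10

total = conv_params + dense1_params + bn1_params + dense2_params + bn2_params
non_trainable = 2 * 128 + 2 * 10                # the moving statistics are not trained

print(flat_features)           # 86528
print(dense1_params)           # 11075712
print(total)                   # 11078834
print(total - non_trainable)   # 11078558 trainable
```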

**Step 11. Loss Function: Change sparse_categorical_crossentropy loss to categorical_crossentropy loss first. Experiment. Then change it to cosine_proximity loss. Experiment. Explain which loss function fits the data best and why?**

For TPU, using the Adam optimizer instead of SGD.

In [0]:
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    )
)
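tpu_model.compile(optimizer=tf.train.AdamOptimizer(),
                  loss='sparse_categorical_crossentropy',
                  # The two alternatives below expect one-hot labels,
                  # e.g. keras.utils.to_categorical(train_labels, 10)
                  #loss='categorical_crossentropy',
                  #loss='cosine_proximity',
                  metrics=['accuracy'])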

**Step 8. Different Optimizer: Change the optimizer to use Stochastic Gradient Descent
(SGD) with the following parameters: (lr=0.1, momentum=0.7, decay=0.01, nesterov=True). Explain each of these parameters.**
Arguments:
* lr: float >= 0. Learning rate. Gradient descent algorithms multiply the gradient by a scalar known as the learning rate (also called the step size) to determine the next point.
* momentum: float >= 0. Parameter that accelerates SGD in the relevant direction and dampens oscillations. SGD with momentum keeps a velocity, a decaying sum of past updates, so steps grow along directions where successive gradients agree, which usually speeds up convergence.
* decay: float >= 0. Learning rate decay over each update. Keras lowers the learning rate at every iteration as lr / (1 + decay * iterations), so later steps are smaller. This is learning rate decay, not weight decay: the weights themselves are not shrunk toward zero.
* nesterov: boolean. Whether to apply Nesterov momentum. Nesterov momentum first takes the momentum step (the "gamble") and then corrects it with the gradient (the "correction"), so it tends to overshoot slightly less than standard momentum.
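A minimal NumPy sketch of how the four arguments interact, following the update rule of the classic Keras SGD optimizer (the version that still accepts `lr` and `decay`). It minimizes the toy function f(w) = 0.5 * w**2, whose gradient is w; the toy objective, the starting point and the helper name `sgd_step` are illustrative assumptions.

```python
import numpy as np

def sgd_step(w, velocity, grad, iteration, lr=0.1, momentum=0.7, decay=0.01, nesterov=True):
    """One update in the style of the classic Keras SGD optimizer."""
    # decay: time-based learning-rate schedule, lr_t = lr / (1 + decay * t)
    lr_t = lr / (1.0 + decay * iteration)
    # momentum: the velocity is a decaying sum of past steps
    velocity = momentum * velocity - lr_t * grad
    if nesterov:
        # Nesterov: look ahead along the new velocity, then correct with the gradient
        w = w + momentum * velocity - lr_t * grad
    else:
        w = w + velocity
    return w, velocity, lr_t

w, v = 5.0, 0.0
for t in range(200):
    grad = w                    # gradient of f(w) = 0.5 * w**2
    w, v, lr_t = sgd_step(w, v, grad, t)

print(round(lr_t, 4))   # 0.0334, i.e. 0.1 / (1 + 0.01 * 199)
print(abs(w) < 1e-3)    # True: w has converged close to the minimum at 0
```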

From this step on, use the CPU.

---

**Step 11. Loss Function: Change sparse_categorical_crossentropy loss to
categorical_crossentropy loss first. Experiment. Then change it to
cosine_proximity loss. Experiment. Explain which loss function fits the data best
and why?**

For CPU

In [0]:
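model.compile(optimizer=keras.optimizers.SGD(lr=0.1, momentum=0.7, decay=0.01, nesterov=True),
              loss='sparse_categorical_crossentropy',
              # The two alternatives below expect one-hot labels,
              # e.g. keras.utils.to_categorical(train_labels, 10)
              #loss='categorical_crossentropy',
              #loss='cosine_proximity',
              metrics=['accuracy'])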
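A minimal NumPy sketch of the three losses for a single prediction, following their standard definitions (the predicted probabilities are made up for illustration). sparse_categorical_crossentropy takes the integer label and categorical_crossentropy takes the same label one-hot encoded; both give the same value. cosine_proximity, in the older Keras versions that provide it, is the negative cosine similarity between the L2-normalized target and prediction, so it stays bounded and penalizes a confident wrong answer far less sharply than the log in cross-entropy does.

```python
import numpy as np

def sparse_cce(label, probs):
    """Cross-entropy with an integer class label."""
    return -np.log(probs[label])

def cce(one_hot, probs):
    """Cross-entropy with a one-hot target vector."""
    return -np.sum(one_hot * np.log(probs))

def cosine_proximity(one_hot, probs):
    """Negative cosine similarity between target and prediction (in [-1, 0] here)."""
    t = one_hot / np.linalg.norm(one_hot)
    p = probs / np.linalg.norm(probs)
    return -np.sum(t * p)

# A made-up softmax output over the 10 classes
probs = np.array([0.05, 0.05, 0.6, 0.05, 0.05, 0.04, 0.04, 0.04, 0.04, 0.04])
label = 2
one_hot = np.eye(10)[label]

print(round(float(sparse_cce(label, probs)), 4))   # 0.5108, i.e. -ln(0.6)
print(round(float(cce(one_hot, probs)), 4))        # same value
print(float(cosine_proximity(one_hot, probs)))
```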

**Step 6. Training & Validation: Add an EarlyStopping Keras callback with appropriate parameters to
stop the training epochs if validation accuracy has not improved for 3 epochs (run
your network training for 50 epochs if no early stop happens. Use the
fit_generator function instead of fit).**

The following cells, which take the validation and training data separately, are used only for this step. From Step 7 on, ImageDataGenerator performs the split instead.

Uncomment them for Step 6.

In [0]:
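# Stop when validation accuracy has not improved for 3 consecutive epochs.
# Older (TF 1.x) Keras logs this metric as 'val_acc'; TF 2.x uses 'val_accuracy'.
# Validation data must be passed to fit/fit_generator for this key to exist.
callback = [keras.callbacks.EarlyStopping(monitor='val_acc', mode='max', patience=3)]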
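The patience rule in plain Python: a minimal sketch that follows the logic of Keras EarlyStopping in 'max' mode (an epoch counts as an improvement only if it beats the best value so far by more than min_delta). The helper `early_stop_epoch` and the accuracy history are hypothetical.

```python
def early_stop_epoch(val_acc_history, patience=3, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("-inf")
    wait = 0
    for epoch, current in enumerate(val_acc_history):
        if current - min_delta > best:
            best = current      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1           # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up validation accuracies for illustration
history = [0.80, 0.83, 0.84, 0.84, 0.83, 0.835, 0.85]
print(early_stop_epoch(history))   # 5: epochs 3, 4 and 5 did not beat 0.84
```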

In [0]:
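def train_gen(batch_size):
    """Endlessly yield contiguous batches starting at a random offset."""
    while True:
        # +1 because randint's upper bound is exclusive; this lets the last full batch be drawn
        offset = np.random.randint(0, train_images.shape[0] - batch_size + 1)
        yield (train_images[offset:offset + batch_size],
               train_labels[offset:offset + batch_size])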
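train_gen draws contiguous slices at random offsets, so within an epoch some samples can repeat while others are skipped. A common alternative, sketched here under the assumption that every sample should be seen once per pass, reshuffles the indices on each pass and walks through them in batches. The name `shuffled_batches` is hypothetical and the toy arrays stand in for the images and labels; the generator could be passed to fit_generator the same way.

```python
import numpy as np

def shuffled_batches(x, y, batch_size, seed=0):
    """Endlessly yield (x, y) batches, visiting every sample once per pass."""
    rng = np.random.default_rng(seed)
    n = len(x)
    while True:
        order = rng.permutation(n)                  # new random order each pass
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]   # last batch may be smaller
            yield x[idx], y[idx]

x = np.arange(10).reshape(10, 1)   # toy "images"
y = np.arange(10)                  # toy labels, equal to the sample index
gen = shuffled_batches(x, y, batch_size=4)

# One pass = ceil(10 / 4) = 3 batches of sizes 4, 4 and 2
seen = np.concatenate([next(gen)[1] for _ in range(3)])
print(sorted(seen.tolist()))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```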

In [0]:
"""
# For CPU usage
model.fit_generator(
    train_gen(512),
    epochs=50,
    steps_per_epoch=100,
    validation_data=(valid_images, valid_labels),
    callbacks=callback  # the EarlyStopping list defined earlier
)
"""

In [0]:
"""
# For TPU usage
tpu_model.fit_generator(
    train_gen(512),
    epochs=50,
    steps_per_epoch=100,
    validation_data=(valid_images, valid_labels),
    callbacks=callback  # the EarlyStopping list defined earlier
)
"""

**Step 6. Training & Validation: Split the training data into 30% validation and 70% training.**



---



**Step 7. Data Augmentation: Augment the training data by adding only horizontal flips of
the training images. Also experiment with augmenting data with only vertical flips.
Explain which one gives the best results. Explain your reasoning. You may use
keras.preprocessing.image.ImageDataGenerator class. After this step continue
with augmenting the data with horizontal flips only.**


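Data augmentation essentially enlarges the dataset: flipping the images, for instance, creates "new" samples with different orientations. It is most useful when the variety in the dataset is limited. One likely reason horizontal flips suit this dataset better is that a mirrored shirt, bag, or sneaker is still a realistic clothing image, whereas upside-down items are unlikely to appear in the test set.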

For vertical flips, replace ```horizontal_flip=True``` with
```vertical_flip=True```.
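What the two flips do to an image array, as a minimal NumPy sketch. For a batch shaped (N, height, width, channels), as used after Step 5, a horizontal flip reverses the width axis and a vertical flip reverses the height axis; ImageDataGenerator applies each enabled flip to a given image with probability 0.5. The tiny 2x3 "image" and the helper `random_flip` are illustrative assumptions.

```python
import numpy as np

batch = np.arange(6).reshape(1, 2, 3, 1)   # one 2x3 single-channel image

h_flip = batch[:, :, ::-1, :]   # horizontal: mirror left <-> right (width axis)
v_flip = batch[:, ::-1, :, :]   # vertical: mirror top <-> bottom (height axis)

print(batch[0, :, :, 0].tolist())    # [[0, 1, 2], [3, 4, 5]]
print(h_flip[0, :, :, 0].tolist())   # [[2, 1, 0], [5, 4, 3]]
print(v_flip[0, :, :, 0].tolist())   # [[3, 4, 5], [0, 1, 2]]

def random_flip(images, axis, rng):
    """Flip each image along `axis` (1 = height, 2 = width) with probability 0.5."""
    out = images.copy()
    mask = rng.random(len(images)) < 0.5
    out[mask] = np.flip(images[mask], axis=axis)
    return out

rng = np.random.default_rng(0)
augmented = random_flip(np.repeat(batch, 8, axis=0), axis=2, rng=rng)
```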


The commented-out TPU call can be used for Steps 7 to 10.

In [0]:
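# Random left-right flips; hold out 30% of the training data for validation
datagen = keras.preprocessing.image.ImageDataGenerator(horizontal_flip=True, validation_split=0.3)
# fit() only computes statistics for featurewise options; flips do not need it
datagen.fit(train_images)

# validation_split only takes effect when flow() is given a subset;
# note that the random flips are applied to the validation batches too
train_flow = datagen.flow(train_images, train_labels, batch_size=512, subset='training')
valid_flow = datagen.flow(train_images, train_labels, batch_size=512, subset='validation')

"""
tpu_model.fit_generator(train_flow,
                        epochs=50,
                        steps_per_epoch=100,
                        validation_data=valid_flow,
                        callbacks=callback)
"""

model.fit_generator(train_flow,
                    epochs=50,
                    steps_per_epoch=100,
                    validation_data=valid_flow,
                    callbacks=callback)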

**Step 6. Training & Validation: Split the training data into 30% validation and 70% training.**

In [0]:
# validation_split holds out the LAST 30% of the samples (taken before any shuffling)
model.fit(train_images, train_labels, validation_split=0.3, epochs=5)
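Keras documents that `validation_split` takes the validation data from the last samples of the arrays, before any shuffling, so this split is neither random nor stratified. A minimal NumPy sketch of the index arithmetic, assuming the Keras rule split_at = int(n * (1 - validation_split)) and the full 60,000-image training set:

```python
import numpy as np

n = 60000                           # number of Fashion-MNIST training images
validation_split = 0.3
split_at = int(n * (1.0 - validation_split))

indices = np.arange(n)
train_part, valid_part = indices[:split_at], indices[split_at:]

print(len(train_part), len(valid_part))   # 42000 18000
print(valid_part[0], valid_part[-1])      # 42000 59999
```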

In [0]:
test_loss, test_acc = tpu_model.evaluate(test_images, test_labels)

print('Test accuracy for TPU:', test_acc)

In [0]:
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy for CPU:', test_acc)