### Step 1 -- Load necessary Python packages:

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras

### Step 2 -- Load the Fashion MNIST dataset:

In [2]:
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


### Step 3 -- Data preparation:

Only two things to prepare:
* Create variable `class_names` and fill it with the names of the 10 types of clothes
* Normalize the data

In [3]:
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
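
The list above names normalization, but the cell only defines `class_names`. A minimal sketch of that missing step follows, assuming the usual approach of dividing pixel values by 255 so that both splits land in the range 0 to 1. The small random arrays are stand-ins for the real `train_images` and `test_images`, used only so the snippet runs without downloading anything.

```python
import numpy as np

# Stand-in data (assumption): Fashion MNIST images are 28x28 uint8 arrays
# with pixel values from 0 to 255. Random arrays keep the sketch offline.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
test_images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

# Scale BOTH splits by the same constant so the model sees the same
# input range during training and evaluation.
train_images = train_images / 255.0
test_images = test_images / 255.0

print(train_images.dtype, train_images.min(), train_images.max())
```

The point of the sketch is that whatever scaling is applied to the training images must also be applied to the test images, and before `fit` is called.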

### Step 4 -- Build the model and name it `model1`

Two parts:
* Set up the layer structures of the NN model
* Compile the model (i.e., specify the optimizer, the loss function, and the metrics)

Name it `model1` so we can differentiate between various models we'll create.

In [4]:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
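
As a quick sanity check on the layer structure above, the number of trainable parameters can be worked out by hand. This is a small sketch in plain Python; `dense_params` is a hypothetical helper, not a Keras function. `Flatten` has no parameters, and each `Dense` layer has one weight per input-output pair plus one bias per output.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

flat = 28 * 28                      # Flatten turns a 28x28 image into 784 values
hidden = dense_params(flat, 128)    # Dense(128, relu)
output = dense_params(128, 10)      # Dense(10, softmax)
total = hidden + output

print(hidden, output, total)  # 100480 1290 101770
```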

### Step 5 -- Train the model, and evaluate accuracy on the test dataset

In [10]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=10)

test_loss, test_acc = model.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Test accuracy: 0.1001


What is the test accuracy? Also, what is the difference between the train accuracy and the test accuracy? You can manually write down the answers by copying from the outputs above.

The test accuracy is 0.1001. The difference between the train accuracy and the test accuracy is 0.7559.
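
For scale, it helps to know what a model that has learned nothing useful would score. The sketch below assumes a balanced label set with 10 classes and 1000 examples per class (the synthetic `labels` array is an illustration, not the real test labels). It shows that always predicting one class gives an accuracy of 0.1, and that a uniform probability guess gives a cross-entropy loss of ln(10).

```python
import numpy as np

num_classes = 10

# Assumption: a balanced label set, 1000 examples for each of the 10 classes.
labels = np.repeat(np.arange(num_classes), 1000)

# A classifier that always predicts the same class (index 8 here).
constant_preds = np.full_like(labels, 8)
accuracy = float(np.mean(constant_preds == labels))

# Sparse categorical cross-entropy is -log(probability of the true class).
# With a uniform guess, every class gets probability 1/10.
uniform_probs = np.full(num_classes, 1.0 / num_classes)
loss = float(-np.log(uniform_probs[labels]).mean())

print(accuracy, round(loss, 4))  # 0.1 2.3026
```

A test accuracy close to 0.1 on a 10-class problem is therefore at chance level.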

### Step 6 -- Make predictions on the first 15 images in the test dataset

Please print out the predictions -- in terms of names of the type of clothes, not numbers -- on the first 15 images in the test dataset. Also please print out the ground truth names of these 15 images for comparison. No need to visualize anything.

Hint: the code in the lecture note can only make a prediction for a single image. To make predictions on multiple images, you can try either of the following methods (or other ways you can think of):
* Use [for loop](https://www.w3schools.com/python/python_for_loops.asp)
* Use [list comprehension](https://www.w3schools.com/python/python_lists_comprehension.asp)


In [11]:
# Caution: this rescales only the test images; `model` was fit on unscaled train_images.
test_images = test_images / 255.0
first_15_test_images = test_images[:15]
predictions = model.predict(first_15_test_images)
predicted_labels = np.argmax(predictions, axis=1)
for i in range(15):
    print(f"Image {i+1}:")
    print(f"  Predicted: {class_names[predicted_labels[i]]}")
    print(f"  Ground truth: {class_names[test_labels[i]]}")

Image 1:
  Predicted: Bag
  Ground truth: Ankle boot
Image 2:
  Predicted: Bag
  Ground truth: Pullover
Image 3:
  Predicted: Bag
  Ground truth: Trouser
Image 4:
  Predicted: Bag
  Ground truth: Trouser
Image 5:
  Predicted: Bag
  Ground truth: Shirt
Image 6:
  Predicted: Bag
  Ground truth: Trouser
Image 7:
  Predicted: Bag
  Ground truth: Coat
Image 8:
  Predicted: Bag
  Ground truth: Shirt
Image 9:
  Predicted: Bag
  Ground truth: Sandal
Image 10:
  Predicted: Bag
  Ground truth: Sneaker
Image 11:
  Predicted: Bag
  Ground truth: Coat
Image 12:
  Predicted: Bag
  Ground truth: Sandal
Image 13:
  Predicted: Bag
  Ground truth: Sneaker
Image 14:
  Predicted: Bag
  Ground truth: Dress
Image 15:
  Predicted: Bag
  Ground truth: Coat
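
Every image above receives the same prediction. One plausible explanation is that the model was fit on raw 0-255 pixels but is asked to predict on pixels divided by 255. The toy sketch below illustrates the mechanism with a single linear layer; `W`, `b`, and `x_raw` are hypothetical values chosen for illustration, not the trained network or the real images. When inputs shrink by a factor of 255, the weighted sum shrinks with them, and the bias term decides the winner for every image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single dense layer whose small weights suit raw 0-255 pixels.
W = rng.normal(scale=1e-3, size=(784, 10))
b = np.zeros(10)
b[8] = 1.0  # hypothetical bias that favours class index 8 ("Bag")

# Hypothetical flattened images with raw pixel values.
x_raw = rng.integers(0, 256, size=(15, 784)).astype(np.float64)

# Same layer, two input scales.
pred_raw = np.argmax(x_raw @ W + b, axis=1)
pred_scaled = np.argmax((x_raw / 255.0) @ W + b, axis=1)

print("raw inputs:   ", pred_raw)
print("scaled inputs:", pred_scaled)
```

In this sketch the scaled inputs all map to class index 8, because the input-dependent part of each logit becomes far smaller than the bias.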


In [12]:
model.fit(train_images, train_labels, epochs=20)

test_loss, test_acc = model.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

Test accuracy: 0.1


What is the test accuracy? Did the model performance improve?

The test accuracy is 0.1. No, it did not improve.
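
The `0.1` printed in the output above has fewer digits than the other accuracies because of the format specification, not because of the model. In an f-string, `:.4` without a type code means four significant digits with trailing zeros dropped, while `:.4f` means exactly four digits after the decimal point. A short sketch:

```python
# Compare the format used in this notebook (:.4) with fixed-point (:.4f).
for acc in (0.1, 0.1001, 0.8417):
    print(f"{acc:.4}", f"{acc:.4f}")

# Prints:
# 0.1 0.1000
# 0.1001 0.1001
# 0.8417 0.8417
```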

What is the difference between the train accuracy and the test accuracy? Did it grow larger as compared to Step 5 in Question 1? Briefly, what is your takeaway?

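In Question 2, the difference is 0.7561; in Step 5, it was 0.7559, so the gap is essentially unchanged. Adding more epochs did not help and, if anything, was counterproductive. The model appears to be overfitting, although a test accuracy of 0.1 is chance level for 10 classes, which likely also reflects that the test images were rescaled by 255 while the training images were not.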

In [14]:
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

model2 = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model2.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

model2.fit(train_images, train_labels, epochs=20)

test_loss, test_acc = model2.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

Test accuracy: 0.8417


What is the test accuracy of `model2`? Did the model performance improve as compared to that in Question 2?

The test accuracy is 0.8417. Yes, it improved.

What is the difference between the train accuracy and the test accuracy in `model2`? Did it grow larger as compared to that in Question 2? No need to explain.

For `model2`, the difference is 0.0157; in Question 2, it was 0.7561. The difference decreased.

In [15]:
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

model3 = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(256, activation='relu'),  
    keras.layers.Dense(10, activation='softmax')
])

model3.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

model3.fit(train_images, train_labels, epochs=20)

test_loss, test_acc = model3.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

Test accuracy: 0.8533


What is the test accuracy of `model3`? Did the model performance improve as compared to that in `model2`?

The test accuracy is 0.8533. Yes, it improved.

What is the difference between the train accuracy and the test accuracy in `model3`? Did it grow larger as compared to that in `model2`? No need to explain.

For `model3`, the difference is 0.0171; for `model2`, it was 0.0157. The difference grew larger.

In [16]:
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

model4 = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dropout(.25),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model4.compile(optimizer='adam',
               loss=tf.keras.losses.SparseCategoricalCrossentropy(),
               metrics=['accuracy'])

model4.fit(train_images, train_labels, epochs=20)

test_loss, test_acc = model4.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20

Test accuracy: 0.7657


What is the test accuracy of `model4`? Did the model performance improve as compared to that in `model3`?

The test accuracy is 0.7657. No, it did not improve.

What is the difference between the train accuracy and the test accuracy in `model4`? Did it grow larger as compared to that in `model3`? No need to explain.

For `model4`, the difference is 0.0132; for `model3`, it was 0.0171. The difference got smaller.
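
For context on what the `Dropout(.25)` layer in `model4` does, here is a minimal NumPy sketch of inverted dropout. The function `dropout` is a hypothetical illustration, not the Keras implementation. During training, a fraction `rate` of the inputs is set to zero and the survivors are scaled by `1 / (1 - rate)`, so the expected activation stays the same; at inference time the input passes through unchanged.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of x and rescale the rest."""
    if not training:
        return x  # inference: identity
    keep = rng.random(x.shape) >= rate          # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.ones(100_000)

y_train = dropout(x, 0.25, training=True, rng=rng)
y_eval = dropout(x, 0.25, training=False, rng=rng)

print("fraction zeroed:", np.mean(y_train == 0))
print("mean activation in training:", y_train.mean())
print("unchanged at inference:", np.array_equal(y_eval, x))
```

Because units are dropped only during training, the train accuracy reported by `fit` is measured under a handicap that `evaluate` does not have, which tends to narrow the train-test gap.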

In [17]:
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=10)

test_loss, test_acc = model.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')

# Caution: this rescales only the test images; `model` was fit on unscaled train_images.
test_images = test_images / 255.0
first_15_test_images = test_images[:15]
predictions = model.predict(first_15_test_images)
predicted_labels = np.argmax(predictions, axis=1)
for i in range(15):
    print(f"Image {i+1}:")
    print(f"  Predicted: {class_names[predicted_labels[i]]}")
    print(f"  Ground truth: {class_names[test_labels[i]]}")

model.fit(train_images, train_labels, epochs=20)

test_loss, test_acc = model.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')

model2 = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model2.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

model2.fit(train_images, train_labels, epochs=20)

test_loss, test_acc = model2.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')

model3 = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model3.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

model3.fit(train_images, train_labels, epochs=20)

test_loss, test_acc = model3.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')

model4 = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dropout(.25),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model4.compile(optimizer='adam',
               loss=tf.keras.losses.SparseCategoricalCrossentropy(),
               metrics=['accuracy'])

model4.fit(train_images, train_labels, epochs=20)

test_loss, test_acc = model4.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10

Test accuracy: 0.9496
Image 1:
  Predicted: 8
  Ground truth: 7
Image 2:
  Predicted: 8
  Ground truth: 2
Image 3:
  Predicted: 8
  Ground truth: 1
Image 4:
  Predicted: 8
  Ground truth: 0
Image 5:
  Predicted: 8
  Ground truth: 4
Image 6:
  Predicted: 8
  Ground truth: 1
Image 7:
  Predicted: 8
  Ground truth: 4
Image 8:
  Predicted: 8
  Ground truth: 9
Image 9:
  Predicted: 8
  Ground truth: 5
Image 10:
  Predicted: 8
  Ground truth: 9
Image 11:
  Predicted: 8
  Ground truth: 0
Image 12:
  Predicted: 8
  Ground truth: 6
Image 13:
  Predicted: 8
  Ground truth: 9
Image 14:
  Predicted: 8
  Ground truth: 0
Image 15:
  Predicted: 8
  Ground truth: 1
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epo

What is the test accuracy?

The test accuracy is 0.0974.

In [20]:
# This cell loads the MNIST digits, so the variable is named accordingly.
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images / 255.0
test_images = test_images / 255.0

class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

opt = keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

estop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

hist = model.fit(train_images, train_labels, epochs=50, validation_split=0.2, callbacks=[estop])

test_loss, test_acc = model.evaluate(test_images, test_labels)

print(f'\nTest accuracy: {test_acc:.4}')

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50

Test accuracy: 0.9772
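
Training above stopped after 6 of the 50 requested epochs, which is the `EarlyStopping` callback at work. The patience rule can be sketched in plain Python as below. The function `early_stop_epoch` is a hypothetical helper and the `val_losses` values are made up for illustration; they are not the losses from this run. With `patience=3`, training halts once the monitored loss has failed to improve for three consecutive epochs, and `restore_best_weights=True` then rolls the model back to the best epoch.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return (epoch at which training stops, epoch with the best loss)."""
    best = float("inf")
    best_epoch = 0
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses: improvement stalls after epoch 3.
val_losses = [0.15, 0.11, 0.09, 0.10, 0.095, 0.12, 0.08]

stopped, best = early_stop_epoch(val_losses, patience=3)
print(stopped, best)  # 6 3
```

Note that the seventh value is never reached: once patience runs out, later epochs are not trained at all.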
