Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.  

In this assignment, we
- build an MLP classifier for the Fashion-MNIST dataset.
- use PCA to reduce the dimensionality of the dataset while preserving 95% of the explained variance. (20 points)
- train a classifier on the dimensionality-reduced dataset with the same network topology as the previous classifier, and compare its classification accuracy with the one obtained on the original dataset. (10 points)
- check whether we observe anything surprising. (10 points)
- follow and improve the example from the text to fine-tune the neural network hyperparameters using RandomizedSearchCV. Use the dataset after the PCA step, which makes the search less time-consuming. (40 points)
- report the test result using the best model obtained from the randomized search. Show the summary of the model. Compare this result with the previous results. (20 points)

In [163]:
import numpy as np
import os

# To plot pretty figures
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12

Read the data.

In [164]:
import tensorflow as tf
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

Set random seeds.

In [165]:
np.random.seed(42)
tf.random.set_seed(42)

In [166]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(64, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

In [167]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])
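
The output layer uses softmax and the loss is categorical cross-entropy, which expects one-hot labels. As a rough NumPy illustration of what these two pieces compute for a single example (the logit values are made up, and this is a sketch rather than the Keras implementation):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def categorical_crossentropy(one_hot_label, probs):
    # Only the probability assigned to the true class contributes.
    return -np.sum(one_hot_label * np.log(probs))

# Made-up logits for one image over the 10 classes.
logits = np.array([2.0, 0.5, 0.1, 0.0, -1.0, 0.3, 1.2, -0.5, 0.0, 0.7])
probs = softmax(logits)

# One-hot label for class 0.
label = np.zeros(10)
label[0] = 1.0

loss = categorical_crossentropy(label, probs)
print(probs.sum(), loss)
```

The loss is small when the probability of the true class is close to 1 and grows as that probability shrinks.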

In [168]:
X_train_full = X_train_full.reshape((60000, 28 * 28))
X_train_full = X_train_full.astype('float32') / 255

X_test = X_test.reshape((10000, 28 * 28))
X_test = X_test.astype('float32') / 255
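
The cell above flattens each 28x28 image into a 784-dimensional vector and rescales the pixel values. A small sketch of the same two steps on a hypothetical stand-in array (random pixels, not the real dataset), using `reshape(-1, ...)` so the number of examples does not have to be hard-coded:

```python
import numpy as np

# Hypothetical stand-in for a batch of 28x28 uint8 images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each image to a 784-vector; -1 infers the batch size,
# so the same line works for the training and test arrays.
flat = images.reshape(-1, 28 * 28)

# Convert to float32 and map pixel values from [0, 255] to [0, 1].
scaled = flat.astype("float32") / 255

print(scaled.shape, scaled.dtype, scaled.min(), scaled.max())
```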

In [169]:
from tensorflow.keras.utils import to_categorical

y_train_full = to_categorical(y_train_full)
y_test = to_categorical(y_test)
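
`to_categorical` turns each integer class label into a one-hot row, which is the label format that `categorical_crossentropy` expects. A NumPy sketch of the same idea (the helper `one_hot` and the label values are illustrative, not part of Keras):

```python
import numpy as np

def one_hot(labels, num_classes):
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([9, 0, 3])  # made-up class indices
encoded = one_hot(labels, num_classes=10)
print(encoded.shape)  # (3, 10)
```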

In [170]:
X_valid = X_train_full[:5000]
y_valid = y_train_full[:5000]
X_train = X_train_full[5000:]
y_train = y_train_full[5000:]

In [171]:
network.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=20, batch_size=128)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.src.callbacks.History at 0x7eb8328ccbe0>

In [172]:
test_loss, test_acc = network.evaluate(X_test, y_test)



In [173]:
print('test_acc:', test_acc)

test_acc: 0.8694000244140625


In [174]:
test_acc_withoutPCA = test_acc
test_acc_withoutPCA

0.8694000244140625

Now we read the data again.

In [175]:
import tensorflow as tf
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()

Use the same random seeds.

In [176]:
np.random.seed(42)
tf.random.set_seed(42)

In [177]:
X_train_full.shape

(60000, 28, 28)

In [178]:
X_test.shape

(10000, 28, 28)

In [179]:
X_train_full = X_train_full.reshape((60000,28*28))
X_test = X_test.reshape((10000,28*28))

Fit PCA on X_train_full and transform it. (10 points)

In [180]:
from sklearn.decomposition import PCA

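# A float n_components in (0, 1) keeps the fewest components needed
# to explain that fraction of the variance (here 95%).
pca = PCA(n_components=0.95)
X_train_reduced_full = pca.fit_transform(X_train_full)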

Transform X_test using the PCA. (10 points)

In [181]:
# Reuse the PCA fitted on the training set; the test set is only transformed, never refit.
X_test_reduced = pca.transform(X_test)
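
To make the "95% of the explained variance" rule concrete, here is a NumPy-only sketch of the idea on synthetic data. It is an illustration under stated assumptions, not scikit-learn's implementation: the helpers `fit_pca` and `apply_pca` and the demo arrays are hypothetical. The test data is projected with the mean and components learned from the training data, mirroring the `fit_transform` / `transform` split above.

```python
import numpy as np

def fit_pca(X, variance_to_keep=0.95):
    # Center the data; the mean is learned from the training set only.
    mean = X.mean(axis=0)
    _, singular_values, components = np.linalg.svd(X - mean, full_matrices=False)
    # Variance explained by each component is proportional to s**2.
    explained = singular_values ** 2
    ratio = explained / explained.sum()
    # Smallest number of components whose cumulative ratio exceeds the target.
    k = int(np.searchsorted(np.cumsum(ratio), variance_to_keep, side="right")) + 1
    return mean, components[:k], ratio[:k]

def apply_pca(X, mean, components):
    # Project with the training mean and components (no refitting).
    return (X - mean) @ components.T

# Synthetic data: 20 features, most of the variance in the first three.
rng = np.random.default_rng(42)
scales = np.array([10.0, 5.0, 3.0] + [0.1] * 17)
X_train_demo = rng.normal(size=(500, 20)) * scales
X_test_demo = rng.normal(size=(100, 20)) * scales

mean, components, ratio = fit_pca(X_train_demo)
train_reduced = apply_pca(X_train_demo, mean, components)
test_reduced = apply_pca(X_test_demo, mean, components)
print(train_reduced.shape, test_reduced.shape, ratio.sum())
```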

Fill in the input_shape in the following code. (10 points)

In [182]:
from keras import models
from keras import layers

network = models.Sequential()
# fill in code
input_shape = (X_train_reduced_full.shape[1],)
network.add(layers.Dense(64, activation='relu', input_shape=input_shape))
network.add(layers.Dense(10, activation='softmax'))

In [183]:
network.summary()

Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_22 (Dense)            (None, 64)                12032     
                                                                 
 dense_23 (Dense)            (None, 10)                650       
                                                                 
Total params: 12682 (49.54 KB)
Trainable params: 12682 (49.54 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
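
The parameter counts in this summary follow directly from the layer sizes: a dense layer has one weight per input-unit pair plus one bias per unit. A short check, assuming the PCA-reduced input has 187 features (the width reported by the shape check in this notebook):

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit.
    return n_inputs * n_units + n_units

n_components = 187  # width of the PCA-reduced input
hidden = dense_params(n_components, 64)  # 187 * 64 + 64
output = dense_params(64, 10)            # 64 * 10 + 10
print(hidden, output, hidden + output)   # 12032 650 12682
```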


In [184]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

In [185]:
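# PCA was fitted on raw 0-255 pixel values. Because PCA is linear, dividing the
# projected features by 255 matches fitting PCA on pixels scaled to [0, 1].
X_train_reduced_full = X_train_reduced_full.astype('float32') / 255
X_test_reduced = X_test_reduced.astype('float32') / 255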

In [186]:
from tensorflow.keras.utils import to_categorical

y_train_full = to_categorical(y_train_full)
y_test = to_categorical(y_test)

In [187]:
X_valid_reduced = X_train_reduced_full[:5000]
y_valid = y_train_full[:5000]
X_train_reduced = X_train_reduced_full[5000:]
y_train = y_train_full[5000:]

In [188]:
X_train_reduced_full.shape

(60000, 187)

In [189]:
y_train.shape

(55000, 10)

In [190]:
network.fit(X_train_reduced, y_train, validation_data=(X_valid_reduced, y_valid), epochs=20, batch_size=128)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.src.callbacks.History at 0x7eb8328ce7a0>

In [191]:
test_loss, test_acc = network.evaluate(X_test_reduced, y_test)



In [192]:
print('test_acc:', test_acc)

test_acc: 0.8834999799728394


In [193]:
test_acc_PCA = test_acc
test_acc_PCA

0.8834999799728394

Compare these two accuracy results and check whether we see anything surprising. (10 points)

The accuracy result ...

In [195]:
print("Test Acc without PCA: ", test_acc_withoutPCA)
print("Test Acc PCA: ", test_acc_PCA)
print("\nDifference: ", test_acc_PCA - test_acc_withoutPCA)

Test Acc without PCA:  0.8694000244140625
Test Acc PCA:  0.8834999799728394

Difference:  0.014099955558776855


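**Answer:** The PCA-based model is slightly more accurate than the model trained on the raw pixels: 0.8835 versus 0.8694 on the test set, a gain of about 1.4 percentage points. This is somewhat surprising, because PCA discards about 5% of the variance and cuts the input from 784 to 187 features. One plausible explanation is that the dropped low-variance components carry mostly noise, so removing them acts as mild regularization; a single training run is not enough to confirm this.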

In [196]:
np.random.seed(42)
tf.random.set_seed(42)

Modify the code provided by this module and use RandomizedSearchCV to find a model that beats the previous accuracy results. (40 points)

Hint: you can speed up the search by setting n_jobs=-1 in RandomizedSearchCV, which uses all available CPU cores.

In [197]:
X_valid = X_train_reduced[:5000]
y_valid = y_train[:5000]
X_train = X_train_reduced[5000:]
y_train = y_train[5000:]

In [198]:
X_train.shape, X_valid.shape

((50000, 187), (5000, 187))

In [199]:
y_train.shape, y_valid.shape, y_test.shape

((50000, 10), (5000, 10), (10000, 10))

In [200]:
from tensorflow import keras

keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)

In [201]:
def build_model(n_hidden=1, n_neurons=128, learning_rate=1e-3, input_shape=(X_train_reduced.shape[1],)):
    model = keras.models.Sequential()
    model.add(keras.layers.InputLayer(input_shape=input_shape))
    # Stack n_hidden fully connected ReLU layers of equal width.
    for _ in range(n_hidden):
        model.add(keras.layers.Dense(n_neurons, activation="relu"))
    # Softmax output over the 10 Fashion-MNIST classes.
    model.add(keras.layers.Dense(10, activation='softmax'))
    # RMSprop with an explicit learning rate (1e-3 is its Keras default),
    # so that the learning_rate argument actually takes effect.
    optimizer = keras.optimizers.RMSprop(learning_rate=learning_rate)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    return model

In [202]:
from tensorflow import keras
from sklearn.base import BaseEstimator, RegressorMixin

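# Minimal scikit-learn wrapper so that RandomizedSearchCV can clone the model
# and set its hyperparameters. Despite the name, it wraps the classifier built
# by build_model; RegressorMixin supplies score(), which computes R^2 between
# the one-hot labels and the predicted probabilities.
class KerasRegressorWrapper(BaseEstimator, RegressorMixin):
    def __init__(self, n_hidden=1, n_neurons=200, learning_rate=1e-3):
        self.n_hidden = n_hidden
        self.n_neurons = n_neurons
        self.learning_rate = learning_rate

    def fit(self, X, y, **kwargs):
        # Build a fresh network for every fit so CV folds do not share weights.
        self.model = build_model(self.n_hidden, self.n_neurons, self.learning_rate)
        self.model.fit(X, y, **kwargs)
        return self

    def predict(self, X):
        # Returns class probabilities with shape (n_samples, 10).
        return self.model.predict(X)

# Create an instance of the wrapper with the default hyperparameters
keras_reg = KerasRegressorWrapper()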


In [203]:
keras_reg.fit(X_train, y_train, epochs=100,
              validation_data=(X_valid, y_valid),
              callbacks=[keras.callbacks.EarlyStopping(patience=10)])

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100


In [204]:
from sklearn.model_selection import RandomizedSearchCV

# Search space: one or two hidden layers, 200 to 499 neurons per layer.
param_distribs = {
    "n_hidden": [1, 2],
    "n_neurons": np.arange(200, 500),
}

# 20 sampled candidates x 3 folds = 60 fits; n_jobs=-1 uses all CPU cores.
rnd_search_cv = RandomizedSearchCV(keras_reg, param_distribs, n_iter=20, cv=3, verbose=2, n_jobs=-1)
rnd_search_cv.fit(X_train, y_train, epochs=100,
                  validation_data=(X_valid, y_valid),
                  callbacks=[keras.callbacks.EarlyStopping(patience=10)])

Fitting 3 folds for each of 20 candidates, totalling 60 fits




Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100


In [205]:
rnd_search_cv.best_params_

{'n_neurons': 484, 'n_hidden': 1}

Show the summary of the best model obtained from the randomized search. Report the test result using the best model, and compare this result with the previous results. (20 points)

In [206]:
# fill in code
model = rnd_search_cv.best_estimator_.model
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_2 (Dense)             (None, 484)               90992     
                                                                 
 dense_3 (Dense)             (None, 10)                4850      
                                                                 
Total params: 95842 (374.38 KB)
Trainable params: 95842 (374.38 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [207]:
model.evaluate(X_test_reduced, y_test)



[0.4564920961856842, 0.890999972820282]

The result from the randomized search ...
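
The three test accuracies reported in this notebook can be put side by side with a short script. The values are copied from the outputs above; the descriptive names are only labels chosen for this comparison.

```python
# Test accuracies copied from the outputs in this notebook.
results = {
    "MLP on raw pixels (784 inputs)": 0.8694000244140625,
    "MLP on PCA features (187 inputs)": 0.8834999799728394,
    "Randomized-search model on PCA features": 0.890999972820282,
}

n_test = 10_000  # size of the Fashion-MNIST test set
baseline = results["MLP on raw pixels (784 inputs)"]

for name, acc in results.items():
    gain_points = (acc - baseline) * 100  # percentage points over the baseline
    correct = round(acc * n_test)         # correctly classified test images
    print(f"{name}: {acc:.4f} ({gain_points:+.2f} points, {correct} correct)")
```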