# CAP 6619 - Deep Learning - Dr Marques
## Project 1
## Image Classifiers Using Neural Networks and the MNIST and Fashion MNIST Datasets

Sai Jhansi Kongara


https://colab.research.google.com/drive/14waohc6SvTW0nlD_zvmEk7-GRUMqqFc8?usp=sharing

Useful references and sources:

**MNIST**

- https://www.tensorflow.org/datasets/catalog/mnist

- https://en.wikipedia.org/wiki/MNIST_database

- https://github.com/the-deep-learners/deep-learning-illustrated/blob/master/notebooks/shallow_net_in_keras.ipynb

**Fashion MNIST**

- https://www.tensorflow.org/datasets/catalog/fashion_mnist

- https://en.wikipedia.org/wiki/Fashion_MNIST

- https://keras.io/api/datasets/fashion_mnist/

## PART 1 - *MNIST classifier using MLP*




### Import Needed Resources / Libraries

In [None]:
from tensorflow import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

from keras import layers

from matplotlib import pyplot as plt

import numpy as np
import tensorflow as tf

### Load and prepare the data

In [None]:
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and validation sets
(X_train, y_train), (X_valid, y_valid) = mnist.load_data()

### Examine Data

In [None]:
X_train.shape

In [None]:
y_train.shape

In [None]:
y_train[0:12]

In [None]:
plt.figure(figsize=(5,5))
for k in range(12):
    plt.subplot(3, 4, k+1)
    plt.imshow(X_train[k], cmap='Greys')
    plt.axis('off')
plt.tight_layout()
plt.show()

In [None]:
X_valid.shape

In [None]:
y_valid.shape

In [None]:
y_valid[0]

In [None]:
plt.imshow(X_valid[0], cmap='Greys')
plt.axis('off')
plt.show()

In [None]:
# Reshape (flatten) each 28x28 image into a 784-element vector
X_train_reshaped = X_train.reshape(len(X_train), -1).astype('float32')
X_valid_reshaped = X_valid.reshape(len(X_valid), -1).astype('float32')

# Scale images to the [0, 1] range
X_train_scaled_reshaped = X_train_reshaped / 255
X_valid_scaled_reshaped = X_valid_reshaped / 255

# Renaming for conciseness
X_training = X_train_scaled_reshaped
X_validation = X_valid_scaled_reshaped

print("X_training shape (after reshaping + scaling):", X_training.shape)
print(X_training.shape[0], "train samples")
print("X_validation shape (after reshaping + scaling):", X_validation.shape)
print(X_validation.shape[0], "validation samples")

In [None]:
# convert class vectors to binary class matrices
y_training = keras.utils.to_categorical(y_train, num_classes)
y_validation = keras.utils.to_categorical(y_valid, num_classes)

In [None]:
print(y_valid[0])
print(y_validation[0])
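
The one-hot conversion performed by `keras.utils.to_categorical` can be illustrated with plain NumPy. This is a minimal sketch, not the Keras implementation; `one_hot` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # Set the column matching each label to 1
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([7, 2, 1], num_classes=10)
print(example[0])  # the 1 sits at index 7
```

Taking `np.argmax` along axis 1 recovers the original integer labels, which is the same step used later to turn predicted probabilities into predicted classes.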

### Configure model

In [None]:
model = Sequential()
model.add(Dense(64, activation='sigmoid', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))

In [None]:
model.summary()

In [None]:
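# Weights of the hidden layer: 64 units x 784 inputs
(64*784)

In [None]:
# Hidden-layer parameters: weights plus one bias per unit
(64*784)+64

In [None]:
# Output-layer parameters: 10 units x 64 inputs, plus 10 biases
(10*64)+10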
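
The arithmetic in the cells above follows one rule: a fully connected layer has one weight per input-unit pair plus one bias per unit. A small sketch of that rule, where `dense_params` is a hypothetical helper and not part of Keras:

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 64)   # hidden layer: 784 inputs -> 64 units
output = dense_params(64, 10)    # output layer: 64 inputs -> 10 units
total = hidden + output

print(hidden, output, total)  # 50240 650 50890
```

These figures should match the per-layer and total parameter counts shown by `model.summary()` for this two-layer network.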

In [None]:
model.compile(
    loss='mean_squared_error',
    optimizer=SGD(learning_rate=0.01),
    metrics=['accuracy']
)
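
The cell above compiles the network with `mean_squared_error`, while the Part 2 cell later in this notebook uses `categorical_crossentropy`. The sketch below compares the two losses on a single made-up prediction; the probability vector is an assumption chosen for illustration, not output from the model.

```python
import numpy as np

# One-hot target for class 0 and a hypothetical softmax output
y_true = np.array([1.0, 0.0, 0.0])
y_prob = np.array([0.7, 0.2, 0.1])

# Mean squared error: average squared difference over the classes
mse = np.mean((y_true - y_prob) ** 2)

# Categorical cross-entropy: negative log-probability of the true class
cce = -np.sum(y_true * np.log(y_prob))

print("MSE:", mse)
print("Cross-entropy:", cce)
```

For the same prediction, cross-entropy gives a noticeably larger loss value than MSE, so loss curves from the two parts are not directly comparable.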

### Train!

In [None]:
batch_size=128
epochs=200

history = model.fit(
  X_training, # training data
  y_training, # training targets
  epochs=epochs,
  batch_size=batch_size,
  verbose=1,
  validation_data=(X_validation, y_validation)
)
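
To get a feel for how much work this training run involves, the number of weight updates can be estimated from the dataset size, `batch_size`, and `epochs`. This is a back-of-the-envelope sketch that assumes the last, smaller batch of each epoch is still used.

```python
import math

n_train = 60000      # MNIST training samples
batch_size = 128
epochs = 200

# The final batch of an epoch is smaller than batch_size, hence the ceiling
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 469 93800
```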

### Plot learning curves

In [None]:
# list all data in history
print(history.history.keys())

# summarize history for accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

### Evaluate the model

In [None]:
model.evaluate(X_validation, y_validation)

## **PART 1** - *Your Turn*

### **Part 1 - Tasks:**  *(40 pts)*
1. Write code to display the confusion matrix for your classifier and comment on the insights such a confusion matrix provides. See [this](https://scikit-learn.org/stable/auto_examples/classification/plot_digits_classification.html) for an example.

2. Write code to display 10 cases where the classifier makes mistakes. Make sure to display both the true value as well as the predicted value.

#### 1.a. Confusion Matrix *(10 pts)*

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Define the class names
classes_names = ['Class 0', 'Class 1', 'Class 2', 'Class 3', 'Class 4', 'Class 5', 'Class 6', 'Class 7', 'Class 8', 'Class 9']

# Get predicted probabilities
y_pred_prob = model.predict(X_validation)

# Convert probabilities to predicted labels
y_pred = np.argmax(y_pred_prob, axis=1)

# Compute confusion matrix
cm = confusion_matrix(y_valid, y_pred)
# Display confusion matrix using a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=classes_names, yticklabels=classes_names)
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()





#### 1.b. Comment on the insights confusion matrix provides *(10 pts)*

The confusion matrix summarizes how well a classification model performs by comparing the true labels of the validation set (rows) with the predicted labels (columns). For each class it exposes the following counts:

- True Positives (TP): the diagonal elements, i.e. the samples correctly assigned to each class. Larger values along the diagonal mean more correct predictions.
- False Negatives (FN): samples that belong to a class but were assigned to a different one. For a given class, these are the values in its row, excluding the diagonal element.
- False Positives (FP): samples assigned to a class that actually belong to a different one. For a given class, these are the values in its column, excluding the diagonal element.
- True Negatives (TN): samples that neither belong to a class nor were assigned to it, i.e. all values outside that class's row and column. The matrix does not display TN in a single cell; it has to be summed.

Studying the confusion matrix gives several insights:

- Overall model performance: Summing the diagonal values and dividing by the total number of samples gives the model's overall accuracy.
- Performance per class: The off-diagonal values in each row and column show which classes the model finds harder to classify and which ones it handles well.
- Unbalanced classes: If the class distribution in the dataset is unbalanced, the confusion matrix can reveal it. For instance, when one class has far more samples than the others, samples from the minority classes are more likely to be misclassified as the majority class.
- Error patterns: Large off-diagonal values identify pairs of classes that are commonly confused with one another, which can suggest improvements to the model or the dataset.

The confusion matrix is therefore a useful tool for assessing a classification model and for identifying its strengths, weaknesses, and areas for improvement.
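
The quantities discussed above can be read off a confusion matrix with a few NumPy operations. The sketch below uses a small made-up 3-class matrix (not results from this notebook) and assumes the scikit-learn convention of true labels on rows and predictions on columns.

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows = true labels, columns = predictions)
cm = np.array([
    [50,  2,  3],
    [ 4, 40,  6],
    [ 1,  5, 44],
])

tp = np.diag(cm)                 # correct predictions per class
fn = cm.sum(axis=1) - tp         # rest of each row
fp = cm.sum(axis=0) - tp         # rest of each column
tn = cm.sum() - tp - fn - fp     # everything outside the class's row and column

accuracy = tp.sum() / cm.sum()
recall = tp / (tp + fn)          # per-class recall
precision = tp / (tp + fp)       # per-class precision

# Most frequent confusion: largest off-diagonal entry
off_diagonal = cm.copy()
np.fill_diagonal(off_diagonal, 0)
true_cls, pred_cls = np.unravel_index(off_diagonal.argmax(), off_diagonal.shape)

print("accuracy:", accuracy)
print("recall per class:", recall)
print("precision per class:", precision)
print("most common confusion: true", true_cls, "predicted as", pred_cls)
```

Applying the same steps to the `cm` array computed in the previous cell would give per-digit recall and precision and point to the most frequently confused pair of digits.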


#### 2. Display 10 cases where the classifier makes mistakes. *(20 pts)*

In [None]:
import matplotlib.pyplot as plt
import numpy as np


# Get predicted probabilities
y_pred_prob = model.predict(X_validation)

# Convert probabilities to predicted labels
y_pred = np.argmax(y_pred_prob, axis=1)

# Find indices where true labels and predicted labels don't match
incorrect_indices = np.where(y_valid != y_pred)[0]

# Randomly select 10 incorrect predictions
incorrect_samples = np.random.choice(incorrect_indices, size=10, replace=False)

# Display the true and predicted labels for the selected samples
plt.figure(figsize=(12, 8))
for i, sample_idx in enumerate(incorrect_samples):
    plt.subplot(2, 5, i+1)
    plt.imshow(X_valid[sample_idx], cmap='gray')
    plt.title("True: {}, Predicted: {}".format(classes_names[y_valid[sample_idx]], classes_names[y_pred[sample_idx]]))
    plt.axis('off')
plt.tight_layout()
plt.show()
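
The cell above samples mistakes at random. A possible variation is to show the mistakes the classifier was most confident about, since those are often the most informative. The sketch below uses made-up probabilities; `most_confident_mistakes` is a hypothetical helper, not part of any library.

```python
import numpy as np

def most_confident_mistakes(probs, true_labels, k=10):
    """Indices of the k misclassified samples with the highest predicted probability."""
    probs = np.asarray(probs)
    true_labels = np.asarray(true_labels)
    predicted = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    wrong = np.flatnonzero(predicted != true_labels)
    # Sort the wrong predictions from most to least confident
    order = wrong[np.argsort(-confidence[wrong])]
    return order[:k]

demo_probs = np.array([
    [0.90, 0.10, 0.00],  # predicted 0, true 0 (correct)
    [0.20, 0.70, 0.10],  # predicted 1, true 2 (wrong, confidence 0.7)
    [0.60, 0.30, 0.10],  # predicted 0, true 1 (wrong, confidence 0.6)
    [0.05, 0.05, 0.90],  # predicted 2, true 0 (wrong, confidence 0.9)
])
demo_true = np.array([0, 2, 1, 0])

print(most_confident_mistakes(demo_probs, demo_true, k=2))  # [3 1]
```

With the notebook's variables, the call would be `most_confident_mistakes(y_pred_prob, y_valid, k=10)`, and the returned indices could replace `incorrect_samples` in the plotting loop.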




## PART 2 - *Fashion MNIST*




### Load and prepare the data

In [None]:
# Model / data parameters
num_classes = 10
input_shape = [28, 28]

In [None]:
# Load the data, split between train and validation sets
(X_train, y_train), (X_valid, y_valid) = tf.keras.datasets.fashion_mnist.load_data()
assert X_train.shape == (60000, 28, 28)
assert X_valid.shape == (10000, 28, 28)
assert y_train.shape == (60000,)
assert y_valid.shape == (10000,)

### Defining Classes

In [None]:
classes_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']

### Examine Data

In [None]:
X_train.shape

In [None]:
y_train.shape

In [None]:
y_train[0:12]

In [None]:
plt.figure(figsize=(5,5))
for k in range(12):
    plt.subplot(3, 4, k+1)
    plt.imshow(X_train[k], cmap='Greys')
    plt.axis('off')
plt.tight_layout()
plt.show()

In [None]:
X_valid.shape

In [None]:
y_valid.shape

In [None]:
y_valid[0]

In [None]:
plt.imshow(X_valid[0], cmap='Greys')
plt.axis('off')
plt.show()

In [None]:
# Reshape (flatten) each 28x28 image into a 784-element vector
X_train_reshaped = X_train.reshape(len(X_train), -1).astype('float32')
X_valid_reshaped = X_valid.reshape(len(X_valid), -1).astype('float32')

# Scale images to the [0, 1] range
X_train_scaled_reshaped = X_train_reshaped / 255
X_valid_scaled_reshaped = X_valid_reshaped / 255

# Renaming for conciseness
X_training = X_train_scaled_reshaped
X_validation = X_valid_scaled_reshaped

print("X_training shape (after reshaping + scaling):", X_training.shape)
print(X_training.shape[0], "train samples")
print("X_validation shape (after reshaping + scaling):", X_validation.shape)
print(X_validation.shape[0], "validation samples")

In [None]:
# convert class vectors to binary class matrices
y_training = keras.utils.to_categorical(y_train, num_classes)
y_validation = keras.utils.to_categorical(y_valid, num_classes)

In [None]:
print(y_valid[0])
print(y_validation[0])

## **PART 2** - *Your Turn*

### **Part 2 - Tasks:** *(60 pts)*
Build a NN solution identical to the one before: *(20 pts)*
> a. Plot learning curves *(10 pts)*

> b. Display the confusion matrix for your classifier *(10 pts)*

> c. Evaluate the model, identify accuracy, etc. *(10 pts)*

> d. Discuss why the results are not as good. *(10 pts)*





#### NN solution *(20 pts)*

Configure the Model *(10 pts)*

In [None]:
# Same architecture, optimizer, and learning rate as the MNIST model above.
# Note: this cell compiles with categorical cross-entropy, whereas Part 1 used mean squared error.
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

# Model configuration
num_classes = 10
input_shape = (784,)

# Build the model
model = Sequential()
model.add(Dense(64, activation='sigmoid', input_shape=input_shape))
model.add(Dense(num_classes, activation='softmax'))

# Configure the model
model.compile(
    loss='categorical_crossentropy',
    optimizer=SGD(learning_rate=0.01),
    metrics=['accuracy']
)



Train the Model  *(10 pts)*

In [None]:
# Your Train the Model code here and in additional code cells as needed
# same as was used in MNIST above
# Train the model
batch_size = 128
epochs = 200
history = model.fit(
    X_training,
    y_training,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(X_validation, y_validation)
)



#### Plot learning curves *(10 pts)*

In [None]:
# Plot learning curves
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()



#### Display the confusion matrix for your classifier *(10 pts)*

In [None]:
# Your confusion matrix code here and in additional code cells as needed
from sklearn.metrics import confusion_matrix
import seaborn as sns
# Use the Fashion MNIST class names so the heatmap axes are readable
classes_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot']

# Get predicted labels
y_pred = np.argmax(model.predict(X_validation), axis=-1)

# Compute confusion matrix
cm = confusion_matrix(y_valid, y_pred)

# Display confusion matrix using a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=classes_names, yticklabels=classes_names)
plt.title("Confusion Matrix")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()



#### Evaluate the model, identify accuracy, etc. *(10 pts)*

In [None]:
# Your Evaluate the model code here and in additional code cells as needed
# Evaluate the model
loss, accuracy = model.evaluate(X_validation, y_validation)

# Print the accuracy
print("Accuracy:", accuracy)



#### Discuss why the results are not as good.

If you had more time what would you do to improve the results? *(10 pts)*

Several factors could explain results that fall short of expectations.

Limited data: If the model is trained on too small a dataset, it may not see enough of the patterns and variation in the data, which lowers accuracy.

Overfitting: Overfitting occurs when a model learns to perform well on the training data but fails to generalize to new, unseen data. It can be reduced with regularization techniques such as dropout or weight decay.

Model complexity: A model that is too simple may fail to capture intricate patterns in the data. The network used here has a single hidden layer of 64 sigmoid units; increasing its capacity or using a more sophisticated architecture can improve performance.

Hyperparameter tuning: Hyperparameters such as the learning rate, batch size, and choice of optimizer can strongly affect the model's performance.

If I had more time to improve the results, I would consider the following steps:

Data augmentation: Generate additional training examples by randomly rotating, translating, scaling, or flipping the original images. This increases the diversity of the training set and should improve generalization.
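
As a rough illustration of data augmentation, the sketch below applies a small random shift and an optional horizontal flip to a 28x28 image using only NumPy. `random_shift` and `augment` are hypothetical helpers, and the shift range and flip probability are arbitrary assumptions rather than tuned values.

```python
import numpy as np

def random_shift(image, max_shift, rng):
    """Translate a 2D image by up to max_shift pixels, filling the border with zeros."""
    h, w = image.shape
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    padded = np.pad(image, max_shift, mode="constant")
    top, left = max_shift - dy, max_shift - dx
    return padded[top:top + h, left:left + w]

def augment(image, rng, max_shift=2):
    """Random shift followed by a horizontal flip with probability 0.5."""
    out = random_shift(image, max_shift, rng)
    if rng.random() < 0.5:
        out = np.fliplr(out)
    return out

rng = np.random.default_rng(0)
image = np.zeros((28, 28), dtype="float32")
image[10:18, 12:16] = 1.0           # synthetic blob standing in for a garment

augmented = augment(image, rng)
print(augmented.shape)
```

Horizontal flips are plausible for clothing images but would not be appropriate for digits, so the set of transformations should be chosen per dataset.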

Ensembling: Combine the predictions of several different models, for example through bagging or boosting, to exploit their diversity and potentially improve performance.
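
The simplest way to combine models is to average their predicted probabilities (soft voting), which is less involved than bagging or boosting but shows the idea. The probabilities below are made up for illustration, and `average_ensemble` is a hypothetical helper.

```python
import numpy as np

def average_ensemble(prob_list):
    """Average class probabilities across models and return labels plus mean probabilities."""
    stacked = np.stack(prob_list, axis=0)   # shape: (n_models, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)
    return mean_probs.argmax(axis=1), mean_probs

# Hypothetical outputs of three models on two samples with two classes
model_a = np.array([[0.6, 0.4], [0.3, 0.7]])
model_b = np.array([[0.2, 0.8], [0.4, 0.6]])
model_c = np.array([[0.3, 0.7], [0.9, 0.1]])

labels, mean_probs = average_ensemble([model_a, model_b, model_c])
print(labels)  # [1 0]
```

In this notebook, each entry of the list would be the output of `model.predict(X_validation)` from a separately trained network.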

Transfer learning: Start from models pre-trained on large image datasets such as ImageNet and fine-tune them for this task. Transfer learning reuses the knowledge and representations learned on one task to improve performance on a related one.

Regularization: Use techniques such as dropout, L1 or L2 regularization, or batch normalization to reduce overfitting and improve generalization.

Architecture search: Try different model designs, or use automated methods such as neural architecture search (NAS), to find network architectures better suited to this dataset and problem.

Cross-validation: Use cross-validation to estimate the model's performance more reliably and to detect potential problems such as overfitting or data leakage.
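
A minimal sketch of how k-fold cross-validation splits the data is shown below. `k_fold_indices` is a hypothetical helper written with NumPy only; in practice a library routine such as scikit-learn's `KFold` would normally be used instead.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, valid_idx) pairs so that each sample is used for validation exactly once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)
    for i in range(k):
        valid_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, valid_idx

splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, valid_idx in splits:
    print(len(train_idx), "train /", len(valid_idx), "validation")
```

Training one model per split and averaging the validation accuracies gives a more stable performance estimate than a single train/validation split, at the cost of k training runs.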

Error analysis: Examine misclassified samples and analyze the model's predictions to find patterns, biases, or areas where the model falls short. This analysis can guide future improvements to the model or to data preparation.

Monitoring after deployment: Deploy the model in a real-world setting and monitor it while gathering feedback from the users or systems that rely on it. Analyze its predictions, track its performance, and adjust the model based on that feedback.

Addressing these factors should make it possible to improve the results and build a more reliable and accurate model.

In [None]:
# (OPTIONAL) Additional code to demonstrate possible improvements to the model in Part 2.