# Project 3 - Comparing Deep Neural Network Architectures to Improve Multi-Class Image Classification

Aman Patel

CSCI-B 455

March 28, 2021

# **Introduction**

## Problem Statement
The goal of this project was to build a baseline deep neural network (DNN) that classifies images from the CIFAR-10 dataset, then build two additional DNNs that improve on the baseline architecture.

## Data

The CIFAR-10 dataset used for this project is distributed by the University of Toronto Computer Science Department. It contains 60000 color images (32x32 pixels each) of objects and animals in ten classes, partitioned into 50000 training images and 10000 test images. The data can be found at
https://www.cs.toronto.edu/~kriz/cifar.html
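
The labels in this dataset are integers from 0 to 9. As a small illustration, the sketch below maps a label to the class name listed on the dataset page. The names `CIFAR10_CLASSES` and `label_name` are hypothetical helpers introduced here, not part of the project code.

```python
# Class names in the label order published on the CIFAR-10 dataset page.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def label_name(label):
    """Return the class name for an integer label in the range 0-9."""
    return CIFAR10_CLASSES[label]

print(label_name(3))  # prints "cat"
```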


## Model Parameters

For the baseline model (Model1), each image was first converted to grayscale, so the 32x32 input matrix was flattened into a vector of 1024 pixel values. To demonstrate the effects of overcompression, the first hidden layer contained only five neurons. The data was then expanded by a hidden layer of 100 neurons and compressed to ten neurons in the output layer, one per class label. Each hidden layer used a sigmoid activation function, while the output layer used a softmax function, which converts the ten outputs into class probabilities; the predicted class is the one with the highest probability.
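
The softmax step can be illustrated in isolation. The sketch below uses plain Python rather than TensorFlow, and the input scores are made-up values, not outputs from Model1.

```python
import math

def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    # Subtract the maximum score first for numerical stability.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the ten output neurons.
scores = [0.1, 2.0, -1.0, 0.5, 0.0, 1.2, -0.3, 0.7, 0.2, -0.8]
probs = softmax(scores)

# The predicted class is the index with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # prints 1, the index of the largest score
```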

For the first improved model (Model2), the input matrix was again flattened. Instead of compressing the data rapidly through a bottleneck layer, this model compresses it gradually, using hidden layers of 200 and 50 neurons followed by an output layer of ten neurons. This model also used a sigmoid activation function for the hidden layers and the softmax function for the output layer.
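
One way to see the difference between the two architectures is to count their trainable parameters. The following is a rough sketch, assuming standard fully connected layers with one bias per neuron (the default for `tf.keras.layers.Dense`). `dense_parameter_count` is a hypothetical helper, not part of the project code.

```python
def dense_parameter_count(layer_sizes):
    """Count the weights and biases in a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# Layer sizes: flattened input, two hidden layers, output layer.
model1_params = dense_parameter_count([1024, 5, 100, 10])
model2_params = dense_parameter_count([1024, 200, 50, 10])
print(model1_params, model2_params)  # prints 6735 215560
```

Under these assumptions Model1 has 6735 parameters and Model2 has 215560. In Model1, everything the later layers learn about an image has to pass through just five values.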

For the second improved model (Model3), the architecture matches that of Model2 except for the activation function of the hidden layers. ReLU replaces sigmoid because its gradient is 1 for every positive input, whereas the sigmoid gradient shrinks toward zero as the input moves away from zero. Preserving the gradient in this way was expected to improve adaptability and speed up learning.
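
The gradient behavior of the two activation functions can be compared numerically. This is a minimal sketch in plain Python; the sample inputs are arbitrary and the function names are illustrative.

```python
import math

def sigmoid_grad(x):
    """Derivative of the sigmoid function at x."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU at x (taken as 0 at x = 0)."""
    return 1.0 if x > 0 else 0.0

for x in [-6.0, -2.0, 0.5, 2.0, 6.0]:
    print(f"x={x:+.1f}  sigmoid'={sigmoid_grad(x):.4f}  relu'={relu_grad(x):.1f}")
```

The sigmoid derivative never exceeds 0.25 and falls off quickly for large positive or negative inputs. The ReLU derivative stays at 1 for positive inputs, but it is 0 for negative inputs, so a unit that only receives negative inputs passes back no gradient at all.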

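The loss function (sparse categorical cross-entropy) and the optimizer (Adam) were the same for all three models. They were chosen because of their versatility and their compatibility with my code: sparse categorical cross-entropy accepts the integer class labels directly, so the labels do not need to be one-hot encoded.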
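
For a single example, sparse categorical cross-entropy is the negative log of the probability the model assigns to the true class. The sketch below shows that calculation in plain Python. It is a simplified illustration with made-up probabilities, and it leaves out details of the Keras implementation such as averaging over a batch and clipping probabilities away from zero.

```python
import math

def sparse_categorical_crossentropy(label, probs):
    """Loss for one example: negative log probability of the true class."""
    return -math.log(probs[label])

# Hypothetical softmax output over three classes (sums to 1).
probs = [0.7, 0.2, 0.1]

confident_loss = sparse_categorical_crossentropy(0, probs)  # true class got 0.7
wrong_loss = sparse_categorical_crossentropy(2, probs)      # true class got 0.1
print(round(confident_loss, 4), round(wrong_loss, 4))  # prints 0.3567 2.3026
```

The loss grows as the probability assigned to the correct class shrinks, which is what pushes the network toward confident, correct predictions.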

# **Code**

import numpy as np
import tensorflow as tf
import cv2

# baseline model
class Model1(tf.keras.Model):

  def __init__(self):
    super(Model1, self).__init__()
    # flattens the input matrix into a vector
    self.flatten_layer = tf.keras.layers.Flatten(input_shape = (32, 32))
    # hidden layers - first layer is a bottleneck layer
    self.layer1 = tf.keras.layers.Dense(5, activation = tf.nn.sigmoid)
    self.layer2 = tf.keras.layers.Dense(100, activation = tf.nn.sigmoid)
    # output layer
    self.layer3 = tf.keras.layers.Dense(10, activation = tf.nn.softmax)
  
  # forward propagation of input matrix
  def call(self, inputs):
    flattened = self.flatten_layer(inputs)
    hidden1 = self.layer1(flattened)
    hidden2 = self.layer2(hidden1)
    return self.layer3(hidden2)

# first improved model
class Model2(tf.keras.Model):

  def __init__(self):
    super(Model2, self).__init__()
    # flattens the input matrix into a vector
    self.flatten_layer = tf.keras.layers.Flatten(input_shape = (32, 32))
    # hidden layers - number of hidden neurons changed to minimize overcompression
    self.layer1 = tf.keras.layers.Dense(200, activation = tf.nn.sigmoid)
    self.layer2 = tf.keras.layers.Dense(50, activation = tf.nn.sigmoid)
    # output layer
    self.layer3 = tf.keras.layers.Dense(10, activation = tf.nn.softmax)
  
  # forward propagation of input matrix
  def call(self, inputs):
    flattened = self.flatten_layer(inputs)
    hidden1 = self.layer1(flattened)
    hidden2 = self.layer2(hidden1)
    return self.layer3(hidden2)

# second improved model
class Model3(tf.keras.Model):

  def __init__(self):
    super(Model3, self).__init__()
    # flattens the input matrix into a vector
    self.flatten_layer = tf.keras.layers.Flatten(input_shape = (32, 32))
    # hidden layers - activation functions changed to ReLU to maintain gradient
    self.layer1 = tf.keras.layers.Dense(200, activation = tf.nn.relu)
    self.layer2 = tf.keras.layers.Dense(50, activation = tf.nn.relu)
    # output layer
    self.layer3 = tf.keras.layers.Dense(10, activation = tf.nn.softmax)
  
  # forward propagation of input matrix
  def call(self, inputs):
    flattened = self.flatten_layer(inputs)
    hidden1 = self.layer1(flattened)
    hidden2 = self.layer2(hidden1)
    return self.layer3(hidden2)

# load the CIFAR-10 training and test sets through the Keras datasets API
(train_images_rgb, train_labels), (test_images_rgb, test_labels) = tf.keras.datasets.cifar10.load_data()

# preprocessing images
def preprocess(images_rgb):
  # convert RGB images to grayscale (Keras loads CIFAR-10 in RGB channel order)
  gray = np.stack([cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) for image in images_rgb])
  # normalize grayscale values to the range [0, 1]
  return gray.astype("float32") / 255.0

# each dataset becomes an array of shape (number of images, 32, 32)
train_images = preprocess(train_images_rgb)
test_images = preprocess(test_images_rgb)

# instantiate the models
model1 = Model1()
model2 = Model2()
model3 = Model3()

# define the optimizer and loss function for each model
model1.compile(optimizer='Adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model2.compile(optimizer='Adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model3.compile(optimizer='Adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# train each model using the training images and labels
# iterate through the whole dataset 20 times for each model
model1.fit(train_images, train_labels, epochs = 20)
print("Model 1 Trained")
model2.fit(train_images, train_labels, epochs = 20)
print("Model 2 Trained")
model3.fit(train_images, train_labels, epochs = 20)
print("Model 3 Trained")

# test each model using the testing images and labels
# evaluate returns [loss, accuracy] for each model
model1.evaluate(test_images, test_labels)
model2.evaluate(test_images, test_labels)
model3.evaluate(test_images, test_labels)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Model 1 Trained
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Model 2 Trained
Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Model 3 Trained


[1.6364028453826904, 0.41350001096725464]

# **Results**

The baseline model reached an accuracy of 0.2680 on the testing dataset. The first improved model reached 0.4338, while the second reached 0.4135 (with a test loss of 1.6364). The second result was unexpected: because ReLU does not saturate for positive inputs the way sigmoid does, Model3 was expected to learn faster and to match or exceed Model2, yet it scored about two percentage points lower.
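
To put these numbers in context, the sketch below compares each reported accuracy with the accuracy expected from random guessing. It assumes guessing uniformly over ten equally represented classes, which gives a chance level of 0.10; the variable names are illustrative.

```python
# Test accuracies reported in this section.
accuracies = {"Model1": 0.2680, "Model2": 0.4338, "Model3": 0.4135}

# Expected accuracy of uniform random guessing over ten balanced classes.
chance = 1 / 10

for name, acc in accuracies.items():
    points_over_chance = round((acc - chance) * 100, 2)
    print(f"{name}: {acc:.4f} ({points_over_chance} points above chance)")

# Difference between the two improved models, in percentage points.
gap = round((accuracies["Model2"] - accuracies["Model3"]) * 100, 2)
print(f"Model2 - Model3: {gap} points")
```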

## Future Improvements

The models could be improved by changing the number of hidden layers or neurons, trying other optimizers and loss functions, and increasing the amount of training data and training time.
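
One possible way to organize such experiments is to enumerate the candidate settings up front. The sketch below only builds the list of configurations; every value in it is a hypothetical example, and none of them were tested in this project.

```python
from itertools import product

# Hypothetical search space; none of these values were tested in this project.
hidden_layer_options = [(200, 50), (512, 128), (512, 256, 64)]
activation_options = ["sigmoid", "relu"]
epoch_options = [20, 40]

# One dictionary per combination of hidden layers, activation, and epochs.
configs = [
    {"hidden_layers": layers, "activation": act, "epochs": epochs}
    for layers, act, epochs in product(
        hidden_layer_options, activation_options, epoch_options
    )
]

print(len(configs))  # prints 12
print(configs[0])
```

Each dictionary could then be used to build, train, and evaluate one model, so that the comparison between configurations stays systematic.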