# SWS3009 Lab 3 Introduction to Deep Learning


| Name:    | Liu Yijia                    |
|----------|---------------------|
| Name:    | Rui Yuhan                    |

This lab should be done by both Deep Learning members of the team. Please ensure that you fill in the names of <b>both</b> team members in the spaces above. Answer <b>all</b> your questions on <b>this Python Notebook.</b>

## Submission Instructions

### SUBMISSION DEADLINE: July 5 2024, 2359 hours (11.59 pm). Folder will close by 00:15 hours on July 5, after which no submission will be allowed.

Please submit this Python notebook to Canvas on the deadline provided.

Marks will be awarded as follows:

**0 marks**: No/empty/Non-English submission

**1 mark** : Poor submission

**2 marks**: Acceptable submission

**3 marks**: Good submission


## 1. Introduction

We will achieve the following objectives in this lab:

    1. An understanding of the practical limitations of using dense networks in complex tasks
    2. Hands-on experience in building a deep learning neural network to solve a relatively complex task.
    

Each step may take a long time to run. You and your partner may want to work out how to do things simultaneously, but please do not miss out on any learning opportunities.


## 2. Submission Instructions

Please submit your answer book to Canvas by the deadline.

## 3. Creating a Dense Network for CIFAR-10

We will now begin building a neural network for the CIFAR-10 dataset. The CIFAR-10 dataset consists of 50,000 32x32x3 (32x32 pixels, RGB channels) training images and 10,000 testing images (also 32x32x3), divided into the following 10 categories:

    1. Airplane
    2. Automobile
    3. Bird
    4. Cat
    5. Deer
    6. Dog
    7. Frog
    8. Horse
    9. Ship
    10. Truck
    
In the first two parts of this lab we will create a classifier for the CIFAR-10 dataset.

### 3.1 Loading the Dataset

We begin by creating a dense neural network for CIFAR-10. The code below shows how we load the CIFAR-10 dataset:


In [9]:
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10

def load_cifar10():
    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
    test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    train_x /= 255.0
    test_x /= 255.0
    ret_train_y = to_categorical(train_y,10)
    ret_test_y = to_categorical(test_y, 10)
    
    return (train_x, ret_train_y), (test_x, ret_test_y)


(train_x, train_y), (test_x, test_y) = load_cifar10()

----

#### Question 1

Explain what the following two  statements do, and where the number "3072" came from:

```
  train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
  test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
```

**Please put your answers in the attached answer books**


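The two statements flatten each CIFAR-10 image from a 3D array of shape (32, 32, 3) into a 1D vector of 3072 values, while keeping the first dimension (the number of images, `train_x.shape[0]`) unchanged. The number 3072 is 32 × 32 × 3: 32 rows × 32 columns × 3 RGB channels. This is needed because a Dense layer expects each input sample to be a flat vector.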
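
As a quick check of this reshape, the sketch below uses a small NumPy array in place of the real dataset. The batch of 4 all-zero images is an assumption made only so that the example runs without downloading CIFAR-10.

```python
import numpy as np

# Stand-in for CIFAR-10: 4 fake images of shape (32, 32, 3).
# (Assumption: the real train_x has shape (50000, 32, 32, 3).)
fake_images = np.zeros((4, 32, 32, 3), dtype="uint8")

# Same call as in load_cifar10: keep the image count, flatten the rest.
flat_images = fake_images.reshape(fake_images.shape[0], 3072)

print(fake_images.shape)  # (4, 32, 32, 3)
print(flat_images.shape)  # (4, 3072)
print(32 * 32 * 3)        # 3072
```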

### 3.2 Building the MLP Classifier

In the code box below, create a new fully connected (dense) multilayer perceptron classifier for the CIFAR-10 dataset. To begin with, create a network with one hidden layer of 1024 neurons, using the SGD optimizer. You should output the training and validation accuracy at every epoch, and train for 50 epochs:


In [15]:
""" 
Write your code to build an MLP with one hidden layer of 1024 neurons,
with an SGD optimizer. Train for 50 epochs, and output the training and
validation accuracy at each epoch.
"""

import tensorflow as tf
from tensorflow.keras import models, layers, optimizers

model = models.Sequential([
    layers.Input(shape=(3072,)),
    layers.Dense(1024, activation='sigmoid'),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer=optimizers.SGD(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_x, train_y, epochs=50,batch_size=64, validation_data=(test_x, test_y))



Epoch 1/50
  8/782 [..............................] - ETA: 5s - loss: 2.3589 - accuracy: 0.1289  



Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.src.callbacks.History at 0x36fd06160>

#### Question 2

Complete the following table on the design choices for your MLP:

| Hyperparameter       | What I used                  | Why?                                                                 |
|:---------------------|:-----------------------------|:----------------------------------------------------------------------|
| Optimizer            | SGD                          | Specified in question                                                |
| # of hidden layers   | 1                            | Specified in question                                                |
| # of hidden neurons  | 1024                         | Specified in question                                                |
| Hid layer activation | Sigmoid                      | A traditional activation function; simple and smooth                 |
| # of output neurons  | 10                           | CIFAR-10 has 10 classes                                              |
| Output activation    | Softmax                      | Converts outputs to probability distribution for classification      |
| learning_rate        | Default                      | Default value used for SGD when not specified                        |
| momentum             | Default                      | Not specified in question, so default used                           |
| loss                 | Categorical crossentropy     | Because labels are one-hot encoded for multi-class classification    |
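
To make the softmax and categorical cross-entropy rows more concrete, here is a minimal NumPy sketch of both computations for a single image. It is not the Keras implementation; the `logits` values and the small `eps` constant are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

def categorical_crossentropy(one_hot_label, probabilities):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    eps = 1e-7  # small constant to avoid log(0); the value is an assumption
    return -np.sum(one_hot_label * np.log(probabilities + eps))

# Hypothetical raw outputs of the 10 output neurons for one image.
logits = np.array([2.0, 0.5, 0.1, 0.0, -1.0, 0.3, 0.2, 1.0, -0.5, 0.0])
probs = softmax(logits)

# One-hot label for class index 0 (the first class, Airplane).
label = np.zeros(10)
label[0] = 1.0

print(probs.sum())                             # probabilities sum to 1
print(categorical_crossentropy(label, probs))  # lower is better
```

The loss only looks at the probability assigned to the true class, which is why one-hot labels and a softmax output fit together.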


#### Question 3:

What was your final training accuracy? Validation accuracy? Is there overfitting / underfitting? Explain your answer:

***PLACE YOUR ANSWER HERE ***

Final training accuracy: 0.4652

Final validation accuracy: 0.4434

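The model is underfitting: accuracy is relatively low on both the training set (0.4652) and the validation set (0.4434), and the gap between the two is small, so the model has not learned the data patterns well enough. The likely causes are that an MLP with a single hidden layer is too simple for this task and that the training time is insufficient.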
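
The final accuracies can also be read programmatically from the `History` object that `model.fit` returns, instead of from the printed log. The sketch below assumes the model was compiled with `metrics=['accuracy']`, so that the keys are `accuracy` and `val_accuracy`; a plain dictionary stands in for `history.history`, and `final_accuracies` is a hypothetical helper name.

```python
def final_accuracies(history_dict):
    """Return the last-epoch (training, validation) accuracy."""
    return history_dict["accuracy"][-1], history_dict["val_accuracy"][-1]

# Stand-in for `history.history`, where `history = model.fit(...)`.
# Only the last entries are the values reported above;
# the first entries are placeholders, not measured results.
example_history = {
    "accuracy": [0.10, 0.4652],
    "val_accuracy": [0.10, 0.4434],
}

train_acc, val_acc = final_accuracies(example_history)
gap = train_acc - val_acc
print(f"train={train_acc:.4f}  val={val_acc:.4f}  gap={gap:.4f}")
```

A small gap combined with low values on both sides points to underfitting; a large gap with high training accuracy would point to overfitting.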

### 3.3 Experimenting with the MLP

Cut and paste your code from Section 3.2 to the box below (you may need to rename your MLP). Experiment with the number of hidden layers, the number of neurons in each hidden layer, the optimization algorithm, etc. See [Keras Optimizers](https://keras.io/optimizers) for the types of optimizers and their parameters. **Train for 100 epochs.**


In [22]:
"""
Cut and paste your code from Section 3.2 below, then modify it to get
much better results than what you had earlier. E.g. increase the number of
nodes in the hidden layer, increase the number of hidden layers,
change the optimizer, etc. 

Train for 100 epochs.

"""
import tensorflow as tf
from tensorflow.keras import models, layers, optimizers

model2 = models.Sequential([
    layers.Input(shape=(3072,)),
    layers.Dense(2048, activation='sigmoid'),
    layers.Dropout(0.2),
    layers.Dense(1024, activation='sigmoid'),
    layers.Dropout(0.2),
    layers.Dense(512, activation='sigmoid'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])

model2.compile(optimizer=optimizers.Adam(learning_rate=0.001),
               loss='categorical_crossentropy',
               metrics=['accuracy'])

model2.fit(train_x, train_y, epochs=100, batch_size=64, validation_data=(test_x, test_y))





Epoch 1/100
  6/782 [..............................] - ETA: 9s - loss: 2.7282 - accuracy: 0.1120  



Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78/100
Epoch 7

<keras.src.callbacks.History at 0x3865a2880>

After training for 100 epochs:

Final training accuracy: 0.4841

Final validation accuracy: 0.4795

# Here we train the MLP for 150 epochs

In [23]:
# Rebuild the same architecture and train it from scratch for 150 epochs
import tensorflow as tf
from tensorflow.keras import models, layers, optimizers

model2 = models.Sequential([
    layers.Input(shape=(3072,)),
    layers.Dense(2048, activation='sigmoid'),
    layers.Dropout(0.2),
    layers.Dense(1024, activation='sigmoid'),
    layers.Dropout(0.2),
    layers.Dense(512, activation='sigmoid'),
    layers.Dropout(0.2),
    layers.Dense(10, activation='softmax')
])

model2.compile(optimizer=optimizers.Adam(learning_rate=0.001),
               loss='categorical_crossentropy',
               metrics=['accuracy'])

model2.fit(train_x, train_y, epochs=150, batch_size=64, validation_data=(test_x, test_y))



Epoch 1/150
  5/782 [..............................] - ETA: 9s - loss: 2.8139 - accuracy: 0.0844  



Epoch 2/150
Epoch 3/150
Epoch 4/150
Epoch 5/150
Epoch 6/150
Epoch 7/150
Epoch 8/150
Epoch 9/150
Epoch 10/150
Epoch 11/150
Epoch 12/150
Epoch 13/150
Epoch 14/150
Epoch 15/150
Epoch 16/150
Epoch 17/150
Epoch 18/150
Epoch 19/150
Epoch 20/150
Epoch 21/150
Epoch 22/150
Epoch 23/150
Epoch 24/150
Epoch 25/150
Epoch 26/150
Epoch 27/150
Epoch 28/150
Epoch 29/150
Epoch 30/150
Epoch 31/150
Epoch 32/150
Epoch 33/150
Epoch 34/150
Epoch 35/150
Epoch 36/150
Epoch 37/150
Epoch 38/150
Epoch 39/150
Epoch 40/150
Epoch 41/150
Epoch 42/150
Epoch 43/150
Epoch 44/150
Epoch 45/150
Epoch 46/150
Epoch 47/150
Epoch 48/150
Epoch 49/150
Epoch 50/150
Epoch 51/150
Epoch 52/150
Epoch 53/150
Epoch 54/150
Epoch 55/150
Epoch 56/150
Epoch 57/150
Epoch 58/150
Epoch 59/150
Epoch 60/150
Epoch 61/150
Epoch 62/150
Epoch 63/150
Epoch 64/150
Epoch 65/150
Epoch 66/150
Epoch 67/150
Epoch 68/150
Epoch 69/150
Epoch 70/150
Epoch 71/150
Epoch 72/150
Epoch 73/150
Epoch 74/150
Epoch 75/150
Epoch 76/150
Epoch 77/150
Epoch 78/150
Epoch 7

<keras.src.callbacks.History at 0x3b22a1310>

----

#### Question 4:

Complete the following table with your final design (you may add more rows for the # neurons (layer1) etc. to detail how many neurons you have in each hidden layer). Likewise you may replace the learning_rate, momentum etc rows with parameters more appropriate to the optimizer that you have chosen.

| Hyperparameter         | What I used | Why?                                                                 |
|:-----------------------|:------------|:----------------------------------------------------------------------|
| Optimizer              | Adam        | Combines momentum and adaptive learning rate; performs well in practice |
| # of hidden layers     | 3           | Allows model to learn complex hierarchical features                    |
| # neurons (layer1)     | 2048        | Large layer to capture high-dimensional input features                |
| Hid layer1 activation  | Sigmoid     | Performed better than ReLU in this specific task after trials         |
| Dropout (after layer1) | 0.2         | Prevents overfitting by randomly deactivating neurons during training |
| # neurons (layer2)     | 1024        | Reduces dimensionality while preserving learned patterns              |
| Hid layer2 activation  | Sigmoid     | Keeps activation consistent and effective                             |
| Dropout (after layer2) | 0.2         | Same reason as above                                                  |
| # neurons (layer3)     | 512         | Further abstraction, preparing for output layer                       |
| Hid layer3 activation  | Sigmoid     | Same reason as above                                                  |
| Dropout (after layer3) | 0.2         | Same reason as above                                                  |
| # of output neurons    | 10          | CIFAR-10 has 10 classes                                               |
| Output activation      | Softmax     | Converts logits to class probabilities                                |
| learning_rate          | 0.001       | Default and stable value for Adam optimizer                           |
| momentum               | N/A         | Not used with Adam                                                    |
| loss                   | Categorical crossentropy | Suitable for one-hot multi-class classification              |
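
The Dropout rows can be illustrated with a short NumPy sketch of "inverted" dropout at rate 0.2. This is a simplified illustration rather than the Keras layer itself; the `dropout` function name, the all-ones input and the fixed seed are assumptions for the example.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True = keep this unit
    return activations * mask / keep_prob

rng = np.random.default_rng(seed=0)  # fixed seed only for reproducibility
hidden = np.ones((2, 8))             # hypothetical hidden-layer activations
dropped = dropout(hidden, rate=0.2, rng=rng)
print(dropped)  # each entry is either 0.0 (dropped) or 1.25 (kept and rescaled)
```

Rescaling the surviving units by 1 / (1 - rate) keeps the expected activation unchanged, so nothing needs to be adjusted at inference time, when dropout is switched off.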

#### Question 5

What is the final training and validation accuracy that you obtained after 150 epochs? Is there considerable improvement over Section 3.2? Are there still signs of underfitting or overfitting? Explain your answer.

***Write your answers here***

For 150 epochs:

Final training accuracy: 0.4874

Final validation accuracy: 0.4830

There is only a slight improvement over the initial MLP from Section 3.2: training accuracy rose from 0.4652 to 0.4874, and validation accuracy rose from 0.4434 to 0.4830.

There are still signs of underfitting. The gap between training and validation accuracy is small, so the model is not overfitting, but both accuracies remain below 0.5. This suggests that the model still hasn't fully captured the complexity of the CIFAR-10 data.

#### Question 6

Write a short reflection on the practical difficulties of using a dense MLP to classify images in the CIFAR-10 dataset.

***Write your answers here***

Using a dense MLP for CIFAR-10 is challenging because flattening the image loses important spatial features. Without convolution, the model can’t easily detect patterns like edges or textures, leading to low accuracy even after long training.
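
The loss of spatial structure can be seen by looking at where neighbouring pixels end up in the flattened 3072-element vector. The sketch below assumes the default row-major order used by `reshape`; `flat_index` is a hypothetical helper, and the pixel position (10, 10) is arbitrary.

```python
import numpy as np

image_shape = (32, 32, 3)  # rows, columns, RGB channels

def flat_index(row, col, channel):
    """Position of one pixel value in the flattened 3072-element vector."""
    return int(np.ravel_multi_index((row, col, channel), image_shape))

here = flat_index(10, 10, 0)
right = flat_index(10, 11, 0)  # horizontal neighbour
below = flat_index(11, 10, 0)  # vertical neighbour

print(right - here)  # 3
print(below - here)  # 96
```

Two pixels that touch vertically are 96 positions apart in the input vector, and the dense layer has no built-in knowledge that they are neighbours; it has to learn every such relationship from data.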

----

## 4. Creating a CNN for the MNIST Data Set

In this section we will now create a convolutional neural network (CNN) to classify images in the MNIST dataset that we used in the previous lab. Let's go through each part to see how to do this.

### 4.1 Loading the MNIST Dataset

As always we will load the MNIST dataset, scale the inputs to between 0 and 1, and convert the Y labels to one-hot vectors. However unlike before we will not flatten the 28x28 image to a 784 element vector, since CNNs can inherently handle 2D data.

In [24]:
from keras.datasets import mnist
from keras.utils import to_categorical

def load_mnist():
    (train_x, train_y),(test_x, test_y) = mnist.load_data()
    train_x = train_x.reshape(train_x.shape[0], 28, 28, 1)
    test_x = test_x.reshape(test_x.shape[0], 28, 28, 1)

    train_x=train_x.astype('float32')
    test_x = test_x.astype('float32')
    
    train_x /= 255.0
    test_x /= 255.0
        
    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)
        
    return (train_x, train_y), (test_x, test_y) 
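
For reference, the one-hot conversion applied to the labels can be sketched in plain NumPy as follows. This mirrors the kind of output `to_categorical` produces for integer labels, but it is not the Keras implementation; `one_hot` and the `digits` array are hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot row vectors."""
    return np.eye(num_classes, dtype="float32")[labels]

digits = np.array([3, 0, 9])  # hypothetical MNIST labels
encoded = one_hot(digits, 10)

print(encoded.shape)  # (3, 10)
print(encoded[0])     # a 1.0 at index 3, zeros elsewhere
```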

### 4.2 Building the CNN

We will now build the CNN. Unlike before we will create a function to produce the CNN. We will also look at how to save and load Keras models using "checkpoints", particularly "ModelCheckpoint" that saves the model each epoch.

Let's begin by creating the model. We call os.path.exists to see if a model file exists, and call "load_model" if it does. Otherwise we create a new model.



In [25]:
# load_model loads a previously saved model from a file (here, a .keras file).
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
import os

MODEL_NAME = 'mnist-cnn.keras'

def buildmodel(model_name):
    if os.path.exists(model_name):
        model = load_model(model_name)                                                                                             
    else:
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7

        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(128, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))
        model.add(Flatten()) # Question 9
        model.add(Dense(1024, activation='relu'))
        model.add(Dropout(0.1))
        model.add(Dense(10, activation='softmax'))

    return model



----

#### Question 7

The first layer in our CNN is a 2D convolution kernel, shown here:

```
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7
```

Why is the input_shape set to (28, 28, 1)? What does this mean? What does "padding = 'same'" mean? 

***Write your answer here***

The reason why input_shape is set to (28,28,1) is that this matches the format of MNIST images.

input_shape=(28, 28, 1) means each input image is 28x28 pixels with a single channel, since MNIST images are grayscale.

padding='same' means that the output of the convolution layer has the same spatial dimensions as the input (28x28), by adding zero-padding around the edges. 
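
The effect of the padding can be checked with a little arithmetic. The sketch below pads a 28x28 image with zeros for a 5x5 kernel and stride 1, and compares the output width with and without padding; the all-ones image is a placeholder.

```python
import numpy as np

image = np.ones((28, 28), dtype="float32")  # placeholder MNIST-sized image
kernel_size = 5

# For an odd kernel and stride 1, 'same' padding adds (k - 1) / 2 zeros per side.
pad = (kernel_size - 1) // 2
padded = np.pad(image, pad_width=pad, mode="constant", constant_values=0.0)

valid_out = image.shape[0] - kernel_size + 1   # no padding
same_out = padded.shape[0] - kernel_size + 1   # with zero-padding

print(padded.shape)  # (32, 32)
print(valid_out)     # 24
print(same_out)      # 28
```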

#### Question 8

The second layer is the MaxPooling2D layer shown below:

```
        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
```

What other types of pooling layers are available? What does 'strides = 2' mean? 

***Write your answer here***

Other pooling layers available:

AveragePooling2D: takes the average value in each pooling window;

GlobalMaxPooling2D / GlobalAveragePooling2D: performs pooling over the entire feature map.

strides=2 means the pooling window moves 2 pixels at a time.
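
A small NumPy sketch of 2x2 pooling with stride 2 on a 4x4 feature map shows the difference between max and average pooling. It assumes the height and width are even and is only an illustration, not how Keras implements the layers.

```python
import numpy as np

def pool_2x2(feature_map, reduce_fn):
    """2x2 pooling with stride 2; assumes even height and width."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)  # split into 2x2 windows
    return reduce_fn(blocks, axis=(1, 3))

feature_map = np.arange(16).reshape(4, 4)  # values 0..15, row by row

max_pooled = pool_2x2(feature_map, np.max)
avg_pooled = pool_2x2(feature_map, np.mean)

print(max_pooled)  # [[ 5  7] [13 15]]
print(avg_pooled)  # [[ 2.5  4.5] [10.5 12.5]]
```

Because the window moves 2 pixels at a time, the windows do not overlap and each spatial dimension is halved (4x4 becomes 2x2).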

#### Question 9

What does the "Flatten" layer here do? Why is it needed?

```
        model.add(Flatten()) # Question 9
```

***Write your answer here***

The Flatten() layer converts the multi-dimensional feature maps into a 1D vector. This helps connect the convolutional part of the model to the fully connected layers for classification.
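
To see how long the flattened vector is for the network in `buildmodel`, the spatial size can be traced layer by layer. The sketch assumes stride-1 convolutions and that the Conv2D layers without an explicit `padding` argument use the Keras default of `'valid'`; `conv_out` and `pool_out` are hypothetical helpers.

```python
def conv_out(size, kernel, padding):
    """Output width/height of a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool=2, stride=2):
    """Output width/height of a pooling layer without padding."""
    return (size - pool) // stride + 1

size = 28
size = conv_out(size, 5, "same")   # Conv2D(32, 5x5, same)  -> 28
size = pool_out(size)              # MaxPooling2D(2x2, s=2) -> 14
size = conv_out(size, 5, "valid")  # Conv2D(64, 5x5)        -> 10
size = conv_out(size, 5, "valid")  # Conv2D(128, 5x5)       -> 6
size = conv_out(size, 5, "valid")  # Conv2D(64, 5x5)        -> 2
size = pool_out(size)              # MaxPooling2D(2x2, s=2) -> 1

channels = 64  # number of filters in the last Conv2D layer
flattened_length = size * size * channels
print(flattened_length)  # 64
```

Under these assumptions the Flatten layer hands a 64-element vector to the Dense(1024) layer.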

----

### 4.3 Training the CNN

Let's now train the CNN. In this example we introduce the idea of a "callback", which is a routine that Keras calls at the end of each epoch. Specifically we look at two callbacks:

    1. ModelCheckpoint: When called, Keras saves the model to the specified filename.
    
    2. EarlyStopping: When called, Keras checks if it should stop the training prematurely.
    

Let's look at the code to see how training is done, and how callbacks are used.

In [26]:
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping, ModelCheckpoint

def train(model, train_x, train_y, epochs, test_x, test_y, model_name):

    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.7), 
                  loss='categorical_crossentropy', metrics=['accuracy'])

    savemodel = ModelCheckpoint(model_name)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs, 
    callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f"%(acc, loss))

Notice that there isn't very much that is unusual going on: we compile the model with our loss function and optimizer, then call fit, and finally call evaluate to get the final accuracy on the test set. The only unusual thing is the "callbacks" parameter in the fit function call:

```
    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs, 
    callbacks=[savemodel, stopmodel])
```

----

#### Question 10.

What do the min_delta and patience parameters do in the EarlyStopping callback, as shown below? (2 MARKS)

```
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10
```

min_delta=0.001 sets the minimum change in the monitored metric (val_loss by default) that counts as an improvement; smaller changes are treated as no improvement.

patience=10 is the number of consecutive epochs without such an improvement that are tolerated before training is stopped early.
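
The interaction between the two parameters can be sketched in plain Python. This is a simplified model of the stopping rule for a metric that should decrease (such as validation loss), not the Keras source code; `early_stop_epoch` and the loss values are hypothetical, and patience is shortened to 2 to keep the example small.

```python
def early_stop_epoch(val_losses, min_delta, patience):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:  # improved by more than min_delta
            best = loss
            wait = 0
        else:                        # no significant improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses that plateau after the second epoch.
losses = [1.0, 0.9, 0.8995, 0.8999, 0.8998]
print(early_stop_epoch(losses, min_delta=0.001, patience=2))   # 3
print(early_stop_epoch(losses, min_delta=0.001, patience=10))  # None
```

The drop from 0.9 to 0.8995 is smaller than min_delta, so it does not reset the counter, and training stops once the counter reaches the patience value.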

---

### 4.4 Putting it together.

Now let's run the code and see how it goes (Note: To save time we are training for only 5 epochs; we should train much longer to get much better results):

In [None]:
(train_x, train_y),(test_x, test_y) = load_mnist()
model = buildmodel(MODEL_NAME)
train(model, train_x, train_y, 5, test_x, test_y, MODEL_NAME)
    

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz




Starting training.
Epoch 1/5




Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Done. Now evaluating.
Test accuracy: 0.99, loss: 0.03


----

#### Question 11.

Compare the relative advantages and disadvantages of the CNN vs. the Dense MLP that you built in Sections 3.2 and 3.3. What makes CNNs better (or worse)?

***Type your answers here***

CNNs work better than MLPs for image tasks like CIFAR-10 because they keep the spatial structure of the image and can detect local patterns like edges. They also use fewer parameters thanks to weight sharing, which makes training more efficient.

In contrast, MLPs flatten the image and treat all pixels the same, which loses important spatial information. That’s why they usually perform worse on image data.

The downside of CNNs is that they can be more complex to design and train, but the boost in accuracy is often worth it.
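
The saving from weight sharing can be made concrete by counting trainable parameters. The sketch below compares the first hidden layer of the Section 3.2 MLP with a single 3x3 convolution with 64 filters applied to an RGB image; `dense_params` and `conv2d_params` are hypothetical helpers.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a 2D convolution layer."""
    return kernel_h * kernel_w * in_channels * filters + filters

# First hidden layer of the Section 3.2 MLP: 3072 inputs -> 1024 neurons.
print(dense_params(3072, 1024))    # 3146752
# A 3x3 convolution with 64 filters on a 3-channel (RGB) image.
print(conv2d_params(3, 3, 3, 64))  # 1792
```

The convolution reuses the same 3x3 filters at every image position, so its parameter count does not depend on the image size.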

## 5. Making a CNN for the CIFAR-10 Dataset

Now comes the fun part: Using the example above for creating a CNN for the MNIST dataset, now create a CNN in the box below for the CIFAR-10 dataset. At the end of each epoch save the model to a file called "cifar.hd5" (note: the .hd5 is added automatically for you).

---

#### Question 12.

Summarize your design in the table below (the actual coding cell comes after this):

| Hyperparameter       | What I used                                                                                      | Why?                               |
|:---------------------|:-------------------------------------------------------------------------------------------------|:-----------------------------------|
| Optimizer            | Adam (learning_rate=0.0001)                                                                      | Fast convergence, works well       |
| Input shape          | (32, 32, 3)                                                                                      | CIFAR-10 images are RGB 32×32      |
| First layer          | Conv2D(64, 3×3, relu) + BN                                                                       | Extracts basic features            |
| Second layer         | Conv2D(64, 3×3, relu) + BN + MaxPooling                                                          | Deepens feature extraction         |
| Additional layers    | 2 × Conv2D(128, 3×3, relu) and 2 × Conv2D(256, 3×3, relu), each + BN, MaxPooling after each pair | Capture complex patterns           |
| Normalization        | BatchNormalization after every Conv2D                                                            | Stabilizes and speeds up training  |
| Dense layers         | Dense(1024) → Dense(512) → Dense(10, softmax), Dropout(0.4) after the first two                  | Classifies and reduces overfitting |
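
As a rough illustration of what BatchNormalization does during training, the sketch below normalizes each channel of a batch of feature maps to roughly zero mean and unit variance. It leaves out the learned scale and offset (equivalent to gamma = 1, beta = 0) and the moving averages used at inference time, so it is not the full Keras layer; the `eps` value and the random feature maps are assumptions.

```python
import numpy as np

def batch_norm_train(x, eps=1e-3):
    """Normalize each channel over the batch and spatial axes (gamma=1, beta=0)."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(seed=0)  # fixed seed only for reproducibility
# Hypothetical feature maps: batch of 8, 16x16 pixels, 4 channels, far from zero mean.
features = rng.normal(loc=5.0, scale=3.0, size=(8, 16, 16, 4))
normalized = batch_norm_train(features)

print(normalized.mean(axis=(0, 1, 2)))  # close to 0 for every channel
print(normalized.std(axis=(0, 1, 2)))   # close to 1 for every channel
```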




In [58]:
"""
Write your code for your CNN for the CIFAR-10 dataset here. 

Note: train_x, train_y, test_x, test_y were changed when we called 
load_mnist in the previous section. We reload them below for you.

"""

(train_x, train_y), (test_x, test_y) = cifar10.load_data()
train_x = train_x.astype('float32')
test_x = test_x.astype('float32')
train_x /= 255.0
test_x /= 255.0
ret_train_y = to_categorical(train_y,10)
ret_test_y = to_categorical(test_y, 10)

In [None]:
MODEL_NAME = 'cifar'
from keras.optimizers import Adam
from keras.layers import BatchNormalization


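def buildmodel(model_name):
    # train() saves checkpoints to model_name + '.h5', so look for that file here.
    checkpoint_path = model_name + '.h5'
    if os.path.exists(checkpoint_path):
        model = load_model(checkpoint_path)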
    else:
        model = Sequential()

        model.add(Conv2D(64, kernel_size=(3,3), activation='relu',
                         input_shape=(32, 32, 3), padding='same'))
        model.add(BatchNormalization())
        model.add(Conv2D(64, kernel_size=(3,3), activation='relu', padding='same'))
        model.add(BatchNormalization())
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))

        model.add(Conv2D(128, kernel_size=(3,3), activation='relu', padding='same'))
        model.add(BatchNormalization())
        model.add(Conv2D(128, kernel_size=(3,3), activation='relu', padding='same'))
        model.add(BatchNormalization())
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))

        model.add(Conv2D(256, kernel_size=(3,3), activation='relu', padding='same'))
        model.add(BatchNormalization())
        model.add(Conv2D(256, kernel_size=(3,3), activation='relu', padding='same'))
        model.add(BatchNormalization())
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))

        model.add(Flatten())
        model.add(Dense(1024, activation='relu'))
        model.add(Dropout(0.4))
        model.add(Dense(512, activation='relu'))
        model.add(Dropout(0.4))
        model.add(Dense(10, activation='softmax'))

    return model

In [66]:
def train(model, train_x, train_y, epochs, test_x, test_y, model_name):
    model.compile(optimizer=Adam(learning_rate=0.0001),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    savemodel = ModelCheckpoint(model_name + '.h5')
    stopmodel = EarlyStopping(min_delta=0.001, patience=10)
    model.fit(x=train_x, y=train_y, batch_size=64,
              validation_data=(test_x, test_y), shuffle=True,
              epochs=epochs, callbacks=[savemodel, stopmodel])
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f" % (acc, loss))

In [67]:
model = buildmodel(MODEL_NAME)
train(model, train_x, ret_train_y, 50, test_x, ret_test_y, MODEL_NAME)



Epoch 1/50




Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Test accuracy: 0.79, loss: 2.90


Final train accuracy: 0.8461

Final validation accuracy: 0.7895

Test accuracy: 0.79