# SWS3009 Lab 3 Introduction to Deep Learning


| Name:    | YANG RUNKANG                    |
|----------|---------------------|
| Name:    | WEN SIJIE                    |

This lab should be done by both Deep Learning members of the team. Please ensure that you fill in the names of <b>both</b> team members in the spaces above. Answer <b>all</b> your questions on <b>this Python Notebook.</b>

## Submission Instructions

### SUBMISSION DEADLINE: Thursday July 4 2024, 2359 hours (11.59 pm). Folder will close by 00:15 hours on July 5, after which no submission will be allowed.

Please submit this Python notebook to Canvas on the deadline provided.

Marks will be awarded as follows:

**0 marks**: No/empty/Non-English submission

**1 mark** : Poor submission

**2 marks**: Acceptable submission

**3 marks**: Good submission


## 1. Introduction

We will achieve the following objectives in this lab:

    1. An understanding of the practical limitations of using dense networks in complex tasks
    2. Hands-on experience in building a deep learning neural network to solve a relatively complex task.
    

Each step may take a long time to run. You and your partner may want to work out how to do things simultaneously, but please do not miss out on any learning opportunities.


## 2. Submission Instructions

Please submit your answer book to Canvas by the deadline.

## 3. Creating a Dense Network for CIFAR-10

We will now begin building a neural network for the CIFAR-10 dataset. The CIFAR-10 dataset consists of 50,000 32x32x3 (32x32 pixels, RGB channels) training images and 10,000 testing images (also 32x32x3), divided into the following 10 categories:

    1. Airplane
    2. Automobile
    3. Bird
    4. Cat
    5. Deer
    6. Dog
    7. Frog
    8. Horse
    9. Ship
    10. Truck
    
In the first two parts of this lab we will create a classifier for the CIFAR-10 dataset.

### 3.1 Loading the Dataset

We begin by creating a dense neural network for CIFAR-10. The code below shows how we load the CIFAR-10 dataset:


In [32]:
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10

def load_cifar10():
    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
    test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    train_x /= 255.0
    test_x /= 255.0
    ret_train_y = to_categorical(train_y,10)
    ret_test_y = to_categorical(test_y, 10)
    
    return (train_x, ret_train_y), (test_x, ret_test_y)


(train_x, train_y), (test_x, test_y) = load_cifar10()

----

#### Question 1

Explain what the following two  statements do, and where the number "3072" came from:

```
  train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
  test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
```

train_x and test_x are arrays of shape (number_of_samples, 32, 32, 3): a stack of images, each 32x32 pixels with 3 color channels (RGB). The .reshape method changes the shape of an array without changing its data. train_x.shape[0] and test_x.shape[0] give the number of images in each set, so that dimension is kept, while the remaining three dimensions are flattened into a single vector of length 32 x 32 x 3 = 3072. This is necessary because a dense layer takes a 1D vector per sample rather than a 3D image.
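
The preprocessing steps in load_cifar10 can be checked on a small stand-in array. The sketch below uses NumPy only; the random fake_images and fake_labels arrays are assumptions standing in for real CIFAR-10 data, and indexing an identity matrix is just one way to mimic the one-hot vectors that to_categorical produces.

```python
import numpy as np

# Hypothetical stand-in for a CIFAR-10 batch: 5 images, 32x32 pixels, 3 channels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)
fake_labels = np.array([[3], [0], [9], [1], [3]])  # shape (5, 1), integer class ids

# Flatten each image: 32 * 32 * 3 = 3072 values per image.
flat = fake_images.reshape(fake_images.shape[0], 32 * 32 * 3)

# Scale pixel values from [0, 255] to [0, 1].
flat = flat.astype("float32") / 255.0

# One-hot encode the labels by picking rows of a 10x10 identity matrix.
one_hot = np.eye(10, dtype="float32")[fake_labels.ravel()]

print(flat.shape)     # (5, 3072)
print(one_hot.shape)  # (5, 10)
```

The first dimension (the number of images) is untouched by the reshape; only the per-image dimensions are merged.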




### 3.2 Building the MLP Classifier

In the code box below, create a new fully connected (dense) multilayer perceptron classifier for the CIFAR-10 dataset. To begin with, create a network with one hidden layer of 1024 neurons, using the SGD optimizer. You should output the training and validation accuracy at every epoch, and train for 50 epochs:


In [33]:
""" 
Write your code to build an MLP with one hidden layer of 1024 neurons,
with an SGD optimizer. Train for 50 epochs, and output the training and
validation accuracy at each epoch.
"""
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.regularizers import l2

model = Sequential()
model.add(Dense(1024, input_shape=(3072,), activation='relu'))
# model.add(Dense(512, activation='relu'))
model.add(Dense(10, activation='softmax'))

sgd = SGD(learning_rate=0.001, momentum=0.9)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=50, validation_data=(test_x, test_y), batch_size=32, verbose=2)

final_train_accuracy = history.history['accuracy'][-1]
final_val_accuracy = history.history['val_accuracy'][-1]
print(f"Final training accuracy: {final_train_accuracy}")
print(f"Final validation accuracy: {final_val_accuracy}")


Epoch 1/50
1563/1563 - 8s - 5ms/step - accuracy: 0.3458 - loss: 1.8454 - val_accuracy: 0.3821 - val_loss: 1.7319
Epoch 2/50
1563/1563 - 7s - 5ms/step - accuracy: 0.4173 - loss: 1.6666 - val_accuracy: 0.4288 - val_loss: 1.6168
Epoch 3/50
1563/1563 - 8s - 5ms/step - accuracy: 0.4430 - loss: 1.5900 - val_accuracy: 0.4599 - val_loss: 1.5509
Epoch 4/50
1563/1563 - 7s - 5ms/step - accuracy: 0.4634 - loss: 1.5337 - val_accuracy: 0.4515 - val_loss: 1.5468
Epoch 5/50
1563/1563 - 8s - 5ms/step - accuracy: 0.4804 - loss: 1.4926 - val_accuracy: 0.4753 - val_loss: 1.4959
Epoch 6/50
1563/1563 - 8s - 5ms/step - accuracy: 0.4955 - loss: 1.4549 - val_accuracy: 0.4681 - val_loss: 1.4857
Epoch 7/50
1563/1563 - 8s - 5ms/step - accuracy: 0.5041 - loss: 1.4238 - val_accuracy: 0.4811 - val_loss: 1.4612
Epoch 8/50
1563/1563 - 8s - 5ms/step - accuracy: 0.5146 - loss: 1.3944 - val_accuracy: 0.4850 - val_loss: 1.4547
Epoch 9/50
1563/1563 - 8s - 5ms/step - accuracy: 0.5213 - loss: 1.3673 - val_accuracy: 0.4943 - 

#### Question 2

Complete the following table on the design choices for your MLP:

| Hyperparameter       | What I used | Why?                  |
|:---------------------|:------------|:----------------------|
| Optimizer            | SGD         | Specified in question |
| # of hidden layers   | 1           | Specified in question |
| # of hidden neurons  | 1024        | Specified in question |
| Hid layer activation | ReLU            | ReLU is simple, introduces non-linearity, enables sparse activation, and mitigates the vanishing gradient problem                      |
| # of output neurons  | 10            | There are 10 classes in the CIFAR-10 dataset                      |
| Output activation    | Softmax            | Softmax activation is used for multi-class classification                      |
| learning_rate        | 0.001            | Common initial learning rate for SGD                      |
| momentum             | 0.9            | Helps accelerate SGD in the relevant direction and dampen oscillations                      |
| loss                 | Categorical Crossentropy            | Common loss function for multi-class classification                      |
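
To make the softmax output and the categorical cross-entropy loss in the table concrete, here is a minimal NumPy sketch of both formulas for a single sample. The logits and target values are made up for illustration, and this is not how Keras implements them internally.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

def categorical_crossentropy(one_hot_target, probabilities):
    # Only the true class contributes because the target is one-hot.
    return -np.sum(one_hot_target * np.log(probabilities))

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw outputs for 3 classes
target = np.array([1.0, 0.0, 0.0])  # true class is index 0

probs = softmax(logits)
loss = categorical_crossentropy(target, probs)

print(probs)  # non-negative values that sum to 1
print(loss)   # equals -log(probs[0]) for this target
```

The loss shrinks toward 0 as the probability assigned to the true class approaches 1.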


#### Question 3:

What was your final training accuracy? Validation accuracy? Is there overfitting / underfitting? Explain your answer:

**The final training accuracy is 0.7359600067138672 and the final validation accuracy is 0.5307999849319458. The model is overfitting: training accuracy is about 0.21 higher than validation accuracy, which means the network fits the training images much better than it generalizes to unseen ones.**
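
One simple way to see when overfitting begins is to track the gap between training and validation accuracy per epoch. The sketch below uses the accuracies from the first nine epochs of the log in Section 3.2; in a live notebook the same lists would come from history.history['accuracy'] and history.history['val_accuracy'].

```python
# Accuracies copied from the first nine epochs of the Section 3.2 training log.
train_acc = [0.3458, 0.4173, 0.4430, 0.4634, 0.4804, 0.4955, 0.5041, 0.5146, 0.5213]
val_acc = [0.3821, 0.4288, 0.4599, 0.4515, 0.4753, 0.4681, 0.4811, 0.4850, 0.4943]

# Gap between training and validation accuracy at each epoch.
gaps = [t - v for t, v in zip(train_acc, val_acc)]

# First epoch (1-indexed) where training accuracy exceeds validation accuracy.
first_crossover = next(i + 1 for i, g in enumerate(gaps) if g > 0)

print(first_crossover)     # 4
print(round(gaps[-1], 4))  # 0.027
```

A gap that keeps widening while validation accuracy flattens is the usual sign of overfitting.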


### 3.3 Experimenting with the MLP

Cut and paste your code from Section 3.2 to the box below (you may need to rename your MLP). Experiment with the number of hidden layers, the number of neurons in each hidden layer, the optimization algorithm, etc. See [Keras Optimizers](https://keras.io/optimizers) for the types of optimizers and their parameters. **Train for 100 epochs.**


In [35]:
"""
Cut and paste your code from Section 3.2 below, then modify it to get
much better results than what you had earlier. E.g. increase the number of
nodes in the hidden layer, increase the number of hidden layers,
change the optimizer, etc. 

Train for 100 epochs.

"""
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

model = Sequential()
model.add(Dense(2048, input_shape=(3072,), activation='relu'))
# model.add(Dense(1024, activation='relu', input_shape=(3072,)))
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))

adam = Adam(learning_rate=0.001)
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])

history = model.fit(train_x, train_y, epochs=100, validation_data=(test_x, test_y), batch_size=32, verbose=1)

final_train_accuracy = history.history['accuracy'][-1]
final_val_accuracy = history.history['val_accuracy'][-1]
print(f"Final training accuracy: {final_train_accuracy}")
print(f"Final validation accuracy: {final_val_accuracy}")

Epoch 1/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 49s 31ms/step - accuracy: 0.2509 - loss: 2.0488 - val_accuracy: 0.3360 - val_loss: 1.9329
Epoch 2/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 47s 30ms/step - accuracy: 0.3694 - loss: 1.7483 - val_accuracy: 0.3956 - val_loss: 1.6759
Epoch 3/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 48s 30ms/step - accuracy: 0.4103 - loss: 1.6553 - val_accuracy: 0.4262 - val_loss: 1.6168
Epoch 4/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 48s 30ms/step - accuracy: 0.4268 - loss: 1.5939 - val_accuracy: 0.4481 - val_loss: 1.5498
Epoch 5/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 48s 31ms/step - accuracy: 0.4451 - loss: 1.5481 - val_accuracy: 0.4504 - val_loss: 1.5318
Epoch 6/100
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 48s 31ms/step - accuracy: 0.4636 - loss: 1.4956 - val_accuracy: 0.4515 - val_loss: 1.550

----

#### Question 4:

Complete the following table with your final design (you may add more rows for the # neurons (layer1) etc. to detail how many neurons you have in each hidden layer). Likewise you may replace the learning_rate, momentum etc rows with parameters more appropriate to the optimizer that you have chosen.


| Hyperparameter       | What I used | Why?                  |
|:---------------------|:------------|:----------------------|
| Optimizer            | Adam            | Adam optimizer generally provides good performance on various tasks                      |
| # of hidden layers   | 5            | Increasing the depth of the network can improve its capacity to learn complex patterns                      |
| # neurons(layer1)    | 2048            | Increasing the number of neurons increases the model's capacity                      |
| Hid layer1 activation| ReLU            | ReLU is simple, introduces non-linearity, enables sparse activation, and mitigates the vanishing gradient problem                      |
| # neurons(layer2)    | 1024            | Increasing the number of neurons increases the model's capacity                      |
| Hid layer2 activation| ReLU            | Same reasons as layer 1                      |
| # neurons(layer3)    | 512            | Layer widths are halved at each step so the network narrows gradually towards the 10 outputs                      |
| # neurons(layer4)    | 256            | Same as layer 3                      |
| # neurons(layer5)    | 128            | Same as layer 3                      |
| Hid layer3-5 activation| ReLU            | Same reasons as layer 1                      |
| # of output neurons  | 10            | There are 10 classes in the CIFAR-10 dataset                      |
| Output activation    | Softmax            | Softmax activation is used for multi-class classification                      |
| learning_rate        | 0.001            | Keras default for Adam; a small learning rate can provide more stable convergence                      |
| momentum             | N/A            | Not applicable for Adam optimizer                      |
| loss                 | Categorical Crossentropy            | Common loss function for multi-class classification                      |



#### Question 5

What is the final training and validation accuracy that you obtained after 100 epochs? Is there considerable improvement over Section 3.2? Are there still signs of underfitting or overfitting? Explain your answer.

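**The final training accuracy is 0.8792399764060974 and the final validation accuracy is 0.4657999873161316. There is no improvement over Section 3.2: training accuracy rose from about 0.74 to about 0.88, but validation accuracy fell from about 0.53 to about 0.47. The model is still overfitting, and more severely than before, since the gap between training and validation accuracy grew from about 0.21 to about 0.41.**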


#### Question 6

Write a short reflection on the practical difficulties of using a dense MLP to classify images in the CIFAR-10 dataset.

The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes. Each image has 3 color channels, so a dense MLP sees 3072 input features (32x32x3) per image. This high dimensionality poses significant challenges.

First, the number of parameters becomes very large. Every neuron in the first hidden layer is connected to all 3072 inputs, and each additional layer adds a weight matrix whose size is the product of the two layer widths it connects. Training is therefore computationally expensive and memory-intensive, and the large capacity increases the risk of overfitting: the model may memorize the training data rather than generalize to unseen data, which is what we observed in Sections 3.2 and 3.3.

Second, dense MLPs do not exploit the spatial structure of images. CNNs use convolutional layers to learn local patterns and gradually combine them into more complex features, whereas a dense MLP treats every pixel as an unrelated input. This lack of spatial awareness makes dense MLPs less effective at capturing local and hierarchical patterns, leading to poorer performance in image classification tasks.

Third, making the MLP deeper to compensate can run into vanishing and exploding gradients. As the error signal propagates back through many layers it can shrink toward zero, making it difficult for the early layers to learn, or grow very large, causing numerical instability and hindering convergence.
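
The parameter growth described above can be checked with a few lines of arithmetic. The helper below is a hypothetical function written for this illustration (it is not part of Keras); it counts the weights and biases of a fully connected network given its layer widths, and is applied to the two MLPs from Sections 3.2 and 3.3.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per neuron
    return total

mlp_3_2 = dense_param_count([3072, 1024, 10])
mlp_3_3 = dense_param_count([3072, 2048, 1024, 512, 256, 128, 10])

print(mlp_3_2)  # 3157002
print(mlp_3_3)  # 9081994
```

In both networks the first layer dominates the count, because it is the one connected to all 3072 inputs.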


----

## 4. Creating a CNN for the MNIST Data Set

In this section we will now create a convolutional neural network (CNN) to classify images in the MNIST dataset that we used in the previous lab. Let's go through each part to see how to do this.

### 4.1 Loading the MNIST Dataset

As always we will load the MNIST dataset, scale the inputs to between 0 and 1, and convert the Y labels to one-hot vectors. However unlike before we will not flatten the 28x28 image to a 784 element vector, since CNNs can inherently handle 2D data.
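
Instead of flattening, the loader adds a trailing channel axis so that each image has the (height, width, channels) layout that Conv2D expects. A minimal NumPy sketch of that reshape, using an all-zero array as an assumed stand-in for MNIST:

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 grayscale images of 28x28 pixels.
fake_mnist = np.zeros((4, 28, 28), dtype=np.uint8)

# Add a trailing channel axis so each image is 28x28x1; no pixel values change.
with_channel = fake_mnist.reshape(fake_mnist.shape[0], 28, 28, 1)

print(fake_mnist.shape)    # (4, 28, 28)
print(with_channel.shape)  # (4, 28, 28, 1)
```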

In [19]:
from keras.datasets import mnist
from keras.utils import to_categorical

def load_mnist():
    (train_x, train_y),(test_x, test_y) = mnist.load_data()
    train_x = train_x.reshape(train_x.shape[0], 28, 28, 1)
    test_x = test_x.reshape(test_x.shape[0], 28, 28, 1)

    train_x=train_x.astype('float32')
    test_x = test_x.astype('float32')
    
    train_x /= 255.0
    test_x /= 255.0
        
    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)
        
    return (train_x, train_y), (test_x, test_y) 

### 4.2 Building the CNN

We will now build the CNN. Unlike before we will create a function to produce the CNN. We will also look at how to save and load Keras models using "checkpoints", particularly "ModelCheckpoint" that saves the model each epoch.

Let's begin by creating the model. We call os.path.exists to see if a model file exists, and call "load_model" if it does. Otherwise we create a new model.



In [24]:
# load_model loads a previously saved model from disk (here a .keras file).
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
import os

# MODEL_NAME = 'mnist-cnn.hd5'
MODEL_NAME = 'mnist-cnn.keras'


def buildmodel(model_name):
    if os.path.exists(model_name):
        model = load_model(model_name)                                                                                             
    else:
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7

        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(128, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))
        model.add(Flatten()) # Question 9
        model.add(Dense(1024, activation='relu'))
        model.add(Dropout(0.1))
        model.add(Dense(10, activation='softmax'))

    return model



----

#### Question 7

The first layer in our CNN is a 2D convolution kernel, shown here:

```
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7
```

Why is the input_shape set to (28, 28, 1)? What does this mean? What does "padding = 'same'" mean? 

The input_shape is set to (28, 28, 1) because the MNIST dataset consists of 28x28 pixel grayscale images. The 28, 28 represents the height and width of the images, and 1 represents the single color channel (grayscale).

padding='same' means that the convolutional layer will apply padding to the input images so that the output has the same height and width as the original image. This involves adding zeroes around the borders of the image before applying the convolution operation.
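
The effect of zero padding on the output size can be illustrated with NumPy. This is a sketch under the assumption of stride 1 and an odd kernel size, in which case 'same' padding amounts to (k - 1) // 2 zeros on each side; the all-ones image is a made-up stand-in.

```python
import numpy as np

image = np.ones((28, 28))  # hypothetical 28x28 single-channel image
kernel_size = 5

# For stride 1 and an odd kernel, 'same' padding adds (k - 1) // 2 zeros per side.
pad = (kernel_size - 1) // 2
padded = np.pad(image, pad_width=pad, mode="constant", constant_values=0)

# Number of positions a k x k window can take along one axis with stride 1.
same_output = padded.shape[0] - kernel_size + 1
valid_output = image.shape[0] - kernel_size + 1

print(padded.shape)  # (32, 32)
print(same_output)   # 28
print(valid_output)  # 24
```

Without padding ('valid'), each 5x5 convolution would shrink the feature map by 4 pixels per axis.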


#### Question 8

The second layer is the MaxPooling2D layer shown below:

```
        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
```

What other types of pooling layers are available? What does 'strides = 2' mean? 

Other types of pooling layers include AveragePooling2D and GlobalAveragePooling2D.
+ AveragePooling2D: Computes the average value for each patch of the feature map.
+ GlobalAveragePooling2D: Reduces each feature map to a single value by taking the average over all spatial dimensions.

strides=2 means that the pooling window will move 2 pixels at a time across the input feature map. This reduces the spatial dimensions of the feature map by a factor of 2.
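
The pooling variants mentioned above can be compared on a tiny feature map. The following NumPy sketch uses a made-up 4x4 array and a reshape trick to form non-overlapping 2x2 windows; it illustrates the arithmetic only and is not how Keras implements pooling.

```python
import numpy as np

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
], dtype="float32")

# Split the 4x4 map into non-overlapping 2x2 windows (pool_size=2, strides=2).
# After the swap, windows[i, j] is the 2x2 block at block-row i, block-column j.
windows = feature_map.reshape(2, 2, 2, 2).swapaxes(1, 2)

max_pooled = windows.max(axis=(2, 3))   # like MaxPooling2D
avg_pooled = windows.mean(axis=(2, 3))  # like AveragePooling2D
global_avg = feature_map.mean()         # like GlobalAveragePooling2D (one value per map)

print(max_pooled.tolist())  # [[4.0, 5.0], [3.0, 7.0]]
print(avg_pooled.tolist())  # [[2.5, 2.0], [1.5, 4.75]]
print(float(global_avg))    # 2.6875
```

With a stride of 2 the 4x4 input becomes 2x2, halving each spatial dimension.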

#### Question 9

What does the "Flatten" layer here do? Why is it needed?

```
        model.add(Flatten()) # Question 9
```

The Flatten layer converts the multi-dimensional output of the previous layer into a one-dimensional vector.

It is needed because the last pooling layer outputs a 3D tensor of features (height x width x channels) for each image, while the fully connected (dense) layers that follow expect a 1D vector per image.
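
To see what Flatten actually receives in this model, the feature-map sizes can be traced layer by layer. The helpers below are hypothetical functions written for this sketch; they assume stride-1 convolutions and the Keras defaults of 'valid' padding for Conv2D and MaxPooling2D unless 'same' is given, as in buildmodel.

```python
def conv_out(size, kernel, padding):
    # Stride-1 convolution: 'same' keeps the size, 'valid' shrinks it by kernel - 1.
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool=2, stride=2):
    # 'valid' pooling: count the window positions that fit entirely inside.
    return (size - pool) // stride + 1

size, channels = 28, 1
size, channels = conv_out(size, 5, "same"), 32    # 28x28x32
size = pool_out(size)                             # 14x14x32
size, channels = conv_out(size, 5, "valid"), 64   # 10x10x64
size, channels = conv_out(size, 5, "valid"), 128  # 6x6x128
size, channels = conv_out(size, 5, "valid"), 64   # 2x2x64
size = pool_out(size)                             # 1x1x64

flattened_length = size * size * channels
print(flattened_length)  # 64
```

Under these assumptions Flatten hands a 64-element vector to the Dense(1024) layer.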

----

### 4.3 Training the CNN

Let's now train the CNN. In this example we introduce the idea of a "callback", which is a routine that Keras calls at the end of each epoch. Specifically we look at two callbacks:

    1. ModelCheckpoint: When called, Keras saves the model to the specified filename.
    
    2. EarlyStopping: When called, Keras checks if it should stop the training prematurely.
    

Let's look at the code to see how training is done, and how callbacks are used.

In [25]:
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping, ModelCheckpoint

def train(model, train_x, train_y, epochs, test_x, test_y, model_name):

    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.7), 
                  loss='categorical_crossentropy', metrics=['accuracy'])

    savemodel = ModelCheckpoint(model_name)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs, 
    callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f"%(acc, loss))

Notice that there isn't very much that is unusual going on: we compile the model with our loss function and optimizer, then call fit, and finally call evaluate to look at the final accuracy for the test set. The only unusual thing is the "callbacks" parameter in the fit function call:

```
    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs, 
    callbacks=[savemodel, stopmodel])
```

----

#### Question 10.

What do the min_delta and patience parameters do in the EarlyStopping callback, as shown below? (2 MARKS)

```
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10
```

---

+ min_delta: This parameter specifies the minimum change in the monitored quantity (validation loss by default) that qualifies as an improvement. A smaller change is not counted as an improvement. Here, min_delta=0.001 means that the validation loss must decrease by more than 0.001 to count as an improvement.
+ patience: This parameter specifies the number of epochs with no improvement after which training will be stopped. Here, patience=10 means that training stops if the validation loss has not improved for 10 consecutive epochs.
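
The interaction between the two parameters can be sketched in plain Python. This is a simplified model of the stopping rule for a monitored loss, written for illustration only; it is not the Keras implementation, and the function name and loss values are made up.

```python
def early_stopping_epoch(val_losses, min_delta=0.001, patience=10):
    """Return the 0-indexed epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement large enough to count: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [1.0, 0.9, 0.8995, 0.8999, 0.8998]  # hypothetical validation losses
print(early_stopping_epoch(losses, min_delta=0.001, patience=2))   # 3
print(early_stopping_epoch(losses, min_delta=0.001, patience=10))  # None
```

In the first call, the losses after 0.9 never drop by more than 0.001, so the counter reaches the patience of 2 at epoch index 3.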

### 4.4 Putting it together.

Now let's run the code and see how it goes (Note: To save time we are training for only 5 epochs; we should train much longer to get much better results):

In [26]:
    (train_x, train_y),(test_x, test_y) = load_mnist()
    model = buildmodel(MODEL_NAME)
    train(model, train_x, train_y, 5, test_x, test_y, MODEL_NAME)
    

Starting training.
Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 13s 7ms/step - accuracy: 0.7412 - loss: 0.7860 - val_accuracy: 0.9714 - val_loss: 0.0890
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 13s 7ms/step - accuracy: 0.9739 - loss: 0.0811 - val_accuracy: 0.9791 - val_loss: 0.0600
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 13s 7ms/step - accuracy: 0.9850 - loss: 0.0476 - val_accuracy: 0.9847 - val_loss: 0.0459
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 13s 7ms/step - accuracy: 0.9884 - loss: 0.0369 - val_accuracy: 0.9880 - val_loss: 0.0355
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 13s 7ms/step - accuracy: 0.9909 - loss: 0.0284 - val_accuracy: 0.9862 - val_loss: 0.0447
Done. Now evaluating.
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9826 - loss: 0.0604
Test accuracy: 0.99, loss: 0.04


----

#### Question 11.

Compare the relative advantages and disadvantages of CNN vs. the Dense MLP that you built in sections 3.2 and 3.3. What makes CNNs better (or worse)?

Dense MLPs consist of layers where each neuron is connected to every neuron in the subsequent layer. This dense connectivity allows them to learn complex patterns in the data but at the cost of a large number of parameters, making them computationally expensive and prone to overfitting, especially with high-dimensional input data like images.

CNNs are particularly well-suited for image and spatial data due to their ability to capture spatial hierarchies and local features. This makes CNNs generally better at image classification and object detection tasks compared to Dense MLPs. However, CNNs can be less effective for tasks where spatial relationships are less important or for data that isn't grid-like, where Dense MLPs might still be preferable.
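
One concrete way to see the difference is to compare parameter counts. The two helper functions below are hypothetical and written for this illustration; they compare the first Conv2D layer of the MNIST CNN with a dense layer of the same number of units applied to the flattened 28x28 image.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has one weight per kernel position per input channel, plus a bias.
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(n_in, n_out):
    # Every input connects to every output neuron, plus one bias per neuron.
    return n_in * n_out + n_out

# First layer of the MNIST CNN: 32 filters of 5x5 over 1 input channel.
conv_first = conv2d_params(5, 5, 1, 32)

# A dense layer with 32 neurons on the flattened 28x28 image, for comparison.
dense_first = dense_params(28 * 28, 32)

print(conv_first)   # 832
print(dense_first)  # 25120
```

The convolutional layer reuses the same small kernel at every image position, so its parameter count does not depend on the image size.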



## 5. Making a CNN for the CIFAR-10 Dataset

Now comes the fun part: Using the example above for creating a CNN for the MNIST dataset, now create a CNN in the box below for the CIFAR-10 dataset. At the end of each epoch save the model to a file called "cifar.hd5" (note: the .hd5 is added automatically for you).

---

#### Question 12.

Summarize your design in the table below (the actual coding cell comes after this):

| Hyperparameter       | What I used      | Why?                                                      |
|:---------------------|:-----------------|:----------------------------------------------------------|
| Optimizer            | Adam             | Adam optimizer generally provides good performance        |
| Input shape          | (32, 32, 3)      | CIFAR-10 images are 32x32 pixels with 3 color channels    |
| First layer          | Conv2D(32)       | Convolutional layer with 32 filters and ReLU activation   |
| Second layer         | MaxPooling2D(2,2)| Pooling layer to reduce spatial dimensions                |
| Third layer          | Conv2D(64)       | Convolutional layer with 64 filters and ReLU activation   |
| Fourth layer         | MaxPooling2D(2,2)| Pooling layer to reduce spatial dimensions                |
| Fifth layer          | Conv2D(128)      | Convolutional layer with 128 filters and ReLU activation  |
| Sixth layer          | MaxPooling2D(2,2)| Pooling layer to reduce spatial dimensions                |
| Dense layer          | Dense(512)       | Fully connected layer with 512 neurons and ReLU activation|
| Output layer         | Dense(10)        | Fully connected layer with 10 neurons for 10 classes      |





In [27]:
"""
Write your code for your CNN for the CIFAR-10 dataset here. 

Note: train_x, train_y, test_x, test_y were changed when we called 
load_mnist in the previous section. You will now need to call load_cifar10
again.

"""

from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint
import os

# Define the filename to save the model
MODEL_NAME = 'cifar.keras'

# Function to load CIFAR-10 dataset
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10

def load_cifar10():
    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    train_x = train_x.reshape(train_x.shape[0], 32, 32, 3)
    test_x = test_x.reshape(test_x.shape[0], 32, 32, 3)
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    train_x /= 255.0
    test_x /= 255.0
    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)
    
    return (train_x, train_y), (test_x, test_y)

(train_x, train_y), (test_x, test_y) = load_cifar10()

# Function to build the CNN model
def build_model(model_name):
    if os.path.exists(model_name):
        model = load_model(model_name)
    else:
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3), padding='same'))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
        model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
        model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
        model.add(Flatten())
        model.add(Dense(512, activation='relu'))
        model.add(Dropout(0.5))
        model.add(Dense(10, activation='softmax'))
    
    return model

# Function to train the CNN model
def train(model, train_x, train_y, epochs, test_x, test_y, model_name):
    model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
    
    savemodel = ModelCheckpoint(model_name, save_best_only=True)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10)

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32, validation_data=(test_x, test_y), shuffle=True, epochs=epochs, callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f" % (acc, loss))

# Build and train the model
model = build_model(MODEL_NAME)
train(model, train_x, train_y, 50, test_x, test_y, MODEL_NAME)


Starting training.
Epoch 1/50
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 12s 7ms/step - accuracy: 0.3740 - loss: 1.7110 - val_accuracy: 0.5686 - val_loss: 1.2052
Epoch 2/50
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 11s 7ms/step - accuracy: 0.6226 - loss: 1.0674 - val_accuracy: 0.6832 - val_loss: 0.9077
Epoch 3/50
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 11s 7ms/step - accuracy: 0.6880 - loss: 0.8853 - val_accuracy: 0.7194 - val_loss: 0.8319
Epoch 4/50
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 12s 7ms/step - accuracy: 0.7333 - loss: 0.7599 - val_accuracy: 0.7208 - val_loss: 0.8049
Epoch 5/50
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 12s 7ms/step - accuracy: 0.7620 - loss: 0.6717 - val_accuracy: 0.7374 - val_loss: 0.7648
Epoch 6/50
1563/1563 ━━━━━━━━━━━━━━━━━━━━ 11s 7ms/step - accuracy: 0.7821 - loss: 0.6054 - val_accuracy: 0.7422 - val_loss