# SWS3009 Lab 3 Introduction to Deep Learning


| group9   | name                |
|----------|---------------------|
| member1  | 贾世安(JIA SHIAN)    |
| member2  | 陶毅诚(TAO YICHENG)  |

This lab should be done by both Deep Learning members of the team. Please ensure that you fill in the names of <b>both</b> team members in the spaces above. Answer <b>all</b> your questions on <b>this Python Notebook.</b>

## Submission Instructions

Please submit this Python notebook to Canvas on the deadline provided.

Marks will be awarded as follows:

**0 marks**: No/empty/Non-English submission

**1 mark** : Poor submission

**2 marks**: Acceptable submission

**3 marks**: Good submission


## 1. Introduction

We will achieve the following objectives in this lab:

    1. An understanding of the practical limitations of using dense networks in complex tasks
    2. Hands-on experience in building a deep learning neural network to solve a relatively complex task.
    

Each step may take a long time to run. You and your partner may want to work out how to do things simultaneously, but please do not miss out on any learning opportunities.


## 2. Submission Instructions

Please submit your answer book to Canvas by the deadline.

## 3. Creating a Dense Network for CIFAR-10

We will now begin building a neural network for the CIFAR-10 dataset. The CIFAR-10 dataset consists of 50,000 32x32x3 (32x32 pixels, RGB channels) training images and 10,000 testing images (also 32x32x3), divided into the following 10 categories:

    1. Airplane
    2. Automobile
    3. Bird
    4. Cat
    5. Deer
    6. Dog
    7. Frog
    8. Horse
    9. Ship
    10. Truck
    
In the first two parts of this lab we will create a classifier for the CIFAR-10 dataset.

### 3.1 Loading the Dataset

We begin by creating a dense neural network for CIFAR-10. The code below shows how we load the CIFAR-10 dataset:


In [None]:
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10

def load_cifar10():
    (train_x, train_y), (test_x, test_y) = cifar10.load_data()
    # Flatten each 32x32x3 image into a 3072-element vector, so that each
    # dataset becomes a 2D array of shape (number of images, 3072).
    train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
    test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
    # The image data (train_x and test_x) is then converted to float32 data type.
    train_x = train_x.astype('float32')
    test_x = test_x.astype('float32')
    # The pixel values of the image data are normalized by dividing them by 255.0
    # which scales the values between 0 and 1.
    train_x /= 255.0
    test_x /= 255.0
    # The labels (train_y and test_y) are one-hot encoded using to_categorical function from tensorflow.keras.utils.
    # This converts the label values from integers to binary vectors of size 10, representing the 10 classes in CIFAR-10.
    ret_train_y = to_categorical(train_y,10)
    ret_test_y = to_categorical(test_y, 10)

    return (train_x, ret_train_y), (test_x, ret_test_y)


(train_x, train_y), (test_x, test_y) = load_cifar10()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


----

#### Question 1

Explain what the following two  statements do, and where the number "3072" came from:

```
  train_x = train_x.reshape(train_x.shape[0], 3072) # Question 1
  test_x = test_x.reshape(test_x.shape[0], 3072) # Question 1
```

**Please put your answers in the attached answer books**

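What the two statements do: `cifar10.load_data()` returns the images as 4D arrays of shape (number of images, 32, 32, 3). Each statement flattens every image from a 3D array of shape (32, 32, 3) into a 1D vector of 3072 values, so `train_x` becomes a 2D array of shape (50000, 3072) and `test_x` a 2D array of shape (10000, 3072). This is needed because a dense layer expects one flat feature vector per sample.

Where the number "3072" came from: 3072 = 32 x 32 x 3, i.e. 32 rows x 32 columns of pixels x 3 colour channels (RGB) per image.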
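
To make the shapes concrete, here is a small sketch that uses a made-up batch of four blank images instead of the real dataset. The array `fake_images` is an assumption for illustration; only the `reshape` call pattern comes from the lab code.

```python
import numpy as np

# Hypothetical stand-in for CIFAR-10: 4 blank images, 32x32 pixels, 3 channels.
fake_images = np.zeros((4, 32, 32, 3), dtype="uint8")

# Same call pattern as in load_cifar10: keep the sample axis, flatten the rest.
flat_images = fake_images.reshape(fake_images.shape[0], 3072)

print(fake_images.shape)  # (4, 32, 32, 3)
print(flat_images.shape)  # (4, 3072)
print(32 * 32 * 3)        # 3072
```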


### 3.2 Building the MLP Classifier

In the code box below, create a new fully connected (dense) multilayer perceptron classifier for the CIFAR-10 dataset. To begin with, create a network with one hidden layer of 1024 neurons, using the SGD optimizer. You should output the training and validation accuracy at every epoch, and train for 50 epochs:


In [None]:
"""
Write your code to build an MLP with one hidden layer of 1024 neurons,
with an SGD optimizer. Train for 50 epochs, and output the training and
validation accuracy at each epoch.
"""
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.models import load_model
import os

model_name = 'MLP_1'

def BuildMLPModel(model_name):
    # Resume from a previously saved model if one exists.
    if os.path.exists(model_name):
        model = load_model(model_name)
    else:
        # The Sequential class is a linear stack of layers in Keras
        # which means you can easily add layers one by one.
        model = Sequential()

        # hidden layer
        # input_shape=(3072,) declares that each sample is a flat vector of
        # 3072 features. The batch dimension is implicit and is not listed.
        model.add(Dense(1024, input_shape=(3072,), activation='relu'))
        # output layer: one neuron per CIFAR-10 class
        model.add(Dense(10, activation='softmax'))
    return model


def train(model, train_x, train_y, epochs, test_x, test_y, model_name):

    model.compile(optimizer=SGD(learning_rate=0.01, weight_decay = 1e-6, momentum=0.7),
                  loss='categorical_crossentropy', metrics=['accuracy'])

    savemodel = ModelCheckpoint(model_name)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs,
    callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f"%(acc, loss))

epochs = 50
model = BuildMLPModel(model_name)
train(model, train_x, train_y, epochs, test_x, test_y, model_name)


Starting training.
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Done. Now evaluating.
Test accuracy: 0.53, loss: 1.51


#### Question 2

Complete the following table on the design choices for your MLP:

| Hyperparameter       | What I used | Why?                  |
|:---------------------|:------------|:----------------------|
| Optimizer            | SGD         | Specified in question |
| # of hidden layers   | 1           | Specified in question |
| # of hidden neurons  | 1024        | Specified in question |
| Hid layer activation | relu        | Common default for hidden layers; cheap to compute and less prone to vanishing gradients than sigmoid |
| # of output neurons  | 10          | One neuron per CIFAR-10 class |
| Output activation    | softmax     | Turns the 10 outputs into class probabilities that sum to 1 |
| lr                   | 0.01        | Commonly used starting value for SGD |
| momentum             | 0.7         | Commonly used; smooths the updates and speeds up convergence |
| decay                | 1e-6        | Commonly used small value, so that the regularization does not dominate the loss |
| loss                 | categorical_crossentropy | Standard loss for one-hot labels with a softmax output |
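
As an illustration of why softmax and categorical cross-entropy are usually paired, the sketch below computes both by hand for a made-up 3-class example. The logits and the helper names `softmax` and `categorical_crossentropy` are illustrative assumptions, not the Keras implementations.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_label, probs):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))

logits = [2.0, 1.0, 0.1]   # made-up network outputs for 3 classes
label = [1.0, 0.0, 0.0]    # the true class is class 0

probs = softmax(logits)
loss = categorical_crossentropy(label, probs)

print(probs)  # three probabilities that sum to 1
print(loss)   # equals -log(probs[0]) because the label is one-hot
```

Because the label is one-hot, only the probability assigned to the true class enters the loss, so the loss falls as that probability rises.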


#### Question 3:

What was your final training accuracy? Validation accuracy? Is there overfitting / underfitting? Explain your answer:

***PLACE YOUR ANSWER HERE ***
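The final training accuracy is 0.7127 and the validation accuracy is 0.5265.
The model is overfitting: training accuracy is about 19 percentage points higher than validation accuracy (validation accuracy is only about 74% of training accuracy), so the network fits the training images much better than it generalizes to unseen ones.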

### 3.3 Experimenting with the MLP

Cut and paste your code from Section 3.2 to the box below (you may need to rename your MLP). Experiment with the number of hidden layers, the number of neurons in each hidden layer, the optimization algorithm, etc. See [Keras Optimizers](https://keras.io/optimizers) for the types of optimizers and their parameters. **Train for 100 epochs.**


In [None]:
"""
Cut and paste your code from Section 3.2 below, then modify it to get
much better results than what you had earlier. E.g. increase the number of
nodes in the hidden layer, increase the number of hidden layers,
change the optimizer, etc.

Train for 100 epochs.

"""


from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.keras.models import load_model
import os

model_name = 'MLP_2'

def BuildMLPModel(model_name):
    # Resume from a previously saved model if one exists.
    if os.path.exists(model_name):
        model = load_model(model_name)
    else:
        # The Sequential class is a linear stack of layers in Keras
        # which means you can easily add layers one by one.
        model = Sequential()

        # hidden layers
        # input_shape=(3072,) declares that each sample is a flat vector of
        # 3072 features. The batch dimension is implicit and is not listed.
        model.add(Dense(1024, input_shape=(3072,), activation='relu'))
        model.add(Dense(32, activation='relu'))
        model.add(Dense(512, activation='relu'))
        # output layer: one neuron per CIFAR-10 class
        model.add(Dense(10, activation='softmax'))
    return model


def train(model, train_x, train_y, epochs, test_x, test_y, model_name):

    model.compile(optimizer=SGD(learning_rate=0.0005, weight_decay = 0.00001, momentum=0.7),
                  loss='categorical_crossentropy', metrics=['accuracy'])

    savemodel = ModelCheckpoint(model_name)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs,
    callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f"%(acc, loss))

epochs = 100
model = BuildMLPModel(model_name)
train(model, train_x, train_y, epochs, test_x, test_y, model_name)



Starting training.
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Done. Now evaluating.
Test accuracy: 0.53, loss: 1.40


----

#### Question 4:

Complete the following table with your final design (you may add more rows for the # neurons (layer1) etc. to detail how many neurons you have in each hidden layer). Likewise you may replace the lr, momentum etc rows with parameters more appropriate to the optimizer that you have chosen.


| Hyperparameter        | What I used | Why?                  |
|:----------------------|:------------|:----------------------|
| Optimizer             | SGD         | Cheap per update and worked acceptably in Section 3.2 |
| # of hidden layers    | 3           | Deeper than the single hidden layer in Section 3.2, to test whether extra depth helps |
| # neurons (layer 1)   | 1024        | Kept from Section 3.2 |
| Hid layer 1 activation| relu        | Common default for hidden layers |
| # neurons (layer 2)   | 32          | Chosen by trial and error |
| Hid layer 2 activation| relu        | Same as layer 1 |
| # neurons (layer 3)   | 512         | Chosen by trial and error |
| Hid layer 3 activation| relu        | Same as layer 1 |
| # of output neurons   | 10          | We have 10 classes |
| Output activation     | softmax     | Multi-class classification: outputs class probabilities |
| lr                    | 0.0005      | Chosen after many trials; gave acceptable results |
| momentum              | 0.7         | Chosen after many trials; gave acceptable results |
| decay                 | 0.00001     | Chosen after many trials; gave acceptable results |
| loss                  | categorical_crossentropy | Standard loss for one-hot labels with a softmax output |
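
To show how the lr and momentum rows interact, here is a minimal sketch of SGD with momentum minimizing the toy function f(w) = w^2. The update rule is the standard non-Nesterov form (velocity = momentum * velocity - learning_rate * gradient, then w = w + velocity). The toy function, starting point, and step count are assumptions for illustration, and weight decay is left out.

```python
def sgd_momentum_step(w, velocity, grad, learning_rate, momentum):
    """One SGD-with-momentum update for a single scalar weight."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

def grad_f(w):
    """Gradient of the toy loss f(w) = w**2."""
    return 2.0 * w

def run(learning_rate, momentum, steps, w=1.0):
    """Apply `steps` updates starting from w and return the final weight."""
    velocity = 0.0
    for _ in range(steps):
        w, velocity = sgd_momentum_step(w, velocity, grad_f(w),
                                        learning_rate, momentum)
    return w

w_one_step = run(learning_rate=0.01, momentum=0.7, steps=1)
w_fast = run(learning_rate=0.01, momentum=0.7, steps=500)    # Section 3.2 settings
w_slow = run(learning_rate=0.0005, momentum=0.7, steps=500)  # Section 3.3 settings

print(w_one_step)          # the first step moves w from 1.0 to about 0.98
print(abs(w_fast), abs(w_slow))
```

On this toy problem the smaller learning rate ends much further from the minimum after the same number of steps, which is one reason a low lr can need many more epochs.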



#### Question 5

What is the final training and validation accuracy that you obtained after 100 epochs? Is there considerable improvement over Section 3.2? Are there still signs of underfitting or overfitting? Explain your answer.

***Write your answers here***
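There is no considerable improvement over Section 3.2: the test accuracy is still 0.53, although the loss dropped from 1.51 to 1.40. Early stopping ended training at epoch 65 of 100. There are still signs of overfitting, since the training accuracy is 0.67 while the validation accuracy is 0.53.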

#### Question 6

Write a short reflection on the practical difficulties of using a dense MLP to classify images in the CIFAR-10 dataset.

***Write your answers here***
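A dense MLP does not classify CIFAR-10 images well: validation accuracy stayed at about 0.53 and the model shows signs of overfitting. We tried many different hyperparameter settings, but the results barely changed. One likely reason is that flattening each image into a 3072-element vector discards the spatial arrangement of the pixels, so the network has to learn the same pattern separately at every position. The first dense layer alone also needs 3072 x 1024 (about 3.1 million) weights, which makes training slow and encourages overfitting.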
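
One concrete way to see the cost of dense layers on images is to count trainable parameters. The helper below is a hypothetical convenience function (not part of Keras) that counts weights plus biases for a stack of Dense layers, applied to the layer sizes of the two MLPs from Sections 3.2 and 3.3.

```python
def dense_param_count(layer_sizes):
    """Total weights + biases for fully connected layers of the given sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per neuron
    return total

mlp_1 = [3072, 1024, 10]            # Section 3.2: one hidden layer
mlp_2 = [3072, 1024, 32, 512, 10]   # Section 3.3: three hidden layers

print(dense_param_count(mlp_1))  # 3157002
print(dense_param_count(mlp_2))  # 3201578
print(3072 * 1024)               # 3145728 weights in the first layer alone
```

Almost all of the parameters sit in the first layer, which connects every input pixel value to every hidden neuron.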

----

## 4. Creating a CNN for the MNIST Data Set

In this section we will now create a convolutional neural network (CNN) to classify images in the MNIST dataset that we used in the previous lab. Let's go through each part to see how to do this.

### 4.1 Loading the MNIST Dataset

As always we will load the MNIST dataset, scale the inputs to between 0 and 1, and convert the Y labels to one-hot vectors. However unlike before we will not flatten the 28x28 image to a 784 element vector, since CNNs can inherently handle 2D data.

In [None]:
from keras.datasets import mnist
from keras.utils import to_categorical

def load_mnist():
    (train_x, train_y),(test_x, test_y) = mnist.load_data()
    train_x = train_x.reshape(train_x.shape[0], 28, 28, 1)
    test_x = test_x.reshape(test_x.shape[0], 28, 28, 1)

    train_x=train_x.astype('float32')
    test_x = test_x.astype('float32')

    train_x /= 255.0
    test_x /= 255.0

    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)

    return (train_x, train_y), (test_x, test_y)
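
The two preprocessing steps in `load_mnist` can be illustrated on toy values without downloading anything. The helpers `scale_pixels` and `one_hot` below are hypothetical pure-Python stand-ins for the division by 255.0 and for `to_categorical`; they sketch the idea and are not the Keras code.

```python
def scale_pixels(pixels):
    """Map 8-bit pixel values (0..255) to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]

def one_hot(label, num_classes):
    """Return a vector with 1.0 at index `label` and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

print(scale_pixels([0, 51, 255]))  # [0.0, 0.2, 1.0]
print(one_hot(3, 10))              # 1.0 at index 3, zeros elsewhere
```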

### 4.2 Building the CNN

We will now build the CNN. Unlike before we will create a function to produce the CNN. We will also look at how to save and load Keras models using "checkpoints", particularly "ModelCheckpoint" that saves the model each epoch.

Let's begin by creating the model. We call os.path.exists to see if a model file exists, and call "load_model" if it does. Otherwise we create a new model.



In [None]:
# load_model loads a model from a hd5 file.
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
import os

MODEL_NAME = 'mnist-cnn.hd5'

def buildmodel(model_name):
    if os.path.exists(model_name):
        model = load_model(model_name)
    else:
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7

        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(128, kernel_size=(5,5), activation='relu'))
        model.add(Conv2D(64, kernel_size=(5,5), activation='relu'))
        model.add(MaxPooling2D(pool_size=(2,2), strides=2))
        model.add(Flatten()) # Question 9
        model.add(Dense(1024, activation='relu'))
        model.add(Dropout(0.1))
        model.add(Dense(10, activation='softmax'))

    return model




----

#### Question 7

The first layer in our CNN is a 2D convolution kernel, shown here:

```
        model.add(Conv2D(32, kernel_size=(5,5),
        activation='relu',
        input_shape=(28, 28, 1), padding='same')) # Question 7
```

Why is the input_shape set to (28, 28, 1)? What does this mean? What does "padding = 'same'" mean?

***Write your answer here***
`input_shape=(28, 28, 1)` specifies the shape of one input sample. The first two dimensions (28, 28) are the height and width of the MNIST image in pixels, and the last dimension, 1, is the number of channels: a single channel, because the images are grayscale. The batch dimension is not included.

`padding='same'` sets the padding strategy for the convolutional layer. The input is padded with zeros around the border so that, with the default stride of 1, the output feature maps have the same height and width as the input (28x28 here). Without it (`padding='valid'`), a 5x5 kernel would shrink each dimension by 4, to 24x24.
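
The effect of the padding mode on the output size can be checked with a few lines of arithmetic. The function `conv_output_size` below is a hypothetical helper that applies the standard output-size formulas for 'same' and 'valid' padding; it is a sketch, not a Keras API.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a convolution."""
    if padding == "same":
        # Zero padding is added so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

print(conv_output_size(28, 5, padding="same"))   # 28
print(conv_output_size(28, 5, padding="valid"))  # 24
```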

#### Question 8

The second layer is the MaxPooling2D layer shown below:

```
        model.add(MaxPooling2D(pool_size=(2,2), strides=2)) # Question 8
```

What other types of pooling layers are available? What does 'strides = 2' mean?

***Write your answer here***
1. Other available pooling layers include AveragePooling2D, GlobalAveragePooling2D and GlobalMaxPooling2D.
2. `strides=2` means that the pooling window moves 2 pixels at a time, both horizontally and vertically. With a 2x2 window the pooled regions do not overlap, and the height and width of the feature map are halved.
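
A minimal sketch of 2x2 max pooling with a stride of 2, applied to a made-up 4x4 feature map. It uses plain Python lists rather than Keras tensors, and `max_pool_2d` is a hypothetical helper written for this illustration.

```python
def max_pool_2d(grid, pool_size=2, stride=2):
    """Max pooling over a 2D list of numbers, without padding."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - pool_size + 1, stride):
        pooled_row = []
        for c in range(0, cols - pool_size + 1, stride):
            window = [grid[r + i][c + j]
                      for i in range(pool_size)
                      for j in range(pool_size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 2, 1],
]

print(max_pool_2d(feature_map))  # [[4, 5], [6, 7]]
```

With `stride=1` the same function would slide the window one pixel at a time and return a 3x3 result with overlapping windows.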

#### Question 9

What does the "Flatten" layer here do? Why is it needed?

```
        model.add(Flatten()) # Question 9
```

***Write your answer here***
In a CNN, the earlier layers are typically convolutional and pooling layers that preserve the spatial structure of the input, so their output is a 3D feature map (height x width x channels) for each sample. The fully connected layers that follow expect a 1D vector as input. The Flatten layer bridges the two by unrolling the feature map into a 1D vector without changing any values. The Dense layers can then process it like a traditional feedforward network; they are typically responsible for learning higher-level abstractions and making the final prediction.
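
To see what Flatten receives in this particular network, the sketch below traces the spatial size through the layers of `buildmodel` using the standard output-size formulas. The helper functions are hypothetical, and the trace assumes the Keras defaults of stride 1 and 'valid' padding for the Conv2D layers that do not set them.

```python
def conv_size(n, kernel, padding):
    """Spatial size after a stride-1 convolution."""
    return n if padding == "same" else n - kernel + 1

def pool_size(n, pool=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (n - pool) // stride + 1

size = 28                           # MNIST input is 28x28x1
size = conv_size(size, 5, "same")   # Conv2D(32, 5x5, padding='same')
size = pool_size(size)              # MaxPooling2D(2x2, strides=2)
size = conv_size(size, 5, "valid")  # Conv2D(64, 5x5)
size = conv_size(size, 5, "valid")  # Conv2D(128, 5x5)
size = conv_size(size, 5, "valid")  # Conv2D(64, 5x5)
size = pool_size(size)              # MaxPooling2D(2x2, strides=2)

channels = 64                       # filters in the last Conv2D layer
flattened = size * size * channels
print(size, flattened)  # 1 64
```

Under these assumptions the Dense(1024) layer receives a 64-element vector per image.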


----

### 4.3 Training the CNN

Let's now train the CNN. In this example we introduce the idea of a "callback", which is a routine that Keras calls at the end of each epoch. Specifically we look at two callbacks:

    1. ModelCheckpoint: When called, Keras saves the model to the specified filename.
    
    2. EarlyStopping: When called, Keras checks if it should stop the training prematurely.
    

Let's look at the code to see how training is done, and how callbacks are used.

In [None]:
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping, ModelCheckpoint

def train(model, train_x, train_y, epochs, test_x, test_y, model_name):

    # learning_rate replaces the deprecated lr argument.
    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.7),
                  loss='categorical_crossentropy', metrics=['accuracy'])

    savemodel = ModelCheckpoint(model_name)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs,
    callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f"%(acc, loss))

Notice that very little here is unusual: we compile the model with our loss function and optimizer, call fit, and finally call evaluate to get the final accuracy on the test set. The only new element is the "callbacks" parameter in the fit function call:

```
    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs,
    callbacks=[savemodel, stopmodel])
```

----

#### Question 10.

What do the min_delta and patience parameters do in the EarlyStopping callback, as shown below? (2 MARKS)

```
    stopmodel = EarlyStopping(min_delta=0.001, patience=10) # Question 10
```

`min_delta`: The min_delta parameter specifies the minimum change in the monitored quantity (by default the validation loss, `val_loss`) that counts as an improvement. If the monitored quantity improves by less than min_delta, the epoch is treated as showing no improvement and counts towards patience. If it improves by more than min_delta, the improvement is considered significant and the patience counter is reset. A suitable value depends on the scale of the monitored quantity and on how sensitive the stopping rule should be.

`patience`: The patience parameter is the number of consecutive epochs without a significant improvement that are tolerated before training is stopped. Here, if the monitored quantity fails to improve by more than min_delta for 10 epochs in a row, training stops early.
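
The interaction of the two parameters can be sketched in plain Python. The function below is a simplified model of the stopping rule for a quantity that should decrease (such as validation loss), run on a made-up list of losses. It is an illustration and not the Keras implementation, which has further options.

```python
def early_stopping_epoch(losses, min_delta, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # consecutive epochs without a significant improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - min_delta:
            best = loss   # significant improvement: remember it
            wait = 0      # and reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: a real improvement, then only tiny changes.
val_losses = [1.0, 0.9, 0.8995, 0.8999, 0.8998, 0.85]

print(early_stopping_epoch(val_losses, min_delta=0.001, patience=3))   # 5
print(early_stopping_epoch(val_losses, min_delta=0.001, patience=10))  # None
```

With a patience of 3 the run stops before the later drop to 0.85 is ever seen, which shows the trade-off: a small patience saves time but can stop too early.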

---

### 4.4 Putting it together.

Now let's run the code and see how it goes (Note: To save time we are training for only 5 epochs; we should train much longer to get much better results):

In [None]:
(train_x, train_y), (test_x, test_y) = load_mnist()
model = buildmodel(MODEL_NAME)
train(model, train_x, train_y, 5, test_x, test_y, MODEL_NAME)


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz




Starting training.
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Done. Now evaluating.
Test accuracy: 0.99, loss: 0.04


----

#### Question 11.

Compare the relative advantages and disadvantages of CNN vs. the Dense MLP that you built in sections 3.2 and 3.3. What makes CNNs better (or worse)?

***Type your answers here***
+ Advantages of CNNs (what makes them better):
1. Spatial feature extraction: CNNs are designed to capture spatial relationships and extract meaningful features from images and other grid-like data. Convolutional layers with shared weights let the model identify local patterns and build up more complex representations layer by layer.
2. Translation invariance: CNNs can recognize a pattern largely irrespective of where it appears in the input, because the same filters are applied at every position and pooling tolerates small shifts. This robustness does not extend automatically to rotations or scale changes, which usually have to be handled with data augmentation.
3. Parameter sharing: The same set of weights is applied to every part of the input. This greatly reduces the number of parameters and makes the model more efficient at learning from limited data.
4. Hierarchical representation learning: By stacking multiple convolutional and pooling layers, CNNs learn hierarchical representations of the input. Lower layers learn low-level features such as edges and textures, while higher layers learn more abstract, high-level representations.
5. Reduced overfitting: Pooling and down-sampling reduce the spatial dimensions of the feature maps, which, together with the much smaller parameter count, acts as a form of regularization. Dropout and other regularization techniques further reduce overfitting.

+ Disadvantages of CNNs (what makes them worse):

1. Limited interpretability: Because of their complex architecture and hierarchical feature extraction, it can be hard to interpret the learned features and to understand how a CNN reaches its decision. A dense MLP is structurally simpler, which can make it somewhat easier to reason about.
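
The parameter-sharing point can be made concrete by counting parameters. The sketch below compares the first convolutional layer of the MNIST CNN with a hypothetical dense layer of 1024 neurons connected directly to the 784 input pixels. That dense layer is an assumption for comparison and does not appear in the lab code.

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters):
    """Weights + biases of a 2D convolution: one small kernel per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_layer_param_count(n_in, n_out):
    """Weights + biases of a fully connected layer."""
    return n_in * n_out + n_out

conv_params = conv2d_param_count(5, 5, 1, 32)          # Conv2D(32, 5x5) on 28x28x1
dense_params = dense_layer_param_count(28 * 28, 1024)  # Dense(1024) on 784 pixels

print(conv_params)   # 832
print(dense_params)  # 803840
```

The convolutional layer's count does not depend on the image size at all, because the same 5x5 kernels are reused at every position.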

## 5. Making a CNN for the CIFAR-10 Dataset

Now comes the fun part: Using the example above for creating a CNN for the MNIST dataset, now create a CNN in the box below for the CIFAR-10 dataset. At the end of each epoch save the model to a file called "cifar.hd5" (note: the .hd5 is added automatically for you).

---

#### Question 12.

Summarize your design in the table below (the actual coding cell comes after this):

    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3), padding='same'))
    model.add(BatchNormalization())  # Adding Batch Normalization
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))


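| Hyperparameter       | What I used | Why?                  |
|:---------------------|:------------|:----------------------|
| Optimizer            | SGD (lr=0.01, momentum=0.7) | Worked acceptably compared with the other optimizers we tried |
| Input shape          | 32x32x3     | CIFAR-10 images are 32x32 pixels with 3 colour channels (RGB) |
| First conv block     | 2 x Conv2D(32, 3x3, relu, padding='same'), each followed by BatchNormalization | Small stacked kernels extract local features; batch normalization stabilizes training |
| First pooling        | MaxPooling2D(2x2), then Dropout(0.25) | Halves the feature map to 16x16; dropout reduces overfitting |
| Second conv block    | 2 x Conv2D(64, 3x3, relu, padding='same'), each followed by BatchNormalization | More filters to learn more complex features |
| Second pooling       | MaxPooling2D(2x2), then Dropout(0.25) | Halves the feature map to 8x8; dropout reduces overfitting |
| Flatten              | Flatten     | Converts the 8x8x64 feature map into a 4096-element vector for the dense layers |
| Dense layer          | Dense(512, activation='relu'), then BatchNormalization and Dropout(0.5) | Combines the extracted features; chosen by trial and error |
| Output layer         | Dense(10, activation='softmax') | softmax over the 10 classes |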



In [None]:
"""
Write your code for your CNN for the CIFAR-10 dataset here.

Note: train_x, train_y, test_x, test_y were changed when we called
load_mnist in the previous section. You will now need to call load_cifar10
again.

"""

from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10

def load_cifar10_1():
    (train_x, train_y),(test_x, test_y) = cifar10.load_data()
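    # cifar10.load_data() already returns arrays of shape (N, 32, 32, 3);
    # the reshape only makes the expected shape explicit.
    train_x = train_x.reshape(train_x.shape[0], 32, 32, 3)
    test_x = test_x.reshape(test_x.shape[0], 32, 32, 3)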

    train_x=train_x.astype('float32')
    test_x = test_x.astype('float32')

    train_x /= 255.0
    test_x /= 255.0

    train_y = to_categorical(train_y, 10)
    test_y = to_categorical(test_y, 10)

    return (train_x, train_y), (test_x, test_y)



# load_model loads a previously saved model from a file.
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import (Dense, Dropout, Flatten, Conv2D,
                                     MaxPooling2D, BatchNormalization)
import os

# File name requested in the instructions for this section.
MODEL_NAME = 'cifar.hd5'

def buildmodel(model_name):
    if os.path.exists(model_name):
        model = load_model(model_name)
    else:
        model = Sequential()
        model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3), padding='same'))
        model.add(BatchNormalization())  # Adding Batch Normalization
        model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'))
        model.add(BatchNormalization())
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
        model.add(BatchNormalization())
        model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))
        model.add(BatchNormalization())
        model.add(MaxPooling2D(pool_size=(2, 2)))
        model.add(Dropout(0.25))

        model.add(Flatten())
        model.add(Dense(512, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))
        model.add(Dense(10, activation='softmax'))

    return model





from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

def train(model, train_x, train_y, epochs, test_x, test_y, model_name):

    # learning_rate replaces the deprecated lr argument.
    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.7),
                  loss='categorical_crossentropy', metrics=['accuracy'])

    savemodel = ModelCheckpoint(model_name)
    stopmodel = EarlyStopping(min_delta=0.001, patience=10)

    print("Starting training.")

    model.fit(x=train_x, y=train_y, batch_size=32,
    validation_data=(test_x, test_y), shuffle=True,
    epochs=epochs,
    callbacks=[savemodel, stopmodel])

    print("Done. Now evaluating.")
    loss, acc = model.evaluate(x=test_x, y=test_y)
    print("Test accuracy: %3.2f, loss: %3.2f"%(acc, loss))


epoch = 50
(train_x, train_y), (test_x, test_y) = load_cifar10_1()
model = buildmodel(MODEL_NAME)
train(model, train_x, train_y, epoch, test_x, test_y, MODEL_NAME)




Starting training.
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Done. Now evaluating.
Test accuracy: 0.83, loss: 0.55
