
## Problem 3

The objective of this problem is to build a convolutional neural network (CNN) to classify images as *hot dog* or *not hot dog*. This is the same task popularized in the HBO TV show *Silicon Valley*  
([video link](https://www.youtube.com/watch?v=pqTntG1RXSY)).  

We will use the dataset created by a Kaggle user
([Hot Dog / Not Hot Dog dataset](https://www.kaggle.com/dansbecker/hot-dog-not-hot-dog)),  
which contains 498 training images and 500 test images.

There are two parts to this assignment:
1. A simple CNN is provided below. Due to the small dataset, it achieves a poor test set accuracy (approximately 55%). Your task is to design and train a CNN that achieves **at least 75% test set accuracy**.  
2. Describe three modifications you made beyond what is provided in the notebook, and explain the effect of each change on test set accuracy.

### Submission

Submit the completed and executed notebook on Quercus, showing your best test set accuracy. A friendly in-class competition will be held to see who can achieve the highest accuracy, with bonus points and bragging rights for the winner.

## Student Information
**Name:** Keqing Wang
**Student ID:** 1006927337

## Code

In [None]:
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

# from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Rescaling
from keras.layers import Input, Activation, Dropout, Flatten, Dense
from keras.utils import image_dataset_from_directory

## Loading Hotdog-Not-Hotdog Dataset

In [None]:
# Download files
!wget --quiet https://github.com/tdmdal/datasets-teaching/raw/main/hotdog/hotdog.tar.gz
!tar -xvzf hotdog.tar.gz

hotdog/
hotdog/test/
hotdog/test/hot_dog/
hotdog/test/hot_dog/324507.jpg
hotdog/test/hot_dog/800992.jpg
hotdog/test/hot_dog/716049.jpg
hotdog/test/hot_dog/588881.jpg
hotdog/test/hot_dog/570799.jpg
hotdog/test/hot_dog/838604.jpg
hotdog/test/hot_dog/315220.jpg
hotdog/test/hot_dog/612440.jpg
hotdog/test/hot_dog/250715.jpg
hotdog/test/hot_dog/292683.jpg
hotdog/test/hot_dog/291354.jpg
hotdog/test/hot_dog/380963.jpg
hotdog/test/hot_dog/533521.jpg
hotdog/test/hot_dog/558890.jpg
hotdog/test/hot_dog/408504.jpg
hotdog/test/hot_dog/201986.jpg
hotdog/test/hot_dog/382188.jpg
hotdog/test/hot_dog/752871.jpg
hotdog/test/hot_dog/225367.jpg
hotdog/test/hot_dog/147874.jpg
hotdog/test/hot_dog/829968.jpg
hotdog/test/hot_dog/207335.jpg
hotdog/test/hot_dog/388733.jpg
hotdog/test/hot_dog/398941.jpg
hotdog/test/hot_dog/593867.jpg
hotdog/test/hot_dog/453463.jpg
hotdog/test/hot_dog/677481.jpg
hotdog/test/hot_dog/807481.jpg
hotdog/test/hot_dog/650514.jpg
hotdog/test/hot_dog/628106.jpg
hotdog/test/hot_dog/701201.j

In [None]:
# Re-sized dimensions of our images.
img_height, img_width = 150, 150

train_data_dir = 'hotdog/train'
test_data_dir = 'hotdog/test'

input_shape = (img_height, img_width, 3)

## Model

In [None]:
def mymodel():
    ''' Improve this model!
    '''
    model = Sequential()
    model.add(Input(shape=input_shape))
    model.add(Rescaling(1./255))  # rescale the input image using a Rescaling layer
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    model.compile(loss='binary_crossentropy', metrics=['accuracy'],
                  optimizer='rmsprop')

    return model

# Test function
mymodel().summary()
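
To make the output of `summary()` easier to read, the sketch below traces the spatial size and parameter count of the baseline network by hand. It assumes the Keras defaults that the model code relies on implicitly: `Conv2D` with stride 1 and `'valid'` padding, and `MaxPooling2D` with a stride equal to its pool size. The helper names (`conv_out`, `pool_out`, `trace_shapes`) are hypothetical and are not part of the notebook.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def trace_shapes(size=150, channels=3, filters=64, blocks=3, dense_units=64):
    """Return per-block spatial sizes, the flattened length, and the parameter count."""
    sizes, params = [], 0
    for _ in range(blocks):
        # 3x3 kernel weights for every input channel, plus one bias per filter
        params += (3 * 3 * channels + 1) * filters
        size = pool_out(conv_out(size))
        sizes.append(size)
        channels = filters
    flat = size * size * channels
    params += (flat + 1) * dense_units   # Dense(64): weights + biases
    params += dense_units + 1            # Dense(1): weights + bias
    return sizes, flat, params


sizes, flat, params = trace_shapes()
print('Spatial size after each block:', sizes)
print('Flattened features:', flat)
print('Total parameters:', params)
```

Under these assumptions, almost all of the parameters sit in the first dense layer, which is one reason a network of this shape can overfit a training set of only 498 images.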

### Loading Data on the Fly

We load the data directly from disk using the Keras utility function `image_dataset_from_directory`. This function also resizes the images to the specified dimensions (`img_height` × `img_width`, here 150 × 150).  

During training, images are read from disk in batches, loaded into memory, and resized on the fly.

In [None]:
# You may optionally change these parameters
batch_size = 50
epochs = 10

# Load images from disk on the fly (DO NOT MODIFY)
# Training dataset
train_dataset = image_dataset_from_directory(
    train_data_dir,
    labels='inferred',
    label_mode='binary',
    image_size=(img_height, img_width),
    batch_size=batch_size,
    shuffle=True
)

# Test dataset
test_dataset = image_dataset_from_directory(
    test_data_dir,
    labels='inferred',
    label_mode='binary',
    image_size=(img_height, img_width),
    batch_size=batch_size,
    shuffle=False
)

Found 498 files belonging to 2 classes.
Found 500 files belonging to 2 classes.
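
As a rough illustration of what batching means for training, the sketch below works out how many batches one pass over each dataset produces with the batch size of 50 set above. It assumes the final partial batch is kept rather than dropped, which is the default behaviour as far as I know. The helper `batch_sizes` is hypothetical and only does the arithmetic; it does not touch the image files.

```python
def batch_sizes(n_images, batch_size):
    """List the size of every batch in one pass, keeping the final partial batch."""
    full, rest = divmod(n_images, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


train_batches = batch_sizes(498, 50)   # 498 training images
test_batches = batch_sizes(500, 50)    # 500 test images

epochs = 10
total_steps = epochs * len(train_batches)  # gradient updates over the whole run

print('Training batches per epoch:', len(train_batches), '(last batch:', train_batches[-1], 'images)')
print('Test batches:', len(test_batches))
print('Gradient updates over', epochs, 'epochs:', total_steps)
```

With so few gradient updates per run, changing `batch_size` or `epochs` alters how much training the model actually receives, so it is worth treating both as tunable settings.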


In [None]:
def evaluate_model(runs=5):
    ''' DO NOT MODIFY THIS FUNCTION '''
    scores = []
    for i in range(runs):
        print('Executing run %d' % (i+1))
        model = mymodel()
        model.fit(train_dataset,
                  epochs=epochs,
                  verbose=0,
                  callbacks=[])
        print(' * Evaluating model on test set')
        scores.append(model.evaluate(test_dataset, verbose=0))
        print(' * Test set Loss: %.4f, Accuracy: %.4f' % (scores[-1][0], scores[-1][1]))

    accuracies = [score[1] for score in scores]
    return np.mean(accuracies), np.std(accuracies)

mean_accuracy, std_accuracy = evaluate_model(runs=5)

Executing run 1
 * Evaluating model on test set
 * Test set Loss: 0.6848, Accuracy: 0.5660


In [None]:
# You will be evaluated on your mean test set accuracy over 5 runs
print('Mean test set accuracy over 5 runs: %.4f +/- %.4f' % (mean_accuracy, std_accuracy))

Mean test set accuracy over 5 runs: 0.5660 +/- 0.0000
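
For reference, the reported figure is the mean of the per-run accuracies plus or minus their standard deviation. `np.std` computes the population standard deviation by default (`ddof=0`), and the sketch below reproduces that calculation with the standard library. The accuracy values are hypothetical and chosen only for illustration; they are not results from this notebook.

```python
import statistics


def summarize(accuracies):
    """Mean and population standard deviation (matches the np.std default, ddof=0)."""
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


# Hypothetical accuracies, for illustration only
example_runs = [0.55, 0.57, 0.56, 0.58, 0.54]
mean_acc, std_acc = summarize(example_runs)
print('Mean accuracy over %d runs: %.4f +/- %.4f' % (len(example_runs), mean_acc, std_acc))

# A single score always has a spread of exactly zero
single_mean, single_std = summarize([0.5660])
print('Single run: %.4f +/- %.4f' % (single_mean, single_std))
```

A spread of exactly zero therefore says nothing about run-to-run stability unless several runs contributed to the average.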


## Question

Describe three modifications you applied to your network that are not already included in this notebook. For each change (1–3 paragraphs), explain what you did and the effect it had on test set performance.  

Note: Not all of your modifications need to be part of your final model. For example, some changes may have reduced test performance.  

1. **Change One:** Data augmentation and Keras regularization
2. **Change Two:**  
3. **Change Three:** Batch Normalization Method