## 8.15 Homework 8: Neural Networks and Deep Learning

Note: it's very likely that in this homework your answers won't match the options exactly. That's okay and expected. Select the option that's closest to your solution.

### Dataset

In this homework, we'll build a model for classifying various hair types. For this, we will use the Hair Type dataset that was obtained from Kaggle and slightly rebuilt: 

    https://www.kaggle.com/datasets/kavyasreeb/hair-type-dataset

You can download the target dataset for this homework from here: https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip

In the lectures we saw how to use a pre-trained neural network. In the homework, we'll train a much smaller model from scratch.

### Data Preparation

The dataset contains around 1000 hair images, split into separate folders for the training and test sets.

### Reproducibility

Reproducibility in deep learning is a multifaceted challenge that requires attention to both software and hardware details. In some cases, we can't guarantee exactly the same results across runs of the same experiment. Therefore, in this homework we suggest that you:

- install TensorFlow version 2.17.1
- seed the random number generators:

In [5]:
import numpy as np
import tensorflow as tf

SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)

tf.__version__

'2.18.0'

### Model

For this homework we will use a Convolutional Neural Network (CNN). Like in the lectures, we'll use Keras.

You need to develop a model with the following structure:
- The shape for input should be (200, 200, 3)
- Next, create a convolutional layer (Conv2D):
    - Use 32 filters
    - Kernel size should be (3, 3) (that's the size of the filter)
    - Use 'relu' as activation
- Reduce the size of the feature map with max pooling (MaxPooling2D)
    - Set the pooling size to (2, 2)
- Turn the multi-dimensional result into vectors using a Flatten layer
- Next, add a Dense layer with 64 neurons and 'relu' activation
- Finally, create the Dense layer with 1 neuron - this will be the output
    - The output layer should have an activation - use the appropriate activation for the binary classification case

As the optimizer, use SGD with the following parameters:
- SGD(lr=0.002, momentum=0.8) (recent Keras versions name the first argument learning_rate instead of lr)
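
To make the two optimizer parameters concrete, here is a minimal NumPy sketch of the non-Nesterov momentum update as the Keras SGD documentation describes it (`velocity = momentum * velocity - learning_rate * gradient`, then `w = w + velocity`). The toy quadratic loss and the names `sgd_momentum_step`, `w`, and `velocity` are assumptions made for illustration; they are not part of the homework.

```python
import numpy as np

def sgd_momentum_step(w, velocity, grad, learning_rate=0.002, momentum=0.8):
    """One SGD-with-momentum update (non-Nesterov variant)."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

# Toy loss L(w) = 0.5 * ||w||^2, whose gradient is simply w (illustrative only).
w = np.array([1.0, -2.0])
velocity = np.zeros_like(w)
initial_loss = 0.5 * np.sum(w ** 2)

for _ in range(100):
    grad = w  # gradient of the toy loss at the current weights
    w, velocity = sgd_momentum_step(w, velocity, grad)

final_loss = 0.5 * np.sum(w ** 2)
print(f"loss before: {initial_loss:.4f}, loss after 100 steps: {final_loss:.4f}")
```

The velocity accumulates past gradients, so with `momentum=0.8` the weights move faster along a consistent gradient direction than the small `learning_rate=0.002` would suggest on its own.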

### Question 1

Since we have a binary classification problem, what is the best loss function for us?
- mean squared error
- binary crossentropy
- categorical crossentropy
- cosine similarity

Note: since we specify an activation for the output layer, we don't need to set from_logits=True
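
The note about `from_logits` can be illustrated with a small NumPy sketch. It computes binary cross-entropy once from sigmoid probabilities and once directly from raw scores (logits), and the two agree. The helper names and the sample values are hypothetical, and this is a simplified stand-in for the idea, not the Keras implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_from_probs(y_true, p, eps=1e-7):
    """Binary cross-entropy when the model already outputs probabilities."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

def bce_from_logits(y_true, z):
    """Binary cross-entropy computed from raw scores (no sigmoid applied yet).

    Uses the stable form max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    return float(np.mean(np.maximum(z, 0) - z * y_true + np.log1p(np.exp(-np.abs(z)))))

# Made-up scores and labels for illustration.
logits = np.array([2.0, -1.0, 0.5, -3.0])
labels = np.array([1.0, 0.0, 0.0, 1.0])

loss_probs = bce_from_probs(labels, sigmoid(logits))
loss_logits = bce_from_logits(labels, logits)
print(loss_probs, loss_logits)
```

Because the output layer in this homework already applies an activation, the loss receives probabilities, which corresponds to the first function.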

In [8]:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD

In [9]:
# Define the CNN model
def create_cnn_model():
    model = Sequential([
        # Explicit Input layer
        Input(shape=(200, 200, 3)),
        # Convolutional layer
        Conv2D(32, kernel_size=(3, 3), activation='relu'),
        # MaxPooling layer
        MaxPooling2D(pool_size=(2, 2)),
        # Flatten layer
        Flatten(),
        # Fully connected Dense layer
        Dense(64, activation='relu'),
        # Output Dense layer for binary classification
        Dense(1, activation='sigmoid')
    ])
    return model

# Create the model
model = create_cnn_model()

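# Compile the model with the optimizer specified above (learning_rate replaces the older lr argument)
model.compile(optimizer=SGD(learning_rate=0.002, momentum=0.8), loss='binary_crossentropy', metrics=['accuracy'])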

# Display model summary
model.summary()


#### Question 1 - Answer: binary crossentropy

### Question 2

What's the total number of parameters of the model? You can use the summary method for that.
- 896
- 11214912
- 15896912
- 20072512

#### Question 2 - Answer: 20072512
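
The parameter count can also be checked by hand. The sketch below assumes the Keras defaults used in the model code above (stride 1 and no padding for the convolution, a bias term in every layer); the helper names are made up for this example.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units

height, width, channels = 200, 200, 3

conv = conv2d_params(3, 3, channels, 32)        # 896
conv_h, conv_w = height - 3 + 1, width - 3 + 1  # 198 x 198 feature map
pool_h, pool_w = conv_h // 2, conv_w // 2       # 99 x 99 after 2x2 max pooling
flat = pool_h * pool_w * 32                     # 313632 values after Flatten
dense_hidden = dense_params(flat, 64)           # 20072512
dense_out = dense_params(64, 1)                 # 65

total = conv + dense_hidden + dense_out         # 20073473
print(conv, dense_hidden, dense_out, total)
```

Under these assumptions the total is 20,073,473, which is not listed exactly; 20072512 (the hidden Dense layer alone) is the closest option.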

### Generators and Training
For the next two questions, use the following data generator for both train and test sets:

    ImageDataGenerator(rescale=1./255)

- We don't need to do any additional pre-processing for the images.
- When reading the data from train/test directories, check the class_mode parameter. Which value should it be for a binary classification problem?
- Use batch_size=20
- Use shuffle=True for both training and test sets.
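
As a quick sanity check on `batch_size=20`, the number of batches per epoch can be worked out directly. This sketch assumes the image counts reported by the loader output in this notebook (800 training and 201 test images); `steps_per_epoch` is a hypothetical helper, not a Keras function.

```python
import math

def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)

train_steps = steps_per_epoch(800, 20)         # 40 full batches
test_steps = steps_per_epoch(201, 20)          # 10 full batches plus 1 partial
last_test_batch = 201 - (test_steps - 1) * 20  # size of the final test batch

print(train_steps, test_steps, last_test_batch)
```

The 40 training steps match the `40/40` progress counter in the training output.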

For training use .fit() with the following params:

    model.fit(
        train_generator,
        epochs=10,
        validation_data=test_generator
    )

In [14]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [15]:
# Data Generators
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

# Load train and test data
train_generator = train_datagen.flow_from_directory(
    './data/train/',
    target_size=(200, 200),
    batch_size=20,
    class_mode='binary',
    shuffle=True
)

test_generator = test_datagen.flow_from_directory(
    './data/test/',
    target_size=(200, 200),
    batch_size=20,
    class_mode='binary',
    shuffle=True
)

# Training the model
model.fit(
    train_generator,
    epochs=10,
    validation_data=test_generator
)

Found 800 images belonging to 2 classes.
Found 201 images belonging to 2 classes.



Epoch 1/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 13s 300ms/step - accuracy: 0.5259 - loss: 0.8649 - val_accuracy: 0.6119 - val_loss: 0.6928
Epoch 2/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 295ms/step - accuracy: 0.5610 - loss: 0.6926 - val_accuracy: 0.5821 - val_loss: 0.6919
Epoch 3/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 305ms/step - accuracy: 0.6296 - loss: 0.6904 - val_accuracy: 0.6418 - val_loss: 0.6880
Epoch 4/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 298ms/step - accuracy: 0.6290 - loss: 0.6848 - val_accuracy: 0.5323 - val_loss: 0.6809
Epoch 5/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 289ms/step - accuracy: 0.6118 - loss: 0.6680 - val_accuracy: 0.5821 - val_loss: 0.6607
Epoch 6/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 291ms/step - accuracy: 0.6838 - loss: 0.6263 - val_accuracy: 0.6418 - val_loss: 0.6278
Epoch 7/10
[output truncated]

<keras.src.callbacks.history.History at 0x242ad113c90>

### Question 3

What is the median of training accuracy for all the epochs for this model?
- 0.10
- 0.32
- 0.50
- 0.72

### Question 4

What is the standard deviation of training loss for all the epochs for this model?
- 0.028
- 0.068
- 0.128
- 0.168

In [18]:
import numpy as np

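# Continue training the model. Note that this call resumes from the weights
# left by the previous fit() cell, so the history below covers 10 additional
# epochs rather than the first 10.
history = model.fit(
    train_generator,
    epochs=10,
    validation_data=test_generator
)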

# Access training accuracy and loss for all epochs
training_accuracies = history.history['accuracy']
training_losses = history.history['loss']

# Calculate the median of the training accuracies
median_accuracy = np.median(training_accuracies)

# Calculate the standard deviation of the training losses
std_dev_loss = np.std(training_losses)

# Print results
print(f"Median Training Accuracy: {median_accuracy}")
print(f"Standard Deviation of Training Loss: {std_dev_loss}")

Epoch 1/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 302ms/step - accuracy: 0.7202 - loss: 0.5488 - val_accuracy: 0.6667 - val_loss: 0.5998
Epoch 2/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 292ms/step - accuracy: 0.7156 - loss: 0.5452 - val_accuracy: 0.6816 - val_loss: 0.6026
Epoch 3/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 293ms/step - accuracy: 0.6962 - loss: 0.5762 - val_accuracy: 0.6866 - val_loss: 0.5983
Epoch 4/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 295ms/step - accuracy: 0.7512 - loss: 0.5094 - val_accuracy: 0.6667 - val_loss: 0.6024
Epoch 5/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 303ms/step - accuracy: 0.7631 - loss: 0.4932 - val_accuracy: 0.6866 - val_loss: 0.5856
Epoch 6/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 12s 294ms/step - accuracy: 0.7630 - loss: 0.4798 - val_accuracy: 0.6866 - val_loss: 0.5740
Epoch 7/10
[output truncated]

#### Question 3 - Answer: 0.72

#### Question 4 - Answer: 0.028
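
One detail worth knowing when comparing against the answer options: `np.std` computes the population standard deviation (`ddof=0`) by default, which is slightly smaller than the sample standard deviation (`ddof=1`). The sketch below uses a made-up dictionary standing in for `history.history`; the numbers are not taken from the training run.

```python
import numpy as np

# Hypothetical stand-in for history.history; values are invented for illustration.
history_dict = {
    "accuracy": [0.60, 0.70, 0.65, 0.75, 0.80],
    "loss": [0.70, 0.60, 0.65, 0.55, 0.50],
}

median_accuracy = float(np.median(history_dict["accuracy"]))  # middle value after sorting
population_std = float(np.std(history_dict["loss"]))          # ddof=0, the NumPy default
sample_std = float(np.std(history_dict["loss"], ddof=1))      # divides by n - 1 instead of n

print(median_accuracy, population_std, sample_std)
```

With only 10 epochs the two standard deviations differ by about 5%, which is usually not enough to change which option is closest.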

### Data Augmentation

For the next two questions, we'll generate more data using data augmentations.

Add the following augmentations to your training data generator:
- rotation_range=50,
- width_shift_range=0.1,
- height_shift_range=0.1,
- zoom_range=0.1,
- horizontal_flip=True,
- fill_mode='nearest'
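
To get a feel for what two of these options do to the pixel array, here is a simplified NumPy sketch of a horizontal flip and a whole-pixel horizontal shift with nearest-value filling. It is an illustration under stated assumptions, not how `ImageDataGenerator` is implemented (which applies random, possibly fractional transforms); the function names and the tiny test image are hypothetical.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an (H, W, C) image left to right."""
    return image[:, ::-1, :]

def shift_right_nearest(image, pixels):
    """Shift an (H, W, C) image right, filling the gap by repeating the edge column."""
    if pixels == 0:
        return image.copy()
    width = image.shape[1]
    # Pad on the left with copies of the first column, then crop back to the original width.
    padded = np.pad(image, ((0, 0), (pixels, 0), (0, 0)), mode="edge")
    return padded[:, :width, :]

# Tiny 2x4 single-channel "image" with distinct values so the effect is visible.
image = np.arange(2 * 4 * 1).reshape(2, 4, 1)
flipped = horizontal_flip(image)
shifted = shift_right_nearest(image, 1)

print(image[..., 0])
print(flipped[..., 0])
print(shifted[..., 0])
```

Each transformed image keeps its original label, so the model sees more varied examples of the same classes.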

In [22]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmented training data generator
train_datagen = ImageDataGenerator(
    rescale=1./255,          # Normalize pixel values to [0, 1]
    rotation_range=50,       # Randomly rotate images by up to 50 degrees
    width_shift_range=0.1,   # Shift images horizontally by up to 10% of width
    height_shift_range=0.1,  # Shift images vertically by up to 10% of height
    zoom_range=0.1,          # Randomly zoom in/out by up to 10%
    horizontal_flip=True,    # Randomly flip images horizontally
    fill_mode='nearest'      # Fill empty pixels with the nearest value
)

# Test data generator (no augmentation, only rescaling)
test_datagen = ImageDataGenerator(rescale=1./255)

# Load train data with augmentations
train_generator = train_datagen.flow_from_directory(
    './data/train/',
    target_size=(200, 200),
    batch_size=20,
    class_mode='binary',
    shuffle=True
)

# Load test data without augmentations
test_generator = test_datagen.flow_from_directory(
    './data/test/',
    target_size=(200, 200),
    batch_size=20,
    class_mode='binary',
    shuffle=True
)


Found 800 images belonging to 2 classes.
Found 201 images belonging to 2 classes.


### Question 5

Let's train our model for 10 more epochs using the same code as previously.

    Note: make sure you don't re-create the model - we want to continue training the model we already started training.

What is the mean of test loss for all the epochs for the model trained with augmentations?
- 0.26
- 0.56
- 0.86
- 1.16

### Question 6

What's the average of test accuracy for the last 5 epochs (from 6 to 10) for the model trained with augmentations?
- 0.31
- 0.51
- 0.71
- 0.91

In [25]:
# Save the model
model.save('model_with_augmentations.keras')

# Later, load the model
from tensorflow.keras.models import load_model
model = load_model('model_with_augmentations.keras')

In [42]:
# Continue training the same model
history_augmented = model.fit(
    train_generator,          # Augmented training data generator
    epochs=10,                # Additional epochs
    validation_data=test_generator  # Test data
)

# Access test loss for all epochs
test_losses = history_augmented.history['val_loss']

# Calculate the mean of test losses
mean_test_loss = np.mean(test_losses)

# Access test accuracy for all epochs
test_accuracies = history_augmented.history['val_accuracy']

# Calculate the average test accuracy for the last 5 epochs
average_test_accuracy_last_5 = np.mean(test_accuracies[-5:])

# Print the results
print(f"Mean Test Loss: {mean_test_loss}")
print(f"Average Test Accuracy (Last 5 Epochs): {average_test_accuracy_last_5}")

Epoch 1/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 17s 414ms/step - accuracy: 0.6729 - loss: 0.6015 - val_accuracy: 0.7413 - val_loss: 0.5103
Epoch 2/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 17s 409ms/step - accuracy: 0.6987 - loss: 0.5611 - val_accuracy: 0.7313 - val_loss: 0.5009
Epoch 3/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 16s 405ms/step - accuracy: 0.7276 - loss: 0.5389 - val_accuracy: 0.7463 - val_loss: 0.5281
Epoch 4/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 17s 415ms/step - accuracy: 0.7021 - loss: 0.5560 - val_accuracy: 0.7463 - val_loss: 0.5135
Epoch 5/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 17s 414ms/step - accuracy: 0.7582 - loss: 0.5268 - val_accuracy: 0.7363 - val_loss: 0.5074
Epoch 6/10
40/40 ━━━━━━━━━━━━━━━━━━━━ 16s 410ms/step - accuracy: 0.7183 - loss: 0.5518 - val_accuracy: 0.7612 - val_loss: 0.4980
Epoch 7/10
[output truncated]

#### Question 5 - Answer: 0.56

#### Question 6 - Answer: 0.71