### Dataset

In this homework, we'll build a model for classifying various hair types. 
For this, we will use the Hair Type dataset that was obtained from 
[Kaggle](https://www.kaggle.com/datasets/kavyasreeb/hair-type-dataset) 
and slightly rebuilt. 

You can download the target dataset for this homework from 
[here](https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip):

```bash
wget https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip
unzip data.zip
```

In [21]:
# !wget https://github.com/SVizor42/ML_Zoomcamp/releases/download/straight-curly-data/data.zip
# !unzip data.zip

### Data Preparation

The dataset contains around 1000 hair images, split into separate folders 
for the training and test sets. 
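
To confirm the folder layout after unzipping, it can help to count the files in each class folder. The sketch below assumes the archive unpacks to `data/train` and `data/test`, each with one subfolder per class; the helper name `count_images` is hypothetical and not part of the homework.

```python
from pathlib import Path


def count_images(root):
    """Return {class_folder_name: number_of_files} for each subfolder of root."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        # Nothing to count if the folder has not been downloaded yet
        return counts
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


# Assumed layout: data/<split>/<class>/<image files>
for split in ("train", "test"):
    print(split, count_images(Path("data") / split))
```

The per-split totals should add up to the file counts that the data loader reports when it reads the same folders.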

### Reproducibility

Reproducibility in deep learning is a multifaceted challenge that requires attention 
to both software and hardware details. In some cases, we can't guarantee identical 
results across repeated runs of the same experiment. Therefore, for this homework we suggest that you:
* install TensorFlow version 2.17.1
* set the random seeds as follows:

```python
import numpy as np
import tensorflow as tf

SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)
```

In [22]:
import numpy as np
import tensorflow as tf

SEED = 42
np.random.seed(SEED)
tf.random.set_seed(SEED)

In [23]:
from tensorflow import keras


### Model

For this homework we will use a Convolutional Neural Network (CNN). As in the lectures, we'll use Keras.

You need to develop a model with the following structure:

* The shape for input should be `(200, 200, 3)`
* Next, create a convolutional layer ([`Conv2D`](https://keras.io/api/layers/convolution_layers/convolution2d/)):
    * Use 32 filters
    * Kernel size should be `(3, 3)` (that's the size of the filter)
    * Use `'relu'` as activation 
* Reduce the size of the feature map with max pooling ([`MaxPooling2D`](https://keras.io/api/layers/pooling_layers/max_pooling2d/))
    * Set the pooling size to `(2, 2)`
* Turn the multi-dimensional result into vectors using a [`Flatten`](https://keras.io/api/layers/reshaping_layers/flatten/) layer
* Next, add a `Dense` layer with 64 neurons and `'relu'` activation
* Finally, create the `Dense` layer with 1 neuron - this will be the output
    * The output layer should have an activation - use the appropriate activation for the binary classification case

As the optimizer, use [`SGD`](https://keras.io/api/optimizers/sgd/) with the following parameters:

* `SGD(learning_rate=0.002, momentum=0.8)`

For clarification about kernel size and max pooling, check [Office Hours](https://www.youtube.com/watch?v=1WRgdBTUaAc).
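
To see how the kernel size and pooling size change the feature map, the shapes can be traced by hand. This is a minimal sketch that assumes the Keras defaults of no padding (`'valid'`) and stride 1 for `Conv2D`, and a pooling stride equal to the pool size; the helper names are hypothetical.

```python
def conv2d_output_size(size, kernel, stride=1):
    """Spatial size after a convolution without padding."""
    return (size - kernel) // stride + 1


def max_pool_output_size(size, pool):
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool


side = 200
after_conv = conv2d_output_size(side, kernel=3)
after_pool = max_pool_output_size(after_conv, pool=2)
flattened = after_pool * after_pool * 32  # 32 filters from the Conv2D layer

print((after_conv, after_conv, 32))  # (198, 198, 32)
print((after_pool, after_pool, 32))  # (99, 99, 32)
print(flattened)                     # 313632
```

The flattened length is the number of inputs that feed the 64-neuron `Dense` layer.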

In [24]:
inputs = keras.Input(shape=(200, 200, 3))
x = keras.layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu')(inputs)
x = keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(64, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)

model = keras.Model(inputs, outputs)
optimizer = keras.optimizers.SGD(momentum=0.8, learning_rate=0.002)

### Question 1

Since we have a binary classification problem, what is the best loss function for us?

* `mean squared error`
* **`binary crossentropy`**
* `categorical crossentropy`
* `cosine similarity`

> **Note:** since we specify an activation for the output layer, we don't need to set `from_logits=True`
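
The note above can be checked numerically. The sketch below is plain Python, not the Keras implementation: it computes binary crossentropy once from a sigmoid probability and once directly from the raw score (the `from_logits=True` case) and shows that the two agree. The logit and label values are made up for illustration, and the numerical clipping that a library may apply is ignored.

```python
import math


def sigmoid(z):
    """Map a raw score (logit) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(y_true, p):
    """Binary crossentropy when the model already outputs a probability."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def bce_from_logit(y_true, z):
    """The same loss computed from the raw score, in a numerically stable form."""
    return max(z, 0.0) - z * y_true + math.log1p(math.exp(-abs(z)))


logit = 0.8  # illustrative raw score
label = 1    # illustrative true class

loss_with_sigmoid_output = bce_from_probability(label, sigmoid(logit))
loss_with_logit_output = bce_from_logit(label, logit)
print(loss_with_sigmoid_output, loss_with_logit_output)
```

Because the model's last layer already applies the sigmoid, the loss should receive probabilities, which is the default behaviour of `BinaryCrossentropy()`.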

In [25]:
loss = keras.losses.BinaryCrossentropy()

### Question 2

What's the total number of parameters of the model? You can use the `summary` method for that. 

* 896 
* 11214912
* 15896912
* **20072512**

In [26]:
model.compile(optimizer=optimizer,
              loss=loss,
              metrics=['accuracy'])

In [27]:
model.summary()
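
The parameter count can also be worked out by hand as a cross-check on `summary`. This sketch assumes every layer uses a bias term (the Keras default) and the same defaults for padding and strides as the model above; the helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights per filter plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units


pooled_side = (200 - 3 + 1) // 2           # 200 -> 198 after Conv2D -> 99 after pooling
flattened = pooled_side * pooled_side * 32

conv = conv2d_params(3, 3, 3, 32)
hidden = dense_params(flattened, 64)
output = dense_params(64, 1)
total = conv + hidden + output

print(conv)    # 896
print(hidden)  # 20072512
print(output)  # 65
print(total)   # 20073473
```

By this arithmetic the hidden `Dense` layer alone accounts for 20072512 parameters, so the listed option closest to the total is 20072512.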

### Generators and Training

For the next two questions, use the following data generator for both train and test sets:

```python
ImageDataGenerator(rescale=1./255)
```

* We don't need to do any additional pre-processing for the images.
* When reading the data from train/test directories, check the `class_mode` parameter. Which value should it be for a binary classification problem?
* Use `batch_size=20`
* Use `shuffle=True` for both training and test sets. 

For training use `.fit()` with the following params:

```python
model.fit(
    train_generator,
    epochs=10,
    validation_data=test_generator
)
```

In [28]:
# The data contains WebP images, which are not compatible with TensorFlow. This code converts those files to JPEG.

from pathlib import Path
import imghdr
from PIL import Image

data_dir = "./data/test/straight"
image_extensions = [".png", ".jpg"]  # add all of your image file extensions here

img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]
for filepath in Path(data_dir).rglob("*"):
    if filepath.suffix.lower() in image_extensions:
        img_type = imghdr.what(filepath)
        if img_type is None:
            print(f"{filepath} is not an image")
        elif img_type not in img_type_accepted_by_tf:
            print(f"{filepath} is a {img_type}, not accepted by TensorFlow, converting to jpg")
            im = Image.open(filepath).convert("RGB")
            im.save(filepath, "jpeg")
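
Note that `imghdr` was deprecated in Python 3.11 and removed in Python 3.13, so the cell above will not import on newer interpreters. As a minimal alternative sketch, a WebP file can be recognised from its first 12 bytes (`RIFF`, a 4-byte size, then `WEBP`) using only the standard library. The helper names `is_webp` and `find_webp_files` are hypothetical, and this only detects the files; the conversion with `PIL` stays as above.

```python
from pathlib import Path


def is_webp(path):
    """Return True if the file starts with a RIFF....WEBP container header."""
    with open(path, "rb") as f:
        header = f.read(12)
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WEBP"


def find_webp_files(root, suffixes=(".png", ".jpg", ".jpeg")):
    """List files under root that have an image-like suffix but contain WebP data."""
    return [
        p for p in sorted(Path(root).rglob("*"))
        if p.is_file() and p.suffix.lower() in suffixes and is_webp(p)
    ]


data_root = Path("./data")  # assumed location of the unzipped dataset
if data_root.is_dir():
    for path in find_webp_files(data_root):
        print(f"{path} is a WebP file with a {path.suffix} extension")
```

Scanning the whole `./data` folder rather than a single class folder also catches mislabelled files in the other splits and classes, if there are any.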

In [29]:
train_ds = tf.keras.utils.image_dataset_from_directory(
    './data/train',
    image_size=(200, 200),
    batch_size=20,
    label_mode='binary',
    shuffle=True
)

# Normalize pixel values to [0, 1] range
normalization_layer = tf.keras.layers.Rescaling(1./255)
train_ds_normalized = train_ds.map(lambda x, y: (normalization_layer(x), y))


val_ds = tf.keras.utils.image_dataset_from_directory(
    './data/test',
    image_size=(200, 200),
    batch_size=20,
    label_mode='binary',
    shuffle=True
)

# Normalize pixel values to [0, 1] range
val_ds_normalized = val_ds.map(lambda x, y: (normalization_layer(x), y))

Found 801 files belonging to 2 classes.
Found 201 files belonging to 2 classes.
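
These file counts determine how many batches make up one epoch. A minimal sketch, assuming the loader keeps the final partial batch (which is why the count is rounded up); `steps_per_epoch` is a hypothetical helper name:

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once, keeping the last partial batch."""
    return math.ceil(num_samples / batch_size)


train_steps = steps_per_epoch(801, 20)
val_steps = steps_per_epoch(201, 20)
last_train_batch = 801 - (train_steps - 1) * 20

print(train_steps)       # 41
print(val_steps)         # 11
print(last_train_batch)  # 1
```

The 41 training steps match the `41/41` progress counter in the training log.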


In [30]:
history = model.fit(
    train_ds_normalized,
    epochs=10,
    validation_data=val_ds_normalized
)

Epoch 1/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 4s 86ms/step - accuracy: 0.4697 - loss: 0.7195 - val_accuracy: 0.5373 - val_loss: 0.6799
Epoch 2/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 3s 79ms/step - accuracy: 0.6176 - loss: 0.6505 - val_accuracy: 0.6219 - val_loss: 0.6355
Epoch 3/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 3s 78ms/step - accuracy: 0.6875 - loss: 0.6127 - val_accuracy: 0.6119 - val_loss: 0.6405
Epoch 4/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 3s 75ms/step - accuracy: 0.6933 - loss: 0.5810 - val_accuracy: 0.6418 - val_loss: 0.6199
Epoch 5/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 3s 79ms/step - accuracy: 0.6983 - loss: 0.5670 - val_accuracy: 0.6418 - val_loss: 0.6170
Epoch 6/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 3s 76ms/step - accuracy: 0.7089 - loss: 0.5461 - val_accuracy: 0.6418 - val_loss: 0.6333
Epoch 7/10
41/41 ━━━━


### Question 3

What is the median of training accuracy for all the epochs for this model?

* 0.10
* 0.32
* 0.50
* **0.72**

In [31]:
np.median(history.history['accuracy'])

0.693508118391037


### Question 4

What is the standard deviation of training loss for all the epochs for this model?

* 0.028
* **0.068**
* 0.128
* 0.168

In [32]:
np.std(history.history['loss'])

0.0508859467802245
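
The computed values do not match any option exactly, so the marked answers appear to be the nearest options. A small sketch of that selection, using the outputs of the two cells above; `closest_option` is a hypothetical helper:

```python
def closest_option(value, options):
    """Return the option with the smallest absolute distance to value."""
    return min(options, key=lambda option: abs(option - value))


median_accuracy = 0.693508118391037  # output of the Question 3 cell
loss_std = 0.0508859467802245        # output of the Question 4 cell

print(closest_option(median_accuracy, [0.10, 0.32, 0.50, 0.72]))  # 0.72
print(closest_option(loss_std, [0.028, 0.068, 0.128, 0.168]))     # 0.068
```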



### Data Augmentation

For the next two questions, we'll generate more data using data augmentations. 

Add the following augmentations to your training data generator:

* `rotation_range=50,`
* `width_shift_range=0.1,`
* `height_shift_range=0.1,`
* `zoom_range=0.1,`
* `horizontal_flip=True,`
* `fill_mode='nearest'`

In [33]:
# Define augmentation layers
data_augmentation = tf.keras.Sequential([
    keras.layers.Rescaling(1.0 / 255),               # Normalize pixel values to [0, 1]
    keras.layers.RandomRotation(50 / 360, fill_mode='nearest'),          # Convert degrees to fraction (50° = 50/360)
    keras.layers.RandomTranslation(height_factor=0.1, width_factor=0.1, fill_mode='nearest'),
    keras.layers.RandomZoom(0.1, fill_mode='nearest'),
    keras.layers.RandomFlip("horizontal"),
])

# Apply augmentation during training
train_ds_augmented = train_ds.map(lambda x, y: (data_augmentation(x, training=True), y))
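
The `50 / 360` in the cell above comes from a difference in units: `rotation_range` in `ImageDataGenerator` is given in degrees, while the `RandomRotation` layer takes its factor as a fraction of a full turn. A minimal sketch of the conversion, with a hypothetical helper name:

```python
def rotation_range_to_factor(degrees):
    """Convert a rotation range in degrees to a fraction of a full 360-degree turn."""
    return degrees / 360.0


factor = rotation_range_to_factor(50)
print(round(factor, 4))        # 0.1389
print(round(factor * 360, 1))  # 50.0
```

The shift and zoom ranges of `0.1` are already fractions, so they are passed to `RandomTranslation` and `RandomZoom` unchanged.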


### Question 5 

Let's train our model for 10 more epochs using the same code as previously.
> **Note:** make sure you don't re-create the model - we want to continue training the model
we already started training.

What is the mean of test loss for all the epochs for the model trained with augmentations?

* 0.26
* **0.56**
* 0.86
* 1.16

In [34]:
history = model.fit(
    train_ds_augmented,
    epochs=10,
    validation_data=val_ds_normalized
)

Epoch 1/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 4s 97ms/step - accuracy: 0.6395 - loss: 0.6338 - val_accuracy: 0.6766 - val_loss: 0.5913
Epoch 2/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 4s 99ms/step - accuracy: 0.6369 - loss: 0.6334 - val_accuracy: 0.6567 - val_loss: 0.6076
Epoch 3/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 4s 92ms/step - accuracy: 0.6400 - loss: 0.6172 - val_accuracy: 0.6816 - val_loss: 0.5884
Epoch 4/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 4s 92ms/step - accuracy: 0.6608 - loss: 0.6239 - val_accuracy: 0.6517 - val_loss: 0.6738
Epoch 5/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 4s 93ms/step - accuracy: 0.6154 - loss: 0.6659 - val_accuracy: 0.6368 - val_loss: 0.5958
Epoch 6/10
41/41 ━━━━━━━━━━━━━━━━━━━━ 4s 92ms/step - accuracy: 0.6353 - loss: 0.6221 - val_accuracy: 0.6318 - val_loss: 0.5992
Epoch 7/10
41/41 ━━━━

In [35]:
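# 'loss' is the training loss; the test (validation) loss that the question asks about is stored under 'val_loss'
np.mean(history.history['loss'])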

0.6200315117835998

### Question 6

What's the average of test accuracy for the last 5 epochs (from 6 to 10)
for the model trained with augmentations?

* 0.31
* 0.51
* **0.71**
* 0.91

In [36]:
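# 'accuracy' is the training accuracy; the test (validation) accuracy that the question asks about is stored under 'val_accuracy'
np.mean(history.history['accuracy'][-5:])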

0.6601747751235962