## Homework

> **Note**: sometimes your answer doesn't match one of 
> the options exactly. That's fine. 
> Select the option that's closest to your solution.

### Dataset

In this homework, we'll build a model that predicts whether an image shows a bee or a wasp.
For this, we will use the "Bee or Wasp?" dataset that was obtained from [Kaggle](https://www.kaggle.com/datasets/jerzydziewierz/bee-vs-wasp) and slightly rebuilt. 

You can download the dataset for this homework from [here](https://github.com/SVizor42/ML_Zoomcamp/releases/download/bee-wasp-data/data.zip):

```bash
wget https://github.com/SVizor42/ML_Zoomcamp/releases/download/bee-wasp-data/data.zip
unzip data.zip
```

In the lectures we saw how to use a pre-trained neural network. In the homework, we'll train a much smaller model from scratch. 

> **Note:** you will need an environment with a GPU for this homework. We recommend using [Saturn Cloud](https://bit.ly/saturn-mlzoomcamp).
> You can also use a computer without a GPU (e.g. your laptop), but it will be slower.


### Data Preparation

The dataset contains around 2500 images of bees and around 2100 images of wasps. 

The dataset contains separate folders for training and test sets. 


### Model

For this homework, we will use a Convolutional Neural Network (CNN). As in the lectures, we'll use Keras.

You need to develop the model with the following structure:

* The input shape should be `(150, 150, 3)`
* Next, create a convolutional layer ([`Conv2D`](https://keras.io/api/layers/convolution_layers/convolution2d/)):
    * Use 32 filters
    * Kernel size should be `(3, 3)` (that's the size of the filter)
    * Use `'relu'` as activation 
* Reduce the size of the feature map with max pooling ([`MaxPooling2D`](https://keras.io/api/layers/pooling_layers/max_pooling2d/))
    * Set the pooling size to `(2, 2)`
* Turn the multi-dimensional result into vectors using a [`Flatten`](https://keras.io/api/layers/reshaping_layers/flatten/) layer
* Next, add a `Dense` layer with 64 neurons and `'relu'` activation
* Finally, create the `Dense` layer with 1 neuron - this will be the output
    * The output layer should have an activation - use the appropriate activation for the binary classification case

As the optimizer, use [`SGD`](https://keras.io/api/optimizers/sgd/) with the following parameters:

* `SGD(learning_rate=0.002, momentum=0.8)` (`lr` is the older, deprecated name for `learning_rate`)
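To see what `momentum=0.8` does, here is a plain-Python sketch of the update rule that the Keras `SGD` documentation gives for the non-Nesterov case, applied to a one-parameter toy problem. The quadratic objective and the number of steps are our own assumptions, chosen only so the loop visibly converges:

```python
def sgd_momentum_step(w, velocity, grad, learning_rate=0.002, momentum=0.8):
    # Non-Nesterov momentum update as documented for Keras SGD:
    #   velocity = momentum * velocity - learning_rate * g
    #   w = w + velocity
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

# Toy objective (assumption): f(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
w, v = 0.0, 0.0
for _ in range(2000):
    w, v = sgd_momentum_step(w, v, 2 * (w - 3))

print(w)  # very close to 3.0
```

The velocity accumulates past gradients, so when the gradient keeps pointing the same way, `momentum=0.8` yields steps up to 1 / (1 - 0.8) = 5 times larger than plain SGD with the same learning rate.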

For clarification about kernel size and max pooling, check [Office Hours](https://www.youtube.com/watch?v=1WRgdBTUaAc).


### Question 1

Since we have a binary classification problem, what is the best loss function for us?

* `mean squared error`
* `binary crossentropy`
* `categorical crossentropy`
* `cosine similarity`

> **Note:** since we specify an activation for the output layer, we don't need to set `from_logits=True`.
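Binary cross-entropy for a label `y` in {0, 1} and a predicted probability `p` is `-(y*log(p) + (1-y)*log(1-p))`. The NumPy sketch below (not the Keras implementation) shows why `from_logits` has to match the output layer: the plain formula applied to sigmoid probabilities gives the same loss as the numerically stable logit form applied to the raw pre-activation values. The clipping constant and the toy labels/logits are assumptions for illustration:

```python
import numpy as np

def bce_from_probs(y_true, p, eps=1e-7):
    # Plain formula on probabilities; clip so log() never sees exactly 0 or 1
    p = np.clip(p, eps, 1 - eps)
    return np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def bce_from_logits(y_true, z):
    # Stable form on raw scores z: max(z, 0) - z*y + log(1 + exp(-|z|))
    return np.mean(np.maximum(z, 0) - z * y_true + np.log1p(np.exp(-np.abs(z))))

y = np.array([1.0, 0.0, 1.0, 0.0])    # toy labels
z = np.array([2.0, -1.0, -0.5, 0.3])  # toy raw outputs of a Dense(1) layer with no activation
p = 1 / (1 + np.exp(-z))              # the same outputs after a sigmoid activation

print(bce_from_probs(y, p), bce_from_logits(y, z))  # the two values agree
```

Feeding sigmoid probabilities to a loss that expects logits (or the reverse) would silently compute a different quantity, which is why the flag must agree with the activation of the last layer.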

```python
# ImageDataGenerator's random transformations (rotation, shift, zoom) require SciPy
!pip install scipy
# restart the kernel afterwards
```

```bash
# Download and extract the dataset into /workspace (skipped if data.zip already exists)
pushd /workspace
echo "PWD: $PWD"
path_zip="data.zip"
url_zip="https://github.com/SVizor42/ML_Zoomcamp/releases/download/bee-wasp-data/data.zip"

if [ ! -f "$path_zip" ]; then
    echo "File not found! Downloading it"
    # -L follows GitHub's redirect to the release asset; -o sets the output file name
    curl -L -o "$path_zip" "$url_zip"
    unzip -q "$path_zip"
else
    echo "File already exists. Skipping download."
fi
```

```
/workspace /
PWD: /workspace
File already exists. Skipping download.
```


```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.losses import binary_crossentropy

# Make runs reproducible, see
# https://www.tensorflow.org/versions/r2.8/api_docs/python/tf/config/experimental/enable_op_determinism
seed = 1234
tf.keras.utils.set_random_seed(seed)  # seeds Python, NumPy and TensorFlow
tf.config.experimental.enable_op_determinism()

# Check that TensorFlow was built with GPU support and can see a GPU
tf.test.is_built_with_gpu_support(), tf.config.list_physical_devices('GPU'), tf.test.gpu_device_name()
```


```
(True,
 [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')],
 '/device:GPU:0')
```

```python
!nvidia-smi --list-gpus
```

```
GPU 0: NVIDIA GeForce RTX 4050 Laptop GPU (UUID: GPU-32a8e098-cd49-63b9-94ae-5b7a033749f2)
```


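```python
def simple_model(seed, shape=(150, 150, 3)):
    # Start from a clean session and fixed seeds so every call gives the same initial weights
    tf.keras.backend.clear_session()
    tf.keras.utils.set_random_seed(seed)
    model = tf.keras.Sequential([
        # 32 filters of size 3x3 -> output (148, 148, 32)
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=shape),
        # 2x2 max pooling halves each spatial dimension -> (74, 74, 32)
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        # A single sigmoid unit outputs the probability of the positive class
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    return model
```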


### Answer 1

binary crossentropy

---

### Question 2

What's the number of parameters in the convolutional layer of our model? You can use the `summary` method for that. 

* 1 
* 65
* 896
* 11214912


```python
model = simple_model(seed)
model.summary()
```


```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
 conv2d (Conv2D)             (None, 148, 148, 32)      896

 max_pooling2d (MaxPooling2  (None, 74, 74, 32)        0
 D)

 flatten (Flatten)           (None, 175232)            0

 dense (Dense)               (None, 64)                11214912

 dense_1 (Dense)             (None, 1)                 65

Total params: 11215873 (42.79 MB)
Trainable params: 11215873 (42.79 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
```

### Answer 2

896
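The 896 can be checked by hand: each of the 32 filters has 3 × 3 × 3 weights (kernel height × kernel width × input channels) plus one bias. Tracing the same arithmetic through the remaining layers (stride 1 and `'valid'` padding, the `Conv2D` defaults) reproduces every number in the summary above. The two helper functions are our own illustration, not Keras APIs:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # One weight per input per unit, plus one bias per unit
    return (inputs + 1) * units

conv = conv2d_params(3, 3, 3, 32)      # 896
conv_out = 150 - 3 + 1                 # 148 ('valid' padding, stride 1)
pooled = conv_out // 2                 # 74 after 2x2 max pooling
flat = pooled * pooled * 32            # 175232 values after Flatten
hidden = dense_params(flat, 64)        # 11214912
output = dense_params(64, 1)           # 65

print(conv, flat, hidden, output, conv + hidden + output)
# 896 175232 11214912 65 11215873
```

Almost all parameters sit in the first `Dense` layer, because flattening a 74 × 74 × 32 feature map produces a very wide input vector.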



### Generators and Training

For the next two questions, use the following data generator for both train and test sets:

```python
ImageDataGenerator(rescale=1./255)
```

* We don't need to do any additional pre-processing for the images.
* When reading the data from train/test directories, check the `class_mode` parameter. Which value should it be for a binary classification problem?
* Use `batch_size=20`
* Use `shuffle=True` for both training and test sets. 
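`class_mode='binary'` makes `flow_from_directory` yield a 1D array of 0/1 labels, one class per subdirectory. According to the Keras docs, when `classes` is not given, the class list is inferred from the subdirectory names and indices are assigned in alphanumeric order, so the sigmoid output is the predicted probability of the class with index 1. The sketch below reproduces only that ordering rule with the standard library; `infer_class_indices` is a hypothetical helper, and the folder names `bee` and `wasp` are an assumption about the unzipped dataset (in Keras, inspect `train_ds.class_indices` instead):

```python
import os
import tempfile

def infer_class_indices(directory):
    # Hypothetical helper: one class per subdirectory, indices in sorted (alphanumeric) order
    subdirs = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(subdirs)}

with tempfile.TemporaryDirectory() as root:
    for name in ("wasp", "bee"):  # assumed folder names, created in any order
        os.makedirs(os.path.join(root, name))
    indices = infer_class_indices(root)

print(indices)  # {'bee': 0, 'wasp': 1}
```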

For training use `.fit()` with the following params:

```python
model.fit(
    train_generator,
    epochs=10,
    validation_data=test_generator
)
```

---

### Question 3

What is the median of training accuracy for all the epochs for this model?

* 0.20
* 0.40
* 0.60
* 0.80


```python
image_height = 150
image_width = 150
batch_size = 20
train_dir = '/workspace/data/train/'
test_dir = '/workspace/data/test/'


def get_train_val_ds(
    train_dir,
    val_dir,
    image_height,
    image_width,
    batch_size,
    seed,
):
    tf.keras.utils.set_random_seed(seed)
    # Only rescale pixel values from [0, 255] to [0, 1]; no other pre-processing
    train_gen = ImageDataGenerator(rescale=1./255)
    val_gen = ImageDataGenerator(rescale=1./255)
    train_ds = train_gen.flow_from_directory(
        train_dir,
        target_size=(image_height, image_width),
        batch_size=batch_size,
        class_mode='binary',  # 1D array of 0/1 labels, matching the sigmoid output
        shuffle=True,
    )
    val_ds = val_gen.flow_from_directory(
        val_dir,  # use the argument rather than the global test_dir
        target_size=(image_height, image_width),
        batch_size=batch_size,
        class_mode='binary',
        shuffle=True,
    )
    return train_ds, val_ds
```

```python
import multiprocessing

# Leave one core free, but use at least 1 and at most 12 data-loading workers
cpu_count = multiprocessing.cpu_count()
workers = max(1, min(cpu_count - 1, 12))
```

```python
optimizer = SGD(learning_rate=0.002, momentum=0.8)
loss = binary_crossentropy
metrics = ['accuracy']
epochs = 10
```

```python
tf.keras.utils.set_random_seed(seed)
train_ds, val_ds = get_train_val_ds(
    train_dir=train_dir,
    val_dir=test_dir,
    batch_size=batch_size,
    image_height=image_height,
    image_width=image_width,
    seed=seed,
)
model = simple_model(seed)
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
history = model.fit(
    train_ds,
    epochs=epochs,
    validation_data=val_ds,
    workers=workers,
)
```

```
Found 3677 images belonging to 2 classes.
Found 918 images belonging to 2 classes.
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
```


```python
np.round(np.median(history.history['accuracy']), 2)
```

```
0.77
```

### Answer 3

0.80
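With 10 epochs there is an even number of accuracy values, so the median is the mean of the two middle values after sorting (the 5th and 6th). A short sketch of that rule, checked against `np.median` on a toy list (not results from this notebook):

```python
import numpy as np

def median(values):
    # Sort, then take the middle value (odd count) or average the two middle values (even count)
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

toy = [4, 1, 3, 2]  # toy values
print(median(toy), np.median(toy))  # 2.5 2.5
```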

---

### Question 4

What is the standard deviation of training loss for all the epochs for this model?

* 0.031
* 0.061
* 0.091
* 0.131


```python
np.round(np.std(history.history['loss']), 3)
```

```
0.014
```

### Answer 4

0.031
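`np.std` computes the population standard deviation (`ddof=0`) by default; the sample version divides by n - 1 and comes out slightly larger (by a factor of sqrt(10/9), about 5%, for 10 epochs). The homework does not say which one to use, and the gap is much smaller than the spacing between the options. The sketch below reuses the five validation accuracies printed under Question 6 purely as sample numbers:

```python
import statistics

import numpy as np

# Sample numbers copied from the Question 6 output; any per-epoch metric list works the same way
values = [0.7854030728340149, 0.7679738402366638, 0.7821350693702698,
          0.8028322458267212, 0.7516340017318726]

pop_std = np.std(values)             # NumPy default: ddof=0 (divide by n)
sample_std = np.std(values, ddof=1)  # divide by n - 1 instead
print(round(pop_std, 4), round(sample_std, 4))

# The statistics module names the two variants explicitly
assert np.isclose(pop_std, statistics.pstdev(values))
assert np.isclose(sample_std, statistics.stdev(values))
```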

---

### Data Augmentation

For the next two questions, we'll generate more data using data augmentations. 

Add the following augmentations to your training data generator:

* `rotation_range=50,`
* `width_shift_range=0.1,`
* `height_shift_range=0.1,`
* `zoom_range=0.1,`
* `horizontal_flip=True,`
* `fill_mode='nearest'`
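A NumPy illustration of two of these augmentations: a horizontal flip mirrors the columns, and a shift with `fill_mode='nearest'` fills the uncovered border by repeating the nearest edge pixel. With `width_shift_range=0.1` on a 150-pixel-wide image, horizontal shifts go up to about 15 pixels. `shift_nearest` and `horizontal_flip` are hypothetical helpers that work in whole pixels with fixed parameters; Keras samples random parameters per image and applies rotation, shift and zoom together through SciPy's affine transform:

```python
import numpy as np

def shift_nearest(image, dx, dy):
    # Shift an (H, W[, C]) image by whole pixels; dx > 0 moves content right, dy > 0 moves it down.
    # Uncovered pixels repeat the nearest edge pixel, like fill_mode='nearest'.
    h, w = image.shape[:2]
    pad = ((abs(dy), abs(dy)), (abs(dx), abs(dx))) + ((0, 0),) * (image.ndim - 2)
    padded = np.pad(image, pad, mode="edge")
    top = abs(dy) - dy
    left = abs(dx) - dx
    return padded[top:top + h, left:left + w]

def horizontal_flip(image):
    # Mirror the columns (left <-> right)
    return image[:, ::-1]

img = np.arange(9).reshape(3, 3)
print(shift_nearest(img, dx=1, dy=0))  # [[0 0 1] [3 3 4] [6 6 7]]
print(horizontal_flip(img))            # [[2 1 0] [5 4 3] [8 7 6]]
```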

```python
def get_train_val_ds(
    train_dir,
    val_dir,
    image_height,
    image_width,
    batch_size,
    seed,
):
    tf.keras.utils.set_random_seed(seed)
    # Augment the training images only; the test set is just rescaled
    train_gen = ImageDataGenerator(
        rescale=1./255,
        rotation_range=50,
        width_shift_range=0.1,
        height_shift_range=0.1,
        zoom_range=0.1,
        horizontal_flip=True,
        fill_mode='nearest',
    )
    val_gen = ImageDataGenerator(rescale=1./255)
    train_ds = train_gen.flow_from_directory(
        train_dir,
        target_size=(image_height, image_width),
        batch_size=batch_size,
        class_mode='binary',
        shuffle=True,
    )
    val_ds = val_gen.flow_from_directory(
        val_dir,  # use the argument rather than the global test_dir
        target_size=(image_height, image_width),
        batch_size=batch_size,
        class_mode='binary',
        shuffle=True,
    )
    return train_ds, val_ds
```

---

### Question 5 

Let's train our model for 10 more epochs using the same code as previously.
> **Note:** make sure you don't re-create the model. We want to continue training the model
> we already started training.
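Because `model` is not re-created, the second `fit` call starts from the weights learned in the first 10 epochs, but the `History` object it returns records only the new epochs; that is why the statistics in Questions 5 and 6 cover the augmented epochs alone. If you want a single 20-epoch record, a small helper (hypothetical, not part of Keras) can concatenate the `history.history` dictionaries. The dictionaries below contain placeholder values, not results:

```python
def merge_histories(*histories):
    # Concatenate the per-epoch lists of every metric, in call order
    merged = {}
    for history_dict in histories:
        for metric, values in history_dict.items():
            merged.setdefault(metric, []).extend(values)
    return merged

# Placeholder dictionaries shaped like History.history
first = {'loss': [0.7, 0.6], 'val_loss': [0.65, 0.6]}
second = {'loss': [0.55], 'val_loss': [0.58]}
print(merge_histories(first, second))
# {'loss': [0.7, 0.6, 0.55], 'val_loss': [0.65, 0.6, 0.58]}
```

In the notebook, this would require keeping both results under separate (hypothetical) names such as `history_1` and `history_2` and calling `merge_histories(history_1.history, history_2.history)`, since the cells below reuse the name `history`.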

What is the mean of test loss for all the epochs for the model trained with augmentations?

* 0.18
* 0.48
* 0.78
* 0.108

```python
tf.keras.utils.set_random_seed(seed)
train_ds, val_ds = get_train_val_ds(
    train_dir=train_dir,
    val_dir=test_dir,
    batch_size=batch_size,
    image_height=image_height,
    image_width=image_width,
    seed=seed,
)

history = model.fit(
    train_ds,
    epochs=10,
    validation_data=val_ds,
    workers=workers,
)
```

```
Found 3677 images belonging to 2 classes.
Found 918 images belonging to 2 classes.
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
```


```python
np.round(np.mean(history.history['val_loss']), 2)
```

```
0.5
```

### Answer 5

0.48

---

### Question 6

What's the average of test accuracy for the last 5 epochs (from 6 to 10)
for the model trained with augmentations?

* 0.38
* 0.58
* 0.78
* 0.98

```python
history.history['val_accuracy'][-5:]
```

```
[0.7854030728340149,
 0.7679738402366638,
 0.7821350693702698,
 0.8028322458267212,
 0.7516340017318726]
```

```python
np.round(np.mean(history.history['val_accuracy'][-5:]), 2)
```

```
0.78
```

### Answer 6

0.78
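Keras numbers epochs from 1, while the `history.history` lists are 0-based. With exactly 10 recorded epochs, "epochs 6 to 10" are list positions 5 to 9, so `[-5:]` and `[5:10]` select the same values. A quick check using the epoch numbers themselves:

```python
epochs = list(range(1, 11))  # epoch numbers as Keras prints them: 1..10
last_five = epochs[-5:]      # list positions 5..9
print(last_five)             # [6, 7, 8, 9, 10]
assert epochs[-5:] == epochs[5:10]
```

If training ran for a different number of epochs, `[-5:]` would still mean "the last five", whereas `[5:10]` would not.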

---

## Submit the results

- Submit your results here: https://forms.gle/5sjtM3kzY9TmLmU17
- If your answer doesn't match options exactly, select the closest one
- You can submit your solution multiple times. In this case, only the last submission will be used


## Deadline

The deadline for submitting is November 20 (Monday), 23:00 CEST. After that the form will be closed.