# Objective

This notebook is the homework for week 8 (Deep Learning) of the Machine Learning Zoomcamp (2023 cohort). The assignment is available here: https://github.com/DataTalksClub/machine-learning-zoomcamp/blob/master/cohorts/2023/08-deep-learning/homework.md

The goal is to predict whether the insect in an image is a bee or a wasp, using a Convolutional Neural Network (CNN) built from scratch and trained and tested on a dataset of bee and wasp photos.

<img src="images/maxresdefault.jpg" style="display:block;float:none;margin-left:auto;margin-right:auto;width:100%">


# Data

The dataset can be downloaded from this link: https://github.com/SVizor42/ML_Zoomcamp/releases/download/bee-wasp-data/data.zip
It is a slightly rebuilt version of the "Bee or Wasp?" Kaggle dataset, as specified in the homework description.

To download it in a Saturn Cloud notebook, run these commands:
```bash
wget https://github.com/SVizor42/ML_Zoomcamp/releases/download/bee-wasp-data/data.zip
unzip data.zip
```

The dataset contains two folders, train and test. Each contains two subfolders, bee and wasp, which hold the photos (.jpg format).
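As a quick sanity check on this layout, a minimal sketch like the following counts the .jpg files per split and class. The helper name `count_images` is hypothetical, and the default `./data` root assumes the archive was unzipped in the working directory.

```python
from pathlib import Path


def count_images(root="./data"):
    """Return {(split, class_name): number of .jpg files} for the train/test layout."""
    counts = {}
    for split in ("train", "test"):
        for class_name in ("bee", "wasp"):
            folder = Path(root) / split / class_name
            # A missing folder simply counts as zero images
            n_files = len(list(folder.glob("*.jpg"))) if folder.is_dir() else 0
            counts[(split, class_name)] = n_files
    return counts


if __name__ == "__main__":
    for (split, class_name), n in count_images().items():
        print(f"{split}/{class_name}: {n} images")
```

The per-split totals should add up to the image counts that Keras reports when the directories are loaded.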

# Notebook

## Imports

In [37]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import ImageDataGenerator

import statistics
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [4]:
# Check tensorflow version
tf.__version__

'2.9.1'

## Functions

## Data preparation

### Model

Here is how the model should be built initially:

For this homework we will use a Convolutional Neural Network (CNN). Like in the lectures, we'll use Keras.

You need to develop the model with the following structure:

- The shape for input should be (150, 150, 3)
- Next, create a convolutional layer (Conv2D):
    - Use 32 filters
    - Kernel size should be (3, 3) (that's the size of the filter)
    - Use 'relu' as activation
- Reduce the size of the feature map with max pooling (MaxPooling2D)
    - Set the pooling size to (2, 2)
- Turn the multi-dimensional result into vectors using a Flatten layer
- Next, add a Dense layer with 64 neurons and 'relu' activation
- Finally, create the Dense layer with 1 neuron - this will be the output
    - The output layer should have an activation - use the appropriate activation for the binary classification case
- As optimizer use SGD with the following parameters:
    - SGD(lr=0.002, momentum=0.8)
    
    
    
For clarification about kernel size and max pooling, check Office Hours.
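For the output layer, sigmoid is the usual activation for binary classification, paired with binary cross-entropy as the loss. The sketch below, in plain Python with hypothetical helper names, illustrates what these two functions compute for a single prediction. It is an illustration only, not how Keras implements them.

```python
import math


def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y_true, p):
    """Loss for one example with label y_true in {0, 1} and predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


p = sigmoid(0.0)  # a logit of 0 gives a probability of 0.5
print(p, binary_cross_entropy(1, p))
```

A model that always predicts 0.5 has a loss of ln 2 (about 0.693), which is a useful baseline when reading the loss values reported during training.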

In [53]:
# Setup constants
INPUT_SHAPE = (150, 150, 3)
ACTIVATION = "relu"
OUTPUT_ACTIVATION = "sigmoid"
NUMBER_FILTERS = 32
KERNEL_SIZE = (3, 3)
POOLING_SIZE = (2, 2)
DENSE_FIRST_NEURONS_NUMBER = 64
DENSE_OUTPUT_NEURONS_NUMBER = 1
SGD_LR = 0.002
SGD_MOMENTUM = 0.8

In [54]:
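# Build the model
model = keras.Sequential(
    [
        # Convolutional layer: 32 filters of size 3x3 on 150x150 RGB inputs
        layers.Conv2D(filters=NUMBER_FILTERS,
                      kernel_size=KERNEL_SIZE,
                      activation=ACTIVATION,
                      input_shape=INPUT_SHAPE),
        # Downsample the feature maps by a factor of 2 in each dimension
        layers.MaxPooling2D(pool_size=POOLING_SIZE),
        # Flatten the feature maps into a single vector
        layers.Flatten(),
        layers.Dense(units=DENSE_FIRST_NEURONS_NUMBER,
                     activation=ACTIVATION),
        # Single sigmoid output for binary classification
        layers.Dense(units=DENSE_OUTPUT_NEURONS_NUMBER,
                     activation=OUTPUT_ACTIVATION)
    ]
)

# Binary cross-entropy loss, SGD optimizer with momentum
loss = keras.losses.BinaryCrossentropy()
opt = keras.optimizers.SGD(learning_rate=SGD_LR,
                           momentum=SGD_MOMENTUM)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])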
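
As a rough illustration of what the optimizer does with these two hyperparameters, here is a sketch of the momentum update rule as described in the Keras documentation for SGD (velocity = momentum * velocity - learning_rate * gradient, then w = w + velocity), applied to a toy one-dimensional loss. The function name and the toy loss w**2 are assumptions made for this example.

```python
SGD_LR = 0.002
SGD_MOMENTUM = 0.8


def sgd_momentum_step(w, velocity, grad, lr=SGD_LR, momentum=SGD_MOMENTUM):
    """One SGD-with-momentum update for a single scalar weight."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


# Toy loss f(w) = w**2, whose gradient is 2 * w
w, v = 1.0, 0.0
for _ in range(3):
    w, v = sgd_momentum_step(w, v, grad=2 * w)
    print(w, v)
```

The velocity term carries over part of the previous update, so successive steps in the same direction grow larger than with plain SGD at the same learning rate.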

In [55]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_3 (Conv2D)           (None, 148, 148, 32)      896       
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 74, 74, 32)       0         
 2D)                                                             
                                                                 
 flatten_3 (Flatten)         (None, 175232)            0         
                                                                 
 dense_6 (Dense)             (None, 64)                11214912  
                                                                 
 dense_7 (Dense)             (None, 1)                 65        
                                                                 
Total params: 11,215,873
Trainable params: 11,215,873
Non-trainable params: 0
_________________________________________________________________

The Conv2D layer has 896 parameters.
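
The output shapes and parameter counts in the summary can be reproduced by hand. The following sketch redoes the arithmetic in plain Python, assuming the Keras defaults for Conv2D (no padding, stride 1) and one bias per filter or neuron. The variable names are chosen for this example.

```python
# Values taken from the constants and the model summary above
side, channels = 150, 3      # square RGB input
filters, kernel = 32, 3
pool = 2
dense_units = 64

# Conv2D without padding shrinks each side by kernel - 1
conv_side = side - kernel + 1
# Each filter has kernel * kernel * channels weights plus 1 bias
conv_params = (kernel * kernel * channels + 1) * filters

# MaxPooling2D with a (2, 2) window halves each spatial dimension
pool_side = conv_side // pool
flat_size = pool_side * pool_side * filters

# Each dense neuron has one weight per input plus 1 bias
dense_params = (flat_size + 1) * dense_units
output_params = dense_units + 1

print(conv_side, pool_side, flat_size)
print(conv_params, dense_params, output_params)
print(conv_params + dense_params + output_params)
```

Almost all of the parameters sit in the first Dense layer, because the flattened feature map has 175,232 values.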

### Training and testing datasets

- We don't need to do any additional pre-processing for the images.
- When reading the data from train/test directories, check the class_mode parameter. Which value should it be for a binary classification problem?
- Use batch_size=20
- Use shuffle=True for both training and test sets.

In [56]:
# Data generator: rescale pixel values from [0, 255] to [0, 1]
datagen = ImageDataGenerator(rescale=1./255)

In [57]:
# Set target size (according to input shape)
TARGET_SIZE=(150,150)

In [58]:
train_ds = datagen.flow_from_directory(
    './data/train/',
    batch_size=20,
    shuffle=True,
    class_mode='binary',
    target_size=TARGET_SIZE
)

Found 3677 images belonging to 2 classes.


In [59]:
test_ds = datagen.flow_from_directory(
    './data/test/',
    batch_size=20,
    shuffle=True,
    class_mode='binary',
    target_size=TARGET_SIZE
)

Found 918 images belonging to 2 classes.
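
With batch_size=20, each generator yields ceil(n_images / batch_size) batches per epoch, and the last batch is smaller than the others. A quick sketch of that arithmetic for the image counts reported above (the variable names are chosen for this example):

```python
import math

BATCH_SIZE = 20
n_train, n_test = 3677, 918  # image counts reported by flow_from_directory

for name, n in (("train", n_train), ("test", n_test)):
    steps = math.ceil(n / BATCH_SIZE)
    last_batch = n - (steps - 1) * BATCH_SIZE
    print(f"{name}: {steps} batches per epoch, last batch has {last_batch} images")
```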


## Training the model

In [60]:
result = model.fit(
    train_ds,
    epochs=10,
    validation_data=test_ds
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [61]:
result.history

{'loss': [0.6862706542015076,
  0.6566957235336304,
  0.6195502877235413,
  0.5731316208839417,
  0.5408139228820801,
  0.5128502249717712,
  0.49793028831481934,
  0.48356375098228455,
  0.4619581699371338,
  0.44041910767555237],
 'accuracy': [0.5422899127006531,
  0.5901550054550171,
  0.6453630924224854,
  0.694316029548645,
  0.7277672290802002,
  0.7579548358917236,
  0.7715529203414917,
  0.7764481902122498,
  0.7963013052940369,
  0.8123469948768616],
 'val_loss': [0.6475895047187805,
  0.6191149353981018,
  0.6055774688720703,
  0.6480372548103333,
  0.5402923822402954,
  0.5368143916130066,
  0.5487772226333618,
  0.5405684113502502,
  0.5251412391662598,
  0.5441334247589111],
 'val_accuracy': [0.5936819314956665,
  0.6481481194496155,
  0.6612200140953064,
  0.584967315196991,
  0.727668821811676,
  0.757080614566803,
  0.7113289833068848,
  0.7309368252754211,
  0.757080614566803,
  0.741830050945282]}

In [62]:
# Median of training accuracy
statistics.median(result.history['accuracy'])

0.7428610324859619
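
Since there are 10 epochs, the median is the average of the 5th and 6th values once the list is sorted. The sketch below redoes this by hand on the training accuracies copied from the history output above and compares the result with statistics.median.

```python
import statistics

# Training accuracies copied from result.history['accuracy']
accuracy = [0.5422899127006531, 0.5901550054550171, 0.6453630924224854,
            0.694316029548645, 0.7277672290802002, 0.7579548358917236,
            0.7715529203414917, 0.7764481902122498, 0.7963013052940369,
            0.8123469948768616]

ordered = sorted(accuracy)
# Even number of values: average the two middle ones (indices 4 and 5)
middle = (ordered[4] + ordered[5]) / 2
print(middle, statistics.median(accuracy))
```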

In [63]:
# Standard deviation of training loss
statistics.stdev(result.history['loss'])

0.08406540282090642
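
Note that statistics.stdev computes the sample standard deviation (dividing by n - 1), whereas statistics.pstdev, like NumPy's np.std with its default ddof=0, computes the population standard deviation (dividing by n). The sketch below shows the difference on the training losses copied from the history output above.

```python
import statistics

# Training losses copied from result.history['loss']
loss = [0.6862706542015076, 0.6566957235336304, 0.6195502877235413,
        0.5731316208839417, 0.5408139228820801, 0.5128502249717712,
        0.49793028831481934, 0.48356375098228455, 0.4619581699371338,
        0.44041910767555237]

sample_sd = statistics.stdev(loss)       # divides by n - 1
population_sd = statistics.pstdev(loss)  # divides by n
print(sample_sd, population_sd)
```

With only 10 values the two results differ noticeably, so it matters which one is compared against the homework's answer options.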

## Data Augmentation

Add the following augmentations to your training data generator:

- rotation_range=50,
- width_shift_range=0.1,
- height_shift_range=0.1,
- zoom_range=0.1,
- horizontal_flip=True,
- fill_mode='nearest'

In [64]:
# Data generator for data augmentation
datagen_augmentation = ImageDataGenerator(
    rescale=1./255,
    rotation_range=50,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)

In [65]:
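# Note: batch_size is 32 here, whereas the other generators use 20;
# shuffle is not set, so it keeps its default value (True)
train_augmentation = datagen_augmentation.flow_from_directory(
    './data/train/',
    batch_size=32,
    class_mode='binary',
    target_size=TARGET_SIZE
)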

Found 3677 images belonging to 2 classes.


In [66]:
# Continue training the same model (it is not re-created) on the augmented data
result_augmentation = model.fit(
    train_augmentation,
    epochs=10,
    validation_data=test_ds
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [67]:
result_augmentation.history

{'loss': [0.505081057548523,
  0.4947676658630371,
  0.4872701168060303,
  0.49634361267089844,
  0.49021443724632263,
  0.48182985186576843,
  0.46772652864456177,
  0.46757882833480835,
  0.4648420512676239,
  0.458371102809906],
 'accuracy': [0.7593146562576294,
  0.7778080105781555,
  0.7753603458404541,
  0.7680174112319946,
  0.7707369923591614,
  0.7742725014686584,
  0.7840631008148193,
  0.7829752564430237,
  0.7922219038009644,
  0.788414478302002],
 'val_loss': [0.5197620987892151,
  0.503851592540741,
  0.48665040731430054,
  0.5040733814239502,
  0.499086856842041,
  0.4831225574016571,
  0.5363031625747681,
  0.4673035442829132,
  0.5069840550422668,
  0.4916398525238037],
 'val_accuracy': [0.7483659982681274,
  0.7679738402366638,
  0.7832244038581848,
  0.7494553327560425,
  0.7657952308654785,
  0.7690631747245789,
  0.7385621070861816,
  0.7788671255111694,
  0.7472766637802124,
  0.7657952308654785]}

In [68]:
# Mean of test loss
statistics.mean(result_augmentation.history['val_loss'])

0.4998777508735657

In [69]:
# Average of test accuracy for the last 5 epochs
statistics.mean(result_augmentation.history['val_accuracy'][-5:])

0.7599128603935241
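
Both figures can be checked from the history output alone. The sketch below recomputes them from the validation values copied from result_augmentation.history above, using the [-5:] slice to select the last five epochs.

```python
import statistics

# Validation metrics copied from result_augmentation.history
val_loss = [0.5197620987892151, 0.503851592540741, 0.48665040731430054,
            0.5040733814239502, 0.499086856842041, 0.4831225574016571,
            0.5363031625747681, 0.4673035442829132, 0.5069840550422668,
            0.4916398525238037]
val_accuracy = [0.7483659982681274, 0.7679738402366638, 0.7832244038581848,
                0.7494553327560425, 0.7657952308654785, 0.7690631747245789,
                0.7385621070861816, 0.7788671255111694, 0.7472766637802124,
                0.7657952308654785]

mean_val_loss = statistics.mean(val_loss)
# A negative slice start counts from the end: epochs 6 to 10
mean_last5_accuracy = statistics.mean(val_accuracy[-5:])
print(mean_val_loss, mean_last5_accuracy)
```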