# Convolutional Neural Network
### Importing the libraries

In [7]:
import os
import pandas as pd

import tensorflow as tf
from keras.preprocessing.image import ImageDataGenerator

In [8]:
tf.__version__

'2.14.0'

## Part 0 - Prepare the dataset
In this section, we reorganize the dataset.

We move the images stored directly in the test, train, and valid folders into subfolders (hostile, passive), according to the nature of the creature shown in each image.

The nature of each creature is given by the `_annotations.csv` file present in each folder of the dataset.

In [5]:
def sort_images_dataset(path_folder):
    """
    Sorts images in a dataset based on their class, moving them to separate folders.
    If the image represents a passive creature, it will be moved to the 'passive' subfolder, otherwise, it will be moved to the 'hostile' subfolder.

    Parameters:
        - path_folder (str): Path to the folder containing the images and _annotations.csv file.

    """
    PASSIVE_CLASS_LIST = ['chicken', 'cow', 'pig', 'sheep', 'bee', 'fox', 'frog', 'goat', 'llama', 'turtle', 'wolf']
    annotations_path = os.path.join(path_folder, '_annotations.csv')
    annotations_data = pd.read_csv(annotations_path)
    
    for filename, classe in annotations_data[['filename', 'class']].values:

        img_path = os.path.join(path_folder, filename)
        if os.path.exists(img_path):

            # Creatures listed in PASSIVE_CLASS_LIST go to 'passive', all others to 'hostile'
            category = "passive" if classe in PASSIVE_CLASS_LIST else "hostile"
            category_dir = os.path.join(path_folder, category)

            # Create the category folder on first use (no error if it already exists)
            os.makedirs(category_dir, exist_ok=True)
            
            new_img_path = os.path.join(category_dir, filename)
            os.rename(img_path, new_img_path)
    


In [6]:
path_dataset_to_sort = ['./data/train', './data/test', './data/valid']
for path in path_dataset_to_sort:
    sort_images_dataset(path)
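
Once the images have been moved, it can be useful to check how many ended up in each category. The helper below is a minimal sketch: `count_sorted_images` is a hypothetical name that is not part of the original notebook, and it assumes the `hostile` and `passive` subfolders created above.

```python
import os
from collections import Counter


def count_sorted_images(path_folder, categories=("hostile", "passive")):
    """Count the files found in each category subfolder of path_folder.

    Hypothetical helper (not in the original notebook). A missing
    subfolder is reported with a count of 0 instead of raising an error.
    """
    counts = Counter()
    for category in categories:
        category_dir = os.path.join(path_folder, category)
        if os.path.isdir(category_dir):
            counts[category] = sum(
                1
                for name in os.listdir(category_dir)
                if os.path.isfile(os.path.join(category_dir, name))
            )
        else:
            counts[category] = 0
    return counts


# Same folders as in the cell above
for path in ['./data/train', './data/test', './data/valid']:
    print(path, dict(count_sorted_images(path)))
```

A strong imbalance between the two counts would be worth keeping in mind when reading accuracy figures later on.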

## Part 1 - Data Preprocessing

> How are we going to preprocess our images? 

We are going to do several things. We apply a set of random transformations to every image in the training set, but not to the test set, which is only rescaled. The purpose of these transformations is to limit overfitting: without them, the CNN tends to memorize the training images, and we observe a significant gap in accuracy, very high on the training set and much lower on the test set. The test set is left unaugmented so that it measures performance on unmodified images.

In computer vision, applying such transformations is a standard way to reduce overfitting.

> What do these transformations consist of?

They include simple geometric transformations such as shears, zooms, rotations, and flips. This process is called image augmentation. The aim is to keep the CNN from overfitting to the existing images. By applying these transformations, we generate new variants of the images, which increases the diversity of our dataset.
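
To make two of these operations concrete, here is a small NumPy sketch of what rescaling and a horizontal flip do to an image array. It is only an illustration on a randomly generated image, not how `ImageDataGenerator` is implemented; the function names `rescale` and `horizontal_flip` are hypothetical.

```python
import numpy as np

# Stand-in for a 64x64 RGB image with integer pixel values in [0, 255]
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)


def rescale(img, factor=1.0 / 255):
    """Map pixel values from [0, 255] to [0, 1] (feature scaling)."""
    return img.astype(np.float32) * factor


def horizontal_flip(img):
    """Mirror the image left to right by reversing the width axis."""
    return img[:, ::-1, :]


scaled = rescale(image)
flipped = horizontal_flip(scaled)

print(scaled.min(), scaled.max())   # both values lie within [0, 1]
print(flipped.shape)                # the shape is unchanged by the flip
```

The flipped array contains the same pixels in a different arrangement, so the class of the image is preserved while the network sees a new input.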



### Preprocessing the training set

In [10]:
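train_datagen = ImageDataGenerator(
                    rescale=1./255, # Normalisation / Feature scaling: pixel values mapped to [0, 1]
                    shear_range=0.2, # random shear transformations
                    zoom_range=0.2, # random zoom, factor drawn from [0.8, 1.2]
                    horizontal_flip=True # random left-right flips
                )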

In [11]:
training_set = train_datagen.flow_from_directory(
                        './data/train',
                        target_size=(64, 64), # final size of images
                        batch_size=32,
                        class_mode='binary'
                    )

Found 2307 images belonging to 2 classes.
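
With `class_mode='binary'`, each subfolder becomes one class, and the labels are integers assigned from the subfolder names sorted in alphanumeric order. The sketch below reproduces that mapping with the standard library only; `infer_class_indices` is a hypothetical helper, and the mapping actually used by the generator can be read from `training_set.class_indices`.

```python
import os


def infer_class_indices(directory):
    """Mimic how class labels are derived from subfolder names.

    Hypothetical helper: subfolders are sorted alphanumerically and
    numbered from 0, which is the documented Keras behaviour when no
    explicit `classes` argument is passed to flow_from_directory.
    """
    classes = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}


# Run only if the dataset folder from this notebook is present
if os.path.isdir('./data/train'):
    print(infer_class_indices('./data/train'))
```

With the `hostile` and `passive` subfolders created in Part 0, this gives `hostile` the label 0 and `passive` the label 1, so a sigmoid output close to 1 would correspond to a passive creature.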


### Preprocessing the test set

In [12]:
test_datagen = ImageDataGenerator(rescale=1./255)

In [13]:
test_set = test_datagen.flow_from_directory(
                './data/test',
                target_size=(64, 64),
                batch_size=32,
                class_mode='binary'
            )

Found 155 images belonging to 2 classes.
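
The image counts reported above, combined with `batch_size=32`, determine how many batches each generator yields per epoch. A quick check of the arithmetic, assuming the last partial batch is kept (`batches_per_epoch` is a hypothetical helper):

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to cover every image once."""
    return math.ceil(num_images / batch_size)


train_batches = batches_per_epoch(2307, 32)  # 72 full batches + 1 batch of 3 images
test_batches = batches_per_epoch(155, 32)    # 4 full batches + 1 batch of 27 images

print(train_batches, test_batches)  # prints: 73 5
```

These values should match `len(training_set)` and `len(test_set)`.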
