# ILSVRC Pipeline Demo 
 
##### Get the new sorted dataset [here](https://tumde-my.sharepoint.com/:f:/g/personal/gohdennis_tum_de/EmooVZ4vE95Iic-HIP9-P10BzX7oIOBmRhK8Q9tYzfJWRQ?e=maOqo5) [08_Aug_2022]

Annotations are stored under `notebooks/preprocesing/restructured_w_original_labels.json` (also included in the .zip file).

Extract the zip under `data/`.


<hr style="height:2px;border-width:0;color:black;background-color:black">

This notebook showcases how the EfficientNet pipeline works.

In [9]:
import tensorflow as tf
import json
import os
import shutil
from pathlib import Path
from models.ilsvrc import EfficientNetV2S
from pipelines.ilsvrc import EfficientNetPipeline
from optimization.ilsvrc import ILSVRCOptunaSearch

## I. Load Data
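To begin, we load the images from the `sort` directory inside the path given by the `DATA` environment variable, after setting it up as specified above. 30% of the images are held out for validation, and both subsets use the same seed so that they do not overlap.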

In [10]:
image_path = Path(os.getenv("DATA"), "sort")

train_ds = tf.keras.utils.image_dataset_from_directory(directory=image_path,
                                                       validation_split=0.3,
                                                       subset='training',
                                                       seed=0,
                                                       image_size=(224, 224))
val_ds = tf.keras.utils.image_dataset_from_directory(directory=image_path,
                                                     validation_split=0.3,
                                                     subset='validation',
                                                     seed=0,
                                                     image_size=(224, 224))


Found 897 files belonging to 4 classes.
Using 628 files for training.
Found 897 files belonging to 4 classes.
Using 269 files for validation.
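
The counts printed above follow from the 0.3 validation split. A minimal sketch of the arithmetic, assuming the validation count is the integer part of `n_files * validation_split` (an assumption, but one that matches the output above); `split_counts` is a hypothetical helper, not part of the pipeline:

```python
def split_counts(n_files: int, validation_split: float) -> tuple[int, int]:
    """Return (n_train, n_val) for a fractional validation split."""
    # Assumption: the validation set gets the integer part of n_files * split,
    # and the training set gets whatever remains.
    n_val = int(n_files * validation_split)
    return n_files - n_val, n_val


n_train, n_val = split_counts(897, 0.3)
print(n_train, n_val)  # 628 269
```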


## II. Configuring the Pipeline

Before we run the pipeline, we need to configure the model and the pipeline hyperparameters. The model parameters are mostly self-explanatory; `pooling_type` refers to the pooling layer between the final base model layer and the first layer of the top network.

The pipeline config features the following notable parameters:
- epochs: maximum number of epochs. Early stopping might cut the training short.
- model_name: name of the directory in which the run is executed
- store_model: stores the best model iteration as a checkpoint
- patience: number of epochs without improvement in val_loss after which training stops
- callbacks: additional callbacks can be passed as a list if needed
- custom_objects: dictionary pointing to custom objects for the compiler configuration
- save_weights_only: stores only the weights if True, the entire model if False
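
How `epochs` and `patience` interact can be illustrated with a small simulation. This is only a sketch that assumes Keras-style early stopping (stop once `patience` consecutive epochs bring no improvement in val_loss); the pipeline's actual implementation is not shown here, and `epochs_until_stop` is a hypothetical helper:

```python
def epochs_until_stop(val_losses, patience, max_epochs):
    """Return how many epochs run before training stops."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # early stopping cuts the run short
    # No early stop: training ends when the losses or the epoch budget run out.
    return min(len(val_losses), max_epochs)


losses = [1.0, 0.8, 0.9, 0.85, 0.7]
print(epochs_until_stop(losses, patience=2, max_epochs=10))  # 4
print(epochs_until_stop(losses, patience=2, max_epochs=3))   # 3
```

In the first call the loss fails to improve in epochs 3 and 4, so training stops after epoch 4 and never reaches the better value in epoch 5. In the second call the epoch budget is exhausted first.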

In [11]:
model_config = {
    "layer_size": {
        "type": "int",
        "low": 16,
        "high": 4096,
        "step": 16
    },
    "dropout": {
        "type": "float",
        "low": 0.0,
        "high": 1.0,
    },
    "pooling_type": {
        "type": "categorical",
        "choices": ["max", "avg"]
    },
    "depth": {
        "type": "int",
        "low": 0,
        "high": 6,
    }
}

pipeline_config = {
    "batch_size": {
        "type": "int",
        "low": 8,
        "high": 32,
        "step": 8
    },
    "epochs": 4,
    "model_name": "trial",
    "store_model": True,
    "save_weights_only": True,
    "patience": 2,
    "compiler_config": {
        "optimizer": "adam",
        "loss": "sparse_categorical_crossentropy",
        "metrics": ['accuracy']
    }
}


opt = ILSVRCOptunaSearch(name="trial",
                         n_trials=2,
                         model=EfficientNetV2S,
                         pipeline=EfficientNetPipeline,
                         model_config=model_config,
                         pipeline_config=pipeline_config)
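
Entries that are dictionaries with a `type` key describe a search range, while plain values such as `"epochs": 4` stay fixed. How `ILSVRCOptunaSearch` resolves these internally is not shown in this notebook; the following is a hedged sketch of one way such a config could be turned into concrete values. `sample_config` and `RandomTrial` are hypothetical names. The three `suggest_*` methods mirror the ones on Optuna's `Trial`, so a real trial object could be passed instead of the stand-in.

```python
import random


def sample_config(trial, config):
    """Replace every search-space entry in `config` with a sampled value."""
    sampled = {}
    for name, spec in config.items():
        if not (isinstance(spec, dict) and "type" in spec):
            sampled[name] = spec  # fixed value, e.g. "epochs": 4
        elif spec["type"] == "int":
            sampled[name] = trial.suggest_int(
                name, spec["low"], spec["high"], step=spec.get("step", 1))
        elif spec["type"] == "float":
            sampled[name] = trial.suggest_float(name, spec["low"], spec["high"])
        elif spec["type"] == "categorical":
            sampled[name] = trial.suggest_categorical(name, spec["choices"])
        else:
            raise ValueError(f"Unknown parameter type for {name!r}: {spec['type']!r}")
    return sampled


class RandomTrial:
    """Stand-in for an Optuna trial so the sketch runs without Optuna installed."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def suggest_int(self, name, low, high, step=1):
        return self._rng.randrange(low, high + 1, step)  # high is inclusive

    def suggest_float(self, name, low, high):
        return self._rng.uniform(low, high)

    def suggest_categorical(self, name, choices):
        return self._rng.choice(choices)


example = {
    "batch_size": {"type": "int", "low": 8, "high": 32, "step": 8},
    "pooling_type": {"type": "categorical", "choices": ["max", "avg"]},
    "epochs": 4,
}
print(sample_config(RandomTrial(), example))
```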

## III. Fitting the Model
Next, we fit the model to the data. This creates the model directory under Group04/models/trial, containing checkpoints and history. The fit method returns the history as JSON. Execution is resilient to crashes: as long as the models directory is intact, it remembers its previous run. Try interrupting and restarting the notebook to see what happens.

In [12]:
opt.run(train_ds, val_ds)

Exception: Invalid parent directory '//home/amadou/CodeWorkspace/Group04/models/trial/feature_extraction/mlflow/.trash'
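
The resume behaviour described in this section can be illustrated with a small, self-contained sketch. It is not the pipeline's actual code: `run_resumable` and the `history.json` file name are hypothetical, and the idea is simply to persist the history after every epoch and skip epochs that are already recorded.

```python
import json
import tempfile
from pathlib import Path


def run_resumable(model_dir, epochs, train_one_epoch):
    """Train up to `epochs` epochs, skipping those already recorded on disk."""
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    history_file = model_dir / "history.json"  # hypothetical file name

    # Reload the previous run if the directory is intact, else start fresh.
    history = json.loads(history_file.read_text()) if history_file.exists() else []
    for epoch in range(len(history), epochs):
        history.append(train_one_epoch(epoch))
        history_file.write_text(json.dumps(history))  # persist after every epoch
    return history


with tempfile.TemporaryDirectory() as tmp:
    run_resumable(tmp, 2, lambda e: {"epoch": e})            # "interrupted" after 2 epochs
    resumed = run_resumable(tmp, 4, lambda e: {"epoch": e})  # continues at epoch 2
    print([h["epoch"] for h in resumed])  # [0, 1, 2, 3]
```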