# ILSVRC Pipeline Demo 
 
##### Get the new sorted dataset [here](https://tumde-my.sharepoint.com/:f:/g/personal/gohdennis_tum_de/EmooVZ4vE95Iic-HIP9-P10BzX7oIOBmRhK8Q9tYzfJWRQ?e=maOqo5) [08_Aug_2022]

The annotations are stored under notebooks/preprocesing/restructured_w_original_labels.json and are also included in the .zip file.

Extract the zip under data/.


<hr style="height:2px;border-width:0;color:black;background-color:black">

This notebook showcases how the EfficientNet pipeline works.

In [1]:
import tensorflow as tf
import json
import os
import shutil
from pathlib import Path
from models.ilsvrc import EfficientNetV2S
from pipelines.ilsvrc import EfficientNetPipeline

2022-08-23 17:10:11.536639: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


## I. Load Data
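To begin, we load the images from the sort directory under the path given by the DATA environment variable, after setting it up as specified above. Using the same seed and a validation_split of 0.3 in both calls yields a 70/30 split into training and validation sets.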

In [2]:
image_path = Path(os.getenv("DATA"), "sort")

train_ds = tf.keras.utils.image_dataset_from_directory(directory=image_path,
                                                       validation_split=0.3,
                                                       subset='training',
                                                       seed=0,
                                                       image_size=(224, 224))
val_ds = tf.keras.utils.image_dataset_from_directory(directory=image_path,
                                                     validation_split=0.3,
                                                     subset='validation',
                                                     seed=0,
                                                     image_size=(224, 224))

# Copy the annotation file into the data directory and load it

json_file = Path("resources", "restructured_w_original_labels.json")
json_target = Path(os.getenv("DATA"), json_file.name)
shutil.copy(str(json_file), str(json_target))
with json_target.open() as f:
    data = json.load(f)


Found 897 files belonging to 4 classes.
Using 628 files for training.


Found 897 files belonging to 4 classes.
Using 269 files for validation.
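
The file counts reported above follow from validation_split=0.3. As a rough sketch of the arithmetic, assuming the validation share is truncated to a whole number of files and the remainder goes to training (the helper name split_counts is hypothetical and not part of Keras or this pipeline):

```python
def split_counts(n_files, validation_split):
    """Return (n_train, n_val) for a holdout split.

    Assumption: the validation count is the truncated product of the
    total and the split fraction; all remaining files go to training.
    """
    n_val = int(validation_split * n_files)
    return n_files - n_val, n_val


n_train, n_val = split_counts(897, 0.3)
print(n_train, n_val)  # 628 269
```

Both calls to image_dataset_from_directory use the same seed so that the training and validation subsets do not overlap.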


## II. Configuring the Pipeline

Before running the pipeline, we need to configure the model and the pipeline hyperparameters. The model parameters are mostly self-explanatory; pooling_type refers to the pooling layer between the final base model layer and the first layer of the top network.

The pipeline config features the following notable parameters:
- epochs: maximum number of epochs. Early stopping may cut training short.
- model_name: name of the directory in which the run is executed
- store_model: stores the best model iteration as a checkpoint
- patience: number of epochs without improvement in val_loss after which training stops
- callbacks: optional list of additional callbacks
- custom_objects: dictionary pointing to custom objects for the compiler configuration
- save_weights_only: if False, the entire model is stored rather than only the weights
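
To make the interaction between epochs and patience concrete, here is a minimal pure-Python sketch of patience-based early stopping. It assumes the pipeline follows the usual convention (as in Keras' EarlyStopping callback) of stopping once val_loss has failed to improve for patience consecutive epochs. The function stopping_epoch and the loss values are purely illustrative and are not taken from the pipeline.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch after which training would stop.

    Returns None if the patience is never exhausted, i.e. training
    would run through all provided epochs.
    """
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value occurs at epoch 3.
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.69]
print(stopping_epoch(losses, patience=3))   # 6
print(stopping_epoch(losses, patience=10))  # None
```

Under this convention, a run with patience 10 that ends after epoch 21 would have seen its best val_loss at epoch 11.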

In [3]:
model_config = {"layer_size": (128, 32), "dropout": 0.1, "pooling_type": "max"}

pipeline_config = {
    "batch_size": 32,
    "epochs": 40,
    "model_name": "trial",
    "store_model": True,
    "patience": 10,
    "compiler_config": {
        "optimizer": "adam",
        "loss": "sparse_categorical_crossentropy",
        "metrics": ['accuracy']
    }
}

model = EfficientNetV2S(**model_config)
pipeline = EfficientNetPipeline(model, **pipeline_config)

## III. Fitting the Model
Next, we fit the model to the data. This creates the model directory under Group04/models/trial, containing checkpoints and the training history. The fit method returns the training history; in the run below it is displayed as a Keras History object. Execution is resilient to crashes: as long as the models directory is intact, the pipeline remembers its previous run. Try interrupting and restarting the notebook to see what happens.

In [4]:
history = pipeline.fit(train_ds, val_ds)
history

Training model.
Epoch 1/40


2022-08-23 17:10:59.673404: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8500


Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Training complete.


<keras.callbacks.History at 0x7fe68a2dd3d0>
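
The notebook does not show how the pipeline remembers a previous run. One plausible mechanism, sketched below under stated assumptions, is to persist the per-epoch history as JSON in the model directory and to derive the next epoch from it on restart. The file name history.json comes from this notebook; its layout (a val_loss list) and all helper names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_history(model_dir):
    """Load the stored history, or return an empty one for a fresh run."""
    path = Path(model_dir) / "history.json"
    if path.exists():
        with path.open() as f:
            return json.load(f)
    return {"val_loss": []}


def record_epoch(model_dir, val_loss):
    """Append one epoch's val_loss and write the history back to disk."""
    history = load_history(model_dir)
    history["val_loss"].append(val_loss)
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    with (model_dir / "history.json").open("w") as f:
        json.dump(history, f)
    return history


def next_epoch(model_dir):
    """Zero-based index of the epoch a restarted run would continue from."""
    return len(load_history(model_dir)["val_loss"])


# Simulate a run that is interrupted after two epochs and then restarted.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp, "trial")
    record_epoch(run_dir, 0.9)
    record_epoch(run_dir, 0.7)
    resume_from = next_epoch(run_dir)  # a restart would pick up here
print(resume_from)  # 2
```

In Keras, such an index could be passed to Model.fit as initial_epoch, together with weights restored from the latest checkpoint.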

## IV. Operate the Model
Once the model is fitted, you can score it and run predictions on other data. Scores are saved to history.json.

In [5]:
score = pipeline.score(val_ds)
pred = pipeline.predict(val_ds)
pred_class = pipeline.predict_class(val_ds)
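
The relationship between predict and predict_class is not spelled out here. A common convention, assumed in this sketch, is that predict returns one score per class for each image and predict_class returns the index of the highest score. The scores below are made up for illustration, and to_class_indices is a hypothetical helper.

```python
def to_class_indices(scores):
    """Map each row of per-class scores to the index of its largest entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


# Illustrative scores for three images over four classes (not real output).
example_scores = [
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
print(to_class_indices(example_scores))  # [1, 0, 3]
```

With datasets created by image_dataset_from_directory, such indices can be mapped back to folder names through the dataset's class_names attribute (for example train_ds.class_names).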

