# Modelling and Evaluation Notebook

## Objectives

* Rebuild data generators with batch size = 32 (same augmentation as Step 1 for training; rescale only for validation/test).
* Load the top 5 scenarios from Step 1 and prepare them for retraining.
* Create the CNN model for each selected scenario.
* Fit each model (EarlyStopping + best-model checkpoint).
* Evaluate each run on the validation set and collect: loss, accuracy, precision, recall, F1, and a confusion matrix.
* Build a simple leaderboard to compare runs and pick the top candidates for the next step.
  

## Inputs

* inputs/mildew-dataset/cherry-leaves/train
* inputs/mildew-dataset/cherry-leaves/test
* inputs/mildew-dataset/cherry-leaves/validation
* Image shape embedding saved by an earlier run (outputs/v1/image_shape.pkl)
* Step-1 leaderboard (outputs/step_1/reports/grid_report_bs16.csv) to select the top 5 configurations

## Outputs
* Best-epoch model files per scenario.
* Training histories.
* Learning-curve plots (accuracy/loss).
* Validation metrics per run (loss, accuracy, precision, recall, F1, confusion matrix).
* Consolidated leaderboard for Step 2 (sorted by validation accuracy, tie-break by validation loss).
* Shortlist (top 3) ready for test-set evaluation in the final step.

## Additional Comments | Insights | Conclusions

---

## Import Libraries

In [28]:

import os, math, json, itertools, time, joblib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.image import imread
from pathlib import Path

# TensorFlow and Keras:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, BatchNormalization
from tensorflow.keras.optimizers import Adam, Adamax
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

# Sklearn:
from sklearn.metrics import classification_report, accuracy_score, precision_score, recall_score, f1_score, roc_auc_score, confusion_matrix

---

## Set Seed

In [29]:
SEED = 27
np.random.seed(SEED)
tf.random.set_seed(SEED)
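
The two calls above seed NumPy and TensorFlow. Python's built-in `random` module is a separate generator and is not covered by them. Below is a minimal sketch of one helper that seeds every generator it can find. The name `set_global_seed` is hypothetical, and the NumPy and TensorFlow imports are guarded so the snippet also runs where either library is missing. Seeding alone may still not make GPU training bit-for-bit reproducible.

```python
import random


def set_global_seed(seed: int) -> None:
    """Seed Python's RNG, plus NumPy and TensorFlow when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass  # TensorFlow not installed: nothing to seed


# Seeding twice with the same value reproduces the same draw.
set_global_seed(27)
first_draw = random.random()
set_global_seed(27)
second_draw = random.random()
print(first_draw == second_draw)  # True
```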

---

## Define Main Variables

In [30]:
version = 'step_2'

# Set batch size
batch_size = 32

# Set number of epochs
epochs = 25


---

### Create Run Tag Hashes 

In [31]:
def run_tag(cfg: dict, version: str, batch_size: int, seed: int | None = None, include_seed: bool = True) -> str:
    """
    Build a readable tag like:
      step_1_bs16_k3_do0.5_act-relu_opt-adam_seed27
    """
    parts = [
        version,
        f"bs{batch_size}",
        f"k{cfg['kernel_size']}",
        f"do{cfg['dropout']}",
        f"act-{cfg['activation']}",
        f"opt-{cfg['optimizer']}",
    ]
    if include_seed and seed is not None:
        parts.append(f"seed{seed}")
    return "_".join(parts)
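
Because the tag encodes the whole configuration, it can also be read back from a file name. The sketch below is one possible inverse of `run_tag`. The helper `parse_run_tag` and its regular expression are hypothetical, not part of the notebook, and they assume activation and optimizer names contain only lowercase letters and digits.

```python
import re

# Mirrors the field order produced by run_tag(); the seed suffix is optional.
TAG_PATTERN = re.compile(
    r"^(?P<version>.+)_bs(?P<batch_size>\d+)_k(?P<kernel_size>\d+)"
    r"_do(?P<dropout>[0-9.]+)_act-(?P<activation>[a-z0-9]+)"
    r"_opt-(?P<optimizer>[a-z0-9]+)(?:_seed(?P<seed>\d+))?$"
)


def parse_run_tag(tag: str) -> dict:
    """Recover the configuration fields encoded in a run tag."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Unrecognised run tag: {tag!r}")
    fields = match.groupdict()
    return {
        "version": fields["version"],
        "batch_size": int(fields["batch_size"]),
        "kernel_size": int(fields["kernel_size"]),
        "dropout": float(fields["dropout"]),
        "activation": fields["activation"],
        "optimizer": fields["optimizer"],
        "seed": int(fields["seed"]) if fields["seed"] is not None else None,
    }


parsed = parse_run_tag("step_2_bs32_k5_do0.3_act-relu_opt-adamax_seed27")
print(parsed["version"], parsed["kernel_size"], parsed["dropout"], parsed["seed"])
# step_2 5 0.3 27
```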

---

# Set Directories

## Set Working Directory

In [32]:
# Parent directory
parent_dir =  "/Users/marcelldemeter/GIT/CodeInstitute/ci-p5-mildew-detector"

# Change working directory to parent directory
os.chdir(parent_dir)
print (f"New working directory: {os.getcwd()} ")

New working directory: /Users/marcelldemeter/GIT/CodeInstitute/ci-p5-mildew-detector 


## Set Input Directory

In [33]:
dataset_dir = "inputs/mildew-dataset/cherry-leaves"
train_dir = os.path.join(parent_dir, dataset_dir, "train")
validation_dir = os.path.join(parent_dir, dataset_dir, "validation")
test_dir = os.path.join(parent_dir, dataset_dir, "test")

## Set Output Directory

In [34]:

file_path = f'outputs/{version}'

# Create the main output directory if it doesn't exist
if os.path.isdir(os.path.join(parent_dir, file_path)):
    print('Old version is already available create a new version.')
else:
    os.makedirs(name=file_path)

# Define subfolders and ensure they exist:
models_dir = os.path.join(file_path, 'models')
reports_dir = os.path.join(file_path, 'reports')

os.makedirs(models_dir, exist_ok=True)
os.makedirs(reports_dir, exist_ok=True)


Old version is already available create a new version.
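
The directory setup above can also be written so that it is safe to re-run. The sketch below uses `pathlib` and `mkdir(parents=True, exist_ok=True)`. The helper name `ensure_output_dirs` is hypothetical, and the demo runs in a temporary folder rather than the project directory.

```python
import tempfile
from pathlib import Path


def ensure_output_dirs(base_dir, version):
    """Create outputs/<version>/{models,reports} and report whether the version folder already existed."""
    root = Path(base_dir) / "outputs" / version
    existed_before = root.is_dir()
    dirs = {"root": root, "models": root / "models", "reports": root / "reports"}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)  # no error if the folder is already there
    return dirs, existed_before


# Demo in a throwaway directory: the second call finds the folders from the first.
with tempfile.TemporaryDirectory() as tmp:
    dirs, existed_first = ensure_output_dirs(tmp, "step_2")
    _, existed_second = ensure_output_dirs(tmp, "step_2")
    created = sorted(p.name for p in dirs["root"].iterdir())

print(existed_first, existed_second, created)  # False True ['models', 'reports']
```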


### Get Artifact Paths:

In [35]:
def get_artifact_paths(models_dir: str, reports_dir: str, tag: str) -> dict:
    """
    Return file paths (no new dirs) for this run's artifacts, using the tag in filenames.
    """
    return {
        "model_path":   Path(models_dir)  / f"{tag}.keras",
        "history_pkl":  Path(reports_dir) / f"history_{tag}.pkl",
        "evaluation_pkl": Path(reports_dir) / f"eval_{tag}.pkl",
        "curve_png":    Path(reports_dir) / f"curves_{tag}.png",
    }
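
A dictionary of artifact paths like the one returned above also makes it easy to check whether a run has already produced its files, for example before deciding to retrain. The sketch below assumes a hypothetical helper `missing_artifacts` and builds the path dictionary by hand in a temporary folder so that it runs on its own.

```python
import tempfile
from pathlib import Path


def missing_artifacts(paths: dict) -> list:
    """Return the (sorted) keys whose files do not exist yet."""
    return sorted(name for name, path in paths.items() if not Path(path).exists())


with tempfile.TemporaryDirectory() as tmp:
    tag = "step_2_bs32_k3_do0.3_act-relu_opt-adamax_seed27"
    # Same keys and file-name patterns as get_artifact_paths()
    paths = {
        "model_path": Path(tmp) / f"{tag}.keras",
        "history_pkl": Path(tmp) / f"history_{tag}.pkl",
        "evaluation_pkl": Path(tmp) / f"eval_{tag}.pkl",
        "curve_png": Path(tmp) / f"curves_{tag}.png",
    }
    # Pretend the model and history were saved, but not the evaluation or plot.
    paths["model_path"].touch()
    paths["history_pkl"].touch()
    still_missing = missing_artifacts(paths)

print(still_missing)  # ['curve_png', 'evaluation_pkl']
```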

---

## Load Top 5 Scenarios from the Batch Size 16 Run

In [36]:
prev_version = "step_1"  # adjust to your step_1 folder name
df_prev = pd.read_csv(f"outputs/{prev_version}/reports/grid_report_bs16.csv")

# same ranking rule as before
df_prev = df_prev.sort_values(by=["val_accuracy", "val_loss"], ascending=[False, True]).reset_index(drop=True)

TOP_N = 5 

## Set Scenarios:

In [37]:
SCENARIOS = df_prev.head(TOP_N)[["dropout","kernel_size","activation","optimizer"]].to_dict(orient="records")
print(len(SCENARIOS), SCENARIOS[:2])

5 [{'dropout': 0.3, 'kernel_size': 5, 'activation': 'relu', 'optimizer': 'adamax'}, {'dropout': 0.3, 'kernel_size': 3, 'activation': 'elu', 'optimizer': 'adamax'}]


---

## Set Labels

In [38]:
# Set the labels
labels = os.listdir(train_dir)
print('Labels for the images are', labels)

Labels for the images are ['powdery_mildew', 'healthy']


## Set image shape

In [39]:
# Import the saved image shape embedding (set in a previous run)
image_shape = joblib.load(filename="outputs/v1/image_shape.pkl")
image_shape

(256, 256, 3)

---

# Image Data Augmentation

### Image Data Generator

In [40]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

### Initialize ImageDataGenerator

In [41]:
augmented_image_data = ImageDataGenerator(rotation_range=20,
                                          width_shift_range=0.10,
                                          height_shift_range=0.10,
                                          shear_range=0.1,
                                          zoom_range=0.1,
                                          horizontal_flip=True,
                                          vertical_flip=True,
                                          fill_mode='nearest',
                                          rescale=1./255
                                          )


### Augment image datasets

In [42]:
# Train Set:
train_set = augmented_image_data.flow_from_directory(train_dir,
                                                     target_size=image_shape[:2],
                                                     color_mode='rgb',
                                                     batch_size=batch_size,
                                                     class_mode='binary',
                                                     shuffle=True
                                                     )

# Validation Set:
validation_set = ImageDataGenerator(rescale=1./255).flow_from_directory(validation_dir,
                                                                        target_size=image_shape[:2],
                                                                        color_mode='rgb',
                                                                        batch_size=batch_size,
                                                                        class_mode='binary',
                                                                        shuffle=False
                                                                        )

# Test Set:
test_set = ImageDataGenerator(rescale=1./255).flow_from_directory(test_dir,
                                                                  target_size=image_shape[:2],
                                                                  color_mode='rgb',
                                                                  batch_size=batch_size,
                                                                  class_mode='binary',
                                                                  shuffle=False
                                                                  )


train_set.class_indices
validation_set.class_indices
test_set.class_indices

Found 2944 images belonging to 2 classes.
Found 420 images belonging to 2 classes.
Found 844 images belonging to 2 classes.


{'healthy': 0, 'powdery_mildew': 1}
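
The image counts reported above determine how many batches each generator yields per pass. The arithmetic can be checked with a short sketch. The helper name `steps_per_pass` is hypothetical; the counts and batch size are the ones shown in this notebook.

```python
import math

BATCH_SIZE = 32  # same value as `batch_size` in this notebook

# Image counts reported by flow_from_directory above
image_counts = {"train": 2944, "validation": 420, "test": 844}


def steps_per_pass(n_images: int, batch_size: int) -> int:
    """Batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(n_images / batch_size)


steps = {name: steps_per_pass(n, BATCH_SIZE) for name, n in image_counts.items()}
# Size of the final batch: the remainder, or a full batch when it divides evenly.
last_batch = {name: (n % BATCH_SIZE) or BATCH_SIZE for name, n in image_counts.items()}

print(steps)       # {'train': 92, 'validation': 14, 'test': 27}
print(last_batch)  # {'train': 32, 'validation': 4, 'test': 12}
```

The 92 training steps match the `92/92` counter in the training logs further down.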

### Save class_indices

In [43]:
joblib.dump(value=train_set.class_indices,
            filename=f"{file_path}/class_indices.pkl")

['outputs/step_2/class_indices.pkl']

---

# Model Creation

## ML Model

### Model Builder:

In [44]:

def build_model(cfg):
    """
    CNN builder using the scenario dict `cfg`.
    Expects keys: 'kernel_size', 'dropout', 'activation'.
    Uses global `image_shape`.
    """
    k   = int(cfg["kernel_size"])
    do  = float(cfg["dropout"])
    act = str(cfg["activation"])

    m = Sequential(name=f"cnn_k{k}_do{do}_{act}")
    m.add(Conv2D(32, (k, k), padding='same', activation=act, input_shape=image_shape))
    m.add(BatchNormalization()); m.add(MaxPooling2D((2,2)))

    m.add(Conv2D(64, (k, k), padding='same', activation=act))
    m.add(BatchNormalization()); m.add(MaxPooling2D((2,2)))

    m.add(Conv2D(64, (k, k), padding='same', activation=act))
    m.add(BatchNormalization()); m.add(MaxPooling2D((2,2)))

    m.add(Flatten())
    m.add(Dropout(do))
    m.add(Dense(64, activation=act))
    m.add(Dropout(do))
    m.add(Dense(1, activation='sigmoid'))
    return m
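
To get a feel for the size of this architecture without building it, the parameter count can be worked out by hand. The sketch below is a plain-Python calculation, not a Keras call. It assumes the layer stack shown above (three `same`-padded convolutions with 32, 64 and 64 filters, each followed by batch normalisation and 2x2 max pooling) and counts four values per channel for each batch-normalisation layer, including the non-trainable moving statistics. The function names are hypothetical.

```python
def conv_params(kernel_size: int, channels_in: int, filters: int) -> int:
    """Weights plus one bias per filter."""
    return kernel_size * kernel_size * channels_in * filters + filters


def batchnorm_params(channels: int) -> int:
    """gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels


def dense_params(units_in: int, units_out: int) -> int:
    return units_in * units_out + units_out


def count_params(kernel_size: int, image_shape=(256, 256, 3)) -> dict:
    height, width, channels = image_shape
    total = 0
    for filters in (32, 64, 64):
        total += conv_params(kernel_size, channels, filters) + batchnorm_params(filters)
        channels = filters
        # 'same' padding keeps the spatial size; 2x2 max pooling halves it
        height, width = height // 2, width // 2
    flatten_units = height * width * channels
    first_dense = dense_params(flatten_units, 64)
    total += first_dense + dense_params(64, 1)
    return {"flatten_units": flatten_units, "first_dense": first_dense, "total": total}


print(count_params(3))  # {'flatten_units': 65536, 'first_dense': 4194368, 'total': 4251393}
print(count_params(5))  # {'flatten_units': 65536, 'first_dense': 4194368, 'total': 4351233}
```

Under these assumptions the first dense layer accounts for almost all parameters, so changing the kernel size from 3 to 5 changes the total only slightly.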


### Save History Plot

In [45]:
def save_history_plot(history, out_png):
    h = history.history
    plt.figure(figsize=(8,5))
    if "accuracy" in h and "val_accuracy" in h:
        plt.plot(h["accuracy"], label="train_acc")
        plt.plot(h["val_accuracy"], label="val_acc")
    if "loss" in h and "val_loss" in h:
        plt.plot(h["loss"], label="train_loss")
        plt.plot(h["val_loss"], label="val_loss")
    plt.xlabel("Epoch"); plt.ylabel("Value"); plt.legend(); plt.tight_layout()
    plt.savefig(out_png, dpi=120); plt.close()
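
Alongside the plot, the same history dictionary can be condensed into a few numbers. The sketch below assumes a hypothetical helper `summarise_history` and uses the first four epochs of the `k3_do0.3_act-elu` run from the training logs as sample input. Ties on validation accuracy are broken by lower validation loss, which is a choice of this sketch; the callbacks used in this notebook keep the first epoch that reaches the best value.

```python
def summarise_history(history: dict) -> dict:
    """Pick the epoch with the best val_accuracy (ties: lowest val_loss) and report key values."""
    val_acc = history["val_accuracy"]
    val_loss = history["val_loss"]
    best = max(range(len(val_acc)), key=lambda i: (val_acc[i], -val_loss[i]))
    return {
        "epochs_run": len(val_acc),
        "best_epoch": best + 1,  # 1-based, as in the Keras logs
        "best_val_accuracy": val_acc[best],
        "val_loss_at_best": val_loss[best],
        "train_val_gap": history["accuracy"][best] - val_acc[best],
    }


# First four epochs of one run, copied from the training logs
sample_history = {
    "accuracy": [0.9575, 0.9847, 0.9878, 0.9932],
    "loss": [0.6087, 0.0930, 0.0359, 0.0201],
    "val_accuracy": [0.7024, 0.5190, 0.5976, 0.8024],
    "val_loss": [3.9012, 14.6210, 7.4959, 1.7244],
}

summary = summarise_history(sample_history)
print(summary["best_epoch"], summary["best_val_accuracy"], summary["val_loss_at_best"])
# 4 0.8024 1.7244
```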

### Train and Evaluate Model

In [46]:
def train_and_eval(cfg: dict):
    # tag + artifact paths
    tag = run_tag(cfg, version, batch_size, seed=SEED)
    P = get_artifact_paths(models_dir, reports_dir, tag)

    # build & compile model
    model = build_model(cfg)
    model.compile(
        optimizer=cfg["optimizer"],
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.BinaryAccuracy(name="accuracy")]
    )

    # callbacks (save best by val_accuracy)
    cbs = [
        # Stop training if val_accuracy doesn't improve for 4 epochs, roll back to best weights:
        EarlyStopping(monitor="val_accuracy", mode="max", patience=4, restore_best_weights=True, verbose=1),
        # Keep the best model:
        ModelCheckpoint(filepath=str(P["model_path"]), monitor="val_accuracy", mode="max", save_best_only=True, verbose=1),
    ]

    # Fit Model (batch size is already set on the generators, so it is not passed to fit)
    history = model.fit(
        train_set,
        epochs=epochs,
        validation_data=validation_set,
        callbacks=cbs,
        verbose=2
    )

    # Save history + curves
    joblib.dump(history.history, P["history_pkl"])
    save_history_plot(history, P["curve_png"])

    # Evaluate on validation: get probabilities, true labels, and hard preds
    validation_set.reset()                                      # start from first batch, keep order aligned
    y_prob = model.predict(validation_set, verbose=0).squeeze()  # (N,)
    y_true = validation_set.classes                              # (N,)
    y_pred = (y_prob >= 0.5).astype(int)                         # threshold @ 0.5

    # Capture val_loss and Keras accuracy for the same validation set
    validation_set.reset()                                       # reset again before evaluate
    val_loss, val_acc = model.evaluate(validation_set, verbose=0) # loss & accuracy from the compiled metrics

    # Compute metrics
    prec = precision_score(y_true, y_pred, zero_division=0)
    rec  = recall_score(y_true, y_pred, zero_division=0)
    f1   = f1_score(y_true, y_pred, zero_division=0)
    cm   = confusion_matrix(y_true, y_pred).tolist()

    # --- Summary, CSV-ready ---
    out = {
        "tag": tag,                          # Unique ID
        "version": version,
        "batch_size": batch_size,
        **cfg,                               # includes dropout, kernel_size, activation, optimizer
        "epochs_trained": len(history.history["loss"]),
        "val_loss": float(val_loss),
        "val_accuracy": float(val_acc),
        "val_precision": float(prec),
        "val_recall": float(rec),
        "val_f1": float(f1),
        "confusion_matrix": cm,
    }

    # Save Evaluation Pickle
    joblib.dump(out, P["evaluation_pkl"])

    # Hand back the summary so the caller can append to a list/DataFrame
    return out
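
The precision, recall and F1 values stored above follow directly from the confusion matrix. The sketch below recomputes them by hand, assuming the scikit-learn layout `[[TN, FP], [FN, TP]]` with `powdery_mildew` (label 1) as the positive class. The helper name `binary_metrics` is hypothetical, and the sample matrix is one of those reported in the results table of this notebook.

```python
def binary_metrics(cm) -> dict:
    """Accuracy, precision, recall and F1 from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # same fallback as zero_division=0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = binary_metrics([[210, 0], [1, 209]])
print({name: round(value, 6) for name, value in metrics.items()})
# {'accuracy': 0.997619, 'precision': 1.0, 'recall': 0.995238, 'f1': 0.997613}
```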


### Execute Scenarios:

In [47]:
results = []
t0 = time.time()
for i, cfg in enumerate(SCENARIOS, 1):
    print(f"\n=== [{i}/{len(SCENARIOS)}] {cfg} ===")
    res = train_and_eval(cfg)          # uses your build_model + callbacks + eval
    results.append(res)

print(f"\nCompleted {len(SCENARIOS)} runs in ~{(time.time()-t0)/60:.1f} min")


=== [1/5] {'dropout': 0.3, 'kernel_size': 5, 'activation': 'relu', 'optimizer': 'adamax'} ===


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
2025-10-12 03:41:48.339703: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M3 Pro
2025-10-12 03:41:48.339741: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 18.00 GB
2025-10-12 03:41:48.339750: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 6.66 GB
2025-10-12 03:41:48.339768: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2025-10-12 03:41:48.339781: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
  self._warn_if_super_not_called()


Epoch 1/25


2025-10-12 03:41:49.296934: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:117] Plugin optimizer for device_type GPU is enabled.



Epoch 1: val_accuracy improved from None to 0.50000, saving model to outputs/step_2/models/step_2_bs32_k5_do0.3_act-relu_opt-adamax_seed27.keras
92/92 - 25s - 271ms/step - accuracy: 0.9667 - loss: 0.9076 - val_accuracy: 0.5000 - val_loss: 31.0168
Epoch 2/25

Epoch 2: val_accuracy did not improve from 0.50000
92/92 - 22s - 235ms/step - accuracy: 0.9891 - loss: 0.2457 - val_accuracy: 0.5000 - val_loss: 50.6415
Epoch 3/25

Epoch 3: val_accuracy did not improve from 0.50000
92/92 - 21s - 232ms/step - accuracy: 0.9874 - loss: 0.2806 - val_accuracy: 0.5000 - val_loss: 47.4519
Epoch 4/25

Epoch 4: val_accuracy did not improve from 0.50000
92/92 - 21s - 232ms/step - accuracy: 0.9932 - loss: 0.1790 - val_accuracy: 0.5000 - val_loss: 53.2405
Epoch 5/25

Epoch 5: val_accuracy improved from 0.50000 to 0.89762, saving model to outputs/step_2/models/step_2_bs32_k5_do0.3_act-relu_opt-adamax_seed27.keras
92/92 - 21s - 233ms/step - accuracy: 0.9895 - loss: 0.3636 - val_accuracy: 0.8976 - val_loss: 1.4

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/25

Epoch 1: val_accuracy improved from None to 0.70238, saving model to outputs/step_2/models/step_2_bs32_k3_do0.3_act-elu_opt-adamax_seed27.keras
92/92 - 30s - 331ms/step - accuracy: 0.9575 - loss: 0.6087 - val_accuracy: 0.7024 - val_loss: 3.9012
Epoch 2/25

Epoch 2: val_accuracy did not improve from 0.70238
92/92 - 30s - 322ms/step - accuracy: 0.9847 - loss: 0.0930 - val_accuracy: 0.5190 - val_loss: 14.6210
Epoch 3/25

Epoch 3: val_accuracy did not improve from 0.70238
92/92 - 29s - 317ms/step - accuracy: 0.9878 - loss: 0.0359 - val_accuracy: 0.5976 - val_loss: 7.4959
Epoch 4/25

Epoch 4: val_accuracy improved from 0.70238 to 0.80238, saving model to outputs/step_2/models/step_2_bs32_k3_do0.3_act-elu_opt-adamax_seed27.keras
92/92 - 34s - 373ms/step - accuracy: 0.9932 - loss: 0.0201 - val_accuracy: 0.8024 - val_loss: 1.7244
Epoch 5/25

Epoch 5: val_accuracy improved from 0.80238 to 0.98810, saving model to outputs/step_2/models/step_2_bs32_k3_do0.3_act-elu_opt-adamax_seed27.k

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/25

Epoch 1: val_accuracy improved from None to 0.50000, saving model to outputs/step_2/models/step_2_bs32_k5_do0.5_act-elu_opt-adamax_seed27.keras
92/92 - 36s - 393ms/step - accuracy: 0.9592 - loss: 0.3779 - val_accuracy: 0.5000 - val_loss: 28.6869
Epoch 2/25

Epoch 2: val_accuracy did not improve from 0.50000
92/92 - 33s - 359ms/step - accuracy: 0.9738 - loss: 0.0836 - val_accuracy: 0.5000 - val_loss: 45.7233
Epoch 3/25

Epoch 3: val_accuracy did not improve from 0.50000
92/92 - 38s - 418ms/step - accuracy: 0.9817 - loss: 0.0675 - val_accuracy: 0.5000 - val_loss: 23.2244
Epoch 4/25

Epoch 4: val_accuracy did not improve from 0.50000
92/92 - 38s - 417ms/step - accuracy: 0.9840 - loss: 0.0602 - val_accuracy: 0.5000 - val_loss: 22.6320
Epoch 5/25

Epoch 5: val_accuracy improved from 0.50000 to 0.94286, saving model to outputs/step_2/models/step_2_bs32_k5_do0.5_act-elu_opt-adamax_seed27.keras
92/92 - 40s - 435ms/step - accuracy: 0.9864 - loss: 0.0285 - val_accuracy: 0.9429 - val_

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/25

Epoch 1: val_accuracy improved from None to 0.50000, saving model to outputs/step_2/models/step_2_bs32_k5_do0.3_act-elu_opt-adamax_seed27.keras
92/92 - 34s - 369ms/step - accuracy: 0.9606 - loss: 0.4107 - val_accuracy: 0.5000 - val_loss: 37.0386
Epoch 2/25

Epoch 2: val_accuracy did not improve from 0.50000
92/92 - 31s - 333ms/step - accuracy: 0.9878 - loss: 0.0438 - val_accuracy: 0.5000 - val_loss: 47.6419
Epoch 3/25

Epoch 3: val_accuracy improved from 0.50000 to 0.51429, saving model to outputs/step_2/models/step_2_bs32_k5_do0.3_act-elu_opt-adamax_seed27.keras
92/92 - 35s - 382ms/step - accuracy: 0.9885 - loss: 0.0428 - val_accuracy: 0.5143 - val_loss: 24.7974
Epoch 4/25

Epoch 4: val_accuracy improved from 0.51429 to 0.52857, saving model to outputs/step_2/models/step_2_bs32_k5_do0.3_act-elu_opt-adamax_seed27.keras
92/92 - 38s - 408ms/step - accuracy: 0.9932 - loss: 0.0300 - val_accuracy: 0.5286 - val_loss: 14.5665
Epoch 5/25

Epoch 5: val_accuracy improved from 0.52857

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/25

Epoch 1: val_accuracy improved from None to 0.50000, saving model to outputs/step_2/models/step_2_bs32_k3_do0.3_act-relu_opt-adamax_seed27.keras
92/92 - 18s - 201ms/step - accuracy: 0.9711 - loss: 2.3476 - val_accuracy: 0.5000 - val_loss: 60.8139
Epoch 2/25

Epoch 2: val_accuracy did not improve from 0.50000
92/92 - 17s - 183ms/step - accuracy: 0.9885 - loss: 0.8165 - val_accuracy: 0.5000 - val_loss: 70.1090
Epoch 3/25

Epoch 3: val_accuracy did not improve from 0.50000
92/92 - 17s - 184ms/step - accuracy: 0.9925 - loss: 0.3912 - val_accuracy: 0.5000 - val_loss: 60.3915
Epoch 4/25

Epoch 4: val_accuracy did not improve from 0.50000
92/92 - 17s - 183ms/step - accuracy: 0.9922 - loss: 0.4023 - val_accuracy: 0.5000 - val_loss: 51.9764
Epoch 5/25

Epoch 5: val_accuracy improved from 0.50000 to 0.78810, saving model to outputs/step_2/models/step_2_bs32_k3_do0.3_act-relu_opt-adamax_seed27.keras
92/92 - 17s - 183ms/step - accuracy: 0.9935 - loss: 0.2796 - val_accuracy: 0.7881 - va
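
In the logs above, several runs sit at a validation accuracy of 0.5000 for the first four epochs and only improve at epoch 5. With `patience=4` that is the last epoch at which an improvement could still arrive. The sketch below is a simplified model of the patience counter, not the Keras implementation: it ignores options such as `min_delta` and `baseline`, and the function name `simulate_early_stopping` is hypothetical.

```python
def simulate_early_stopping(val_accuracy, patience=4):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(val_accuracy, start=1):
        if value > best:          # strict improvement resets the counter
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:  # `patience` epochs in a row without improvement
                return epoch
    return None


# Validation accuracy of the first run, epochs 1 to 5, from the logs above
print(simulate_early_stopping([0.5, 0.5, 0.5, 0.5, 0.8976]))  # None

# Hypothetical variant where epoch 5 also stays at 0.5
print(simulate_early_stopping([0.5, 0.5, 0.5, 0.5, 0.5]))     # 5
```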

### Create Dataframe on results and save to .csv

In [48]:
df = pd.DataFrame(results).sort_values(
    by=["val_accuracy", "val_loss"], ascending=[False, True] # Sort by highest accuracy, then lowest loss
).reset_index(drop=True)

report_csv = os.path.join(reports_dir, f"grid_report_bs{batch_size}.csv")
df.to_csv(report_csv, index=False)
print(f"Saved report to: {report_csv}")

Saved report to: outputs/step_2/reports/grid_report_bs32.csv
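
One detail to keep in mind when reading this report back: the `confusion_matrix` column holds Python lists, which are written to CSV as text. The sketch below shows one way to turn that text back into nested lists with `ast.literal_eval`. It uses the standard-library `csv` module and an inline two-column excerpt instead of the real file, so the CSV text here is illustrative.

```python
import ast
import csv
import io

# Inline excerpt in the same shape as the saved report (tag and matrix taken from this notebook)
csv_text = (
    "tag,confusion_matrix\n"
    'step_2_bs32_k5_do0.3_act-relu_opt-adamax_seed27,"[[210, 0], [1, 209]]"\n'
)

rows = list(csv.DictReader(io.StringIO(csv_text)))
raw_value = rows[0]["confusion_matrix"]
matrix = ast.literal_eval(raw_value)  # safely parses the list literal

print(type(raw_value).__name__, type(matrix).__name__)  # str list
print(matrix[1][0])                                      # 1 (false negatives)
```

With pandas, the same conversion could be applied to the whole column after `read_csv`, for example with `df["confusion_matrix"].apply(ast.literal_eval)`.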


### See Results

In [49]:
df.head(5) # Display top 5 results

|   | tag | version | batch_size | dropout | kernel_size | activation | optimizer | epochs_trained | val_loss | val_accuracy | val_precision | val_recall | val_f1 | confusion_matrix |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | step_2_bs32_k3_do0.3_act-relu_opt-adamax_seed27 | step_2 | 32 | 0.3 | 3 | relu | adamax | 14 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | [[210, 0], [0, 210]] |
| 1 | step_2_bs32_k3_do0.3_act-elu_opt-adamax_seed27 | step_2 | 32 | 0.3 | 3 | elu | adamax | 12 | 0.001563 | 1.0 | 1.0 | 1.0 | 1.0 | [[210, 0], [0, 210]] |
| 2 | step_2_bs32_k5_do0.5_act-elu_opt-adamax_seed27 | step_2 | 32 | 0.5 | 5 | elu | adamax | 16 | 0.002002 | 1.0 | 1.0 | 1.0 | 1.0 | [[210, 0], [0, 210]] |
| 3 | step_2_bs32_k5_do0.3_act-relu_opt-adamax_seed27 | step_2 | 32 | 0.3 | 5 | relu | adamax | 11 | 0.242814 | 0.997619 | 1.0 | 0.995238 | 0.997613 | [[210, 0], [1, 209]] |
| 4 | step_2_bs32_k5_do0.3_act-elu_opt-adamax_seed27 | step_2 | 32 | 0.3 | 5 | elu | adamax | 11 | 0.015024 | 0.995238 | 1.0 | 0.990476 | 0.995215 | [[210, 0], [2, 208]] |
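
The ranking rule (highest validation accuracy first, lowest validation loss as tie-break) and the top-3 shortlist named in the Outputs section can be sketched in plain Python. The helper name `shortlist` is hypothetical. The rows are the five results shown above, deliberately listed out of order to show the sort at work.

```python
# (tag, val_accuracy, val_loss) for the five runs reported above
leaderboard = [
    {"tag": "step_2_bs32_k5_do0.3_act-elu_opt-adamax_seed27", "val_accuracy": 0.995238, "val_loss": 0.015024},
    {"tag": "step_2_bs32_k5_do0.5_act-elu_opt-adamax_seed27", "val_accuracy": 1.0, "val_loss": 0.002002},
    {"tag": "step_2_bs32_k3_do0.3_act-relu_opt-adamax_seed27", "val_accuracy": 1.0, "val_loss": 0.0},
    {"tag": "step_2_bs32_k5_do0.3_act-relu_opt-adamax_seed27", "val_accuracy": 0.997619, "val_loss": 0.242814},
    {"tag": "step_2_bs32_k3_do0.3_act-elu_opt-adamax_seed27", "val_accuracy": 1.0, "val_loss": 0.001563},
]


def shortlist(rows, top_n=3):
    """Rank by val_accuracy (descending), then val_loss (ascending), and keep the first top_n tags."""
    ranked = sorted(rows, key=lambda row: (-row["val_accuracy"], row["val_loss"]))
    return [row["tag"] for row in ranked[:top_n]]


top_three = shortlist(leaderboard)
for tag in top_three:
    print(tag)
# step_2_bs32_k3_do0.3_act-relu_opt-adamax_seed27
# step_2_bs32_k3_do0.3_act-elu_opt-adamax_seed27
# step_2_bs32_k5_do0.5_act-elu_opt-adamax_seed27
```

All three shortlisted runs reach a validation accuracy of 1.0, so their order here is decided entirely by validation loss.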
