# Step 4. Model Evaluation & Metrics

In this step, we evaluate and compare our trained models (custom CNN and MobileNetV2) to assess whether they support Hypothesis 1:
> *"A convolutional neural network (CNN) can accurately classify fruit images into 8 classes."*

We perform the following evaluations:

1. **Classification Report**
   - Computes precision, recall, and F1-score for each fruit class.
   - Helps us understand how well each model predicts individual categories.

2. **Confusion Matrix**
   - A matrix showing how often classes are correctly or incorrectly predicted.
   - Reveals systematic misclassifications (e.g., between visually similar fruits).

3. **Overall Accuracy**
   - The percentage of correctly predicted images in the test set.
   - Provides a straightforward benchmark for comparing models.

4. **Learning Curves**
   - Plots of training loss and validation accuracy over epochs.
   - Allow us to compare convergence speed and detect overfitting/underfitting.

These metrics provide insight into the **strengths and weaknesses of each approach** (lightweight custom CNN vs. pretrained MobileNetV2), and allow us to determine whether the models are suitable for real-world use or require further improvement.
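
To make the first three metrics concrete, here is a minimal, dependency-free sketch of what they compute. In practice, scikit-learn's `classification_report` and `confusion_matrix` would typically be used instead. The helper names and the toy labels below are illustrative assumptions, not part of the project code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_metrics(matrix):
    """Return one (precision, recall, f1) tuple per class."""
    n = len(matrix)
    metrics = []
    for c in range(n):
        tp = matrix[c][c]
        predicted_c = sum(matrix[r][c] for r in range(n))  # column sum
        actual_c = sum(matrix[c])                          # row sum
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up labels for a 3-class toy problem (the real task has 8 classes).
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
metrics = per_class_metrics(cm)
overall = accuracy(cm)

for cls, (p, r, f1) in enumerate(metrics):
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={overall:.2f}")
```

Off-diagonal entries of `cm` are the systematic confusions mentioned above: in the toy data, one class-2 sample is predicted as class 0, which lowers recall for class 2 and precision for class 0.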


### 4.1 Training MobileNetV2 Variants
Here we fine-tune MobileNetV2 under four conditions:
1. Grayscale + MaxPool  
2. Grayscale + AdaptiveAvgPool  
3. RGB with Noise (augmentation)  
4. RGB Clean (no augmentation)  

The goal is to compare these transfer learning results with our custom CNNs from Step 3.
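
The difference between the two pooling variants can be illustrated on a single feature map. This is only a sketch: it assumes both variants reduce each channel to one value (as `nn.AdaptiveMaxPool2d(1)` and `nn.AdaptiveAvgPool2d(1)` would in PyTorch), which may differ from what `mobileNetV2compare` actually does. The function names and the toy feature map are hypothetical.

```python
def global_max_pool(feature_map):
    """Keep only the strongest activation in the map."""
    return max(max(row) for row in feature_map)


def global_avg_pool(feature_map):
    """Average all activations in the map."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


# Toy 3x3 feature map with a single strong, localized activation.
fmap = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]

max_value = global_max_pool(fmap)
avg_value = global_avg_pool(fmap)
print(max_value, avg_value)
```

Max pooling preserves a single sharp response regardless of where it occurs, while average pooling dilutes it across the whole map. Which behaviour helps more on grayscale fruit images is what the first two conditions are meant to test.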


In [6]:
import os
import sys

# Make the project's src/ directory importable from the notebook.
sys.path.append(os.path.abspath("../src"))
from models.mobileNetV2compare import (
    TrainConfig,
    run_mobilenet_v2_experiment,
)


In [None]:
# TRAIN_DIR and TEST_DIR are assumed to be defined in an earlier cell.
# Each entry: (experiment name, checkpoint path, batch size, variant-specific arguments).
experiments = [
    ("MobileNetV2 Grayscale MaxPool", "../experiments/runs/mnv2_gray_max.pt", 8,
     dict(input_type="grayscale", pooling="max")),
    ("MobileNetV2 Grayscale Adaptive", "../experiments/runs/mnv2_gray_adapt.pt", 8,
     dict(input_type="grayscale", pooling="adaptive")),
    ("MobileNetV2 RGB With Noise", "../experiments/runs/mnv2_rgb_noise.pt", 16,
     dict(input_type="rgb", use_noise=True)),
    ("MobileNetV2 RGB Clean", "../experiments/runs/mnv2_rgb_clean.pt", 16,
     dict(input_type="rgb", use_noise=False)),
]

# Store every run so the variants can be compared afterwards
# (previously each run overwrote the results of the one before it).
results = {}
for name, ckpt_path, batch_size, variant_kwargs in experiments:
    cfg = TrainConfig(epochs=10, lr=1e-3, batch_size=batch_size, ckpt_path=ckpt_path)
    test_acc, model, test_loader, hist = run_mobilenet_v2_experiment(
        train_dir=TRAIN_DIR, test_dir=TEST_DIR, num_classes=8,
        cfg=cfg, experiment_name=name, **variant_kwargs
    )
    results[name] = {"test_acc": test_acc, "model": model,
                     "test_loader": test_loader, "hist": hist}
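
Each run returns a training history (`hist`) that can feed the learning-curve comparison. Its structure is not shown in this notebook, so the sketch below assumes a dict with `"train_loss"` and `"val_acc"` lists (one value per epoch). Those key names, the helper `summarize_history`, and the numbers are all hypothetical.

```python
def summarize_history(hist):
    """Summarize a run: best validation epoch and a simple overfitting signal."""
    val_acc = hist["val_acc"]
    train_loss = hist["train_loss"]
    best_idx = max(range(len(val_acc)), key=lambda i: val_acc[i])
    # Overfitting signal: after the best epoch, training loss keeps falling
    # while validation accuracy ends lower than its peak.
    loss_still_falling = train_loss[-1] < train_loss[best_idx]
    val_dropped = val_acc[-1] < val_acc[best_idx]
    return {
        "best_epoch": best_idx + 1,  # 1-indexed for readability
        "best_val_acc": val_acc[best_idx],
        "possible_overfitting": loss_still_falling and val_dropped,
    }


# Made-up history for illustration only.
toy_hist = {
    "train_loss": [1.2, 0.8, 0.5, 0.3, 0.2],
    "val_acc": [0.60, 0.72, 0.80, 0.78, 0.77],
}

summary = summarize_history(toy_hist)
print(summary)
```

A run that reaches its best validation accuracy early and then degrades while training loss keeps falling is a candidate for early stopping or stronger regularization. Plotting both curves remains the more informative check.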
