# SakuraScan – Modelling (PyTorch)

## Objectives
- Train a binary image classifier to distinguish healthy cherry leaves from leaves with powdery mildew.
- Use transfer learning with a pretrained CNN (ResNet18) instead of training a network from scratch.
- Save the trained model for use in the SakuraScan Streamlit dashboard.

## Inputs
- Image dataset stored in `Data/source_images/healthy` and `Data/source_images/powdery_mildew`.

## Outputs
- Trained PyTorch model weights saved to `app_pages/src/models/sakuramodel_resnet18.pth`.
- Printed training and validation accuracy and loss.
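
The input layout above is the one-subfolder-per-class convention that `ImageFolder` reads. Before training it can help to check how balanced the two classes are. The following is a minimal sketch using only the standard library; the helper name `count_images_per_class` and the extension set are assumptions for illustration, not part of the notebook.

```python
from pathlib import Path
from typing import Dict

# Hypothetical extension set; adjust to the files actually present.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_class(data_dir: Path) -> Dict[str, int]:
    """Count image files in each immediate subfolder of data_dir."""
    counts: Dict[str, int] = {}
    for class_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Calling `count_images_per_class(Path("Data/source_images"))` from the project root would return a dictionary keyed by folder name. If one class dominates, accuracy alone may be a misleading metric.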


In [5]:
"""
Model training script for SakuraScan using PyTorch and transfer learning.
"""

from pathlib import Path
import os
from typing import Tuple, Dict, List

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms, models

In [9]:
"""
Set up paths, constants, and device configuration.
"""

from pathlib import Path

# Project root is the parent of the current working directory
# (assumes the notebook is run from inside the notebooks directory)
PROJECT_ROOT = Path("..").resolve()

DATA_DIR = PROJECT_ROOT / "Data" / "source_images"
MODEL_DIR = PROJECT_ROOT / "app_pages" / "src" / "models"
MODEL_DIR.mkdir(parents=True, exist_ok=True)

MODEL_PATH = MODEL_DIR / "sakuramodel_resnet18.pth"

BATCH_SIZE = 32  # Number of images processed in one training step.
NUM_EPOCHS = 8  # How many full passes the model makes over the entire training dataset.
LEARNING_RATE = 1e-4  # Controls how big the weight updates are during training.
VAL_SPLIT = 0.2  # Fraction of the dataset reserved for validation to evaluate model performance.
IMAGE_SIZE = 224  # Target resolution for all input images (ImageNet-pretrained ResNets are typically trained at 224×224).

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device


device(type='cpu')
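
To make the constants above concrete: `VAL_SPLIT` and `BATCH_SIZE` together determine how many images land in each split and how many optimisation steps one epoch takes. Below is a small sketch of that arithmetic. The dataset size of 1,000 is a made-up figure, and the rounding rule (validation size truncated, remainder assigned to training) is an assumption about how the split is computed later.

```python
import math
from typing import Tuple


def split_sizes(n_images: int, val_split: float) -> Tuple[int, int]:
    """Return (train_size, val_size); val is truncated, train gets the rest."""
    val_size = int(n_images * val_split)
    return n_images - val_size, val_size


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches when the last, possibly smaller, batch is kept."""
    return math.ceil(n_samples / batch_size)


train_size, val_size = split_sizes(1000, 0.2)  # hypothetical dataset size
print(train_size, val_size)                    # 800 200
print(steps_per_epoch(train_size, 32))         # 25
```

With `NUM_EPOCHS = 8`, that hypothetical dataset would give 8 × 25 = 200 weight updates in total.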

In [11]:
print("PROJECT_ROOT:", PROJECT_ROOT)
print("DATA_DIR:", DATA_DIR)
print("Exists DATA_DIR?", DATA_DIR.exists())
print("MODEL_DIR:", MODEL_DIR)
print("Exists MODEL_DIR?", MODEL_DIR.exists())

PROJECT_ROOT: C:\Users\NeosT\OneDrive\Skrivbord\VsCode-Projects\SakuraScan\SakuraScan
DATA_DIR: C:\Users\NeosT\OneDrive\Skrivbord\VsCode-Projects\SakuraScan\SakuraScan\Data\source_images
Exists DATA_DIR? True
MODEL_DIR: C:\Users\NeosT\OneDrive\Skrivbord\VsCode-Projects\SakuraScan\SakuraScan\app_pages\src\models
Exists MODEL_DIR? True


In [12]:
"""
Create training and validation datasets and dataloaders using ImageFolder.
"""

# Data augmentation and normalization for training
train_transform = transforms.Compose(
    [
        transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(10),
        transforms.ToTensor(),
        # ImageNet channel means and standard deviations (RGB order)
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225]),
    ]
)

# Only resize + normalize for validation
val_transform = transforms.Compose(
    [
        transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225]),
    ]
)

# Load all images with train_transform as a temporary default; each split gets its own transform later
full_dataset = datasets.ImageFolder(root=str(DATA_DIR), transform=train_transform)

class_names: List[str] = full_dataset.classes
class_names

['healthy', 'powdery_mildew']
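
The "per split" note matters because subsets produced by `random_split` index into the same parent dataset and therefore share its transform, so validation images would otherwise be augmented as well. One possible way to handle this, sketched under assumptions, is to wrap each subset in a thin class that applies its own transform. `TransformedSubset` is a hypothetical helper, not a torchvision class; it is written as a plain duck-typed map-style dataset (`__len__` and `__getitem__`) so the sketch runs without PyTorch installed.

```python
from typing import Any, Callable, Sequence, Tuple


class TransformedSubset:
    """View over selected indices of a base dataset with its own transform."""

    def __init__(self, base: Sequence[Tuple[Any, int]],
                 indices: Sequence[int],
                 transform: Callable[[Any], Any]) -> None:
        self.base = base
        self.indices = list(indices)
        self.transform = transform

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int) -> Tuple[Any, int]:
        # Look up the raw sample, then apply this split's transform.
        image, label = self.base[self.indices[i]]
        return self.transform(image), label


# Stand-in data: (image, label) pairs; strings play the role of images here.
base = [("img0", 0), ("img1", 1), ("img2", 0), ("img3", 1)]
train_ds = TransformedSubset(base, [0, 1, 2], transform=lambda x: x + "+aug")
val_ds = TransformedSubset(base, [3], transform=lambda x: x + "+plain")
print(train_ds[1], val_ds[0])  # ('img1+aug', 1) ('img3+plain', 1)
```

In the notebook's setting, `base` would presumably be an `ImageFolder` created with `transform=None`, the indices would come from the `.indices` attribute of the subsets returned by `random_split`, and the two transforms would be `train_transform` and `val_transform`.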