# 🔬 Image Preprocessing Pipeline Optimization for Malaria Detection

This notebook performs **automated hyperparameter optimization** for an image filter pipeline applied to malaria detection using YOLO.

**Goal:** Find the combination of filters (noise reduction, color correction, contrast enhancement, sharpening) and filter parameters that maximizes detection quality, measured as mAP@50.
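
For readers unfamiliar with the metric: under mAP@50, a predicted box counts as a correct detection when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5. The sketch below only illustrates that matching criterion. The function names `iou` and `matches_at_50` are hypothetical, and this is not the evaluation code used by the framework or by YOLO.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


def matches_at_50(predicted_box, ground_truth_box):
    """Hypothetical helper: True if the prediction passes the IoU >= 0.5 test."""
    return iou(predicted_box, ground_truth_box) >= 0.5


# A prediction covering twice the ground-truth area sits exactly on the threshold.
half_overlap = iou((0, 0, 10, 10), (0, 0, 10, 5))
# A prediction shifted diagonally by half its size falls well below it.
shifted = iou((0, 0, 10, 10), (5, 5, 15, 15))
```

Average precision is then computed per class from the ranked list of matched and unmatched predictions, and mAP@50 is the mean over classes.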

> 📚 For detailed library documentation, see: [filter_learning_framework](https://github.com/andreolli-davide/filter_learning_framework)

## 1. Environment Setup
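Pull the repository into the current working directory. The first command deletes any existing `.git` directory, so run this cell only in a scratch environment such as a fresh notebook runtime.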

In [None]:
!rm -rf .git
!git init . -b main
!git remote add origin "https://github.com/andreolli-davide/filter_learning_framework.git"
!git pull origin main

Install the package and its dependencies.

In [None]:
%pip install .

## 2. Download Pre-trained Model
Download the YOLO model weights from Hugging Face. Choose between the base model or the data-augmented version.

In [None]:
from huggingface_hub import hf_hub_download
from pathlib import Path
from typing import Literal

MODEL_CHOICE: Literal["base", "data_augmentation"] = "data_augmentation"

resources_directory_path = Path("/content/resources")
resources_directory_path.mkdir(parents=True, exist_ok=True)

model_path = hf_hub_download(
    repo_id="DavideSenette/yolo-malaria-HCM_LCM",
    filename=(
        "weights/check_point_NO_DA.pt"
        if MODEL_CHOICE == "base"
        else "weights/check_point_WITH_DA.pt"
    ),
    local_dir=resources_directory_path,
    local_dir_use_symlinks=False,
)

## 3. Download Dataset
Fetch the malaria detection dataset from Kaggle.

In [None]:
import kagglehub

dataset_path = Path(
    kagglehub.dataset_download(
        "davidesenette/malaria-hcm-lcm-1000",
    )
)

## 4. Initialize Model and Dataset
Load the YOLO model and prepare the dataset samples for optimization.

In [None]:
from src.yolo import Yolo
from src.dataset import Dataset, DatasetSplit, Magnitude
from src.orchestrator import Orchestrator, OrchestratorConfig
from pathlib import Path
from typing import Literal

DEVICE: Literal["cpu", "mps", "cuda"] = "cuda"

# Use the DEVICE constant defined above instead of a hard-coded string.
model = Yolo.load_model(Path(model_path), device=DEVICE)
dataset = Dataset.load_from_directory(dataset_path)
samples = dataset.pick_random_samples(
    magnitude=Magnitude.LCM,
    split=DatasetSplit.VAL,
    # The k parameter can be specified to pick k random samples within the
    # split selected above.
)
staging_dataset = Dataset.create_staging_dataset(samples)
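
The cell above targets a CUDA runtime. On a machine without an NVIDIA GPU, the device string can be detected instead of typed by hand. The following is a minimal sketch, assuming PyTorch is the backend behind the YOLO wrapper; `pick_device` is a hypothetical helper and not part of the framework.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what this machine offers."""
    try:
        import torch  # assumption: PyTorch is installed alongside the framework
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"

    return "cpu"


detected_device = pick_device()
print(f"Detected device: {detected_device}")
```

The returned string is one of the three values allowed by the `DEVICE` annotation, so it can be assigned to `DEVICE` before the model is loaded.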

## 5. Run Optimization
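Configure and execute the orchestrator to find the optimal filter pipeline. It tests different filter combinations and tunes the hyperparameters of each with Optuna, running `n_trials_per_combination` trials per combination. The configuration below sets the Optuna database to `optuna_studies.db` and the orchestrator checkpoint to `orchestrator_checkpoint.json`.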

In [None]:
from src.orchestrator import Orchestrator, OrchestratorConfig
from pathlib import Path

config = OrchestratorConfig.create_default(
    optuna_db_path=Path("optuna_studies.db"),
    checkpoint_path=Path("orchestrator_checkpoint.json"),
    n_trials_per_combination=30,
)

log = Orchestrator.train(
    model=model,
    dataset=staging_dataset,
    config=config,
)
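
To get a feel for how the trial budget grows with the number of filter combinations, the sketch below enumerates candidate pipelines for the four filter stages named in the goal. It rests on a loud assumption: each stage is simply switched on or off and the stage order is fixed. The orchestrator may enumerate combinations differently, and the names `STAGES` and `stage_combinations` are illustrative only.

```python
from itertools import combinations

# The four filter stages listed in the notebook's goal (names are illustrative).
STAGES = (
    "noise_reduction",
    "color_correction",
    "contrast_enhancement",
    "sharpening",
)


def stage_combinations(stages):
    """Yield every non-empty subset of stages, preserving pipeline order."""
    for size in range(1, len(stages) + 1):
        yield from combinations(stages, size)


combos = list(stage_combinations(STAGES))

# Same per-combination budget as in the configuration cell above.
n_trials_per_combination = 30
total_trials = len(combos) * n_trials_per_combination

print(f"{len(combos)} combinations, {total_trials} trials in total")
```

Under this assumption, four stages give 15 non-empty combinations, so 30 trials each amounts to 450 model evaluations. Lowering `n_trials_per_combination` or the number of validation samples is the most direct way to shorten a run.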