# Brain Cancer MRI: Fine-tune DINOv2 (facebook/dinov2-base) with EasyTune

This notebook downloads the Kaggle brain cancer MRI dataset with `kagglehub`, locates its class-subfolder (image-folder) layout, and fine-tunes `facebook/dinov2-base` for image similarity with EasyTune.


In [None]:
# If running locally or on Colab, ensure dependencies are installed
# This cell locates the repo root (with setup.py) and installs easytune in editable mode
# Also tries a direct sys.path fallback so a restart isn't required
import sys, subprocess
from pathlib import Path

try:
    import easytune  # noqa
    print("easytune already available")
except Exception:
    # First try adding the parent directory (project root containing the package) to sys.path
    candidate = Path.cwd().parent
    if (candidate / "easytune" / "__init__.py").exists():
        sys.path.insert(0, str(candidate))
        try:
            import easytune  # noqa
            print("easytune imported via sys.path fallback")
        except Exception:
            pass
    # If still not importable, locate root and pip install -e
    if 'easytune' not in sys.modules:
        print("Locating project root and installing easytune in editable mode ...")
        cwd = Path.cwd()
        root = None
        for p in [cwd, *cwd.parents]:
            if (p / "setup.py").exists() and (p / "easytune").exists():
                root = p
                break
        if root is None:
            for c in [Path("."), Path(".."), Path("../..")]:
                if (c / "setup.py").exists() and (c / "easytune").exists():
                    root = c.resolve()
                    break
        if root is None:
            raise RuntimeError("Could not find project root with setup.py; please cd to repo root.")
        print(f"Installing from: {root}")
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", str(root)])
        import easytune  # noqa
        print("easytune installed and imported")

# Optional: KaggleHub for dataset loading via Hugging Face adapter
try:
    import kagglehub  # noqa
except Exception:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "kagglehub[hf-datasets]"])

# Ensure scikit-learn for StratifiedShuffleSplit
try:
    import sklearn  # noqa
except Exception:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "scikit-learn"])


Locating project root and installing easytune in editable mode ...
Installing from: c:\Users\sunco\OneDrive\Desktop\fine-tuning\easytune


In [4]:
# Download the Kaggle dataset locally and return its directory
import kagglehub

dataset_dir = kagglehub.dataset_download("orvile/brain-cancer-mri-dataset")
print("Downloaded dataset directory:", dataset_dir)

# Inspect a few entries
import os
for root, dirs, files in os.walk(dataset_dir):
    print("DIR:", root)
    # show a small sample of files
    sample = [f for f in files if os.path.splitext(f)[1].lower() in {".jpg",".jpeg",".png",".bmp",".gif"}]
    if sample:
        print("SAMPLE IMAGES:", sample[:5])
    # Stop after the top-level directory (os.walk yields it first)
    break


Downloading from https://www.kaggle.com/api/v1/datasets/download/orvile/brain-cancer-mri-dataset?dataset_version_number=2...


100%|██████████| 144M/144M [00:23<00:00, 6.39MB/s] 

Extracting files...





Downloaded dataset directory: C:\Users\sunco\.cache\kagglehub\datasets\orvile\brain-cancer-mri-dataset\versions\2
DIR: C:\Users\sunco\.cache\kagglehub\datasets\orvile\brain-cancer-mri-dataset\versions\2


In [5]:
# Build a folder-of-classes view expected by EasyTune
# The dataset provides subfolders per class; detect them and point EasyTune to the root
import os
from pathlib import Path

root = Path(dataset_dir)
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}
# Directories (at any depth) that directly contain at least one image file
class_dirs = [
    p for p in root.rglob("*")
    if p.is_dir() and any(q.suffix.lower() in IMAGE_EXTS for q in p.glob("*"))
]
if not class_dirs:
    # Fallback: some datasets have split subfolders like train/val/test
    # Try to find a split with class subfolders
    split_candidates = [d for d in root.iterdir() if d.is_dir()]
    selected = None
    for sc in split_candidates:
        cds = [p for p in sc.iterdir() if p.is_dir()]
        if cds and any(list(cd.glob("*.png")) or list(cd.glob("*.jpg")) for cd in cds):
            selected = sc
            break
    if selected is None:
        raise RuntimeError("Could not detect class subfolders with images. Please inspect dataset_dir.")
    detected_root = selected
else:
    detected_root = root

print("Detected dataset root for training:", detected_root)


Detected dataset root for training: C:\Users\sunco\.cache\kagglehub\datasets\orvile\brain-cancer-mri-dataset\versions\2
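
Before training, it can help to check how many images each class folder holds, since a strong imbalance affects any validation split. The sketch below is not part of EasyTune: `count_images_per_class` is a hypothetical helper, and it assumes each class is a directory that directly contains its image files, at any depth under the root.

```python
from collections import Counter
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def count_images_per_class(root):
    """Map each directory name that directly holds images to its image count."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            # The immediate parent folder is treated as the class label (assumption)
            counts[path.parent.name] += 1
    return dict(counts)
```

Calling `count_images_per_class(detected_root)` in this notebook should return one entry per class folder. Folders that share a name in different locations are merged under that name, so treat the result as a quick sanity check rather than an exact inventory.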


In [None]:
# No export needed when the dataset is already arranged in class subfolders.
# If your dataset has clear train/val splits, you may point to just one of them.
# Here we point to detected_root and let EasyTune create its own validation split.


In [None]:
# Train with EasyTune FineTuner from detected_root
# Extra sys.path fallback to ensure import without restart
import sys
from pathlib import Path
candidate = Path.cwd().parent
if 'easytune' not in sys.modules and (candidate / 'easytune' / '__init__.py').exists():
    sys.path.insert(0, str(candidate))

from easytune import FineTuner
import os

model_name = "facebook/dinov2-base"

print("Initializing FineTuner ...")
ft = FineTuner(model=model_name, task="image-similarity", device="auto")
ft.add_from_folder(str(detected_root), validation_split=0.2)

print("Training ...")
ft.train(epochs=5, batch_size=16, learning_rate=1e-5, temperature=0.07)

save_dir = "./models/brain_cancer_dinov2_adapter"
os.makedirs(save_dir, exist_ok=True)
ft.save(save_dir)
print("Saved to:", save_dir)


Initializing FineTuner ...


preprocessor_config.json:   0%|          | 0.00/436 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


config.json:   0%|          | 0.00/548 [00:00<?, ?B/s]

Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


model.safetensors:   0%|          | 0.00/346M [00:00<?, ?B/s]

Loaded 6056 images from folder; total 6056 samples
Training ...




Epoch 1/5:   0%|          | 0/303 [00:00<?, ?it/s]
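
The `303` in the progress bar is consistent with the settings above, assuming the bar counts training batches and the final partial batch is kept. With 6056 images and `validation_split=0.2`, roughly 4845 remain for training, and at `batch_size=16` that rounds up to 303 batches. A quick check follows; the exact rounding EasyTune applies to the split is an assumption.

```python
import math

total_images = 6056        # from "Loaded 6056 images from folder"
validation_split = 0.2
batch_size = 16

# Assumed rounding: the validation count is floored
n_val = int(total_images * validation_split)
n_train = total_images - n_val
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, steps_per_epoch)
```

Rounding the validation count up instead leaves 4844 training images, which still gives 303 batches.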

## Run this notebook on Google Colab using local files (no PyPI publish)

1. Upload the project zip to Colab (or clone from your private repo):
   - Zip the project locally from the repo root so `setup.py` and the `easytune/` package are at the top level.
   - In Colab, upload the zip and unzip:
```bash
!unzip -q your_project.zip -d /content
# Adjust the path if your folder name differs
%cd /content/fine-tuning
```
2. Install the local package in editable mode inside Colab:
```bash
!pip install -e .
```
3. Open this notebook (or create a new one) in Colab and run the cells. If the runtime was already running before the install and `import easytune` fails, restart it once after step 2.
4. If you prefer mounting Google Drive to access the local files instead of uploading a zip:
```python
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/path/to/fine-tuning
!pip install -e .
```
5. Install kagglehub if not already installed:
```bash
!pip install "kagglehub[hf-datasets]"
```
6. Then run the dataset load + training cells above as-is. No PyPI publish is required; Colab will import `easytune` from the local files you installed in editable mode.
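
To confirm that Colab resolves `easytune` from the local project rather than from some other installation, you can check where the package would be imported from. This is a minimal sketch using only the standard library; `module_origin` is a hypothetical helper.

```python
import importlib.util

def module_origin(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

print(module_origin("easytune"))
```

After an editable install, the printed path would normally point inside the project folder you uploaded or mounted. A result of `None` suggests the package is not importable yet, in which case restarting the runtime is the first thing to try.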
