# Brain Cancer MRI: Fine-tune DINOv2 (facebook/dinov2-base) with EasyTune

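This notebook downloads the Kaggle brain cancer MRI dataset with kagglehub, locates the folder-of-classes layout that EasyTune expects, and fine-tunes `facebook/dinov2-base` for image similarity with EasyTune on a cloud GPU (Modal backend). The final cell loads the trained model and runs a small similarity search.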


In [2]:
# If running locally or on Colab, ensure dependencies are installed
# This cell locates the repo root (with setup.py) and installs easytune in editable mode
# Also tries a direct sys.path fallback so a restart isn't required
import sys, subprocess
from pathlib import Path

try:
    import easytune  # noqa
    print("easytune already available")
except Exception:
    # First try adding the parent directory (project root containing the package) to sys.path
    candidate = Path.cwd().parent
    if (candidate / "easytune" / "__init__.py").exists():
        sys.path.insert(0, str(candidate))
        try:
            import easytune  # noqa
            print("easytune imported via sys.path fallback")
        except Exception:
            pass
    # If still not importable, locate root and pip install -e
    if 'easytune' not in sys.modules:
        print("Locating project root and installing easytune in editable mode ...")
        cwd = Path.cwd()
        root = None
        for p in [cwd, *cwd.parents]:
            if (p / "setup.py").exists() and (p / "easytune").exists():
                root = p
                break
        if root is None:
            for c in [Path("."), Path(".."), Path("../..")]:
                if (c / "setup.py").exists() and (c / "easytune").exists():
                    root = c.resolve()
                    break
        if root is None:
            raise RuntimeError("Could not find project root with setup.py; please cd to repo root.")
        print(f"Installing from: {root}")
        subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", str(root)])
        import easytune  # noqa
        print("easytune installed and imported")

# Optional: KaggleHub for dataset loading via Hugging Face adapter
try:
    import kagglehub  # noqa
except Exception:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "kagglehub[hf-datasets]"])

# Ensure scikit-learn for StratifiedShuffleSplit
try:
    import sklearn  # noqa
except Exception:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "scikit-learn"])


easytune already available


In [3]:
# Download the Kaggle dataset locally and return its directory
import kagglehub

dataset_dir = kagglehub.dataset_download("orvile/brain-cancer-mri-dataset")
print("Downloaded dataset directory:", dataset_dir)

# Inspect a few entries
import os
for root, dirs, files in os.walk(dataset_dir):
    print("DIR:", root)
    # show a small sample of files
    sample = [f for f in files if os.path.splitext(f)[1].lower() in {".jpg",".jpeg",".png",".bmp",".gif"}]
    if sample:
        print("SAMPLE IMAGES:", sample[:5])
    # stop after the top-level directory; remove the break to walk subfolders too
    break


Downloaded dataset directory: C:\Users\sunco\.cache\kagglehub\datasets\orvile\brain-cancer-mri-dataset\versions\2
DIR: C:\Users\sunco\.cache\kagglehub\datasets\orvile\brain-cancer-mri-dataset\versions\2
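
Because the walk above stops at the top-level directory, it does not show how many images each class folder holds. A minimal sketch for that check, assuming each class lives in its own folder somewhere under `dataset_dir`; the helper name `count_images_per_folder` is hypothetical and not part of kagglehub or EasyTune:

```python
from collections import Counter
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def count_images_per_folder(root):
    """Map each folder (relative to root) that directly contains images to its image count."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
            counts[path.parent.relative_to(root).as_posix()] += 1
    return dict(counts)

# Usage in the notebook (dataset_dir comes from the download cell):
# for folder, n in sorted(count_images_per_folder(dataset_dir).items()):
#     print(f"{n:6d}  {folder}")
```

A strongly imbalanced count is worth knowing about before training, since it affects how a validation split should be drawn.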


In [4]:
# Build a folder-of-classes view expected by EasyTune
# The dataset provides subfolders per class; detect them and point EasyTune to the root
import os
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}
root = Path(dataset_dir)
# Any directory (at any depth) that directly contains at least one image file
class_dirs = [
    p for p in root.rglob("*")
    if p.is_dir() and any(q.suffix.lower() in IMAGE_EXTS for q in p.glob("*"))
]
if not class_dirs:
    # Fallback: some datasets have split subfolders like train/val/test
    # Try to find a split with class subfolders
    split_candidates = [d for d in root.iterdir() if d.is_dir()]
    selected = None
    for sc in split_candidates:
        cds = [p for p in sc.iterdir() if p.is_dir()]
        if cds and any(list(cd.glob("*.png")) or list(cd.glob("*.jpg")) for cd in cds):
            selected = sc
            break
    if selected is None:
        raise RuntimeError("Could not detect class subfolders with images. Please inspect dataset_dir.")
    detected_root = selected
else:
    detected_root = root

print("Detected dataset root for training:", detected_root)


Detected dataset root for training: C:\Users\sunco\.cache\kagglehub\datasets\orvile\brain-cancer-mri-dataset\versions\2
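
The detection cell above uses the download root whenever any folder beneath it holds images, which can be too shallow if the class folders sit several levels down. A hedged sketch of one alternative: pick the directory whose immediate subfolders most often contain images directly. `find_class_root` is a hypothetical helper, and the heuristic assumes a single folder-of-classes layout:

```python
from collections import Counter
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def find_class_root(root):
    """Return the directory with the most image-bearing immediate subfolders, or None."""
    root = Path(root)
    parents = Counter()
    for folder in root.rglob("*"):
        if folder.is_dir() and any(
            f.is_file() and f.suffix.lower() in IMAGE_EXTS for f in folder.iterdir()
        ):
            parents[folder.parent] += 1
    if not parents:
        return None
    # Ties are broken toward the shallowest path, then alphabetically, for determinism
    return min(parents, key=lambda p: (-parents[p], len(p.parts), str(p)))

# Usage in the notebook:
# detected_root = find_class_root(dataset_dir) or Path(dataset_dir)
```

If the dataset ships separate train/val/test folders with identical class names, this heuristic picks only one of them, so inspect the result before training.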


In [5]:
# No export needed when the dataset is already arranged in class subfolders.
# If your dataset has clear train/val splits, you may point to just one of them.
# Here we point to detected_root and let EasyTune create its own validation split.
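
For readers who prefer to control the validation split themselves, the idea of a stratified split can be sketched in plain Python. This is an illustration only: it is not EasyTune's internal splitting logic, and `stratified_split`, `val_fraction`, and `seed` are hypothetical names. scikit-learn's `StratifiedShuffleSplit` serves the same purpose and is the better choice when scikit-learn is available.

```python
import random
from collections import defaultdict

def stratified_split(paths, labels, val_fraction=0.2, seed=0):
    """Split (path, label) pairs so each class contributes about val_fraction to validation."""
    by_class = defaultdict(list)
    for path, label in zip(paths, labels):
        by_class[label].append(path)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        items = sorted(by_class[label])
        rng.shuffle(items)
        # Keep at least one validation sample per class when the class has 2+ items
        n_val = max(1, round(len(items) * val_fraction)) if len(items) > 1 else 0
        val += [(p, label) for p in items[:n_val]]
        train += [(p, label) for p in items[n_val:]]
    return train, val
```

Splitting per class keeps rare classes represented in validation, which a plain random split does not guarantee.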


In [None]:
# Train on a cloud GPU via EasyTune SDK (Modal backend) — no manual CLI

from easytune.finetuner import FineTuner
from easytune.cloud.modal_backend import ModalBackend

model_name = "facebook/dinov2-base"
out_dir = "./cloud_artifacts"

print("Launching cloud training ...")
FineTuner(model=model_name, task="image-similarity").train_remote(
    ModalBackend(),
    data_dir=str(detected_root),  # local folder with class subfolders; auto-uploaded by backend
    out_dir=out_dir,
    epochs=5,
    batch_size=16,
    learning_rate=1e-5,
)
print("Done. Artifacts in:", out_dir)


Launching cloud training ...
✓ Initialized. View run at
https://modal.com/apps/mikethebot44/main/ap-SfDVzc9JIj2Wa667tJUBq5
├── 🔨 Created mount
│   c:\Users\sunco\OneDrive\Desktop\fine-tuning\easytune\examples\remote\modal_easytune.py
└── - Creating function train_remote...
[output truncated]

In [1]:
# Inference: load trained model and run simple search

from easytune.inference import EasyModel, SimpleIndex
from pathlib import Path

model_path = Path('./cloud_artifacts/latest')
if not model_path.exists():
    raise FileNotFoundError("Trained artifacts not found at ./cloud_artifacts/latest. Run training first.")

# Load model
m = EasyModel.load(str(model_path), device='auto')

# Build an index over a small sample set (change to your gallery)
# Here we just reuse a few images from the dataset root's first class folder
import os
sample_images = []
for root, dirs, files in os.walk(str(detected_root)):
    imgs = [os.path.join(root, f) for f in files if os.path.splitext(f)[1].lower() in {'.jpg','.jpeg','.png','.bmp','.gif'}]
    if imgs:
        sample_images = imgs[:10]
        break

if not sample_images:
    raise RuntimeError("Could not find sample images under detected_root.")

E = m.embed_images(sample_images)
idx = SimpleIndex.from_embeddings(E, ids=sample_images)

# Query with the first indexed image as a sanity check; since it is also in the index,
# it should appear among the top results
q = m.embed_images([sample_images[0]])
results = idx.search(q, k=5)
print('Top-5 similar images:')
for i, (img, score) in enumerate(results[0], 1):
    print(f'{i}. {img} \t score={score:.4f}')



FileNotFoundError: Trained artifacts not found at ./cloud_artifacts/latest. Run training first.
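
Conceptually, similarity search ranks gallery embeddings by how close they are to the query embedding. The sketch below illustrates that idea with cosine similarity in plain Python. It describes the general technique under the assumption that cosine similarity is the metric; it is not a description of how `SimpleIndex` is implemented, and `cosine_top_k` is a hypothetical helper.

```python
import math

def cosine_top_k(query, gallery, ids, k=5):
    """Return up to k (id, cosine similarity) pairs, most similar first."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            raise ValueError("cannot normalize a zero vector")
        return [x / norm for x in v]

    q = normalize(query)
    scored = [
        (item_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for item_id, vec in zip(ids, gallery)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

When the query image is itself part of the gallery, it scores highest against itself, so a held-out query image gives a more meaningful check of the fine-tuned model.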

## Run this notebook on Google Colab using local files (no PyPI publish)

1. Upload the project zip to Colab (or clone from your private repo):
   - Zip the project locally from the repo root so `setup.py` and the `easytune/` package are at the top level.
   - In Colab, upload the zip and unzip:
```bash
!unzip -q your_project.zip -d /content
# adjust the path below if the folder name differs
%cd /content/fine-tuning
```
2. Install the local package in editable mode inside Colab:
```bash
!pip install -e .
```
3. Open this example notebook or create a new one in Colab and run cells. If the environment was started before install, restart the runtime once after step 2.
4. If you prefer mounting Google Drive to access the local files instead of uploading a zip:
```python
from google.colab import drive
drive.mount('/content/drive')
%cd /content/drive/MyDrive/path/to/fine-tuning
!pip install -e .
```
5. Install kagglehub if not already installed:
```bash
!pip install "kagglehub[hf-datasets]"
```
6. Then run the dataset load + training cells above as-is. No PyPI publish is required; Colab will import `easytune` from the local files you installed in editable mode.
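
To confirm that Colab is importing the package from the local checkout rather than from some other installation, the module's origin can be inspected with the standard library. A small sketch; `package_origin` is a hypothetical helper name:

```python
import importlib.util

def package_origin(name):
    """Return the file path a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# In Colab, after the editable install:
# print(package_origin("easytune"))  # expected to point inside your project folder
```

If the result is `None` or points somewhere unexpected, restarting the runtime after the install is the first thing to try.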
