<a href="https://colab.research.google.com/github/akshatamadavi/data_mining/blob/main/autogluon/AutoGluon_Multimodal_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# 🧠 AutoGluon Multimodal (AutoMM) — Colab Tutorial

This notebook walks through **AutoGluon Multimodal** (AutoMM): installation, dataset preparation, training, evaluation, prediction, and saving/loading a model. It mirrors the structure of the official tutorial.

**What you'll do:**
1. Set up Colab with a GPU and install AutoGluon
2. Download and prepare the **PetFinder** sample dataset (image + text + tabular)
3. Train a **`MultiModalPredictor`** for classification
4. Evaluate on a test split and inspect metrics
5. Generate predictions & probabilities
6. Save & reload the trained model for later inference

> **Tip:** In Colab, go to **Runtime → Change runtime type → GPU** (T4 or A100). You can verify with `nvidia-smi` below.



## 1) Setup & Installation

- Upgrade `pip`
- Install **AutoGluon** with multimodal support
- Verify that a **GPU** is visible


In [None]:

# ─────────────────────────────────────────
# 🚀 Setup & Install
# ─────────────────────────────────────────
!pip -q install --upgrade pip
# Using the 'all' extra to ensure multimodal deps (vision, NLP) are installed.
!pip -q install "autogluon[all]"
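# Print interpreter and platform details. This runs directly in the kernel,
# because a `!` shell line cannot carry a multi-line heredoc in a notebook cell.
import sys, platform
print("Python:", sys.version)
print("Platform:", platform.platform())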

# Check GPU (should show T4/A100/V100, etc.)
!nvidia-smi || echo "No GPU detected. In Colab: Runtime → Change runtime type → GPU"
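
The same GPU check can be made from Python, which is handy when a later cell should branch on the result. The sketch below is one possible way to do it, assuming `nvidia-smi` is the right tool for the runtime; the helper name `gpu_visible` is hypothetical and not part of AutoGluon.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if `nvidia-smi` is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # NVIDIA driver tools are not installed on this machine
    try:
        result = subprocess.run(
            [exe, "-L"],          # -L prints one line per detected GPU
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Success plus at least one "GPU ..." line means a device is visible.
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False` in Colab, switch the runtime type to GPU before training.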



## 2) Imports
We'll use `MultiModalPredictor` for multimodal classification and some utilities for data loading.


In [None]:

import os
import warnings
import numpy as np
import pandas as pd
warnings.filterwarnings('ignore')
np.random.seed(123)

from autogluon.multimodal import MultiModalPredictor
from autogluon.core.utils.loaders import load_zip



## 3) Download & Prepare the Dataset

We'll use a compact **PetFinder** tutorial dataset containing:
- **Images** of pets
- **Text** descriptions
- **Tabular** features (age, breed, etc.)

**Target/label:** `AdoptionSpeed` (classification).

Steps:
1. Download and unzip to a local folder
2. Load `train.csv` and `test.csv`
3. Keep the first image per row and expand its path to an absolute path


In [None]:

# ─────────────────────────────────────────
# 📂 Download & Prepare Dataset
# ─────────────────────────────────────────
download_dir = './ag_automm_tutorial'
zip_url = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip'

# Download + unzip using AutoGluon utility
load_zip.unzip(zip_url, unzip_dir=download_dir)

dataset_path = os.path.join(download_dir, 'petfinder_for_tutorial')
train_csv = os.path.join(dataset_path, 'train.csv')
test_csv  = os.path.join(dataset_path, 'test.csv')

train_data = pd.read_csv(train_csv, index_col=0)
test_data  = pd.read_csv(test_csv,  index_col=0)

label_col = 'AdoptionSpeed'
image_col = 'Images'

# Keep only the first image path if multiple are present
train_data[image_col] = train_data[image_col].astype(str).apply(lambda s: s.split(';')[0])
test_data[image_col]  = test_data[image_col].astype(str).apply(lambda s: s.split(';')[0])

# Expand relative paths to absolute paths
def expand_paths(p, base):
    parts = str(p).split(';')
    return ';'.join([os.path.abspath(os.path.join(base, pp)) for pp in parts])

train_data[image_col] = train_data[image_col].apply(lambda p: expand_paths(p, dataset_path))
test_data[image_col]  = test_data[image_col].apply(lambda p: expand_paths(p, dataset_path))

print("Train shape:", train_data.shape)
print("Test  shape:", test_data.shape)
print("Columns:", list(train_data.columns)[:15], "...")
print("Label column:", label_col)

# Peek at the data
display(train_data.head(3))
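
Before training, it can be worth confirming that the expanded image paths actually point to files, since a wrong base folder usually only surfaces later as a loading error. The sketch below uses a temporary folder as a stand-in for the dataset; `find_missing_images` is a hypothetical helper, not an AutoGluon function.

```python
import os
import tempfile


def find_missing_images(paths):
    """Return the entries of `paths` that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]


# Self-contained demo: a temporary folder stands in for the dataset folder.
with tempfile.TemporaryDirectory() as base:
    real = os.path.join(base, "pet_0.jpg")
    open(real, "wb").close()  # create an empty placeholder file
    candidates = [real, os.path.join(base, "pet_1.jpg")]
    missing = find_missing_images(candidates)
    print("Missing files:", [os.path.basename(p) for p in missing])
```

In this notebook the equivalent call would be `find_missing_images(train_data[image_col])`, which should return an empty list if the paths were expanded correctly.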



## 4) Train a Multimodal Model

We create a `MultiModalPredictor` and call `.fit(...)`.

- `time_limit` caps the training time in seconds.
- We keep it at 300 for a quick Colab run; raise it for better accuracy.
- AutoGluon will automatically detect and use the **GPU** when available.


In [None]:

# ─────────────────────────────────────────
# 🧠 Train MultiModalPredictor
# ─────────────────────────────────────────
save_dir = './automm_petfinder_model'
predictor = MultiModalPredictor(label=label_col, path=save_dir)

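# Note: `fit` takes no `verbose` argument; logging detail is controlled by
# `verbosity` on the MultiModalPredictor constructor.
predictor.fit(
    train_data=train_data,
    time_limit=300,  # seconds; increase for stronger models
)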



## 5) Evaluate on the Test Set

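Use `.evaluate(test_data)` to score the model on held-out data. By default it reports the predictor's evaluation metric; pass `metrics=[...]` (for example `["accuracy"]`) to request specific ones.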


In [None]:

# ─────────────────────────────────────────
# 📈 Evaluate
# ─────────────────────────────────────────
metrics = predictor.evaluate(test_data)
print("Test metrics:", metrics)
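
To make the reported numbers concrete, here is a small worked example of how accuracy and F1 are computed from true and predicted labels. It is an illustration only: the labels are made up, and the F1 helper assumes a binary label with positive class `1`, which may not match how your own target is encoded.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives, so F1 is defined as zero here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, made up for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print("accuracy:", accuracy(y_true, y_pred))
print("F1:", binary_f1(y_true, y_pred))
```

Here four of six predictions are correct, and precision and recall are both 2/3, so accuracy and F1 happen to coincide at 2/3.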



## 6) Generate Predictions & Probabilities

Use `.predict(...)` to get class predictions and `.predict_proba(...)` for class probabilities.


In [None]:

# ─────────────────────────────────────────
# 🔮 Predict
# ─────────────────────────────────────────
preds = predictor.predict(test_data)
probs = predictor.predict_proba(test_data)

print("Predictions (first 10):")
print(preds.head(10))

print("\nProbabilities (first 3 rows):")
display(probs.head(3))
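
Class predictions and class probabilities are closely related: for classification, the predicted label is typically the class with the highest probability. The sketch below shows that relationship in plain Python with made-up numbers; `labels_from_probabilities` is a hypothetical helper used only for this illustration.

```python
def labels_from_probabilities(rows, classes):
    """For each row of probabilities, pick the class with the highest value."""
    predicted = []
    for row in rows:
        best_index = max(range(len(row)), key=lambda i: row[i])
        predicted.append(classes[best_index])
    return predicted


# Made-up probabilities for three samples over two classes.
classes = [0, 1]
probabilities = [
    [0.80, 0.20],
    [0.35, 0.65],
    [0.10, 0.90],
]
print(labels_from_probabilities(probabilities, classes))
```

Assuming `probs` is a DataFrame whose columns are the class labels (check `probs.columns`), `probs.idxmax(axis=1)` performs the same step in pandas and can be compared against `preds`.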



## 7) Save & Load the Model

`fit` saves the model under the `path` passed to the constructor, so no separate save call is needed here. You can reload it later and run inference without retraining.


In [None]:

# ─────────────────────────────────────────
# 💾 Save & Reload
# ─────────────────────────────────────────
print("Model directory:", predictor.path)

# Reload
reloaded = MultiModalPredictor.load(predictor.path)

# Sanity check: the reloaded predictor should reproduce the original predictions
reloaded_preds = reloaded.predict(test_data)
print("Reloaded predictions match:", bool((reloaded_preds == preds).all()))
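
A common pattern built on top of save and reload is "load if a saved model exists, otherwise train". The sketch below shows that control flow with stand-in callables so it runs without AutoGluon; `load_or_train`, `fake_train`, and `fake_load` are hypothetical names, and the non-empty-directory test is a simplifying assumption rather than a proof that a saved model is complete.

```python
import os
import tempfile


def load_or_train(path, load_fn, train_fn):
    """Reuse a saved model when `path` has content, otherwise train one."""
    if os.path.isdir(path) and os.listdir(path):
        return load_fn(path)   # saved artifacts found: skip training
    return train_fn(path)      # nothing saved yet: train and save under `path`


# Stand-in callables so the sketch runs without AutoGluon installed.
def fake_train(path):
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "model.txt"), "w") as f:
        f.write("weights")
    return "trained"


def fake_load(path):
    return "loaded"


with tempfile.TemporaryDirectory() as tmp:
    model_dir = os.path.join(tmp, "model")
    first = load_or_train(model_dir, fake_load, fake_train)
    second = load_or_train(model_dir, fake_load, fake_train)
    print(first, second)
```

In this notebook, `load_fn` could be `MultiModalPredictor.load`, and `train_fn` a small function that builds a predictor with `path=save_dir`, calls `fit`, and returns it.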



## 8) (Optional) Advanced Tips & Tweaks

- **Increase `time_limit`** for better results.
- Use `hyperparameters` to control model families / backbones.
- Try **data subsampling** for faster iteration during prototyping.
- Use `eval_metric` to pick a specific metric aligned with your goal.
- Explore `.fit_summary()` for training details and artifacts.


In [None]:

# Example: show a compact fit summary (if available)
try:
    summary = predictor.fit_summary()
    if isinstance(summary, dict):
        print("fit_summary keys:", list(summary.keys()))
    else:
        print(summary)
except Exception as e:
    print("fit_summary not available or failed:", e)
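
For the data subsampling tip above, one option is a stratified sample, which keeps the class balance roughly intact while shrinking the training set. The sketch below works on a plain list of labels and returns row positions; `stratified_sample_indices` is a hypothetical helper, and the labels are made up.

```python
import random
from collections import defaultdict


def stratified_sample_indices(labels, fraction, seed=0):
    """Pick roughly `fraction` of the positions in `labels`, class by class."""
    rng = random.Random(seed)  # local RNG keeps the sample reproducible
    by_class = defaultdict(list)
    for position, label in enumerate(labels):
        by_class[label].append(position)
    chosen = []
    for positions in by_class.values():
        k = max(1, round(len(positions) * fraction))  # keep at least one per class
        chosen.extend(rng.sample(positions, k))
    return sorted(chosen)


# Made-up labels: 8 rows of class 0 and 4 rows of class 1.
labels = [0] * 8 + [1] * 4
subset = stratified_sample_indices(labels, fraction=0.5, seed=123)
print(len(subset), "rows kept:", subset)
```

With the notebook's data, the positions could be applied as `train_data.iloc[stratified_sample_indices(train_data[label_col].tolist(), 0.2)]`. If class balance does not matter, `train_data.sample(frac=0.2, random_state=123)` is a simpler alternative.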



---

### ✅ You’re Done!
You trained, evaluated, predicted, and saved a **multimodal** model with images, text, and tabular features using **AutoGluon**.

**Next ideas:**
- Swap in your own dataset with similar columns (image paths + text + tabular + label).
- Tune hyperparameters and increase training time for higher accuracy.
- Export embeddings with `predictor.extract_embedding(...)` for downstream tasks.
