<a href="https://colab.research.google.com/github/akshatamadavi/data_mining/blob/main/autogluon/AutoGluon_Multimodal_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# 🧠 AutoGluon Multimodal (AutoMM) — Colab Tutorial

This notebook walks through **AutoGluon Multimodal** (AutoMM): installation, dataset preparation, training, evaluation, prediction, and model saving and loading. It mirrors the structure of the official tutorial.

**What you'll do:**
1. Set up Colab with a GPU and install AutoGluon
2. Download and prepare the **PetFinder** sample dataset (image + text + tabular)
3. Train a **`MultiModalPredictor`** for classification
4. Evaluate on a test split and inspect metrics
5. Generate predictions & probabilities
6. Save & reload the trained model for later inference

> **Tip:** In Colab, go to **Runtime → Change runtime type → GPU** (T4 or A100). You can verify with `nvidia-smi` below.



## 1) Setup & Installation

- Upgrade `pip`
- Install **AutoGluon** with multimodal support
- Verify that a **GPU** is visible


In [1]:
# ─────────────────────────────────────────
# 🚀 Setup & Install
# ─────────────────────────────────────────
!pip -q install --upgrade pip
# Using the 'all' extra to ensure multimodal deps (vision, NLP) are installed.
!pip -q install "autogluon[all]"
import sys, platform
print("Python:", sys.version)
print("Platform:", platform.platform())


# The GPU check (should show T4/A100/V100, etc.) runs in the next cell via nvidia-smi.



In [2]:
!nvidia-smi || echo "No GPU detected. In Colab: Runtime → Change runtime type → GPU"



Sat Nov  1 13:09:12 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-80GB          Off |   00000000:00:05.0 Off |                    0 |
| N/A   30C    P0             55W /  400W |       0MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
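The same check can be done from Python, which is handy when a later cell should branch on whether a GPU is present. The sketch below is a minimal example using only the standard library; `gpu_names` is a hypothetical helper name, and it assumes the `nvidia-smi` binary is the right tool to query (true for NVIDIA GPUs such as the ones Colab provides).

```python
import shutil
import subprocess


def gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools are not installed, so no NVIDIA GPU is usable
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


names = gpu_names()
print("GPUs:", names if names else "none detected")
```

An empty list means training would fall back to the CPU, which is much slower for image and text backbones.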
                                                


## 2) Imports
We'll use `MultiModalPredictor` for multimodal classification and some utilities for data loading.


In [3]:

import os
import warnings
import numpy as np
import pandas as pd
warnings.filterwarnings('ignore')
np.random.seed(123)

from autogluon.multimodal import MultiModalPredictor
from autogluon.core.utils.loaders import load_zip



## 3) Download & Prepare the Dataset

We'll use a compact **PetFinder** tutorial dataset containing:
- **Images** of pets
- **Text** descriptions
- **Tabular** features (age, breed, etc.)

**Target/label:** `AdoptionSpeed` (binary classification in this tutorial subset; the label takes the values 0 and 1).

Steps:
1. Download and unzip to a local folder
2. Load `train.csv` and `test.csv`
3. Keep the first image per row and expand its path to an absolute path


In [4]:

# ─────────────────────────────────────────
# 📂 Download & Prepare Dataset
# ─────────────────────────────────────────
download_dir = './ag_automm_tutorial'
zip_url = 'https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip'

# Download + unzip using AutoGluon utility
load_zip.unzip(zip_url, unzip_dir=download_dir)

dataset_path = os.path.join(download_dir, 'petfinder_for_tutorial')
train_csv = os.path.join(dataset_path, 'train.csv')
test_csv  = os.path.join(dataset_path, 'test.csv')

train_data = pd.read_csv(train_csv, index_col=0)
test_data  = pd.read_csv(test_csv,  index_col=0)

label_col = 'AdoptionSpeed'
image_col = 'Images'

# Keep only the first image path if multiple are present
train_data[image_col] = train_data[image_col].astype(str).apply(lambda s: s.split(';')[0])
test_data[image_col]  = test_data[image_col].astype(str).apply(lambda s: s.split(';')[0])

# Expand relative paths to absolute paths
def expand_paths(p, base):
    parts = str(p).split(';')
    return ';'.join([os.path.abspath(os.path.join(base, pp)) for pp in parts])

train_data[image_col] = train_data[image_col].apply(lambda p: expand_paths(p, dataset_path))
test_data[image_col]  = test_data[image_col].apply(lambda p: expand_paths(p, dataset_path))

print("Train shape:", train_data.shape)
print("Test  shape:", test_data.shape)
print("Columns:", list(train_data.columns)[:15], "...")
print("Label column:", label_col)

# Peek at the data
display(train_data.head(3))


Downloading ./ag_automm_tutorial/file.zip from https://automl-mm-bench.s3.amazonaws.com/petfinder_for_tutorial.zip...


100%|██████████| 18.8M/18.8M [00:01<00:00, 16.6MiB/s]


Train shape: (600, 25)
Test  shape: (100, 25)
Columns: ['Type', 'Name', 'Age', 'Breed1', 'Breed2', 'Gender', 'Color1', 'Color2', 'Color3', 'MaturitySize', 'FurLength', 'Vaccinated', 'Dewormed', 'Sterilized', 'Health'] ...
Label column: AdoptionSpeed


|  | Type | Name | Age | Breed1 | Breed2 | Gender | Color1 | Color2 | Color3 | MaturitySize | ... | Quantity | Fee | State | RescuerID | VideoAmt | Description | PetID | PhotoAmt | AdoptionSpeed | Images |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Yumi Hamasaki | 4 | 292 | 265 | 2 | 1 | 5 | 7 | 2 | ... | 1 | 0 | 41326 | bcc4e1b9557a8b3aaf545ea8e6e86991 | 0 | I rescued Yumi Hamasaki at a food stall far aw... | 7d7a39d71 | 3.0 | 0 | /content/ag_automm_tutorial/petfinder_for_tuto... |
| 1 | 2 | Nene/ Kimie | 12 | 285 | 0 | 2 | 5 | 6 | 7 | 2 | ... | 1 | 0 | 41326 | f0450bf0efe0fa3ff9321d0b827b1237 | 0 | Has adopted by a friend with new pet name Kimie | 0e107c82f | 3.0 | 0 | /content/ag_automm_tutorial/petfinder_for_tuto... |
| 2 | 2 | Mattie | 12 | 266 | 0 | 2 | 1 | 7 | 0 | 2 | ... | 1 | 0 | 41401 | 9b52af6d48a4521fd01d4028eb5879a3 | 0 | I rescued Mattie with a broken leg. After surg... | 1a8fd6707 | 5.0 | 0 | /content/ag_automm_tutorial/petfinder_for_tuto... |
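Because the image column now holds absolute paths, it is worth confirming that every path resolves to a real file before training. The following is a minimal sketch using only the standard library; `missing_image_paths` is a hypothetical helper, and the demo uses temporary placeholder files rather than the PetFinder images.

```python
import os
import tempfile


def missing_image_paths(paths):
    """Return the entries of `paths` that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(str(p))]


# Demo on a temporary directory with one real file and one missing file.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "pet_0.jpg")
    open(real, "wb").close()  # empty placeholder file
    fake = os.path.join(tmp, "pet_1.jpg")  # never created
    demo_missing = missing_image_paths([real, fake])
    demo_names = [os.path.basename(p) for p in demo_missing]

print(demo_names)
```

In this notebook, `missing_image_paths(train_data[image_col])` and the same call on `test_data` should both return an empty list if the download and path expansion worked.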



## 4) Train a Multimodal Model

We create a `MultiModalPredictor` and call `.fit(...)`.

- `time_limit` caps the training time in seconds. This run uses 300 (about 5 minutes) to stay quick in Colab; raise it for better accuracy.
- AutoGluon detects and uses the **GPU** automatically when one is available.


In [6]:
# ─────────────────────────────────────────
# 🧠 Train MultiModalPredictor
# ─────────────────────────────────────────
save_dir = './automm_petfinder_model'
predictor = MultiModalPredictor(label=label_col, path=save_dir)

predictor.fit(
    train_data=train_data,
    time_limit=300,  # seconds; increase for stronger models
)

AutoGluon Version:  1.4.0
Python Version:     3.12.12
Operating System:   Linux
Platform Machine:   x86_64
Platform Version:   #1 SMP Thu Oct  2 10:42:05 UTC 2025
CPU Count:          12
Pytorch Version:    2.7.1+cu126
CUDA Version:       12.6
GPU Count:          1
Memory Avail:       162.69 GB / 167.05 GB (97.4%)
Disk Space Avail:   190.03 GB / 235.68 GB (80.6%)
AutoGluon infers your prediction problem is: 'binary' (because only two unique label-values observed).
	2 unique label values:  [np.int64(0), np.int64(1)]
	If 'binary' is not the correct problem_type, please manually specify the problem_type parameter during Predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression', 'quantile'])

AutoMM starts to create your model. ✨✨✨

To track the learning progress, you can open a terminal and launch Tensorboard:
    ```shell
    # Assume you have installed tensorboard
    tensorboard --logdir /content/automm_petfinder_model
    ```

INFO: Seed set to 0


GPU Count: 1
GPU Count to be Used: 1

INFO: Using 16bit Automatic Mixed Precision (AMP)
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs
INFO: You are using a CUDA device ('NVIDIA A100-SXM4-80GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO: 
  | Name              | Type                | Params | Mode 
------------------------------------------------------------------
0 | model             | MultimodalFusionMLP | 207 M  | train
1 | validation_metric | BinaryAUROC         | 0      | train
2 | loss_func         | CrossEntropyLoss    | 0      | train
--------------------------------------------

INFO: Epoch 0, global step 1: 'val_roc_auc' reached 0.56236 (best 0.56236), saving model to '/content/automm_petfinder_model/epoch=0-step=1.ckpt' as top 3
INFO: Epoch 0, global step 4: 'val_roc_auc' reached 0.73083 (best 0.73083), saving model to '/content/automm_petfinder_model/epoch=0-step=4.ckpt' as top 3
INFO: Epoch 1, global step 5: 'val_roc_auc' reached 0.75083 (best 0.75083), saving model to '/content/automm_petfinder_model/epoch=1-step=5.ckpt' as top 3
INFO: Epoch 1, global step 8: 'val_roc_auc' reached 0.77750 (best 0.77750), saving model to '/content/automm_petfinder_model/epoch=1-step=8.ckpt' as top 3
INFO: Epoch 2, global step 9: 'val_roc_auc' reached 0.77583 (best 0.77750), saving model to '/content/automm_petfinder_model/epoch=2-step=9.ckpt' as top 3
INFO: Epoch 2, global step 12: 'val_roc_auc' reached 0.76472 (best 0.77750), saving model to '/content/automm_petfinder_model/epoch=2-step=12.ckpt' as top 3
INFO: Epoch 3, global step 13: 'val_roc_auc' was not in top 3
INFO: Epoch 3, global step 16: 'val_roc_auc' reached 0.77556 (best 0.77750), saving model to '/content/automm_petfinder_model/epoch=3-step=16.ckpt' as top 3
INFO: Epoch 4, global step 17: 'val_roc_auc' reached 0.78111 (best 0.78111), saving model to '/content/automm_petfinder_model/epoch=4-step=17.ckpt' as top 3
INFO: Epoch 4, global step 20: 'val_roc_auc' reached 0.78458 (best 0.78458), saving model to '/content/automm_petfinder_model/epoch=4-step=20.ckpt' as top 3
INFO: Epoch 5, global step 21: 'val_roc_auc' reached 0.79653 (best 0.79653), saving model to '/content/automm_petfinder_model/epoch=5-step=21.ckpt' as top 3
INFO: Epoch 5, global step 24: 'val_roc_auc' reached 0.80583 (best 0.80583), saving model to '/content/automm_petfinder_model/epoch=5-step=24.ckpt' as top 3
INFO: Epoch 6, global step 25: 'val_roc_auc' reached 0.80778 (best 0.80778), saving model to '/content/automm_petfinder_model/epoch=6-step=25.ckpt' as top 3
INFO: Epoch 6, global step 28: 'val_roc_auc' reached 0.79986 (best 0.80778), saving model to '/content/automm_petfinder_model/epoch=6-step=28.ckpt' as top 3
INFO: Epoch 7, global step 29: 'val_roc_auc' reached 0.80000 (best 0.80778), saving model to '/content/automm_petfinder_model/epoch=7-step=29.ckpt' as top 3
INFO: Epoch 7, global step 32: 'val_roc_auc' was not in top 3
INFO: Epoch 8, global step 33: 'val_roc_auc' was not in top 3
INFO: Epoch 8, global step 36: 'val_roc_auc' was not in top 3
INFO: Time limit reached. Elapsed time is 0:05:02. Signaling Trainer to stop.

Start to fuse 3 checkpoints via the greedy soup algorithm.
INFO: 💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.


AutoMM has created your model. 🎉🎉🎉

To load the model, use the code below:
    ```python
    from autogluon.multimodal import MultiModalPredictor
    predictor = MultiModalPredictor.load("/content/automm_petfinder_model")
    ```

If you are not satisfied with the model, try to increase the training time, 
adjust the hyperparameters (https://auto.gluon.ai/stable/tutorials/multimodal/advanced_topics/customization.html),
or post issues on GitHub (https://github.com/autogluon/autogluon/issues).




<autogluon.multimodal.predictor.MultiModalPredictor at 0x7fc88d722540>
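The training log reports that AutoGluon inferred a `binary` problem because it saw two unique label values. A quick count of the labels confirms this and also shows the class balance. The sketch below is illustrative only: `label_summary` is a hypothetical helper, and the demo labels are made up rather than taken from PetFinder.

```python
from collections import Counter


def label_summary(labels):
    """Map each label to (row count, share of all rows)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in sorted(counts.items())}


demo = label_summary([0, 1, 1, 0, 1, 1, 1, 0])  # made-up labels for illustration
for label, (n, share) in demo.items():
    print(f"label={label}: {n} rows ({share:.1%})")
```

In this notebook the call would be `label_summary(train_data[label_col])`. If the inferred problem type were wrong, the log itself notes that `problem_type` can be set explicitly when the predictor is created.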


## 5) Evaluate on the Test Set

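Use `.evaluate(test_data)` to score the model on held-out data. By default it reports the validation metric chosen during training (`roc_auc` for this binary problem); pass `metrics=[...]` to request others such as accuracy or F1.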


In [7]:

# ─────────────────────────────────────────
# 📈 Evaluate
# ─────────────────────────────────────────
metrics = predictor.evaluate(test_data)
print("Test metrics:", metrics)


Test metrics: {'roc_auc': np.float64(0.898)}
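To make the reported number concrete: ROC AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. It is a plain-Python illustration, not the implementation AutoGluon uses, and the toy labels and scores are made up.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the worst case, which is fine for the 100 test rows here. Applied to the true labels and the class-1 probabilities from `predictor.predict_proba(test_data)`, it should land close to the value printed above.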



## 6) Generate Predictions & Probabilities

Use `.predict(...)` to get class predictions and `.predict_proba(...)` for class probabilities.


In [8]:

# ─────────────────────────────────────────
# 🔮 Predict
# ─────────────────────────────────────────
preds = predictor.predict(test_data)
probs = predictor.predict_proba(test_data)

print("Predictions (first 10):")
print(preds.head(10))

print("\nProbabilities (first 3 rows):")
display(probs.head(3))


Predictions (first 10):
8     0
70    1
82    1
28    0
63    1
0     0
5     1
50    1
81    1
4     1
Name: AdoptionSpeed, dtype: int64

Probabilities (first 3 rows):


|  | 0 | 1 |
|---|---|---|
| 8 | 0.988723 | 0.011277 |
| 70 | 0.012735 | 0.987265 |
| 82 | 0.000183 | 0.999817 |
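With two classes, choosing the more probable class corresponds to thresholding the class-1 probability at 0.5. When false positives and false negatives have different costs, a different cut-off can be applied to the probabilities instead. A minimal sketch follows; `apply_threshold` is a hypothetical helper, and the sample values are the class-1 probabilities from the three rows printed above.

```python
def apply_threshold(positive_probs, threshold=0.5):
    """Turn class-1 probabilities into 0/1 predictions using a cut-off."""
    return [1 if p >= threshold else 0 for p in positive_probs]


# Class-1 probabilities copied from the three rows printed above.
sample_probs = [0.011277, 0.987265, 0.999817]

default_preds = apply_threshold(sample_probs)        # cut-off 0.5
strict_preds = apply_threshold(sample_probs, 0.99)   # demand more confidence

print(default_preds)  # [0, 1, 1]
print(strict_preds)   # [0, 0, 1]
```

The default cut-off reproduces the first three predictions shown above (rows 8, 70, and 82), while the stricter one flips row 70 to class 0.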



## 7) Save & Load the Model

AutoGluon models are saved under the `path` you provided. You can reload them later and do inference without retraining.


In [9]:

# ─────────────────────────────────────────
# 💾 Save & Reload
# ─────────────────────────────────────────
print("Model directory:", predictor.path)

# Reload
reloaded = MultiModalPredictor.load(predictor.path)

# Sanity check: the reloaded predictor should return predictions with the same shape as the original
reloaded_preds = reloaded.predict(test_data)
print("Reloaded predictions match shape:", reloaded_preds.shape == preds.shape)


Model directory: /content/automm_petfinder_model


Load pretrained checkpoint: /content/automm_petfinder_model/model.ckpt

Reloaded predictions match shape: True
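The cell above compares only the shapes of the two prediction series. A stricter check compares the values themselves. The sketch below is one way to do that with plain Python; `agreement_rate` is a hypothetical helper, and the demo inputs are made up.

```python
def agreement_rate(first, second):
    """Fraction of positions at which two prediction sequences agree."""
    first, second = list(first), list(second)
    if len(first) != len(second):
        raise ValueError("prediction sequences differ in length")
    if not first:
        return 1.0  # two empty sequences trivially agree
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


demo_rate = agreement_rate([0, 1, 1, 0], [0, 1, 0, 0])
print(demo_rate)  # 0.75: the sequences differ at one of four positions
```

In this notebook, `agreement_rate(preds, reloaded_preds)` is expected to be 1.0 or very close to it, since both predictors load the same saved weights.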



## 8) (Optional) Advanced Tips & Tweaks

- **Increase `time_limit`** for better results.
- Use `hyperparameters` to control model families / backbones.
- Try **data subsampling** for faster iteration during prototyping.
- Use `eval_metric` to pick a specific metric aligned with your goal.
- Explore `.fit_summary()` for training details and artifacts.


In [10]:

# Example: show a compact fit summary (if available)
try:
    summary = predictor.fit_summary()
    if isinstance(summary, dict):
        print("fit_summary keys:", list(summary.keys()))
    else:
        print(summary)
except Exception as e:
    print("fit_summary not available or failed:", e)


fit_summary keys: ['val_roc_auc', 'training_time']
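For the data subsampling tip, one option is to sample the same fraction from every label group so the class balance is preserved. The sketch below returns row positions and uses only the standard library; `stratified_sample_indices` is a hypothetical helper, and the demo labels are made up.

```python
import random
from collections import defaultdict


def stratified_sample_indices(labels, fraction, seed=0):
    """Pick about `fraction` of the row positions from each label group."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # local generator keeps the sample reproducible
    by_label = defaultdict(list)
    for position, label in enumerate(labels):
        by_label[label].append(position)
    chosen = []
    for positions in by_label.values():
        k = max(1, round(len(positions) * fraction))  # keep at least one row per label
        chosen.extend(rng.sample(positions, k))
    return sorted(chosen)


demo_labels = [0] * 6 + [1] * 4  # made-up labels
demo_idx = stratified_sample_indices(demo_labels, fraction=0.5)
print(len(demo_idx), "rows kept out of", len(demo_labels))
```

With the notebook's data this could be used as `train_data.iloc[stratified_sample_indices(train_data[label_col].tolist(), 0.25)]`. If stratification is not needed, `train_data.sample(frac=0.25, random_state=0)` is a shorter pandas alternative.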



---

### ✅ You’re Done!
You trained, evaluated, predicted, and saved a **multimodal** model with images, text, and tabular features using **AutoGluon**.

**Next ideas:**
- Swap in your own dataset with similar columns (image paths + text + tabular + label).
- Tune hyperparameters and increase training time for higher accuracy.
- Export embeddings with `predictor.extract_embedding(...)` for downstream tasks.
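As an example of a downstream task for exported embeddings, the sketch below finds the most similar item to a query by cosine similarity. It makes two assumptions: each embedding can be converted to a flat list of floats (this notebook does not show the exact return type of `extract_embedding`), and the tiny vectors here are toy stand-ins rather than real model outputs. `cosine_similarity` and `most_similar` are hypothetical helpers.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, candidates):
    """Index of the candidate vector with the highest cosine similarity to `query`."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


demo_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy stand-ins for embeddings
best = most_similar([0.8, 0.2], demo_vectors)
print(best)  # 2: [0.9, 0.1] points in nearly the same direction as the query
```

For more than a few thousand rows, a vectorized NumPy version or a dedicated nearest-neighbour library would be a better fit than these Python loops.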
