# Sanity Train: Gemma + DINO on COCO (val)

Run a quick 20-step training job on a GPU, using the repo's own modules, to verify that the pipeline is wired up end to end.

- Uses the COCO validation split (5k images) for both training and validation to keep the run small.
- Disables Weights & Biases logging.
- Uses 16-bit mixed precision (`16-mixed`) on the GPU.

Tip: The first run downloads the models and dataset (1–2 GB). Subsequent runs reuse the Hugging Face cache (`HF_HOME`).
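
To see where that cache lives before the first download, the lookup order documented for `huggingface_hub` can be sketched as below. `resolve_hf_home` is a hypothetical helper written for this note, not a library function; it only mirrors the documented defaults (`HF_HOME` if set, else `$XDG_CACHE_HOME/huggingface`, else `~/.cache/huggingface`).

```python
import os
from pathlib import Path

def resolve_hf_home(env=None):
    """Hypothetical helper: mirror the documented HF_HOME lookup order."""
    env = os.environ if env is None else env
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser()
    # Without HF_HOME, fall back to the XDG cache dir, then ~/.cache.
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else Path.home() / ".cache"
    return base / "huggingface"

print("Hugging Face cache root:", resolve_hf_home())
```

By default, model files usually land under `hub/` and datasets under `datasets/` inside that root, so checking the size of those two folders is a quick way to confirm that a second run is reusing the cache.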

In [1]:
# Environment & versions (local vars, no env mutation)
import os, sys
import torch
import transformers
import datasets as hfds
import pytorch_lightning as pl

HF_CACHE = None  # passed as cache_dir to the data loaders below

print('Python:', sys.version.split()[0])
print('Torch:', torch.__version__)
print('Transformers:', transformers.__version__)
print('datasets:', hfds.__version__)
print('Lightning:', pl.__version__)
if torch.cuda.is_available():
    print('CUDA available ->', torch.cuda.get_device_name(0))
else:
    print('CUDA not available; this run expects a GPU.')

Python: 3.13.3
Torch: 2.8.0+cu128
Transformers: 4.56.1
datasets: 4.1.0
Lightning: 2.5.5
CUDA available -> NVIDIA GeForce GTX 1650 Ti


In [None]:
# Make project root importable when running from scripts/
import sys, os
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if project_root not in sys.path:
    sys.path.insert(0, project_root)
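
The cell above assumes the notebook is launched from `scripts/`. A slightly more defensive variant walks up from the working directory until it finds a folder that contains `src/`, the package imported in the next cells. `find_project_root` is a hypothetical helper, and using `src` as the marker is an assumption about this repo's layout.

```python
from pathlib import Path

def find_project_root(start=None, marker="src"):
    """Hypothetical helper: first ancestor of `start` that contains `marker`/."""
    start = Path(start or Path.cwd()).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}/' directory found at or above {start}")

# Usage (commented out so the sketch runs outside the repo as well):
# import sys
# root = str(find_project_root())
# if root not in sys.path:
#     sys.path.insert(0, root)
```

Raising instead of silently guessing keeps a wrong working directory from turning into a confusing `ModuleNotFoundError` later on.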

In [3]:
# Build COCO validation loaders (used for both train and val)
from src.data.dataloader import make_coco_dataloader

batch_size = int(os.environ.get('SANITY_BS', 2))  # keep small for VRAM
num_workers = int(os.environ.get('SANITY_NUM_WORKERS', 2))
pin_memory = torch.cuda.is_available()

train_loader = make_coco_dataloader(
    split='validation',
    batch_size=batch_size,
    shuffle=True,
    caption_index=None,  # random choice for some variety
    seed=42,
    num_workers=num_workers,
    pin_memory=pin_memory,
    cache_dir=HF_CACHE,
)

val_loader = make_coco_dataloader(
    split='validation',
    batch_size=batch_size,
    shuffle=False,
    caption_index=None,
    seed=42,
    num_workers=num_workers,
    pin_memory=pin_memory,
    cache_dir=HF_CACHE,
)

sample_images, sample_caps = next(iter(train_loader))
print(f'Sample batch -> images: {len(sample_images)}, example caption: {sample_caps[0][:80]}...')

Sample batch -> images: 2, example caption: A bed with a pillow at the tip is neatly made. ...
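
For a sense of scale, the arithmetic behind this sanity run is small: with the default batch size of 2 and the 20 optimizer steps configured in the trainer cell, only a few dozen of the roughly 5k validation images are ever seen. The sketch below assumes one batch per optimizer step unless gradient accumulation is set, and takes 5,000 as the split size; `samples_seen` is an illustrative name.

```python
def samples_seen(batch_size, max_steps, accumulate_grad_batches=1):
    """Number of training samples consumed before max_steps is reached."""
    return batch_size * max_steps * accumulate_grad_batches

dataset_size = 5_000  # assumed size of the COCO validation split ("5k" above)
seen = samples_seen(batch_size=2, max_steps=20)
print(f"{seen} samples, {seen / dataset_size:.1%} of the split")
# -> 40 samples, 0.8% of the split
```

Reusing the same split for training and validation is therefore harmless here: the run checks wiring, not generalization.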


In [None]:
# Build model and LightningModule
from src.models.caption_modelling import GemmaDinoImageCaptioner
from src.utils.training import LitCaptioner

# Use defaults from the module (Gemma-3 270M + DINOv3 Small).
# Freeze Gemma for a quick adapter-only sanity run.
model = GemmaDinoImageCaptioner(
    include_cls=True, include_registers=False, include_patches=False,
    freeze_gemma=True,
)

optimizer_cfg = {
    'lr': 1e-4,
    'weight_decay': 0.01,
    'betas': (0.9, 0.999),
    'eps': 1e-8,
}

lit = LitCaptioner(model, optimizer_cfg=optimizer_cfg)
print('Model ready. Trainable params:', sum(p.numel() for p in lit.parameters() if p.requires_grad))

Model ready. Trainable params: 657408


In [5]:
# Trainer: 20 optimizer steps on a single GPU (the assert below makes CUDA mandatory)
assert torch.cuda.is_available(), 'This sanity check expects a GPU. Please enable CUDA.'
precision = '16-mixed'  # CUDA is guaranteed by the assert above, so AMP is always used
out_dir = os.path.join(os.getcwd(), 'outputs', 'sanity_run')
os.makedirs(out_dir, exist_ok=True)

trainer = pl.Trainer(
    accelerator='gpu',
    devices=1,
    max_steps=20,
    log_every_n_steps=1,
    precision=precision,
    limit_val_batches=5,  # keep validation quick
    default_root_dir=out_dir,
)

trainer.fit(lit, train_dataloaders=train_loader, val_dataloaders=val_loader)
print('Training done. Checkpoints/logs ->', out_dir)

Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/divyansh/Documents/SEM4/DL/Project/.venv/lib/python3.13/site-packages/pytorch_lightning/utilities/model_summary/model_summary.py:231: Precision 16-mixed is not supported by the model summary.  Estimated model size in MB will not be accurate. Using 32 bits instead.

  | Name  | Type                    | Params | Mode 
----------------------------------------------------------
0 | model | GemmaDinoImageCaptioner | 297 M  | train
----------------------------------------------------------
657 K     Trainable params
296 M     Non-trainable params
297 M     Total params
1,189.794 Total 

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


`Trainer.fit` stopped: `max_steps=20` reached.


Training done. Checkpoints/logs -> /home/divyansh/Documents/SEM4/DL/Project/scripts/outputs/sanity_run
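
The model summary above can be cross-checked with a little arithmetic. The sketch assumes the size estimate uses 4 bytes per parameter (the log notes that 32 bits are used for the estimate) and that 1 MB = 10^6 bytes; both function names are illustrative.

```python
def trainable_fraction(trainable, total):
    """Share of parameters that receive gradients."""
    return trainable / total

def params_from_size_mb(size_mb, bytes_per_param=4):
    """Invert size_mb = params * bytes_per_param / 1e6 (assumed convention)."""
    return size_mb * 1e6 / bytes_per_param

total = params_from_size_mb(1189.794)      # size reported in the summary
frac = trainable_fraction(657_408, total)  # trainable count printed earlier
print(f"~{total / 1e6:.1f} M params, {frac:.2%} trainable")
```

Under those assumptions the size figure agrees with the "297 M" total, and only about 0.2% of the weights are updated, which fits the adapter-only setup with Gemma frozen.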


In [7]:
# Tiny generate sanity check on two images
images, refs = next(iter(val_loader))
preds = model.inference_generate(images[:2], max_new_tokens=30, temperature=0.0)
for i, (ref, pred) in enumerate(zip(refs[:2], preds), 1):
    print(f'[{i}] REF:  {ref} END_REF')  # end marker now matches the REF label
    print(f'    PRED: {pred} END_PRED')
    print('-'*80)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[1] REF:  A child holding a flowered umbrella and petting a yak. END_PRED
    PRED:  10000000000000000000000000000 END_PRED
--------------------------------------------------------------------------------
[2] REF:  A narrow kitchen filled with appliances and cooking utensils. END_PRED
    PRED:  2019

\section{The Role of the Family in the Development of Children}
\section{Introduction}
\section{ END_PRED
--------------------------------------------------------------------------------
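
The `huggingface/tokenizers` fork warnings that recur in the outputs above come with their own remedy: set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch follows. It has to run before any tokenizer is used and before the DataLoader workers are forked, so it belongs at the top of the notebook. Note that, unlike the first cell, it does mutate the process environment.

```python
import os

# setdefault keeps any value that was already exported in the shell.
os.environ.setdefault("TOKENIZERS_PARALLELISM", "false")
print("TOKENIZERS_PARALLELISM =", os.environ["TOKENIZERS_PARALLELISM"])
```

Running with `SANITY_NUM_WORKERS=0` should also avoid the fork, since no worker processes are started, at the cost of slower data loading.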
