# CLIP Model Evaluation on Google Colab (FASTEST VERSION)

**Downloads the COCO val2014 images directly from the COCO server, so no images need to be uploaded to Drive.**

✅ **Downloads COCO val2014 images from images.cocodataset.org**
✅ **Works with base, batchnorm, and dropout models**
✅ **Shows progress bars and saves results to Drive**

---

## 📋 Upload to Google Drive:

```
My Drive/elec475_lab4/
  models/          ← Your trained models
    *.pth
  data/
    text_embeddings_val.pt  ← Your embeddings
```

**No need to upload images!** The notebook downloads them from the COCO server automatically.

---

## 1. Setup & Mount Drive

In [1]:
from google.colab import drive
drive.mount('/content/drive')

import os
os.chdir('/content')

import torch
print("=" * 80)
print("GPU CHECK")
print("=" * 80)
print(f"CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
print("=" * 80)

Mounted at /content/drive
GPU CHECK
CUDA: True
GPU: NVIDIA A100-SXM4-40GB
Memory: 42.5 GB


## 2. Install Dependencies

In [2]:
!pip install -q transformers torch torchvision tqdm pillow matplotlib pandas opendatasets
print("✓ Dependencies installed")

✓ Dependencies installed


## 3. Download COCO Images from the COCO Server (FAST!)

In [5]:
%%time

import zipfile
from pathlib import Path
import requests
from tqdm.auto import tqdm

# Direct download of val2014 images from COCO
val_zip_url = "http://images.cocodataset.org/zips/val2014.zip"
download_dir = Path("/content/coco_val")
zip_file = download_dir / "val2014.zip"
val_images_dir = download_dir / "val2014"

download_dir.mkdir(exist_ok=True)

# Download val2014.zip if not already downloaded
if not val_images_dir.exists():
    if not zip_file.exists():
        print("Downloading COCO val2014 images...")
        print("Size: ~6.6 GB (this takes ~3-5 minutes)")

        # Stream download with progress bar
        response = requests.get(val_zip_url, stream=True, timeout=60)
        response.raise_for_status()  # Fail early on HTTP errors instead of saving an error page
        total_size = int(response.headers.get('content-length', 0))

        with open(zip_file, 'wb') as f, tqdm(
            total=total_size, unit='B', unit_scale=True, desc="Downloading"
        ) as pbar:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
                pbar.update(len(chunk))

        print("✓ Downloaded")

    # Extract
    print("\nExtracting images...")
    with zipfile.ZipFile(zip_file, 'r') as zip_ref:
        zip_ref.extractall(download_dir)
    print("✓ Extracted")

    # Clean up zip file to save space
    zip_file.unlink()
    print("✓ Cleaned up zip file")
else:
    print("✓ val2014 images already downloaded")

# Set path
VAL_IMAGES_LOCAL = val_images_dir

# Verify
if VAL_IMAGES_LOCAL.exists():
    img_count = len(list(VAL_IMAGES_LOCAL.glob("*.jpg")))
    print(f"\n✓ Found {img_count} validation images")
    print(f"📁 Path: {VAL_IMAGES_LOCAL}")
else:
    print(f"❌ ERROR: {VAL_IMAGES_LOCAL} not found")

Downloading COCO val2014 images...
Size: ~6.6 GB (this takes ~3-5 minutes)


Downloading:   0%|          | 0.00/6.65G [00:00<?, ?B/s]

✓ Downloaded

Extracting images...
✓ Extracted
✓ Cleaned up zip file

✓ Found 40504 validation images
📁 Path: /content/coco_val/val2014
CPU times: user 22.1 s, sys: 15.2 s, total: 37.3 s
Wall time: 8min 35s
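
One caveat with the cell above: it skips the download whenever `val2014.zip` already exists, so a download interrupted midway would leave a truncated file that passes the check and then fails at extraction. A minimal sketch of one way to guard against this, assuming a hypothetical helper `write_chunks_atomically` (not part of the notebook or the repository): write to a temporary `.part` file and rename it only after every chunk has been written.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_chunks_atomically(chunks: Iterable[bytes], dest: Path) -> int:
    """Write chunks to `dest` via a temporary .part file; return bytes written.

    Hypothetical helper for illustration. `dest` only appears once all
    chunks were written, so an interrupted run never leaves a partial
    file under the final name.
    """
    dest = Path(dest)
    part = dest.with_name(dest.name + ".part")
    written = 0
    try:
        with open(part, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
                written += len(chunk)
        os.replace(part, dest)  # atomic rename on the same filesystem
    finally:
        # Remove the leftover partial file if writing failed before the rename.
        if part.exists():
            part.unlink()
    return written


# Usage with in-memory chunks standing in for response.iter_content(...)
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.zip"
    n = write_chunks_atomically([b"abc", b"de"], target)
    print(n, target.read_bytes())
```

In the download cell, `response.iter_content(chunk_size=8192)` could be passed as `chunks`, with the progress bar updated inside a small wrapping generator.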


## 4. Configure Paths

In [6]:
import shutil

# Drive paths
DRIVE_ROOT = Path("/content/drive/MyDrive/elec475_lab4")
MODELS_DIR = DRIVE_ROOT / "models"
DATA_DIR = DRIVE_ROOT / "data"
VAL_EMBEDDINGS_DRIVE = DATA_DIR / "text_embeddings_val.pt"

# Local paths
LOCAL_DATA = Path("/content/data")
LOCAL_DATA.mkdir(exist_ok=True)
VAL_EMBEDDINGS_LOCAL = LOCAL_DATA / "text_embeddings_val.pt"

# Results
RESULTS_DIR = DRIVE_ROOT / "results"
RESULTS_DIR.mkdir(exist_ok=True, parents=True)

print("=" * 80)
print("PATH CONFIGURATION")
print("=" * 80)
print(f"Models: {MODELS_DIR}")
print(f"Images: {VAL_IMAGES_LOCAL} (downloaded from COCO)")
print(f"Results: {RESULTS_DIR}")
print("=" * 80)

PATH CONFIGURATION
Models: /content/drive/MyDrive/elec475_lab4/models
Images: /content/coco_val/val2014 (downloaded from COCO)
Results: /content/drive/MyDrive/elec475_lab4/results


## 5. Copy Embeddings from Drive

In [7]:
if not VAL_EMBEDDINGS_LOCAL.exists():
    print("Copying embeddings from Drive...")
    shutil.copy(VAL_EMBEDDINGS_DRIVE, VAL_EMBEDDINGS_LOCAL)
    print("✓ Copied")
else:
    print("✓ Embeddings already local")

print(f"Size: {VAL_EMBEDDINGS_LOCAL.stat().st_size / 1e6:.1f} MB")

Copying embeddings from Drive...
✓ Copied
Size: 433.0 MB


## 6. Clone Repository

In [8]:
if os.path.exists('475_ML-CV_Labs'):
    shutil.rmtree('475_ML-CV_Labs')

!git clone https://github.com/Jcub05/475_ML-CV_Labs.git
os.chdir('475_ML-CV_Labs/Lab4')
print(f"✓ Directory: {os.getcwd()}")

Cloning into '475_ML-CV_Labs'...
remote: Enumerating objects: 431, done.
remote: Counting objects: 100% (185/185), done.
remote: Compressing objects: 100% (136/136), done.
remote: Total 431 (delta 106), reused 121 (delta 48), pack-reused 246 (from 1)
Receiving objects: 100% (431/431), 78.86 MiB | 17.42 MiB/s, done.
Resolving deltas: 100% (182/182), done.
✓ Directory: /content/475_ML-CV_Labs/Lab4


## 7. Find Models

In [28]:
model_files = sorted(MODELS_DIR.glob("*.pth"))

print("\n" + "=" * 80)
print(f"FOUND {len(model_files)} MODEL(S)")
print("=" * 80)
for i, mf in enumerate(model_files, 1):
    print(f"{i}. {mf.name} ({mf.stat().st_size / 1e6:.1f} MB)")
print("=" * 80)


FOUND 2 MODEL(S)
1. best_model_batch_norm.pth (598.3 MB)
2. best_model_dropout.pth (598.3 MB)


## 8. Load Model & Data

In [29]:
from model import CLIPFineTuneModel
from model_modified import CLIPImageEncoderModified, CLIPFineTuneModelModified
from transformers import CLIPTextModel, CLIPTokenizer

import torch.nn as nn
import torch.nn.functional as F
import numpy as np

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def detect_model_type(state_dict):
    """Detect architecture from checkpoint keys."""
    keys = list(state_dict.keys())

    has_projection = any('image_encoder.projection.' in k for k in keys)
    has_projection_head = any('image_encoder.projection_head.' in k for k in keys)

    if has_projection and not has_projection_head:
        # Check for BatchNorm (running_mean/running_var)
        if any('image_encoder.projection.1.running_mean' in k for k in keys):
            return 'batchnorm'
        else:
            return 'dropout'
    else:
        return 'base'

def load_model(checkpoint_path):
    """Load model with correct architecture."""
    print(f"\nLoading: {checkpoint_path.name}")

    checkpoint = torch.load(checkpoint_path, map_location=device)
    state_dict = checkpoint['model_state_dict'] if 'model_state_dict' in checkpoint else checkpoint

    model_type = detect_model_type(state_dict)
    print(f"  Detected: {model_type}")

    if model_type == 'base':
        model = CLIPFineTuneModel(
            embed_dim=512,
            pretrained_resnet=True,
            clip_model_name="openai/clip-vit-base-patch32",
            freeze_text_encoder=True
        ).to(device)
    else:
        # For modified models
        text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

        if model_type == 'batchnorm':
            image_encoder = CLIPImageEncoderModified(
                embed_dim=512,
                use_batchnorm=True,
                use_dropout=False
            ).to(device)
        else:  # dropout
            image_encoder = CLIPImageEncoderModified(
                embed_dim=512,
                use_batchnorm=False,
                use_dropout=True,
                dropout_rate=0.1
            ).to(device)

        model = CLIPFineTuneModelModified(
            image_encoder=image_encoder,
            text_encoder=text_encoder,
            tokenizer=tokenizer
        ).to(device)

    model.load_state_dict(state_dict, strict=True)
    model.eval()
    print("  ✓ Loaded")
    return model, model_type  # Return model type too!

print("✓ Model loading ready")

✓ Model loading ready
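
To see how the key-based detection behaves without loading a 598 MB checkpoint, here is a small self-contained check. It repeats `detect_model_type` from the cell above and feeds it toy state dicts; the key names in the toy dicts are assumptions chosen to match the patterns the function looks for, not keys read from a real checkpoint.

```python
def detect_model_type(state_dict):
    """Same detection logic as the cell above, repeated so this runs standalone."""
    keys = list(state_dict.keys())
    has_projection = any('image_encoder.projection.' in k for k in keys)
    has_projection_head = any('image_encoder.projection_head.' in k for k in keys)
    if has_projection and not has_projection_head:
        if any('image_encoder.projection.1.running_mean' in k for k in keys):
            return 'batchnorm'
        return 'dropout'
    return 'base'


# Toy state dicts: only the key names matter, so the values are placeholders.
toy_batchnorm = {
    'image_encoder.projection.0.weight': None,
    'image_encoder.projection.1.running_mean': None,
}
toy_dropout = {'image_encoder.projection.0.weight': None}
toy_base = {'image_encoder.projection_head.0.weight': None}

for name, sd in [('batchnorm', toy_batchnorm), ('dropout', toy_dropout), ('base', toy_base)]:
    print(name, '->', detect_model_type(sd))
```

Note that anything lacking `image_encoder.projection.` keys is classified as `base`, so an unrelated checkpoint would only be caught later by `load_state_dict(strict=True)`.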


In [30]:
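from torchvision import transforms
from torch.utils.data import DataLoader

# NOTE: ValidationDataset is not defined in any cell shown here; it must
# already be defined or imported before this cell runs.

transform = transforms.Compose([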
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=(0.48145466, 0.4578275, 0.40821073),
        std=(0.26862954, 0.26130258, 0.27577711)
    )
])

val_dataset = ValidationDataset(
    images_dir=VAL_IMAGES_LOCAL,
    embeddings_file=VAL_EMBEDDINGS_LOCAL,
    transform=transform
)

val_loader = DataLoader(
    val_dataset,
    batch_size=128,
    shuffle=False,
    num_workers=2,
    pin_memory=True
)

print(f"\n✓ Dataloader ready ({len(val_dataset)} samples)")

Loading embeddings...
Building dataset...


Checking:   0%|          | 0/40504 [00:00<?, ?it/s]

✓ 40504 samples

✓ Dataloader ready (40504 samples)


## 9. Metrics & Evaluation
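
The summary table reports image-to-text (I2T) and text-to-image (T2I) Recall@K. As a rough sketch of what such a metric computes, assuming one caption per image with the matching pair at the same index (the repository's `evaluate_model` may differ), Recall@K is the percentage of queries whose correct match ranks among the K most similar candidates. The helper names `recall_at_k` and `chance_recall` are hypothetical.

```python
def recall_at_k(similarity, k):
    """Percent of rows whose diagonal entry is among the k largest in that row.

    `similarity[i][j]` scores query i against candidate j; the correct
    candidate for query i is assumed to be j == i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the correct candidate = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


def chance_recall(k, num_candidates):
    """Expected Recall@K (percent) of a uniformly random ranking."""
    return 100.0 * min(k, num_candidates) / num_candidates


# Toy 3x3 similarity matrix: queries 0 and 2 rank their match first, query 1 second.
sim = [
    [0.9, 0.1, 0.0],
    [0.8, 0.5, 0.2],
    [0.1, 0.3, 0.7],
]
print(recall_at_k(sim, 1))
print(recall_at_k(sim, 2))

# Random-guess baseline for the 40504 validation images used here.
for k in (1, 5, 10):
    print(k, round(chance_recall(k, 40504), 4))
```

Transposing the matrix gives the opposite retrieval direction. The chance baseline is a useful reference point: with 40504 candidates it is roughly 0.0025% at K=1 and 0.0247% at K=10, so scores in that range may indicate that image and text embeddings are not lined up at evaluation time rather than that the model is merely weak.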

In [33]:
%%time
import json

all_results = {}

print(f"\n{'='*80}\nEVALUATING {len(model_files)} MODEL(S)\n{'='*80}\n")

for idx, model_file in enumerate(tqdm(model_files, desc="Overall"), 1):
    model_name = model_file.stem
    print(f"\n[{idx}/{len(model_files)}] {model_name}\n{'-'*80}")

    try:
        model, model_type = load_model(model_file)  # load_model returns (model, model_type)
        # NOTE: model_type is passed positionally, so evaluate_model must accept it as its
        # third parameter; otherwise it collides with the model_name keyword.
        metrics = evaluate_model(model, val_loader, model_type, model_name=model_name)
        all_results[model_name] = metrics

        with open(RESULTS_DIR / f"{model_name}_metrics.json", 'w') as f:
            json.dump(metrics, f, indent=2)
        print(f"💾 {model_name}_metrics.json")

        del model
        torch.cuda.empty_cache()
    except Exception as e:
        print(f"❌ ERROR: {e}\n")

print(f"\n{'='*80}\n✅ DONE ({len(all_results)} successful)\n{'='*80}")


EVALUATING 2 MODEL(S)



Overall:   0%|          | 0/2 [00:00<?, ?it/s]


[1/2] best_model_batch_norm
--------------------------------------------------------------------------------

Loading: best_model_batch_norm.pth
  Detected: batchnorm
  ✓ Loaded
❌ ERROR: evaluate_model() got multiple values for argument 'model_name'


[2/2] best_model_dropout
--------------------------------------------------------------------------------

Loading: best_model_dropout.pth
  Detected: dropout
  ✓ Loaded
❌ ERROR: evaluate_model() got multiple values for argument 'model_name'


✅ DONE (0 successful)
CPU times: user 2.46 s, sys: 945 ms, total: 3.4 s
Wall time: 6.43 s
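
The error above is Python's standard complaint when one parameter receives both a positional and a keyword value. A minimal reproduction, assuming a hypothetical signature `evaluate_model(model, dataloader, model_name=...)` (the real definition is not shown in this notebook, so the parameter names here are guesses): the third positional argument lands in `model_name`, and the explicit `model_name=` keyword then supplies it a second time.

```python
def evaluate_model(model, dataloader, model_name="model"):
    """Hypothetical stand-in; the real evaluate_model signature is an assumption."""
    return {"model_name": model_name}


def evaluate_model_v2(model, dataloader, model_type, model_name="model"):
    """Hypothetical variant that accepts model_type as its third parameter."""
    return {"model_type": model_type, "model_name": model_name}


error_message = None
try:
    # The third positional value binds to model_name, then the keyword repeats it.
    evaluate_model(None, None, "batchnorm", model_name="best_model_batch_norm")
except TypeError as e:
    error_message = str(e)
    print(error_message)

# With a signature that has a slot for model_type, the same call succeeds.
result = evaluate_model_v2(None, None, "batchnorm", model_name="best_model_batch_norm")
print(result)
```

If this is what happened here, either updating the definition to accept `model_type` or dropping the positional argument from the call would resolve it. After editing the definition, the cell that defines `evaluate_model` has to be rerun, since the kernel keeps the old function until then.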


## 10. Summary

In [14]:
import pandas as pd

summary_data = []
for model_name, metrics in all_results.items():
    summary_data.append({
        'Model': model_name,
        'I2T R@1': f"{metrics['img2txt_r1']:.2f}%",
        'I2T R@5': f"{metrics['img2txt_r5']:.2f}%",
        'I2T R@10': f"{metrics['img2txt_r10']:.2f}%",
        'T2I R@1': f"{metrics['txt2img_r1']:.2f}%",
        'T2I R@5': f"{metrics['txt2img_r5']:.2f}%",
        'T2I R@10': f"{metrics['txt2img_r10']:.2f}%",
        'Avg': f"{metrics['avg_recall']:.2f}%"
    })

summary_df = pd.DataFrame(summary_data)
print("\n" + "="*80)
print("SUMMARY")
print("="*80)
print(summary_df.to_string(index=False))
print("="*80)

csv_path = RESULTS_DIR / "evaluation_results.csv"
summary_df.to_csv(csv_path, index=False)
json_path = RESULTS_DIR / "detailed_metrics.json"
with open(json_path, 'w') as f:
    json.dump(all_results, f, indent=2)

print(f"\n💾 Saved to Drive: {RESULTS_DIR}")
print("="*80)


SUMMARY
                Model I2T R@1 I2T R@5 I2T R@10 T2I R@1 T2I R@5 T2I R@10   Avg
best_model_batch_norm   0.00%   0.01%    0.02%   0.00%   0.00%    0.02% 0.01%
   best_model_dropout   0.00%   0.01%    0.01%   0.00%   0.01%    0.01% 0.01%

💾 Saved to Drive: /content/drive/MyDrive/elec475_lab4/results
