# 🖼️ timm: PyTorch Image Models

**Objective:** Master the timm library for computer vision

**Contents:**
- timm overview & installation
- Model zoo & selection
- Transfer learning patterns
- Feature extraction
- Training with timm
- Advanced techniques

**Level:** Intermediate

**Why timm?**
- 🏆 State-of-the-art pretrained models
- 📦 Unified interface
- ⚡ Efficient implementations
- 🔧 Easy fine-tuning

---

In [None]:
# Installation
# !pip install timm

import timm
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
import torchvision.transforms as transforms
from PIL import Image
import numpy as np

print(f"✅ timm version: {timm.__version__}")
print(f"✅ PyTorch version: {torch.__version__}")

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"✅ Device: {device}")

---

## 1. Model Zoo

### Available Models

timm ships **1000+ model architectures**, many of them with pretrained weights. The exact count depends on the installed version.

In [None]:
# List all available models
all_models = timm.list_models()
print(f"Total models: {len(all_models)}")
print(f"\nFirst 20 models:")
for model in all_models[:20]:
    print(f"  - {model}")

# Filter by pattern
efficientnet_models = timm.list_models('efficientnet*')
print(f"\nEfficientNet variants: {len(efficientnet_models)}")
print(efficientnet_models[:10])

# Filter pretrained models
resnet_pretrained = timm.list_models('resnet*', pretrained=True)
print(f"\nPretrained ResNets: {len(resnet_pretrained)}")

# Popular architectures
print("\n🏆 Popular Architectures:")
architectures = [
    'resnet50',
    'efficientnet_b0',
    'vit_base_patch16_224',
    'convnext_tiny',
    'swin_tiny_patch4_window7_224',
    'mobilenetv3_large_100'
]
for arch in architectures:
    print(f"  ✓ {arch}")

---

## 2. Creating Models

### Basic Usage

In [None]:
# Create model with pretrained weights
model = timm.create_model('resnet50', pretrained=True)

print(f"Model type: {type(model).__name__}")
print(f"Number of parameters: {sum(p.numel() for p in model.parameters()):,}")

# Model info
print(f"\nInput size: {model.default_cfg['input_size']}")
print(f"Mean: {model.default_cfg['mean']}")
print(f"Std: {model.default_cfg['std']}")
print(f"Num classes: {model.default_cfg['num_classes']}")

# Forward pass (eval mode so BatchNorm uses its stored statistics)
model.eval()
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    output = model(x)
print(f"\nOutput shape: {output.shape}")
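
The `(1, 1000)` output holds raw logits, not probabilities. The sketch below shows the usual post-processing in plain Python so the arithmetic is visible. The `softmax` and `top_k` helpers and the five logit values are hypothetical and not part of timm; on tensors the equivalent would be `output.softmax(dim=1).topk(k)`.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k=3):
    """Return (class_index, probability) pairs for the k largest probabilities."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical logits for a 5-class toy problem (not real model output)
logits = [2.0, 0.5, -1.0, 3.0, 0.0]
probs = softmax(logits)

for idx, p in top_k(probs, k=3):
    print(f"class {idx}: {p:.3f}")
```

The class index with the highest probability is the prediction; mapping it to a label name requires the label list of the dataset the weights were trained on.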

### Custom Number of Classes

In [None]:
# Create model for custom dataset (e.g., 10 classes)
model_custom = timm.create_model(
    'efficientnet_b0',
    pretrained=True,
    num_classes=10  # Replace classification head
)

print(f"Custom model classes: {model_custom.num_classes}")

# Test forward pass (inference only: eval mode, no gradients)
model_custom.eval()
x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    output = model_custom(x)
print(f"Output shape: {output.shape}")  # (batch_size, num_classes)

# Model without classification head (feature extractor)
model_features = timm.create_model(
    'resnet50',
    pretrained=True,
    num_classes=0,  # Remove head
    global_pool=''   # Remove global pooling
)

model_features.eval()
with torch.no_grad():
    features = model_features(x)
print(f"Feature map shape: {features.shape}")  # (batch, channels, H, W)

---

## 3. Transfer Learning Patterns

### Pattern 1: Fine-tune Last Layer Only

In [None]:
# Load pretrained model
model = timm.create_model('resnet34', pretrained=True, num_classes=5)

# Freeze all layers except the classifier
# (timm ResNets name the head `fc`; model.get_classifier() returns it for any architecture)
for name, param in model.named_parameters():
    if not name.startswith('fc.'):
        param.requires_grad = False

# Check which parameters are trainable
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())

print(f"Trainable parameters: {trainable_params:,} / {total_params:,}")
print(f"Percentage: {100 * trainable_params / total_params:.2f}%")

# Setup optimizer (only for trainable parameters)
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=0.001
)

print("\n✅ Ready for training (fast, good for small datasets)")
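
The trainable count printed by this cell can be checked by hand. This is a small sketch, assuming the new head is a single linear layer on top of ResNet-34's 512-dimensional pooled features (for other architectures, read the width from `model.num_features`). `linear_head_params` is a hypothetical helper, not a timm function.

```python
def linear_head_params(in_features, num_classes, bias=True):
    """Parameter count of a linear classifier: one weight per (input, class) pair plus biases."""
    weights = in_features * num_classes
    biases = num_classes if bias else 0
    return weights + biases

# Assumed feature width of ResNet-34 after global pooling
resnet34_features = 512

head_params = linear_head_params(resnet34_features, num_classes=5)
print(f"Trainable head parameters: {head_params:,}")  # 2,565
```

If the printed trainable count differs from this number, some backbone parameters are probably still unfrozen.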

### Pattern 2: Gradual Unfreezing

In [None]:
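def unfreeze_model(model, num_layers_to_unfreeze=2):
    """
    Unfreeze the last N parameter tensors (weights and biases), counting from the end.
    Note: one layer usually owns several tensors, e.g. `fc.weight` and `fc.bias`.
    """
    # named_parameters() yields individual tensors in registration order
    layers = list(model.named_parameters())

    # Unfreeze the last N tensors
    for name, param in layers[-num_layers_to_unfreeze:]:
        param.requires_grad = True
        print(f"Unfrozen: {name}")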

# Example usage
model = timm.create_model('resnet18', pretrained=True, num_classes=5)

# Freeze all
for param in model.parameters():
    param.requires_grad = False

# Unfreeze the last 4 parameter tensors (the head plus the final BatchNorm)
unfreeze_model(model, num_layers_to_unfreeze=4)

print("\n💡 Gradual unfreezing strategy:")
print("   1. Train classifier only (fast)")
print("   2. Unfreeze last few layers")
print("   3. Fine-tune with lower LR")
print("   4. Repeat as needed")
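
Counting individual parameter tensors can split a layer in half, so a variant that unfreezes whole modules is sometimes easier to reason about. The sketch below uses a toy `nn.Sequential` as a stand-in for a backbone plus head. `freeze_all` and `unfreeze_last_blocks` are hypothetical helpers, and a real timm model would need its own notion of a "block" (for example the `layer1` to `layer4` stages of a ResNet).

```python
import torch.nn as nn

def freeze_all(model):
    """Disable gradients for every parameter in the model."""
    for p in model.parameters():
        p.requires_grad = False

def unfreeze_last_blocks(model, num_blocks):
    """Unfreeze the last `num_blocks` child modules that own parameters."""
    if num_blocks <= 0:
        return []
    # Skip parameter-free children such as activations
    blocks = [m for m in model.children() if any(True for _ in m.parameters())]
    selected = blocks[-num_blocks:]
    for block in selected:
        for p in block.parameters():
            p.requires_grad = True
    return selected

def count_trainable(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy stand-in for "backbone + head" (not a timm model)
toy = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 16), nn.ReLU(),
    nn.Linear(16, 4),
)

freeze_all(toy)
unfreeze_last_blocks(toy, num_blocks=1)   # stage 1: head only
stage1 = count_trainable(toy)
unfreeze_last_blocks(toy, num_blocks=2)   # stage 2: head + last hidden layer
stage2 = count_trainable(toy)
print(f"Stage 1 trainable: {stage1}, stage 2 trainable: {stage2}")
```

After each unfreezing stage the optimizer has to be rebuilt (or the new parameters added to it), since an optimizer created from the earlier filtered parameter list will not see the newly trainable tensors.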

### Pattern 3: Differential Learning Rates

In [None]:
# Different LR for different layers
model = timm.create_model('efficientnet_b0', pretrained=True, num_classes=10)

# Split parameters into groups
backbone_params = []
classifier_params = []

for name, param in model.named_parameters():
    if 'classifier' in name:
        classifier_params.append(param)
    else:
        backbone_params.append(param)

# Different learning rates
optimizer = torch.optim.Adam([
    {'params': backbone_params, 'lr': 1e-5},      # Low LR for pretrained
    {'params': classifier_params, 'lr': 1e-3}     # High LR for new layer
])

print("✅ Differential LR setup:")
print(f"   Backbone: {len(backbone_params)} parameter tensors, LR=1e-5")
print(f"   Classifier: {len(classifier_params)} parameter tensors, LR=1e-3")
print("\n💡 A low backbone LR helps limit catastrophic forgetting!")

---

## 4. Feature Extraction

### Extract Features for Downstream Tasks

In [None]:
# Create feature extractor
feature_extractor = timm.create_model(
    'resnet50',
    pretrained=True,
    num_classes=0,  # Remove classifier
    global_pool='avg'  # Global average pooling
)
feature_extractor.eval()

# Extract features
images = torch.randn(4, 3, 224, 224)
with torch.no_grad():
    features = feature_extractor(images)

print(f"Feature shape: {features.shape}")  # (batch, feature_dim)
print(f"Feature dimension: {features.shape[1]}")

# Use features for:
# 1. Similarity search
# 2. Clustering
# 3. Classical ML (SVM, KNN)
# 4. Anomaly detection

# Example: Similarity between images
from torch.nn.functional import cosine_similarity

sim = cosine_similarity(features[0].unsqueeze(0), features[1].unsqueeze(0))
print(f"\nSimilarity between image 0 and 1: {sim.item():.4f}")
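
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below spells that out in plain Python with hypothetical 3-dimensional "features"; the `cosine_sim` helper is illustrative and its `eps` handling is a simplification, not a copy of PyTorch's implementation.

```python
import math

def cosine_sim(a, b, eps=1e-8):
    """Cosine of the angle between vectors a and b (1 = same direction, -1 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors
    return dot / max(norm_a * norm_b, eps)

# Hypothetical low-dimensional feature vectors
query = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]   # query scaled by 2
orthogonal = [0.0, 0.0, 5.0]

print(f"scaled copy: {cosine_sim(query, same_direction):.4f}")
print(f"orthogonal:  {cosine_sim(query, orthogonal):.4f}")
```

Because the score ignores vector length, it compares the direction of two embeddings, which is why it is a common choice for similarity search over pooled features.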

### Multi-Scale Features

In [None]:
# Extract features from multiple layers
model = timm.create_model('resnet50', pretrained=True, features_only=True)

# Forward pass
x = torch.randn(1, 3, 224, 224)
features = model(x)

print("Multi-scale features:")
for i, feat in enumerate(features):
    print(f"  Level {i}: {feat.shape}")

# Useful for:
# - Object detection (FPN)
# - Segmentation (U-Net style)
# - Multi-task learning
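
The shapes printed by this cell follow from each level's reduction factor (stride relative to the input). This is a sketch of that arithmetic, assuming ResNet-50's usual five levels with reductions 2, 4, 8, 16, 32 and channel counts 64, 256, 512, 1024, 2048, and an input whose sides are divisible by 32. `feature_map_sizes` is a hypothetical helper.

```python
def feature_map_sizes(input_hw, reductions):
    """Spatial size of each feature level for inputs divisible by the largest reduction."""
    height, width = input_hw
    return [(height // r, width // r) for r in reductions]

# Assumed values for a ResNet-50 feature backbone
reductions = [2, 4, 8, 16, 32]
channels = [64, 256, 512, 1024, 2048]

sizes = feature_map_sizes((224, 224), reductions)
for level, (ch, (h, w)) in enumerate(zip(channels, sizes)):
    print(f"Level {level}: channels={ch}, size={h}x{w}")
```

On a real model the same information is available from `model.feature_info.channels()` and `model.feature_info.reduction()`, which is the safer source when switching architectures.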

---

## 5. Data Preprocessing

### Using timm's Data Config

In [None]:
# Get model's preprocessing config
model = timm.create_model('efficientnet_b0', pretrained=True)
config = model.default_cfg

print("Model preprocessing config:")
print(f"  Input size: {config['input_size']}")
print(f"  Mean: {config['mean']}")
print(f"  Std: {config['std']}")
print(f"  Interpolation: {config.get('interpolation', 'bilinear')}")
print(f"  Crop pct: {config.get('crop_pct', 0.875)}")

# Create matching transform
from timm.data import create_transform

# Read interpolation from the config instead of hard-coding it
interpolation = config.get('interpolation', 'bicubic')

transform_train = create_transform(
    input_size=config['input_size'][-2:],
    is_training=True,
    auto_augment='rand-m9-mstd0.5-inc1',
    interpolation=interpolation,
    mean=config['mean'],
    std=config['std']
)

transform_val = create_transform(
    input_size=config['input_size'][-2:],
    is_training=False,
    interpolation=interpolation,
    mean=config['mean'],
    std=config['std'],
    crop_pct=config.get('crop_pct', 0.875)
)

print("\n✅ Transforms created matching model's config")
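
Two numbers in the config do most of the work at evaluation time: `crop_pct` sets how far the image is resized before the center crop, and `mean`/`std` set the per-channel normalization. The sketch below works through both, assuming a 224-pixel crop, `crop_pct=0.875`, and the ImageNet statistics; the real values should always come from `model.default_cfg`. Both helper functions are hypothetical.

```python
import math

def eval_resize_size(crop_size, crop_pct):
    """Side length to resize to so that a center crop keeps `crop_pct` of it."""
    return math.floor(crop_size / crop_pct)

def normalize_pixel(rgb_0_255, mean, std):
    """Scale an RGB pixel to [0, 1], then standardize each channel."""
    return [(v / 255.0 - m) / s for v, m, s in zip(rgb_0_255, mean, std)]

# Assumed config values (ImageNet defaults)
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

resize_to = eval_resize_size(224, 0.875)
print(f"Resize to {resize_to}, then center crop 224")  # Resize to 256

white = normalize_pixel((255, 255, 255), mean, std)
black = normalize_pixel((0, 0, 0), mean, std)
print("white pixel:", [round(v, 3) for v in white])
print("black pixel:", [round(v, 3) for v in black])
```

A model fed images normalized with different statistics than it was trained on sees inputs shifted away from its training distribution, which is the usual reason a pretrained model underperforms after a preprocessing mismatch.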

---

## 6. Training Example

### Complete Training Loop

In [None]:
# Training function
def train_epoch(model, dataloader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
    
    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

@torch.no_grad()
def val_epoch(model, dataloader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    
    for inputs, labels in dataloader:
        inputs, labels = inputs.to(device), labels.to(device)
        
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        
        running_loss += loss.item() * inputs.size(0)
        _, predicted = outputs.max(1)
        total += labels.size(0)
        correct += predicted.eq(labels).sum().item()
    
    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc

# Example training setup
print("📋 Training Setup Example:")
print("""
# 1. Create model
model = timm.create_model('efficientnet_b0', pretrained=True, num_classes=10)
model = model.to(device)

# 2. Loss, optimizer, and scheduler
num_epochs = 10
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

# 3. Training loop
for epoch in range(num_epochs):
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    val_loss, val_acc = val_epoch(model, val_loader, criterion, device)
    scheduler.step()
    
    print(f'Epoch {epoch}: Train Loss={train_loss:.4f}, Val Acc={val_acc:.4f}')
""")

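The cosine schedule in the setup above lowers the learning rate along half a cosine wave, from the base LR down to a minimum over `T_max` epochs. This is a sketch of the closed-form curve, assuming a base LR of 1e-3, `T_max=10`, and a minimum of 0; `cosine_annealing_lr` is a hypothetical helper that mirrors the formula, not the PyTorch class itself.

```python
import math

def cosine_annealing_lr(epoch, base_lr, t_max, eta_min=0.0):
    """Learning rate after `epoch` scheduler steps, for 0 <= epoch <= t_max."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)

# Assumed settings matching the training setup example
base_lr, t_max = 1e-3, 10
schedule = [cosine_annealing_lr(e, base_lr, t_max) for e in range(t_max + 1)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr={lr:.6f}")
```

The rate changes slowly at the start and end and fastest in the middle, so setting `T_max` to the total number of epochs lets the run finish at the minimum rate.
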
---

## 7. Advanced Techniques

### Mixed Precision Training

In [None]:
# Mixed precision for faster training
# (recent PyTorch releases also expose these tools under torch.amp)
from torch.cuda.amp import autocast, GradScaler

training_example = """
# Setup
scaler = GradScaler()

# Training loop
for inputs, labels in train_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    
    # Forward pass with autocast (eligible ops run in reduced precision)
    with autocast():
        outputs = model(inputs)
        loss = criterion(outputs, labels)
    
    # Backward on the scaled loss, then unscale and step
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
"""

print("⚡ Mixed Precision Training:")
print(training_example)
print("\n💡 Benefits: lower memory use and often a large speedup on GPUs with Tensor Cores (the gain depends on model and hardware)")
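
The reason for `GradScaler` is that half precision cannot represent very small gradients. The sketch below illustrates this with the standard library's half-precision packing; it assumes a hypothetical gradient value of 1e-8 and a loss scale of 2**16 (the default initial scale of `GradScaler`), and `to_fp16` is a helper written for this illustration.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<e', value))[0]

tiny_grad = 1e-8        # hypothetical gradient magnitude
loss_scale = 2.0 ** 16  # assumed scale factor

# Without scaling the value is below fp16's smallest subnormal and flushes to zero
unscaled = to_fp16(tiny_grad)

# Scaling before the fp16 round-trip keeps the value representable
scaled = to_fp16(tiny_grad * loss_scale)
recovered = scaled / loss_scale  # unscale in higher precision

print(f"fp16 without scaling: {unscaled}")  # 0.0
print(f"fp16 with scaling, then unscaled: {recovered:.3e}")
```

Multiplying the loss by the scale multiplies every gradient by the same factor, so small gradients survive the backward pass and are divided back down before the optimizer step.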

### Model EMA (Exponential Moving Average)

In [None]:
# EMA for better generalization
ema_example = """
from timm.utils import ModelEmaV2

# Create EMA model
model = timm.create_model('resnet50', pretrained=True, num_classes=10)
model_ema = ModelEmaV2(model, decay=0.9999)

# Training loop
for inputs, labels in train_loader:
    # Normal training step
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    
    # Update EMA model after each optimizer step
    model_ema.update(model)

# Use EMA model for inference
ema_model = model_ema.module
ema_model.eval()
"""

print("📈 Model EMA:")
print(ema_example)
print("\n💡 EMA can give a small accuracy boost (0.1-0.5% is a rough rule of thumb; results vary)")
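
Each EMA update blends the averaged weight with the current one: `ema = decay * ema + (1 - decay) * current`. The sketch below applies that rule to a single scalar to show how slowly a decay of 0.9999 reacts. The helper names are hypothetical, and the scalar stands in for one model weight.

```python
def ema_update(ema_value, current_value, decay):
    """One exponential-moving-average step toward the current value."""
    return decay * ema_value + (1.0 - decay) * current_value

def steps_to_halve_gap(decay, start=0.0, target=1.0):
    """Count updates until the EMA has covered half the distance to a fixed target."""
    ema, steps = start, 0
    half_gap = abs(target - start) / 2.0
    while abs(target - ema) > half_gap:
        ema = ema_update(ema, target, decay)
        steps += 1
    return steps

slow = steps_to_halve_gap(decay=0.9999)
fast = steps_to_halve_gap(decay=0.99)
print(f"decay=0.9999 needs {slow} updates to halve the gap")
print(f"decay=0.99   needs {fast} updates to halve the gap")
```

With a high decay the EMA weights lag thousands of steps behind the trained weights, so on short fine-tuning runs a lower decay, or evaluating both sets of weights, is worth considering.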

---

## 🎯 Key Takeaways

### timm Essentials

1. **Model Creation**
   ```python
   model = timm.create_model('resnet50', pretrained=True, num_classes=10)
   ```

2. **Feature Extraction**
   ```python
   model = timm.create_model('resnet50', pretrained=True, num_classes=0)
   ```

3. **Transfer Learning**
   - Freeze backbone, train classifier
   - Gradual unfreezing
   - Differential learning rates

### Best Practices

✅ **DO:**
- Use model's default preprocessing config
- Start with frozen backbone
- Use differential LRs
- Apply mixed precision training
- Use EMA for better results

❌ **DON'T:**
- Ignore model's preprocessing config
- Use same LR for all layers
- Train without augmentation
- Forget to set model.eval() for inference

### Common Workflows

| Task | Recommended Models |
|------|-------------------|
| Image Classification | EfficientNet, ConvNeXt, ViT |
| Feature Extraction | ResNet50, EfficientNet-B0 |
| Real-time | MobileNet, EfficientNet-Lite |
| High Accuracy | ConvNeXt-Large, Swin-Large |

### Quick Reference

```python
# List models
timm.list_models('*efficientnet*', pretrained=True)

# Create & customize
model = timm.create_model('resnet50', pretrained=True, num_classes=10)

# Get config
config = model.default_cfg

# Feature extraction
features = timm.create_model('resnet50', pretrained=True, num_classes=0)

# Multi-scale features
features = timm.create_model('resnet50', features_only=True)
```

---

**Next:** NLP Fundamentals