# Popular Architectures

Computer vision architectures have evolved rapidly, from simple CNNs to advanced transformer-based models. This notebook covers the key milestones and what each one contributed.

## Convolutional Neural Networks (CNNs)

### AlexNet (2012)

**AlexNet** marked the start of the deep learning era in computer vision:

- **8 layers**: 5 convolutional + 3 fully connected layers
- **Breakthrough**: Won the ImageNet 2012 challenge (ILSVRC) by a large margin, with a top-5 error of 15.3% versus 26.2% for the runner-up
- **Innovations**: ReLU activation, dropout, GPU training

```python
import torch.nn as nn

# AlexNet architecture (simplified)
class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
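        )

    def forward(self, x):
        x = self.features(x)   # (N, 256, 6, 6) for 224x224 inputs
        x = x.flatten(1)       # (N, 256 * 6 * 6)
        return self.classifier(x)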
```
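
The `256 * 6 * 6` input size of the first linear layer follows from the standard output-size formula for convolution and pooling layers, $\lfloor (n + 2p - k) / s \rfloor + 1$. The sketch below walks a 224×224 input through the feature extractor; the input resolution is an assumption and the helper name `out_size` is made up for this illustration.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

n = 224                                          # assumed input resolution
n = out_size(n, kernel=11, stride=4, padding=2)  # conv1 -> 55
n = out_size(n, kernel=3, stride=2)              # pool1 -> 27
n = out_size(n, kernel=5, padding=2)             # conv2 -> 27
n = out_size(n, kernel=3, stride=2)              # pool2 -> 13
n = out_size(n, kernel=3, padding=1)             # conv3-5 each keep 13
n = out_size(n, kernel=3, stride=2)              # pool3 -> 6

flat_features = 256 * n * n
print(n, flat_features)  # 6 9216
```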

### VGGNet (2014)

**VGGNet** introduced a uniform architecture built from small 3×3 convolutions:

- **Depth variants**: VGG16 and VGG19 (16 and 19 weight layers)
- **Uniformity**: Consistent 3×3 convolutions with stride 1
- **Popularity**: Widely used as a feature extractor
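
One reason small kernels work well: two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution, with fewer weights and an extra non-linearity in between. A rough sketch comparing parameter counts, assuming 64 input and output channels (an arbitrary width chosen for illustration):

```python
import torch
import torch.nn as nn

channels = 64  # assumed channel width, for illustration only

# Two 3x3 convolutions: 5x5 receptive field
stacked_3x3 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
)
# One 5x5 convolution: same receptive field
single_5x5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)

def count_params(module):
    return sum(p.numel() for p in module.parameters())

x = torch.randn(1, channels, 32, 32)
out_stacked = stacked_3x3(x)
out_single = single_5x5(x)

print(out_stacked.shape == out_single.shape)   # True: same output size
print(count_params(stacked_3x3))               # 73856
print(count_params(single_5x5))                # 102464
```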

### ResNet (2015)

**Residual Networks** (ResNets) made very deep networks trainable by countering vanishing gradients and the related degradation problem:

- **Residual blocks**: Skip connections that pass the input forward unchanged
- **Deep networks**: ResNet-152 with 152 layers
- **Formula**: $y = F(x) + x$ (identity shortcut)

```python
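import torch.nn as nn
import torch.nn.functional as F

# Residual block: two 3x3 convolutions plus a shortcut connection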
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, 
                              stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                              stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, 
                         stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
        
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # Residual connection
        out = F.relu(out)
        return out
```

### DenseNet (2017)

**Densely Connected Networks** maximize information flow between layers:

- **Dense blocks**: Each layer receives the feature maps of all preceding layers
- **Feature reuse**: Efficient use of learned features
- **Parameter efficiency**: Fewer parameters than a ResNet of comparable accuracy
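
The dense connectivity described above can be sketched with channel-wise concatenation. This is a minimal illustration, not the full DenseNet design: it omits the 1×1 bottleneck convolutions and the transition layers, and the class names and the values for `growth_rate` and `num_layers` are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """BN -> ReLU -> 3x3 conv that adds `growth_rate` new feature maps."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        new_features = self.conv(torch.relu(self.bn(x)))
        # Concatenate instead of add: later layers see all earlier features
        return torch.cat([x, new_features], dim=1)

class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        # Layer i receives the block input plus i * growth_rate extra channels
        self.layers = nn.Sequential(*[
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        ])

    def forward(self, x):
        return self.layers(x)

block = DenseBlock(in_channels=64, growth_rate=32, num_layers=4)
out = block(torch.randn(2, 64, 16, 16))
print(out.shape)  # torch.Size([2, 192, 16, 16]), i.e. 64 + 4 * 32 channels
```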

## Vision Transformers

### From Sequences to Images

**Vision Transformers (ViT)** apply the transformer architecture to images:

1. **Image patching**: Split the image into fixed-size patches
2. **Linear projection**: Map each patch to an embedding vector
3. **Position embeddings**: Add spatial information
4. **Transformer encoder**: Process the patch sequence with self-attention

```python
import torch
import torch.nn as nn

# Vision Transformer (simplified); TransformerBlock is defined in the code cell further down
class VisionTransformer(nn.Module):
    def __init__(self, image_size=224, patch_size=16, num_classes=1000):
        super().__init__()
        
        # Image parameters
        self.patch_size = patch_size
        num_patches = (image_size // patch_size) ** 2
        
        # Patch embedding
        self.patch_embedding = nn.Conv2d(
            3, 768, kernel_size=patch_size, stride=patch_size
        )
        
        # Position embeddings
        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, 768))
        self.cls_token = nn.Parameter(torch.randn(1, 1, 768))
        
        # Transformer encoder
        self.transformer = nn.Sequential(*[
            TransformerBlock(768, 12) for _ in range(12)
        ])
        
        # Classification head
        self.mlp_head = nn.Sequential(
            nn.LayerNorm(768),
            nn.Linear(768, num_classes)
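        )

    def forward(self, x):
        # (N, 3, H, W) -> (N, 768, H/P, W/P) -> (N, num_patches, 768)
        x = self.patch_embedding(x).flatten(2).transpose(1, 2)
        # Prepend the class token, then add position embeddings
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embedding
        x = self.transformer(x)
        return self.mlp_head(x[:, 0])  # classify from the class token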
```

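### Self-Attention Mechanism

The **attention mechanism** is the core of transformers. With query, key, and value matrices $Q$, $K$, $V$ and key dimension $d_k$:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$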
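
The formula above translates almost line by line into PyTorch. A minimal single-head sketch, without masking or dropout; the function name and tensor sizes are assumptions for illustration:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """q, k, v: tensors of shape (batch, seq_len, d_k)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v, weights

torch.manual_seed(0)
q = torch.randn(2, 5, 64)
k = torch.randn(2, 5, 64)
v = torch.randn(2, 5, 64)

out, weights = scaled_dot_product_attention(q, k, v)
print(out.shape)      # torch.Size([2, 5, 64])
print(weights.shape)  # torch.Size([2, 5, 5])
```

Dividing by $\sqrt{d_k}$ keeps the dot products from growing with the embedding size, which would otherwise push the softmax into regions with very small gradients.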

### Advantages of Vision Transformers

- **Global context**: Every patch can attend directly to every other image region
- **Scalability**: Performance keeps improving with more data and compute
- **Flexibility**: Easy to adapt to different tasks

### Challenges

- **Data hunger**: Need large datasets to train well
- **Computational cost**: Self-attention is quadratic in the sequence length
- **Local patterns**: Lack the built-in locality of convolutions, so they are less efficient at detecting local texture
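
To make the quadratic cost concrete: with 16×16 patches the token count grows with the image area, and the attention matrix grows with the square of the token count. A back-of-the-envelope sketch (the resolutions are arbitrary examples and the helper name is made up):

```python
def attention_matrix_entries(image_size, patch_size=16):
    """Return (tokens, attention matrix entries) for a square image."""
    num_tokens = (image_size // patch_size) ** 2 + 1  # +1 for the class token
    return num_tokens, num_tokens ** 2

for size in (224, 448, 896):
    tokens, entries = attention_matrix_entries(size)
    print(f"{size}x{size}: {tokens} tokens, {entries:,} attention entries per head")
```

Doubling the image side length roughly quadruples the token count and increases the attention matrix about sixteenfold.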

## Hybrid Architectures

### Convolutional Transformers

Models that combine the best of both worlds:

- **ConvNeXt**: Modern CNN inspired by transformer design choices
- **Swin Transformer**: Hierarchical vision transformer with shifted windows
- **DETR**: Detection transformer for end-to-end object detection
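
The shifted-window idea can be illustrated with plain tensor operations: attention is restricted to non-overlapping windows, and alternating layers cyclically shift the feature map so that information crosses window borders. The sketch below shows only the partitioning and the shift. It leaves out the attention computation and the mask that Swin applies to the wrapped-around regions, and the sizes are assumptions for illustration.

```python
import torch

def window_partition(x, window_size):
    """Split (N, H, W, C) feature maps into (N * num_windows, ws, ws, C)."""
    n, h, w, c = x.shape
    x = x.view(n, h // window_size, window_size, w // window_size, window_size, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, c)

x = torch.randn(1, 8, 8, 96)  # assumed 8x8 feature map with 96 channels
window_size = 4

# Regular windows: four non-overlapping 4x4 windows
windows = window_partition(x, window_size)

# Shifted windows: roll by half a window before partitioning
shift = window_size // 2
shifted = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
shifted_windows = window_partition(shifted, window_size)

print(windows.shape)          # torch.Size([4, 4, 4, 96])
print(shifted_windows.shape)  # torch.Size([4, 4, 4, 96])
```

Because attention runs within each window, the cost scales linearly with the number of windows rather than quadratically with the total number of patches.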

### Architecture Evolution

The trend is toward:
- **Larger models**: More parameters for better performance
- **Multi-modality**: Combining vision with text and audio
- **Efficiency**: Optimization for edge deployment
- **Self-supervision**: Pre-training on large unlabeled datasets

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt

# Residual block implementation
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super(ResidualBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, 
                              stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                              stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, 
                         stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
        
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out

# Transformer block (simplified: post-norm with ReLU; the original ViT uses pre-norm and GELU)
class TransformerBlock(nn.Module):
    def __init__(self, embed_dim, num_heads):
        super().__init__()
        self.attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        
        self.feed_forward = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.ReLU(),
            nn.Linear(4 * embed_dim, embed_dim)
        )
        
    def forward(self, x):
        # Self-attention
        attn_out, _ = self.attention(x, x, x)
        x = self.norm1(x + attn_out)
        
        # Feed-forward
        ff_out = self.feed_forward(x)
        x = self.norm2(x + ff_out)
        
        return x

# Architecture comparison
def compare_architectures():
    """Compare several architectures by depth, parameter count, and key innovation."""
    
    architectures = {
        'AlexNet': {
            'layers': 8,
            'params': '60M',
            'year': 2012,
            'innovation': 'Deep CNNs'
        },
        'VGG16': {
            'layers': 16,
            'params': '138M',
            'year': 2014,
            'innovation': '3x3 Convolutions'
        },
        'ResNet50': {
            'layers': 50,
            'params': '25M',
            'year': 2015,
            'innovation': 'Residual Learning'
        },
        'Vision Transformer': {
            'layers': 12,
            'params': '86M',
            'year': 2020,
            'innovation': 'Self-Attention'
        }
    }
    
    # Plot parameter counts per architecture
    params = [arch['params'] for arch in architectures.values()]
    names = list(architectures.keys())
    
    plt.figure(figsize=(12, 6))
    bars = plt.bar(range(len(names)), [float(p.rstrip('M')) for p in params])
    plt.xlabel('Architecture')
    plt.ylabel('Parameters (millions)')
    plt.title('Parameter Counts of Vision Architectures')
    plt.xticks(range(len(names)), names, rotation=45)
    
    # Add each architecture's key innovation as a label above its bar
    for i, bar in enumerate(bars):
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height + 1,
                architectures[names[i]]['innovation'],
                ha='center', va='bottom', rotation=45, fontsize=8)
    
    plt.tight_layout()
    plt.show()
    
    return architectures

# Compare architectures
architectures = compare_architectures()
print("\nArchitecture comparison:")
for name, specs in architectures.items():
    print(f"{name} ({specs['year']}): {specs['layers']} layers, {specs['params']} parameters")
    print(f"  Innovation: {specs['innovation']}")

## Transfer Learning with Pre-trained Models

One of the main advantages of modern architectures is that they support **transfer learning**: a backbone pre-trained on a large dataset is reused, and only a small task-specific head is trained.

In [None]:
import matplotlib.pyplot as plt
import torch.nn as nn
import torchvision.models as models

def create_transfer_model(model_name, num_classes, freeze_backbone=True):
    """Create a transfer learning model from an ImageNet-pretrained backbone."""
    
    # Load the pre-trained model. The `weights` argument needs torchvision >= 0.13;
    # older versions use `pretrained=True` instead.
    if model_name == 'resnet50':
        model = models.resnet50(weights='IMAGENET1K_V1')
    elif model_name == 'vgg16':
        model = models.vgg16(weights='IMAGENET1K_V1')
    elif model_name == 'densenet121':
        model = models.densenet121(weights='IMAGENET1K_V1')
    else:
        raise ValueError(f"Unknown model: {model_name}")
    
    # Freeze all backbone parameters
    if freeze_backbone:
        for param in model.parameters():
            param.requires_grad = False
    
    # Replace the final layer for the new task. The new layer is created after
    # freezing, so its parameters are trainable by default.
    if 'resnet' in model_name:
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    elif 'densenet' in model_name:
        # torchvision's DenseNet exposes its head as `classifier`, not `fc`
        model.classifier = nn.Linear(model.classifier.in_features, num_classes)
    elif 'vgg' in model_name:
        num_ftrs = model.classifier[6].in_features
        model.classifier[6] = nn.Linear(num_ftrs, num_classes)
    
    return model

# Example transfer learning models
models_dict = {}
for model_name in ['resnet50', 'vgg16', 'densenet121']:
    model = create_transfer_model(model_name, num_classes=10)
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    
    models_dict[model_name] = {
        'total_params': total_params,
        'trainable_params': trainable_params,
        # How many parameters exist per trainable parameter
        'total_to_trainable': total_params / trainable_params
    }
    
    print(f"{model_name}:")
    print(f"  Total parameters: {total_params:,}")
    print(f"  Trainable parameters: {trainable_params:,}")
    print(f"  Total/trainable ratio: {models_dict[model_name]['total_to_trainable']:.1f}x")
    print()

# Visualize total versus trainable parameters
model_names = list(models_dict.keys())
total_params = [models_dict[name]['total_params'] for name in model_names]
trainable_params = [models_dict[name]['trainable_params'] for name in model_names]

x = range(len(model_names))
width = 0.35

fig, ax = plt.subplots(figsize=(10, 6))
bars1 = ax.bar([i - width/2 for i in x], total_params, width, label='Total Parameters', alpha=0.7)
bars2 = ax.bar([i + width/2 for i in x], trainable_params, width, label='Trainable Parameters', alpha=0.7)

ax.set_xlabel('Model')
ax.set_ylabel('Parameters')
ax.set_title('Transfer Learning: Parameter Efficiency')
ax.set_xticks(x)
ax.set_xticklabels(model_names)
ax.legend()
ax.set_yscale('log')  # Log scale, since the two counts differ by orders of magnitude

plt.tight_layout()
plt.show()