# Chapter 32: Dropout - How to Prevent a Neural Network from Overfitting

## Chapter Goals
- Understand the problem of overfitting
- Learn to use Dropout as a regularization technique
- Visualize the difference between a model with and without Dropout
- Experiment with the dropout rate parameter

## Prerequisites
- PyTorch basics (Chapter 31)
- Backpropagation and learning curves

## 1. What Is Overfitting?

Imagine a student preparing for an exam:

- **Good learning**: The student understands the principles and can solve new problems as well
- **Overfitting**: The student memorizes the answers without understanding the principles

### An overfitted neural network:
- Has **high accuracy on the training data** (it has seen them many times)
- Has **low accuracy on new data** (it fails to generalize)

It is as if the network had "memorized" the training data, noise included, instead of learning general patterns.

In [None]:
# Install the libraries
!pip install torch scikit-learn numpy matplotlib -q

In [None]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

print(f"PyTorch version: {torch.__version__}")
print("Libraries loaded successfully!")

## 2. What Is Dropout?

**Dropout** is a regularization technique that **randomly switches off neurons** during training.

### Analogy: A team of experts

Imagine a team of 5 experts:
- Without Dropout: Everyone relies on expert #3 (the best one) and the others grow lazy
- With Dropout: Expert #3 may "fall ill", so the others have to learn that work too

**Result**: The team is more robust and does not depend on any single member!

### How does it work?
1. During **training**: In every step, each neuron is switched off independently with probability `p`; PyTorch scales the remaining activations by `1/(1-p)` so that their expected value stays the same
2. During **testing**: All neurons are used (Dropout is switched off)
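
The two steps above can be written out by hand. The following is a minimal sketch, not the internals of `nn.Dropout`: the helper name `manual_dropout` and its `generator` argument are assumptions made for illustration. It keeps each activation with probability `1 - p` and rescales the survivors, which is the "inverted dropout" scheme.

```python
import torch

def manual_dropout(x, p=0.5, training=True, generator=None):
    """Hypothetical helper: inverted dropout written out by hand (requires p < 1)."""
    if not training or p == 0.0:
        return x  # evaluation mode: the input passes through unchanged
    # Each element is kept independently with probability 1 - p
    keep_mask = (torch.rand(x.shape, generator=generator) >= p).float()
    # Scale the survivors by 1 / (1 - p) so the expected activation is unchanged
    return x * keep_mask / (1.0 - p)

gen = torch.Generator().manual_seed(0)
activations = torch.ones(10_000)
dropped = manual_dropout(activations, p=0.3, generator=gen)
print(f"Fraction zeroed: {(dropped == 0).float().mean().item():.3f}")
print(f"Mean after dropout: {dropped.mean().item():.3f}")
```

With `p=0.3`, roughly 30 % of the entries should be zero while the mean stays close to 1, which is why no rescaling is needed at test time.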

In [None]:
# Visualize Dropout
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Without Dropout
ax1 = axes[0]
neurons_positions = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 1)]
for x, y in neurons_positions:
    circle = plt.Circle((x, y), 0.15, facecolor='#3498db', edgecolor='black', linewidth=2)
    ax1.add_patch(circle)
ax1.set_xlim(-0.5, 2.5)
ax1.set_ylim(-0.5, 2.5)
ax1.set_aspect('equal')
ax1.axis('off')
ax1.set_title('Without Dropout - all neurons active', fontsize=14)

# With Dropout
ax2 = axes[1]
dropout_mask = [True, False, True, True, False, True, True]  # False = switched off
for (x, y), active in zip(neurons_positions, dropout_mask):
    color = '#3498db' if active else '#e74c3c'
    alpha = 1.0 if active else 0.3
    circle = plt.Circle((x, y), 0.15, facecolor=color, edgecolor='black', linewidth=2, alpha=alpha)
    ax2.add_patch(circle)
    if not active:
        ax2.plot([x-0.1, x+0.1], [y-0.1, y+0.1], 'r-', linewidth=3)
        ax2.plot([x-0.1, x+0.1], [y+0.1, y-0.1], 'r-', linewidth=3)
ax2.set_xlim(-0.5, 2.5)
ax2.set_ylim(-0.5, 2.5)
ax2.set_aspect('equal')
ax2.axis('off')
ax2.set_title('With Dropout (p=0.3) - some neurons switched off', fontsize=14)

plt.tight_layout()
plt.show()

print("Blue = active neurons")
print("Red with X = switched-off neurons (dropout)")

## 3. Data Preparation - The Noisy "Two Moons" Dataset

In [None]:
# Create a dataset with heavier noise (deliberately harder)
X, y = make_moons(n_samples=500, noise=0.35, random_state=42)

# Split into training and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Convert to PyTorch tensors
X_train_t = torch.from_numpy(X_train).float()
X_test_t = torch.from_numpy(X_test).float()
y_train_t = torch.from_numpy(y_train).float().view(-1, 1)
y_test_t = torch.from_numpy(y_test).float().view(-1, 1)

# Visualization
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='coolwarm', edgecolors='black', alpha=0.7)
plt.title(f'Training data ({len(X_train)} samples)', fontsize=14)
plt.xlabel('X1')
plt.ylabel('X2')

plt.subplot(1, 2, 2)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='coolwarm', edgecolors='black', alpha=0.7)
plt.title(f'Test data ({len(X_test)} samples)', fontsize=14)
plt.xlabel('X1')
plt.ylabel('X2')

plt.tight_layout()
plt.show()

print(f"Training data: {len(X_train)} samples")
print(f"Test data: {len(X_test)} samples")

## 4. Defining the Network - With and Without Dropout

We will build a **deliberately oversized network** that will tend to overfit.

In [None]:
class MoonNet(nn.Module):
    """Neural network for classification, with optional Dropout"""
    
    def __init__(self, use_dropout=False, dropout_rate=0.5):
        super().__init__()
        self.use_dropout = use_dropout
        
        # Deliberately oversized architecture (128 neurons per hidden layer)
        self.layer1 = nn.Linear(2, 128)
        self.layer2 = nn.Linear(128, 128)
        self.layer3 = nn.Linear(128, 1)
        
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=dropout_rate)  # p = probability of zeroing an activation
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        # Layer 1
        x = self.relu(self.layer1(x))
        if self.use_dropout:
            x = self.dropout(x)  # Dropout after the first layer
        
        # Layer 2
        x = self.relu(self.layer2(x))
        if self.use_dropout:
            x = self.dropout(x)  # Dropout after the second layer
        
        # Output layer
        x = self.sigmoid(self.layer3(x))
        return x

# Show the architecture
model_example = MoonNet(use_dropout=True, dropout_rate=0.5)
print("Network architecture:")
print(model_example)
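
To make "deliberately oversized" concrete, the sketch below counts the trainable parameters of a stack with the same layer sizes as `MoonNet` (2 -> 128 -> 128 -> 1) and compares the count with the 400 training samples from section 3. The `nn.Sequential` stand-in and the variable names are assumptions for illustration only.

```python
import torch.nn as nn

# Stand-in with the same layer sizes as MoonNet: 2 -> 128 -> 128 -> 1
layers = nn.Sequential(
    nn.Linear(2, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),
)

# Each Linear layer contributes in_features * out_features weights plus out_features biases
n_params = sum(p.numel() for p in layers.parameters())
n_train = 400  # 80 % of the 500 generated samples

print(f"Trainable parameters: {n_params}")
print(f"Parameters per training sample: {n_params / n_train:.1f}")
```

The count comes to 17,025 parameters, or more than 40 per training sample, which leaves the network plenty of capacity to memorize noise.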

## 5. Training and Evaluation Function

In [None]:
def train_and_evaluate(model, model_name, epochs=3000, verbose=True):
    """Train the model and return the training and test history (model_name is only a label)"""
    
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    
    train_losses = []
    test_losses = []
    train_accs = []
    test_accs = []
    
    for epoch in range(epochs):
        # ===== TRAINING =====
        model.train()  # Switch to training mode (Dropout active)
        outputs = model(X_train_t)
        loss = criterion(outputs, y_train_t)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # ===== EVALUATION =====
        model.eval()  # Switch to evaluation mode (Dropout inactive)
        with torch.no_grad():
            # Training accuracy
            train_pred = torch.round(model(X_train_t))
            train_acc = (train_pred.eq(y_train_t).sum() / len(y_train_t)).item()
            
            # Test accuracy and test loss from a single forward pass
            test_out = model(X_test_t)
            test_pred = torch.round(test_out)
            test_acc = (test_pred.eq(y_test_t).sum() / len(y_test_t)).item()
            test_loss = criterion(test_out, y_test_t)
        
        # The training loss was computed in train mode (Dropout active), so with
        # Dropout it can exceed the test loss without indicating a problem
        train_losses.append(loss.item())
        test_losses.append(test_loss.item())
        train_accs.append(train_acc)
        test_accs.append(test_acc)
        
        if verbose and (epoch + 1) % 500 == 0:
            print(f'Epoch {epoch+1:4d}/{epochs} | '
                  f'Train Loss: {loss.item():.4f} | Test Loss: {test_loss.item():.4f} | '
                  f'Train Acc: {train_acc:.2%} | Test Acc: {test_acc:.2%}')
    
    return {
        'train_losses': train_losses,
        'test_losses': test_losses,
        'train_accs': train_accs,
        'test_accs': test_accs,
        'final_train_acc': train_accs[-1],
        'final_test_acc': test_accs[-1]
    }

## 6. Experiment: Model WITHOUT Dropout

In [None]:
print("="*60)
print("TRAINING THE MODEL WITHOUT DROPOUT")
print("="*60)

torch.manual_seed(42)
model_bez_dropout = MoonNet(use_dropout=False)
results_bez = train_and_evaluate(model_bez_dropout, "Without Dropout")

print("\n" + "-"*40)
print(f"Final accuracy on TRAINING data: {results_bez['final_train_acc']:.2%}")
print(f"Final accuracy on TEST data: {results_bez['final_test_acc']:.2%}")
print(f"Difference (gap): {results_bez['final_train_acc'] - results_bez['final_test_acc']:.2%}")

## 7. Experiment: Model WITH Dropout

In [None]:
print("="*60)
print("TRAINING THE MODEL WITH DROPOUT (p=0.5)")
print("="*60)

torch.manual_seed(42)
model_s_dropout = MoonNet(use_dropout=True, dropout_rate=0.5)
results_s = train_and_evaluate(model_s_dropout, "With Dropout")

print("\n" + "-"*40)
print(f"Final accuracy on TRAINING data: {results_s['final_train_acc']:.2%}")
print(f"Final accuracy on TEST data: {results_s['final_test_acc']:.2%}")
print(f"Difference (gap): {results_s['final_train_acc'] - results_s['final_test_acc']:.2%}")

## 8. Comparing the Results

In [None]:
# Compare the learning curves
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Loss - without dropout
axes[0, 0].plot(results_bez['train_losses'], 'b-', label='Train Loss', alpha=0.7)
axes[0, 0].plot(results_bez['test_losses'], 'r-', label='Test Loss', alpha=0.7)
axes[0, 0].set_title('WITHOUT Dropout - Loss', fontsize=14)
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Loss - with dropout
axes[0, 1].plot(results_s['train_losses'], 'b-', label='Train Loss', alpha=0.7)
axes[0, 1].plot(results_s['test_losses'], 'r-', label='Test Loss', alpha=0.7)
axes[0, 1].set_title('WITH Dropout - Loss', fontsize=14)
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Loss')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Accuracy - without dropout
axes[1, 0].plot(results_bez['train_accs'], 'b-', label='Train Accuracy', alpha=0.7)
axes[1, 0].plot(results_bez['test_accs'], 'r-', label='Test Accuracy', alpha=0.7)
axes[1, 0].set_title('WITHOUT Dropout - Accuracy', fontsize=14)
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Accuracy')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Accuracy - with dropout
axes[1, 1].plot(results_s['train_accs'], 'b-', label='Train Accuracy', alpha=0.7)
axes[1, 1].plot(results_s['test_accs'], 'r-', label='Test Accuracy', alpha=0.7)
axes[1, 1].set_title('WITH Dropout - Accuracy', fontsize=14)
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Accuracy')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nOBSERVATIONS:")
print("- WITHOUT Dropout: typically a large gap between train and test (overfitting!)")
print("- WITH Dropout: typically a smaller gap and better generalization")

In [None]:
# Summary of results
print("\n" + "="*60)
print("SUMMARY OF RESULTS")
print("="*60)
print(f"{'Model':<20} | {'Train Acc':>12} | {'Test Acc':>12} | {'Gap':>12}")
print("-"*60)

gap_bez = results_bez['final_train_acc'] - results_bez['final_test_acc']
gap_s = results_s['final_train_acc'] - results_s['final_test_acc']

print(f"{'Without Dropout':<20} | {results_bez['final_train_acc']:>11.2%} | {results_bez['final_test_acc']:>11.2%} | {gap_bez:>11.2%}")
print(f"{'With Dropout (0.5)':<20} | {results_s['final_train_acc']:>11.2%} | {results_s['final_test_acc']:>11.2%} | {gap_s:>11.2%}")
print("-"*60)

print(f"\nChange in test accuracy: {results_s['final_test_acc'] - results_bez['final_test_acc']:.2%}")

## 9. Visualizing the Decision Boundary

In [None]:
def plot_decision_boundary(model, X, y, title):
    """Visualize the model's decision boundary"""
    model.eval()
    
    # Build the grid
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    
    # Predict on the grid
    grid = torch.from_numpy(np.c_[xx.ravel(), yy.ravel()]).float()
    with torch.no_grad():
        Z = model(grid).numpy()
    Z = Z.reshape(xx.shape)
    
    # Plot the predicted probabilities, the 0.5 boundary, and the data points
    plt.contourf(xx, yy, Z, levels=50, cmap='coolwarm', alpha=0.6)
    plt.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm', edgecolors='black', alpha=0.8)
    plt.title(title, fontsize=14)
    plt.xlabel('X1')
    plt.ylabel('X2')

# Compare the decision boundaries
plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
plot_decision_boundary(model_bez_dropout, X_test, y_test, 
                       f'WITHOUT Dropout\nTest Acc: {results_bez["final_test_acc"]:.1%}')

plt.subplot(1, 2, 2)
plot_decision_boundary(model_s_dropout, X_test, y_test,
                       f'WITH Dropout (p=0.5)\nTest Acc: {results_s["final_test_acc"]:.1%}')

plt.tight_layout()
plt.show()

print("\nNotice:")
print("- WITHOUT Dropout: the boundary tends to be 'jagged', trying to trace every point")
print("- WITH Dropout: the boundary tends to be smoother and captures the overall shape better")

## 10. Experiment: Different Dropout Rate Values

In [None]:
# Test different dropout rates
dropout_rates = [0.0, 0.2, 0.5, 0.7, 0.9]
results_all = {}

print("Testing different dropout rate values...")
print("="*60)

for rate in dropout_rates:
    torch.manual_seed(42)
    model = MoonNet(use_dropout=(rate > 0), dropout_rate=rate)
    results = train_and_evaluate(model, f"Dropout={rate}", epochs=2000, verbose=False)
    results_all[rate] = results
    print(f"Dropout {rate}: Train Acc = {results['final_train_acc']:.2%}, "
          f"Test Acc = {results['final_test_acc']:.2%}")

# Visualization
plt.figure(figsize=(10, 6))

train_accs = [results_all[r]['final_train_acc'] for r in dropout_rates]
test_accs = [results_all[r]['final_test_acc'] for r in dropout_rates]

x = np.arange(len(dropout_rates))
width = 0.35

plt.bar(x - width/2, train_accs, width, label='Train Accuracy', color='#3498db')
plt.bar(x + width/2, test_accs, width, label='Test Accuracy', color='#e74c3c')

plt.xlabel('Dropout Rate', fontsize=12)
plt.ylabel('Accuracy', fontsize=12)
plt.title('Effect of Dropout Rate on Accuracy', fontsize=14)
plt.xticks(x, [f'p={r}' for r in dropout_rates])
plt.legend()
plt.grid(True, alpha=0.3, axis='y')
plt.ylim(0.5, 1.0)
plt.show()

print("\nObservations (compare with the numbers printed above):")
print("- p=0.0: usually a large gap between train and test (overfitting)")
print("- p=0.5: often a good compromise - small gap, good test accuracy")
print("- p=0.9: usually too high - the model does not learn enough")

## 11. Important: train() vs eval() Modes

In [None]:
# Demonstrate the difference between train() and eval()
model_demo = MoonNet(use_dropout=True, dropout_rate=0.5)

# Input for testing
test_input = torch.tensor([[0.5, 0.5]])

print("Demonstrating the difference between train() and eval() modes:")
print("="*60)

# Train mode (Dropout active)
model_demo.train()
print("\nTrain mode (Dropout ACTIVE):")
outputs_train = []
for i in range(5):
    with torch.no_grad():
        out = model_demo(test_input).item()
        outputs_train.append(out)
        print(f"  Attempt {i+1}: {out:.4f}")
print("  -> Outputs DIFFER (random neurons switched off)")

# Eval mode (Dropout inactive)
model_demo.eval()
print("\nEval mode (Dropout INACTIVE):")
outputs_eval = []
for i in range(5):
    with torch.no_grad():
        out = model_demo(test_input).item()
        outputs_eval.append(out)
        print(f"  Attempt {i+1}: {out:.4f}")
print("  -> Outputs are IDENTICAL (all neurons active)")
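
The same switch can be observed on a single `nn.Dropout` layer, without the rest of the network. This is a small sketch with made-up input values: `train()` and `eval()` toggle the layer's `training` flag, and the layer behaves as the identity function once the flag is off.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()
print("training flag:", drop.training)
print("train output:", drop(x))  # each entry is either 0.0 or 2.0 (scaled by 1/(1-p))

drop.eval()
print("training flag:", drop.training)
print("eval output: ", drop(x))  # identical to the input
```

Calling `train()` or `eval()` on a whole model sets this flag on every submodule, which is why one call is enough for `MoonNet`.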

## 12. Chapter Summary

### What we learned:

1. **Overfitting** - the model learns the training data "by heart" and fails to generalize

2. **Dropout** is a regularization technique:
   - Randomly switches off neurons during training
   - Forces the network to spread its knowledge across more neurons
   - Result: A more robust model that generalizes better

3. **Usage in PyTorch**:
   - `nn.Dropout(p=0.5)` - creates the layer
   - `model.train()` - activates Dropout
   - `model.eval()` - deactivates Dropout

4. **Typical Dropout rate values**:
   - p=0.2-0.5 for hidden layers
   - Higher values can hinder learning

### Practical recommendations:
- Use Dropout when you see a large difference between train and test accuracy
- Start with p=0.5 and experiment
- Always switch between the `train()` and `eval()` modes

## 13. Quiz

In [None]:
print("="*60)
print("QUIZ - Dropout and Overfitting")
print("="*60)

questions = [
    {
        "question": "1. What is overfitting?",
        "options": ["a) The model does not learn at all", "b) The model has high test accuracy", 
                   "c) The model has memorized the training data but cannot generalize", 
                   "d) The model has low training accuracy"],
        "answer": "c"
    },
    {
        "question": "2. What does Dropout do during training?",
        "options": ["a) Adds new neurons", "b) Randomly switches off neurons", 
                   "c) Increases the learning rate", "d) Changes the activation functions"],
        "answer": "b"
    },
    {
        "question": "3. What happens to Dropout when we call model.eval()?",
        "options": ["a) It stays active", "b) It is deactivated", 
                   "c) The dropout rate increases", "d) It is deleted"],
        "answer": "b"
    },
    {
        "question": "4. A typical dropout rate for hidden layers is:",
        "options": ["a) 0.01", "b) 0.2-0.5", "c) 0.95", "d) 1.0"],
        "answer": "b"
    },
    {
        "question": "5. What is the main 'sign' of overfitting?",
        "options": ["a) Low accuracy on both train and test", 
                   "b) High accuracy on train, low on test", 
                   "c) The same accuracy on train and test", 
                   "d) The model does not learn at all"],
        "answer": "b"
    }
]

for q in questions:
    print(f"\n{q['question']}")
    for opt in q['options']:
        print(f"   {opt}")

print("\n" + "-"*60)
print("ANSWERS: 1-c, 2-b, 3-b, 4-b, 5-b")
print("-"*60)

## 14. Your Challenge

1. Try changing the network architecture (smaller/larger) and watch the effect on overfitting
2. Experiment with different dropout rates and find the optimal value
3. Add Dropout only after the first or only after the second layer - what is the difference?
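
As a possible starting point for the third task, here is a hedged scaffold. The class name `MoonNetPlacement` and its `drop_after_1` / `drop_after_2` flags are hypothetical, not part of the chapter's code. It uses the same layer sizes as `MoonNet` and substitutes `nn.Identity` wherever Dropout is switched off.

```python
import torch
import torch.nn as nn

class MoonNetPlacement(nn.Module):
    """Hypothetical MoonNet variant with a separate Dropout switch per hidden layer."""

    def __init__(self, drop_after_1=True, drop_after_2=True, dropout_rate=0.5):
        super().__init__()
        # nn.Identity passes activations through unchanged
        d1 = nn.Dropout(dropout_rate) if drop_after_1 else nn.Identity()
        d2 = nn.Dropout(dropout_rate) if drop_after_2 else nn.Identity()
        self.net = nn.Sequential(
            nn.Linear(2, 128), nn.ReLU(), d1,
            nn.Linear(128, 128), nn.ReLU(), d2,
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

only_first = MoonNetPlacement(drop_after_1=True, drop_after_2=False)
only_second = MoonNetPlacement(drop_after_1=False, drop_after_2=True)
print(only_first(torch.zeros(4, 2)).shape)  # torch.Size([4, 1])
```

Both variants accept the same inputs as `MoonNet`, so they should be usable with `train_and_evaluate` from section 5 for a side-by-side comparison.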

**Tip:** Dropout is not the only regularization technique. Others include L1/L2 regularization, Early Stopping, and Data Augmentation!
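
To give a flavor of one of these alternatives, the following is a minimal sketch of Early Stopping. The `EarlyStopping` class, its `patience` and `min_delta` parameters, and the loss values are all made up for illustration; in a real experiment the losses would come from a held-out validation set rather than the test set.

```python
class EarlyStopping:
    """Hypothetical helper: stop when the validation loss stops improving."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses, for illustration only
val_losses = [0.70, 0.55, 0.48, 0.47, 0.49, 0.50, 0.52, 0.60]
stopper = EarlyStopping(patience=3)
for epoch, val_loss in enumerate(val_losses, start=1):
    if stopper.step(val_loss):
        print(f"Stopping at epoch {epoch}, best loss {stopper.best_loss:.2f}")
        break
```

With these values the loop stops at epoch 7, three epochs after the best loss of 0.47. L2 regularization needs no helper in PyTorch: optimizers such as `torch.optim.Adam` accept a `weight_decay` argument.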