# Chapter 31: Backpropagation - How a Neural Network Learns from Its Mistakes

## Chapter goals
- Understand the 4 steps of neural network training
- Learn to use PyTorch to train networks
- Visualize the learning curve
- Experiment with hyperparameters

## Prerequisites
- Activation functions (chapter 30)
- Multilayer networks and the XOR problem (chapter 29)

## 1. What is backpropagation?

Backpropagation is the algorithm used to **train neural networks**. It works out how much each weight contributed to the prediction error, so the network can adjust its weights and improve.

### Analogy: A team of workers building a wall

Imagine the network as a team of workers building a wall:

1. **The workers build the wall** (Forward Pass)
2. **The foreman measures the error** - "The wall is 10 cm out of line!" (Loss)
3. **The foreman walks from the last worker back to the first** and tells each one how much they contributed to the error (Backward Pass)
4. **Each worker adjusts their work** (Weight Update)

We repeat this cycle thousands of times, and the wall keeps getting more accurate!

In [None]:
# Install PyTorch and the supporting libraries
!pip install torch matplotlib numpy -q

In [None]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt

print(f"PyTorch version: {torch.__version__}")
print("Libraries loaded successfully!")

## 2. The four steps of neural network training

Each epoch (one training cycle over the data) consists of 4 steps:

| Step | Name | Description |
|------|------|-------------|
| 1 | **Forward Pass** | The data passes through the network and we get a prediction |
| 2 | **Loss Calculation** | We compare the prediction with the true value |
| 3 | **Backward Pass** | We compute the gradients (each weight's share of the blame) |
| 4 | **Weight Update** | We adjust the weights according to the gradients |
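
To make the table concrete, here is a minimal sketch of the four steps on a toy model with a single weight, written in plain Python without PyTorch. The model `pred = w * x`, the single data point, and the learning rate are illustrative assumptions; they are not part of the XOR example used later in the chapter.

```python
# Toy model: prediction = w * x, trained on one example (assumed values).
x, y_true = 2.0, 8.0   # the ideal weight is 4.0, because 4.0 * 2.0 = 8.0
w = 0.0                # arbitrary starting weight
lr = 0.1               # learning rate

for step in range(20):
    pred = w * x                      # 1. forward pass
    loss = (pred - y_true) ** 2       # 2. loss calculation (squared error)
    grad = 2 * (pred - y_true) * x    # 3. backward pass: d(loss)/dw via the chain rule
    w -= lr * grad                    # 4. weight update

print(f"w = {w:.4f}")
```

After 20 repetitions `w` ends up very close to 4.0. PyTorch performs the same four steps, only with many weights at once and with step 3 computed automatically.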

In [None]:
# Visualize the 4 steps of training
fig, axes = plt.subplots(1, 4, figsize=(16, 4))

steps = [
    ("1. Forward Pass", "Input -> Network -> Prediction", "#3498db"),
    ("2. Loss Calculation", "Prediction vs. Target\n= Error (Loss)", "#e74c3c"),
    ("3. Backward Pass", "Error -> Gradients\n(propagated backward)", "#f39c12"),
    ("4. Weight Update", "weights -= lr * gradient", "#2ecc71")
]

for ax, (title, text, color) in zip(axes, steps):
    ax.add_patch(plt.Rectangle((0.1, 0.2), 0.8, 0.6, 
                                facecolor=color, alpha=0.7, edgecolor='black', linewidth=2))
    ax.text(0.5, 0.7, title, ha='center', va='center', fontsize=12, fontweight='bold')
    ax.text(0.5, 0.4, text, ha='center', va='center', fontsize=10)
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.axis('off')

plt.suptitle('The 4 steps of neural network training', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## 3. The XOR problem in PyTorch

Let's walk through the whole training process on the XOR problem. PyTorch simplifies the code considerably!

In [None]:
# XOR data as PyTorch tensors
X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

print("XOR truth table:")
print("Input 1 | Input 2 | Output")
print("-" * 25)
for i in range(4):
    print(f"   {int(X[i][0])}    |    {int(X[i][1])}    |    {int(y[i])}")

In [None]:
# Define the network architecture
class XORNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Input layer (2 neurons) -> hidden layer (4 neurons)
        self.layer1 = nn.Linear(2, 4)
        # Activation function for the hidden layer
        self.activation = nn.ReLU()
        # Hidden layer -> output layer (1 neuron)
        self.layer2 = nn.Linear(4, 1)
        # Sigmoid on the output (squashes it into the range 0-1)
        self.output_activation = nn.Sigmoid()
    
    def forward(self, x):
        x = self.layer1(x)
        x = self.activation(x)
        x = self.layer2(x)
        x = self.output_activation(x)
        return x

# Create the model
# No seed is set here, so the initial weights (and the results) differ between runs
model = XORNet()
print("Network architecture:")
print(model)
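
The architecture above has more capacity than XOR strictly needs. As a sketch, the following plain-Python forward pass uses hand-picked weights with only two hidden ReLU neurons and still reproduces the XOR table. The function names and all weight values are illustrative assumptions; training has to discover weights with a similar effect on its own.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor_by_hand(x1, x2):
    # Hidden layer: two ReLU neurons with hand-picked weights and biases
    h1 = relu(x1 + x2)          # counts how many inputs are on
    h2 = relu(x1 + x2 - 1.0)    # fires only when both inputs are on
    # Output layer: combine, then scale and shift so the sigmoid saturates
    logit = 10.0 * (h1 - 2.0 * h2) - 5.0
    return sigmoid(logit)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
hand_outputs = [xor_by_hand(a, b) for a, b in inputs]
for (a, b), out in zip(inputs, hand_outputs):
    print(f"[{a}, {b}] -> {out:.4f} -> rounded {round(out)}")
```

The second hidden neuron is what makes the problem solvable: it cancels the contribution of the first one exactly when both inputs are 1.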

In [None]:
# Define the loss function and the optimizer
criterion = nn.MSELoss()  # Mean Squared Error
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)  # Stochastic Gradient Descent

print("Loss function: Mean Squared Error (MSE)")
print("Optimizer: SGD (Stochastic Gradient Descent)")
print("Learning rate: 0.5")
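
For reference, mean squared error averages the squared difference between each prediction and its target. The plain-Python sketch below mirrors what `nn.MSELoss()` computes with its default mean reduction; the helper name `mse` and the sample values are ours, not part of PyTorch.

```python
def mse(predictions, targets):
    """Mean of the squared differences between predictions and targets."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)

xor_targets = [0.0, 1.0, 1.0, 0.0]
undecided = [0.5, 0.5, 0.5, 0.5]     # a network that always answers 0.5
print(mse(undecided, xor_targets))   # 0.25
```

A network that answers 0.5 for every XOR input scores 0.25, which is a useful baseline when reading the loss values printed during training.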

## 4. The training loop - the 4 steps in action

In [None]:
# Training loop
epochs = 5000
losses = []  # Loss value recorded at every epoch

print("Starting training...")
print("=" * 50)

for epoch in range(epochs):
    # ===== STEP 1: Forward Pass =====
    outputs = model(X)
    
    # ===== STEP 2: Loss Calculation =====
    loss = criterion(outputs, y)
    losses.append(loss.item())  # Store the loss as a plain Python float
    
    # ===== STEP 3: Backward Pass =====
    optimizer.zero_grad()  # Clear gradients left over from the previous epoch
    loss.backward()        # Backpropagation - compute the gradients
    
    # ===== STEP 4: Weight Update =====
    optimizer.step()       # Update the weights using the gradients
    
    # Print progress every 1000 epochs
    if (epoch + 1) % 1000 == 0:
        print(f'Epoch [{epoch+1:5d}/{epochs}], Loss: {loss.item():.6f}')

print("=" * 50)
print("Training finished!")
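
Why `optimizer.zero_grad()` is needed in step 3: PyTorch adds each newly computed gradient to whatever is already stored in `.grad`. The sketch below shows this on a single hypothetical parameter `w`, independent of the XOR model.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3.0 * w).backward()
first = w.grad.item()    # gradient of 3*w with respect to w

(3.0 * w).backward()     # no reset in between
second = w.grad.item()   # the new gradient was added to the old one

w.grad.zero_()           # reset, as optimizer.zero_grad() does for every parameter
(3.0 * w).backward()
third = w.grad.item()

print(first, second, third)  # 3.0 6.0 3.0
```

Without the reset, every epoch would update the weights with the sum of all gradients computed so far rather than with the current one.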

## 5. Visualizing the learning curve

The learning curve is like an **ECG of our network** - it shows how well the network is learning.

In [None]:
# Plot the learning curve
plt.figure(figsize=(12, 5))

# Left plot - the whole curve on a linear scale
plt.subplot(1, 2, 1)
plt.plot(losses, 'b-', linewidth=1)
plt.title('Training progress of the neural network', fontsize=14)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.grid(True, alpha=0.3)

# Right plot - logarithmic scale
plt.subplot(1, 2, 2)
plt.plot(losses, 'b-', linewidth=1)
plt.title('Learning curve (logarithmic scale)', fontsize=14)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.yscale('log')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nInitial loss: {losses[0]:.6f}")
print(f"Final loss:   {losses[-1]:.6f}")
print(f"Improvement:  {losses[0]/losses[-1]:.1f}x")

In [None]:
# Final test of the model
print("\nFinal model predictions:")
print("=" * 40)

with torch.no_grad():  # Disable gradient tracking for evaluation
    predictions = model(X)
    rounded = torch.round(predictions)
    
    print("Input    | Prediction | Rounded | Correct")
    print("-" * 45)
    
    correct = 0
    for i in range(4):
        pred = predictions[i].item()
        rnd = int(rounded[i].item())
        expected = int(y[i].item())
        is_correct = "YES" if rnd == expected else "NO"
        if rnd == expected:
            correct += 1
        print(f"[{int(X[i][0])}, {int(X[i][1])}]   |   {pred:.4f}   |    {rnd}    |   {is_correct}")
    
    print("-" * 45)
    print(f"Accuracy: {correct}/4 = {correct/4*100:.0f}%")

## 6. Experiment: The effect of the learning rate

The learning rate is one of the most important hyperparameters. Let's experiment!

In [None]:
def train_xor_with_lr(learning_rate, epochs=3000):
    """Train the XOR model with the given learning rate and return the loss history."""
    # Fresh model for every run
    torch.manual_seed(42)  # Same initial weights each time, so runs are comparable
    model = XORNet()
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    
    losses = []
    for epoch in range(epochs):
        outputs = model(X)
        loss = criterion(outputs, y)
        losses.append(loss.item())
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    return losses

# Try several learning rates; train once per rate and reuse the result in both plots
learning_rates = [0.01, 0.1, 0.5, 1.0, 2.0]
colors = ['blue', 'green', 'orange', 'red', 'purple']
lr_losses = {lr: train_xor_with_lr(lr) for lr in learning_rates}

plt.figure(figsize=(14, 5))

# Left plot - full run on a linear scale
plt.subplot(1, 2, 1)
for lr, color in zip(learning_rates, colors):
    plt.plot(lr_losses[lr], label=f'LR = {lr}', color=color, linewidth=1.5)

plt.title('Effect of the learning rate on training', fontsize=14)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.legend()
plt.grid(True, alpha=0.3)
plt.ylim(0, 0.5)

# Right plot - detail of the first 500 epochs
plt.subplot(1, 2, 2)
for lr, color in zip(learning_rates, colors):
    plt.plot(lr_losses[lr][:500], label=f'LR = {lr}', color=color, linewidth=1.5)

plt.title('Detail of the first 500 epochs', fontsize=14)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nWhat to look for:")
print("- Too small an LR (0.01): training is slow")
print("- Moderate LR (around 0.1-0.5): usually fast and stable training")
print("- Too large an LR (2.0): can be unstable and may diverge")
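
The same three regimes can be reproduced exactly on a one-parameter toy problem, where the arithmetic is easy to follow. The sketch below minimizes `f(w) = w**2` with plain gradient descent; the function `run_gradient_descent` and the chosen learning rates are illustrative assumptions, and the specific thresholds apply to this toy function only, not to the XOR network.

```python
def run_gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 (whose gradient is 2*w) and return the final |w|."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w   # each step multiplies w by (1 - 2*lr)
    return abs(w)

for lr in [0.01, 0.1, 0.5, 1.0, 1.1]:
    print(f"lr = {lr:<4} -> |w| after 20 steps = {run_gradient_descent(lr):.4g}")
```

Each step multiplies `w` by `(1 - 2*lr)`. A small rate shrinks `w` slowly, `lr = 0.5` jumps straight to the minimum, `lr = 1.0` bounces between `1` and `-1` forever, and anything larger makes `|w|` grow with every step.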

## 7. Experiment: Different optimizers

PyTorch offers a range of optimizers. Each has its own strengths.

In [None]:
def train_with_optimizer(optimizer_class, optimizer_name, epochs=2000, lr=0.1):
    """Train the model with the given optimizer class and return the loss history."""
    torch.manual_seed(42)
    model = XORNet()
    criterion = nn.MSELoss()
    optimizer = optimizer_class(model.parameters(), lr=lr)
    
    losses = []
    for epoch in range(epochs):
        outputs = model(X)
        loss = criterion(outputs, y)
        losses.append(loss.item())
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    return losses

# Compare several optimizers
optimizers = [
    (torch.optim.SGD, 'SGD'),
    (torch.optim.Adam, 'Adam'),
    (torch.optim.RMSprop, 'RMSprop'),
    (torch.optim.Adagrad, 'Adagrad')
]

colors = ['blue', 'green', 'red', 'purple']

plt.figure(figsize=(12, 5))

for (opt_class, opt_name), color in zip(optimizers, colors):
    losses = train_with_optimizer(opt_class, opt_name)
    plt.plot(losses, label=opt_name, color=color, linewidth=2)

plt.title('Optimizer comparison', fontsize=14)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.yscale('log')
plt.show()

print("\nOptimizer overview:")
print("- SGD: basic and simple, but reliable")
print("- Adam: adaptive, often a good first choice")
print("- RMSprop: often used for recurrent networks")
print("- Adagrad: adaptive learning rate for each parameter")
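
To show what "adaptive" means in practice, here is a simplified sketch of one Adam update for a single scalar parameter, written in plain Python. It follows the standard Adam update rule with the usual default hyperparameters; the function `adam_step` is our own illustration, not PyTorch's implementation.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; returns (w, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t (1-based)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# First step from w = 1.0, once with a tiny gradient and once with a huge one
adam_moves = []
for grad in [0.01, 100.0]:
    w_new, _, _ = adam_step(1.0, grad, m=0.0, v=0.0, t=1, lr=0.1)
    adam_moves.append(1.0 - w_new)
    print(f"grad = {grad:>6}: Adam moves w by {1.0 - w_new:.4f}, plain SGD by {0.1 * grad:.4f}")
```

On the first step Adam moves the weight by roughly `lr` regardless of how large the gradient is, while plain SGD scales the step with the gradient. This normalization is one reason Adam tends to be less sensitive to the choice of learning rate.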

## 8. Visualizing gradients in the network

Let's look at how gradients "flow" backward through the network.

In [None]:
# Create a fresh model and run a single forward/backward step
torch.manual_seed(42)
model = XORNet()
criterion = nn.MSELoss()

# Forward pass
outputs = model(X)
loss = criterion(outputs, y)

# Backward pass (compute the gradients)
loss.backward()

# Show the gradients
print("Weight gradients after one backward pass:")
print("=" * 50)

for name, param in model.named_parameters():
    if param.grad is not None:
        grad_mean = param.grad.abs().mean().item()
        grad_max = param.grad.abs().max().item()
        print(f"\n{name}:")
        print(f"  Shape: {list(param.shape)}")
        print(f"  Mean |gradient|: {grad_mean:.6f}")
        print(f"  Max |gradient|:  {grad_max:.6f}")
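
What `loss.backward()` computes can be checked by hand on a very small network. The sketch below uses a hypothetical 1-1-1 network (one ReLU hidden unit, one sigmoid output, squared error) in plain Python, applies the chain rule layer by layer, and compares the result with a finite-difference estimate. All function names, weights, and inputs are illustrative assumptions.

```python
import math

def forward(w1, w2, x=1.0, y=1.0):
    """Tiny 1-1-1 network: ReLU hidden unit, sigmoid output, squared error."""
    h = max(0.0, w1 * x)
    p = 1.0 / (1.0 + math.exp(-w2 * h))
    return (p - y) ** 2, h, p

def analytic_grads(w1, w2, x=1.0, y=1.0):
    """Chain rule applied from the loss back to w1."""
    _, h, p = forward(w1, w2, x, y)
    dloss_dz = 2 * (p - y) * p * (1 - p)      # loss -> sigmoid input
    grad_w2 = dloss_dz * h                    # sigmoid input -> w2
    relu_slope = 1.0 if w1 * x > 0 else 0.0   # ReLU passes gradient only when active
    grad_w1 = dloss_dz * w2 * relu_slope * x  # sigmoid input -> hidden unit -> w1
    return grad_w1, grad_w2

def numeric_grads(w1, w2, eps=1e-6):
    """Central finite differences as an independent check."""
    g1 = (forward(w1 + eps, w2)[0] - forward(w1 - eps, w2)[0]) / (2 * eps)
    g2 = (forward(w1, w2 + eps)[0] - forward(w1, w2 - eps)[0]) / (2 * eps)
    return g1, g2

print("analytic:", analytic_grads(0.5, -0.3))
print("numeric: ", numeric_grads(0.5, -0.3))
```

The two results should agree to several decimal places. Note that the gradient for `w1` reuses `dloss_dz`, the quantity already computed for `w2`; passing such intermediate results backward from layer to layer is what makes backpropagation efficient.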

In [None]:
# Visualize the gradients as heatmaps
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Layer 1 gradients (color scale centered on zero so red/blue match the sign)
ax1 = axes[0]
grad1 = model.layer1.weight.grad.detach().numpy()
lim1 = max(np.abs(grad1).max(), 1e-12)
im1 = ax1.imshow(grad1, cmap='coolwarm', aspect='auto', vmin=-lim1, vmax=lim1)
ax1.set_title('Weight gradients - Layer 1 (2->4)', fontsize=12)
ax1.set_xlabel('Input neurons')
ax1.set_ylabel('Hidden neurons')
plt.colorbar(im1, ax=ax1)

# Layer 2 gradients
ax2 = axes[1]
grad2 = model.layer2.weight.grad.detach().numpy()
lim2 = max(np.abs(grad2).max(), 1e-12)
im2 = ax2.imshow(grad2, cmap='coolwarm', aspect='auto', vmin=-lim2, vmax=lim2)
ax2.set_title('Weight gradients - Layer 2 (4->1)', fontsize=12)
ax2.set_xlabel('Hidden neurons')
ax2.set_ylabel('Output neuron')
plt.colorbar(im2, ax=ax2)

plt.tight_layout()
plt.show()

print("\nRed = positive gradient (raising the weight raises the loss, so the update lowers it)")
print("Blue = negative gradient (raising the weight lowers the loss, so the update raises it)")

## 9. Mini-project: Tracking weights during training

Let's track how the weights change during training.

In [None]:
# Training while recording the weights
torch.manual_seed(42)
model = XORNet()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

# History of the first-layer weights
weight_history = []
epochs = 2000

for epoch in range(epochs):
    outputs = model(X)
    loss = criterion(outputs, y)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    # Record the weights every 50 epochs
    if epoch % 50 == 0:
        weights = model.layer1.weight.detach().clone().numpy().flatten()
        weight_history.append(weights)

weight_history = np.array(weight_history)

# Plot how the weights evolve
plt.figure(figsize=(12, 5))

for i in range(min(8, weight_history.shape[1])):
    plt.plot(range(0, epochs, 50), weight_history[:, i], 
             label=f'Weight {i+1}', linewidth=1.5, alpha=0.8)

plt.title('Evolution of the first-layer weights during training', fontsize=14)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Weight value', fontsize=12)
plt.legend(loc='upper right', ncol=2)
plt.grid(True, alpha=0.3)
plt.show()

print("Notice how the weights gradually settle as training converges!")

## 10. Chapter summary

### What we learned:

1. **Backpropagation** is the algorithm that computes the gradients used to train neural networks

2. **The 4 steps of training**:
   - Forward Pass: the data passes through the network
   - Loss Calculation: we measure the error
   - Backward Pass: we compute the gradients
   - Weight Update: we adjust the weights

3. **PyTorch** automates backpropagation with a single call: `loss.backward()`

4. The **learning curve** shows how training progresses - we want to see a downward trend

5. The **learning rate** is a key hyperparameter - it should be neither too small nor too large

### Practical recommendations:
- Start with the **Adam optimizer** and **lr=0.001**
- Watch the learning curve to diagnose problems
- Experiment with hyperparameters

## 11. Quiz

In [None]:
# Quiz
print("="*60)
print("QUIZ - Backpropagation")
print("="*60)

questions = [
    {
        "question": "1. What does the backward pass do?",
        "options": ["a) Sends data through the network", "b) Computes gradients", 
                   "c) Updates the weights", "d) Computes the loss"],
        "answer": "b"
    },
    {
        "question": "2. Which PyTorch call performs backpropagation?",
        "options": ["a) model.forward()", "b) optimizer.step()", 
                   "c) loss.backward()", "d) model.train()"],
        "answer": "c"
    },
    {
        "question": "3. What does the learning curve show?",
        "options": ["a) Model accuracy", "b) How the loss evolves over time", 
                   "c) The number of parameters", "d) The network structure"],
        "answer": "b"
    },
    {
        "question": "4. What happens when the learning rate is too large?",
        "options": ["a) Slow training", "b) Stable convergence", 
                   "c) Instability/divergence", "d) Nothing changes"],
        "answer": "c"
    },
    {
        "question": "5. Which optimizer is recommended for beginners?",
        "options": ["a) SGD", "b) Adam", "c) Adagrad", "d) RMSprop"],
        "answer": "b"
    }
]

for q in questions:
    print(f"\n{q['question']}")
    for opt in q['options']:
        print(f"   {opt}")

print("\n" + "-"*60)
print("ANSWERS: 1-b, 2-c, 3-b, 4-c, 5-b")
print("-"*60)

## 12. Your challenge

1. Change the learning rate to a very high value (e.g. 5.0) and observe what happens
2. Try adding a third layer to the network - does it learn faster or slower?
3. Experiment with the number of neurons in the hidden layer (2, 8, 16)

**Tip:** Always watch the learning curve - it is one of the best ways to diagnose problems!