# Chapter 26: Evaluation Metrics for ML Models

## How to tell whether your model works

You trained a model with 95% accuracy. Great, right? But what if the model diagnoses cancer and those 5% of errors are sick patients it sent home? Worse, if only 5 in 100 patients have cancer, a model that declares *everyone* healthy also scores 95% while missing every single cancer case.

**Accuracy isn't everything!** In this chapter we look at several metrics and when to use each one.

---

### What we will learn:
1. **Confusion Matrix** - the basic table for analyzing errors
2. **Accuracy** - overall correctness (and when it is not enough)
3. **Precision** - how reliable the positive predictions are
4. **Recall** - how many of the actual positives we found
5. **F1 Score** - the harmonic mean of precision and recall
6. **When to use which metric**

### Why does it matter?
- Wrong metric = wrong decision
- Every problem needs its own metric
- A doctor and a spam filter have different priorities

## 1. Installing and importing libraries

In [None]:
# Install libraries (for Google Colab)
!pip install pandas numpy matplotlib seaborn scikit-learn -q

print("Libraries installed!")

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import make_classification, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, ConfusionMatrixDisplay
)

# Visualization settings
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)
np.random.seed(42)

print("All libraries loaded successfully!")

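## 2. Confusion Matrix - the foundation of all metrics

The **confusion matrix** shows which kinds of errors the model makes. Rows are the actual classes and columns the predicted classes (the convention scikit-learn uses).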

```
                Predicted
                NEG    POS
Actual  NEG  |  TN  |  FP  |   <- False Positive (false alarm)
        POS  |  FN  |  TP  |   <- False Negative (missed positive)
```

- **TP** (True Positive): correctly predicted positive
- **TN** (True Negative): correctly predicted negative
- **FP** (False Positive): false alarm (actually negative, predicted positive)
- **FN** (False Negative): missed case (actually positive, predicted negative)
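To make the four cells concrete, here is a minimal pure-Python sketch that counts them directly from two label lists. It assumes binary labels with 1 = positive and 0 = negative; the helper `confusion_counts` is hypothetical (not part of scikit-learn) and returns the counts in the same order as `confusion_matrix(...).ravel()`: TN, FP, FN, TP.

```python
def confusion_counts(actual_labels, predicted_labels):
    """Return (TN, FP, FN, TP) for binary labels where 1 = positive."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(actual_labels, predicted_labels):
        if actual == 1 and predicted == 1:
            tp += 1   # sick and flagged as sick
        elif actual == 0 and predicted == 0:
            tn += 1   # healthy and left alone
        elif actual == 0 and predicted == 1:
            fp += 1   # false alarm
        else:
            fn += 1   # sick but missed (actual 1, predicted 0)
    return tn, fp, fn, tp

# Same toy patient data as in the next cell: 4 sick, 6 healthy
toy_actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_predicted = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
print(confusion_counts(toy_actual, toy_predicted))  # (5, 1, 2, 2)
```

The result should match `confusion_matrix(skutecnost, predikce).ravel()` for the same lists.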

In [None]:
# Confusion matrix demonstration
print("=" * 50)
print("CONFUSION MATRIX - VISUAL EXPLANATION")
print("=" * 50)

# A simple example:
# we test 10 patients for a disease (1 = sick, 0 = healthy)
skutecnost = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # actual: 4 sick, 6 healthy
predikce   = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]  # what the model predicted

# Build the confusion matrix (rows = actual, columns = predicted)
cm = confusion_matrix(skutecnost, predikce)

print("\nActual:    ", skutecnost)
print("Predicted: ", predikce)
print("\nConfusion Matrix:")
print(cm)

# Unpack the four cells; for labels 0/1 the order is TN, FP, FN, TP
TN, FP, FN, TP = cm.ravel()
print(f"\nTrue Negative (TN):  {TN} - correctly predicted healthy")
print(f"False Positive (FP): {FP} - false alarm (healthy labeled as sick)")
print(f"False Negative (FN): {FN} - DANGEROUS! a sick patient was missed")
print(f"True Positive (TP):  {TP} - correctly predicted sick")

In [None]:
# Visualize the confusion matrix
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: the counts
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[0],
            xticklabels=['Healthy', 'Sick'],
            yticklabels=['Healthy', 'Sick'])
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('Actual')
axes[0].set_title('Confusion Matrix - Counts')

# Right: what each cell means (top row = actual 0, bottom row = actual 1)
popis = [['TN\n(Correctly healthy)', 'FP\n(False alarm)'],
         ['FN\n(Missed sick patient)', 'TP\n(Correctly sick)']]
barvy = [['lightgreen', 'lightyellow'],
         ['lightcoral', 'lightgreen']]

axes[1].set_xlim(0, 2)
axes[1].set_ylim(0, 2)
for i in range(2):
    for j in range(2):
        axes[1].add_patch(plt.Rectangle((j, 1-i), 1, 1,
                                        facecolor=barvy[i][j], edgecolor='black'))
        axes[1].text(j+0.5, 1.5-i, popis[i][j], ha='center', va='center', fontsize=10)
axes[1].set_xticks([0.5, 1.5])
axes[1].set_yticks([0.5, 1.5])
axes[1].set_xticklabels(['Predicted: 0', 'Predicted: 1'])
axes[1].set_yticklabels(['Actual: 1', 'Actual: 0'])
axes[1].set_title('What each cell means')

plt.suptitle('Confusion Matrix - the foundation of all metrics', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## 3. Metrics derived from the confusion matrix

### Accuracy
```
Accuracy = (TP + TN) / (TP + TN + FP + FN)
```
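A worked example using the counts from the toy patient data in section 2 (TP = 2, TN = 5, FP = 1, FN = 2):

$$
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{2 + 5}{2 + 5 + 1 + 2} = \frac{7}{10} = 0.70
$$

70% looks respectable, even though the model found only two of the four sick patients.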
What fraction of all predictions were correct?

### Precision
```
Precision = TP / (TP + FP)
```
Of everything we labeled positive, how much was actually positive?
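Precision is undefined when the model predicts no positives at all (TP + FP = 0). A minimal sketch, assuming we signal that case with `None`; the helper `precision_from_counts` is hypothetical. scikit-learn's `precision_score` handles the same case through its `zero_division` parameter.

```python
from sklearn.metrics import precision_score

def precision_from_counts(tp, fp):
    """Precision = TP / (TP + FP); None if the model never predicted positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        return None  # 0/0: no positive prediction to be right or wrong about
    return tp / predicted_positive

print(precision_from_counts(2, 1))  # toy example from section 2: 2 / (2 + 1)
print(precision_from_counts(0, 0))  # None

# scikit-learn: zero_division=0 reports the undefined case as 0.0 instead of warning
print(precision_score([1, 0, 1], [0, 0, 0], zero_division=0))  # 0.0
```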

### Recall (Sensitivity)
```
Recall = TP / (TP + FN)
```
What fraction of the actual positives did we find?
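Recall ignores false positives, so on its own it can be gamed. A sketch with the toy labels from section 2 and a hypothetical model that flags every patient as sick: recall becomes perfect while precision drops to the share of sick patients.

```python
from sklearn.metrics import precision_score, recall_score

toy_actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # 4 sick, 6 healthy
flag_everyone = [1] * len(toy_actual)         # hypothetical model: "everyone is sick"

rec_all = recall_score(toy_actual, flag_everyone)      # 4 / (4 + 0)
prec_all = precision_score(toy_actual, flag_everyone)  # 4 / (4 + 6)
print(f"Recall: {rec_all:.0%}, Precision: {prec_all:.0%}")  # Recall: 100%, Precision: 40%
```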

### F1 Score
```
F1 = 2 * (Precision * Recall) / (Precision + Recall)
```
The harmonic mean of precision and recall; it is high only when both are high.
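Why the harmonic rather than the arithmetic mean? The harmonic mean is pulled toward the smaller of the two values, so a model cannot hide a near-zero recall behind a high precision. A small sketch comparing the two (the helper names are hypothetical):

```python
def harmonic_f1(p, r):
    """F1 = 2PR / (P + R); defined as 0 when both are 0."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

def arithmetic_mean(p, r):
    return (p + r) / 2

for p, r in [(2/3, 1/2), (1.0, 0.0), (0.9, 0.1)]:
    print(f"P={p:.2f}  R={r:.2f}  arithmetic={arithmetic_mean(p, r):.3f}  F1={harmonic_f1(p, r):.3f}")
```

The first pair is the toy example (P = 2/3, R = 1/2): F1 = 4/7 ≈ 0.571 versus an arithmetic mean of about 0.583. For P = 1.0, R = 0.0 the arithmetic mean is still 0.5, but F1 is 0.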

In [None]:
# Compute all metrics by hand
print("=" * 50)
print("COMPUTING THE METRICS")
print("=" * 50)

# Reuse TP, TN, FP, FN from the previous example; the if/else guards avoid division by zero
accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP) if (TP + FP) > 0 else 0
recall = TP / (TP + FN) if (TP + FN) > 0 else 0
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print(f"\nTP={TP}, TN={TN}, FP={FP}, FN={FN}")
print(f"\nAccuracy:  {accuracy:.2%} = ({TP}+{TN})/({TP}+{TN}+{FP}+{FN})")
print(f"Precision: {precision:.2%} = {TP}/({TP}+{FP})")
print(f"Recall:    {recall:.2%} = {TP}/({TP}+{FN})")
print(f"F1 Score:  {f1:.2%}")

# Cross-check with scikit-learn
print("\n--- Cross-check with sklearn ---")
print(f"sklearn Accuracy:  {accuracy_score(skutecnost, predikce):.2%}")
print(f"sklearn Precision: {precision_score(skutecnost, predikce):.2%}")
print(f"sklearn Recall:    {recall_score(skutecnost, predikce):.2%}")
print(f"sklearn F1:        {f1_score(skutecnost, predikce):.2%}")

In [None]:
# Visualize the metrics
metriky = ['Accuracy', 'Precision', 'Recall', 'F1 Score']
hodnoty = [accuracy, precision, recall, f1]
barvy = ['skyblue', 'lightgreen', 'lightcoral', 'plum']

plt.figure(figsize=(10, 5))
bars = plt.bar(metriky, hodnoty, color=barvy, edgecolor='black')
plt.ylim(0, 1)
plt.ylabel('Value')
plt.title('Metric comparison')

# Write each value above its bar
for bar, val in zip(bars, hodnoty):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
             f'{val:.1%}', ha='center', fontsize=12)

plt.tight_layout()
plt.show()

## 4. When to use which metric?

| Situation | Priority | Metric |
|-----------|----------|--------|
| Cancer diagnosis | Don't let a sick patient slip through (FN is costly) | **Recall** |
| Spam filter | Don't send an important email to spam (FP is costly) | **Precision** |
| Both error types matter | Balance FP and FN | **F1 Score** |
| Balanced classes, errors cost about the same | Overall picture | **Accuracy** |
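The table boils down to one question: which error costs more? A sketch that makes this explicit, assuming hypothetical costs (a missed sick patient costs 50, a false alarm costs 1) and two hypothetical models evaluated on the same 100 patients, 10 of them sick. The cost values are illustrative only; in practice they come from the domain.

```python
# Hypothetical costs (assumption for illustration): a missed sick patient (FN)
# is 50x worse than a false alarm (FP) that triggers an extra test.
COST_FN = 50
COST_FP = 1

def total_cost(fp, fn, cost_fp=COST_FP, cost_fn=COST_FN):
    """Total cost of a model's errors under the assumed per-error costs."""
    return fp * cost_fp + fn * cost_fn

# Two hypothetical models on the same 100 patients (10 sick: TP + FN = 10)
cautious = {"TP": 9, "FN": 1, "FP": 15, "TN": 75}  # high recall, lower precision
strict   = {"TP": 6, "FN": 4, "FP": 2,  "TN": 88}  # higher precision, lower recall

for name, c in [("cautious", cautious), ("strict", strict)]:
    acc = (c["TP"] + c["TN"]) / sum(c.values())
    prec = c["TP"] / (c["TP"] + c["FP"])
    rec = c["TP"] / (c["TP"] + c["FN"])
    cost = total_cost(c["FP"], c["FN"])
    print(f"{name:8s} acc={acc:.2f} P={prec:.3f} R={rec:.2f} cost={cost}")
```

With these numbers the strict model wins on accuracy (0.94 vs 0.84) and precision (0.75 vs 0.375), yet its total cost is 202 against 65, because its four missed patients dominate.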

In [None]:
# Demonstration: why accuracy is not enough
print("=" * 60)
print("WHY ACCURACY IS NOT ENOUGH: IMBALANCED CLASSES")
print("=" * 60)

# Imagine 1000 people, only 10 of whom have a rare disease
skutecnost_nevyv = [0]*990 + [1]*10  # 990 healthy, 10 sick

# A "dumb" model that ALWAYS says "healthy" (always predicts 0)
predikce_hloupy = [0]*1000

acc_hloupy = accuracy_score(skutecnost_nevyv, predikce_hloupy)
# Precision is 0/0 here (no positive predictions); zero_division=0 reports it as 0
prec_hloupy = precision_score(skutecnost_nevyv, predikce_hloupy, zero_division=0)
rec_hloupy = recall_score(skutecnost_nevyv, predikce_hloupy)
f1_hloupy = f1_score(skutecnost_nevyv, predikce_hloupy, zero_division=0)

print(f"\nModel that always says 'healthy':")
print(f"  Accuracy:  {acc_hloupy:.1%} <-- Looks great!")
print(f"  Precision: {prec_hloupy:.1%}")
print(f"  Recall:    {rec_hloupy:.1%} <-- DISASTER! Not a single sick patient found!")
print(f"  F1 Score:  {f1_hloupy:.1%}")

print("\n>>> 99% accuracy but ZERO recall - that is dangerous!")
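scikit-learn provides this "always predict the most common class" baseline as `sklearn.dummy.DummyClassifier`. A sketch, assuming the same 990/10 label split; because the baseline ignores the features, a dummy column of zeros stands in for them. Any real model should clearly beat this baseline on the metric you actually care about, not just on accuracy.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Same 990 healthy / 10 sick labels as above
labels = np.array([0] * 990 + [1] * 10)
features = np.zeros((len(labels), 1))  # placeholder features, ignored by the baseline

baseline = DummyClassifier(strategy="most_frequent")  # always predicts the majority class
baseline.fit(features, labels)
baseline_pred = baseline.predict(features)

print(f"Baseline accuracy: {accuracy_score(labels, baseline_pred):.1%}")  # 99.0%
print(f"Baseline recall:   {recall_score(labels, baseline_pred):.1%}")    # 0.0%
```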

In [None]:
# Visualize the imbalanced-class problem
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: class distribution
axes[0].bar(['Healthy (0)', 'Sick (1)'], [990, 10], color=['skyblue', 'coral'])
axes[0].set_ylabel('Count')
axes[0].set_title('Imbalanced dataset: 990 healthy vs 10 sick')

# Right: metrics of the dumb model
metriky = ['Accuracy', 'Precision', 'Recall', 'F1']
hodnoty = [acc_hloupy, prec_hloupy, rec_hloupy, f1_hloupy]
barvy = ['green' if v > 0.5 else 'red' for v in hodnoty]
axes[1].bar(metriky, hodnoty, color=barvy, alpha=0.7, edgecolor='black')
axes[1].axhline(y=0.5, color='orange', linestyle='--', label='50%')
axes[1].set_ylim(0, 1.1)
axes[1].set_ylabel('Value')
axes[1].set_title('Metrics of the "dumb" model (always says healthy)')
axes[1].legend()

# Annotations
axes[1].annotate('High accuracy\nis not enough!', xy=(0, 1), fontsize=10, color='green')
axes[1].annotate('Recall = 0\nnot a single\nsick patient found!',
                 xy=(2, 0.1), fontsize=10, color='red')

plt.suptitle('Problem: accuracy can be misleading with imbalanced classes',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

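## 5. Hands-on example: Breast Cancer dataset

We'll use a real dataset for breast cancer diagnosis. Note the label encoding: 0 = malignant, 1 = benign. scikit-learn's `precision_score`, `recall_score` and `f1_score` treat label 1 as the positive class by default (`pos_label=1`), so without `pos_label=0` the reported precision and recall describe the *benign* class, not how well we catch malignant tumors.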
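A toy check of how the choice of positive label changes recall, using the same encoding as this dataset (0 = malignant, 1 = benign) and hypothetical predictions from a model that misses three of four malignant cases:

```python
from sklearn.metrics import recall_score

# Toy labels with the breast cancer encoding: 0 = malignant, 1 = benign
actual    = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
predicted = [0, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # hypothetical predictions

recall_benign = recall_score(actual, predicted)                 # default pos_label=1 -> benign
recall_malignant = recall_score(actual, predicted, pos_label=0)  # malignant as the positive class
print(f"Recall (benign, default): {recall_benign:.0%}")    # 100%
print(f"Recall (malignant):       {recall_malignant:.0%}")  # 25%
```

The default reading (100%) hides that three of the four malignant tumors were missed.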

In [None]:
# Load the dataset
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target

print("=" * 50)
print("BREAST CANCER DATASET")
print("=" * 50)
print(f"\nNumber of samples: {len(X)}")
print(f"Number of features: {X.shape[1]}")
print(f"\nClasses: {cancer.target_names}")
print(f"Class counts: malignant (0)={sum(y==0)}, benign (1)={sum(y==1)}")

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

In [None]:
# Train and evaluate a model
print("=" * 50)
print("TRAINING AND EVALUATING THE MODEL")
print("=" * 50)

# Train logistic regression
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# All metrics; pos_label=0 makes "malignant" the positive class
# (the default pos_label=1 would score the benign class instead)
print("\nLogistic regression results (positive class = malignant):")
print(f"  Accuracy:  {accuracy_score(y_test, y_pred):.2%}")
print(f"  Precision: {precision_score(y_test, y_pred, pos_label=0):.2%}")
print(f"  Recall:    {recall_score(y_test, y_pred, pos_label=0):.2%}")
print(f"  F1 Score:  {f1_score(y_test, y_pred, pos_label=0):.2%}")

In [None]:
# Confusion matrix - detailed analysis
cm = confusion_matrix(y_test, y_pred)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: confusion matrix (rows = actual, columns = predicted; 0 = malignant, 1 = benign)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=cancer.target_names)
disp.plot(ax=axes[0], cmap='Blues')
axes[0].set_title('Confusion Matrix')

# Right: metrics with malignant (label 0) as the positive class
metriky = ['Accuracy', 'Precision', 'Recall', 'F1']
hodnoty = [
    accuracy_score(y_test, y_pred),
    precision_score(y_test, y_pred, pos_label=0),
    recall_score(y_test, y_pred, pos_label=0),
    f1_score(y_test, y_pred, pos_label=0)
]
axes[1].barh(metriky, hodnoty, color=['skyblue', 'lightgreen', 'lightcoral', 'plum'])
axes[1].set_xlim(0, 1)
axes[1].set_xlabel('Value')
axes[1].set_title('Model metrics (positive class = malignant)')

for i, v in enumerate(hodnoty):
    axes[1].text(v + 0.01, i, f'{v:.1%}', va='center')

plt.suptitle('Model evaluation on the Breast Cancer dataset', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
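Raw counts are hard to compare when the classes have different sizes (here 212 malignant vs 357 benign). `confusion_matrix` accepts `normalize='true'`, which divides each row by the number of actual samples in that class, so the diagonal becomes the recall of each class. A self-contained sketch that re-creates the same split and logistic regression under new names (so it does not overwrite the notebook's `model` or `y_pred`):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
Xa, Xb, ya, yb = train_test_split(data.data, data.target, test_size=0.3, random_state=42)
clf = LogisticRegression(max_iter=10000).fit(Xa, ya)
pred = clf.predict(Xb)

# normalize="true": each row sums to 1, so the diagonal holds per-class recall
# (row 0 = actual malignant, row 1 = actual benign)
cm_rates = confusion_matrix(yb, pred, normalize="true")
print(cm_rates.round(3))
print("Malignant recall:", round(cm_rates[0, 0], 3))
```

To plot it, `ConfusionMatrixDisplay.from_predictions(yb, pred, normalize='true')` works in scikit-learn 1.0 and later.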

In [None]:
# Classification report - all metrics for every class at once (covers both classes, so no pos_label is needed)
print("=" * 60)
print("CLASSIFICATION REPORT")
print("=" * 60)
print()
print(classification_report(y_test, y_pred, target_names=cancer.target_names))
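The report's last two rows are averages over classes: `macro avg` is the plain mean of the per-class values, and `weighted avg` weights each class by its support (number of true samples). A toy sketch with hypothetical labels, checking scikit-learn's `average=` options against a manual computation:

```python
from sklearn.metrics import recall_score

# Hypothetical labels: 4 samples of class 0, 6 of class 1
actual    = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
predicted = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]

per_class = recall_score(actual, predicted, average=None)       # one recall per class
macro = recall_score(actual, predicted, average="macro")        # plain mean over classes
weighted = recall_score(actual, predicted, average="weighted")  # mean weighted by support

# The same averages by hand
support = [actual.count(0), actual.count(1)]
manual_macro = sum(per_class) / len(per_class)
manual_weighted = sum(r * s for r, s in zip(per_class, support)) / sum(support)
print(per_class, macro, weighted)
```

Here class 0 has recall 1/4 and class 1 has 5/6, so the macro average is about 0.542 and the weighted average is 0.6. For recall specifically, the weighted average always equals accuracy, because weighting each class's recall by its support just sums all true positives over all samples.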

## 6. Comparing several models

In [None]:
# Compare different models
print("=" * 60)
print("COMPARING SEVERAL MODELS")
print("=" * 60)

modely = {
    'Logistic Regression': LogisticRegression(max_iter=10000),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'KNN (K=5)': KNeighborsClassifier(n_neighbors=5)
}

vysledky = []

# Note: this loop reuses the names model and y_pred, so after it finishes
# they hold the last model (KNN) and its predictions
for nazev, model in modely.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    vysledky.append({
        'Model': nazev,
        'Accuracy': accuracy_score(y_test, y_pred),
        'Precision': precision_score(y_test, y_pred, pos_label=0),  # positive = malignant
        'Recall': recall_score(y_test, y_pred, pos_label=0),
        'F1': f1_score(y_test, y_pred, pos_label=0)
    })

df_vysledky = pd.DataFrame(vysledky)
print("\nModel comparison (precision/recall/F1 for the malignant class):")
print(df_vysledky.to_string(index=False))
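A single 70/30 split gives one noisy estimate per model, and small differences between models can be luck of the split. A sketch using 5-fold stratified cross-validation via `cross_validate`, with malignant (label 0) as the positive class through `make_scorer(..., pos_label=0)`; the fold count and random seed are arbitrary choices, not from the original:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()

# Positive class = malignant (label 0), forwarded to each metric by make_scorer
scoring = {
    "accuracy": "accuracy",
    "precision": make_scorer(precision_score, pos_label=0),
    "recall": make_scorer(recall_score, pos_label=0),
    "f1": make_scorer(f1_score, pos_label=0),
}
# Stratified folds keep roughly the malignant/benign ratio in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=10000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN (K=5)": KNeighborsClassifier(n_neighbors=5),
}

rows = []
for name, estimator in candidates.items():
    # cross_validate clones the estimator and refits it on each training fold
    scores = cross_validate(estimator, data.data, data.target, cv=cv, scoring=scoring)
    row = {"Model": name}
    for metric in scoring:
        row[metric] = scores[f"test_{metric}"].mean()
    row["f1_std"] = scores["test_f1"].std()
    rows.append(row)

cv_results = pd.DataFrame(rows)
print(cv_results.round(3).to_string(index=False))
```

Look at the spread (`f1_std`) as well as the mean: as a rough rule of thumb, if two models' means are closer than their standard deviations, the comparison does not clearly favor either.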

In [None]:
# Visualize the comparison
fig, ax = plt.subplots(figsize=(12, 6))

x = np.arange(len(df_vysledky))
width = 0.2

bars1 = ax.bar(x - 1.5*width, df_vysledky['Accuracy'], width, label='Accuracy', color='skyblue')
bars2 = ax.bar(x - 0.5*width, df_vysledky['Precision'], width, label='Precision', color='lightgreen')
bars3 = ax.bar(x + 0.5*width, df_vysledky['Recall'], width, label='Recall', color='lightcoral')
bars4 = ax.bar(x + 1.5*width, df_vysledky['F1'], width, label='F1', color='plum')

ax.set_ylabel('Value')
ax.set_title('Metric comparison across models')
ax.set_xticks(x)
ax.set_xticklabels(df_vysledky['Model'])
ax.legend()
ax.set_ylim(0, 1.1)

plt.tight_layout()
plt.show()

# Best model by F1 score
nejlepsi = df_vysledky.loc[df_vysledky['F1'].idxmax()]
print(f"\nBest model by F1 score: {nejlepsi['Model']} (F1={nejlepsi['F1']:.2%})")

## 7. Summary and review questions

### What we learned:

| Metric | Formula | When to use |
|--------|---------|-------------|
| **Accuracy** | (TP+TN)/all | Balanced classes |
| **Precision** | TP/(TP+FP) | When FP is costly |
| **Recall** | TP/(TP+FN) | When FN is costly |
| **F1** | 2*(P*R)/(P+R) | When we need both |

### Review questions:

1. **Why can high accuracy sometimes be misleading?**
2. **When is recall more important than precision?**
3. **What does the F1 score tell us?**
4. **How do you read a confusion matrix?**

In [None]:
# Test your knowledge
print("=" * 60)
print("QUIZ: TEST YOUR KNOWLEDGE")
print("=" * 60)

otazky = [
    {
        "otazka": "When is recall more important than precision?",
        "moznosti": ["A) Spam filter (we don't want to delete important emails)",
                     "B) Cancer diagnosis (we don't want to miss a sick patient)",
                     "C) Both are equally important"],
        "spravne": "B"
    },
    {
        "otazka": "What does False Negative (FN) mean?",
        "moznosti": ["A) A false alarm",
                     "B) We missed an actual positive",
                     "C) A correct prediction"],
        "spravne": "B"
    },
    {
        "otazka": "What does the F1 score tell us?",
        "moznosti": ["A) The average of accuracy and precision",
                     "B) The harmonic mean of precision and recall",
                     "C) Overall accuracy"],
        "spravne": "B"
    }
]

for i, q in enumerate(otazky, 1):
    print(f"\nQuestion {i}: {q['otazka']}")
    for m in q['moznosti']:
        print(f"  {m}")
    print(f"  --> Correct answer: {q['spravne']}")

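## 8. Your challenge

1. **Change the classification threshold** - what happens to precision and recall?
2. **Use a different dataset** - try Iris or your own data (Iris has three classes, so pass `average='macro'` or `average='weighted'` to `precision_score`, `recall_score` and `f1_score`)
3. **Compare more models** - which one is best for your problem?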
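For challenge 1, a sketch of a threshold sweep. It assumes logistic regression on the same split as section 5 and treats malignant (label 0) as positive: a sample is flagged malignant when P(malignant) is at least the threshold. Lowering the threshold flags more tumors as malignant, so recall can only stay the same or rise, while precision usually falls. The threshold values are arbitrary; `sklearn.metrics.precision_recall_curve` computes the full curve in one call.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
Xa, Xb, ya, yb = train_test_split(data.data, data.target, test_size=0.3, random_state=42)
clf = LogisticRegression(max_iter=10000).fit(Xa, ya)

# clf.classes_ is [0, 1], so column 0 of predict_proba is P(malignant)
p_malignant = clf.predict_proba(Xb)[:, 0]

sweep = []
for threshold in [0.1, 0.3, 0.5, 0.7, 0.9]:
    # Flag malignant (0) when P(malignant) >= threshold, otherwise benign (1)
    pred = np.where(p_malignant >= threshold, 0, 1)
    prec = precision_score(yb, pred, pos_label=0, zero_division=0)
    rec = recall_score(yb, pred, pos_label=0)
    sweep.append((threshold, prec, rec))
    print(f"threshold={threshold:.1f}  precision={prec:.3f}  recall={rec:.3f}")
```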

In [None]:
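# Your space for experimenting
# -----------------------------

# Tip: try changing the classification threshold.
# Note: after the comparison loop, `model` is the last model (KNN); refit
# LogisticRegression first if you want to experiment with it.
# Column 0 of predict_proba is P(malignant), since model.classes_ is [0, 1]:
# probs_malignant = model.predict_proba(X_test)[:, 0]
# y_pred_new = np.where(probs_malignant > 0.3, 0, 1)  # flag malignant from 30% instead of 50%

# Tip: classification_report shows all metrics at once
# print(classification_report(y_test, y_pred_new, target_names=cancer.target_names))

print("Experiment with the code above!")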