# Chapter 23: Train/Test Split - How We Test AI

## Why can't we test AI on the data it learned from?

Imagine a student preparing for an exam. They have 100 questions with answers. What happens if they **memorize all of them** and you then examine them **on the same questions**?

They will score 100%! But what if you give them **new questions** they have never seen? They might fall apart...

**AI has exactly the same problem:**
- If we train a model on all the data
- and then test it on the same data
- the results will be **overly optimistic**
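
The student analogy can be sketched in a few lines. This is a minimal illustration, not one of the chapter's own examples: it assumes a 1-nearest-neighbour classifier as the "memorizing student" and synthetic data from `make_classification`. The names `X_demo`, `y_demo` and `memorizer` are made up for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic "exam questions": 100 samples, 5 features (illustrative values)
X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=0)

# 1-NN stores every training point, so it effectively memorizes the data
memorizer = KNeighborsClassifier(n_neighbors=1)
memorizer.fit(X_demo, y_demo)

# Scoring on the SAME data: each point's nearest neighbour is itself
same_data_accuracy = memorizer.score(X_demo, y_demo)
print(f"Accuracy on the data the model has already seen: {same_data_accuracy:.2f}")
```

A perfect score here says nothing about how the model would handle new samples, which is exactly the problem the rest of the chapter addresses.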

---

### What we will learn:
1. **Train/Test Split** - how to divide data into training and test sets
2. **Overfitting** - when a model "memorizes" instead of generalizing
3. **Cross-validation** - a more robust way of testing
4. **Validation set** - why we also need a third set of data
5. **Practical examples** - how to do all of this in Python

## 1. Installing and importing libraries

In [None]:
# Install the libraries (for Google Colab)
!pip install pandas numpy matplotlib seaborn scikit-learn -q

print("Libraries installed!")

In [None]:
# Import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, cross_val_score, KFold
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, mean_squared_error, r2_score
from sklearn.datasets import make_classification, load_iris

# Plot settings
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)
np.random.seed(42)  # For reproducibility

print("All libraries loaded successfully!")

## 2. Why do we need to split the data?

### The problem: a model that "memorizes"

A simple example shows what happens when we train and test a model on the same data.

In [None]:
# Create simple data - predicting house price from floor area
np.random.seed(42)

# Floor area of the house (m2)
plocha = np.random.uniform(50, 200, 50)

# Price = 50000 * area + random noise
cena = 50000 * plocha + np.random.normal(0, 500000, 50)

# DataFrame
domy = pd.DataFrame({'plocha_m2': plocha, 'cena_kc': cena})

print("Our house data:")
print(domy.head(10))

# Visualization
plt.figure(figsize=(10, 6))
plt.scatter(domy['plocha_m2'], domy['cena_kc']/1000000, c='coral', s=80, alpha=0.7)
plt.xlabel('Floor area (m2)')
plt.ylabel('Price (million CZK)')
plt.title('Relationship between floor area and house price')
plt.show()

In [None]:
# WRONG APPROACH: training and testing on the same data
print("=" * 60)
print("WRONG APPROACH: Training and testing on the SAME data")
print("=" * 60)

X = domy[['plocha_m2']]
y = domy['cena_kc']

# Train the model on ALL the data
model = LinearRegression()
model.fit(X, y)

# Test on the SAME data
predikce = model.predict(X)

# Compute the error (MSE) and the R2 score
mse = mean_squared_error(y, predikce)
r2 = r2_score(y, predikce)

print("\nResults on the TRAINING data:")
print(f"  R2 score: {r2:.4f}")
print(f"  RMSE: {np.sqrt(mse)/1000:.0f} thousand CZK")
print("\n>>> Careful! This is OVERLY OPTIMISTIC!")
print(">>> The model saw all of this data during training.")

## 3. Train/Test Split - the right approach

We divide the data into two parts:
- **Training set (train set)** - the model learns from it
- **Test set** - the model never sees it during training; we use it for evaluation

### Typical split ratios:
- **80/20** - 80% training, 20% test (the most common)
- **70/30** - 70% training, 30% test
- **90/10** - for large datasets
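
To make the ratios concrete, the sketch below checks how many samples each ratio leaves for training and testing. It assumes a dummy array of 50 samples (the same count as the house data); the names `X_dummy` and `split_sizes` exist only in this illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50 dummy samples with one feature each (placeholder data)
X_dummy = np.arange(50).reshape(-1, 1)

split_sizes = {}
for test_size in (0.2, 0.3, 0.1):
    # train_test_split also accepts a single array and returns two parts
    part_train, part_test = train_test_split(
        X_dummy, test_size=test_size, random_state=42
    )
    split_sizes[test_size] = (len(part_train), len(part_test))
    print(f"test_size={test_size}: train={len(part_train)}, test={len(part_test)}")
```

With only 50 samples, a 90/10 split leaves very few test points, which is one reason that ratio is usually reserved for large datasets.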

In [None]:
# RIGHT APPROACH: split the data
print("=" * 60)
print("RIGHT APPROACH: Train/Test Split")
print("=" * 60)

X = domy[['plocha_m2']]
y = domy['cena_kc']

# Split into 80% training, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,      # 20% for testing
    random_state=42     # For reproducibility
)

print("\nData split:")
print(f"  Training set: {len(X_train)} samples ({len(X_train)/len(X)*100:.0f}%)")
print(f"  Test set:     {len(X_test)} samples ({len(X_test)/len(X)*100:.0f}%)")

In [None]:
# Train the model ONLY on the training data
model = LinearRegression()
model.fit(X_train, y_train)  # Learn ONLY from the train data

# Predictions on the TRAINING data
pred_train = model.predict(X_train)
r2_train = r2_score(y_train, pred_train)
mse_train = mean_squared_error(y_train, pred_train)

# Predictions on the TEST data (the model has never seen them!)
pred_test = model.predict(X_test)
r2_test = r2_score(y_test, pred_test)
mse_test = mean_squared_error(y_test, pred_test)

print("=" * 60)
print("COMPARISON OF RESULTS")
print("=" * 60)
print("\nResults on the TRAINING data:")
print(f"  R2 score: {r2_train:.4f}")
print(f"  RMSE: {np.sqrt(mse_train)/1000:.0f} thousand CZK")

print("\nResults on the TEST data (UNSEEN DATA):")
print(f"  R2 score: {r2_test:.4f}")
print(f"  RMSE: {np.sqrt(mse_test)/1000:.0f} thousand CZK")

print("\n>>> The test score is REALISTIC - it shows how the model performs")
print(">>> on data it has never seen!")

In [None]:
# Visualize the split
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left plot - the data split
axes[0].scatter(X_train, y_train/1000000, c='blue', s=80, alpha=0.7, label=f'Train ({len(X_train)})')
axes[0].scatter(X_test, y_test/1000000, c='red', s=80, alpha=0.7, label=f'Test ({len(X_test)})')

# Add the regression line (a DataFrame keeps the feature name the model was fitted with)
x_line = pd.DataFrame({'plocha_m2': np.linspace(X['plocha_m2'].min(), X['plocha_m2'].max(), 100)})
y_line = model.predict(x_line)
axes[0].plot(x_line['plocha_m2'], y_line/1000000, 'g--', linewidth=2, label='Model')

axes[0].set_xlabel('Floor area (m2)')
axes[0].set_ylabel('Price (million CZK)')
axes[0].set_title('Train/Test Split visualization')
axes[0].legend()

# Right plot - score comparison
kategorie = ['Train score', 'Test score']
skore = [r2_train, r2_test]
barvy = ['blue', 'red']
bars = axes[1].bar(kategorie, skore, color=barvy, alpha=0.7, edgecolor='black')
axes[1].set_ylabel('R2 score')
axes[1].set_title('Train vs Test score')
axes[1].set_ylim(0, 1)

# Add the values above the bars
for bar, s in zip(bars, skore):
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
                 f'{s:.3f}', ha='center', fontsize=12)

plt.suptitle('The right approach: train on train, test on test', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## 4. Overfitting - when the model "memorizes"

**Overfitting** occurs when a model:
- has EXCELLENT results on the training data
- has POOR results on the test data

This means the model has "memorized" the training data instead of learning general patterns.

### Visual demonstration: KNN with different values of K

In [None]:
# Create a classification task
from sklearn.datasets import make_moons

# Create "moon-shaped" data
X, y = make_moons(n_samples=200, noise=0.3, random_state=42)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print(f"Training data: {len(X_train)} samples")
print(f"Test data: {len(X_test)} samples")

# Visualization
plt.figure(figsize=(10, 5))
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='coolwarm', s=60, alpha=0.7, label='Train')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap='coolwarm', s=60, alpha=0.7, marker='s', label='Test')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Classification task - "Moons"')
plt.colorbar(label='Class')
plt.legend()
plt.show()

In [None]:
# Compare KNN with different values of K
print("=" * 60)
print("EFFECT OF THE PARAMETER K ON OVERFITTING")
print("=" * 60)

hodnoty_k = [1, 3, 5, 10, 20, 50]
train_scores = []
test_scores = []

for k in hodnoty_k:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)

    train_acc = accuracy_score(y_train, knn.predict(X_train))
    test_acc = accuracy_score(y_test, knn.predict(X_test))

    train_scores.append(train_acc)
    test_scores.append(test_acc)

    # Flag a train/test gap above 0.1 (an illustrative threshold) as overfitting
    if train_acc > test_acc + 0.1:
        status = "OVERFITTING!"
    elif test_acc > train_acc:
        status = "OK"
    else:
        status = "Good model"

    print(f"K={k:2d}: Train={train_acc:.3f}, Test={test_acc:.3f} --> {status}")

In [None]:
# Visualize overfitting vs underfitting
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Train vs Test score
axes[0].plot(hodnoty_k, train_scores, 'b-o', linewidth=2, markersize=8, label='Train score')
axes[0].plot(hodnoty_k, test_scores, 'r-s', linewidth=2, markersize=8, label='Test score')
axes[0].fill_between(hodnoty_k, train_scores, test_scores, alpha=0.2, color='purple')
axes[0].set_xlabel('Value of K (number of neighbours)')
axes[0].set_ylabel('Accuracy')
axes[0].set_title('Overfitting: when Train >> Test')
# Draw the reference line BEFORE calling legend() so that its label is included
axes[0].axvline(x=5, color='green', linestyle='--', label='K=5 (illustrative choice)')
axes[0].legend()

# Annotations
axes[0].annotate('OVERFITTING\n(too complex)', xy=(1, 0.95), fontsize=10, color='red')
axes[0].annotate('UNDERFITTING\n(too simple)', xy=(40, 0.75), fontsize=10, color='orange')

# Plot 2: Difference between train and test
rozdil = np.array(train_scores) - np.array(test_scores)
barvy = ['red' if r > 0.05 else 'green' for r in rozdil]
bars = axes[1].bar(range(len(hodnoty_k)), rozdil, color=barvy, alpha=0.7, edgecolor='black')
axes[1].set_xticks(range(len(hodnoty_k)))
axes[1].set_xticklabels([f'K={k}' for k in hodnoty_k])
axes[1].set_ylabel('Train - Test score')
axes[1].set_title('Difference between Train and Test (smaller = better)')
axes[1].axhline(y=0.05, color='orange', linestyle='--', label='Overfitting threshold')
axes[1].legend()

plt.suptitle('How to recognize overfitting', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## 5. Cross-Validation - more robust testing

**The problem with a single split:** The results depend on WHICH samples end up in the train set and which in the test set.

**The solution: K-Fold Cross-Validation**
1. Divide the data into K parts (folds)
2. Train K times, each time holding out a different fold for testing
3. Average the results
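
The dependence on a single split is easy to observe. The sketch below is an illustration under assumptions, not part of the original notebook: it repeats an 80/20 split of the Iris data with ten different `random_state` values and a KNN with K=5, then reports the spread of test accuracy. The names `X_var`, `y_var` and `split_scores` are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_var, y_var = load_iris(return_X_y=True)

split_scores = []
for seed in range(10):
    # Same data, same model, only the random split changes
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_var, y_var, test_size=0.2, random_state=seed
    )
    knn_var = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    split_scores.append(knn_var.score(X_te, y_te))

split_scores = np.array(split_scores)
print(f"Test accuracy over 10 splits: min={split_scores.min():.3f}, "
      f"max={split_scores.max():.3f}, mean={split_scores.mean():.3f}")
```

Any gap between the minimum and maximum comes purely from which samples landed in the test set, not from a change in the model.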

```
5-Fold Cross-Validation:
Fold 1: [TEST] [train] [train] [train] [train]
Fold 2: [train] [TEST] [train] [train] [train]
Fold 3: [train] [train] [TEST] [train] [train]
Fold 4: [train] [train] [train] [TEST] [train]
Fold 5: [train] [train] [train] [train] [TEST]
```
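
The three steps above can be sketched as an explicit loop. This is a minimal version that assumes the Iris data and a KNN with K=5 as the model; the variable names (`X_cv`, `splitter`, `fold_scores`) are chosen only for this illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X_cv, y_cv = load_iris(return_X_y=True)

# Step 1: divide the data into K=5 folds (shuffled, since Iris is sorted by class)
splitter = KFold(n_splits=5, shuffle=True, random_state=42)
base_model = KNeighborsClassifier(n_neighbors=5)

fold_scores = []
seen_test_indices = []
for fold_no, (tr_idx, te_idx) in enumerate(splitter.split(X_cv), start=1):
    # Step 2: train a fresh copy of the model on the other K-1 folds
    fold_model = clone(base_model)
    fold_model.fit(X_cv[tr_idx], y_cv[tr_idx])
    fold_score = fold_model.score(X_cv[te_idx], y_cv[te_idx])
    fold_scores.append(fold_score)
    seen_test_indices.extend(te_idx.tolist())
    print(f"Fold {fold_no}: accuracy = {fold_score:.4f}")

# Step 3: average the results
mean_score = float(np.mean(fold_scores))
print(f"Mean accuracy over {len(fold_scores)} folds: {mean_score:.4f}")
```

Every sample is used for testing exactly once, which is what makes the averaged score less dependent on one lucky or unlucky split.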

In [None]:
# Visualize K-Fold Cross-Validation
print("=" * 60)
print("K-FOLD CROSS-VALIDATION VISUALIZATION")
print("=" * 60)

# Create the KFold object
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

fig, axes = plt.subplots(5, 1, figsize=(12, 6))

for fold, (train_idx, test_idx) in enumerate(kfold.split(X)):
    # Build the visualization for each fold
    colors = ['lightblue'] * len(X)
    for idx in test_idx:
        colors[idx] = 'coral'

    axes[fold].bar(range(len(X)), [1]*len(X), color=colors, edgecolor='black', linewidth=0.5)
    axes[fold].set_ylabel(f'Fold {fold+1}')
    axes[fold].set_yticks([])
    axes[fold].set_xlim(-1, len(X))

    # Annotation
    train_pct = len(train_idx)/len(X)*100
    test_pct = len(test_idx)/len(X)*100
    axes[fold].text(len(X)+5, 0.5, f'Train: {train_pct:.0f}%, Test: {test_pct:.0f}%',
                    fontsize=10, va='center')

axes[0].set_title('5-Fold Cross-Validation: Blue=Train, Orange=Test')
axes[-1].set_xlabel('Samples in the dataset')

plt.tight_layout()
plt.show()

In [None]:
# Practical cross-validation example
print("=" * 60)
print("CROSS-VALIDATION IN PRACTICE")
print("=" * 60)

# Use the Iris dataset
iris = load_iris()
X_iris = iris.data
y_iris = iris.target

print(f"\nIris dataset: {len(X_iris)} samples, {X_iris.shape[1]} features")

# Create the model
model = KNeighborsClassifier(n_neighbors=5)

# Run 5-fold cross-validation
cv_scores = cross_val_score(model, X_iris, y_iris, cv=5)

print("\nResults of 5-fold cross-validation:")
for i, score in enumerate(cv_scores, 1):
    print(f"  Fold {i}: {score:.4f}")

print("\n" + "="*40)
print(f"Mean score: {cv_scores.mean():.4f} (+/- {cv_scores.std()*2:.4f})")
print("="*40)
print("\n+/- is two standard deviations across the folds - smaller = more stable model")

In [None]:
# Compare different values of K using cross-validation
print("=" * 60)
print("FINDING THE OPTIMAL K WITH CROSS-VALIDATION")
print("=" * 60)

hodnoty_k = range(1, 31)
cv_prumery = []
cv_odchylky = []

for k in hodnoty_k:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_iris, y_iris, cv=5)
    cv_prumery.append(scores.mean())
    cv_odchylky.append(scores.std())

# Find the best K (on ties, argmax returns the smallest K)
nejlepsi_k = hodnoty_k[np.argmax(cv_prumery)]
nejlepsi_skore = max(cv_prumery)

print(f"\nBest K: {nejlepsi_k}")
print(f"Best mean score: {nejlepsi_skore:.4f}")

In [None]:
# Visualize the search for the optimal K
plt.figure(figsize=(12, 6))

plt.plot(hodnoty_k, cv_prumery, 'b-o', linewidth=2, markersize=6, label='Mean CV score')
plt.fill_between(hodnoty_k,
                 np.array(cv_prumery) - np.array(cv_odchylky),
                 np.array(cv_prumery) + np.array(cv_odchylky),
                 alpha=0.2, color='blue', label='Standard deviation')

# Mark the best K
plt.axvline(x=nejlepsi_k, color='green', linestyle='--', linewidth=2,
            label=f'Best K={nejlepsi_k}')
plt.scatter([nejlepsi_k], [nejlepsi_skore], c='green', s=200, zorder=5, marker='*')

plt.xlabel('Value of K (number of neighbours)')
plt.ylabel('Mean CV score')
plt.title('Finding the optimal K with 5-Fold Cross-Validation')
plt.legend()
plt.grid(True, alpha=0.3)

# Annotate the regions
plt.annotate('Overfitting\n(K too small)', xy=(2, 0.95), fontsize=10, color='red')
plt.annotate('Underfitting\n(K too large)', xy=(25, 0.93), fontsize=10, color='orange')

plt.tight_layout()
plt.show()

## 6. Train / Validation / Test Split

When we tune hyperparameters (such as the value of K), we need **three sets**:

| Set | Purpose | Share |
|------|------|----------|
| **Train** | Fitting the model | 60% |
| **Validation** | Tuning hyperparameters | 20% |
| **Test** | Final evaluation | 20% |

### Why?
- If we tuned hyperparameters on the test set, we could overfit the model to the test data
- The test set must stay untouched until the final evaluation
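
Because `train_test_split` only produces two parts, a three-way split takes two calls, and the second `test_size` has to be expressed relative to what remains after the first. The helper below is a hypothetical sketch of that arithmetic (`three_way_split` is not a scikit-learn function): for a 60/20/20 split the second fraction is 0.2 / (1 - 0.2) = 0.25.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def three_way_split(X, y, val_fraction=0.2, test_fraction=0.2, random_state=42):
    """Hypothetical helper: split into train / validation / test parts."""
    # First call: hold out the test set
    X_rest, X_te, y_rest, y_te = train_test_split(
        X, y, test_size=test_fraction, random_state=random_state
    )
    # Second call: the validation share is relative to the remaining data
    val_of_rest = val_fraction / (1.0 - test_fraction)
    X_tr, X_va, y_tr, y_va = train_test_split(
        X_rest, y_rest, test_size=val_of_rest, random_state=random_state
    )
    return X_tr, X_va, X_te, y_tr, y_va, y_te

# Placeholder data with 150 samples (the size of the Iris dataset)
X_three = np.arange(300).reshape(150, 2)
y_three = np.arange(150) % 3

X_tr, X_va, X_te, y_tr, y_va, y_te = three_way_split(X_three, y_three)
print(f"train={len(X_tr)}, validation={len(X_va)}, test={len(X_te)}")
```

Passing 0.2 to the second call instead of 0.25 would give a validation set of only 16% of the original data, which is a common slip.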

In [None]:
# Train / Validation / Test Split
print("=" * 60)
print("TRAIN / VALIDATION / TEST SPLIT")
print("=" * 60)

# First split into train+val and test
X_temp, X_test, y_temp, y_test = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42
)

# Then split train+val into train and validation
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42  # 0.25 * 0.8 = 0.2
)

print("\nData split:")
print(f"  Train:      {len(X_train)} samples ({len(X_train)/len(X_iris)*100:.0f}%)")
print(f"  Validation: {len(X_val)} samples ({len(X_val)/len(X_iris)*100:.0f}%)")
print(f"  Test:       {len(X_test)} samples ({len(X_test)/len(X_iris)*100:.0f}%)")

In [None]:
# Hyperparameter tuning
print("=" * 60)
print("TUNING HYPERPARAMETERS ON THE VALIDATION SET")
print("=" * 60)

nejlepsi_k = None
nejlepsi_val_skore = 0

print("\nSearching for the best K:")
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)  # Train on TRAIN

    val_skore = accuracy_score(y_val, knn.predict(X_val))  # Evaluate on VALIDATION

    # Strict ">" keeps the smallest K when several values tie
    if val_skore > nejlepsi_val_skore:
        nejlepsi_val_skore = val_skore
        nejlepsi_k = k
        print(f"  K={k}: Validation score = {val_skore:.4f} <-- NEW BEST!")

print(f"\nBest K: {nejlepsi_k}")
print(f"Best validation score: {nejlepsi_val_skore:.4f}")

In [None]:
# Final evaluation on the TEST set
print("=" * 60)
print("FINAL EVALUATION ON THE TEST SET")
print("=" * 60)

# Train the final model on TRAIN + VALIDATION
X_train_final = np.vstack([X_train, X_val])
y_train_final = np.hstack([y_train, y_val])

finalni_model = KNeighborsClassifier(n_neighbors=nejlepsi_k)
finalni_model.fit(X_train_final, y_train_final)

# Evaluate on TEST (the model has NEVER seen this data)
test_skore = accuracy_score(y_test, finalni_model.predict(X_test))

print(f"\nFinal model: KNN with K={nejlepsi_k}")
print(f"Validation score: {nejlepsi_val_skore:.4f}")
print(f"TEST SCORE: {test_skore:.4f}")
print("\n>>> The test score is the most realistic estimate of the model's performance!")

## 7. Summary and review questions

### What we have learned:

| Concept | Description |
|---------|-------|
| **Train/Test Split** | Dividing the data into training and test sets |
| **Overfitting** | The model memorizes the data and does not generalize |
| **Cross-Validation** | Repeated testing for robust results |
| **Validation Set** | Used for tuning hyperparameters |

### Review questions:

1. **Why can't we train and test on the same data?**
2. **How do you recognize overfitting in a plot of Train vs Test score?**
3. **How many folds are typical for cross-validation?**
4. **Why do we need a validation set in addition to the test set?**

In [None]:
# Summary visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# 1. Train/Test Split
sizes1 = [80, 20]
labels1 = ['Train (80%)', 'Test (20%)']
colors1 = ['lightblue', 'coral']
axes[0].pie(sizes1, labels=labels1, colors=colors1, autopct='%1.0f%%', startangle=90)
axes[0].set_title('1. Train/Test Split')

# 2. Cross-Validation (one row per fold, the test fold drawn in coral)
folds = 5
for i in range(folds):
    y_pos = i * 0.15
    for j in range(folds):
        color = 'coral' if j == i else 'lightblue'
        axes[1].add_patch(plt.Rectangle((j*0.18, y_pos), 0.16, 0.12,
                                        facecolor=color, edgecolor='black'))
axes[1].set_xlim(-0.05, 1)
axes[1].set_ylim(-0.05, 0.85)
axes[1].set_aspect('equal')
axes[1].axis('off')
axes[1].set_title('2. K-Fold Cross-Validation')

# 3. Train/Val/Test
sizes3 = [60, 20, 20]
labels3 = ['Train (60%)', 'Validation (20%)', 'Test (20%)']
colors3 = ['lightblue', 'lightgreen', 'coral']
axes[2].pie(sizes3, labels=labels3, colors=colors3, autopct='%1.0f%%', startangle=90)
axes[2].set_title('3. Train/Val/Test Split')

plt.suptitle('Ways of splitting data for ML', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# Test your knowledge
print("=" * 60)
print("QUIZ: TEST YOUR KNOWLEDGE")
print("=" * 60)

otazky = [
    {
        "otazka": "Why do we split data into train and test?",
        "moznosti": ["A) To make training faster",
                     "B) To find out how the model performs on new data",
                     "C) Because we do not have enough memory"],
        "spravne": "B"
    },
    {
        "otazka": "What is overfitting?",
        "moznosti": ["A) The model performs poorly everywhere",
                     "B) The model works great on train but poorly on test",
                     "C) The model trains too slowly"],
        "spravne": "B"
    },
    {
        "otazka": "What is cross-validation for?",
        "moznosti": ["A) Faster training",
                     "B) Getting a more robust estimate of performance",
                     "C) Visualizing data"],
        "spravne": "B"
    }
]

for i, q in enumerate(otazky, 1):
    print(f"\nQuestion {i}: {q['otazka']}")
    for m in q['moznosti']:
        print(f"  {m}")
    print(f"  --> Correct answer: {q['spravne']}")

print("\n" + "="*40)
print("Every answer was B - but relying on patterns in the answers is dangerous!")

## 8. Your challenge

Try the following experiments:

1. **Change the split ratio** - try 70/30 and 90/10 and watch how the results change
2. **Use a different model** - replace KNN with LogisticRegression
3. **Experiment with the number of folds** - try 3-fold and 10-fold cross-validation

In [None]:
# Your space for experimenting
# -----------------------------

# Tip 1: Change test_size
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Tip 2: Change the number of folds
# cv_scores = cross_val_score(model, X, y, cv=10)  # 10-fold

# Tip 3: Use a different model (LogisticRegression is already imported above)
# model = LogisticRegression(max_iter=1000)

print("Experiment with the code above!")