# AI Fundamentals - Lessons 28-30: Classification in Practice

## Contents:
1. **Data preprocessing for classification**
2. **The model training process**
3. **Testing and evaluating models**
4. **Image classification - CIFAR-10**
5. **Advanced techniques and optimization**
6. **A complete classification project**
7. **Interactive application**
8. **Homework**

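## 1. Data preprocessing for classification

### 1.1 Why preprocessing matters

Data quality is key to the success of a classification model. "Garbage in, garbage out": bad data leads to bad results. Missing values, outliers, features on very different scales, and unencoded categories each degrade a model in their own way, so this section handles them one at a time.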

In [None]:
# Import all required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder, OneHotEncoder
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.feature_selection import SelectKBest, f_classif, RFE
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
from sklearn.pipeline import Pipeline
import warnings
warnings.filterwarnings('ignore')

# For working with images
from PIL import Image
import cv2
from skimage import feature, transform

# Plot settings
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.unicode_minus'] = False

# For interactive applications
import gradio as gr

# For CIFAR-10
from sklearn.datasets import fetch_openml

### 1.2 Creating a synthetic dataset with various problems

In [None]:
# Create a dataset with common problems
def create_problematic_dataset():
    np.random.seed(42)
    n_samples = 1000
    
    # Generate the data
    data = {
        # Numeric features with different ranges
        'age': np.random.randint(18, 80, n_samples),
        'income': np.random.exponential(50000, n_samples),  # Heavily skewed distribution
        'score': np.random.uniform(0, 100, n_samples),
        
        # Categorical features
        'education': np.random.choice(['high_school', 'bachelor', 'master', 'phd'], n_samples),
        'city': np.random.choice(['Praha', 'Brno', 'Ostrava', 'Plzen', 'Other'], n_samples),
        
        # Binary feature
        'has_car': np.random.choice([0, 1], n_samples),
        
        # Feature with outliers
        'spending': np.concatenate([
            np.random.normal(5000, 1000, n_samples-20),
            np.random.normal(50000, 5000, 20)  # 20 outliers
        ])
    }
    
    # Add missing values (10% of the rows in each of three columns)
    for col in ['income', 'score', 'education']:
        missing_indices = np.random.choice(n_samples, size=int(0.1*n_samples), replace=False)
        if col in ['income', 'score']:
            data[col][missing_indices] = np.nan
        else:
            data[col] = list(data[col])
            for idx in missing_indices:
                data[col][idx] = np.nan
    
    # Target variable
    # Its probability depends on several features
    prob = (
        (data['age'] > 40).astype(float) * 0.3 +
        (np.array([1 if inc > 60000 else 0 for inc in data['income']]) * 0.4) +
        (data['has_car'] * 0.3)
    )
    data['target'] = (prob + np.random.normal(0, 0.2, n_samples) > 0.5).astype(int)
    
    return pd.DataFrame(data)

# Create the dataset
df = create_problematic_dataset()
print("Dataset created!")
print(f"Shape: {df.shape}")
print(f"\nFirst 5 rows:")
print(df.head())
print(f"\nData info:")
df.info()  # info() prints its report itself and returns None
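
To put numbers on the problems the generator plants, a small report can help before any cleaning starts. The sketch below uses a hypothetical helper, `summarize_problems`, which is not part of the lesson code; it assumes the common 1.5 x IQR rule for flagging outliers and runs on a tiny illustrative frame rather than on `df`.

```python
import numpy as np
import pandas as pd

def summarize_problems(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column share of missing values and IQR outlier count (hypothetical helper)."""
    rows = []
    for col in frame.columns:
        series = frame[col]
        n_outliers = 0
        if pd.api.types.is_numeric_dtype(series):
            q1, q3 = series.quantile(0.25), series.quantile(0.75)
            iqr = q3 - q1
            # Comparisons with NaN are False, so missing values are never counted
            outside = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)
            n_outliers = int(outside.sum())
        rows.append({
            'column': col,
            'missing_share': float(series.isna().mean()),
            'iqr_outliers': n_outliers,
        })
    return pd.DataFrame(rows).set_index('column')

# Tiny illustrative frame: one obvious outlier and one missing value
demo = pd.DataFrame({
    'spending': [10.0, 11.0, 12.0, 13.0, 14.0, 500.0],
    'score': [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
})
report = summarize_problems(demo)
print(report)
```

Calling `summarize_problems(df)` on the lesson dataset should work the same way, since non-numeric columns simply get an outlier count of zero.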

### 1.3 Exploratory data analysis (EDA)

In [None]:
# Comprehensive EDA
def exploratory_data_analysis(df):
    fig, axes = plt.subplots(3, 3, figsize=(18, 15))
    axes = axes.ravel()
    
    # 1. Target variable distribution
    df['target'].value_counts().plot(kind='bar', ax=axes[0], color=['skyblue', 'salmon'])
    axes[0].set_title('Target variable distribution', fontsize=14)
    axes[0].set_xlabel('Class')
    axes[0].set_ylabel('Count')
    
    # 2. Missing values
    missing_data = df.isnull().sum()
    missing_data = missing_data[missing_data > 0]
    if len(missing_data) > 0:
        missing_data.plot(kind='bar', ax=axes[1], color='orange')
        axes[1].set_title('Missing values', fontsize=14)
        axes[1].set_ylabel('Number missing')
    else:
        axes[1].text(0.5, 0.5, 'No missing values', 
                    ha='center', va='center', fontsize=14)
        axes[1].axis('off')
    
    # 3. Age distribution
    df['age'].hist(bins=30, ax=axes[2], color='green', alpha=0.7, edgecolor='black')
    axes[2].set_title('Age distribution', fontsize=14)
    axes[2].set_xlabel('Age')
    axes[2].set_ylabel('Frequency')
    
    # 4. Income distribution (log-scaled y-axis)
    income_clean = df['income'].dropna()
    axes[3].hist(income_clean, bins=50, color='purple', alpha=0.7, edgecolor='black')
    axes[3].set_title('Income distribution', fontsize=14)
    axes[3].set_xlabel('Income')
    axes[3].set_ylabel('Frequency')
    axes[3].set_yscale('log')
    
    # 5. Box plot for outlier detection
    numerical_cols = ['age', 'income', 'score', 'spending']
    df_clean = df[numerical_cols].dropna()
    df_clean.boxplot(ax=axes[4])
    axes[4].set_title('Box plot of numeric variables', fontsize=14)
    axes[4].tick_params(axis='x', rotation=45)
    
    # 6. Correlation matrix
    corr_matrix = df.select_dtypes(include=[np.number]).corr()
    sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm', 
               center=0, ax=axes[5], cbar_kws={'label': 'Correlation'})
    axes[5].set_title('Correlation matrix', fontsize=14)
    
    # 7. Categorical variable distribution - education
    education_counts = df['education'].value_counts()
    axes[6].pie(education_counts.values, labels=education_counts.index, 
               autopct='%1.1f%%', startangle=90)
    axes[6].set_title('Distribution by education', fontsize=14)
    
    # 8. Target vs Age
    for target in [0, 1]:
        subset = df[df['target'] == target]['age']
        axes[7].hist(subset, bins=20, alpha=0.5, label=f'Target {target}')
    axes[7].set_title('Age by target variable', fontsize=14)
    axes[7].set_xlabel('Age')
    axes[7].legend()
    
    # 9. Target vs Income
    df.boxplot(column='income', by='target', ax=axes[8])
    fig.suptitle('')  # drop the automatic "Boxplot grouped by target" figure title
    axes[8].set_title('Income by target variable', fontsize=14)
    axes[8].set_xlabel('Target')
    axes[8].set_ylabel('Income')
    
    plt.tight_layout()
    plt.show()
    
    # Statistical summary
    print("\nSTATISTICAL SUMMARY:")
    print("="*50)
    print(df.describe())
    
    print("\nTARGET VARIABLE DISTRIBUTION:")
    print(df['target'].value_counts(normalize=True))
    
    print("\nMISSING VALUES:")
    print(df.isnull().sum())

exploratory_data_analysis(df)
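
The income histogram is strongly right-skewed because the values come from an exponential distribution. One common remedy, which the lesson pipeline itself does not apply, is a log transform. The sketch below regenerates a comparable sample (the seed and the use of `np.random.default_rng` are illustrative assumptions) and compares the skewness before and after `np.log1p`.

```python
import numpy as np
import pandas as pd

# Stand-in for the lesson's 'income' column: exponential with mean 50,000
rng = np.random.default_rng(0)
income = pd.Series(rng.exponential(50000, 1000))

# log1p computes log(1 + x), which stays defined for zero incomes
income_log = np.log1p(income)

skew_raw = income.skew()
skew_log = income_log.skew()
print(f"Skewness before: {skew_raw:.2f}")
print(f"Skewness after log1p: {skew_log:.2f}")
```

For an exponential sample the transform tends to overshoot into a milder left skew, so it is worth checking the histogram again afterwards rather than assuming the result is symmetric.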

### 1.4 A complete preprocessing pipeline

In [None]:
# Data preprocessing, step by step
def preprocess_data(df, show_steps=True):
    """
    Complete data preprocessing, including visualization of the steps
    """
    df_processed = df.copy()
    
    if show_steps:
        fig, axes = plt.subplots(2, 3, figsize=(18, 12))
        axes = axes.ravel()
    
    # Step 1: Handle missing values
    print("STEP 1: Handling missing values")
    print("-" * 40)
    
    # Numeric variables - median imputation
    numeric_columns = df_processed.select_dtypes(include=[np.number]).columns
    numeric_columns = numeric_columns.drop('target')  # Exclude the target variable
    
    imputer_numeric = SimpleImputer(strategy='median')
    df_processed[numeric_columns] = imputer_numeric.fit_transform(df_processed[numeric_columns])
    
    # Categorical variables - impute with the most frequent value
    categorical_columns = df_processed.select_dtypes(include=['object']).columns
    
    for col in categorical_columns:
        mode_value = df_processed[col].mode()[0] if not df_processed[col].mode().empty else 'Unknown'
        # Assign back instead of calling fillna(inplace=True) on a column selection,
        # which newer pandas versions warn about and may not apply to the frame
        df_processed[col] = df_processed[col].fillna(mode_value)
    
    print(f"Missing values after imputation: {df_processed.isnull().sum().sum()}")
    
    # Step 2: Handle outliers
    print("\nSTEP 2: Outlier detection and handling")
    print("-" * 40)
    
    # IQR method for outlier detection
    for col in numeric_columns:
        Q1 = df_processed[col].quantile(0.25)
        Q3 = df_processed[col].quantile(0.75)
        IQR = Q3 - Q1
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        
        outliers = df_processed[(df_processed[col] < lower_bound) | 
                               (df_processed[col] > upper_bound)][col]
        
        if len(outliers) > 0:
            print(f"{col}: {len(outliers)} outliers detected")
            # Clip outliers to the bounds
            df_processed[col] = df_processed[col].clip(lower_bound, upper_bound)
    
    if show_steps:
        # Visualization before and after outlier handling
        df[numeric_columns].boxplot(ax=axes[0])
        axes[0].set_title('Before outlier handling', fontsize=12)
        axes[0].tick_params(axis='x', rotation=45)
        
        df_processed[numeric_columns].boxplot(ax=axes[1])
        axes[1].set_title('After outlier handling', fontsize=12)
        axes[1].tick_params(axis='x', rotation=45)
    
    # Step 3: Encode categorical variables
    print("\nSTEP 3: Encoding categorical variables")
    print("-" * 40)
    
    # One-hot encoding for nominal variables
    df_encoded = pd.get_dummies(df_processed, columns=['city'], prefix='city')
    
    # Ordinal encoding for ordinal variables
    education_mapping = {
        'high_school': 1,
        'bachelor': 2,
        'master': 3,
        'phd': 4
    }
    df_encoded['education_level'] = df_encoded['education'].map(education_mapping)
    df_encoded.drop('education', axis=1, inplace=True)
    
    print(f"Number of columns after encoding (including target): {len(df_encoded.columns)}")
    
    # Step 4: Scale numeric features
    # NOTE: the scaler here and the selector in step 5 are fit on all rows before
    # the split in step 6, so test-set statistics leak into training. That keeps
    # this walkthrough simple; in a real project fit them on the training data only.
    print("\nSTEP 4: Feature scaling")
    print("-" * 40)
    
    # Separate features and target
    X = df_encoded.drop('target', axis=1)
    y = df_encoded['target']
    
    # Numeric columns to scale
    numeric_features = ['age', 'income', 'score', 'spending', 'education_level']
    
    # StandardScaler (suits roughly normally distributed features)
    scaler_standard = StandardScaler()
    X_scaled = X.copy()
    X_scaled[numeric_features] = scaler_standard.fit_transform(X[numeric_features])
    
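    if show_steps:
        # Distributions before and after scaling.
        # Each column is drawn on the same Axes: pandas clears the whole figure
        # when DataFrame.hist is given a single Axes for several columns.
        for col in numeric_features[:3]:
            axes[2].hist(X[col], bins=20, alpha=0.5, label=col)
        axes[2].set_title('Before scaling', fontsize=12)
        axes[2].legend()
        
        for col in numeric_features[:3]:
            axes[3].hist(X_scaled[col], bins=20, alpha=0.5, label=col)
        axes[3].set_title('After scaling (Standard)', fontsize=12)
        axes[3].legend()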
    
    # Step 5: Feature selection
    print("\nSTEP 5: Feature selection")
    print("-" * 40)
    
    # Use SelectKBest
    selector = SelectKBest(score_func=f_classif, k=10)
    X_selected = selector.fit_transform(X_scaled, y)
    
    # Names of the selected features
    selected_features = X.columns[selector.get_support()]
    feature_scores = selector.scores_
    
    if show_steps:
        # Visualize the feature scores
        feature_importance = pd.DataFrame({
            'feature': X.columns,
            'score': feature_scores
        }).sort_values('score', ascending=False)
        
        feature_importance.head(10).plot(x='feature', y='score', kind='barh', ax=axes[4])
        axes[4].set_title('Top 10 features by F-score', fontsize=12)
        axes[4].set_xlabel('F-score')
    
    print(f"Selected features: {list(selected_features)}")
    
    # Step 6: Split into training and test data
    print("\nSTEP 6: Data split")
    print("-" * 40)
    print("-" * 40)
    
    X_train, X_test, y_train, y_test = train_test_split(
        X_selected, y, test_size=0.2, random_state=42, stratify=y
    )
    
    print(f"Training set: {X_train.shape}")
    print(f"Test set: {X_test.shape}")
    print(f"Class distribution in the training set:\n{y_train.value_counts(normalize=True)}")
    
    if show_steps:
        # Visualize the split (sorted by class so the bars match the tick labels)
        train_counts = y_train.value_counts().sort_index()
        test_counts = y_test.value_counts().sort_index()
        
        x = np.arange(2)
        width = 0.35
        
        axes[5].bar(x - width/2, train_counts.values, width, label='Train', color='blue', alpha=0.7)
        axes[5].bar(x + width/2, test_counts.values, width, label='Test', color='orange', alpha=0.7)
        axes[5].set_xlabel('Class')
        axes[5].set_ylabel('Number of samples')
        axes[5].set_title('Class distribution in train/test', fontsize=12)
        axes[5].set_xticks(x)
        axes[5].set_xticklabels(['0', '1'])
        axes[5].legend()
        
        plt.tight_layout()
        plt.show()
    
    return X_train, X_test, y_train, y_test, scaler_standard, selector

# Run the preprocessing
X_train, X_test, y_train, y_test, scaler, selector = preprocess_data(df)
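
The step-by-step function fits the imputer, scaler, and selector on the full dataset before splitting. A leak-free variant wraps the same kinds of steps in a scikit-learn `Pipeline` so that every statistic is learned from the training rows only. The following is a minimal sketch on a small stand-in frame; the column names mirror the lesson, but the data, the `toy` and `clf` names, and the choice of `LogisticRegression` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Small stand-in for the lesson's DataFrame (illustrative values only)
rng = np.random.default_rng(42)
n = 200
toy = pd.DataFrame({
    'age': rng.integers(18, 80, n).astype(float),
    'income': rng.exponential(50000, n),
    'city': rng.choice(['Praha', 'Brno', 'Ostrava'], n),
})
toy.loc[rng.choice(n, 20, replace=False), 'income'] = np.nan
toy['target'] = (toy['age'] > 40).astype(int)

numeric = ['age', 'income']
categorical = ['city']

# Each column group gets its own imputation and encoding/scaling steps
preprocess = ColumnTransformer([
    ('num', Pipeline([
        ('impute', SimpleImputer(strategy='median')),
        ('scale', StandardScaler()),
    ]), numeric),
    ('cat', Pipeline([
        ('impute', SimpleImputer(strategy='most_frequent')),
        ('onehot', OneHotEncoder(handle_unknown='ignore')),
    ]), categorical),
])

clf = Pipeline([
    ('preprocess', preprocess),
    ('model', LogisticRegression(max_iter=1000)),
])

# Split first, then fit: medians, means, and categories come from X_tr only
X_tr, X_te, y_tr, y_te = train_test_split(
    toy.drop(columns='target'), toy['target'],
    test_size=0.2, random_state=42, stratify=toy['target'],
)
clf.fit(X_tr, y_tr)
test_accuracy = clf.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Because the whole pipeline is a single estimator, it can also be passed to `cross_val_score` or `GridSearchCV`, which then refit the preprocessing inside every fold.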

## 2. The model training process

### 2.1 Selecting and comparing classifiers

In [None]:
# Train and compare several models
def train_and_compare_models(X_train, X_test, y_train, y_test):
    """
    Train several classifiers and compare them
    """
    # Model definitions
    models = {
        'Logistic Regression': LogisticRegression(random_state=42),
        'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
        'SVM': SVC(probability=True, random_state=42),
        'KNN': KNeighborsClassifier(n_neighbors=5),
        'Decision Tree': DecisionTreeClassifier(random_state=42),
        'Naive Bayes': GaussianNB()
    }
    
    results = {}
    
    # Visualization
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    axes = axes.ravel()
    
    print("="*70)
    print("MODEL TRAINING AND EVALUATION")
    print("="*70)
    
    for idx, (name, model) in enumerate(models.items()):
        print(f"\n{name}:")
        print("-" * len(name))
        
        # Training
        model.fit(X_train, y_train)
        
        # Prediction
        y_pred = model.predict(X_test)
        y_pred_proba = model.predict_proba(X_test)[:, 1] if hasattr(model, 'predict_proba') else None
        
        # Metrics
        accuracy = accuracy_score(y_test, y_pred)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)
        
        # Cross-validation
        cv_scores = cross_val_score(model, X_train, y_train, cv=5)
        
        results[name] = {
            'model': model,
            'accuracy': accuracy,
            'precision': precision,
            'recall': recall,
            'f1': f1,
            'cv_mean': cv_scores.mean(),
            'cv_std': cv_scores.std(),
            'predictions': y_pred,
            'probabilities': y_pred_proba
        }
        
        print(f"Accuracy: {accuracy:.3f}")
        print(f"Precision: {precision:.3f}")
        print(f"Recall: {recall:.3f}")
        print(f"F1-Score: {f1:.3f}")
        print(f"CV Score: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
        
        # Confusion matrix
        cm = confusion_matrix(y_test, y_pred)
        
        # Visualize the confusion matrix
        ax = axes[idx]
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                   xticklabels=['0', '1'], yticklabels=['0', '1'], ax=ax)
        ax.set_title(f'{name}\nAccuracy: {accuracy:.3f}', fontsize=12)
        ax.set_xlabel('Predicted')
        ax.set_ylabel('Actual')
    
    plt.tight_layout()
    plt.show()
    
    return results

# Train the models
model_results = train_and_compare_models(X_train, X_test, y_train, y_test)
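
Accuracy figures from the comparison are easier to read against a trivial baseline. The sketch below uses scikit-learn's `DummyClassifier`, which is not part of the lesson code, on illustrative labels with an assumed 70/30 class split; a real model should clearly beat these numbers.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score

# Illustrative imbalanced labels: 70 samples of class 0, 30 of class 1
y_demo = np.array([0] * 70 + [1] * 30)
X_demo = np.zeros((len(y_demo), 1))  # the dummy model ignores the features

# Always predict the most frequent class seen during fit
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_demo, y_demo)
y_base = baseline.predict(X_demo)

baseline_accuracy = accuracy_score(y_demo, y_base)
baseline_f1 = f1_score(y_demo, y_base, zero_division=0)
print(f"Baseline accuracy: {baseline_accuracy:.2f}")
print(f"Baseline F1: {baseline_f1:.2f}")
```

The baseline reaches 70% accuracy without learning anything, while its F1 for the positive class is zero, which is one reason the lesson reports precision, recall, and F1 alongside accuracy.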

### 2.2 Hyperparameter tuning

In [None]:
# Grid search for hyperparameter optimization
def hyperparameter_tuning(X_train, y_train):
    """
    Hyperparameter optimization with grid search
    """
    print("="*70)
    print("HYPERPARAMETER TUNING - RANDOM FOREST")
    print("="*70)
    
    # Parameter grid for the search
    param_grid = {
        'n_estimators': [50, 100, 200],
        'max_depth': [None, 10, 20, 30],
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4]
    }
    
    # Create the model
    rf = RandomForestClassifier(random_state=42)
    
    # Grid Search
    grid_search = GridSearchCV(
        estimator=rf,
        param_grid=param_grid,
        cv=5,
        scoring='f1',
        n_jobs=-1,
        verbose=1
    )
    
    print("Running grid search...")
    grid_search.fit(X_train, y_train)
    
    # Results
    print(f"\nBest parameters: {grid_search.best_params_}")
    print(f"Best F1 score: {grid_search.best_score_:.3f}")
    
    # Visualize the grid search results
    results_df = pd.DataFrame(grid_search.cv_results_)
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
    
    # Plot 1: Effect of the number of trees
    n_estimators_scores = results_df.groupby('param_n_estimators')['mean_test_score'].mean()
    ax1.plot(n_estimators_scores.index, n_estimators_scores.values, 'bo-', linewidth=2, markersize=8)
    ax1.set_xlabel('Number of trees')
    ax1.set_ylabel('Mean F1 score')
    ax1.set_title('Effect of the number of trees on performance', fontsize=14)
    ax1.grid(True, alpha=0.3)
    
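    # Plot 2: Heatmap for two parameters
    # max_depth=None would be dropped from the pivot as a missing key, so label it as text
    results_df['param_max_depth'] = results_df['param_max_depth'].astype(str)
    pivot_table = results_df.pivot_table(
        values='mean_test_score',
        index='param_max_depth',
        columns='param_min_samples_split'
    )
    
    sns.heatmap(pivot_table, annot=True, fmt='.3f', cmap='YlOrRd', ax=ax2)
    ax2.set_title('F1 score for parameter combinations', fontsize=14)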
    ax2.set_xlabel('min_samples_split')
    ax2.set_ylabel('max_depth')
    
    plt.tight_layout()
    plt.show()
    
    return grid_search.best_estimator_

# Optimize the hyperparameters
best_model = hyperparameter_tuning(X_train, y_train)
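
The cost of an exhaustive grid grows multiplicatively with every parameter added. The sketch below counts the combinations in the grid used above with `ParameterGrid` and then shows `RandomizedSearchCV` as a cheaper alternative that samples a fixed number of them. The toy data from `make_classification`, `n_iter=5`, and `cv=3` are illustrative assumptions, not settings from the lesson.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid, RandomizedSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

# 3 * 4 * 3 * 3 combinations, each evaluated on 5 folds
n_combinations = len(ParameterGrid(param_grid))
n_fits = n_combinations * 5
print(f"{n_combinations} combinations, {n_fits} cross-validation fits")

# Randomized search evaluates only n_iter sampled combinations
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_grid,
    n_iter=5, cv=3, scoring='f1', random_state=42,
)
search.fit(X_demo, y_demo)
n_tried = len(search.cv_results_['params'])
print(f"Randomized search tried {n_tried} combinations")
print(f"Best parameters found: {search.best_params_}")
```

With the default `refit=True`, both search classes add one final fit of the best configuration on the full training data on top of the cross-validation fits counted here.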

## 3. Testing and evaluating models

### 3.1 Detailed evaluation of the best model

In [None]:
# Full model evaluation
def comprehensive_model_evaluation(model, X_test, y_test, model_name="Model"):
    """
    Detailed model evaluation with several metrics and visualizations
    """
    # Predictions
    y_pred = model.predict(X_test)
    y_pred_proba = model.predict_proba(X_test)[:, 1]
    
    # Create the subplots
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    axes = axes.ravel()
    
    # 1. Confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
               xticklabels=['Negative', 'Positive'],
               yticklabels=['Negative', 'Positive'],
               ax=axes[0])
    axes[0].set_title(f'Confusion matrix - {model_name}', fontsize=14)
    axes[0].set_xlabel('Predicted')
    axes[0].set_ylabel('Actual')
    
    # 2. ROC curve
    fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
    roc_auc = auc(fpr, tpr)
    
    axes[1].plot(fpr, tpr, color='darkorange', lw=2, 
                label=f'ROC curve (AUC = {roc_auc:.2f})')
    axes[1].plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random classifier')
    axes[1].set_xlim([0.0, 1.0])
    axes[1].set_ylim([0.0, 1.05])
    axes[1].set_xlabel('False Positive Rate')
    axes[1].set_ylabel('True Positive Rate')
    axes[1].set_title('ROC curve', fontsize=14)
    axes[1].legend(loc="lower right")
    axes[1].grid(True, alpha=0.3)
    
    # 3. Precision-recall curve
    from sklearn.metrics import precision_recall_curve
    precision_curve, recall_curve, _ = precision_recall_curve(y_test, y_pred_proba)
    
    axes[2].plot(recall_curve, precision_curve, color='green', lw=2)
    axes[2].set_xlabel('Recall')
    axes[2].set_ylabel('Precision')
    axes[2].set_title('Precision-recall curve', fontsize=14)
    axes[2].grid(True, alpha=0.3)
    axes[2].fill_between(recall_curve, precision_curve, alpha=0.2, color='green')
    
    # 4. Distribution of predicted probabilities
    axes[3].hist(y_pred_proba[y_test == 0], bins=30, alpha=0.5, 
                label='Negative class', color='blue', density=True)
    axes[3].hist(y_pred_proba[y_test == 1], bins=30, alpha=0.5, 
                label='Positive class', color='red', density=True)
    axes[3].axvline(x=0.5, color='black', linestyle='--', label='Threshold')
    axes[3].set_xlabel('Predicted probability')
    axes[3].set_ylabel('Density')
    axes[3].set_title('Distribution of predicted probabilities', fontsize=14)
    axes[3].legend()
    
    # 5. Feature importance (if the model supports it)
    if hasattr(model, 'feature_importances_'):
        importances = model.feature_importances_
        n_top = min(10, len(importances))  # guard against fewer than 10 features
        indices = np.argsort(importances)[::-1][:n_top]
        
        axes[4].bar(range(n_top), importances[indices], color='purple', alpha=0.7)
        axes[4].set_xlabel('Feature index')
        axes[4].set_ylabel('Importance')
        axes[4].set_title(f'Top {n_top} most important features', fontsize=14)
        axes[4].set_xticks(range(n_top))
        axes[4].set_xticklabels([f'F{i}' for i in indices], rotation=45)
    else:
        axes[4].text(0.5, 0.5, 'Feature importance\nnot available', 
                    ha='center', va='center', fontsize=14)
        axes[4].axis('off')
    
    # 6. Metrics by threshold
    thresholds_to_test = np.linspace(0.1, 0.9, 9)
    metrics_by_threshold = []
    
    for threshold in thresholds_to_test:
        y_pred_threshold = (y_pred_proba >= threshold).astype(int)
        metrics_by_threshold.append({
            'threshold': threshold,
            'precision': precision_score(y_test, y_pred_threshold, zero_division=0),
            'recall': recall_score(y_test, y_pred_threshold),
            'f1': f1_score(y_test, y_pred_threshold)
        })
    
    metrics_df = pd.DataFrame(metrics_by_threshold)
    
    axes[5].plot(metrics_df['threshold'], metrics_df['precision'], 'b-', label='Precision', linewidth=2)
    axes[5].plot(metrics_df['threshold'], metrics_df['recall'], 'r-', label='Recall', linewidth=2)
    axes[5].plot(metrics_df['threshold'], metrics_df['f1'], 'g-', label='F1-Score', linewidth=2)
    axes[5].set_xlabel('Threshold')
    axes[5].set_ylabel('Metric value')
    axes[5].set_title('Metrics by decision threshold', fontsize=14)
    axes[5].legend()
    axes[5].grid(True, alpha=0.3)
    axes[5].axvline(x=0.5, color='black', linestyle='--', alpha=0.5)
    
    plt.tight_layout()
    plt.show()
    
    # Classification report
    print("\nCLASSIFICATION REPORT:")
    print("=" * 50)
    print(classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))
    
    # Additional metrics
    print("\nADDITIONAL METRICS:")
    print("=" * 50)
    print(f"AUC-ROC: {roc_auc:.3f}")
    print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
    
    # Error analysis
    false_positives = ((y_pred == 1) & (y_test == 0)).sum()
    false_negatives = ((y_pred == 0) & (y_test == 1)).sum()
    
    print(f"\nFalse Positives: {false_positives}")
    print(f"False Negatives: {false_negatives}")

# Evaluate the best model
comprehensive_model_evaluation(best_model, X_test, y_test, "Optimized Random Forest")
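
The precision, recall, and F1 values in the report all follow from the four cells of the confusion matrix. As a worked example with made-up counts (50 true negatives, 10 false positives, 5 false negatives, 35 true positives), the sketch below computes them by hand and checks the result against scikit-learn.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Illustrative labels that produce TN=50, FP=10, FN=5, TP=35
y_true = np.array([0] * 60 + [1] * 40)
y_hat = np.array([0] * 50 + [1] * 10 + [0] * 5 + [1] * 35)

# For binary labels, ravel() yields the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(f"sklearn:  {precision_score(y_true, y_hat):.3f} "
      f"{recall_score(y_true, y_hat):.3f} {f1_score(y_true, y_hat):.3f}")
```

Raising the decision threshold usually trades recall for precision, which is the effect the threshold plot in the evaluation function is meant to show.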

## 4. Image classification - CIFAR-10

### 4.1 Loading and preparing the CIFAR-10 dataset

In [None]:
# Create a simplified CIFAR-10-style dataset for sklearn
def prepare_cifar10_subset():
    """
    Prepare a small CIFAR-10-style subset for demonstration
    """
    print("Preparing the CIFAR-10 subset...")
    
    # For demonstration we generate synthetic data resembling CIFAR-10
    # In a real application you would use the actual dataset
    
    n_samples_per_class = 100
    n_classes = 10
    image_size = 32
    n_channels = 3
    
    # CIFAR-10 classes
    classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 
               'dog', 'frog', 'horse', 'ship', 'truck']
    
    # Generate synthetic data (in practice you would load real images)
    X_images = []
    y_labels = []
    
    np.random.seed(42)
    
    for class_idx in range(n_classes):
        for _ in range(n_samples_per_class):
            # Simulate an image with different characteristics for each class
            if class_idx in [0, 8]:  # airplane, ship - more blue (sky/water)
                image = np.random.normal(loc=[0.3, 0.5, 0.8], scale=0.2, 
                                       size=(image_size, image_size, n_channels))
            elif class_idx in [2, 6]:  # bird, frog - more green
                image = np.random.normal(loc=[0.4, 0.7, 0.3], scale=0.2, 
                                       size=(image_size, image_size, n_channels))
            elif class_idx in [3, 4, 5, 7]:  # cat, deer, dog, horse - brown tones
                image = np.random.normal(loc=[0.6, 0.4, 0.3], scale=0.2, 
                                       size=(image_size, image_size, n_channels))
            else:  # automobile, truck - gray tones
                image = np.random.normal(loc=[0.5, 0.5, 0.5], scale=0.2, 
                                       size=(image_size, image_size, n_channels))
            
            # Add some patterns
            if class_idx % 2 == 0:
                # Horizontal stripes
                for i in range(0, image_size, 4):
                    image[i:i+2, :, :] *= 1.2
            else:
                # Vertical stripes
                for j in range(0, image_size, 4):
                    image[:, j:j+2, :] *= 1.2
            
            # Clip values to the range [0, 1]
            image = np.clip(image, 0, 1)
            
            X_images.append(image)
            y_labels.append(class_idx)
    
    X_images = np.array(X_images)
    y_labels = np.array(y_labels)
    
    # Visualize sample images
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    axes = axes.ravel()
    
    for i in range(10):
        # First image of each class
        idx = np.where(y_labels == i)[0][0]
        axes[i].imshow(X_images[idx])
        axes[i].set_title(f'{classes[i]}', fontsize=12)
        axes[i].axis('off')
    
    plt.suptitle('Sample images from each class', fontsize=16)
    plt.tight_layout()
    plt.show()
    
    return X_images, y_labels, classes

# Prepare the data
X_cifar, y_cifar, class_names = prepare_cifar10_subset()
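
When the real dataset replaces the synthetic one, the images usually arrive as flat rows rather than as 32x32x3 arrays. The sketch below assumes the layout of the original CIFAR-10 Python release: 3072 values per image, stored channel by channel (1024 red, then 1024 green, then 1024 blue) with pixel values from 0 to 255. `flat_to_image` is a hypothetical helper, and the layout should be checked against whichever copy of the data is actually loaded.

```python
import numpy as np

def flat_to_image(row: np.ndarray) -> np.ndarray:
    """Convert a flat channel-first row (3072 values, 0-255) to a 32x32x3 image in [0, 1]."""
    row = np.asarray(row, dtype=np.float64)
    if row.size != 3 * 32 * 32:
        raise ValueError(f"expected 3072 values, got {row.size}")
    # (channel, height, width) -> (height, width, channel), as imshow expects
    return row.reshape(3, 32, 32).transpose(1, 2, 0) / 255.0

# Synthetic row: red channel all 255, green all 0, blue all 128
row = np.concatenate([np.full(1024, 255), np.zeros(1024), np.full(1024, 128)])
image = flat_to_image(row)
print(image.shape)
print(image[0, 0])
```

Images converted this way have the same shape and value range as the synthetic `X_cifar` arrays, so the feature extraction code in this section can consume them unchanged.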

### 4.2 Feature extraction for images

In [None]:
# Extract features from images
def extract_image_features(images, method='histogram'):
    """
    Extract features from images for use with sklearn
    """
    features = []
    
    for img in images:
        if method == 'histogram':
            # Color histogram
            hist_features = []
            for channel in range(3):
                hist, _ = np.histogram(img[:, :, channel], bins=16, range=(0, 1))
                hist_features.extend(hist)
            features.append(hist_features)
            
        elif method == 'statistics':
            # Statistical features
            stat_features = []
            for channel in range(3):
                channel_data = img[:, :, channel]
                stat_features.extend([
                    np.mean(channel_data),
                    np.std(channel_data),
                    np.min(channel_data),
                    np.max(channel_data),
                    np.median(channel_data)
                ])
            features.append(stat_features)
            
        elif method == 'combined':
            # Combination of several feature types
            combined_features = []
            
            # Color histograms
            for channel in range(3):
                hist, _ = np.histogram(img[:, :, channel], bins=8, range=(0, 1))
                combined_features.extend(hist)
            
            # Statistics
            for channel in range(3):
                channel_data = img[:, :, channel]
                combined_features.extend([
                    np.mean(channel_data),
                    np.std(channel_data)
                ])
            
            # Edge detection features
            gray = np.mean(img, axis=2)
            edges = np.abs(np.diff(gray, axis=0)).mean() + np.abs(np.diff(gray, axis=1)).mean()
            combined_features.append(edges)
            
            features.append(combined_features)
    
    return np.array(features)

# Extract features with different methods
print("Extracting features from images...")

# Compare the methods
feature_methods = ['histogram', 'statistics', 'combined']
feature_results = {}

for method in feature_methods:
    X_features = extract_image_features(X_cifar, method=method)
    print(f"\nMethod '{method}': {X_features.shape[1]} features")
    
    # Split the data
    X_train_img, X_test_img, y_train_img, y_test_img = train_test_split(
        X_features, y_cifar, test_size=0.3, random_state=42, stratify=y_cifar
    )
    
    # Standardization
    scaler = StandardScaler()
    X_train_img_scaled = scaler.fit_transform(X_train_img)
    X_test_img_scaled = scaler.transform(X_test_img)
    
    feature_results[method] = {
        'X_train': X_train_img_scaled,
        'X_test': X_test_img_scaled,
        'y_train': y_train_img,
        'y_test': y_test_img,
        'scaler': scaler
    }
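
The feature counts printed for the three methods follow directly from the bin and statistic counts in `extract_image_features`. The sketch below works through that arithmetic and rebuilds the 16-bin color histogram for a single stand-in image; the random image and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))  # stand-in image with values in [0, 1)

# Same construction as the 'histogram' method: 16 bins per color channel
hist_features = []
for channel in range(3):
    hist, _ = np.histogram(img[:, :, channel], bins=16, range=(0, 1))
    hist_features.extend(hist)
hist_features = np.array(hist_features)

expected_lengths = {
    'histogram': 3 * 16,            # 16 bins per channel
    'statistics': 3 * 5,            # mean, std, min, max, median per channel
    'combined': 3 * 8 + 3 * 2 + 1,  # 8-bin histograms, mean and std, one edge value
}
channel_totals = hist_features.reshape(3, 16).sum(axis=1)

print(expected_lengths)
print(channel_totals)
```

Each channel's bins add up to the 1024 pixels of a 32x32 image. Because these are raw counts, they scale with image size; dividing by the pixel count would be one way to make histograms comparable across images of different sizes.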

### 4.3 Image classification

In [None]:
# Classify images with several models
def image_classification_comparison():
    """
    Compare several classifiers on the image data
    """
    # Use the combined features
    data = feature_results['combined']
    X_train = data['X_train']
    X_test = data['X_test']
    y_train = data['y_train']
    y_test = data['y_test']
    
    # Model definitions
    models = {
        'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
        'SVM': SVC(probability=True, random_state=42),
        'KNN': KNeighborsClassifier(n_neighbors=5),
        'Logistic Regression': LogisticRegression(max_iter=1000, random_state=42)
    }
    
    results = {}
    
    # Create the subplots
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    axes = axes.ravel()
    
    for idx, (name, model) in enumerate(models.items()):
        print(f"\nTraining {name}...")
        
        # Training
        model.fit(X_train, y_train)
        
        # Prediction
        y_pred = model.predict(X_test)
        
        # Metrics
        accuracy = accuracy_score(y_test, y_pred)
        
        results[name] = {
            'model': model,
            'accuracy': accuracy,
            'predictions': y_pred
        }
        
        print(f"Accuracy: {accuracy:.3f}")
        
        # Confusion matrix
        cm = confusion_matrix(y_test, y_pred)
        
        # Visualization
        ax = axes[idx]
        im = ax.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
        ax.figure.colorbar(im, ax=ax)
        ax.set(xticks=np.arange(cm.shape[1]),
               yticks=np.arange(cm.shape[0]),
               xticklabels=range(10),
               yticklabels=range(10),
               title=f'{name}\nAccuracy: {accuracy:.3f}',
               ylabel='True class',
               xlabel='Predicted class')
        
        # Rotate the tick labels
        plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
        
        # Add text to the cells
        thresh = cm.max() / 2.
        for i in range(cm.shape[0]):
            for j in range(cm.shape[1]):
                ax.text(j, i, format(cm[i, j], 'd'),
                       ha="center", va="center",
                       color="white" if cm[i, j] > thresh else "black",
                       fontsize=8)
    
    plt.tight_layout()
    plt.show()
    
    # Overall comparison
    print("\n" + "="*50)
    print("OVERALL MODEL COMPARISON")
    print("="*50)
    
    model_names = list(results.keys())
    accuracies = [results[name]['accuracy'] for name in model_names]
    
    plt.figure(figsize=(10, 6))
    bars = plt.bar(model_names, accuracies, color=['blue', 'green', 'red', 'orange'])
    plt.ylabel('Accuracy')
    plt.title('Model accuracy on image classification', fontsize=14)
    plt.ylim(0, 1)
    
    # Add the values above the bars
    for bar, acc in zip(bars, accuracies):
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                f'{acc:.3f}', ha='center', va='bottom')
    
    plt.grid(True, alpha=0.3, axis='y')
    plt.tight_layout()
    plt.show()
    
    return results

# Run the classification
image_classification_results = image_classification_comparison()

### 4.4 Visualizing misclassified images

In [None]:
# Classification error analysis
def analyze_misclassified_images():
    """
    Visualize and analyze misclassified images.
    """
    # Use the best model
    best_model_name = max(image_classification_results, 
                         key=lambda x: image_classification_results[x]['accuracy'])
    best_result = image_classification_results[best_model_name]
    
    y_pred = best_result['predictions']
    y_test = feature_results['combined']['y_test']
    
    # Find the misclassified samples
    misclassified_idx = np.where(y_pred != y_test)[0]
    
    print(f"Number of misclassified images: {len(misclassified_idx)} of {len(y_test)}")
    
    # Visualize the first 12 errors
    n_show = min(12, len(misclassified_idx))
    
    if n_show > 0:
        fig, axes = plt.subplots(3, 4, figsize=(15, 12))
        axes = axes.ravel()
        
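        # Indices in the original dataset. This assumes the test set is the
        # final 30% of X_cifar in its original order (an unshuffled split).
        # The slice bound must be an int; a float bound raises TypeError.
        test_indices = np.arange(len(X_cifar))[int(len(X_cifar) * 0.7):]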
        
        for i in range(n_show):
            idx = misclassified_idx[i]
            original_idx = test_indices[idx]
            
            axes[i].imshow(X_cifar[original_idx])
            axes[i].set_title(f'Actual: {class_names[y_test[idx]]}\n' +
                             f'Predicted: {class_names[y_pred[idx]]}',
                             fontsize=10)
            axes[i].axis('off')
        
        # Hide the unused subplots
        for i in range(n_show, 12):
            axes[i].axis('off')
        
        plt.suptitle(f'Misclassified images - {best_model_name}', fontsize=16)
        plt.tight_layout()
        plt.show()
    
    # Confusion matrix for analyzing frequent errors
    cm = confusion_matrix(y_test, y_pred)
    
    # Most frequent confusions
    print("\nMost frequent confusions:")
    print("="*40)
    
    # Copy of the confusion matrix with the diagonal zeroed out
    cm_no_diag = cm.copy()
    np.fill_diagonal(cm_no_diag, 0)
    
    # Report the 5 largest off-diagonal entries
    for _ in range(5):
        max_idx = np.unravel_index(cm_no_diag.argmax(), cm_no_diag.shape)
        if cm_no_diag[max_idx] > 0:
            print(f"{class_names[max_idx[0]]} → {class_names[max_idx[1]]}: "
                  f"{cm_no_diag[max_idx]} cases")
            cm_no_diag[max_idx] = 0

analyze_misclassified_images()
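
The index lookup in this cell assumes the test set is the last 30% of `X_cifar` in its original order. If the split was shuffled (the default for `train_test_split`), one hedged alternative is to pass an index array through the split alongside the data, so every test row carries its original position. The sketch below uses small synthetic stand-ins; `X_demo` and `y_demo` are hypothetical names, not the notebook's arrays.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the image array and its labels
rng = np.random.default_rng(42)
X_demo = rng.random((20, 4))
y_demo = np.arange(20) % 2
indices = np.arange(len(X_demo))

# Passing the index array as a third argument splits it the same way as the data
X_tr, X_te, y_tr, y_te, idx_tr, idx_te = train_test_split(
    X_demo, y_demo, indices, test_size=0.3, random_state=42, stratify=y_demo
)

# idx_te[k] is the row of X_demo that became test sample k
recovered = X_demo[idx_te]
print(np.array_equal(recovered, X_te))
```

With indices tracked this way, `X_cifar[idx_te[idx]]` would show the right image for a misclassified test sample regardless of how the split was made.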

## 5. Interactive classification app

### 5.1 Gradio app for classification

In [None]:
# Build the interactive application
def create_classification_app():
    """
    Interactive application for classifying data.
    """
    # Model setup
    # We use our best model from the previous steps
    
    def classify_tabular_data(age, income, score, education, has_car, spending):
        """
        Classify a single row of tabular data.
        """
        # Build a one-row DataFrame
        input_data = pd.DataFrame({
            'age': [age],
            'income': [income],
            'score': [score],
            'education': [education],
            'city': ['Praha'],  # Default value
            'has_car': [int(has_car)],
            'spending': [spending],
            'target': [0]  # Dummy value
        })
        
        # Apply the same preprocessing as in training
        # One-hot encoding
        input_encoded = pd.get_dummies(input_data, columns=['city'], prefix='city')
        
        # Ordinal encoding for education
        education_mapping = {
            'high_school': 1,
            'bachelor': 2,
            'master': 3,
            'phd': 4
        }
        input_encoded['education_level'] = input_encoded['education'].map(education_mapping)
        input_encoded.drop(['education', 'target'], axis=1, inplace=True)
        
        # Ensure the expected column order
        expected_columns = ['age', 'income', 'score', 'has_car', 'spending', 
                           'education_level', 'city_Brno', 'city_Ostrava', 
                           'city_Other', 'city_Plzen', 'city_Praha']
        
        # Add any missing columns
        for col in expected_columns:
            if col not in input_encoded.columns:
                input_encoded[col] = 0
        
        input_encoded = input_encoded[expected_columns]
        
        # Scaling
        numeric_features = ['age', 'income', 'score', 'spending', 'education_level']
        input_encoded[numeric_features] = scaler.transform(input_encoded[numeric_features])
        
        # Feature selection
        input_selected = selector.transform(input_encoded)
        
        # Prediction
        prediction = best_model.predict(input_selected)[0]
        probabilities = best_model.predict_proba(input_selected)[0]
        
        # Create the figure
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
        
        # Bar chart of class probabilities
        classes = ['Class 0', 'Class 1']
        colors = ['blue', 'red']
        bars = ax1.bar(classes, probabilities, color=colors, alpha=0.7)
        ax1.set_ylim(0, 1)
        ax1.set_ylabel('Probability')
        ax1.set_title('Class probabilities', fontsize=14)
        
        # Add the values above the bars
        for bar, prob in zip(bars, probabilities):
            height = bar.get_height()
            ax1.text(bar.get_x() + bar.get_width()/2., height + 0.01,
                    f'{prob:.3f}', ha='center', va='bottom')
        
        # Prediction confidence
        confidence = max(probabilities)
        ax2.pie([confidence, 1-confidence], labels=['Confidence', 'Uncertainty'],
               colors=['green', 'lightgray'], autopct='%1.1f%%', startangle=90)
        ax2.set_title('Prediction confidence', fontsize=14)
        
        plt.tight_layout()
        
        # Text output
        result_text = f"""## Classification results

**Predicted class:** {prediction}
**Confidence:** {confidence:.1%}

### Probabilities:
- Class 0: {probabilities[0]:.3f}
- Class 1: {probabilities[1]:.3f}

### Interpretation:
"""
        
        if confidence > 0.8:
            result_text += "The model is very confident in its prediction."
        elif confidence > 0.6:
            result_text += "The model is fairly confident, but some uncertainty remains."
        else:
            result_text += "The model is not very confident; the prediction is borderline."
        
        return fig, result_text
    
    # Build the Gradio interface
    with gr.Blocks(title="Classifier") as demo:
        gr.Markdown("# 🤖 Interactive classifier")
        gr.Markdown("""Enter the values to classify. The model predicts the class and shows its confidence in the prediction.""")
        
        with gr.Row():
            with gr.Column():
                age_input = gr.Slider(minimum=18, maximum=80, value=35, 
                                     label="Age", step=1)
                income_input = gr.Number(value=50000, label="Income")
                score_input = gr.Slider(minimum=0, maximum=100, value=75, 
                                       label="Score")
                education_input = gr.Dropdown(
                    choices=['high_school', 'bachelor', 'master', 'phd'],
                    value='bachelor',
                    label="Education"
                )
                has_car_input = gr.Checkbox(value=True, label="Owns a car")
                spending_input = gr.Number(value=5000, label="Spending")
                
                classify_btn = gr.Button("🔍 Classify", variant="primary")
            
            with gr.Column():
                output_text = gr.Markdown("### Results will appear here...")
        
        output_plot = gr.Plot(label="Visualization")
        
        classify_btn.click(
            classify_tabular_data,
            inputs=[age_input, income_input, score_input, education_input, 
                   has_car_input, spending_input],
            outputs=[output_plot, output_text]
        )
        
        gr.Markdown("""### 📊 About the app
        
This app demonstrates the classification process in practice:
1. **Data preprocessing** - scaling and encoding
2. **Feature selection** - choosing the important features
3. **Prediction** - using the trained model
4. **Interpretation** - showing the probabilities and confidence
        """)
    
    return demo

# Launch the application (share=True requests a public Gradio link)
app = create_classification_app()
app.launch(share=True)
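
A one-row input only produces the dummy column for the city it contains, which is why `classify_tabular_data` adds the missing columns in a loop. A possibly more compact variant is `DataFrame.reindex` with `fill_value=0`, which adds absent columns and enforces the column order in one step. The sketch below is self-contained; its `expected_columns` is a shortened, illustrative list, not the notebook's full one.

```python
import pandas as pd

# One-row input, similar to what the app receives
row = pd.DataFrame({'age': [35], 'city': ['Praha']})
encoded = pd.get_dummies(row, columns=['city'], prefix='city')

# Shortened, illustrative column list (the training columns are assumed known)
expected_columns = ['age', 'city_Brno', 'city_Ostrava', 'city_Praha']

# reindex adds the missing dummy columns as 0 and fixes the column order
aligned = encoded.reindex(columns=expected_columns, fill_value=0).astype(int)
print(aligned.to_dict('records'))
```

Note that `reindex` also silently drops any column that is not in the expected list, so an unseen city would end up as all zeros rather than raising an error.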

## 6. Summary and key concepts

### What we learned:

1. **Data preprocessing**
   - Handling missing values
   - Detecting and handling outliers
   - Encoding categorical variables
   - Feature scaling
   - Feature selection

2. **Model training**
   - Choosing a suitable classifier
   - Cross-validation
   - Hyperparameter tuning
   - Ensemble methods

3. **Model evaluation**
   - Confusion matrix
   - ROC curve and AUC
   - Precision, Recall, F1-Score
   - Error analysis

4. **Image classification**
   - Feature extraction
   - Working with high-dimensional data
   - Transfer learning (concept)
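
As a reminder of how the evaluation metrics relate to the confusion matrix, here is a small worked sketch. The matrix values are made up for illustration; rows are true classes and columns are predicted classes, matching the convention of scikit-learn's `confusion_matrix`.

```python
import numpy as np

# Illustrative 2x2 confusion matrix: rows = true class, columns = predicted class
cm = np.array([[50, 10],
               [5, 35]])

tn, fp = cm[0]
fn, tp = cm[1]

precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / cm.sum()

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} accuracy={accuracy:.3f}")
# prints: precision=0.778 recall=0.875 f1=0.824 accuracy=0.850
```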

### Best practices:

- **Always start with EDA** - understand your data
- **Don't skip preprocessing** - data quality is key
- **Use cross-validation** - for robust estimates
- **Compare several models** - none is universally best
- **Interpret the results** - high accuracy alone is not enough
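
Two of these practices, cross-validation and comparing several models, combine naturally. The sketch below scores two classifiers with 5-fold cross-validation on a synthetic dataset from `make_classification` (a stand-in, not the notebook's data). Each classifier is wrapped in a `Pipeline` so that the scaler is fitted only on the training folds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in dataset
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

candidates = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'KNN': KNeighborsClassifier(n_neighbors=5),
}

cv_scores = {}
for name, clf in candidates.items():
    # Scaling lives inside the pipeline, so it is refit on each training fold
    pipe = Pipeline([('scale', StandardScaler()), ('clf', clf)])
    scores = cross_val_score(pipe, X, y, cv=5, scoring='accuracy')
    cv_scores[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the spread across folds next to the mean helps judge whether a difference between two models is meaningful.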

## 7. Homework

### Task 1: Advanced preprocessing
Implement:
- PCA for dimensionality reduction
- SMOTE for class balancing
- Feature engineering - creating new features
- Compare the results with the original approach
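
As a possible starting point for the PCA part, the sketch below standardizes a synthetic dataset and keeps enough components to explain 95% of the variance. The dataset and the 0.95 threshold are illustrative assumptions. Passing a float between 0 and 1 as `n_components` asks scikit-learn's `PCA` to pick the number of components for that variance fraction.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with redundant (linearly dependent) features
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           n_redundant=10, random_state=0)

# PCA is scale-sensitive, so standardize first
reducer = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=0.95)),  # keep 95% of the variance (illustrative)
])
X_reduced = reducer.fit_transform(X)

pca = reducer.named_steps['pca']
print(f"{X.shape[1]} features -> {pca.n_components_} components")
print(f"explained variance: {pca.explained_variance_ratio_.sum():.3f}")
```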

### Task 2: Ensemble methods
Build:
- A voting classifier combining different models
- Stacking with a meta-learner
- Compare with the individual models
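
A hedged starting point for this task: scikit-learn provides `VotingClassifier` and `StackingClassifier` in `sklearn.ensemble`. The sketch below builds both from the same base models and scores them on a held-out split of a synthetic dataset. The choice of base models, and of logistic regression as the meta-learner, is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

def base_models():
    # Fresh estimator instances for each ensemble
    return [
        ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
        ('knn', KNeighborsClassifier(n_neighbors=5)),
        ('lr', LogisticRegression(max_iter=1000)),
    ]

ensembles = {
    # Soft voting averages the predicted class probabilities
    'voting': VotingClassifier(estimators=base_models(), voting='soft'),
    # Stacking trains a meta-learner on cross-validated base predictions
    'stacking': StackingClassifier(estimators=base_models(),
                                   final_estimator=LogisticRegression(),
                                   cv=5),
}

ensemble_scores = {}
for name, model in ensembles.items():
    model.fit(X_train, y_train)
    ensemble_scores[name] = model.score(X_test, y_test)
    print(f"{name}: {ensemble_scores[name]:.3f}")
```

For the comparison part of the task, the individual base models can be fitted and scored on the same split.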

### Task 3: Real-world dataset
Download a real dataset (e.g. from Kaggle):
- Perform a complete analysis
- Build a preprocessing pipeline
- Train and evaluate a model
- Prepare a presentation of the results
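
For the preprocessing pipeline, one common pattern is a `ColumnTransformer` that handles numeric and categorical columns separately, chained with a classifier in a single `Pipeline`. The tiny DataFrame and its column names below are invented for illustration; a real dataset will need its own column lists.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented toy data with a missing value and a categorical column
df = pd.DataFrame({
    'age': [25, 40, np.nan, 33, 51, 29],
    'income': [30000, 52000, 41000, 38000, 75000, 45000],
    'city': ['Praha', 'Brno', 'Praha', 'Ostrava', 'Brno', 'Praha'],
    'target': [0, 1, 0, 0, 1, 1],
})
X, y = df.drop(columns='target'), df['target']

# Numeric columns: fill gaps with the median, then standardize
numeric = Pipeline([('impute', SimpleImputer(strategy='median')),
                    ('scale', StandardScaler())])
# Categorical columns: one-hot encode, ignoring categories unseen in training
categorical = OneHotEncoder(handle_unknown='ignore')

preprocess = ColumnTransformer([
    ('num', numeric, ['age', 'income']),
    ('cat', categorical, ['city']),
])

model = Pipeline([('preprocess', preprocess),
                  ('clf', LogisticRegression(max_iter=1000))])
model.fit(X, y)

# An unseen city is encoded as all zeros instead of raising an error
new_row = pd.DataFrame({'age': [37], 'income': [48000], 'city': ['Plzen']})
pred = model.predict(new_row)
print(pred)
```

Keeping all preprocessing inside the fitted pipeline means new inputs only need a single `predict` call with raw column values.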

### Bonus task: AutoML
Try an AutoML tool:
- Use e.g. AutoSklearn or TPOT
- Compare with the manual approach
- Analyze what AutoML selected

---

💡 **Tip**: Document your process! Good documentation is just as important as good code.