---
## 1. Theoretical Introduction

### 1.1 Hyperparameters vs Parameters

| Type | Description | Examples |
|------|-------------|----------|
| **Parameters** | Learned during training | Split features and thresholds, leaf values |
| **Hyperparameters** | Set before training | n_estimators, max_depth |

### 1.2 Grid Search

Grid search systematically evaluates every combination of hyperparameter values:

$$\text{Total combinations} = \prod_{i=1}^{n} |H_i|$$

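where $|H_i|$ is the number of candidate values for the $i$-th hyperparameter and $n$ is the number of tuned hyperparameters. With $k$-fold cross-validation the search trains $k \cdot \prod_{i=1}^{n} |H_i|$ models, plus one final refit of the best combination on all the data (`refit=True`, the default in `GridSearchCV`).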
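As a quick check of the formula, the reduced RF Regressor grid from Section 5 can be counted both as a product and by explicit enumeration. A minimal sketch using only the standard library; `GridSearchCV` enumerates the same Cartesian product internally (via `ParameterGrid`).

```python
from itertools import product
from math import prod

# Reduced RF Regressor grid (same values as rf_reg_param_grid in Section 5)
grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 15, 20],
    'min_samples_split': [5, 10],
    'min_samples_leaf': [2, 4],
}
n_splits = 5  # number of CV folds

# Product of |H_i| over all hyperparameters
n_combinations = prod(len(values) for values in grid.values())

# Explicit enumeration: each combination is one dict of hyperparameters
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
assert len(combinations) == n_combinations

# Every combination is trained once per fold
n_fits = n_combinations * n_splits
print(n_combinations, n_fits)  # 24 120
```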

### 1.3 TimeSeriesSplit

For financial time series we **must** use `TimeSeriesSplit` instead of standard K-Fold CV. Every test fold lies strictly after its training data, and the training window expands with each fold:

```
Fold 1: [Train: ████████] [Test: ██]
Fold 2: [Train: ██████████] [Test: ██]
Fold 3: [Train: ████████████] [Test: ██]
Fold 4: [Train: ██████████████] [Test: ██]
Fold 5: [Train: ████████████████] [Test: ██]
```

**Why?**
- It preserves chronological order
- It prevents look-ahead leakage: the model never trains on data from after the period it is tested on
- It simulates real use (we train on the past and predict the future)
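The expanding-window scheme in the diagram can be sketched in plain Python. This is an illustrative re-implementation of the default `TimeSeriesSplit` behaviour (no `gap`, no `max_train_size`, test size `n_samples // (n_splits + 1)`), not a replacement for the scikit-learn class; the function name `expanding_window_splits` is hypothetical.

```python
def expanding_window_splits(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) lists; mirrors TimeSeriesSplit defaults (gap=0)."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("Need at least n_splits + 1 samples")
    # The first test fold starts after the initial training block
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_idx = list(range(0, test_start))          # everything before the test fold
        test_idx = list(range(test_start, test_start + test_size))
        yield train_idx, test_idx


for train_idx, test_idx in expanding_window_splits(12, n_splits=5):
    print(train_idx, test_idx)  # first line printed: [0, 1] [2, 3]
```

Every training index precedes every test index, which is exactly the property that a shuffled K-Fold split would break.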

---
## 2. Environment Setup

In [None]:
# Installation (for Colab)
!pip install pandas numpy scikit-learn joblib matplotlib seaborn tqdm -q

print("✓ Libraries installed")

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from tqdm.notebook import tqdm
import warnings
import os
import joblib
import json

# Scikit-learn
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier, GradientBoostingClassifier
from sklearn.multioutput import MultiOutputRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import (
    TimeSeriesSplit, GridSearchCV, RandomizedSearchCV, cross_val_score
)
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, r2_score,
    accuracy_score, f1_score, make_scorer
)

warnings.filterwarnings('ignore')
np.random.seed(42)

print("✓ Libraries loaded")

In [None]:
# Mount Google Drive (Colab only)
try:
    from google.colab import drive
    drive.mount('/content/drive')
    DRIVE_PATH = '/content/drive/MyDrive/MachineLearning'
    RUNNING_ON_COLAB = True
    print(f"✓ Google Drive mounted: {DRIVE_PATH}")
except ImportError:
    # google.colab is not available outside Colab: fall back to the current directory
    DRIVE_PATH = '.'
    RUNNING_ON_COLAB = False
    print("ℹ️ Local environment")

# Paths
DATA_PATH = f"{DRIVE_PATH}/data"
MODEL_PATH = f"{DRIVE_PATH}/models"
os.makedirs(MODEL_PATH, exist_ok=True)

---
## 3. Loading the Data

In [None]:
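# Load the complete dataset
complete_path = f"{DATA_PATH}/complete/all_sectors_complete_10y.csv"
df = pd.read_csv(complete_path, parse_dates=['date'])

# TimeSeriesSplit splits by row position, so the rows must be in chronological
# order (sort by date first; ticker only breaks ties within a date)
df = df.sort_values(['date', 'ticker']).reset_index(drop=True)

print(f"📈 Dataset loaded: {len(df):,} rows")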

In [None]:
# Feature definitions
FEATURE_COLS = [
    'open', 'high', 'low', 'close', 'volume',
    'returns', 'volatility_12m', 'rsi_14',
    'macd', 'macd_signal', 'macd_hist',
    'sma_3', 'sma_6', 'sma_12',
    'ema_3', 'ema_6', 'ema_12',
    'volume_change', 'price_momentum'
]

FUNDAMENTAL_COLS = [
    'PE', 'PB', 'PS', 'EV_EBITDA',
    'ROE', 'ROA', 'Profit_Margin',
    'Debt_to_Equity', 'Current_Ratio',
    'Revenue_Growth_YoY', 'Earnings_Growth_YoY'
]

# Features actually present in the dataset
available_features = [f for f in FEATURE_COLS if f in df.columns]
available_fundamentals = [f for f in FUNDAMENTAL_COLS if f in df.columns]

print(f"📊 Features: {len(available_features)}")
print(f"📊 Fundamentals: {len(available_fundamentals)}")

In [None]:
# Data preparation for tuning

# 1. Fundamental Predictor (RF Regressor): technical features → fundamental ratios
regressor_df = df.dropna(subset=available_features + available_fundamentals)
X_reg = regressor_df[available_features].values
y_reg = regressor_df[available_fundamentals].values

print(f"📊 Regressor data: {X_reg.shape[0]:,} samples, {X_reg.shape[1]} features → {y_reg.shape[1]} targets")

In [None]:
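# 2. Price Classifier
THRESHOLD = 0.03  # ±3 % next-period return separates the three classes

# Target: next-period return per ticker
# (groupby().shift(-1) assumes rows within each ticker are in chronological order)
classifier_df = df.copy()
classifier_df['future_return'] = classifier_df.groupby('ticker')['close'].shift(-1) / classifier_df['close'] - 1

# The last row of each ticker has no next price. np.select would silently label
# its NaN return as class 1, so drop those rows before labelling.
classifier_df = classifier_df.dropna(subset=['future_return'])

# 0 = down (< -3 %), 1 = neutral, 2 = up (> +3 %)
conditions = [
    classifier_df['future_return'] < -THRESHOLD,
    classifier_df['future_return'] > THRESHOLD,
]
choices = [0, 2]
classifier_df['target'] = np.select(conditions, choices, default=1)

# Drop rows with missing features
all_features = available_features + available_fundamentals
classifier_df = classifier_df.dropna(subset=all_features)

X_clf = classifier_df[all_features].values
y_clf = classifier_df['target'].values.astype(int)

print(f"📊 Classifier data: {X_clf.shape[0]:,} samples, {X_clf.shape[1]} features → 3 classes")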

---
## 4. TimeSeriesSplit Configuration

In [None]:
# TimeSeriesSplit configuration
N_SPLITS = 5

tscv = TimeSeriesSplit(n_splits=N_SPLITS)

# Visualize the splits (█ = train, ▓ = test; one character ≈ 2 % of the rows)
print(f"📊 TimeSeriesSplit with {N_SPLITS} folds")
print("="*60)

for i, (train_idx, test_idx) in enumerate(tscv.split(X_clf)):
    train_pct = len(train_idx) / len(X_clf) * 100
    test_pct = len(test_idx) / len(X_clf) * 100
    
    train_bar = '█' * int(train_pct / 2)
    test_bar = '▓' * int(test_pct / 2)
    
    print(f"Fold {i+1}: {train_bar}{test_bar} (Train: {len(train_idx):,} | Test: {len(test_idx):,})")
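
`TimeSeriesSplit` splits by row position, so it assumes the rows are ordered in time. With several tickers per date, a fold boundary can still fall inside a single date, so some rows of that date end up in training and others in testing. A minimal sketch that applies the same expanding-window scheme to unique dates instead of rows; `date_grouped_splits` is a hypothetical helper, not a scikit-learn API. Because `GridSearchCV` also accepts an iterable of `(train, test)` index splits as `cv`, its output could be passed as `cv=list(date_grouped_splits(classifier_df['date'].tolist(), N_SPLITS))`.

```python
def date_grouped_splits(row_dates, n_splits=5):
    """Expanding-window CV over unique dates; yields (train_idx, test_idx) row lists."""
    unique_dates = sorted(set(row_dates))
    n_dates = len(unique_dates)
    test_size = n_dates // (n_splits + 1)
    if test_size == 0:
        raise ValueError("Need at least n_splits + 1 distinct dates")
    first_test_start = n_dates - n_splits * test_size
    for start in range(first_test_start, n_dates, test_size):
        train_dates = set(unique_dates[:start])
        test_dates = set(unique_dates[start:start + test_size])
        # All rows of one date always land on the same side of the split
        train_idx = [i for i, d in enumerate(row_dates) if d in train_dates]
        test_idx = [i for i, d in enumerate(row_dates) if d in test_dates]
        yield train_idx, test_idx


# Two hypothetical tickers observed on the same 12 month-end dates
row_dates = [f"2020-{month:02d}" for month in range(1, 13) for _ in range(2)]
splits = list(date_grouped_splits(row_dates, n_splits=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```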

---
## 5. Grid Search for the RF Regressor (Fundamental Predictor)

### 5.1 Parameter Space

| Parameter | Candidate values | Used in the grid below | Description |
|-----------|------------------|------------------------|-------------|
| `n_estimators` | [100, 200, 300] | [100, 200] | Number of trees |
| `max_depth` | [10, 15, 20, None] | [10, 15, 20] | Maximum tree depth (`None` = unlimited) |
| `min_samples_split` | [2, 5, 10] | [5, 10] | Min. samples required to split a node |
| `min_samples_leaf` | [1, 2, 4] | [2, 4] | Min. samples in a leaf |

In [None]:
# Parameter space for the RF Regressor (reduced to save time)
rf_reg_param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 15, 20],
    'min_samples_split': [5, 10],
    'min_samples_leaf': [2, 4]
}

# Number of combinations
n_combinations = np.prod([len(v) for v in rf_reg_param_grid.values()])
print(f"🔍 Grid Search for RF Regressor")
print(f"   Parameters: {len(rf_reg_param_grid)}")
print(f"   Combinations: {n_combinations}")
print(f"   Total fits: {n_combinations * N_SPLITS}")

In [None]:
%%time

# To save time we tune for a single target (P/E).
# In practice every fundamental target would be tuned; note that
# RandomForestRegressor also accepts a 2-D y (native multi-output).

pe_idx = available_fundamentals.index('PE') if 'PE' in available_fundamentals else 0
y_reg_pe = y_reg[:, pe_idx]

# Standardization. Trees are insensitive to per-feature scaling, so this essentially
# does not change the splits; however, the tuned model will expect standardized
# inputs, so the fitted scaler must be kept alongside it.
scaler_reg = StandardScaler()
X_reg_scaled = scaler_reg.fit_transform(X_reg)

# Grid search
print("🚀 Starting Grid Search for RF Regressor (P/E target)...")

rf_reg = RandomForestRegressor(random_state=42, n_jobs=-1)

grid_search_reg = GridSearchCV(
    estimator=rf_reg,
    param_grid=rf_reg_param_grid,
    cv=tscv,
    scoring='neg_mean_absolute_error',  # scikit-learn maximizes, so MAE is negated
    n_jobs=-1,
    verbose=1
)

grid_search_reg.fit(X_reg_scaled, y_reg_pe)

print("\n✅ Grid Search finished!")
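
An alternative to standardizing once before the search is to put the scaler inside a `Pipeline`: it is then refitted on each training fold only and saved together with the model as one object. A minimal sketch on synthetic data (the data and grid values are assumptions for illustration). Parameters of a pipeline step are addressed as `<step>__<param>`, so the `cv_results_` columns become e.g. `param_rf__max_depth`, and the plotting code in Section 8 would need the same prefix.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: the target depends mainly on the first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

pipe = Pipeline([
    ('scaler', StandardScaler()),                     # fitted on each training fold only
    ('rf', RandomForestRegressor(random_state=42)),
])

# '<step>__<param>' addresses a parameter of a pipeline step
param_grid = {'rf__n_estimators': [10, 20], 'rf__max_depth': [3, None]}

search = GridSearchCV(pipe, param_grid, cv=TimeSeriesSplit(n_splits=3),
                      scoring='neg_mean_absolute_error')
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```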

In [None]:
# Grid Search results for the Regressor
print("📊 RESULTS - RF REGRESSOR")
print("="*60)
print(f"\n🏆 Best parameters:")
for param, value in grid_search_reg.best_params_.items():
    print(f"   {param}: {value}")

# best_score_ is the negated MAE, hence the minus sign
print(f"\n📊 Best MAE: {-grid_search_reg.best_score_:.4f}")

# Top 5 combinations
results_reg = pd.DataFrame(grid_search_reg.cv_results_)
results_reg = results_reg.sort_values('rank_test_score')

print(f"\n📊 Top 5 combinations:")
print("-"*60)
for _, row in results_reg.head(5).iterrows():
    print(f"   Rank {row['rank_test_score']}: MAE={-row['mean_test_score']:.4f} ± {row['std_test_score']:.4f}")
    print(f"      params: {row['params']}")

---
## 6. Grid Search for the RF Classifier (Price Classifier)

In [None]:
# Parameter space for the RF Classifier
rf_clf_param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [8, 12, 16],
    'min_samples_split': [5, 10],
    'min_samples_leaf': [2, 5],
    'class_weight': ['balanced', None]
}

n_combinations_clf = np.prod([len(v) for v in rf_clf_param_grid.values()])
print(f"🔍 Grid Search for RF Classifier")
print(f"   Combinations: {n_combinations_clf}")
print(f"   Total fits: {n_combinations_clf * N_SPLITS}")

In [None]:
%%time

# Standardization (no real effect on tree splits, but the tuned model
# will expect standardized inputs, so the fitted scaler must be kept)
scaler_clf = StandardScaler()
X_clf_scaled = scaler_clf.fit_transform(X_clf)

# Grid search
print("🚀 Starting Grid Search for RF Classifier...")

rf_clf = RandomForestClassifier(random_state=42, n_jobs=-1)

grid_search_clf = GridSearchCV(
    estimator=rf_clf,
    param_grid=rf_clf_param_grid,
    cv=tscv,
    scoring='f1_weighted',
    n_jobs=-1,
    verbose=1
)

grid_search_clf.fit(X_clf_scaled, y_clf)

print("\n✅ Grid Search finished!")
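
`scoring='f1_weighted'` computes F1 for each class separately and averages the scores weighted by each class's support (its number of true samples). A small, hand-checkable sketch of that computation in plain Python; the function name `weighted_f1` is hypothetical, and in the notebook `sklearn.metrics.f1_score(y_true, y_pred, average='weighted')` gives the same quantity. Because of the support weighting, a dominant neutral class (1) can lift the score even when the up/down classes are predicted poorly; how much depends on the class balance of the data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (undefined precision counts as 0)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total  # weight = class share of the true labels
    return score


# Class 0: F1 = 1 (weight 1/6); class 1: F1 = 2/3 (weight 3/6); class 2: F1 = 1/2 (weight 2/6)
y_true = [0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 2, 2, 1]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.6667
```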

In [None]:
# Grid Search results for the Classifier
print("📊 RESULTS - RF CLASSIFIER")
print("="*60)
print(f"\n🏆 Best parameters:")
for param, value in grid_search_clf.best_params_.items():
    print(f"   {param}: {value}")

print(f"\n📊 Best F1-score: {grid_search_clf.best_score_:.4f}")

# Top 5 combinations
results_clf = pd.DataFrame(grid_search_clf.cv_results_)
results_clf = results_clf.sort_values('rank_test_score')

print(f"\n📊 Top 5 combinations:")
print("-"*60)
for _, row in results_clf.head(5).iterrows():
    print(f"   Rank {row['rank_test_score']}: F1={row['mean_test_score']:.4f} ± {row['std_test_score']:.4f}")
    print(f"      params: {row['params']}")

---
## 7. Comparison with Gradient Boosting

In [None]:
# Compare RF vs Gradient Boosting for classification

gb_param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.05, 0.1],
    'min_samples_split': [5, 10]
}

print("🚀 Grid Search for Gradient Boosting Classifier...")

gb_clf = GradientBoostingClassifier(random_state=42)

# Smaller sample for speed (GradientBoostingClassifier builds its trees
# sequentially and has no n_jobs option)
sample_size = min(5000, len(X_clf_scaled))
sample_idx = np.random.choice(len(X_clf_scaled), sample_size, replace=False)
sample_idx = np.sort(sample_idx)  # keep chronological order

X_sample = X_clf_scaled[sample_idx]
y_sample = y_clf[sample_idx]

In [None]:
%%time

grid_search_gb = GridSearchCV(
    estimator=gb_clf,
    param_grid=gb_param_grid,
    cv=TimeSeriesSplit(n_splits=3),
    scoring='f1_weighted',
    n_jobs=-1,
    verbose=1
)

grid_search_gb.fit(X_sample, y_sample)

print("\n✅ GB Grid Search finished!")

In [None]:
# Compare RF vs GB
# Caveat: the scores are not strictly comparable. RF was tuned on the full data
# with 5 folds, GB on a sample of at most 5,000 rows with 3 folds.
print("📊 ALGORITHM COMPARISON")
print("="*60)
print(f"\n{'Algorithm':<25} {'Best F1':>15}   Best params")
print("-"*60)
print(f"{'Random Forest':<25} {grid_search_clf.best_score_:>15.4f}   {grid_search_clf.best_params_}")
print(f"{'Gradient Boosting':<25} {grid_search_gb.best_score_:>15.4f}   {grid_search_gb.best_params_}")

# Winner (by mean CV F1)
if grid_search_clf.best_score_ > grid_search_gb.best_score_:
    print(f"\n🏆 Winner: Random Forest")
else:
    print(f"\n🏆 Winner: Gradient Boosting")
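
To compare the two algorithms on equal terms, both can be scored on identical folds of identical data, for example with `cross_val_score`. A minimal sketch on synthetic data (the data generation and the fixed `n_estimators=50` are assumptions for illustration); in the notebook one would instead pass `grid_search_clf.best_estimator_` and `grid_search_gb.best_estimator_` with the same `X`, `y` and `TimeSeriesSplit`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic 3-class problem: class depends on a noisy first feature
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = np.digitize(X[:, 0] + rng.normal(scale=0.5, size=300), [-0.5, 0.5])  # 0, 1, 2

cv = TimeSeriesSplit(n_splits=3)  # the SAME folds for both models
rf = RandomForestClassifier(n_estimators=50, random_state=42)
gb = GradientBoostingClassifier(n_estimators=50, random_state=42)

rf_scores = cross_val_score(rf, X, y, cv=cv, scoring='f1_weighted')
gb_scores = cross_val_score(gb, X, y, cv=cv, scoring='f1_weighted')

# Paired, fold-by-fold difference: shows whether one model wins consistently
diff = rf_scores - gb_scores
print(rf_scores.mean(), gb_scores.mean(), diff.mean(), diff.std())
```

Looking at the per-fold differences, rather than only at the two means, shows whether one model wins consistently or only on average.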

---
## 8. Visualizing the Results

In [None]:
# Visualize the Grid Search results
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. RF Regressor: effect of n_estimators (MAE averaged over the other parameters)
ax1 = axes[0, 0]
for depth in rf_reg_param_grid['max_depth']:
    mask = results_reg['param_max_depth'] == depth
    subset = results_reg[mask].groupby('param_n_estimators')['mean_test_score'].mean()
    ax1.plot(subset.index, -subset.values, marker='o', label=f'depth={depth}')
ax1.set_xlabel('n_estimators')
ax1.set_ylabel('MAE')
ax1.set_title('RF Regressor: MAE vs n_estimators', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# 2. RF Classifier: effect of n_estimators (F1 averaged over the other parameters)
ax2 = axes[0, 1]
for depth in rf_clf_param_grid['max_depth']:
    mask = results_clf['param_max_depth'] == depth
    subset = results_clf[mask].groupby('param_n_estimators')['mean_test_score'].mean()
    ax2.plot(subset.index, subset.values, marker='o', label=f'depth={depth}')
ax2.set_xlabel('n_estimators')
ax2.set_ylabel('F1-Score')
ax2.set_title('RF Classifier: F1 vs n_estimators', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

# 3. Heatmap: RF Classifier (max_depth × n_estimators)
ax3 = axes[1, 0]
pivot = results_clf.pivot_table(
    values='mean_test_score',
    index='param_max_depth',
    columns='param_n_estimators',
    aggfunc='mean'
)
sns.heatmap(pivot, annot=True, fmt='.3f', cmap='YlGnBu', ax=ax3)
ax3.set_title('RF Classifier: F1 Heatmap', fontweight='bold')

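# 4. Distribution of mean CV scores across all combinations.
# MAE and F1 live on different scales, so each gets its own y-axis.
ax4 = axes[1, 1]
ax4.boxplot([-results_reg['mean_test_score']], positions=[1], manage_ticks=False)
ax4.set_ylabel('MAE (Regressor)')
ax4_f1 = ax4.twinx()
ax4_f1.boxplot([results_clf['mean_test_score']], positions=[2], manage_ticks=False)
ax4_f1.set_ylabel('F1-weighted (Classifier)')
ax4.set_xticks([1, 2])
ax4.set_xticklabels(['Regressor (MAE)', 'Classifier (F1)'])
ax4.set_xlim(0.5, 2.5)
ax4.set_title('Distribution of CV Scores', fontweight='bold')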

plt.tight_layout()
plt.savefig(f"{DATA_PATH}/hyperparameter_tuning.png", dpi=150, bbox_inches='tight')
plt.show()

print(f"\n💾 Plot saved: {DATA_PATH}/hyperparameter_tuning.png")

---
## 9. Saving the Optimal Parameters

In [None]:
# Save the optimal parameters
optimal_params = {
    'fundamental_predictor': {
        'best_params': grid_search_reg.best_params_,
        'best_score': float(-grid_search_reg.best_score_),
        'metric': 'MAE'
    },
    'price_classifier': {
        'best_params': grid_search_clf.best_params_,
        'best_score': float(grid_search_clf.best_score_),
        'metric': 'F1-weighted'
    },
    'gradient_boosting': {
        'best_params': grid_search_gb.best_params_,
        'best_score': float(grid_search_gb.best_score_),
        'metric': 'F1-weighted'
    },
    'cv_config': {
        'method': 'TimeSeriesSplit',
        'n_splits': N_SPLITS
    },
    'created': datetime.now().isoformat()
}

params_path = f"{MODEL_PATH}/optimal_hyperparameters.json"
with open(params_path, 'w') as f:
    json.dump(optimal_params, f, indent=2, default=str)

print(f"💾 Optimal parameters saved: {params_path}")
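
A later notebook can reload the JSON file and rebuild an estimator from it. A minimal sketch of the round trip, using hypothetical placeholder values (not tuning results) in the same layout as `optimal_params`; a temporary file stands in for `optimal_hyperparameters.json`. `None` survives as JSON `null`, but `default=str` silently turns any other non-JSON value (e.g. a NumPy integer) into a string, so checking the types after loading is worthwhile.

```python
import json
import os
import tempfile

# Hypothetical placeholder values, NOT results from this notebook
example_params = {
    'price_classifier': {
        'best_params': {'n_estimators': 200, 'max_depth': 12, 'class_weight': None},
        'best_score': 0.5,
        'metric': 'F1-weighted',
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'optimal_hyperparameters.json')
    with open(path, 'w') as f:
        json.dump(example_params, f, indent=2, default=str)
    with open(path) as f:
        loaded = json.load(f)

loaded_best = loaded['price_classifier']['best_params']
print(loaded_best)  # {'n_estimators': 200, 'max_depth': 12, 'class_weight': None}
# The estimator could then be rebuilt, e.g.:
# RandomForestClassifier(random_state=42, **loaded_best)
```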

In [None]:
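# Save the tuned models. They were trained on standardized features, so the
# fitted scalers are saved too and must be applied before predicting.
best_reg_path = f"{MODEL_PATH}/fundamental_predictor_tuned.pkl"
best_clf_path = f"{MODEL_PATH}/price_classifier_tuned.pkl"
scaler_reg_path = f"{MODEL_PATH}/fundamental_predictor_scaler.pkl"
scaler_clf_path = f"{MODEL_PATH}/price_classifier_scaler.pkl"

joblib.dump(grid_search_reg.best_estimator_, best_reg_path)
joblib.dump(grid_search_clf.best_estimator_, best_clf_path)
joblib.dump(scaler_reg, scaler_reg_path)
joblib.dump(scaler_clf, scaler_clf_path)

print(f"💾 Tuned Regressor: {best_reg_path}")
print(f"💾 Tuned Classifier: {best_clf_path}")
print(f"💾 Scalers: {scaler_reg_path}, {scaler_clf_path}")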

---
## 10. Summary

### ✅ Completed:

| Task | Status |
|------|--------|
| Grid Search for RF Regressor | ✅ |
| Grid Search for RF Classifier | ✅ |
| Comparison with Gradient Boosting | ✅ |
| Visualization of results | ✅ |
| Saving the optimal parameters | ✅ |

### 📁 Created files:

| File | Description |
|------|-------------|
| `models/optimal_hyperparameters.json` | Best parameters |
| `models/fundamental_predictor_tuned.pkl` | Tuned regressor |
| `models/price_classifier_tuned.pkl` | Tuned classifier |
| `data/hyperparameter_tuning.png` | Grid Search plots |

### ➡️ Next notebook:

**Notebook 06: Final Evaluation**
- Complete evaluation of all models
- Result visualizations for the thesis

In [None]:
# Final summary
print("="*70)
print("📊 NOTEBOOK 05 - HYPERPARAMETER TUNING SUMMARY")
print("="*70)

print(f"\n🌲 RF Regressor (Fundamental Predictor):")
print(f"   Best MAE: {-grid_search_reg.best_score_:.4f}")
print(f"   Best params: {grid_search_reg.best_params_}")

print(f"\n🌲 RF Classifier (Price Classifier):")
print(f"   Best F1: {grid_search_clf.best_score_:.4f}")
print(f"   Best params: {grid_search_clf.best_params_}")

print(f"\n🚀 Gradient Boosting Classifier:")
print(f"   Best F1: {grid_search_gb.best_score_:.4f}")
print(f"   Best params: {grid_search_gb.best_params_}")

print(f"\n✅ Hyperparameter tuning finished!")