# 🏥 LightGBM - Winter Seasonality Focus (Vision 2026)
## Target MAE ~10 | Test on Quarter 4 (90 days)

This notebook builds in domain knowledge: the year-end peaks are driven by winter illnesses.
**Strategy:**
1. **`is_winter` feature**: a flag for the months of November, December, January, and February.
2. **Extended test set**: 90 days (October to December) to capture the full onset of the winter wave.
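
The two points above can be checked on a calendar alone. The following is a minimal sketch, assuming the data covers calendar year 2024 (as the CSV filename suggests); the index `idx` is a hypothetical stand-in, not the notebook's data.

```python
import pandas as pd

# Hypothetical daily index covering one calendar year (assumption: 2024).
idx = pd.date_range("2024-01-01", "2024-12-31", freq="D")

# Flag November through February, as in the strategy above.
is_winter = pd.Series(idx.month.isin([11, 12, 1, 2]).astype(int), index=idx)

# The last 90 days of the year form the hold-out window.
test_window = is_winter.iloc[-90:]
test_start = test_window.index[0]
winter_days_in_test = int(test_window.sum())

print(test_start.date(), winter_days_in_test)
```

Under that assumption the window starts on 3 October and 61 of its 90 days carry the flag, so the test set mixes pre-winter and winter days.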

In [None]:
import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
import plotly.express as px
import warnings
warnings.filterwarnings('ignore')

In [None]:
# 1. Loading and Seasonal Feature Engineering
df_adm = pd.read_csv('../data/raw/admissions_hopital_pitie_2024.csv')
df_adm['date_entree'] = pd.to_datetime(df_adm['date_entree'])
dd = df_adm.groupby('date_entree').size().rename('admissions').asfreq('D', fill_value=0)

def create_seasonal_features(df_ts):
    df = pd.DataFrame(index=df_ts.index)
    df['admissions'] = df_ts.values
    
    # Temporal & seasonal features
    df['day'] = df.index.dayofweek
    df['month'] = df.index.month
    # Winter illness flag (November to February)
    df['is_winter'] = df['month'].isin([11, 12, 1, 2]).astype(int)
    
    # Lags & dynamics
    for l in [1, 2, 7, 14]:
        df[f'lag_{l}'] = df['admissions'].shift(l)
    
    df['diff_1'] = df['lag_1'] - df['lag_2']
    
    # Rolling statistics (short and long context), shifted by one day to exclude the current value
    for w in [7, 28]:
        df[f'mean_{w}'] = df['admissions'].shift(1).rolling(window=w).mean()
        df[f'std_{w}'] = df['admissions'].shift(1).rolling(window=w).std()
    
    return df.dropna()

full_df = create_seasonal_features(dd)
X = full_df.drop('admissions', axis=1)
y = full_df['admissions']

# Test on the last 3 months (90 days)
test_days = 90
X_train, X_test = X.iloc[:-test_days], X.iloc[-test_days:]
y_train, y_test = y.iloc[:-test_days], y.iloc[-test_days:]

print(f"Split: Train {len(X_train)}d | Test {len(X_test)}d (Quarter 4)")
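
The rolling statistics in `create_seasonal_features` apply `shift(1)` before `rolling(...)`, which keeps the current day's admissions out of its own features. A small sketch on a toy series (illustrative values, not the hospital data) shows the difference between the shifted and unshifted versions:

```python
import numpy as np
import pandas as pd

# Toy series standing in for daily admissions (illustrative values only).
s = pd.Series(
    [10, 12, 14, 16, 18, 20],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
    dtype=float,
)

# Shift first: the window ending at day t covers days t-3 .. t-1 only.
safe_mean_3 = s.shift(1).rolling(window=3).mean()

# No shift: the window ending at day t includes day t itself (target leakage).
leaky_mean_3 = s.rolling(window=3).mean()

print(pd.DataFrame({"y": s, "safe": safe_mean_3, "leaky": leaky_mean_3}))
```

The shifted version also produces one extra leading `NaN`, which is why `dropna()` removes the first 28 rows when the longest window is 28 days.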

In [None]:
# 2. Targeted Optimization
param_dist = {
    'num_leaves': [31, 63, 127],
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [1000, 2000],
    'feature_fraction': [0.8, 0.9],
    'bagging_fraction': [0.8, 0.9],
    'bagging_freq': [5]
}

# TimeSeriesSplit with 4 folds for more robustness across the year
tscv = TimeSeriesSplit(n_splits=4)
rs = RandomizedSearchCV(
    lgb.LGBMRegressor(objective='regression_l1', random_state=42, verbose=-1),
    param_distributions=param_dist,
    n_iter=15,
    cv=tscv,
    scoring='neg_mean_absolute_error',
    n_jobs=-1,
    random_state=42  # make the sampled parameter combinations reproducible
)

print("Searching for the best winter model...")
rs.fit(X_train, y_train)
best_lgbm = rs.best_estimator_
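
To see which periods the search actually validates on, the fold boundaries of `TimeSeriesSplit(n_splits=4)` can be listed. This is a sketch only: `n_train = 248` is a hypothetical training length (roughly one year minus the 28 dropped rows and the 90 test days), and `X_dummy` is a placeholder array.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical training length; the real value depends on the data after dropna().
n_train = 248
X_dummy = np.zeros((n_train, 1))

tscv = TimeSeriesSplit(n_splits=4)
folds = list(tscv.split(X_dummy))

# Each fold trains on an expanding prefix and validates on the block right after it.
for i, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {i}: train 0..{train_idx[-1]} | validation {val_idx[0]}..{val_idx[-1]}")
```

If the training data ends in early October, every validation block falls before the winter months, so the cross-validated MAE may not reflect how the model behaves on the winter part of the test set.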

In [None]:
# 3. Final Evaluation over 90 days
preds = best_lgbm.predict(X_test)
mae = mean_absolute_error(y_test, preds)
print(f"\n❄️ FINAL MAE (Oct-Nov-Dec): {mae:.2f}")

import matplotlib.pyplot as plt
plt.figure(figsize=(15, 6))
plt.plot(y_test.index, y_test, label='Actual (Quarter 4)', color='#1a3a5f', alpha=0.6)
plt.plot(y_test.index, preds, label='Seasonal LightGBM', color='#c8102e', linestyle='--')
plt.title(f'Quarterly Performance: Winter Focus (MAE: {mae:.2f})')
plt.legend()
plt.grid(alpha=0.3)
plt.show()
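
Since the focus is on winter, the single 90-day MAE can also be split by the `is_winter` flag. The sketch below uses synthetic stand-ins for `y_test` and `preds` (random values, not results from this notebook) and assumes the test window runs from 3 October to 31 December 2024.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

# Synthetic stand-ins for y_test and preds (illustrative values, not real results).
idx = pd.date_range("2024-10-03", "2024-12-31", freq="D")
rng = np.random.default_rng(0)
y_true = pd.Series(rng.poisson(lam=100, size=len(idx)).astype(float), index=idx)
y_pred = y_true + rng.normal(loc=0.0, scale=10.0, size=len(idx))

# Absolute error per day, averaged separately for flagged and unflagged days.
abs_err = (y_true - y_pred).abs()
is_winter = pd.Series(idx.month.isin([11, 12, 1, 2]), index=idx)
mae_by_season = abs_err.groupby(is_winter).mean()

overall_mae = mean_absolute_error(y_true, y_pred)
print(mae_by_season)
print(f"overall MAE: {overall_mae:.2f}")
```

With the real `y_test` and `preds` in place of the synthetic series, the same grouping would show whether the error is concentrated in November and December.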