# 5. Modeling and Experiments

## 🎯 Stage goals
1. Train and compare several ML algorithms
2. Tune hyperparameters for the best models
3. Select the best model for churn prediction
4. Analyze feature importance

## 📊 Data for modeling
- Train: 12,802 observations, 23 features (after SMOTE)
- Test: 2,001 observations, 23 features (original distribution)
- Class balance: 50/50 in train, 80/20 in test
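
Because the test set keeps the original class distribution, accuracy on its own is a weak yardstick here. The sketch below takes the test-set supports from the classification reports in this notebook (1,593 rows of class 0 and 408 rows of class 1) and computes the class shares and the accuracy of a trivial classifier that always predicts class 0. The helper names `class_shares` and `majority_baseline_accuracy` are illustrative and not part of the project code.

```python
def class_shares(counts):
    """Return each class's share of the total, keyed by class label."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline_accuracy(counts):
    """Accuracy of a classifier that always predicts the most frequent class."""
    return max(counts.values()) / sum(counts.values())


# Test-set supports copied from the classification reports (labels 0 and 1).
test_counts = {0: 1593, 1: 408}

shares = class_shares(test_counts)
baseline = majority_baseline_accuracy(test_counts)

print(f"class shares: {shares[0]:.3f} / {shares[1]:.3f}")  # 0.796 / 0.204
print(f"always-predict-0 accuracy: {baseline:.3f}")        # 0.796
```

LogisticRegression and KNeighborsClassifier both report 0.74 accuracy, which is below this baseline. That is one reason to compare models on ROC-AUC and F1 rather than on accuracy.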

In [1]:
import os
import sys

import joblib
import pandas as pd

# Make the project's src/ directory importable from the notebooks/ folder
# without changing the working directory back and forth.
sys.path.insert(0, os.path.abspath('../src'))

from model_training import TrainModels
from hyperparametr_config import MODEL_PARAMS_CONFIG
from hyperparametr_tuner import HyperparametrTuner

# Load the preprocessed splits (train is SMOTE-balanced, test keeps the original distribution).
X_train = joblib.load('../data/processed/X_train.pkl')
X_test = joblib.load('../data/processed/X_test.pkl')
y_train = joblib.load('../data/processed/y_train.pkl')
y_test = joblib.load('../data/processed/y_test.pkl')

# Fit all baseline models, then evaluate each of them.
trainer = TrainModels(X_train, X_test, y_train, y_test)
models = trainer.fit_models()
predictions_models = trainer.evaluate_models()

# Print the classification report for every model.
for model_name, metric in predictions_models.items():
    print(f'Model: {model_name}. Results:')
    print(f'\nClassification report:\n{metric["classification_report"]}')
    print('-'*50)

Model: LogisticRegression. Results:

Classification report:
              precision    recall  f1-score   support

           0       0.91      0.75      0.82      1593
           1       0.42      0.72      0.53       408

    accuracy                           0.74      2001
   macro avg       0.67      0.74      0.68      2001
weighted avg       0.81      0.74      0.76      2001

--------------------------------------------------
Model: KNeighborsClassifier. Results:

Classification report:
              precision    recall  f1-score   support

           0       0.90      0.76      0.82      1593
           1       0.42      0.68      0.52       408

    accuracy                           0.74      2001
   macro avg       0.66      0.72      0.67      2001
weighted avg       0.80      0.74      0.76      2001

--------------------------------------------------
Model: DecisionTreeClassifier. Results:

Classification report:
              precision    recall  f1-score   
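
The per-class F1 scores and the averaged rows in these reports follow from precision, recall, and support. The small sketch below reproduces the LogisticRegression F1 figures from the rounded values printed in its report. The inputs are already rounded to two decimals, so in general such a recomputation can differ from scikit-learn's output in the last digit; the function names are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_avg(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_avg(values, supports):
    """Mean over classes, weighted by the number of rows in each class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Rounded LogisticRegression values from the report: (precision, recall, support).
rows = {0: (0.91, 0.75, 1593), 1: (0.42, 0.72, 408)}

f1_scores = {label: f1(p, r) for label, (p, r, _) in rows.items()}
supports = [s for _, _, s in rows.values()]

macro_f1 = macro_avg(list(f1_scores.values()))
weighted_f1 = weighted_avg(list(f1_scores.values()), supports)

print({label: round(score, 2) for label, score in f1_scores.items()})  # {0: 0.82, 1: 0.53}
print(round(macro_f1, 2), round(weighted_f1, 2))                       # 0.68 0.76
```

The weighted average sits closer to the class-0 score because class 0 makes up roughly 80% of the test rows, while the macro average treats both classes equally.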

In [2]:
# Tune the two selected boosting models, 10 trials each.
tuner = HyperparametrTuner(
    X_train=X_train,
    y_train=y_train,
    params_config=MODEL_PARAMS_CONFIG,
    n_trials=10
)

best_models_to_tune = ['catboost', 'lightgbm']
best_params = tuner.tune_models(models_to_tune=best_models_to_tune)

# Register the tuned models with the trainer and re-run the evaluation.
tuned_models = tuner.get_tuned_models()
trainer.add_tuned_models(tuned_models)
all_predictions = trainer.evaluate_models()

# Build the comparison table and report the best model.
comparison_df = trainer.compare_models_performance()
print("📊 Model comparison:")
print(comparison_df.round(4))

best_model, best_score = trainer.get_best_model()
print(f"\n🏆 Best model: {best_model} with ROC-AUC = {best_score:.4f}")

[I 2025-10-10 15:38:44,113] A new study created in memory with name: no-name-2bc20071-9337-4881-b873-0c5badf5f2ae


🎯 Starting hyperparameter tuning for: ['catboost', 'lightgbm']

🔍 Optimizing catboost...


[I 2025-10-10 15:38:50,254] Trial 0 finished with value: 0.8404004667093229 and parameters: {'iterations': 781, 'depth': 8, 'learning_rate': 0.1205712628744377, 'l2_leaf_reg': 6.387926357773329, 'border_count': 66, 'random_strength': 1.6443457513284063, 'bagging_temperature': 0.05808361216819946}. Best is trial 0 with value: 0.8404004667093229.
[I 2025-10-10 15:38:56,956] Trial 1 finished with value: 0.8362753585005083 and parameters: {'iterations': 1150, 'depth': 7, 'learning_rate': 0.11114989443094977, 'l2_leaf_reg': 1.185260448662222, 'border_count': 249, 'random_strength': 8.341182143924176, 'bagging_temperature': 0.21233911067827616}. Best is trial 0 with value: 0.8404004667093229.
[I 2025-10-10 15:39:00,066] Trial 2 finished with value: 0.8603786367571228 and parameters: {'iterations': 636, 'depth': 4, 'learning_rate': 0.028145092716060652, 'l2_leaf_reg': 5.72280788469014, 'border_count': 128, 'random_strength': 2.983168487960615, 'bagging_temperature': 0.6118528947223795}. Best 

✅ catboost: best roc_auc = 0.8604
   Best parameters: {'iterations': 636, 'depth': 4, 'learning_rate': 0.028145092716060652, 'l2_leaf_reg': 5.72280788469014, 'border_count': 128, 'random_strength': 2.983168487960615, 'bagging_temperature': 0.6118528947223795}

🔍 Optimizing lightgbm...


[I 2025-10-10 15:39:20,341] Trial 0 finished with value: 0.8600342504422447 and parameters: {'n_estimators': 531, 'max_depth': 8, 'learning_rate': 0.1205712628744377, 'num_leaves': 68, 'min_child_samples': 24, 'subsample': 0.662397808134481, 'colsample_bytree': 0.6232334448672797, 'reg_alpha': 8.661761457749352, 'reg_lambda': 6.011150117432088}. Best is trial 0 with value: 0.8600342504422447.
[I 2025-10-10 15:39:22,044] Trial 1 finished with value: 0.8513606082276336 and parameters: {'n_estimators': 914, 'max_depth': 3, 'learning_rate': 0.2708160864249968, 'num_leaves': 87, 'min_child_samples': 29, 'subsample': 0.6727299868828402, 'colsample_bytree': 0.6733618039413735, 'reg_alpha': 3.0424224295953772, 'reg_lambda': 5.247564316322379}. Best is trial 0 with value: 0.8600342504422447.
[I 2025-10-10 15:39:23,599] Trial 2 finished with value: 0.8589766268960066 and parameters: {'n_estimators': 597, 'max_depth': 4, 'learning_rate': 0.08012737503998542, 'num_leaves': 31, 'min_child_samples':

✅ lightgbm: best roc_auc = 0.8609
   Best parameters: {'n_estimators': 691, 'max_depth': 6, 'learning_rate': 0.011711509955524094, 'num_leaves': 69, 'min_child_samples': 25, 'subsample': 0.6260206371941118, 'colsample_bytree': 0.9795542149013333, 'reg_alpha': 9.656320330745594, 'reg_lambda': 8.08397348116461}
✅ Added 2 tuned models
📊 Model comparison:
                    Model  Test ROC-AUC  Test F1  Train ROC-AUC  Train F1  \
7          catboost_tuned        0.8711   0.6017         0.9137    0.8255   
6      CatBoostClassifier        0.8628   0.5901         0.9631    0.8939   
8          lightgbm_tuned        0.8628   0.5845         0.8944    0.8066   
3  RandomForestClassifier        0.8468   0.5807         1.0000    1.0000   
5          LGBMClassifier        0.8455   0.5722         0.9867    0.9366   
4           XGBClassifier        0.8335   0.5622         0.9992    0.9877   
0      LogisticRegression        0.8013   0.5335         0.7986    0.7220   
1    KNeighbors
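
One way to read the comparison table is to look at the gap between train and test ROC-AUC, a rough indicator of overfitting. The sketch below computes that gap for the fully visible rows, with values copied from the table. The gap is an illustrative addition here, not something `compare_models_performance` is known to compute.

```python
# (model, test ROC-AUC, train ROC-AUC) copied from the comparison table.
rows = [
    ("catboost_tuned", 0.8711, 0.9137),
    ("CatBoostClassifier", 0.8628, 0.9631),
    ("lightgbm_tuned", 0.8628, 0.8944),
    ("RandomForestClassifier", 0.8468, 1.0000),
    ("LGBMClassifier", 0.8455, 0.9867),
    ("XGBClassifier", 0.8335, 0.9992),
    ("LogisticRegression", 0.8013, 0.7986),
]


def roc_auc_gap(test_auc, train_auc):
    """Train minus test ROC-AUC; larger values suggest more overfitting."""
    return round(train_auc - test_auc, 4)


gaps = {model: roc_auc_gap(test, train) for model, test, train in rows}

# Print models from the smallest gap to the largest.
for model, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{model:<24} {gap:+.4f}")
```

On these numbers, tuning narrows the gap for both boosted models (CatBoost from about 0.10 to 0.04, LightGBM from about 0.14 to 0.03) while also raising test ROC-AUC.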