# 01_EDA_and_Train.ipynb

**Uplift modeling: EDA, preprocessing, training, and saving**



Uplift modeling: build a model that estimates the increase in conversion (uplift) caused by sending an offer to a specific customer. The goal is to identify the customers on whom the communication has the largest positive effect, and thereby optimize the marketing budget.
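As a rough illustration of what "uplift" means at the population level, the sketch below computes the naive average uplift as the difference in conversion rate between customers who received an offer and those who did not. The toy outcome lists and the helper `conversion_rate` are made up for the example and are not part of the project.

```python
def conversion_rate(outcomes):
    """Share of customers who converted (outcomes are 0/1 flags)."""
    return sum(outcomes) / len(outcomes)


# Hypothetical toy data: 1 = purchased, 0 = did not purchase.
treated = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # received an offer
control = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # received nothing

# Average uplift: how much higher the conversion rate is with an offer.
average_uplift = conversion_rate(treated) - conversion_rate(control)

print(f"Treated: {conversion_rate(treated):.0%}, control: {conversion_rate(control):.0%}")
print(f"Average uplift: {average_uplift:+.0%}")
```

An uplift model goes one step further and estimates this difference per customer, conditional on the customer's features.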


<u> **Business value of the project:** </u> <p>
***1.*** Lower marketing spend by sending offers only to "persuadable" customers<p>
***2.*** Higher campaign ROI<p>
***3.*** Better customer experience through personalized communication<p>

<u> **Dataset description:** </u>
https://www.kaggle.com/datasets/davinwijaya/customer-retention <P>
The dataset contains information about a retail company's customers and their response to marketing communications. Each customer received one of two offers (Discount or Buy One Get One) or no offer at all, and the outcome was recorded: whether the customer made a purchase after the offer.

***recency***: number of months since the customer's last purchase <p>
***history***: total amount of the customer's purchases over all time (in dollars)<p>
***used_discount***: whether the customer has used discounts in the past <p>
***used_bogo***: whether the customer has used "Buy One Get One" promotions in the past<p>
***zip_code***: the customer's postal area, a categorical feature (Suburban, Urban, Rural; the data spells the first value `Surburban`)<p>
***is_referral***: whether the customer came through the referral program <p>
***channel***: customer acquisition channel (Phone, Web, Multichannel)<p>
***offer***: type of offer received (Discount, Buy One Get One, No Offer)<p>
***conversion***: target variable (1 if the customer made a purchase, 0 otherwise)<p>

# Import

In [1]:
import yaml
import json
import joblib
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder, KBinsDiscretizer
from sklearn.compose import ColumnTransformer
from lightgbm import LGBMClassifier, LGBMRegressor
from sklearn.metrics import roc_auc_score
import optuna

import warnings
warnings.filterwarnings('ignore')
%matplotlib inline


In [2]:
# Locate the project root (the directory that contains config/)
PROJECT_ROOT = Path(__file__).parent.parent if "__file__" in locals() else Path.cwd().parents[0]
# PROJECT_ROOT = Path.cwd().parents[0]

config_path = PROJECT_ROOT / 'config' / 'params2.yml'

# Load the config
with config_path.open('r', encoding='utf-8') as f:
    config = yaml.safe_load(f)

# Build the full path to the data
raw_path = PROJECT_ROOT / config['data']['raw_path']

# Load the dataset
df = pd.read_csv(raw_path)

print(f"Dataset loaded: {df.shape[0]} rows, {df.shape[1]} columns")
print(f"Mean conversion: {df['conversion'].mean():.2%}")
df.head()


Dataset loaded: 64000 rows, 9 columns
Mean conversion: 14.68%


Unnamed: 0,recency,history,used_discount,used_bogo,zip_code,is_referral,channel,offer,conversion
0,10,142.44,1,0,Surburban,0,Phone,Buy One Get One,0
1,6,329.08,1,1,Rural,1,Web,No Offer,0
2,7,180.65,0,1,Surburban,1,Web,Buy One Get One,0
3,9,675.83,1,0,Rural,1,Web,Discount,0
4,2,45.34,1,0,Urban,0,Web,Buy One Get One,0


# EDA 

In [3]:
# Reports folder
report_path = PROJECT_ROOT / config['folders']['report']
report_path.mkdir(parents=True, exist_ok=True)

# Histograms of the key features
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

df['recency'].hist(bins=30, ax=axes[0,0], color='skyblue')
axes[0,0].set_title('Recency')

df['history'].hist(bins=50, ax=axes[0,1], color='lightgreen')
axes[0,1].set_title('History')

df['used_discount'].value_counts().plot(kind='bar', ax=axes[1,0], color='salmon')
axes[1,0].set_title('Used Discount')

df['used_bogo'].value_counts().plot(kind='bar', ax=axes[1,1], color='gold')
axes[1,1].set_title('Used BOGO')

plt.tight_layout()

# Save the histograms to a file
hist_path = report_path / 'feature_histograms.png'
fig.savefig(hist_path)
plt.close(fig)
print(f"Histograms saved: {hist_path}")

# Correlation matrix
corr_cols = ['recency', 'history', 'used_discount', 'used_bogo', 'is_referral', 'conversion']
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(df[corr_cols].corr(), annot=True, cmap='coolwarm', fmt='.2f', ax=ax)
ax.set_title('Correlation matrix')

# Save the correlation matrix
corr_path = report_path / 'correlation_matrix.png'
fig.savefig(corr_path)
plt.close(fig)
print(f"Correlation matrix saved: {corr_path}")


Histograms saved: /Users/bariatmamaeva/Desktop/uplift-marketing-mlops/report/feature_histograms.png
Correlation matrix saved: /Users/bariatmamaeva/Desktop/uplift-marketing-mlops/report/correlation_matrix.png
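For an uplift task it can also help to look at conversion broken down by offer type. The sketch below shows one way to do that with a pandas `groupby`. It runs on a small made-up frame that only borrows the `offer` and `conversion` column names, so the numbers say nothing about the real data.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as the dataset.
toy = pd.DataFrame({
    "offer": ["Discount", "Discount", "Buy One Get One", "Buy One Get One",
              "No Offer", "No Offer", "No Offer", "Discount"],
    "conversion": [1, 0, 1, 0, 0, 0, 1, 1],
})

# Conversion rate and group size per offer type.
by_offer = toy.groupby("offer")["conversion"].agg(rate="mean", n="size")

# Naive uplift of each offer relative to the "No Offer" control group.
by_offer["uplift_vs_control"] = by_offer["rate"] - by_offer.loc["No Offer", "rate"]

print(by_offer)
```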


# Feature engineering

In [4]:
# Reports folder
report_path = PROJECT_ROOT / config['folders']['report']
report_path.mkdir(parents=True, exist_ok=True)

# Build interaction features (numeric_interactions)
interaction_cols = []
for interaction in config.get('features', {}).get('numeric_interactions', []):
    if interaction == 'history_discount':
        df['history_discount'] = df['history'] * df['used_discount']
        interaction_cols.append('history_discount')
    elif interaction == 'history_bogo':
        df['history_bogo'] = df['history'] * df['used_bogo']
        interaction_cols.append('history_bogo')
    elif interaction == 'recency_history':
        df['recency_history'] = df['recency'] * df['history']
        interaction_cols.append('recency_history')
    else:
        # Generic fallback for any other name in the config ('a_and_b' or 'a_b').
        # Splitting on '_' only works for column names that contain no underscores.
        parts = interaction.split('_and_') if '_and_' in interaction else interaction.split('_')
        if len(parts) >= 2:
            col_name = '_'.join(parts)
            if all(p in df.columns for p in parts):
                df[col_name] = df[parts[0]] * df[parts[1]]
                interaction_cols.append(col_name)

# Bin recency
recency_bins = config.get('features', {}).get('recency_bins', 5)
recency_strategy = config.get('features', {}).get('recency_bin_strategy', 'uniform')

kbins = KBinsDiscretizer(
    n_bins=recency_bins,
    encode='onehot-dense',
    strategy=recency_strategy
)
binned = kbins.fit_transform(df[['recency']])
# Name the columns by the number of bins actually produced,
# which can be smaller than the requested n_bins
bin_cols = [f'recency_bin_{i}' for i in range(binned.shape[1])]
df[bin_cols] = binned
interaction_cols.extend(bin_cols)

# Treatment flag: 1 if the customer received any offer
df['treatment'] = (df['offer'] != 'No Offer').astype(int)
interaction_cols.append('treatment')

# Log the new features
features_log_path = report_path / 'created_features.json'
with open(features_log_path, 'w') as f:
    json.dump(interaction_cols, f, indent=2)

print(f"New features created from config: {interaction_cols}")
print(f"List of new features saved: {features_log_path}")


New features created from config: ['history_discount', 'history_bogo', 'recency_history', 'recency_bin_0', 'recency_bin_1', 'recency_bin_2', 'treatment']
List of new features saved: /Users/bariatmamaeva/Desktop/uplift-marketing-mlops/report/created_features.json
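To make the binning step easier to follow, here is a small standalone sketch of `KBinsDiscretizer` with `encode='onehot-dense'` on made-up recency values. The values and the choice of three uniform bins are assumptions for the example. It also shows how to read the fitted bin count from `n_bins_` rather than relying on the requested `n_bins`.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical recency values in months (1..12), not taken from the dataset.
recency = np.arange(1, 13, dtype=float).reshape(-1, 1)

kbins = KBinsDiscretizer(n_bins=3, encode="onehot-dense", strategy="uniform")
binned = kbins.fit_transform(recency)

# Name the columns from the fitted transformer, not from the requested n_bins:
# the discretizer can drop bins that turn out too narrow.
n_fitted = int(kbins.n_bins_[0])
bin_cols = [f"recency_bin_{i}" for i in range(n_fitted)]

print(bin_cols)
print(kbins.bin_edges_[0])  # edges of the uniform bins
print(binned.sum(axis=0))   # number of rows that fall into each bin
```

Each row gets exactly one `1` across the bin columns, so the columns are mutually exclusive indicators of the recency range.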


# Preprocessing

In [5]:
y = df['conversion']
X = df.drop(columns=['offer', 'conversion', 'treatment'])

# Column groups from the config
categorical_cols = config.get('features', {}).get('categorical', [])
numeric_cols = [col for col in X.columns if col not in categorical_cols]

# Build the preprocessor
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), numeric_cols),
    ('cat', OneHotEncoder(handle_unknown='ignore', sparse_output=False), categorical_cols)
])

# Apply the preprocessing
# Note: the preprocessor is fitted on the full dataset, before the train/test split,
# so scaling statistics also include rows that later land in the test set
X_preprocessed = preprocessor.fit_transform(X)
print(f"Features after preprocessing: {X_preprocessed.shape[1]}")

# Save the artifact (MLOps)
models_path = PROJECT_ROOT / config['folders']['models']
models_path.mkdir(parents=True, exist_ok=True)

preprocessor_path = models_path / 'preprocessor.joblib'
joblib.dump(preprocessor, preprocessor_path)
print(f"Preprocessor saved: {preprocessor_path}")


Features after preprocessing: 18
Preprocessor saved: /Users/bariatmamaeva/Desktop/uplift-marketing-mlops/models/preprocessor.joblib


# Split

In [6]:
processed_path = PROJECT_ROOT / config['data']['processed_path']
processed_path.mkdir(parents=True, exist_ok=True)  # make sure the output folder exists

# Stratified split on the treatment x conversion combination
stratify_key = df['treatment'].astype(str) + df['conversion'].astype(str)
train_df, test_df = train_test_split(
    df,
    test_size=config['train']['test_size'],
    random_state=config['train']['random_state'],
    stratify=stratify_key
)

# Save the data
train_df.to_csv(processed_path / config['data']['train_file'], index=False)
test_df.to_csv(processed_path / config['data']['test_file'], index=False)

print(f"Data saved to {processed_path}")
print(f" train: {len(train_df)} rows")
print(f" test: {len(test_df)} rows")

Data saved to /Users/bariatmamaeva/Desktop/uplift-marketing-mlops/data/processed
 train: 51200 rows
 test: 12800 rows
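The point of stratifying on a combined treatment-and-conversion key is that both parts of the split keep the same share of treated and control customers and the same conversion rate within each group. A minimal sketch on a made-up frame (the group sizes are assumptions chosen for round numbers):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame: 2/3 treated, 25% conversion in each arm.
toy = pd.DataFrame({
    "treatment": [1] * 80 + [0] * 40,
    "conversion": ([1] * 20 + [0] * 60) + ([1] * 10 + [0] * 30),
})

# Four strata: "11", "10", "01", "00".
key = toy["treatment"].astype(str) + toy["conversion"].astype(str)

train, test = train_test_split(toy, test_size=0.2, random_state=42, stratify=key)

# Each treatment x conversion cell keeps its share in both parts.
print(train.groupby(["treatment", "conversion"]).size() / len(train))
print(test.groupby(["treatment", "conversion"]).size() / len(test))
```

Stratifying on `conversion` alone would balance the target but could still leave the treated and control groups with different conversion rates across the two parts.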


# Train X-Learner with Optuna
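For orientation, here is a minimal sketch of the three stages of a textbook X-learner on synthetic data: per-group outcome models, regressors on the imputed individual effects, and a propensity-weighted blend. It is not the project's code. scikit-learn estimators stand in for LightGBM, the data-generating process is invented, and all names (`mu0`, `mu1`, `tau0`, `tau1`, `g`) are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data (an assumption): the offer only helps customers with x0 > 0.
n = 4000
X = rng.normal(size=(n, 3))
t = rng.binomial(1, 2 / 3, size=n)  # randomized treatment, about 2/3 treated
p = 0.10 + 0.05 * (X[:, 1] > 0) + 0.15 * t * (X[:, 0] > 0)
y = rng.binomial(1, p)

X1, y1 = X[t == 1], y[t == 1]  # treated group
X0, y0 = X[t == 0], y[t == 0]  # control group

# Stage 1: one outcome model per group.
mu1 = GradientBoostingClassifier(random_state=0).fit(X1, y1)
mu0 = GradientBoostingClassifier(random_state=0).fit(X0, y0)

# Stage 2: imputed individual effects, then a regressor on each set.
d1 = y1 - mu0.predict_proba(X1)[:, 1]  # treated: observed minus predicted control outcome
d0 = mu1.predict_proba(X0)[:, 1] - y0  # control: predicted treated outcome minus observed
tau1 = GradientBoostingRegressor(random_state=0).fit(X1, d1)
tau0 = GradientBoostingRegressor(random_state=0).fit(X0, d0)

# Stage 3: blend the two effect estimates with the propensity g(x) = P(T=1 | x).
g = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
uplift = g * tau0.predict(X) + (1 - g) * tau1.predict(X)

print(f"Mean predicted uplift, x0 > 0:  {uplift[X[:, 0] > 0].mean():.3f}")
print(f"Mean predicted uplift, x0 <= 0: {uplift[X[:, 0] <= 0].mean():.3f}")
```

The propensity weighting leans on the effect model that was trained with the larger opposite group behind it, which is the main reason to prefer an X-learner when the treated and control groups differ in size.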

In [7]:
feature_cols = [col for col in train_df.columns if col not in ['conversion', 'treatment', 'offer']]
X_train_feat = preprocessor.transform(train_df[feature_cols])
X_test_feat = preprocessor.transform(test_df[feature_cols])

t_train, t_test = train_df['treatment'], test_df['treatment']
y_train, y_test = train_df['conversion'], test_df['conversion']

# Hyperparameter search with Optuna for the outcome model on the treated group.
# Caveat: trials are scored on the treated part of the test set, so the reported
# AUC is optimistic; a separate validation split would avoid this.
def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', *config['model']['n_estimators_range']),
        'max_depth': trial.suggest_int('max_depth', *config['model']['max_depth_range']),
        'learning_rate': trial.suggest_float('learning_rate', *config['model']['learning_rate_range']),
        'num_leaves': trial.suggest_int('num_leaves', *config['model']['num_leaves_range']),
        'min_child_samples': trial.suggest_int('min_child_samples', *config['model']['min_child_samples_range']),
        'random_state': config['train']['random_state'],
        'n_jobs': -1
    }
    model = LGBMClassifier(**params)
    model.fit(X_train_feat[t_train==1], y_train[t_train==1])
    pred = model.predict_proba(X_test_feat[t_test==1])[:, 1]
    return roc_auc_score(y_test[t_test==1], pred)

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=config['train']['optuna_trials'])

best_params = study.best_params
best_params.update({'random_state': config['train']['random_state'], 'n_jobs': -1})

# Save the best parameters
report_path = PROJECT_ROOT / config['folders']['report']
report_path.mkdir(parents=True, exist_ok=True)
with open(report_path / 'best_params.json', 'w') as f:
    json.dump(best_params, f, indent=2)

# Train the X-Learner components
# Note: this cell fits the propensity model and the two per-group outcome models;
# second-stage models on imputed effects are not fitted in this cell.

# Propensity model: probability of being assigned to treatment
prop_model = LGBMClassifier(**best_params)
prop_model.fit(X_train_feat, t_train)

# Outcome model fitted on the treated group
effect_t = LGBMClassifier(**best_params)
effect_t.fit(X_train_feat[t_train==1], y_train[t_train==1])

# Outcome model fitted on the control group
effect_c = LGBMClassifier(**best_params)
effect_c.fit(X_train_feat[t_train==0], y_train[t_train==0])

# Bundle the models into a dict and save
uplift_model_full = {
    'prop_model': prop_model,
    'effect_t': effect_t,
    'effect_c': effect_c
}

models_path = PROJECT_ROOT / config['folders']['models']
models_path.mkdir(parents=True, exist_ok=True)
joblib.dump(uplift_model_full, models_path / 'uplift_model.joblib')

# Save the preprocessor alongside the models
joblib.dump(preprocessor, models_path / 'preprocessor.joblib')

[I 2026-01-04 19:50:34,807] A new study created in memory with name: no-name-e552218e-ea2f-49bf-813c-cd17e71d7d71


[LightGBM] [Info] Number of positive: 5706, number of negative: 28449
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000693 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1064
[LightGBM] [Info] Number of data points in the train set: 34155, number of used features: 18
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.167062 -> initscore=-1.606595
[LightGBM] [Info] Start training from score -1.606595




[I 2026-01-04 19:50:39,723] Trial 0 finished with value: 0.5963976828472609 and parameters: {'n_estimators': 337, 'max_depth': 11, 'learning_rate': 0.010883944244329964, 'num_leaves': 182, 'min_child_samples': 100}. Best is trial 0 with value: 0.5963976828472609.


[I 2026-01-04 19:50:43,615] Trial 1 finished with value: 0.5641843283607105 and parameters: {'n_estimators': 398, 'max_depth': 15, 'learning_rate': 0.10335932884199857, 'num_leaves': 123, 'min_child_samples': 51}. Best is trial 0 with value: 0.5963976828472609.


[I 2026-01-04 19:50:47,407] Trial 2 finished with value: 0.5619137292620884 and parameters: {'n_estimators': 250, 'max_depth': 13, 'learning_rate': 0.08814888322249922, 'num_leaves': 146, 'min_child_samples': 8}. Best is trial 0 with value: 0.5963976828472609.


[I 2026-01-04 19:50:49,905] Trial 3 finished with value: 0.5577813296043098 and parameters: {'n_estimators': 420, 'max_depth': 15, 'learning_rate': 0.12492128385215451, 'num_leaves': 54, 'min_child_samples': 14}. Best is trial 0 with value: 0.5963976828472609.


[I 2026-01-04 19:50:52,174] Trial 4 finished with value: 0.5871390589381709 and parameters: {'n_estimators': 481, 'max_depth': 7, 'learning_rate': 0.04029133131014662, 'num_leaves': 146, 'min_child_samples': 38}. Best is trial 0 with value: 0.5963976828472609.


[I 2026-01-04 19:50:53,356] Trial 5 finished with value: 0.5835743829966624 and parameters: {'n_estimators': 177, 'max_depth': 14, 'learning_rate': 0.07499842294703848, 'num_leaves': 121, 'min_child_samples': 100}. Best is trial 0 with value: 0.5963976828472609.


[I 2026-01-04 19:50:56,038] Trial 6 finished with value: 0.553826044760507 and parameters: {'n_estimators': 316, 'max_depth': 16, 'learning_rate': 0.13816628366384012, 'num_leaves': 101, 'min_child_samples': 29}. Best is trial 0 with value: 0.5963976828472609.


[I 2026-01-04 19:50:58,703] Trial 7 finished with value: 0.5854357892005414 and parameters: {'n_estimators': 242, 'max_depth': 15, 'learning_rate': 0.04040883851541605, 'num_leaves': 140, 'min_child_samples': 83}. Best is trial 0 with value: 0.5963976828472609.


[I 2026-01-04 19:51:00,726] Trial 8 finished with value: 0.5548002501789879 and parameters: {'n_estimators': 281, 'max_depth': 14, 'learning_rate': 0.18281144370661584, 'num_leaves': 68, 'min_child_samples': 28}. Best is trial 0 with value: 0.5963976828472609.


[I 2026-01-04 19:51:02,651] Trial 9 finished with value: 0.56566971680756 and parameters: {'n_estimators': 163, 'max_depth': 18, 'learning_rate': 0.16592612345136676, 'num_leaves': 177, 'min_child_samples': 71}. Best is trial 0 with value: 0.5963976828472609.


[I 2026-01-04 19:51:04,920] Trial 10 finished with value: 0.6021012432247299 and parameters: {'n_estimators': 359, 'max_depth': 8, 'learning_rate': 0.02075344246109634, 'num_leaves': 198, 'min_child_samples': 98}. Best is trial 10 with value: 0.6021012432247299.


[I 2026-01-04 19:51:08,682] Trial 11 finished with value: 0.601151734305498 and parameters: {'n_estimators': 362, 'max_depth': 9, 'learning_rate': 0.011322608207091397, 'num_leaves': 199, 'min_child_samples': 100}. Best is trial 10 with value: 0.6021012432247299.


[I 2026-01-04 19:51:11,671] Trial 12 finished with value: 0.6011320165416265 and parameters: {'n_estimators': 379, 'max_depth': 8, 'learning_rate': 0.014146616415604638, 'num_leaves': 200, 'min_child_samples': 75}. Best is trial 10 with value: 0.6021012432247299.


[I 2026-01-04 19:51:12,680] Trial 13 finished with value: 0.6023084276286095 and parameters: {'n_estimators': 473, 'max_depth': 5, 'learning_rate': 0.05531042554706341, 'num_leaves': 173, 'min_child_samples': 86}. Best is trial 13 with value: 0.6023084276286095.


[I 2026-01-04 19:51:13,768] Trial 14 finished with value: 0.5983968176317822 and parameters: {'n_estimators': 494, 'max_depth': 5, 'learning_rate': 0.05708960548596311, 'num_leaves': 170, 'min_child_samples': 81}. Best is trial 13 with value: 0.6023084276286095.


[I 2026-01-04 19:51:14,722] Trial 15 finished with value: 0.6026451084467155 and parameters: {'n_estimators': 444, 'max_depth': 5, 'learning_rate': 0.05586586176740245, 'num_leaves': 23, 'min_child_samples': 63}. Best is trial 15 with value: 0.6026451084467155.


[I 2026-01-04 19:51:15,674] Trial 16 finished with value: 0.5978258404844734 and parameters: {'n_estimators': 438, 'max_depth': 5, 'learning_rate': 0.07067913440629363, 'num_leaves': 25, 'min_child_samples': 60}. Best is trial 15 with value: 0.6026451084467155.


[I 2026-01-04 19:51:18,718] Trial 17 finished with value: 0.57995814510263 and parameters: {'n_estimators': 465, 'max_depth': 10, 'learning_rate': 0.04565095893549845, 'num_leaves': 94, 'min_child_samples': 59}. Best is trial 15 with value: 0.6026451084467155.


[I 2026-01-04 19:51:19,106] Trial 18 finished with value: 0.6059836709310273 and parameters: {'n_estimators': 116, 'max_depth': 6, 'learning_rate': 0.09867126098380821, 'num_leaves': 72, 'min_child_samples': 88}. Best is trial 18 with value: 0.6059836709310273.


[I 2026-01-04 19:51:19,523] Trial 19 finished with value: 0.6003630730450478 and parameters: {'n_estimators': 106, 'max_depth': 20, 'learning_rate': 0.12530644317373252, 'num_leaves': 32, 'min_child_samples': 63}. Best is trial 18 with value: 0.6059836709310273.


[I 2026-01-04 19:51:20,346] Trial 20 finished with value: 0.5895150987790958 and parameters: {'n_estimators': 190, 'max_depth': 7, 'learning_rate': 0.0977928472707982, 'num_leaves': 75, 'min_child_samples': 44}. Best is trial 18 with value: 0.6059836709310273.


[I 2026-01-04 19:51:21,411] Trial 21 finished with value: 0.6009571199760864 and parameters: {'n_estimators': 445, 'max_depth': 5, 'learning_rate': 0.06995365642307141, 'num_leaves': 44, 'min_child_samples': 87}. Best is trial 18 with value: 0.6059836709310273.


[I 2026-01-04 19:51:21,877] Trial 22 finished with value: 0.606974044915883 and parameters: {'n_estimators': 124, 'max_depth': 6, 'learning_rate': 0.08482991824582427, 'num_leaves': 81, 'min_child_samples': 75}. Best is trial 22 with value: 0.606974044915883.


[I 2026-01-04 19:51:22,346] Trial 23 finished with value: 0.5983508259475518 and parameters: {'n_estimators': 101, 'max_depth': 7, 'learning_rate': 0.11797257596303769, 'num_leaves': 74, 'min_child_samples': 71}. Best is trial 22 with value: 0.606974044915883.


[I 2026-01-04 19:51:23,415] Trial 24 finished with value: 0.5864930064049212 and parameters: {'n_estimators': 138, 'max_depth': 11, 'learning_rate': 0.08589937672180689, 'num_leaves': 86, 'min_child_samples': 66}. Best is trial 22 with value: 0.606974044915883.


[I 2026-01-04 19:51:24,016] Trial 25 finished with value: 0.5929843900378758 and parameters: {'n_estimators': 209, 'max_depth': 6, 'learning_rate': 0.14703443089922363, 'num_leaves': 53, 'min_child_samples': 91}. Best is trial 22 with value: 0.606974044915883.


[I 2026-01-04 19:51:24,660] Trial 26 finished with value: 0.5997875608120485 and parameters: {'n_estimators': 138, 'max_depth': 9, 'learning_rate': 0.08767139930237294, 'num_leaves': 38, 'min_child_samples': 54}. Best is trial 22 with value: 0.606974044915883.


[I 2026-01-04 19:51:25,328] Trial 27 finished with value: 0.5953820701246498 and parameters: {'n_estimators': 211, 'max_depth': 6, 'learning_rate': 0.11129756698652635, 'num_leaves': 55, 'min_child_samples': 77}. Best is trial 22 with value: 0.606974044915883.


[I 2026-01-04 19:51:25,802] Trial 28 finished with value: 0.6142389071311067 and parameters: {'n_estimators': 142, 'max_depth': 8, 'learning_rate': 0.058866523588061964, 'num_leaves': 23, 'min_child_samples': 92}. Best is trial 28 with value: 0.6142389071311067.


[I 2026-01-04 19:51:27,183] Trial 29 finished with value: 0.6044442065167603 and parameters: {'n_estimators': 135, 'max_depth': 11, 'learning_rate': 0.026916290528722152, 'num_leaves': 85, 'min_child_samples': 94}. Best is trial 28 with value: 0.6142389071311067.


[LightGBM] [Info] Number of positive: 34155, number of negative: 17045
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000564 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1064
[LightGBM] [Info] Number of data points in the train set: 51200, number of used features: 18
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.667090 -> initscore=0.695052
[LightGBM] [Info] Start training from score 0.695052
[LightGBM] [Info] Number of positive: 5706, number of negative: 28449
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000379 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1064
[LightGBM] [Info] Number of data points in the train set: 34155, number of used features: 18
[LightGBM] [Info] [bi

['/Users/bariatmamaeva/Desktop/uplift-marketing-mlops/models/preprocessor.joblib']
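
The `[binary:BoostFromScore]` lines in the log above report the constant score from which boosting starts. As a rough check, assuming the default `sigmoid=1` and unweighted samples, that score is the log-odds of the positive-class share `pavg`. The helper name `init_score` is illustrative and not part of LightGBM; the class counts are copied from the log.

```python
import math


def init_score(n_positive: int, n_negative: int) -> float:
    """Log-odds of the positive share: log(pavg / (1 - pavg))."""
    pavg = n_positive / (n_positive + n_negative)
    return math.log(pavg / (1.0 - pavg))


# Class counts copied from the "Number of positive / negative" log lines.
outcome_score = init_score(5706, 28449)      # conversion labels on the 34155-row fit
propensity_score = init_score(34155, 17045)  # treatment labels on the 51200-row fit

print(f"{outcome_score:.6f}")
print(f"{propensity_score:.6f}")
```

Up to rounding, the two values match the logged `initscore=-1.606595` and `initscore=0.695052`, which suggests the logged counts and shares are consistent with each other.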

# Evaluate

In [8]:
feature_cols = [col for col in train_df.columns if col not in ['conversion', 'treatment', 'offer']]
X_train_feat = preprocessor.transform(train_df[feature_cols])
t_train, y_train = train_df['treatment'], train_df['conversion']

# Positional boolean masks, so row selection does not depend on the DataFrame index
treated = (t_train == 1).to_numpy()
control = (t_train == 0).to_numpy()

# Propensity model: probability of being assigned to treatment
prop_model = LGBMClassifier(**best_params)
prop_model.fit(X_train_feat, t_train)

# Outcome model fitted on the treatment group
effect_t = LGBMClassifier(**best_params)
effect_t.fit(X_train_feat[treated], y_train[treated])

# Outcome model fitted on the control group
effect_c = LGBMClassifier(**best_params)
effect_c.fit(X_train_feat[control], y_train[control])

# Bundle the three models into one dictionary
uplift_model_full = {
    'prop_model': prop_model,
    'effect_t': effect_t,
    'effect_c': effect_c
}

# Persist the bundle in the models folder
models_path = PROJECT_ROOT / config['folders']['models']
models_path.mkdir(parents=True, exist_ok=True)
joblib.dump(uplift_model_full, models_path / 'uplift_model.joblib')


[LightGBM] [Info] Number of positive: 34155, number of negative: 17045
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000484 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1064
[LightGBM] [Info] Number of data points in the train set: 51200, number of used features: 18
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.667090 -> initscore=0.695052
[LightGBM] [Info] Start training from score 0.695052
[LightGBM] [Info] Number of positive: 5706, number of negative: 28449
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000376 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1064
[LightGBM] [Info] Number of data points in the train set: 34155, number of used features: 18
[LightGBM] [Info] [bi

['/Users/bariatmamaeva/Desktop/uplift-marketing-mlops/models/uplift_model.joblib']
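
The saved dictionary holds two outcome models, so one simple way to score customers is the two-model difference: predicted conversion probability if treated minus predicted probability if not. The sketch below is an assumption about how the bundle could be used, not necessarily how later notebooks combine the models, and it leaves `prop_model` unused. The names `predict_uplift`, `uplift_at_k` and `FixedRateModel` are hypothetical; `FixedRateModel` only stands in for the fitted classifiers so the example runs without the saved artifacts.

```python
import numpy as np


def predict_uplift(bundle, X):
    """Two-model uplift: P(y=1 | X, treated) - P(y=1 | X, control)."""
    p_treated = bundle["effect_t"].predict_proba(X)[:, 1]
    p_control = bundle["effect_c"].predict_proba(X)[:, 1]
    return p_treated - p_control


def uplift_at_k(y, t, uplift, k=0.3):
    """Treated-minus-control conversion rate among the top-k share of scores.

    Returns NaN if the top-k slice lacks treated or control rows.
    """
    y, t, uplift = (np.asarray(a) for a in (y, t, uplift))
    n_top = max(1, int(len(uplift) * k))
    top = np.argsort(-uplift, kind="stable")[:n_top]  # highest scores first
    y_top, t_top = y[top], t[top]
    return y_top[t_top == 1].mean() - y_top[t_top == 0].mean()


class FixedRateModel:
    """Hypothetical stand-in exposing a scikit-learn style predict_proba."""

    def __init__(self, rate):
        self.rate = rate

    def predict_proba(self, X):
        p = np.full(len(X), self.rate)
        return np.column_stack([1.0 - p, p])


# With the real artifacts one would instead load the bundle, e.g.
# bundle = joblib.load(models_path / 'uplift_model.joblib')
bundle = {"effect_t": FixedRateModel(0.20), "effect_c": FixedRateModel(0.15)}
X_demo = np.zeros((4, 18))  # 18 features, matching the training log
scores = predict_uplift(bundle, X_demo)

# Toy labels: the four highest-scored rows convert only when treated.
y_demo = [1, 0, 1, 0, 0, 0, 1, 0]
t_demo = [1, 0, 1, 0, 1, 0, 1, 0]
s_demo = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4]
top_half_uplift = uplift_at_k(y_demo, t_demo, s_demo, k=0.5)
```

Ranking by this score and contacting only the top share of customers is what ties the model back to the budget goal stated at the top of the notebook; `uplift_at_k` should be computed on held-out data, not on the rows used for fitting.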