# Client Targeting

* **Section 0: Load dataset**
* **Section 1: Prepare datasets**
  * Data Split into (1) Training and (2) Client Targeting sets
  * Apply data processing
  * Prepare training datasets: split the training set further into (1) train and (2) validation sets
* **Section 2: Models training**
  * Revenue regression models
  * Sales classification models
* **Section 3: Clients targeting**
  * Propensity scoring
  * Predict revenues
  * Prepare list of clients to target

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

from utlis.data_utils import load_data, merge_data, process_features1, process_features2, get_feature_cols
from utlis.model_utils import train_revenue_model_xgb_optuna, train_sales_model_xgb_optuna_f1, predict_propensity
from utlis.targeting import calculate_revenues, run_full_targeting_pipeline, print_targeting_summary, assign_best_offer


## Section 0: Load dataset

In [2]:
print("1. Loading data...")
file = 'DataScientist_CaseStudy_Dataset.xlsx'
soc_dem, products, inflow, sales = load_data(file)
df = merge_data(soc_dem, products, inflow, sales)


1. Loading data...


## Section 1: Prepare datasets

### Data Split into (1) Training and (2) Client Targeting sets

In [3]:
classification_target_columns = ['Sale_CL', 'Sale_CC', 'Sale_MF']
regression_target_columns = ['Revenue_CL','Revenue_CC','Revenue_MF']

# Training set: clients with at least one observed target
train_val = df.dropna(subset=classification_target_columns+regression_target_columns, how='all')

# Client Targeting set: clients with no observed targets
test = df[df[classification_target_columns+regression_target_columns].isna().all(axis=1)].copy()

print(f"Training set: {train_val.shape[0]} clients ({train_val.shape[0]/len(df)*100:.1f}%)")
print(f"Client Targeting set: {test.shape[0]} clients ({test.shape[0]/len(df)*100:.1f}%)")

Training set: 969 clients (60.0%)
Client Targeting set: 646 clients (40.0%)
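
The split above relies on the target columns being entirely missing for clients without a recorded outcome. A minimal sketch of the same logic on a toy frame follows; the client IDs and values are invented for illustration and are not taken from the case-study dataset.

```python
import numpy as np
import pandas as pd

# Toy frame: clients 1-3 have observed outcomes, clients 4-5 do not (hypothetical values)
toy = pd.DataFrame({
    "Client": [1, 2, 3, 4, 5],
    "Sale_CL": [0.0, 1.0, 0.0, np.nan, np.nan],
    "Revenue_CL": [0.0, 12.5, 0.0, np.nan, np.nan],
})
targets = ["Sale_CL", "Revenue_CL"]

# Rows with at least one observed target form the labelled (training) set
labelled = toy.dropna(subset=targets, how="all")
# Rows where every target is missing form the targeting set
unlabelled = toy[toy[targets].isna().all(axis=1)]

# The two sets are disjoint and together cover every client
assert len(labelled) + len(unlabelled) == len(toy)
assert set(labelled["Client"]).isdisjoint(unlabelled["Client"])
print(labelled["Client"].tolist(), unlabelled["Client"].tolist())
```

Because `dropna(how='all')` and `isna().all(axis=1)` are exact complements, no client can end up in both sets or in neither.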


### Apply data processing

In [4]:
print("*" * 30 + "Before train_val processing" + "*" * 30 + "\n")
print(train_val.head(3))
print(f"\n{train_val.shape=}")
# process_features2 also returns the label encoder used for Sex
train_val, sex_label_encoder = process_features2(train_val)
print("\n")

print("*" * 30 + "Before test processing" + "*" * 30 + "\n")
print(test.head(3))
print(f"\n{test.shape=}")
# Reuse the encoder from the training set so both sets share the same Sex encoding
test, _ = process_features2(test, le=sex_label_encoder)

******************************Before train_val processing******************************

   Client Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  Count_CC  \
1    1217   M   38     165         1       NaN       NaN        NaN       NaN   
2     850   F   49      44         1       NaN       NaN        NaN       NaN   
3    1473   M   54      34         1       1.0       NaN        NaN       1.0   

   Count_CL  ...  TransactionsDeb_CA  TransactionsDebCash_Card  \
1       NaN  ...                 1.0                       0.0   
2       NaN  ...                 6.0                       0.0   
3       1.0  ...                38.0                       1.0   

   TransactionsDebCashless_Card  TransactionsDeb_PaymentOrder  Sale_MF  \
1                           0.0                           1.0      0.0   
2                           0.0                           1.0      0.0   
3                          26.0                          11.0      1.0   

   Sale_CC  Sale_CL  Rev

In [5]:
print("*" * 30 + "After train_val processing" + "*" * 30 + "\n")
print(train_val.head(3))
print(f"\n{train_val.shape=}")
print("\n")

print("*" * 30 + "After test processing" + "*" * 30 + "\n")
print(test.head(3))
print(f"\n{test.shape=}")

******************************After train_val processing******************************

   Client  Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  \
1    1217    1   38     165         1       0.0       0.0        0.0   
2     850    0   49      44         1       0.0       0.0        0.0   
3    1473    1   54      34         1       1.0       0.0        0.0   

   Count_CC  Count_CL  ...  TransactionsDebCash_Card  \
1       0.0       0.0  ...                       0.0   
2       0.0       0.0  ...                       0.0   
3       1.0       1.0  ...                       1.0   

   TransactionsDebCashless_Card  TransactionsDeb_PaymentOrder  Sale_MF  \
1                           0.0                           1.0      0.0   
2                           0.0                           1.0      0.0   
3                          26.0                          11.0      1.0   

   Sale_CC  Sale_CL  Revenue_MF  Revenue_CC  Revenue_CL  VolumeCredDebRatio  
1      0.0      0.0    

In [6]:
feature_cols = get_feature_cols(train_val)
train_val[feature_cols].isnull().sum()

Sex                             0
Age                             0
Tenure                          0
Count_CA                        0
Count_SA                        0
Count_MF                        0
Count_OVD                       0
Count_CC                        0
Count_CL                        0
ActBal_CA                       0
ActBal_SA                       0
ActBal_MF                       0
ActBal_OVD                      0
ActBal_CC                       0
ActBal_CL                       0
VolumeCred_CA                   0
TransactionsCred                0
VolumeDeb                       0
VolumeDebCash_Card              0
VolumeDebCashless_Card          0
VolumeDeb_PaymentOrder          0
TransactionsDeb                 0
TransactionsDebCash_Card        0
TransactionsDebCashless_Card    0
TransactionsDeb_PaymentOrder    0
VolumeCredDebRatio              0
dtype: int64
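
The printed frames show two visible effects of processing: `Sex` changes from `M`/`F` to `1`/`0`, and missing product counts become `0.0`. The source of `process_features2` is not shown, so the sketch below is only a hypothetical stand-in that reproduces those two effects. `toy_process` and `fit_sex_mapping` are invented names, and a plain dictionary replaces whatever encoder the real helper uses.

```python
import numpy as np
import pandas as pd

def fit_sex_mapping(series):
    """Learn a category -> integer mapping from a column (sorted for determinism)."""
    return {label: code for code, label in enumerate(sorted(series.dropna().unique()))}

def toy_process(df, mapping=None):
    """Hypothetical stand-in for process_features2: encode Sex, zero-fill numeric gaps."""
    out = df.copy()
    if mapping is None:
        # Fit only when no mapping is supplied, i.e. on the training set
        mapping = fit_sex_mapping(out["Sex"])
    out["Sex"] = out["Sex"].map(mapping)
    numeric_cols = out.columns.drop("Sex")
    out[numeric_cols] = out[numeric_cols].fillna(0)
    return out, mapping

# Invented toy rows, not taken from the dataset
train = pd.DataFrame({"Sex": ["M", "F", "M"], "Count_SA": [np.nan, np.nan, 1.0]})
target = pd.DataFrame({"Sex": ["F", "M"], "Count_SA": [2.0, np.nan]})

train_p, mapping = toy_process(train)
# Reuse the mapping fitted on the training rows for the targeting rows
target_p, _ = toy_process(target, mapping=mapping)
print(mapping)
print(train_p.isnull().sum().sum(), target_p.isnull().sum().sum())
```

Fitting the mapping once and passing it to the second call mirrors the `le=sex_label_encoder` argument used in the notebook, so a category cannot receive different codes in the two sets.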

In [7]:
target_columns = classification_target_columns + regression_target_columns
train_val[target_columns].isnull().sum()

Sale_CL       0
Sale_CC       0
Sale_MF       0
Revenue_CL    0
Revenue_CC    0
Revenue_MF    0
dtype: int64

In [8]:
test[feature_cols].isnull().sum()

Sex                             0
Age                             0
Tenure                          0
Count_CA                        0
Count_SA                        0
Count_MF                        0
Count_OVD                       0
Count_CC                        0
Count_CL                        0
ActBal_CA                       0
ActBal_SA                       0
ActBal_MF                       0
ActBal_OVD                      0
ActBal_CC                       0
ActBal_CL                       0
VolumeCred_CA                   0
TransactionsCred                0
VolumeDeb                       0
VolumeDebCash_Card              0
VolumeDebCashless_Card          0
VolumeDeb_PaymentOrder          0
TransactionsDeb                 0
TransactionsDebCash_Card        0
TransactionsDebCashless_Card    0
TransactionsDeb_PaymentOrder    0
VolumeCredDebRatio              0
dtype: int64

In [9]:
# Features and targets; the null checks above show no missing features, so fillna(0) is only a safeguard
X_train_val = train_val[feature_cols].fillna(0)
# y_train_val = (train_val[target_columns] > 0).astype(int)
y_train_val = train_val[target_columns]

# Hold out 10% of the labelled clients for validation
random_state = 42
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.1, random_state=random_state)

print("X_train.shape, X_val.shape\n", X_train.shape, X_val.shape)

X_train.head(3)


X_train.shape, X_val.shape
 (870, 26) (97, 26)


| | Sex | Age | Tenure | Count_CA | Count_SA | Count_MF | Count_OVD | Count_CC | Count_CL | ActBal_CA | ... | TransactionsCred | VolumeDeb | VolumeDebCash_Card | VolumeDebCashless_Card | VolumeDeb_PaymentOrder | TransactionsDeb | TransactionsDebCash_Card | TransactionsDebCashless_Card | TransactionsDeb_PaymentOrder | VolumeCredDebRatio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 382 | 0 | 48 | 37 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2262.178929 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 276 | 0 | 52 | 115 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1704.711786 | ... | 3.0 | 556.364286 | 125.0 | 358.042857 | 41.678571 | 14.0 | 2.0 | 7.0 | 2.0 | 0.96116 |
| 938 | 1 | 34 | 237 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1161.505714 | ... | 10.0 | 8567.436429 | 0.0 | 84.909643 | 8477.5 | 14.0 | 0.0 | 1.0 | 6.0 | 0.869058 |


In [10]:
print( "y_train.shape, y_val.shape\n", y_train.shape, y_val.shape )
y_train.head(3)

y_train.shape, y_val.shape
 (870, 6) (97, 6)


| | Sale_CL | Sale_CC | Sale_MF | Revenue_CL | Revenue_CC | Revenue_MF |
|---|---|---|---|---|---|---|
| 382 | 0.0 | 1.0 | 0.0 | 0.0 | 4.035714 | 0.0 |
| 276 | 1.0 | 1.0 | 0.0 | 12.208214 | 4.928571 | 0.0 |
| 938 | 1.0 | 1.0 | 1.0 | 133.275357 | 2.679286 | 0.368214 |


## Section 2: Models training

### Revenue regression models

In [11]:
models = {}
r2_scores = {}
rmse_scores = {}

# Separate revenue regression models for each product
for product in ['CL', 'CC', 'MF']:
    target_col = f'Revenue_{product}'
    model, r2, rmse, best_params, study = train_revenue_model_xgb_optuna(X_train, X_val, y_train[target_col], y_val[target_col])

    models[f"{product}_revenue"] = model
    r2_scores[product] = r2
    rmse_scores[product] = rmse
    print(f"{product} Revenue Model - R²: {r2:.3f}, RMSE: {rmse:.3f}")



[I 2025-07-14 17:25:43,865] A new study created in memory with name: no-name-5bdf30d2-3f54-46dc-8af4-8cb1e4ae6878
[I 2025-07-14 17:25:44,238] Trial 0 finished with value: -0.9467503090376359 and parameters: {'n_estimators': 233, 'max_depth': 5, 'learning_rate': 0.2158819850161675, 'subsample': 0.6740871291325753, 'colsample_bytree': 0.9388010112847059, 'reg_alpha': 0.004186162613434377, 'reg_lambda': 3.3607295561600465, 'min_child_weight': 6, 'gamma': 1.9138117627032525}. Best is trial 0 with value: -0.9467503090376359.
[I 2025-07-14 17:25:44,581] Trial 1 finished with value: -0.9765135025584841 and parameters: {'n_estimators': 438, 'max_depth': 10, 'learning_rate': 0.11834449784093612, 'subsample': 0.9073022415639165, 'colsample_bytree': 0.9969357223306103, 'reg_alpha': 0.003088075111370155, 'reg_lambda': 0.20826573524426575, 'min_child_weight': 5, 'gamma': 4.881964465299612}. Best is trial 0 with value: -0.9467503090376359.
[I 2025-07-14 17:25:44,722] Trial 2 finished with value: -0.

CL Revenue Model - R²: -0.094, RMSE: 4.705


[I 2025-07-14 17:26:04,245] Trial 1 finished with value: -2.944846228504548 and parameters: {'n_estimators': 117, 'max_depth': 7, 'learning_rate': 0.030494579356807133, 'subsample': 0.7055692805141125, 'colsample_bytree': 0.7132398152958627, 'reg_alpha': 0.07566819526976863, 'reg_lambda': 0.0011846293068286213, 'min_child_weight': 2, 'gamma': 3.070302335828109}. Best is trial 1 with value: -2.944846228504548.
[I 2025-07-14 17:26:04,949] Trial 2 finished with value: -1.9957983141843179 and parameters: {'n_estimators': 418, 'max_depth': 10, 'learning_rate': 0.018489316791012088, 'subsample': 0.9828855243143112, 'colsample_bytree': 0.7266053747406196, 'reg_alpha': 0.019019003353113385, 'reg_lambda': 0.12230643402660542, 'min_child_weight': 8, 'gamma': 3.3874925204287463}. Best is trial 2 with value: -1.9957983141843179.
[I 2025-07-14 17:26:05,172] Trial 3 finished with value: -1.049283731213798 and parameters: {'n_estimators': 387, 'max_depth': 3, 'learning_rate': 0.015932154421762296, 's

CC Revenue Model - R²: -0.051, RMSE: 5.099


[I 2025-07-14 17:26:25,568] Trial 1 finished with value: -0.2519119614843566 and parameters: {'n_estimators': 494, 'max_depth': 9, 'learning_rate': 0.04813835728782724, 'subsample': 0.6729746416445471, 'colsample_bytree': 0.6487617026311713, 'reg_alpha': 0.0011479097104733205, 'reg_lambda': 1.4562734159192887, 'min_child_weight': 3, 'gamma': 0.07082899734942194}. Best is trial 0 with value: -0.12470052718978009.
[I 2025-07-14 17:26:25,843] Trial 2 finished with value: -0.34806516674852395 and parameters: {'n_estimators': 349, 'max_depth': 9, 'learning_rate': 0.16306168812867136, 'subsample': 0.9630875753563393, 'colsample_bytree': 0.7868535197738971, 'reg_alpha': 8.62688718091077, 'reg_lambda': 0.2881063729554994, 'min_child_weight': 2, 'gamma': 2.534351446194086}. Best is trial 0 with value: -0.12470052718978009.
[I 2025-07-14 17:26:26,254] Trial 3 finished with value: -0.07252185126814292 and parameters: {'n_estimators': 209, 'max_depth': 10, 'learning_rate': 0.0240633188642729, 'sub

MF Revenue Model - R²: 0.002, RMSE: 8.557


In [12]:
for product in ['CL', 'CC', 'MF']:
    print(f"{product} Revenue Model - R²: {r2_scores[product]:.3f}, RMSE: {rmse_scores[product]:.3f}")

CL Revenue Model - R²: -0.094, RMSE: 4.705
CC Revenue Model - R²: -0.051, RMSE: 5.099
MF Revenue Model - R²: 0.002, RMSE: 8.557
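
The R² values above are at or below zero, which means the revenue models do no better on the validation set than a constant prediction of the validation mean. The sketch below shows why, using the standard definitions of R² and RMSE. It assumes `train_revenue_model_xgb_optuna` reports these metrics in the usual way, which the notebook does not confirm, and the toy values are invented.

```python
import numpy as np

def r2_and_rmse(y_true, y_pred):
    """Standard R^2 (1 - SS_res / SS_tot) and root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return r2, rmse

# Zero-inflated toy revenues: most clients generate nothing, one generates a lot
y_true = [0.0, 0.0, 0.0, 12.0]

# Predicting the mean (3.0) for everyone gives R^2 = 0 by construction
r2_mean, rmse_mean = r2_and_rmse(y_true, [3.0, 3.0, 3.0, 3.0])

# Any prediction with a larger squared error than the mean gives R^2 < 0
r2_worse, rmse_worse = r2_and_rmse(y_true, [6.0, 6.0, 6.0, 6.0])

print(f"mean baseline: R2={r2_mean:.3f}, RMSE={rmse_mean:.3f}")
print(f"worse model:   R2={r2_worse:.3f}, RMSE={rmse_worse:.3f}")
```

With revenue targets that are mostly zero, as in the `y_train` sample shown earlier, a near-zero R² is a signal to treat the predicted revenues as rough estimates.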


### Sales classification models

In [13]:
f1_scores = {}

# Separate sales classification models for each product
for product in ['CL', 'CC', 'MF']:
    target_col = f'Sale_{product}'
    model, f1, roc_auc, best_params, study = train_sales_model_xgb_optuna_f1(X_train, X_val, y_train[target_col], y_val[target_col])
    # model, f1, roc_auc, best_params, study  = train_sales_model_xgb_optuna(X_train, X_val, y_train[target_col], y_val[target_col])

    models[f"{product}_sales"] = model
    f1_scores[product] = f1
    print(f"{product} Sales Model - f1: {f1:.3f}")

[I 2025-07-14 17:26:54,293] A new study created in memory with name: no-name-6b2e0ba1-b67b-4f20-9173-36151191e262
[I 2025-07-14 17:26:54,560] Trial 0 finished with value: 0.40816326530612246 and parameters: {'n_estimators': 53, 'max_depth': 10, 'learning_rate': 0.03644186211869349, 'subsample': 0.623440335384945, 'colsample_bytree': 0.7724426511614575, 'reg_alpha': 7.450033684982631, 'reg_lambda': 0.09293471239370221, 'min_child_weight': 1, 'gamma': 1.9935224463871077}. Best is trial 0 with value: 0.40816326530612246.
[I 2025-07-14 17:26:54,762] Trial 1 finished with value: 0.39285714285714285 and parameters: {'n_estimators': 159, 'max_depth': 3, 'learning_rate': 0.024514642659642732, 'subsample': 0.9883032737980759, 'colsample_bytree': 0.7020028100122906, 'reg_alpha': 4.9407493756040415, 'reg_lambda': 3.1109403402591136, 'min_child_weight': 10, 'gamma': 0.7152251491260575}. Best is trial 0 with value: 0.40816326530612246.
[I 2025-07-14 17:26:55,032] Trial 2 finished with value: 0.3703

CL Sales Model - f1: 0.533


[I 2025-07-14 17:27:34,128] Trial 0 finished with value: 0.4583333333333333 and parameters: {'n_estimators': 438, 'max_depth': 4, 'learning_rate': 0.010761173599558235, 'subsample': 0.8906350267978917, 'colsample_bytree': 0.6777445262396241, 'reg_alpha': 1.4757437453814017, 'reg_lambda': 0.13948114705053055, 'min_child_weight': 5, 'gamma': 0.9859674941572694}. Best is trial 0 with value: 0.4583333333333333.
[I 2025-07-14 17:27:34,392] Trial 1 finished with value: 0.391304347826087 and parameters: {'n_estimators': 169, 'max_depth': 8, 'learning_rate': 0.15674334235802997, 'subsample': 0.632649543593704, 'colsample_bytree': 0.6477281185836259, 'reg_alpha': 0.0021781228860091113, 'reg_lambda': 0.5671194986298225, 'min_child_weight': 6, 'gamma': 0.1227479979598356}. Best is trial 0 with value: 0.4583333333333333.
[I 2025-07-14 17:27:34,711] Trial 2 finished with value: 0.46153846153846156 and parameters: {'n_estimators': 319, 'max_depth': 8, 'learning_rate': 0.01692913692733179, 'subsample

CC Sales Model - f1: 0.549


[I 2025-07-14 17:28:06,205] Trial 0 finished with value: 0.2222222222222222 and parameters: {'n_estimators': 260, 'max_depth': 6, 'learning_rate': 0.023064917753738135, 'subsample': 0.8782971379870069, 'colsample_bytree': 0.851582049733228, 'reg_alpha': 0.00544705428036118, 'reg_lambda': 0.36736114599693587, 'min_child_weight': 4, 'gamma': 0.23258189858263723}. Best is trial 0 with value: 0.2222222222222222.
[I 2025-07-14 17:28:06,356] Trial 1 finished with value: 0.3076923076923077 and parameters: {'n_estimators': 108, 'max_depth': 10, 'learning_rate': 0.2742001991501661, 'subsample': 0.9043303878053675, 'colsample_bytree': 0.9828188548395432, 'reg_alpha': 0.06430617785602319, 'reg_lambda': 1.2967572280412443, 'min_child_weight': 10, 'gamma': 1.2101602635189672}. Best is trial 1 with value: 0.3076923076923077.
[I 2025-07-14 17:28:06,691] Trial 2 finished with value: 0.25 and parameters: {'n_estimators': 163, 'max_depth': 9, 'learning_rate': 0.014687817126897481, 'subsample': 0.6281181

MF Sales Model - f1: 0.381


In [14]:
for product in ['CL', 'CC', 'MF']:
    print(f"{product} Sales Model - f1: {f1_scores[product]:.3f}")

CL Sales Model - f1: 0.533
CC Sales Model - f1: 0.549
MF Sales Model - f1: 0.381
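
The helper's name suggests that F1 is the quantity Optuna maximizes, although its source is not shown. As a reminder of what the score measures, the sketch below computes F1 from predicted probabilities at a chosen decision threshold. The labels, probabilities and thresholds are invented, and whether the helper tunes the threshold at all is an assumption.

```python
def f1_at_threshold(y_true, proba, threshold=0.5):
    """F1 = harmonic mean of precision and recall after thresholding probabilities."""
    pred = [1 if p >= threshold else 0 for p in proba]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented validation labels and predicted purchase probabilities
y_true = [1, 0, 1, 0, 0, 1]
proba = [0.9, 0.6, 0.4, 0.2, 0.1, 0.7]

f1_default = f1_at_threshold(y_true, proba, threshold=0.5)
f1_lower = f1_at_threshold(y_true, proba, threshold=0.35)
print(f"F1 at 0.50: {f1_default:.3f}")
print(f"F1 at 0.35: {f1_lower:.3f}")
```

In this toy case, lowering the threshold recovers a missed buyer without adding false positives, so F1 rises. On real data the trade-off has to be checked on held-out clients.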


In [15]:
print( models.keys() )

dict_keys(['CL_revenue', 'CC_revenue', 'MF_revenue', 'CL_sales', 'CC_sales', 'MF_sales'])


## Section 3: Clients targeting

### Propensity scoring

In [16]:
for product in ['CL', 'CC', 'MF']:
    test[f'p_{product.lower()}'] = predict_propensity(models[f"{product}_sales"] , test, feature_cols)

test[['p_cl', 'p_cc', 'p_mf']]

| | p_cl | p_cc | p_mf |
|---|---|---|---|
| 0 | 0.603545 | 0.242269 | 0.302070 |
| 6 | 0.577690 | 0.234166 | 0.493637 |
| 9 | 0.646792 | 0.491938 | 0.253135 |
| 10 | 0.364351 | 0.200187 | 0.442161 |
| 13 | 0.484572 | 0.311697 | 0.492773 |
| ... | ... | ... | ... |
| 1598 | 0.373833 | 0.407459 | 0.306492 |
| 1600 | 0.344878 | 0.292213 | 0.198320 |
| 1608 | 0.477667 | 0.550279 | 0.331795 |
| 1610 | 0.489226 | 0.463510 | 0.297550 |


### Predict revenues

In [17]:
predicted_revenues_df = calculate_revenues(test, models)
predicted_revenues_df.head()


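| | Client | Sex | Age | Tenure | Count_CA | Count_SA | Count_MF | Count_OVD | Count_CC | Count_CL | ... | Sale_MF | Sale_CC | Sale_CL | Revenue_MF | Revenue_CC | Revenue_CL | VolumeCredDebRatio | p_cl | p_cc | p_mf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 909 | 1 | 21 | 27 | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | ... | NaN | NaN | NaN | 1.964688 | 2.460992 | 3.581117 | 1.747104 | 0.603545 | 0.242269 | 0.30207 |
| 6 | 699 | 1 | 37 | 175 | 1 | 0.0 | 4.0 | 1.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 1.310554 | 2.70113 | 4.675503 | 1.560034 | 0.57769 | 0.234166 | 0.493637 |
| 9 | 528 | 0 | 19 | 70 | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 1.768972 | 3.125132 | 3.907906 | 1.114116 | 0.646792 | 0.491938 | 0.253135 |
| 10 | 1145 | 1 | 61 | 45 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 2.69741 | 2.099485 | 2.943126 | 30.084959 | 0.364351 | 0.200187 | 0.442161 |
| 13 | 517 | 0 | 41 | 28 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 1.883629 | 1.865907 | 3.829043 | 1.020149 | 0.484572 | 0.311697 | 0.492773 |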


### Prepare list of clients to target

In [18]:
targets, forecast, df_targets = run_full_targeting_pipeline(predicted_revenues_df, top_frac=0.15)
print_targeting_summary(targets, forecast)

Stage 3,4: Assigning best offers...
Stage 5: Selecting top targets...
Stage 6: Calculating revenue forecast...

=== TARGETING SUMMARY ===
Total clients targeted: 96
Total expected revenue: $389.37
Average expected revenue per client: $2.08
Lift vs baseline targeting: 95.1%

Offer distribution:
  CL: 59 clients (61.5%)
  CC: 24 clients (25.0%)
  MF: 13 clients (13.5%)


In [19]:
targets.head()

| | Client | Best_Offer | Expected_Revenue | Age | Tenure |
|---|---|---|---|---|---|
| 693 | 766 | MF | 11.801003 | 32 | 95 |
| 933 | 1093 | MF | 6.818237 | 22 | 181 |
| 967 | 265 | MF | 6.486712 | 26 | 179 |
| 1253 | 340 | MF | 6.014675 | 76 | 80 |
| 830 | 153 | CC | 5.961412 | 67 | 92 |


In [20]:
# save targets to targeted_clients.csv
df_targets[['Client', 'Best_Offer', 'Expected_Revenue']].to_csv('targeted_clients.csv', index=False)
