# Client Targeting

* **Section 0: Load dataset**
* **Section 1: Prepare datasets**
  * Data split into (1) Training and (2) Client Targeting sets
  * Apply data processing
  * Prepare training datasets: further split the training set into (1) train and (2) validation sets
* **Models training**
  * Revenue regression models
  * Sales classification models
* **Clients targeting**
  * Propensity scoring
  * Predict revenues
  * Prepare list of clients to target

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

from utlis.data_utils import load_data, merge_data, process_features1, process_features2, get_feature_cols
from utlis.model_utils import train_revenue_model_xgb_optuna, train_sales_model_xgb_optuna_f1, predict_propensity
from utlis.targeting import calculate_revenues, run_full_targeting_pipeline, print_targeting_summary, assign_best_offer


## Section 0: Load dataset

In [2]:
print("1. Loading data...")
file = 'DataScientist_CaseStudy_Dataset.xlsx'
soc_dem, products, inflow, sales = load_data(file)
df = merge_data(soc_dem, products, inflow, sales)


1. Loading data...


## Section 1: Prepare datasets

### Data split into (1) Training and (2) Client Targeting sets

In [3]:
classification_target_columns = ['Sale_CL', 'Sale_CC', 'Sale_MF']
regression_target_columns = ['Revenue_CL','Revenue_CC','Revenue_MF']

# Training set: clients with at least one recorded target value
# (.copy() avoids chained-assignment warnings during later processing)
train_val = df.dropna(subset=classification_target_columns+regression_target_columns, how='all').copy()

# Client Targeting set: clients with no recorded target values at all
test = df[df[classification_target_columns+regression_target_columns].isna().all(axis=1)].copy()

print(f"Training set: {train_val.shape[0]} clients ({train_val.shape[0]/len(df)*100:.1f}%)")
print(f"Client Targeting set: {test.shape[0]} clients ({test.shape[0]/len(df)*100:.1f}%)")

Training set: 969 clients (60.0%)
Client Targeting set: 646 clients (40.0%)
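
The split above treats a client as unlabelled only when every target column is missing. As a minimal sketch on a toy frame (hypothetical values, two target columns instead of six, not the case-study data), the two selections are complementary, so each client lands in exactly one set:

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical values: clients 2 and 4 have no recorded outcome.
toy = pd.DataFrame({
    "Client": [1, 2, 3, 4],
    "Sale_CL": [0.0, np.nan, 1.0, np.nan],
    "Revenue_CL": [0.0, np.nan, 9.3, np.nan],
})
targets = ["Sale_CL", "Revenue_CL"]

# True where no target value is recorded at all.
unlabelled = toy[targets].isna().all(axis=1)

train_part = toy[~unlabelled].copy()    # same rows as dropna(how="all")
target_part = toy[unlabelled].copy()    # clients to score later

# Every client is in exactly one of the two sets.
assert len(train_part) + len(target_part) == len(toy)
assert train_part.equals(toy.dropna(subset=targets, how="all"))
```

Using a single boolean mask for both sides makes it harder for the two sets to drift apart if the list of target columns changes.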


### Apply data processing

In [4]:
print( "*"*30 + "Before train_val processing" + "*"*30 + "\n")
print( train_val.head(3))
print( f"\n{train_val.shape=}")
train_val, sex_label_encoder = process_features2(train_val)
print("\n")

print( "*"*30 + "Before test processing" + "*"*30 + "\n")
print( test.head(3))
print( f"\n{test.shape=}")
test, _ = process_features2(test, le=sex_label_encoder)

******************************Before train_val processing******************************

   Client Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  Count_CC  \
1    1217   M   38     165         1       NaN       NaN        NaN       NaN   
2     850   F   49      44         1       NaN       NaN        NaN       NaN   
3    1473   M   54      34         1       1.0       NaN        NaN       1.0   

   Count_CL  ...  TransactionsDeb_CA  TransactionsDebCash_Card  \
1       NaN  ...                 1.0                       0.0   
2       NaN  ...                 6.0                       0.0   
3       1.0  ...                38.0                       1.0   

   TransactionsDebCashless_Card  TransactionsDeb_PaymentOrder  Sale_MF  \
1                           0.0                           1.0      0.0   
2                           0.0                           1.0      0.0   
3                          26.0                          11.0      1.0   

   Sale_CC  Sale_CL  Rev

In [5]:
print( "*"*30 + "After train_val processing" + "*"*30 + "\n")
print( train_val.head(3))
print( f"\n{train_val.shape=}")
print("\n")

print( "*"*30 + "After test processing" + "*"*30 + "\n")
print( test.head(3))
print( f"\n{test.shape=}")

******************************After train_val processing******************************

   Client  Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  \
1    1217    1   38     165         1       NaN       NaN        NaN   
2     850    0   49      44         1       NaN       NaN        NaN   
3    1473    1   54      34         1       1.0       NaN        NaN   

   Count_CC  Count_CL  ...  TransactionsDebCash_Card  \
1       NaN       NaN  ...                       0.0   
2       NaN       NaN  ...                       0.0   
3       1.0       1.0  ...                       1.0   

   TransactionsDebCashless_Card  TransactionsDeb_PaymentOrder  Sale_MF  \
1                           0.0                           1.0      0.0   
2                           0.0                           1.0      0.0   
3                          26.0                          11.0      1.0   

   Sale_CC  Sale_CL  Revenue_MF  Revenue_CC  Revenue_CL  VolumeCredDebRatio  
1      0.0      0.0    

In [6]:
feature_cols = get_feature_cols(train_val)
train_val[feature_cols].isnull().sum()

Sex                               0
Age                               0
Tenure                            0
Count_CA                          0
Count_SA                        703
Count_MF                        783
Count_OVD                       716
Count_CC                        857
Count_CL                        888
ActBal_CA                         0
ActBal_SA                       703
ActBal_MF                       783
ActBal_OVD                      716
ActBal_CC                       857
ActBal_CL                       888
VolumeCred_CA                     0
TransactionsCred                  0
VolumeDeb                         0
VolumeDebCash_Card                0
VolumeDebCashless_Card            0
VolumeDeb_PaymentOrder            0
TransactionsDeb                   0
TransactionsDebCash_Card          0
TransactionsDebCashless_Card      0
TransactionsDeb_PaymentOrder      0
VolumeCredDebRatio                0
dtype: int64

In [7]:
target_columns = classification_target_columns + regression_target_columns
train_val[target_columns].isnull().sum()

Sale_CL       0
Sale_CC       0
Sale_MF       0
Revenue_CL    0
Revenue_CC    0
Revenue_MF    0
dtype: int64

In [8]:
test[feature_cols].isnull().sum()

Sex                               0
Age                               0
Tenure                            0
Count_CA                          0
Count_SA                        484
Count_MF                        523
Count_OVD                       477
Count_CC                        585
Count_CL                        589
ActBal_CA                         0
ActBal_SA                       484
ActBal_MF                       523
ActBal_OVD                      477
ActBal_CC                       585
ActBal_CL                       589
VolumeCred_CA                     0
TransactionsCred                  0
VolumeDeb                         0
VolumeDebCash_Card                0
VolumeDebCashless_Card            0
VolumeDeb_PaymentOrder            0
TransactionsDeb                   0
TransactionsDebCash_Card          0
TransactionsDebCashless_Card      0
TransactionsDeb_PaymentOrder      0
VolumeCredDebRatio                0
dtype: int64

In [9]:
X_train_val = train_val[feature_cols].fillna(0)
# y_train_val = (train_val[target_columns] > 0).astype(int)
y_train_val = train_val[target_columns]

random_state=42
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.2, random_state=random_state)

print( "X_train.shape, X_val.shape\n", X_train.shape, X_val.shape )

X_train.head(3)


X_train.shape, X_val.shape
 (773, 26) (194, 26)


Unnamed: 0,Sex,Age,Tenure,Count_CA,Count_SA,Count_MF,Count_OVD,Count_CC,Count_CL,ActBal_CA,...,TransactionsCred,VolumeDeb,VolumeDebCash_Card,VolumeDebCashless_Card,VolumeDeb_PaymentOrder,TransactionsDeb,TransactionsDebCash_Card,TransactionsDebCashless_Card,TransactionsDeb_PaymentOrder,VolumeCredDebRatio
1215,1,39,55,1,0.0,0.0,1.0,1.0,0.0,19.862857,...,30.0,6410.215,1600.0,822.523214,386.714286,76.0,5.0,36.0,11.0,0.962953
305,0,27,169,1,0.0,0.0,0.0,0.0,0.0,111.173214,...,3.0,699.967857,209.82,327.116786,151.785714,50.0,10.0,29.0,1.0,0.985672
503,0,45,149,1,0.0,0.0,0.0,0.0,0.0,0.0,...,3.0,449.363929,357.142857,70.971071,17.857143,11.0,3.0,6.0,1.0,0.560157


In [10]:
print( "y_train.shape, y_val.shape\n", y_train.shape, y_val.shape )
y_train.head(3)

y_train.shape, y_val.shape
 (773, 6) (194, 6)


Unnamed: 0,Sale_CL,Sale_CC,Sale_MF,Revenue_CL,Revenue_CC,Revenue_MF
1215,0.0,1.0,0.0,0.0,2.466786,0.0
305,1.0,0.0,0.0,9.285714,0.0,0.0
503,0.0,0.0,1.0,0.0,0.0,56.139821


In [11]:
# scaler, X_train_scaled, X_val_scaled = scale_features(X_train, X_val)

## Models training

### Revenue regression models

In [12]:
models = {}
r2_scores = {}
rmse_scores = {}

# Separate revenue regression models for each product
for product in ['CL', 'CC', 'MF']:
    target_col = f'Revenue_{product}'
    model, r2, rmse, best_params, study = train_revenue_model_xgb_optuna(X_train, X_val, y_train[target_col], y_val[target_col])

    models[f"{product}_revenue"] = model
    r2_scores[product] = r2
    rmse_scores[product] = rmse
    print(f"{product} Revenue Model - R²: {r2:.3f}, RMSE: {rmse:.3f}")



[I 2025-07-14 11:58:07,828] A new study created in memory with name: no-name-65f873ee-9adc-4b7a-b1ad-28d7c76d0938
[I 2025-07-14 11:58:08,089] Trial 0 finished with value: 0.07635917174298279 and parameters: {'n_estimators': 58, 'max_depth': 8, 'learning_rate': 0.050630375842980815, 'subsample': 0.929379625717388, 'colsample_bytree': 0.8160099579958033, 'reg_alpha': 0.38947028725210014, 'reg_lambda': 0.04712005529372043, 'min_child_weight': 4, 'gamma': 4.717517247485988}. Best is trial 0 with value: 0.07635917174298279.
[I 2025-07-14 11:58:08,223] Trial 1 finished with value: 0.029050779813798555 and parameters: {'n_estimators': 78, 'max_depth': 9, 'learning_rate': 0.1257978769174635, 'subsample': 0.9971615534671009, 'colsample_bytree': 0.6907495100199703, 'reg_alpha': 0.0010533031705722434, 'reg_lambda': 0.7212429153865864, 'min_child_weight': 4, 'gamma': 2.586003779671817}. Best is trial 0 with value: 0.07635917174298279.
[I 2025-07-14 11:58:08,398] Trial 2 finished with value: 0.0373

CL Revenue Model - R²: 0.087, RMSE: 11.012


[I 2025-07-14 11:58:50,648] Trial 0 finished with value: -0.8854987866440887 and parameters: {'n_estimators': 377, 'max_depth': 7, 'learning_rate': 0.029287382234983977, 'subsample': 0.7533446992844143, 'colsample_bytree': 0.7661319536379971, 'reg_alpha': 0.0028670714300972674, 'reg_lambda': 0.0030719942997942283, 'min_child_weight': 10, 'gamma': 2.546926662142173}. Best is trial 0 with value: -0.8854987866440887.
[I 2025-07-14 11:58:50,944] Trial 1 finished with value: -0.4631814667426266 and parameters: {'n_estimators': 147, 'max_depth': 10, 'learning_rate': 0.02257683945103031, 'subsample': 0.9390396768073028, 'colsample_bytree': 0.7659481916067465, 'reg_alpha': 0.018654611946669657, 'reg_lambda': 0.6080019973716962, 'min_child_weight': 10, 'gamma': 4.711630698113828}. Best is trial 1 with value: -0.4631814667426266.
[I 2025-07-14 11:58:51,181] Trial 2 finished with value: -1.9119889964944647 and parameters: {'n_estimators': 183, 'max_depth': 5, 'learning_rate': 0.10873469432446281,

CC Revenue Model - R²: -0.033, RMSE: 6.908


[I 2025-07-14 11:59:16,196] Trial 0 finished with value: -0.2627495047173545 and parameters: {'n_estimators': 377, 'max_depth': 4, 'learning_rate': 0.28278959389360914, 'subsample': 0.7483328747965503, 'colsample_bytree': 0.6318682935321523, 'reg_alpha': 0.008809860269127553, 'reg_lambda': 1.410769738610234, 'min_child_weight': 6, 'gamma': 2.571137555266106}. Best is trial 0 with value: -0.2627495047173545.
[I 2025-07-14 11:59:16,567] Trial 1 finished with value: -0.5154404948766926 and parameters: {'n_estimators': 296, 'max_depth': 9, 'learning_rate': 0.2191137661200297, 'subsample': 0.8990283232726969, 'colsample_bytree': 0.9293687584381549, 'reg_alpha': 0.0017886464442892513, 'reg_lambda': 0.0010875115195800107, 'min_child_weight': 7, 'gamma': 1.119596720507321}. Best is trial 0 with value: -0.2627495047173545.
[I 2025-07-14 11:59:16,999] Trial 2 finished with value: -0.2097511323840866 and parameters: {'n_estimators': 200, 'max_depth': 7, 'learning_rate': 0.02305474012045973, 'subs

MF Revenue Model - R²: -0.009, RMSE: 7.042


In [13]:
for product in ['CL', 'CC', 'MF']:
    print(f"{product} Revenue Model - R²: {r2_scores[product]:.3f}, RMSE: {rmse_scores[product]:.3f}")

CL Revenue Model - R²: 0.087, RMSE: 11.012
CC Revenue Model - R²: -0.033, RMSE: 6.908
MF Revenue Model - R²: -0.009, RMSE: 7.042
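
Assuming the reported R² follows the standard definition (1 minus the ratio of residual to total sum of squares on the validation set), a value below zero, as for CC and MF, means the model does worse than always predicting the validation mean. The sketch below illustrates this with made-up numbers; the helper `r2` is written here for illustration and is not part of the notebook's utilities:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Made-up, zero-inflated revenues (most clients generate no revenue).
y_val = np.array([0.0, 0.0, 0.0, 12.0])

# Constant prediction equal to the validation mean.
mean_baseline = np.full_like(y_val, y_val.mean())

# Made-up predictions that miss the one high-revenue client.
noisy_model = np.array([5.0, 0.0, 6.0, 2.0])

baseline_r2 = r2(y_val, mean_baseline)   # zero by construction
model_r2 = r2(y_val, noisy_model)        # negative: worse than the mean
```

Read this way, only the CL revenue model (R² of 0.087) improves on a constant prediction, and only slightly.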


### Sales classification models

In [14]:
f1_scores = {}

# Separate sales classification models for each product
for product in ['CL', 'CC', 'MF']:
    target_col = f'Sale_{product}'
    model, f1, roc_auc, best_params, study = train_sales_model_xgb_optuna_f1(X_train, X_val, y_train[target_col], y_val[target_col])

    models[f"{product}_sales"] = model
    f1_scores[product] = f1
    print(f"{product} Sales Model - f1: {f1:.3f}")

[I 2025-07-14 11:59:54,601] A new study created in memory with name: no-name-65f748fd-4214-4258-acd4-bac9a413e5a1
[I 2025-07-14 11:59:55,196] Trial 0 finished with value: 0.4578313253012048 and parameters: {'n_estimators': 437, 'max_depth': 7, 'learning_rate': 0.05893086659147121, 'subsample': 0.8768436061963304, 'colsample_bytree': 0.8031912790096376, 'reg_alpha': 0.006664988792065458, 'reg_lambda': 1.009938095174021, 'min_child_weight': 8, 'gamma': 1.5431051947656105}. Best is trial 0 with value: 0.4578313253012048.
[I 2025-07-14 11:59:55,468] Trial 1 finished with value: 0.2777777777777778 and parameters: {'n_estimators': 422, 'max_depth': 3, 'learning_rate': 0.09410269854062671, 'subsample': 0.9736300157151326, 'colsample_bytree': 0.74059355567957, 'reg_alpha': 0.061087082353657875, 'reg_lambda': 0.9688921744947212, 'min_child_weight': 2, 'gamma': 4.082241143450537}. Best is trial 0 with value: 0.4578313253012048.
[I 2025-07-14 11:59:55,830] Trial 2 finished with value: 0.461538461

CL Sales Model - f1: 0.566


[I 2025-07-14 12:00:36,626] Trial 1 finished with value: 0.13114754098360656 and parameters: {'n_estimators': 258, 'max_depth': 4, 'learning_rate': 0.02005911728654972, 'subsample': 0.6794412372866075, 'colsample_bytree': 0.8251955244946714, 'reg_alpha': 0.002334533360604178, 'reg_lambda': 0.0033312963761114997, 'min_child_weight': 9, 'gamma': 0.9405195784030884}. Best is trial 0 with value: 0.345679012345679.
[I 2025-07-14 12:00:36,931] Trial 2 finished with value: 0.27692307692307694 and parameters: {'n_estimators': 314, 'max_depth': 6, 'learning_rate': 0.018909954626407773, 'subsample': 0.7152801512828398, 'colsample_bytree': 0.749792037416038, 'reg_alpha': 0.0014429962928079862, 'reg_lambda': 0.002699598439399374, 'min_child_weight': 5, 'gamma': 3.602537488197359}. Best is trial 0 with value: 0.345679012345679.
[I 2025-07-14 12:00:37,208] Trial 3 finished with value: 0.3170731707317073 and parameters: {'n_estimators': 385, 'max_depth': 9, 'learning_rate': 0.27758020113727677, 'subs

CC Sales Model - f1: 0.442


[I 2025-07-14 12:01:20,633] Trial 1 finished with value: 0.04878048780487805 and parameters: {'n_estimators': 308, 'max_depth': 7, 'learning_rate': 0.010481292733333964, 'subsample': 0.8820508279421813, 'colsample_bytree': 0.7964282365284809, 'reg_alpha': 0.005928556385065768, 'reg_lambda': 0.1661648829197056, 'min_child_weight': 5, 'gamma': 2.735242025810792}. Best is trial 1 with value: 0.04878048780487805.
[I 2025-07-14 12:01:21,158] Trial 2 finished with value: 0.13636363636363635 and parameters: {'n_estimators': 253, 'max_depth': 10, 'learning_rate': 0.026635511754888968, 'subsample': 0.8214684191979877, 'colsample_bytree': 0.9186020228352929, 'reg_alpha': 0.7025239787199016, 'reg_lambda': 0.00816783564041768, 'min_child_weight': 6, 'gamma': 0.2140509330840601}. Best is trial 2 with value: 0.13636363636363635.
[I 2025-07-14 12:01:21,441] Trial 3 finished with value: 0.05 and parameters: {'n_estimators': 287, 'max_depth': 3, 'learning_rate': 0.02897798684347736, 'subsample': 0.9648

MF Sales Model - f1: 0.340


In [15]:
for product in ['CL', 'CC', 'MF']:
    print(f"{product} Sales Model - f1: {f1_scores[product]:.3f}")

CL Sales Model - f1: 0.566
CC Sales Model - f1: 0.442
MF Sales Model - f1: 0.340


In [16]:
print( models.keys() )

dict_keys(['CL_revenue', 'CC_revenue', 'MF_revenue', 'CL_sales', 'CC_sales', 'MF_sales'])


## Clients targeting

### Propensity scoring

In [17]:
for product in ['CL', 'CC', 'MF']:
    test[f'p_{product.lower()}'] = predict_propensity(models[f"{product}_sales"] , test, feature_cols)

test[['p_cl', 'p_cc', 'p_mf']]

Unnamed: 0,p_cl,p_cc,p_mf
0,0.331993,0.114278,0.080387
6,0.482244,0.050744,0.228878
9,0.638992,0.504077,0.094329
10,0.226001,0.062412,0.194623
13,0.546649,0.230029,0.340796
...,...,...,...
1598,0.127554,0.253584,0.082231
1600,0.284582,0.051714,0.061692
1608,0.641556,0.141669,0.022725
1610,0.690307,0.279247,0.135603


### Predict revenues

In [18]:
predicted_revenues_df = calculate_revenues(test, models)
predicted_revenues_df.head()


Unnamed: 0,Client,Sex,Age,Tenure,Count_CA,Count_SA,Count_MF,Count_OVD,Count_CC,Count_CL,...,Sale_MF,Sale_CC,Sale_CL,Revenue_MF,Revenue_CC,Revenue_CL,VolumeCredDebRatio,p_cl,p_cc,p_mf
0,909,1,21,27,1,,,1.0,,1.0,...,,,,4.071222,3.535363,6.647781,1.747104,0.331993,0.114278,0.080387
6,699,1,37,175,1,,4.0,1.0,,,...,,,,1.7191,3.176804,4.528166,1.560034,0.482244,0.050744,0.228878
9,528,0,19,70,1,,,1.0,,,...,,,,3.177362,4.660526,8.768682,1.114116,0.638992,0.504077,0.094329
10,1145,1,61,45,1,,,,,,...,,,,4.950373,3.417247,6.934967,30.084959,0.226001,0.062412,0.194623
13,517,0,41,28,1,,,,,,...,,,,3.885494,3.497108,7.414339,1.020149,0.546649,0.230029,0.340796


In [19]:
# df_best_offer = assign_best_offer(predicted_revenues_df)
# df_best_offer.head(2)

### Prepare list of clients to target

In [20]:
targets, forecast, df_targets = run_full_targeting_pipeline(predicted_revenues_df, top_frac=0.15)
print_targeting_summary(targets, forecast)

Stage 3,4: Assigning best offers...
df.head(3)
    Client  Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  \
0     909    1   21      27         1       NaN       NaN        1.0   
6     699    1   37     175         1       NaN       4.0        1.0   
9     528    0   19      70         1       NaN       NaN        1.0   

   Count_CC  Count_CL  ...  Revenue_CL  VolumeCredDebRatio      p_cl  \
0       NaN       1.0  ...    6.647781            1.747104  0.331993   
6       NaN       NaN  ...    4.528166            1.560034  0.482244   
9       NaN       NaN  ...    8.768682            1.114116  0.638992   

       p_cc      p_mf  Expected_Revenue_CL  Expected_Revenue_CC  \
0  0.114278  0.080387             2.207016             0.404013   
6  0.050744  0.228878             2.183679             0.161204   
9  0.504077  0.094329             5.603118             2.349265   

   Expected_Revenue_MF  Best_Offer  Expected_Revenue  
0             0.327272          CL          2.2070
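
The printed columns suggest that each `Expected_Revenue_*` value is the propensity multiplied by the predicted revenue (for client 909, 0.331993 × 6.647781 ≈ 2.2070), and that `Best_Offer` is the product with the largest expected revenue. The internals of `run_full_targeting_pipeline` are not shown here, so the following is only a sketch of that logic under those assumptions. `select_targets` is a hypothetical helper, and rounding the top fraction up to at least one client is also an assumption. The input rows are the three clients printed above.

```python
import math
import pandas as pd

PRODUCTS = ["CL", "CC", "MF"]

def select_targets(df, top_frac=0.15):
    """Hypothetical helper: keep the top fraction of clients by expected revenue."""
    out = df.copy()
    # Expected revenue per product = purchase propensity * predicted revenue.
    for prod in PRODUCTS:
        out[f"Expected_Revenue_{prod}"] = out[f"p_{prod.lower()}"] * out[f"Revenue_{prod}"]
    exp_cols = [f"Expected_Revenue_{prod}" for prod in PRODUCTS]
    # Best offer = product with the highest expected revenue for that client.
    best_col = out[exp_cols].idxmax(axis=1)
    out["Best_Offer"] = best_col.str.replace("Expected_Revenue_", "", regex=False)
    out["Expected_Revenue"] = out[exp_cols].max(axis=1)
    # Assumption: round the target count up and keep at least one client.
    n_top = max(1, math.ceil(top_frac * len(out)))
    return out.nlargest(n_top, "Expected_Revenue")

# Propensities and predicted revenues copied from the notebook output above.
sample = pd.DataFrame({
    "Client": [909, 699, 528],
    "p_cl": [0.331993, 0.482244, 0.638992],
    "p_cc": [0.114278, 0.050744, 0.504077],
    "p_mf": [0.080387, 0.228878, 0.094329],
    "Revenue_CL": [6.647781, 4.528166, 8.768682],
    "Revenue_CC": [3.535363, 3.176804, 4.660526],
    "Revenue_MF": [4.071222, 1.719100, 3.177362],
})

top = select_targets(sample, top_frac=0.15)
```

Each client receives at most one offer in this sketch, which matches the single `Best_Offer` column in the pipeline output.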

In [21]:
targets.head()

Unnamed: 0,Client,Best_Offer,Expected_Revenue,Age,Tenure
760,828,CL,14.221807,27,221
166,217,CL,13.619447,17,152
709,504,CL,13.1002,20,55
1560,1119,CL,12.150184,22,177
1294,1073,CL,11.710987,4,9


In [22]:
# save targets to targeted_clients.csv
df_targets[['Client', 'Best_Offer', 'Expected_Revenue']].to_csv('targeted_clients.csv', index=False)
