# Client Targeting

* **Section 0: Load dataset**
* **Section 1: Prepare datasets**
  * Data Split into (1) Training and (2) Client Targeting sets
  * Apply data processing
  * Prepare training datasets: further split the training set into (1) train and (2) validation sets
* **Section 2: Model training**
  * Revenue regression models
  * Sales classification models
* **Section 3: Client targeting**
  * Propensity scoring
  * Predict revenues
  * Prepare list of clients to target

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

from utlis.data_utils import load_data, merge_data, process_features1, process_features2, get_feature_cols
from utlis.model_utils import train_revenue_model_xgb_optuna, train_sales_model_xgb_optuna_f1, predict_propensity
from utlis.targeting import calculate_revenues, run_full_targeting_pipeline, print_targeting_summary, assign_best_offer


## Section 0: Load dataset

In [2]:
print("1. Loading data...")
file = 'DataScientist_CaseStudy_Dataset.xlsx'
soc_dem, products, inflow, sales = load_data(file)
df = merge_data(soc_dem, products, inflow, sales)


1. Loading data...


## Section 1: Prepare datasets

### Data Split into (1) Training and (2) Client Targeting sets

In [3]:
classification_target_columns = ['Sale_CL', 'Sale_CC', 'Sale_MF']
regression_target_columns = ['Revenue_CL','Revenue_CC','Revenue_MF']

# Training set: clients with at least one observed target
train_val = df.dropna(subset=classification_target_columns+regression_target_columns, how='all')

# Client targeting set: clients with no observed targets
test = df[df[classification_target_columns+regression_target_columns].isna().all(axis=1)].copy()

print(f"Training set: {train_val.shape[0]} clients ({train_val.shape[0]/len(df)*100:.1f}%)")
print(f"Client Targeting set: {test.shape[0]} clients ({test.shape[0]/len(df)*100:.1f}%)")

Training set: 969 clients (60.0%)
Client Targeting set: 646 clients (40.0%)


### Apply data processing

In [4]:
print( "*"*30 + "Before train_val processing" + "*"*30 + "\n")
print( train_val.head(3))
print( f"\n{train_val.shape=}")
train_val, sex_label_encoder = process_features2(train_val)
print("\n")

print( "*"*30 + "Before test processing" + "*"*30 + "\n")
print( test.head(3))
print( f"\n{test.shape=}")
test, _ = process_features2(test, le=sex_label_encoder)

******************************Before train_val processing******************************

   Client Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  Count_CC  \
1    1217   M   38     165         1       NaN       NaN        NaN       NaN   
2     850   F   49      44         1       NaN       NaN        NaN       NaN   
3    1473   M   54      34         1       1.0       NaN        NaN       1.0   

   Count_CL  ...  TransactionsDeb_CA  TransactionsDebCash_Card  \
1       NaN  ...                 1.0                       0.0   
2       NaN  ...                 6.0                       0.0   
3       1.0  ...                38.0                       1.0   

   TransactionsDebCashless_Card  TransactionsDeb_PaymentOrder  Sale_MF  \
1                           0.0                           1.0      0.0   
2                           0.0                           1.0      0.0   
3                          26.0                          11.0      1.0   

   Sale_CC  Sale_CL  Rev

In [5]:
print( "*"*30 + "After train_val processing" + "*"*30 + "\n")
print( train_val.head(3))
print( f"\n{train_val.shape=}")
print("\n")

print( "*"*30 + "After test processing" + "*"*30 + "\n")
print( test.head(3))
print( f"\n{test.shape=}")

******************************After train_val processing******************************

   Client  Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  \
1    1217    1   38     165         1       0.0       0.0        0.0   
2     850    0   49      44         1       0.0       0.0        0.0   
3    1473    1   54      34         1       1.0       0.0        0.0   

   Count_CC  Count_CL  ...  TransactionsDebCash_Card  \
1       0.0       0.0  ...                       0.0   
2       0.0       0.0  ...                       0.0   
3       1.0       1.0  ...                       1.0   

   TransactionsDebCashless_Card  TransactionsDeb_PaymentOrder  Sale_MF  \
1                           0.0                           1.0      0.0   
2                           0.0                           1.0      0.0   
3                          26.0                          11.0      1.0   

   Sale_CC  Sale_CL  Revenue_MF  Revenue_CC  Revenue_CL  VolumeCredDebRatio  
1      0.0      0.0    
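The `Sex` column goes from `M`/`F` to `1`/`0`, and the encoder fitted on `train_val` is passed back in for `test` via `le=sex_label_encoder`. The internals of `process_features2` are not shown here, so the following is only a minimal sketch of why reusing a fitted encoder matters. `SimpleLabelEncoder` is a hypothetical stand-in that assigns codes in sorted label order.

```python
# Hypothetical stand-in for the encoder returned by process_features2.
class SimpleLabelEncoder:
    """Maps category labels to integer codes in sorted label order."""

    def fit(self, values):
        # Sorted unique labels, so 'F' -> 0 and 'M' -> 1.
        self.classes_ = sorted(set(values))
        self._codes = {label: code for code, label in enumerate(self.classes_)}
        return self

    def transform(self, values):
        # Raises KeyError for a label that was not seen during fit.
        return [self._codes[v] for v in values]


train_sex = ["M", "F", "M"]
test_sex = ["M", "M"]  # illustrative test column with no 'F'

# Fit once on the training column, then reuse the same mapping for test.
encoder = SimpleLabelEncoder().fit(train_sex)
train_codes = encoder.transform(train_sex)
test_codes = encoder.transform(test_sex)

# Refitting on the test column alone would silently change the mapping.
refit_codes = SimpleLabelEncoder().fit(test_sex).transform(test_sex)

print(train_codes)  # [1, 0, 1]
print(test_codes)   # [1, 1]
print(refit_codes)  # [0, 0]
```

With the shared encoder, `M` keeps the code `1` in both sets; a separately fitted encoder would map it to `0` whenever one category is missing from the test data.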

In [6]:
feature_cols = get_feature_cols(train_val)
train_val[feature_cols].isnull().sum()

Sex                             0
Age                             0
Tenure                          0
Count_CA                        0
Count_SA                        0
Count_MF                        0
Count_OVD                       0
Count_CC                        0
Count_CL                        0
ActBal_CA                       0
ActBal_SA                       0
ActBal_MF                       0
ActBal_OVD                      0
ActBal_CC                       0
ActBal_CL                       0
VolumeCred_CA                   0
TransactionsCred                0
VolumeDeb                       0
VolumeDebCash_Card              0
VolumeDebCashless_Card          0
VolumeDeb_PaymentOrder          0
TransactionsDeb                 0
TransactionsDebCash_Card        0
TransactionsDebCashless_Card    0
TransactionsDeb_PaymentOrder    0
VolumeCredDebRatio              0
dtype: int64

In [7]:
target_columns = classification_target_columns + regression_target_columns
train_val[target_columns].isnull().sum()

Sale_CL       0
Sale_CC       0
Sale_MF       0
Revenue_CL    0
Revenue_CC    0
Revenue_MF    0
dtype: int64

In [8]:
test[feature_cols].isnull().sum()

Sex                             0
Age                             0
Tenure                          0
Count_CA                        0
Count_SA                        0
Count_MF                        0
Count_OVD                       0
Count_CC                        0
Count_CL                        0
ActBal_CA                       0
ActBal_SA                       0
ActBal_MF                       0
ActBal_OVD                      0
ActBal_CC                       0
ActBal_CL                       0
VolumeCred_CA                   0
TransactionsCred                0
VolumeDeb                       0
VolumeDebCash_Card              0
VolumeDebCashless_Card          0
VolumeDeb_PaymentOrder          0
TransactionsDeb                 0
TransactionsDebCash_Card        0
TransactionsDebCashless_Card    0
TransactionsDeb_PaymentOrder    0
VolumeCredDebRatio              0
dtype: int64

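### Prepare training datasets

The split below is not stratified. To stratify on the sales flags, pass `stratify=y_train_val[['Sale_CL', 'Sale_CC', 'Sale_MF']]` to `train_test_split`.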

In [9]:
X_train_val = train_val[feature_cols].fillna(0)
y_train_val = train_val[target_columns]

X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.15, shuffle=True, random_state=42)

print( "X_train.shape, X_val.shape\n", X_train.shape, X_val.shape )

X_train.head(3)


X_train.shape, X_val.shape
 (821, 26) (146, 26)


|  | Sex | Age | Tenure | Count_CA | Count_SA | Count_MF | Count_OVD | Count_CC | Count_CL | ActBal_CA | ... | TransactionsCred | VolumeDeb | VolumeDebCash_Card | VolumeDebCashless_Card | VolumeDeb_PaymentOrder | TransactionsDeb | TransactionsDebCash_Card | TransactionsDebCashless_Card | TransactionsDeb_PaymentOrder | VolumeCredDebRatio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 737 | 0 | 55 | 31 | 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3175.264643 | ... | 2.0 | 404.005357 | 0.0 | 0.0 | 0.0 | 7.0 | 0.0 | 0.0 | 0.0 | 0.881825 |
| 1017 | 0 | 6 | 172 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1824.04 | ... | 2.0 | 691.853571 | 500.0 | 189.889286 | 0.0 | 9.0 | 1.0 | 7.0 | 0.0 | 0.945673 |
| 1291 | 1 | 33 | 17 | 1 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 676.768929 | ... | 26.0 | 3808.083214 | 210.714286 | 509.878571 | 177.964286 | 62.0 | 1.0 | 27.0 | 8.0 | 0.881983 |
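Stratifying the split on the three `Sale_*` columns would treat every combination of flags as its own class. To the best of my knowledge, scikit-learn's stratified splitting raises a `ValueError` when the least populated class has fewer than two members, so it can help to count the combinations first. A minimal sketch on made-up labels (the rows are illustrative, not taken from the dataset):

```python
from collections import Counter

# Illustrative (Sale_CL, Sale_CC, Sale_MF) rows, not real data.
labels = [
    (0, 0, 0),
    (0, 0, 0),
    (1, 0, 0),
    (1, 0, 0),
    (0, 1, 1),
    (1, 1, 1),
]

# Each distinct tuple acts as one stratification class.
combo_counts = Counter(labels)

# Combinations that occur only once cannot be placed in both splits.
rare = [combo for combo, n in combo_counts.items() if n < 2]

print(combo_counts[(0, 0, 0)])  # 2
print(rare)                     # [(0, 1, 1), (1, 1, 1)]
```

If rare combinations show up in the real labels, one option is to stratify on a single flag instead of all three.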


In [10]:
print( "y_train.shape, y_val.shape\n", y_train.shape, y_val.shape )
y_train.head(3)

y_train.shape, y_val.shape
 (821, 6) (146, 6)


|  | Sale_CL | Sale_CC | Sale_MF | Revenue_CL | Revenue_CC | Revenue_MF |
| --- | --- | --- | --- | --- | --- | --- |
| 737 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1017 | 1.0 | 1.0 | 1.0 | 46.392857 | 5.170357 | 2.461964 |
| 1291 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


## Section 2: Model training

### Revenue regression models

In [11]:
models = {}
r2_scores = {}
rmse_scores = {}

# Separate revenue regression models for each product
for product in ['CL', 'CC', 'MF']:
    target_col = f'Revenue_{product}'
    model, r2, rmse, best_params, study = train_revenue_model_xgb_optuna(X_train, X_val, y_train[target_col], y_val[target_col])

    models[f"{product}_revenue"] = model
    r2_scores[product] = r2
    rmse_scores[product] = rmse
    print(f"{product} Revenue Model - R²: {r2:.3f}, RMSE: {rmse:.3f}")



[I 2025-07-14 17:56:06,737] A new study created in memory with name: no-name-997e4eaa-c3c2-4fd1-a8dc-f27f1ceaf836
[I 2025-07-14 17:56:07,681] Trial 0 finished with value: 0.03698880485769107 and parameters: {'n_estimators': 443, 'max_depth': 8, 'learning_rate': 0.0953021900905327, 'subsample': 0.7565772491208074, 'colsample_bytree': 0.6236139492447047, 'reg_alpha': 1.4864323942559983, 'reg_lambda': 0.05557604987310241, 'min_child_weight': 5, 'gamma': 3.2104988486061634}. Best is trial 0 with value: 0.03698880485769107.
[I 2025-07-14 17:56:08,036] Trial 1 finished with value: 0.011415103236641522 and parameters: {'n_estimators': 113, 'max_depth': 5, 'learning_rate': 0.08857520009405234, 'subsample': 0.8540631049148992, 'colsample_bytree': 0.6276811820811745, 'reg_alpha': 0.004105954515162749, 'reg_lambda': 0.02677830909390283, 'min_child_weight': 4, 'gamma': 2.135739798918908}. Best is trial 0 with value: 0.03698880485769107.
[I 2025-07-14 17:56:10,078] Trial 2 finished with value: 0.03

CL Revenue Model - R²: 0.109, RMSE: 11.102


[I 2025-07-14 17:57:03,346] Trial 0 finished with value: -1.8222412431553674 and parameters: {'n_estimators': 484, 'max_depth': 5, 'learning_rate': 0.013773976992355634, 'subsample': 0.6367869964181726, 'colsample_bytree': 0.9343243272042505, 'reg_alpha': 1.7955263158090204, 'reg_lambda': 6.40908060562525, 'min_child_weight': 2, 'gamma': 3.3932724752255625}. Best is trial 0 with value: -1.8222412431553674.
[I 2025-07-14 17:57:04,284] Trial 1 finished with value: -3.6828043951785894 and parameters: {'n_estimators': 362, 'max_depth': 7, 'learning_rate': 0.05449240571576369, 'subsample': 0.8054686076698798, 'colsample_bytree': 0.8458155823797756, 'reg_alpha': 0.06901950294408482, 'reg_lambda': 0.05136907346567702, 'min_child_weight': 6, 'gamma': 4.245011527712135}. Best is trial 0 with value: -1.8222412431553674.
[I 2025-07-14 17:57:04,819] Trial 2 finished with value: -4.112300956983254 and parameters: {'n_estimators': 177, 'max_depth': 7, 'learning_rate': 0.10209391700766818, 'subsample

CC Revenue Model - R²: -0.046, RMSE: 4.753


[I 2025-07-14 17:57:49,120] Trial 0 finished with value: -0.11967734452650158 and parameters: {'n_estimators': 220, 'max_depth': 4, 'learning_rate': 0.05905946902286625, 'subsample': 0.8650945415341296, 'colsample_bytree': 0.9961514458814388, 'reg_alpha': 0.31104919005776854, 'reg_lambda': 7.522466142299998, 'min_child_weight': 8, 'gamma': 2.3388033824090018}. Best is trial 0 with value: -0.11967734452650158.
[I 2025-07-14 17:57:49,451] Trial 1 finished with value: -0.21321227505475981 and parameters: {'n_estimators': 145, 'max_depth': 6, 'learning_rate': 0.03203663710007246, 'subsample': 0.9140056449776625, 'colsample_bytree': 0.7839439227713836, 'reg_alpha': 6.918222240717901, 'reg_lambda': 0.02880807432420491, 'min_child_weight': 5, 'gamma': 4.996651159102136}. Best is trial 0 with value: -0.11967734452650158.
[I 2025-07-14 17:57:49,759] Trial 2 finished with value: -0.06830351708313898 and parameters: {'n_estimators': 133, 'max_depth': 7, 'learning_rate': 0.01648367074406914, 'subs

MF Revenue Model - R²: -0.012, RMSE: 7.601


In [12]:
for product in ['CL', 'CC', 'MF']:
    print(f"{product} Revenue Model - R²: {r2_scores[product]:.3f}, RMSE: {rmse_scores[product]:.3f}")

CL Revenue Model - R²: 0.109, RMSE: 11.102
CC Revenue Model - R²: -0.046, RMSE: 4.753
MF Revenue Model - R²: -0.012, RMSE: 7.601
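The CC and MF models above score an R² below zero. R² compares the model's squared error with that of a constant prediction equal to the mean of the evaluated targets, so a negative value means the model does worse than that constant on the validation set. The sketch below works this out in plain Python on made-up numbers, with a sparse target that loosely resembles revenue (zero for most clients). The function names are illustrative and this is not the code inside `train_revenue_model_xgb_optuna`.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Made-up validation targets: three clients with no revenue, one with 10.
y_true = [0.0, 0.0, 0.0, 10.0]

# Baseline: always predict the mean of y_true (2.5).
mean_pred = [2.5, 2.5, 2.5, 2.5]

# A model that overshoots the mean everywhere.
offset_pred = [4.0, 4.0, 4.0, 4.0]

print(round(r2_score(y_true, mean_pred), 3))    # 0.0
print(round(r2_score(y_true, offset_pred), 3))  # -0.12
print(round(rmse(y_true, offset_pred), 3))
```

RMSE is on the same scale as the target, so it is not comparable across products with different revenue ranges; R² is the more useful number for comparing the three models.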


### Sales classification models

In [13]:
f1_scores = {}

# Separate sales classification models for each product
for product in ['CL', 'CC', 'MF']:
    target_col = f'Sale_{product}'
    model, f1, roc_auc, best_params, study = train_sales_model_xgb_optuna_f1(X_train, X_val, y_train[target_col], y_val[target_col])
    # model, f1, roc_auc, best_params, study  = train_sales_model_xgb_optuna(X_train, X_val, y_train[target_col], y_val[target_col])

    models[f"{product}_sales"] = model
    f1_scores[product] = f1
    print(f"{product} Sales Model - f1: {f1:.3f}")

[I 2025-07-14 17:58:41,517] A new study created in memory with name: no-name-1c979764-1856-4283-85cf-ce7d12b02f00
[I 2025-07-14 17:58:41,660] Trial 0 finished with value: 0.4691358024691358 and parameters: {'n_estimators': 63, 'max_depth': 3, 'learning_rate': 0.0441711287971863, 'subsample': 0.9826540524142956, 'colsample_bytree': 0.8639418894590811, 'reg_alpha': 0.8484277517580637, 'reg_lambda': 0.0023346476982018466, 'min_child_weight': 7, 'gamma': 4.851670197285741}. Best is trial 0 with value: 0.4691358024691358.
[I 2025-07-14 17:58:41,910] Trial 1 finished with value: 0.4888888888888889 and parameters: {'n_estimators': 209, 'max_depth': 10, 'learning_rate': 0.08878650591042314, 'subsample': 0.7401276486268203, 'colsample_bytree': 0.6788682120116661, 'reg_alpha': 8.700178650596168, 'reg_lambda': 0.03800489631021337, 'min_child_weight': 5, 'gamma': 1.1804608583638072}. Best is trial 1 with value: 0.4888888888888889.
[I 2025-07-14 17:58:42,356] Trial 2 finished with value: 0.51162790

CL Sales Model - f1: 0.556


[I 2025-07-14 17:59:37,020] Trial 0 finished with value: 0.417910447761194 and parameters: {'n_estimators': 243, 'max_depth': 10, 'learning_rate': 0.02787362915247304, 'subsample': 0.9152829908359532, 'colsample_bytree': 0.8814228452782371, 'reg_alpha': 9.570843329380049, 'reg_lambda': 0.042976280994929554, 'min_child_weight': 1, 'gamma': 4.3742680630372535}. Best is trial 0 with value: 0.417910447761194.
[I 2025-07-14 17:59:37,227] Trial 1 finished with value: 0.42857142857142855 and parameters: {'n_estimators': 172, 'max_depth': 10, 'learning_rate': 0.28641985732066955, 'subsample': 0.8637994641285891, 'colsample_bytree': 0.8778267125229691, 'reg_alpha': 0.08646858788897413, 'reg_lambda': 1.2034414283895045, 'min_child_weight': 4, 'gamma': 4.7674420194033775}. Best is trial 1 with value: 0.42857142857142855.
[I 2025-07-14 17:59:37,430] Trial 2 finished with value: 0.5106382978723404 and parameters: {'n_estimators': 175, 'max_depth': 4, 'learning_rate': 0.2690860614417068, 'subsample'

CC Sales Model - f1: 0.589


[I 2025-07-14 18:00:26,575] Trial 0 finished with value: 0.40625 and parameters: {'n_estimators': 376, 'max_depth': 6, 'learning_rate': 0.010593775384022876, 'subsample': 0.6203920537616064, 'colsample_bytree': 0.644924909657881, 'reg_alpha': 1.0524110607924115, 'reg_lambda': 0.03701148451506248, 'min_child_weight': 7, 'gamma': 2.9314484645017624}. Best is trial 0 with value: 0.40625.
[I 2025-07-14 18:00:27,022] Trial 1 finished with value: 0.3157894736842105 and parameters: {'n_estimators': 471, 'max_depth': 8, 'learning_rate': 0.2476171784132002, 'subsample': 0.9956820085555504, 'colsample_bytree': 0.8486410334991562, 'reg_alpha': 0.6389319233505942, 'reg_lambda': 0.005822558617160812, 'min_child_weight': 9, 'gamma': 3.2730889116310684}. Best is trial 0 with value: 0.40625.
[I 2025-07-14 18:00:27,507] Trial 2 finished with value: 0.36363636363636365 and parameters: {'n_estimators': 500, 'max_depth': 9, 'learning_rate': 0.11810699733657447, 'subsample': 0.9429801312604489, 'colsample_

MF Sales Model - f1: 0.514


In [14]:
for product in ['CL', 'CC', 'MF']:
    print(f"{product} Sales Model - f1: {f1_scores[product]:.3f}")

CL Sales Model - f1: 0.556
CC Sales Model - f1: 0.589
MF Sales Model - f1: 0.514
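F1 is the harmonic mean of precision and recall, computed on hard 0/1 predictions, so it depends on the cutoff applied to the predicted probabilities. The sketch below computes it in plain Python on made-up labels and scores. The cutoffs and function names are assumptions for illustration; the threshold used inside `train_sales_model_xgb_optuna_f1` is not shown in this notebook.

```python
def apply_cutoff(probs, cutoff):
    """Turn probabilities into 0/1 predictions."""
    return [1 if p >= cutoff else 0 for p in probs]


def f1_score_binary(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), defined as 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Made-up validation labels and predicted sale probabilities.
y_true = [1, 0, 1, 0, 0, 1]
probs = [0.72, 0.64, 0.41, 0.34, 0.53, 0.50]

f1_at_050 = f1_score_binary(y_true, apply_cutoff(probs, 0.50))
f1_at_040 = f1_score_binary(y_true, apply_cutoff(probs, 0.40))

print(round(f1_at_050, 3))  # 0.571
print(round(f1_at_040, 3))  # 0.75
```

Lowering the cutoff recovers the missed positive at index 2 without adding false positives here, which is why the score rises; on real data the trade-off can go either way.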


In [15]:
print( models.keys() )

dict_keys(['CL_revenue', 'CC_revenue', 'MF_revenue', 'CL_sales', 'CC_sales', 'MF_sales'])


## Section 3: Client targeting

### Propensity scoring

In [16]:
for product in ['CL', 'CC', 'MF']:
    test[f'p_{product.lower()}'] = predict_propensity(models[f"{product}_sales"] , test, feature_cols)

test[['p_cl', 'p_cc', 'p_mf']]

|  | p_cl | p_cc | p_mf |
| --- | --- | --- | --- |
| 0 | 0.525874 | 0.429565 | 0.399317 |
| 6 | 0.640935 | 0.304374 | 0.498236 |
| 9 | 0.719078 | 0.549549 | 0.412049 |
| 10 | 0.340194 | 0.302981 | 0.494206 |
| 13 | 0.412703 | 0.488096 | 0.442772 |
| ... | ... | ... | ... |
| 1598 | 0.344043 | 0.487030 | 0.434074 |
| 1600 | 0.337001 | 0.279214 | 0.410129 |
| 1608 | 0.395711 | 0.524857 | 0.313936 |
| 1610 | 0.498464 | 0.490543 | 0.435885 |


### Predict revenues

In [17]:
predicted_revenues_df = calculate_revenues(test, models)
predicted_revenues_df.head()


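|  | Client | Sex | Age | Tenure | Count_CA | Count_SA | Count_MF | Count_OVD | Count_CC | Count_CL | ... | Sale_MF | Sale_CC | Sale_CL | Revenue_MF | Revenue_CC | Revenue_CL | VolumeCredDebRatio | p_cl | p_cc | p_mf |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 909 | 1 | 21 | 27 | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | ... | NaN | NaN | NaN | 1.403815 | 2.064963 | 4.812676 | 1.747104 | 0.525874 | 0.429565 | 0.399317 |
| 6 | 699 | 1 | 37 | 175 | 1 | 0.0 | 4.0 | 1.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 1.292778 | 2.072767 | 2.498749 | 1.560034 | 0.640935 | 0.304374 | 0.498236 |
| 9 | 528 | 0 | 19 | 70 | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 1.751386 | 2.714209 | 5.664531 | 1.114116 | 0.719078 | 0.549549 | 0.412049 |
| 10 | 1145 | 1 | 61 | 45 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 5.033252 | 2.020648 | 0.917377 | 30.084959 | 0.340194 | 0.302981 | 0.494206 |
| 13 | 517 | 0 | 41 | 28 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 2.038966 | 2.095122 | 3.115516 | 1.020149 | 0.412703 | 0.488096 | 0.442772 |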


### Prepare list of clients to target

In [18]:
targets, forecast, df_targets = run_full_targeting_pipeline(predicted_revenues_df, top_frac=0.15)
print_targeting_summary(targets, forecast)

Stage 3,4: Assigning best offers...
Stage 5: Selecting top targets...
Stage 6: Calculating revenue forecast...

=== TARGETING SUMMARY ===
Total clients targeted: 96
Total expected revenue: $617.60
Average expected revenue per client: $2.50
Lift vs baseline targeting: 157.6%

Offer distribution:
  CL: 96 clients (100.0%)
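The stages printed above (assign best offers, select top targets, forecast revenue) live in `run_full_targeting_pipeline`, whose code is not shown in this notebook. The sketch below is one plausible reading, under the assumption that a client's expected revenue for a product is propensity times predicted revenue and that the best offer is the product with the highest value. The five rows are copied from the `predicted_revenues_df` preview; `best_offer` and `select_top` are hypothetical helpers, not the functions in `utlis.targeting`.

```python
PRODUCTS = ("CL", "CC", "MF")

# (Client, p_cl, p_cc, p_mf, Revenue_CL, Revenue_CC, Revenue_MF)
rows = [
    (909, 0.525874, 0.429565, 0.399317, 4.812676, 2.064963, 1.403815),
    (699, 0.640935, 0.304374, 0.498236, 2.498749, 2.072767, 1.292778),
    (528, 0.719078, 0.549549, 0.412049, 5.664531, 2.714209, 1.751386),
    (1145, 0.340194, 0.302981, 0.494206, 0.917377, 2.020648, 5.033252),
    (517, 0.412703, 0.488096, 0.442772, 3.115516, 2.095122, 2.038966),
]


def best_offer(row):
    """Pick the product with the highest propensity * predicted revenue (assumed rule)."""
    client, *values = row
    probs, revenues = values[:3], values[3:]
    expected = {prod: p * r for prod, p, r in zip(PRODUCTS, probs, revenues)}
    offer = max(expected, key=expected.get)
    return {"Client": client, "Best_Offer": offer, "Expected_Revenue": expected[offer]}


def select_top(offers, top_frac):
    """Keep the top fraction of clients by expected revenue, rounding down."""
    n_targets = int(len(offers) * top_frac)
    ranked = sorted(offers, key=lambda o: o["Expected_Revenue"], reverse=True)
    return ranked[:n_targets]


offers = [best_offer(row) for row in rows]
top_targets = select_top(offers, top_frac=0.4)
forecast_total = sum(t["Expected_Revenue"] for t in top_targets)

for t in top_targets:
    print(t["Client"], t["Best_Offer"], round(t["Expected_Revenue"], 2))
print("Forecast:", round(forecast_total, 2))
```

Under this rule, client 1145 would get an MF offer while the other four get CL. Rounding down is also consistent with the summary above: `int(646 * 0.15)` is 96, the number of clients targeted.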


In [19]:
targets.head()

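|  | Client | Best_Offer | Expected_Revenue | Age | Tenure |
| --- | --- | --- | --- | --- | --- |
| 505 | 878 | CL | 16.199154 | 5 | 173 |
| 1408 | 498 | CL | 14.651286 | 21 | 176 |
| 166 | 217 | CL | 13.333215 | 17 | 152 |
| 435 | 1530 | CL | 11.843295 | 22 | 153 |
| 1560 | 1119 | CL | 11.534856 | 22 | 177 |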


In [20]:
# save targets to targeted_clients.csv
df_targets[['Client', 'Best_Offer', 'Expected_Revenue']].to_csv('targeted_clients.csv', index=False)
