# Client Targeting

* **Section 0: Load dataset**
* **Section 1: Prepare datasets**
  * Split data into (1) training and (2) client targeting sets
  * Apply data processing
  * Prepare training datasets: further split the training set into (1) train and (2) validation sets
* **Section 2: Model training**
  * Revenue regression models
  * Sales classification models
* **Section 3: Client targeting**
  * Propensity scoring
  * Predict revenues
  * Prepare list of clients to target

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

from utlis.data_utils import load_data, merge_data, process_features1, process_features2, get_feature_cols
from utlis.model_utils import train_revenue_model_xgb_optuna, train_sales_model_xgb_optuna_f1, predict_propensity
from utlis.targeting import calculate_revenues, run_full_targeting_pipeline, print_targeting_summary, assign_best_offer


## Section 0: Load dataset

In [2]:
print("1. Loading data...")
file = 'DataScientist_CaseStudy_Dataset.xlsx'
soc_dem, products, inflow, sales = load_data(file)
df = merge_data(soc_dem, products, inflow, sales)


1. Loading data...
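`load_data` and `merge_data` live in the project's `utlis.data_utils` module and are not shown. The split in Section 1 only works if clients without a sales record end up with NaN targets, which is what a left join on `Client` produces. Below is a minimal sketch under that assumption. The function name `merge_on_client` and the toy tables are hypothetical. For reference, `pd.read_excel(file, sheet_name=None)` is one way to read every sheet of a workbook into a dict of DataFrames keyed by sheet name.

```python
from functools import reduce

import pandas as pd


def merge_on_client(tables, key="Client"):
    """Left-join every table onto the first one using the shared client key.

    Hypothetical stand-in for merge_data(): clients missing from a later
    table (for example, the sales sheet) get NaN in that table's columns.
    """
    return reduce(lambda left, right: left.merge(right, on=key, how="left"), tables)


# Tiny illustrative tables; the real sheets have many more columns.
soc_dem = pd.DataFrame({"Client": [1, 2, 3], "Sex": ["M", "F", "M"], "Age": [38, 49, 54]})
products = pd.DataFrame({"Client": [1, 2, 3], "Count_CA": [1, 1, 1]})
sales = pd.DataFrame({"Client": [1, 3], "Sale_CL": [0, 1], "Revenue_CL": [0.0, 4.2]})

merged = merge_on_client([soc_dem, products, sales])
print(merged)
```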


## Section 1: Prepare datasets

### Split data into (1) training and (2) client targeting sets

In [3]:
classification_target_columns = ['Sale_CL', 'Sale_CC', 'Sale_MF']
regression_target_columns = ['Revenue_CL','Revenue_CC','Revenue_MF']

# Training data set
train_val = df.dropna(subset=classification_target_columns+regression_target_columns, how='all')

# Client targeting set
test = df[df[classification_target_columns+regression_target_columns].isna().all(axis=1)].copy()

print(f"Training set: {train_val.shape[0]} clients ({train_val.shape[0]/len(df)*100:.1f}%)")
print(f"Client targeting set: {test.shape[0]} clients ({test.shape[0]/len(df)*100:.1f}%)")

Training set: 969 clients (60.0%)
Client targeting set: 646 clients (40.0%)
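The two filters are exact complements. `dropna(..., how='all')` removes a row only when every listed target is NaN, and the second mask selects exactly those rows. As a result, each client lands in exactly one set, and the 969 + 646 = 1,615 clients cover the whole `df`. A client with some targets known and others missing stays in training and keeps its NaNs. In [7] shows that no missing targets remain in `train_val` after processing. Here is a toy check using two of the six target columns:

```python
import numpy as np
import pandas as pd

target_cols = ["Sale_CL", "Revenue_CL"]  # two of the six target columns, for brevity
df = pd.DataFrame({
    "Client": [1, 2, 3, 4],
    "Sale_CL": [0.0, np.nan, 1.0, np.nan],
    "Revenue_CL": [0.0, np.nan, np.nan, np.nan],
})

# Keep rows with at least one known target (same rule as train_val above)
train_val = df.dropna(subset=target_cols, how="all")
# Keep rows where every target is missing (same rule as test above)
test = df[df[target_cols].isna().all(axis=1)].copy()

print(train_val["Client"].tolist(), test["Client"].tolist())  # [1, 3] [2, 4]
```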


### Apply data processing

In [4]:
print( "*"*30 + "Before train_val processing" + "*"*30 + "\n")
print( train_val.head(3))
print( f"\n{train_val.shape=}")
train_val, sex_label_encoder = process_features2(train_val)
print("\n")

print( "*"*30 + "Before test processing" + "*"*30 + "\n")
print( test.head(3))
print( f"\n{test.shape=}")
test, _ = process_features2(test, le=sex_label_encoder)

******************************Before train_val processing******************************

   Client Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  Count_CC  \
1    1217   M   38     165         1       NaN       NaN        NaN       NaN   
2     850   F   49      44         1       NaN       NaN        NaN       NaN   
3    1473   M   54      34         1       1.0       NaN        NaN       1.0   

   Count_CL  ...  TransactionsDeb_CA  TransactionsDebCash_Card  \
1       NaN  ...                 1.0                       0.0   
2       NaN  ...                 6.0                       0.0   
3       1.0  ...                38.0                       1.0   

   TransactionsDebCashless_Card  TransactionsDeb_PaymentOrder  Sale_MF  \
1                           0.0                           1.0      0.0   
2                           0.0                           1.0      0.0   
3                          26.0                          11.0      1.0   

   Sale_CC  Sale_CL  Rev

In [5]:
print( "*"*30 + "After train_val processing" + "*"*30 + "\n")
print( train_val.head(3))
print( f"\n{train_val.shape=}")
print("\n")

print( "*"*30 + "After test processing" + "*"*30 + "\n")
print( test.head(3))
print( f"\n{test.shape=}")

******************************After train_val processing******************************

   Client  Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  \
1    1217    1   38     165         1       0.0       0.0        0.0   
2     850    0   49      44         1       0.0       0.0        0.0   
3    1473    1   54      34         1       1.0       0.0        0.0   

   Count_CC  Count_CL  ...  TransactionsDebCash_Card  \
1       0.0       0.0  ...                       0.0   
2       0.0       0.0  ...                       0.0   
3       1.0       1.0  ...                       1.0   

   TransactionsDebCashless_Card  TransactionsDeb_PaymentOrder  Sale_MF  \
1                           0.0                           1.0      0.0   
2                           0.0                           1.0      0.0   
3                          26.0                          11.0      1.0   

   Sale_CC  Sale_CL  Revenue_MF  Revenue_CC  Revenue_CL  VolumeCredDebRatio  
1      0.0      0.0    
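`process_features2` is defined in `utlis.data_utils` and is not shown. The before/after outputs are consistent with two steps: `Sex` is encoded as F to 0 and M to 1, and missing product counts become 0.0. The sketch below reproduces those two steps with scikit-learn's `LabelEncoder`, which sorts classes alphabetically and therefore yields exactly that mapping. As In [4] does via `le=sex_label_encoder`, the encoder is fitted on the training set and reused on the test set. The real function does more; for example, it adds `VolumeCredDebRatio`, whose formula is not shown here, so the sketch leaves it out.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

train = pd.DataFrame({"Sex": ["M", "F", "M"], "Count_SA": [np.nan, np.nan, 1.0]})
test = pd.DataFrame({"Sex": ["F", "M"], "Count_SA": [2.0, np.nan]})

# Fit the encoder on the training data only; classes_ are sorted, so F -> 0, M -> 1
le = LabelEncoder().fit(train["Sex"])
train["Sex"] = le.transform(train["Sex"])
# Reuse the fitted encoder so the test set gets the same mapping
test["Sex"] = le.transform(test["Sex"])

# Assumption: a missing product count means the client does not hold the product
count_cols = ["Count_SA"]
train[count_cols] = train[count_cols].fillna(0)
test[count_cols] = test[count_cols].fillna(0)
```

`LabelEncoder.transform` raises a `ValueError` for a label it did not see during fitting. A new `Sex` value in the targeting set would therefore fail loudly rather than be silently mis-encoded.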

In [6]:
feature_cols = get_feature_cols(train_val)
train_val[feature_cols].isnull().sum()

Sex                             0
Age                             0
Tenure                          0
Count_CA                        0
Count_SA                        0
Count_MF                        0
Count_OVD                       0
Count_CC                        0
Count_CL                        0
ActBal_CA                       0
ActBal_SA                       0
ActBal_MF                       0
ActBal_OVD                      0
ActBal_CC                       0
ActBal_CL                       0
VolumeCred_CA                   0
TransactionsCred                0
VolumeDeb                       0
VolumeDebCash_Card              0
VolumeDebCashless_Card          0
VolumeDeb_PaymentOrder          0
TransactionsDeb                 0
TransactionsDebCash_Card        0
TransactionsDebCashless_Card    0
TransactionsDeb_PaymentOrder    0
VolumeCredDebRatio              0
dtype: int64

In [7]:
target_columns = classification_target_columns + regression_target_columns
train_val[target_columns].isnull().sum()

Sale_CL       0
Sale_CC       0
Sale_MF       0
Revenue_CL    0
Revenue_CC    0
Revenue_MF    0
dtype: int64

In [8]:
test[feature_cols].isnull().sum()

Sex                             0
Age                             0
Tenure                          0
Count_CA                        0
Count_SA                        0
Count_MF                        0
Count_OVD                       0
Count_CC                        0
Count_CL                        0
ActBal_CA                       0
ActBal_SA                       0
ActBal_MF                       0
ActBal_OVD                      0
ActBal_CC                       0
ActBal_CL                       0
VolumeCred_CA                   0
TransactionsCred                0
VolumeDeb                       0
VolumeDebCash_Card              0
VolumeDebCashless_Card          0
VolumeDeb_PaymentOrder          0
TransactionsDeb                 0
TransactionsDebCash_Card        0
TransactionsDebCashless_Card    0
TransactionsDeb_PaymentOrder    0
VolumeCredDebRatio              0
dtype: int64

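Note: the split below is random, not stratified. Passing `stratify=y_train_val[['Sale_CL', 'Sale_CC', 'Sale_MF']]` to `train_test_split` would preserve the mix of sale labels in both the train and validation sets.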

In [9]:
X_train_val = train_val[feature_cols].fillna(0)
y_train_val = train_val[target_columns]

X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.1, shuffle=True, random_state=11)

print( "X_train.shape, X_val.shape\n", X_train.shape, X_val.shape )

X_train.head(3)


X_train.shape, X_val.shape
 (870, 26) (97, 26)


| | Sex | Age | Tenure | Count_CA | Count_SA | Count_MF | Count_OVD | Count_CC | Count_CL | ActBal_CA | ... | TransactionsCred | VolumeDeb | VolumeDebCash_Card | VolumeDebCashless_Card | VolumeDeb_PaymentOrder | TransactionsDeb | TransactionsDebCash_Card | TransactionsDebCashless_Card | TransactionsDeb_PaymentOrder | VolumeCredDebRatio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 248 | 1 | 24 | 151 | 1 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2278.614643 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 7.895714 |
| 424 | 1 | 75 | 22 | 1 | 2.0 | 2.0 | 0.0 | 0.0 | 0.0 | 323.677143 | ... | 3.0 | 346.567857 | 0.0 | 343.175 | 0.0 | 25.0 | 0.0 | 24.0 | 0.0 | 0.185774 |
| 474 | 1 | 34 | 48 | 1 | 1.0 | 4.0 | 0.0 | 0.0 | 0.0 | 819.151786 | ... | 6.0 | 925.325 | 328.571429 | 94.217857 | 502.535714 | 14.0 | 3.0 | 2.0 | 9.0 | 1.332358 |


In [10]:
print( "y_train.shape, y_val.shape\n", y_train.shape, y_val.shape )
y_train.head(3)

y_train.shape, y_val.shape
 (870, 6) (97, 6)


| | Sale_CL | Sale_CC | Sale_MF | Revenue_CL | Revenue_CC | Revenue_MF |
|---|---|---|---|---|---|---|
| 248 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 424 | 0.0 | 1.0 | 0.0 | 0.0 | 2.001071 | 0.0 |
| 474 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.368929 |
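The split in In [9] is random. The sketch below shows a stratified alternative on toy data. When `stratify` is given a 2-D argument, scikit-learn's `train_test_split` stratifies on each row's label combination, so every pattern of `Sale_*` flags keeps its share in both sets. Every combination needs at least two members, or the call raises a `ValueError`. The frames are illustrative, not the case-study data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

n = 100
# Four Sale_CL/Sale_CC patterns, 25 rows each; Sale_MF is constant here
y = pd.DataFrame({
    "Sale_CL": [i % 2 for i in range(n)],
    "Sale_CC": [(i // 2) % 2 for i in range(n)],
    "Sale_MF": [0] * n,
})
X = pd.DataFrame({"Age": range(n)})

# A 2-D stratify target is stratified on each row's label combination
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=11,
    stratify=y[["Sale_CL", "Sale_CC", "Sale_MF"]],
)

print(y_va.value_counts())
```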


## Section 2: Model training

### Revenue regression models

In [11]:
models = {}
r2_scores = {}
rmse_scores = {}

# Separate revenue regression models for each product
for product in ['CL', 'CC', 'MF']:
    target_col = f'Revenue_{product}'
    model, r2, rmse, best_params, study = train_revenue_model_xgb_optuna(X_train, X_val, y_train[target_col], y_val[target_col])

    models[f"{product}_revenue"] = model
    r2_scores[product] = r2
    rmse_scores[product] = rmse
    print(f"{product} Revenue Model - R²: {r2:.3f}, RMSE: {rmse:.3f}")



[I 2025-07-14 20:42:15,605] A new study created in memory with name: no-name-900afa28-f69f-49ae-87d1-15b1cfe98aab
[I 2025-07-14 20:42:16,163] Trial 0 finished with value: -0.5384603289839722 and parameters: {'n_estimators': 396, 'max_depth': 7, 'learning_rate': 0.16897281247859303, 'subsample': 0.9645904502675773, 'colsample_bytree': 0.8356595396701214, 'reg_alpha': 0.08877351511515805, 'reg_lambda': 0.0029054660678417164, 'min_child_weight': 9, 'gamma': 3.069142799215285}. Best is trial 0 with value: -0.5384603289839722.
[I 2025-07-14 20:42:17,020] Trial 1 finished with value: -0.30359917472038767 and parameters: {'n_estimators': 498, 'max_depth': 9, 'learning_rate': 0.031737867451940495, 'subsample': 0.979024318139267, 'colsample_bytree': 0.9671536270448806, 'reg_alpha': 0.09931546254670477, 'reg_lambda': 0.4055437110018939, 'min_child_weight': 4, 'gamma': 2.1269323266738454}. Best is trial 1 with value: -0.30359917472038767.
[I 2025-07-14 20:42:17,470] Trial 2 finished with value: -

CL Revenue Model - R²: 0.006, RMSE: 7.776


[I 2025-07-14 20:42:43,078] Trial 0 finished with value: -2.5314754750923423 and parameters: {'n_estimators': 362, 'max_depth': 4, 'learning_rate': 0.015854714325841252, 'subsample': 0.6680908930770284, 'colsample_bytree': 0.8432318443608001, 'reg_alpha': 0.004633543891828198, 'reg_lambda': 0.150957992038831, 'min_child_weight': 5, 'gamma': 0.9871389033664879}. Best is trial 0 with value: -2.5314754750923423.
[I 2025-07-14 20:42:43,409] Trial 1 finished with value: -9.81223347610363 and parameters: {'n_estimators': 432, 'max_depth': 7, 'learning_rate': 0.26258339194043984, 'subsample': 0.9589111226980414, 'colsample_bytree': 0.7761476518911357, 'reg_alpha': 0.10489412763743867, 'reg_lambda': 0.01295934041750653, 'min_child_weight': 3, 'gamma': 0.9477415697746872}. Best is trial 0 with value: -2.5314754750923423.
[I 2025-07-14 20:42:43,668] Trial 2 finished with value: -0.5280586597548971 and parameters: {'n_estimators': 103, 'max_depth': 10, 'learning_rate': 0.010464577477538128, 'subs

CC Revenue Model - R²: -0.057, RMSE: 5.506


[I 2025-07-14 20:43:08,224] Trial 0 finished with value: -1.134136760536327 and parameters: {'n_estimators': 256, 'max_depth': 5, 'learning_rate': 0.0492702913908961, 'subsample': 0.9653907278827402, 'colsample_bytree': 0.6888448325244371, 'reg_alpha': 3.2933680803606005, 'reg_lambda': 0.9455755866672361, 'min_child_weight': 10, 'gamma': 1.3875760100797745}. Best is trial 0 with value: -1.134136760536327.
[I 2025-07-14 20:43:08,313] Trial 1 finished with value: -0.5935321629549559 and parameters: {'n_estimators': 62, 'max_depth': 6, 'learning_rate': 0.016741844510344276, 'subsample': 0.90396709351006, 'colsample_bytree': 0.8149867807267364, 'reg_alpha': 0.0024236319558201477, 'reg_lambda': 0.06859861716869935, 'min_child_weight': 3, 'gamma': 4.899899938433393}. Best is trial 1 with value: -0.5935321629549559.
[I 2025-07-14 20:43:08,545] Trial 2 finished with value: -0.8365852275005716 and parameters: {'n_estimators': 320, 'max_depth': 4, 'learning_rate': 0.02127106265222583, 'subsample

MF Revenue Model - R²: -0.045, RMSE: 4.248


In [12]:
for product in ['CL', 'CC', 'MF']:
    print(f"{product} Revenue Model - R²: {r2_scores[product]:.3f}, RMSE: {rmse_scores[product]:.3f}")

CL Revenue Model - R²: 0.006, RMSE: 7.776
CC Revenue Model - R²: -0.057, RMSE: 5.506
MF Revenue Model - R²: -0.045, RMSE: 4.248
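R² compares a model's squared error with that of always predicting the mean of the validation targets: R² = 1 - SS_res / SS_tot. A value near 0 (CL) means the model does about as well as that constant. A negative value (CC, MF) means it does worse. The RMSE of the mean predictor equals the standard deviation of the targets, which gives a yardstick for the RMSE values above. The check below uses toy, mostly-zero revenues, like `y_train`, where clients without a sale have 0.0:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Mostly-zero revenues, as in y_train where clients without a sale have 0.0
y_true = np.array([0.0, 0.0, 2.0, 0.0, 1.4, 0.0, 9.0, 0.0])

# Baseline: predict the mean revenue for everyone
y_mean = np.full_like(y_true, y_true.mean())

r2 = r2_score(y_true, y_mean)
# RMSE as the square root of MSE (avoids the deprecated squared= argument)
rmse = np.sqrt(mean_squared_error(y_true, y_mean))
print(f"R²: {r2:.3f}, RMSE: {rmse:.3f}")
```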


### Sales classification models

In [13]:
f1_scores = {}

# Separate sales classification models for each product
for product in ['CL', 'CC', 'MF']:
    target_col = f'Sale_{product}'
    model, f1, roc_auc, best_params, study = train_sales_model_xgb_optuna_f1(X_train, X_val, y_train[target_col], y_val[target_col])
    # model, f1, roc_auc, best_params, study  = train_sales_model_xgb_optuna(X_train, X_val, y_train[target_col], y_val[target_col])

    models[f"{product}_sales"] = model
    f1_scores[product] = f1
    print(f"{product} Sales Model - f1: {f1:.3f}")

[I 2025-07-14 20:43:28,310] A new study created in memory with name: no-name-635a49f4-cc9e-45fb-9888-ff852e7ec03d
[I 2025-07-14 20:43:28,439] Trial 0 finished with value: 0.41509433962264153 and parameters: {'n_estimators': 135, 'max_depth': 8, 'learning_rate': 0.03300507148009211, 'subsample': 0.7524871090441254, 'colsample_bytree': 0.9096949636785956, 'reg_alpha': 4.83070752687507, 'reg_lambda': 0.19393603060858847, 'min_child_weight': 5, 'gamma': 2.404503767367639}. Best is trial 0 with value: 0.41509433962264153.
[I 2025-07-14 20:43:28,640] Trial 1 finished with value: 0.39285714285714285 and parameters: {'n_estimators': 195, 'max_depth': 8, 'learning_rate': 0.10897888753323513, 'subsample': 0.7624768814976908, 'colsample_bytree': 0.6415434457661221, 'reg_alpha': 0.11088382693223853, 'reg_lambda': 0.045279466111795764, 'min_child_weight': 5, 'gamma': 0.6140800231227572}. Best is trial 0 with value: 0.41509433962264153.
[I 2025-07-14 20:43:28,910] Trial 2 finished with value: 0.4333

CL Sales Model - f1: 0.615


[I 2025-07-14 20:43:52,723] Trial 1 finished with value: 0.41025641025641024 and parameters: {'n_estimators': 319, 'max_depth': 6, 'learning_rate': 0.15070894910815286, 'subsample': 0.7634132930259047, 'colsample_bytree': 0.6126476006939342, 'reg_alpha': 0.0024749065030711783, 'reg_lambda': 0.0026954999897889384, 'min_child_weight': 3, 'gamma': 0.4412181793406028}. Best is trial 0 with value: 0.5217391304347826.
[I 2025-07-14 20:43:52,913] Trial 2 finished with value: 0.5306122448979592 and parameters: {'n_estimators': 315, 'max_depth': 3, 'learning_rate': 0.021581583403069045, 'subsample': 0.8991244826404061, 'colsample_bytree': 0.7062445973895372, 'reg_alpha': 0.0056539507356279314, 'reg_lambda': 0.03877053258046566, 'min_child_weight': 4, 'gamma': 0.4107417287605608}. Best is trial 2 with value: 0.5306122448979592.
[I 2025-07-14 20:43:52,975] Trial 3 finished with value: 0.5416666666666666 and parameters: {'n_estimators': 85, 'max_depth': 10, 'learning_rate': 0.0770934768721752, 'su

CC Sales Model - f1: 0.615


[I 2025-07-14 20:44:22,725] Trial 1 finished with value: 0.2926829268292683 and parameters: {'n_estimators': 411, 'max_depth': 3, 'learning_rate': 0.05738288618555787, 'subsample': 0.767912781394908, 'colsample_bytree': 0.619374621993035, 'reg_alpha': 2.1691824379751634, 'reg_lambda': 0.0031742885817466026, 'min_child_weight': 2, 'gamma': 4.418117269811416}. Best is trial 1 with value: 0.2926829268292683.
[I 2025-07-14 20:44:22,950] Trial 2 finished with value: 0.12903225806451613 and parameters: {'n_estimators': 72, 'max_depth': 9, 'learning_rate': 0.2833873165784454, 'subsample': 0.9921089992111798, 'colsample_bytree': 0.7452862011054209, 'reg_alpha': 0.05140331922803875, 'reg_lambda': 0.814491050520314, 'min_child_weight': 9, 'gamma': 1.9962816399780892}. Best is trial 1 with value: 0.2926829268292683.
[I 2025-07-14 20:44:23,287] Trial 3 finished with value: 0.13793103448275862 and parameters: {'n_estimators': 369, 'max_depth': 5, 'learning_rate': 0.0879358709341221, 'subsample': 0.

MF Sales Model - f1: 0.444


In [14]:
for product in ['CL', 'CC', 'MF']:
    print(f"{product} Sales Model - f1: {f1_scores[product]:.3f}")

CL Sales Model - f1: 0.615
CC Sales Model - f1: 0.615
MF Sales Model - f1: 0.444
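F1 is the harmonic mean of precision and recall on the positive class, 2PR / (P + R). It is computed from hard 0/1 predictions, so it depends on the probability threshold that turns scores into labels. Any threshold choice happens inside `train_sales_model_xgb_optuna_f1`, which is not shown. The propensity scores in Section 3 appear to be raw probabilities, so the threshold does not enter them directly. The toy example below uses hypothetical probabilities:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = np.array([1, 1, 0, 0, 1, 0])
# Hypothetical predicted probabilities from a sales classifier
proba = np.array([0.9, 0.4, 0.6, 0.2, 0.7, 0.1])

# F1 is computed on hard labels, so it changes with the decision threshold
for threshold in (0.3, 0.5, 0.65):
    y_pred = (proba >= threshold).astype(int)
    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    # F1 is the harmonic mean of precision and recall
    assert np.isclose(f1, 2 * p * r / (p + r))
    print(f"threshold={threshold}: precision={p:.3f}, recall={r:.3f}, F1={f1:.3f}")
```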


In [15]:
print( models.keys() )

dict_keys(['CL_revenue', 'CC_revenue', 'MF_revenue', 'CL_sales', 'CC_sales', 'MF_sales'])


## Section 3: Client targeting

### Propensity scoring

In [16]:
for product in ['CL', 'CC', 'MF']:
    test[f'p_{product.lower()}'] = predict_propensity(models[f"{product}_sales"], test, feature_cols)

test[['p_cl', 'p_cc', 'p_mf']].head()

| | p_cl | p_cc | p_mf |
|---|---|---|---|
| 0 | 0.561214 | 0.269312 | 0.380281 |
| 6 | 0.725677 | 0.349461 | 0.555722 |
| 9 | 0.634305 | 0.359101 | 0.368378 |
| 10 | 0.293517 | 0.196224 | 0.40479 |
| 13 | 0.418788 | 0.358825 | 0.472232 |


In [17]:
(test[['p_cl', 'p_cc', 'p_mf']].mean() * 100).round(1).astype(str) + '%'

p_cl    45.3%
p_cc    44.3%
p_mf    45.4%
dtype: object
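The scores above look like positive-class probabilities. The sketch below shows what `predict_propensity` might do, assuming a scikit-learn-compatible classifier; XGBoost's `XGBClassifier` exposes the same `predict_proba` method. A logistic regression on toy data stands in for the tuned model, and `predict_propensity_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def predict_propensity_sketch(model, df, feature_cols):
    """Hypothetical stand-in for predict_propensity: P(sale = 1) per client."""
    # predict_proba returns one column per class in model.classes_ order;
    # column 1 is the positive class when the labels are {0, 1}
    return model.predict_proba(df[feature_cols])[:, 1]


rng = np.random.default_rng(0)
train = pd.DataFrame({"Age": rng.integers(18, 80, 200), "Tenure": rng.integers(1, 250, 200)})
sale = (train["Tenure"] > 120).astype(int)  # toy label, not the case-study target

clf = LogisticRegression(max_iter=1000).fit(train[["Age", "Tenure"]], sale)

test = pd.DataFrame({"Age": [21, 37], "Tenure": [27, 175]})
test["p_cl"] = predict_propensity_sketch(clf, test, ["Age", "Tenure"])
print(test)
```

You can compare averages near 45% with the training sale rates, for example `y_train[['Sale_CL', 'Sale_CC', 'Sale_MF']].mean()`. A large gap would suggest that the scores are better used for ranking clients than read as literal probabilities.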

### Predict revenues

In [18]:
predicted_revenues_df = calculate_revenues(test, models)
predicted_revenues_df.head()


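| | Client | Sex | Age | Tenure | Count_CA | Count_SA | Count_MF | Count_OVD | Count_CC | Count_CL | ... | Sale_MF | Sale_CC | Sale_CL | Revenue_MF | Revenue_CC | Revenue_CL | VolumeCredDebRatio | p_cl | p_cc | p_mf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 909 | 1 | 21 | 27 | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | ... | NaN | NaN | NaN | 1.860177 | 2.737851 | 3.63997 | 1.747104 | 0.561214 | 0.269312 | 0.380281 |
| 6 | 699 | 1 | 37 | 175 | 1 | 0.0 | 4.0 | 1.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 1.632214 | 2.071407 | 3.989403 | 1.560034 | 0.725677 | 0.349461 | 0.555722 |
| 9 | 528 | 0 | 19 | 70 | 1 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 1.99696 | 2.703642 | 4.775798 | 1.114116 | 0.634305 | 0.359101 | 0.368378 |
| 10 | 1145 | 1 | 61 | 45 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 3.046836 | 1.995164 | 2.564697 | 30.084959 | 0.293517 | 0.196224 | 0.40479 |
| 13 | 517 | 0 | 41 | 28 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | NaN | NaN | NaN | 1.866592 | 1.90395 | 2.438523 | 1.020149 | 0.418788 | 0.358825 | 0.472232 |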
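`calculate_revenues` and the offer assignment inside `run_full_targeting_pipeline` are not shown. One common rule scores each product by propensity times predicted revenue and offers the product with the highest score. This rule is assumed here and is not confirmed by the source. The sketch applies it to two clients from the table above, assuming the `Revenue_*` columns hold the model predictions. The output column names `Best_Offer` and `Expected_Revenue` match those saved to `targeted_clients.csv`.

```python
import pandas as pd

products = ["CL", "CC", "MF"]
# Two clients copied from the table above: propensities and predicted revenues
clients = pd.DataFrame({
    "Client": [909, 699],
    "p_cl": [0.561214, 0.725677], "p_cc": [0.269312, 0.349461], "p_mf": [0.380281, 0.555722],
    "Revenue_CL": [3.63997, 3.989403], "Revenue_CC": [2.737851, 2.071407],
    "Revenue_MF": [1.860177, 1.632214],
})

# Assumed rule: expected revenue per product = propensity x predicted revenue
expected = pd.DataFrame({
    p: clients[f"p_{p.lower()}"] * clients[f"Revenue_{p}"] for p in products
})
# Offer the product with the highest expected revenue
clients["Best_Offer"] = expected.idxmax(axis=1)
clients["Expected_Revenue"] = expected.max(axis=1)
print(clients[["Client", "Best_Offer", "Expected_Revenue"]])
```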


### Prepare list of clients to target

In [19]:
targets, forecast, df_targets = run_full_targeting_pipeline(predicted_revenues_df, top_frac=0.15)
print_targeting_summary(targets, forecast)

Stage 3,4: Assigning best offers...
Stage 5: Selecting top targets...
Stage 6: Calculating revenue forecast...

=== TARGETING SUMMARY ===
Total clients targeted: 96
Total revenue: $599.22
Lift vs baseline targeting: 74.7%

Offer distribution:
  CL: 89 clients (92.7%)
  CC: 7 clients (7.3%)

Revenue distribution:
  CC: $37.60 (6.3%)
  CL: $561.62 (93.7%)
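Applying `top_frac=0.15` to the 646 clients in the targeting set gives 646 x 0.15 = 96.9. The summary reports 96 targeted clients, which matches truncating to an integer. The reported breakdowns are internally consistent: 89 + 7 = 96 clients and $561.62 + $37.60 = $599.22. The sketch below reproduces the selection and the percentage breakdowns, assuming clients are ranked by `Expected_Revenue`. `select_top_targets` and the random scores are hypothetical.

```python
import numpy as np
import pandas as pd


def select_top_targets(df, top_frac):
    """Hypothetical sketch: keep the top_frac share of clients by Expected_Revenue."""
    n_targets = int(len(df) * top_frac)  # truncates, e.g. 646 * 0.15 = 96.9 -> 96
    return df.nlargest(n_targets, "Expected_Revenue")


rng = np.random.default_rng(1)
scored = pd.DataFrame({
    "Client": np.arange(646),
    "Best_Offer": rng.choice(["CL", "CC", "MF"], size=646, p=[0.6, 0.3, 0.1]),
    "Expected_Revenue": rng.gamma(2.0, 1.5, size=646),
})

targets = select_top_targets(scored, top_frac=0.15)

# Share of targeted clients and of expected revenue per offer, in percent
offer_share = targets["Best_Offer"].value_counts(normalize=True) * 100
revenue_share = (
    targets.groupby("Best_Offer")["Expected_Revenue"].sum()
    / targets["Expected_Revenue"].sum() * 100
)
print(len(targets), offer_share.round(1).to_dict(), revenue_share.round(1).to_dict())
```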


In [None]:
# save targets to targeted_clients.csv
df_targets[['Client', 'Best_Offer', 'Expected_Revenue']].to_csv('targeted_clients.csv', index=False)
