# Client Targeting

* **Section 0: Load dataset**
* **Section 1: Prepare datasets**
  * Data Split into (1) Training and (2) Client Targetting sets
  * Apply data processing
  * Prepare training datasets - further split training data set into (1) train and (2) validation sets
* **Section 2: Models training**
  * Revenue regression models
  * Sales classification models
* **Section 3: Clients targeting**
  * Propensity scoring
  * Predict revenues
  * Prepare list of clients to target

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

from utlis.data_utils import load_data, merge_data, process_features1, process_features2, get_feature_cols
from utlis.model_utils import train_revenue_model_xgb_optuna, train_sales_model_xgb_optuna_f1, predict_propensity, train_sales_model_xgb_optuna
from utlis.targeting import calculate_revenues, run_full_targeting_pipeline, print_targeting_summary, assign_best_offer


## Section 0: Load dataset

In [2]:
print("1. Loading data...")
file = 'DataScientist_CaseStudy_Dataset.xlsx'
soc_dem, products, inflow, sales = load_data(file)
df = merge_data(soc_dem, products, inflow, sales)


1. Loading data...


## Section 1: Prepare datasets

### Data Split into (1) Training and (2) Client Targetting sets

In [3]:
classification_target_columns = ['Sale_CL', 'Sale_CC', 'Sale_MF']
regression_target_columns = ['Revenue_CL','Revenue_CC','Revenue_MF']

# Training data set
train_val = df.dropna(subset=classification_target_columns+regression_target_columns, how='all')

# Client Targetting set
test = df[df[classification_target_columns+regression_target_columns].isna().all(axis=1)].copy()

print(f"Training set: {train_val.shape[0]} clients ({train_val.shape[0]/len(df)*100:.1f}%)")
print(f"Client Targetting set: {test.shape[0]} clients ({test.shape[0]/len(df)*100:.1f}%)")

Training set: 969 clients (60.0%)
Client Targetting set: 646 clients (40.0%)


### Apply data processing

In [4]:
print( "*"*30 + "Before train_val processing" + "*"*30 + "\n")
print( train_val.head(3))
print( f"\n{train_val.shape=}")
train_val, sex_label_encoder = process_features2(train_val)
print("\n")

print( "*"*30 + "Before test processing" + "*"*30 + "\n")
print( test.head(3))
print( f"\n{test.shape=}")
test, _ = process_features2(test, le=sex_label_encoder)

******************************Before train_val processing******************************

   Client Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  Count_CC  \
1    1217   M   38     165         1       NaN       NaN        NaN       NaN   
2     850   F   49      44         1       NaN       NaN        NaN       NaN   
3    1473   M   54      34         1       1.0       NaN        NaN       1.0   

   Count_CL  ...  TransactionsDeb_CA  TransactionsDebCash_Card  \
1       NaN  ...                 1.0                       0.0   
2       NaN  ...                 6.0                       0.0   
3       1.0  ...                38.0                       1.0   

   TransactionsDebCashless_Card  TransactionsDeb_PaymentOrder  Sale_MF  \
1                           0.0                           1.0      0.0   
2                           0.0                           1.0      0.0   
3                          26.0                          11.0      1.0   

   Sale_CC  Sale_CL  Rev

In [5]:
print( "*"*30 + "After train_val processing" + "*"*30 + "\n")
print( train_val.head(3))
print( f"\n{train_val.shape=}")
print("\n")

print( "*"*30 + "After test processing" + "*"*30 + "\n")
print( test.head(3))
print( f"\n{test.shape=}")

******************************After train_val processing******************************

   Client  Sex  Age  Tenure  Count_CA  Count_SA  Count_MF  Count_OVD  \
1    1217    1   38     165         1       NaN       NaN        NaN   
2     850    0   49      44         1       NaN       NaN        NaN   
3    1473    1   54      34         1       1.0       NaN        NaN   

   Count_CC  Count_CL  ...  TransactionsDebCash_Card  \
1       NaN       NaN  ...                       0.0   
2       NaN       NaN  ...                       0.0   
3       1.0       1.0  ...                       1.0   

   TransactionsDebCashless_Card  TransactionsDeb_PaymentOrder  Sale_MF  \
1                           0.0                           1.0      0.0   
2                           0.0                           1.0      0.0   
3                          26.0                          11.0      1.0   

   Sale_CC  Sale_CL  Revenue_MF  Revenue_CC  Revenue_CL  VolumeCredDebRatio  
1      0.0      0.0    

In [6]:
feature_cols = get_feature_cols(train_val)
train_val[feature_cols].isnull().sum()

Sex                               0
Age                               0
Tenure                            0
Count_CA                          0
Count_SA                        703
Count_MF                        783
Count_OVD                       716
Count_CC                        857
Count_CL                        888
ActBal_CA                         0
ActBal_SA                       703
ActBal_MF                       783
ActBal_OVD                      716
ActBal_CC                       857
ActBal_CL                       888
VolumeCred_CA                     0
TransactionsCred                  0
VolumeDeb                         0
VolumeDebCash_Card                0
VolumeDebCashless_Card            0
VolumeDeb_PaymentOrder            0
TransactionsDeb                   0
TransactionsDebCash_Card          0
TransactionsDebCashless_Card      0
TransactionsDeb_PaymentOrder      0
VolumeCredDebRatio                0
dtype: int64

In [7]:
target_columns = classification_target_columns + regression_target_columns
train_val[target_columns].isnull().sum()

Sale_CL       0
Sale_CC       0
Sale_MF       0
Revenue_CL    0
Revenue_CC    0
Revenue_MF    0
dtype: int64

In [8]:
test[feature_cols].isnull().sum()

Sex                               0
Age                               0
Tenure                            0
Count_CA                          0
Count_SA                        484
Count_MF                        523
Count_OVD                       477
Count_CC                        585
Count_CL                        589
ActBal_CA                         0
ActBal_SA                       484
ActBal_MF                       523
ActBal_OVD                      477
ActBal_CC                       585
ActBal_CL                       589
VolumeCred_CA                     0
TransactionsCred                  0
VolumeDeb                         0
VolumeDebCash_Card                0
VolumeDebCashless_Card            0
VolumeDeb_PaymentOrder            0
TransactionsDeb                   0
TransactionsDebCash_Card          0
TransactionsDebCashless_Card      0
TransactionsDeb_PaymentOrder      0
VolumeCredDebRatio                0
dtype: int64

In [9]:
X_train_val = train_val[feature_cols].fillna(0)
# y_train_val = (train_val[target_columns] > 0).astype(int)
y_train_val = train_val[target_columns]

random_state=42
X_train, X_val, y_train, y_val = train_test_split(X_train_val, y_train_val, test_size=0.1, random_state=random_state)

print( "X_train.shape, X_val.shape\n", X_train.shape, X_val.shape )

X_train.head(3)


X_train.shape, X_val.shape
 (870, 26) (97, 26)


Unnamed: 0,Sex,Age,Tenure,Count_CA,Count_SA,Count_MF,Count_OVD,Count_CC,Count_CL,ActBal_CA,...,TransactionsCred,VolumeDeb,VolumeDebCash_Card,VolumeDebCashless_Card,VolumeDeb_PaymentOrder,TransactionsDeb,TransactionsDebCash_Card,TransactionsDebCashless_Card,TransactionsDeb_PaymentOrder,VolumeCredDebRatio
382,0,48,37,1,0.0,0.0,0.0,0.0,0.0,2262.178929,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
276,0,52,115,1,0.0,0.0,0.0,0.0,0.0,1704.711786,...,3.0,556.364286,125.0,358.042857,41.678571,14.0,2.0,7.0,2.0,0.96116
938,1,34,237,1,1.0,0.0,0.0,0.0,0.0,1161.505714,...,10.0,8567.436429,0.0,84.909643,8477.5,14.0,0.0,1.0,6.0,0.869058


In [10]:
print( "y_train.shape, y_val.shape\n", y_train.shape, y_val.shape )
y_train.head(3)

y_train.shape, y_val.shape
 (870, 6) (97, 6)


Unnamed: 0,Sale_CL,Sale_CC,Sale_MF,Revenue_CL,Revenue_CC,Revenue_MF
382,0.0,1.0,0.0,0.0,4.035714,0.0
276,1.0,1.0,0.0,12.208214,4.928571,0.0
938,1.0,1.0,1.0,133.275357,2.679286,0.368214


## Section 2: Models Training

### Revenue regression models

In [11]:
models = {}
r2_scores = {}
rmse_scores = {}

# Separate revenue regression models for each product
for product in ['CL', 'CC', 'MF']:
    target_col = f'Revenue_{product}'
    model, r2, rmse, best_params, study = train_revenue_model_xgb_optuna(X_train, X_val, y_train[target_col], y_val[target_col])

    models[f"{product}_revenue"] = model
    r2_scores[product] = r2
    rmse_scores[product] = rmse
    print(f"{product} Revenue Model - R²: {r2:.3f}, RMSE: {rmse:.3f}")



[I 2025-07-14 16:35:33,031] A new study created in memory with name: no-name-83f1f4eb-9d4e-4c38-b482-e8263e133375
[I 2025-07-14 16:35:34,325] Trial 0 finished with value: -0.6070938285810743 and parameters: {'n_estimators': 414, 'max_depth': 9, 'learning_rate': 0.01383233476310583, 'subsample': 0.8652239936329307, 'colsample_bytree': 0.7011588724902725, 'reg_alpha': 0.11181576214614873, 'reg_lambda': 0.15398469774760712, 'min_child_weight': 6, 'gamma': 2.71380667275888}. Best is trial 0 with value: -0.6070938285810743.
[I 2025-07-14 16:35:34,595] Trial 1 finished with value: -0.4694028962992478 and parameters: {'n_estimators': 389, 'max_depth': 4, 'learning_rate': 0.05888594568128681, 'subsample': 0.7282769911404198, 'colsample_bytree': 0.9764800312869627, 'reg_alpha': 0.15311134629084394, 'reg_lambda': 0.4112069223701909, 'min_child_weight': 9, 'gamma': 1.600152981600505}. Best is trial 1 with value: -0.4694028962992478.
[I 2025-07-14 16:35:34,753] Trial 2 finished with value: -0.3469

CL Revenue Model - R²: -0.073, RMSE: 4.661


[I 2025-07-14 16:35:53,433] Trial 0 finished with value: -2.673450151968336 and parameters: {'n_estimators': 152, 'max_depth': 9, 'learning_rate': 0.09331031478632419, 'subsample': 0.8705751063236169, 'colsample_bytree': 0.8623381784073421, 'reg_alpha': 1.678234967378692, 'reg_lambda': 0.0037436390830313114, 'min_child_weight': 10, 'gamma': 0.255154019885927}. Best is trial 0 with value: -2.673450151968336.
[I 2025-07-14 16:35:53,892] Trial 1 finished with value: -4.964742198800264 and parameters: {'n_estimators': 242, 'max_depth': 8, 'learning_rate': 0.14441952595652036, 'subsample': 0.6679115047681929, 'colsample_bytree': 0.9554511877463545, 'reg_alpha': 0.2545205414114956, 'reg_lambda': 0.24591790994253523, 'min_child_weight': 8, 'gamma': 4.308602471937125}. Best is trial 0 with value: -2.673450151968336.
[I 2025-07-14 16:35:54,150] Trial 2 finished with value: -0.5211269492769726 and parameters: {'n_estimators': 333, 'max_depth': 3, 'learning_rate': 0.010152176367886179, 'subsample

KeyboardInterrupt: 

In [None]:
for product in ['CL', 'CC', 'MF']:
    print(f"{product} Revenue Model - R²: {r2_scores[product]:.3f}, RMSE: {rmse_scores[product]:.3f}")

CL Revenue Model - R²: -0.103, RMSE: 4.725
CC Revenue Model - R²: -0.072, RMSE: 5.150
MF Revenue Model - R²: -0.004, RMSE: 8.585


### Sales classification models

In [None]:
f1_scores = {}

# Separate sales classification models for each product
for product in ['CL', 'CC', 'MF']:
    target_col = f'Sale_{product}'
    model, f1, roc_auc, best_params, study = train_sales_model_xgb_optuna_f1(X_train, X_val, y_train[target_col], y_val[target_col])
    # model, f1, roc_auc, best_params, study  = train_sales_model_xgb_optuna(X_train, X_val, y_train[target_col], y_val[target_col])

    models[f"{product}_sales"] = model
    f1_scores[product] = f1
    print(f"{product} Sales Model - f1: {f1:.3f}")

[I 2025-07-14 16:25:59,344] A new study created in memory with name: no-name-1ee94f84-307f-462f-bd15-299dc97d7731


[I 2025-07-14 16:26:00,380] Trial 0 finished with value: 0.7084848484848485 and parameters: {'n_estimators': 212, 'max_depth': 8, 'learning_rate': 0.05307410011344268, 'subsample': 0.6003271481766269, 'colsample_bytree': 0.8351825776402013, 'reg_alpha': 0.006167192285359413, 'reg_lambda': 0.23577058271329526, 'min_child_weight': 5, 'gamma': 0.031206250915823408}. Best is trial 0 with value: 0.7084848484848485.
[I 2025-07-14 16:26:00,566] Trial 1 finished with value: 0.6387878787878788 and parameters: {'n_estimators': 75, 'max_depth': 3, 'learning_rate': 0.011284315924655305, 'subsample': 0.9345713596945936, 'colsample_bytree': 0.8577094138819462, 'reg_alpha': 0.034481013025810865, 'reg_lambda': 0.28470073492394443, 'min_child_weight': 7, 'gamma': 1.020148312426978}. Best is trial 0 with value: 0.7084848484848485.
[I 2025-07-14 16:26:00,869] Trial 2 finished with value: 0.6436363636363636 and parameters: {'n_estimators': 82, 'max_depth': 8, 'learning_rate': 0.015623617532289255, 'subsam

ValueError: not enough values to unpack (expected 5, got 4)

In [None]:
for product in ['CL', 'CC', 'MF']:
    print(f"{product} Sales Model - f1: {f1_scores[product]:.3f}")

CL Sales Model - f1: 0.531
CC Sales Model - f1: 0.000
MF Sales Model - f1: 0.240


In [None]:
print( models.keys() )

dict_keys(['CL_revenue', 'CC_revenue', 'MF_revenue', 'CL_sales', 'CC_sales', 'MF_sales'])


## Section 3: Clients targeting

### Propensity Scoring

In [None]:
for product in ['CL', 'CC', 'MF']:
    test[f'p_{product.lower()}'] = predict_propensity(models[f"{product}_sales"] , test, feature_cols)

test[['p_cl', 'p_cc', 'p_mf']]

Unnamed: 0,p_cl,p_cc,p_mf
0,0.152659,0.197829,0.075403
6,0.360928,0.196101,0.269336
9,0.918397,0.236714,0.013921
10,0.272071,0.191208,0.066265
13,0.600007,0.209859,0.109999
...,...,...,...
1598,0.085567,0.195593,0.040127
1600,0.109731,0.229429,0.026092
1608,0.574173,0.251103,0.087774
1610,0.210739,0.216860,0.022769


### Predict Revenues

In [None]:
predicted_revenues_df = calculate_revenues(test, models)
predicted_revenues_df.head()


Unnamed: 0,Client,Sex,Age,Tenure,Count_CA,Count_SA,Count_MF,Count_OVD,Count_CC,Count_CL,...,Sale_MF,Sale_CC,Sale_CL,Revenue_MF,Revenue_CC,Revenue_CL,VolumeCredDebRatio,p_cl,p_cc,p_mf
0,909,1,21,27,1,,,1.0,,1.0,...,,,,4.498301,5.477213,4.58642,1.747104,0.152659,0.197829,0.075403
6,699,1,37,175,1,,4.0,1.0,,,...,,,,1.430461,4.320843,5.133866,1.560034,0.360928,0.196101,0.269336
9,528,0,19,70,1,,,1.0,,,...,,,,3.764956,5.688784,5.397694,1.114116,0.918397,0.236714,0.013921
10,1145,1,61,45,1,,,,,,...,,,,7.122893,4.786939,4.35214,30.084959,0.272071,0.191208,0.066265
13,517,0,41,28,1,,,,,,...,,,,3.46901,4.907645,5.383375,1.020149,0.600007,0.209859,0.109999


### Prepare list of clients to target

In [None]:
targets, forecast, df_targets = run_full_targeting_pipeline(predicted_revenues_df, top_frac=0.15)
print_targeting_summary(targets, forecast)

Stage 3,4: Assigning best offers...
Stage 5: Selecting top targets...
Stage 6: Calculating revenue forecast...

=== TARGETING SUMMARY ===
Total clients targeted: 96
Total expected revenue: $456.48
Average expected revenue per client: $2.22
Lift vs baseline targeting: 114.4%

Offer distribution:
  CL: 81 clients (84.4%)
  MF: 10 clients (10.4%)
  CC: 5 clients (5.2%)


In [None]:
targets.head()

Unnamed: 0,Client,Best_Offer,Expected_Revenue,Age,Tenure
967,265,MF,10.011842,26,179
933,1093,MF,9.37526,22,181
166,217,CL,8.155644,17,152
1408,498,CL,7.738518,21,176
958,1341,CL,6.692172,32,180


In [None]:
# save targets to targeted_clients.csv
df_targets[['Client', 'Best_Offer', 'Expected_Revenue']].to_csv('targeted_clients.csv', index=False)
