* [About this Dataset](#about)
* [EDA](#eda)
* [Data preparation](#dataprep)
* [Modeling](#modeling)
    * [Meta Learners](#metalearners)
        * [T-Learners](#tlearners)
        * [S-Learners](#slearners)
    * [Uplift Trees](#utrees)
* [Conclusions](#conclusions)

# About this Dataset <a class="anchor" id="about"></a>

  The dataset was created by the Criteo AI Lab. It consists of 13M rows, each representing a user, with 12 features, a treatment indicator, and 2 binary labels (visits and conversions). A positive label means the user visited or converted on the advertiser's website during the test period (2 weeks). The global treatment ratio is 84.6%. Advertisers usually keep only a small control population, because withholding the treatment from those users costs them potential revenue.  
  
Following is a detailed description of the features:  
  
- f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
- treatment: treatment group (1 = treated, 0 = control)
- conversion: whether a conversion occurred for this user (binary, label)
- visit: whether a visit occurred for this user (binary, label)
- exposure: treatment effect, whether the user has been effectively exposed (binary)
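
The treatment ratio and the per-group label rates described above can be checked with a few lines of pandas. The following is a minimal sketch on a tiny made-up frame (`toy` is a hypothetical stand-in that only reuses the column names `treatment`, `visit`, and `conversion`; its values are not from the Criteo data):

```python
import pandas as pd

# Tiny synthetic stand-in with the same column names as the dataset.
# The values are invented for illustration only.
toy = pd.DataFrame({
    "treatment":  [1, 1, 1, 1, 1, 1, 0, 0],
    "visit":      [1, 0, 1, 0, 0, 1, 0, 1],
    "conversion": [1, 0, 0, 0, 0, 1, 0, 0],
})

# Share of users in the treatment group.
treatment_ratio = toy["treatment"].mean()

# The mean of a binary label per group is the visit / conversion rate per group.
rates = toy.groupby("treatment")[["visit", "conversion"]].mean()

# Naive average uplift: treated rate minus control rate.
naive_uplift = rates.loc[1] - rates.loc[0]

print(f"treatment ratio: {treatment_ratio:.3f}")
print(rates)
print(naive_uplift)
```

On the real data the same three lines would presumably be run on `df` after loading the CSV; the difference of group means is only a population-level baseline, not a per-user uplift estimate.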

In [3]:
!pip install causalml

Collecting causalml
  Downloading causalml-0.15.2.tar.gz (1.5 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting forestci==0.6
  Downloading forestci-0.6-py3-none-any.whl (12 kB)
Collecting pygam
  Downloading pygam-0.8.0-py2.py3-none-any.whl (1.8 MB)
Collecting pydotplus
  Downloading pydotplus-2.0.2.tar.gz (278 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: causalml, pydotplus
  Building wheel for causalml (pyproject.toml) ...

In [4]:
!pip install scikit-uplift


Collecting scikit-uplift
  Downloading scikit_uplift-0.5.1-py3-none-any.whl (42 kB)
Installing collected packages: scikit-uplift
Successfully installed scikit-uplift-0.5.1

In [5]:
%pip install scikit-learn

Note: you may need to restart the kernel to use updated packages.


In [7]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
# from sklift.metrics import uplift_at_k, uplift_auc_score, qini_auc_score, weighted_average_uplift
# from sklift.viz import plot_uplift_preds
# from sklift.models import SoloModel, TwoModels
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from causalml.inference.tree import UpliftRandomForestClassifier, UpliftTreeClassifier
from causalml.inference.tree import uplift_tree_string, uplift_tree_plot
from IPython.display import Image

In [9]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import shap
def prepare_data(df, feature_cols, treatment_col='treatment', target_col='conversion', 
                 test_size=0.2, random_state=42):
    """
    Prepares data for uplift modeling with binary treatment/control setup
    """
    # Split data into features, treatment and target
    X = df[feature_cols]
    y = df[target_col]
    t = df[treatment_col]

    # Train-test split with stratification on treatment
    X_train, X_test, t_train, t_test, y_train, y_test = train_test_split(
        X, t, y, test_size=test_size, random_state=random_state, stratify=t
    )

    return {
        'X_train': X_train, 'X_test': X_test,
        't_train': t_train, 't_test': t_test,
        'y_train': y_train, 'y_test': y_test,
    }

# Example usage
feature_cols = [f'f{i}' for i in range(12)]
df = pd.read_csv('../input/uplift-modeling/criteo-uplift-v2.1.csv')
data = prepare_data(df, feature_cols)

# Access prepared data
X_train = data['X_train']
X_test = data['X_test']
t_train = data['t_train']
t_test = data['t_test']
y_train = data['y_train']
y_test = data['y_test']
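
Stratifying on the treatment column is meant to keep the treated/control proportions the same in both splits, which matters when the control group is small. A minimal sketch of that check, on synthetic arrays (`X_toy`, `t_toy`, `y_toy` are hypothetical names, and the 85/15 split only loosely mimics the ratio quoted for this dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Synthetic features, an imbalanced treatment flag (85% treated), and a binary label.
X_toy = rng.normal(size=(n, 3))
t_toy = np.array([1] * 850 + [0] * 150)
y_toy = rng.integers(0, 2, size=n)

# Same call pattern as prepare_data: stratify on the treatment indicator.
X_tr, X_te, t_tr, t_te, y_tr, y_te = train_test_split(
    X_toy, t_toy, y_toy, test_size=0.2, random_state=42, stratify=t_toy
)

# With stratification, both splits should keep roughly the original treatment ratio.
train_ratio = t_tr.mean()
test_ratio = t_te.mean()
print(f"train treatment ratio: {train_ratio:.3f}")
print(f"test treatment ratio:  {test_ratio:.3f}")
```

The same comparison could be made on `t_train.mean()` and `t_test.mean()` from the prepared data.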


In [17]:
from causalml.inference.meta import BaseSRegressor
from causalml.inference.tree import UpliftRandomForestClassifier
from sklearn.multioutput import MultiOutputRegressor
from xgboost import XGBRegressor

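# Initialize the S-Learner with an XGBoost base model, fit it on the training
# data, and estimate the treatment effect for each training row
learner_s = BaseSRegressor(learner=XGBRegressor(random_state=42))
learner_s_result = learner_s.fit_predict(X=X_train, treatment=t_train, y=y_train)

# One estimated treatment effect per row
print(learner_s_result)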


[[8.54994287e-05]
 [1.00145233e-04]
 [3.63811385e-03]
 ...
 [1.30322325e-04]
 [8.54995669e-05]
 [1.00145233e-04]]
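
The values printed above are per-user treatment effect estimates. The idea behind an S-Learner can be sketched without causalml: fit a single model on the features plus the treatment flag, then predict each row twice (once as treated, once as control) and take the difference. The following is a minimal illustration on synthetic data, assuming a scikit-learn `GradientBoostingRegressor` as the base model; it is not causalml's implementation, and all variable names are hypothetical:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000

# Synthetic data: the treatment adds 0.5 to the outcome,
# but only for rows whose first feature is positive.
X_toy = rng.normal(size=(n, 2))
t_toy = rng.integers(0, 2, size=n)
true_effect = 0.5 * (X_toy[:, 0] > 0)
y_toy = X_toy[:, 1] + t_toy * true_effect + rng.normal(scale=0.1, size=n)

# S-Learner idea: one model, with the treatment flag as an extra input column.
model = GradientBoostingRegressor(random_state=42)
model.fit(np.column_stack([t_toy, X_toy]), y_toy)

# Predict every row as if treated and as if untreated; the difference
# is the estimated individual treatment effect.
pred_treated = model.predict(np.column_stack([np.ones(n), X_toy]))
pred_control = model.predict(np.column_stack([np.zeros(n), X_toy]))
ite_hat = pred_treated - pred_control

positive = X_toy[:, 0] > 0
print("mean estimated effect, first feature > 0: ", ite_hat[positive].mean())
print("mean estimated effect, first feature <= 0:", ite_hat[~positive].mean())
```

A T-Learner differs in that it fits two separate models, one on treated rows and one on control rows, and subtracts their predictions.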


# EDA <a class="anchor" id="eda"></a>

In [None]:
from sklearn.neural_network import MLPRegressor
from causalml.inference.meta import BaseSRegressor

# Define the neural network regressor
nn = MLPRegressor(hidden_layer_sizes=(35, 25, 10, 5),
                  learning_rate_init=0.01,
                  early_stopping=True,
                  random_state=1)

# Plug it into the S-Learner
learner_s = BaseSRegressor(learner=nn)
learner_s_result = learner_s.fit_predict(X=X_train, treatment=t_train, y=y_train)


In [None]:
# ITE (individual treatment effect) estimates from the NN-based S-Learner
nn_ite = learner_s.fit_predict(X=X_train, treatment=t_train, y=y_train)

In [None]:
# Feature importance using permutation
learner_s.get_importance(X=X_train, tau=nn_ite, method='permutation', features=feature_cols, random_state=42)

Of the methods tested here, the uplift tree appears to be the most effective for this dataset, with an overall uplift of 0.03 (0.031 for the top 30%). Among the meta-learners, S-Learners perform better than T-Learners, with the LGBMClassifier-based S-Learner leading.
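
Figures such as "uplift for the top 30%" are commonly computed by ranking users by predicted uplift, keeping the top fraction, and comparing the mean outcome of treated and control users within that slice. The sketch below shows one way to do this with NumPy; `uplift_at_top_k` is a hypothetical helper written for illustration, the toy arrays are made up, and the exact metric used to obtain the numbers above may differ (for example, scikit-uplift provides its own `uplift_at_k`):

```python
import numpy as np

def uplift_at_top_k(y, t, uplift_scores, k=0.3):
    """Treated-minus-control mean outcome among the top-k fraction by score."""
    y, t, uplift_scores = map(np.asarray, (y, t, uplift_scores))
    n_top = int(len(uplift_scores) * k)
    # Indices of the n_top highest predicted uplift scores.
    top = np.argsort(-uplift_scores, kind="stable")[:n_top]
    y_top, t_top = y[top], t[top]
    return y_top[t_top == 1].mean() - y_top[t_top == 0].mean()

# Made-up example: 8 users, already listed in descending score order.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1])
t_toy = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_toy = np.array([1, 0, 1, 0, 0, 1, 0, 0])

top_half = uplift_at_top_k(y_toy, t_toy, scores, k=0.5)
overall = uplift_at_top_k(y_toy, t_toy, scores, k=1.0)
print("uplift in top 50%:", top_half)
print("overall uplift:   ", overall)
```

A model that ranks users well should show a higher uplift in the top slice than overall, which is the pattern in this toy example. The slice must contain both treated and control users for the difference to be defined.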