In [1]:
pip install scikit-uplift

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklift.models import SoloModel

# Load balanced dataset

path = "/Users/rohityadav/Desktop/Git Projects/ml-uplift-modeling-criteo/data/criteo-uplift-v2.1-100K-balanced.csv"
df = pd.read_csv(path)

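# Define features (X), outcome (Y = visit) and treatment flag (T).
# 'exposure' and 'conversion' are post-treatment columns, so they are excluded from X.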

X = df.drop(['treatment', 'visit', 'exposure', 'conversion'], axis = 1)
Y = df['visit'].astype(int)
T = df['treatment'].astype(int)

In [3]:
# Train/Test Split 

X_train, X_test, T_train, T_test, Y_train, Y_test = train_test_split(
    X, T, Y,
    test_size = 0.3,
    stratify = T,
    random_state = 42
)

print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
print('Treatment ratio train:', T_train.mean())
print('Treatment ratio test:', T_test.mean())

X_train shape: (70000, 12)
X_test shape: (30000, 12)
Treatment ratio train: 0.5
Treatment ratio test: 0.5


In [7]:
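# Wrap a random forest in SoloModel (single-model approach:
# the treatment flag is passed to the classifier as an extra feature).
# Conservative hyperparameters (shallow trees, large leaves) to limit overfitting.

uplift_rf = SoloModel(RandomForestClassifier(
    n_estimators=100,
    max_depth=6,
    min_samples_leaf=200,
    random_state=42
))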

uplift_rf.fit(X_train, Y_train, T_train)
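
To make the single-model idea concrete, here is a rough sketch written with plain scikit-learn. It is an illustration of the general technique, not a copy of sklift internals: the helper `s_learner_uplift` and the synthetic data are hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def s_learner_uplift(model, X_fit, y_fit, t_fit, X_new):
    """Single-model uplift estimate: P(y=1 | x, t=1) - P(y=1 | x, t=0)."""
    # Train one classifier on the features plus the treatment flag as an extra column.
    model.fit(np.column_stack([X_fit, t_fit]), y_fit)
    n = len(X_new)
    # Score every row twice: once as if treated, once as if untreated.
    p_treated = model.predict_proba(np.column_stack([X_new, np.ones(n)]))[:, 1]
    p_control = model.predict_proba(np.column_stack([X_new, np.zeros(n)]))[:, 1]
    return p_treated - p_control

# Synthetic data (hypothetical): treatment helps only when the first feature is positive.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(4000, 3))
t_demo = rng.integers(0, 2, size=4000)
p_demo = 0.2 + 0.3 * t_demo * (X_demo[:, 0] > 0)
y_demo = rng.binomial(1, p_demo)

rf = RandomForestClassifier(n_estimators=50, max_depth=6,
                            min_samples_leaf=50, random_state=0)
uplift_demo = s_learner_uplift(rf, X_demo, y_demo, t_demo, X_demo)

print("Mean uplift, x0 > 0: ", uplift_demo[X_demo[:, 0] > 0].mean())
print("Mean uplift, x0 <= 0:", uplift_demo[X_demo[:, 0] <= 0].mean())
```

Because the treatment flag is just one column among many, a heavily regularized forest may split on it rarely, which tends to pull the predicted uplift toward zero.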

In [5]:
# Predict Uplift on test

uplift_pred = uplift_rf.predict(X_test)

print('\nThe uplift prediction (test) stats are as follows:\n')

print('Mean:', float(np.mean(uplift_pred)))
print('Min:', float(np.min(uplift_pred)))
print('Max:', float(np.max(uplift_pred)))
print('% Positive:', float(np.mean(uplift_pred > 0)))
print('% Negative:', float(np.mean(uplift_pred < 0)))


The uplift prediction (test) stats are as follows:

Mean: 0.0014801034348661934
Min: -0.004999530134076946
Max: 0.030027518140116893
% Positive: 0.9285
% Negative: 0.0715


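**Mean uplift is small but positive**
→ This is realistic. In ad tech, average treatment effects are typically tiny, so the value comes from ranking users correctly rather than from the size of the predicted effect.

**Range is narrow**
→ The model is a random forest wrapped in `SoloModel`, where treatment enters as one feature among many. With shallow trees (max_depth=6) and large leaves (min_samples_leaf=200), the predicted gap between the treated and untreated scenarios stays small.
→ This is expected for a regularized single-model setup, although it can also mean the predictions understate the true effect.

**Negative uplift exists (~7%)**
→ These users are candidates for a do-not-disturb list.
→ The predicted negatives are tiny in magnitude (minimum about -0.005), so they should be checked against observed outcomes before acting on them.
→ If this share were 0%, the model would deserve a closer look.

**Most users have slight positive uplift**
→ Typical in ad exposure datasets: many users are weakly persuadable.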
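
One way to check whether a group such as the predicted-negative users really behaves differently is to compare observed treated and control rates inside that group, together with a rough standard error. The sketch below assumes a randomized treatment and uses synthetic stand-ins for `uplift_pred`, `T_test` and `Y_test`; the helper name `lift_with_se` is hypothetical.

```python
import numpy as np

def lift_with_se(y, t):
    """Observed lift (treated rate - control rate) and its approximate standard error."""
    y, t = np.asarray(y), np.asarray(t)
    y1, y0 = y[t == 1], y[t == 0]
    lift = y1.mean() - y0.mean()
    # Standard error of a difference between two independent sample means.
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return lift, se

# Synthetic stand-ins (hypothetical data, not the Criteo sample).
rng = np.random.default_rng(1)
n = 20000
score = rng.normal(size=n)        # plays the role of uplift_pred
t = rng.integers(0, 2, size=n)    # plays the role of T_test
y = rng.binomial(1, 0.15 + 0.04 * score.clip(-3, 3) * t)  # plays the role of Y_test

neg = score < 0
lift_neg, se_neg = lift_with_se(y[neg], t[neg])
lift_pos, se_pos = lift_with_se(y[~neg], t[~neg])

print(f"Predicted-negative group: lift = {lift_neg:.4f} (SE {se_neg:.4f})")
print(f"Predicted-positive group: lift = {lift_pos:.4f} (SE {se_pos:.4f})")
```

If the observed lift in the predicted-negative group is within a couple of standard errors of zero, the data do not clearly support treating those users as harmed by the ad.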

In [6]:
# Reality Check

k = 5000
top_idx = np.argsort(-uplift_pred)[:k]

top_T = T_test.iloc[top_idx].to_numpy()
top_Y = Y_test.iloc[top_idx].to_numpy()

rate_treated = top_Y[top_T == 1].mean()
rate_control = top_Y[top_T == 0].mean()

print(f'\nTop {k} users by predicted uplift:')
print('Treated visit rate:', rate_treated)
print('Control visit rate:', rate_control)
print('Observed lift(treated - control):', rate_treated - rate_control)


Top 5000 users by predicted uplift:
Treated visit rate: 0.21659524737047137
Control visit rate: 0.16276202219482122
Observed lift(treated - control): 0.053833225175650146


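In the top-ranked group:

→ **Treated users visit at a rate of about 21.7%**

→ **Control users visit at a rate of about 16.3%**

The observed difference is positive and large compared with the mean predicted uplift of about 0.15 percentage points.

This is consistent with:
→ **The model is ranking users by treatment effect, not just predicting visits.**

This is what uplift modeling is supposed to do. The comparison that would settle it is the lift over the full test set, which this cell does not compute: the top group should show a clearly higher lift than that baseline.

**If this had been ~0 or negative → model failure.**
**We got +5.38 percentage points of absolute lift → strong signal.**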
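
To put the top-k number in context, a common follow-up is to compare it with the overall lift and to look at observed lift across the whole ranking. The sketch below bins users into equal-sized groups by predicted uplift and reports treated and control rates per bin. It runs on synthetic stand-ins for `Y_test`, `T_test` and `uplift_pred`, and the helper `lift_by_bin` is a hypothetical name introduced for illustration.

```python
import numpy as np
import pandas as pd

def lift_by_bin(y, t, score, n_bins=10):
    """Observed lift within equal-sized bins of predicted uplift (bin 0 = highest scores)."""
    df = pd.DataFrame({"y": np.asarray(y), "t": np.asarray(t), "score": np.asarray(score)})
    # Rank first so that tied scores cannot produce duplicate bin edges.
    ranks = df["score"].rank(method="first", ascending=False)
    df["bin"] = pd.qcut(ranks, n_bins, labels=False)
    rates = df.groupby(["bin", "t"])["y"].mean().unstack("t")
    out = pd.DataFrame({
        "n": df.groupby("bin").size(),
        "treated_rate": rates[1],
        "control_rate": rates[0],
    })
    out["lift"] = out["treated_rate"] - out["control_rate"]
    return out

# Synthetic stand-ins (hypothetical data, not the Criteo sample).
rng = np.random.default_rng(2)
n = 30000
score = rng.normal(size=n)        # plays the role of uplift_pred
t = rng.integers(0, 2, size=n)    # plays the role of T_test
y = rng.binomial(1, 0.15 + 0.04 * score.clip(-3, 3) * t)  # plays the role of Y_test

overall_lift = y[t == 1].mean() - y[t == 0].mean()
table = lift_by_bin(y, t, score)

print("Overall lift (baseline):", overall_lift)
print(table)
```

A well-ranked model should show lift that falls roughly from bin 0 to the last bin, with the top bins above the overall baseline. scikit-uplift also provides summary metrics such as `uplift_at_k` and `qini_auc_score` in `sklift.metrics`, which condense the same idea into a single number.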