<a href="https://colab.research.google.com/github/SuhaniShah008/Uplift-Model/blob/main/Uplift_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Uplift Modeling

Step-by-Step Approach
1. Load and Explore the Data
- The dataset contains anonymized customer features (f0 to f11 in the Criteo data) and a `treatment` column (1 if the customer received the campaign, 0 otherwise).
- The target variable `response` indicates whether the customer responded to the campaign; `fetch_criteo` returns the `visit` outcome as the target by default.


2. Feature Engineering & Preprocessing
- Convert categorical variables into numerical format.
- Scale or normalize numerical features if necessary.

3. Train an Uplift Model
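- Split the training data into a treatment group and a control group, and fit a separate response model on each (the two-model, or T-Learner, approach).
- The uplift score is the difference between the two models' predicted response probabilities.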

4. Evaluate Uplift Effectiveness
- Measure performance using Qini curves, uplift at different deciles, or KL divergence.
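For reference, the quantities in steps 3 and 4 can be written out. The notation below is introduced here for illustration: $Y$ is the response, $T$ the treatment flag, $X$ the features, and $\hat{\mu}_1$, $\hat{\mu}_0$ the response models fitted on treated and control customers. For the top $k$ customers ranked by predicted uplift, $\bar{Y}_T(k)$ and $\bar{Y}_C(k)$ are the mean responses of treated and control customers, $R_T(k)$ and $R_C(k)$ their numbers of responders, and $N_T(k)$ and $N_C(k)$ their group sizes.

$$
\begin{aligned}
\tau(x) &= P(Y = 1 \mid X = x, T = 1) - P(Y = 1 \mid X = x, T = 0) \\
\hat{\tau}(x) &= \hat{\mu}_1(x) - \hat{\mu}_0(x) \\
\text{uplift@}k &= \bar{Y}_T(k) - \bar{Y}_C(k) \\
Q(k) &= R_T(k) - R_C(k)\,\frac{N_T(k)}{N_C(k)}
\end{aligned}
$$

The Qini value $Q(k)$ counts the extra responders gained by targeting the top $k$ customers, with control responders rescaled to the size of the treated group.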

### Key Takeaways
1. Why is this useful?

The uplift model identifies customers who are positively influenced by the marketing campaign.
It avoids wasting marketing resources on customers who would buy anyway or who react negatively to the campaign.


2. Extensions we can try

- Other meta-learners: the two-model approach used here is a T-Learner; S-Learners and X-Learners are common alternatives.
- Causal ML libraries such as causalml or econml.
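For comparison with the T-Learner used in this notebook, here is a minimal S-Learner sketch. A single classifier is trained with the treatment flag as an extra feature, and uplift is the difference between its predictions with the flag set to 1 and to 0. The synthetic data, the hypothetical `s_learner_uplift` helper, and the choice of `GradientBoostingClassifier` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier


def s_learner_uplift(X_train, t_train, y_train, X_new):
    """Hypothetical helper: fit one model on [X, T] and score uplift on X_new."""
    model = GradientBoostingClassifier(random_state=0)
    # The treatment flag is appended as the last feature column
    model.fit(np.column_stack([X_train, t_train]), y_train)
    ones = np.ones(len(X_new))
    zeros = np.zeros(len(X_new))
    # Predict each customer twice: once as treated, once as control
    p_treated = model.predict_proba(np.column_stack([X_new, ones]))[:, 1]
    p_control = model.predict_proba(np.column_stack([X_new, zeros]))[:, 1]
    return p_treated - p_control


# Synthetic randomized experiment: treatment raises the response rate
# from 0.1 to 0.4, but only for customers whose first feature is positive
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))
t = rng.integers(0, 2, size=n)
p = 0.1 + 0.3 * t * (X[:, 0] > 0)
y = rng.binomial(1, p)

uplift = s_learner_uplift(X, t, y, X)
print("mean uplift where x0 > 0: ", uplift[X[:, 0] > 0].mean())
print("mean uplift where x0 <= 0:", uplift[X[:, 0] <= 0].mean())
```

One design caveat: because the treatment flag is just one feature among many, a regularized S-Learner can shrink small treatment effects toward zero, which is one reason the T-Learner and X-Learner are often tried alongside it.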

In [None]:
%pip install datasets




In [None]:
%pip install scikit-uplift




In [None]:
# The Criteo uplift dataset is 297 MB even when zipped. It is also hosted on
# Hugging Face, where it could be loaded with the `datasets` library, but here
# we use scikit-uplift's `fetch_criteo`, which downloads and caches the data
# and returns the features, target, and treatment separately.

from sklift.datasets import fetch_criteo

# Fetch the dataset (by default fetch_criteo loads only a 10% sample: percent10=True)
data, target, treatment = fetch_criteo(return_X_y_t=True)

# Display the first few rows
print(data.head())





          f0         f1        f2        f3         f4        f5        f6  \
0  12.616365  10.059654  8.976429  4.679882  10.280525  4.115453  0.294443   
1  12.616365  10.059654  9.002689  4.679882  10.280525  4.115453  0.294443   
2  12.616365  10.059654  8.964775  4.679882  10.280525  4.115453  0.294443   
3  12.616365  10.059654  9.002801  4.679882  10.280525  4.115453  0.294443   
4  12.616365  10.059654  9.037999  4.679882  10.280525  4.115453  0.294443   

         f7        f8         f9       f10       f11  
0  4.833815  3.955396  13.190056  5.300375 -0.168679  
1  4.833815  3.955396  13.190056  5.300375 -0.168679  
2  4.833815  3.955396  13.190056  5.300375 -0.168679  
3  4.833815  3.955396  13.190056  5.300375 -0.168679  
4  4.833815  3.955396  13.190056  5.300375 -0.168679  
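The Criteo features shown above (f0 to f11) are already numeric, so step 2 needs no categorical encoding here, and tree ensembles such as the random forest used below do not require feature scaling. For a dataset that does have categorical columns, a minimal sketch of step 2 with scikit-learn's `ColumnTransformer` might look like this; the toy columns `segment` and `past_spend` are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical customer table with one categorical and one numeric feature
customers = pd.DataFrame({
    "segment": ["new", "loyal", "lapsed", "loyal"],
    "past_spend": [10.0, 250.0, 40.0, 180.0],
})

preprocess = ColumnTransformer([
    # One-hot encode categories; categories unseen during fit become all zeros
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
    # Standardize numeric features to zero mean and unit variance
    ("num", StandardScaler(), ["past_spend"]),
])

features = preprocess.fit_transform(customers)
print(features.shape)  # (4, 4): three one-hot columns plus one scaled column
```

Fitting the transformer on the training split only, then reusing it on the test split, keeps test-set statistics from leaking into the preprocessing.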


In [None]:
df = data.copy()
df['response'] = target
df['treatment'] = treatment
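Before modeling, it helps to check the treated/control split and the response rate in each group. Because the Criteo data come from a randomized experiment, the difference between the two group means estimates the average treatment effect, a baseline that the uplift model should beat among its top-ranked customers. A minimal sketch on a hypothetical toy frame with the same `response` and `treatment` columns; on the real data the same `groupby` call applies to `df` directly.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as `df`
toy = pd.DataFrame({
    "treatment": [1, 1, 1, 1, 0, 0, 0, 0],
    "response": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Group sizes and response rates for control (0) and treated (1) customers
summary = toy.groupby("treatment")["response"].agg(["count", "mean"])
print(summary)

# Difference in response rates: the naive average treatment effect
ate = summary.loc[1, "mean"] - summary.loc[0, "mean"]
print(f"Estimated average treatment effect: {ate:.3f}")  # 0.250
```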

In [None]:
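import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Train/test split, stratified on treatment so both splits keep the same treated/control ratio
train, test = train_test_split(df, test_size=0.3, random_state=42, stratify=df['treatment'])
test = test.copy()  # explicit copy so adding prediction columns does not raise SettingWithCopyWarning

# Two-model (T-Learner) approach: separate response models for treatment and control groups
def train_group_model(df, treatment_value):
    subset = df[df['treatment'] == treatment_value]
    X = subset.drop(columns=['response', 'treatment'])
    y = subset['response']
    # n_jobs=-1 uses all CPU cores; the Criteo sample is large, so fitting can still take a while
    model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
    model.fit(X, y)
    return model

model_treatment = train_group_model(train, treatment_value=1)
model_control = train_group_model(train, treatment_value=0)

# Predicted uplift = P(response | treated) - P(response | control)
X_test = test.drop(columns=['response', 'treatment'])
test['pred_treatment'] = model_treatment.predict_proba(X_test)[:, 1]
test['pred_control'] = model_control.predict_proba(X_test)[:, 1]
test['uplift'] = test['pred_treatment'] - test['pred_control']

# Uplift at k: among the top fraction of customers ranked by predicted uplift,
# the treated response rate minus the control response rate
def uplift_at_percentile(df, percentile=0.1):
    top_n = int(len(df) * percentile)
    top = df.nlargest(top_n, 'uplift')
    treated_rate = top.loc[top['treatment'] == 1, 'response'].mean()
    control_rate = top.loc[top['treatment'] == 0, 'response'].mean()
    return treated_rate - control_rate

uplift_top_10 = uplift_at_percentile(test, 0.1)
uplift_top_20 = uplift_at_percentile(test, 0.2)

print(f"Uplift in top 10%: {uplift_top_10:.4f}")
print(f"Uplift in top 20%: {uplift_top_20:.4f}")

# Qini curve: incremental responders when targeting the top k customers,
# Q(k) = R_T(k) - R_C(k) * N_T(k) / N_C(k)
test_sorted = test.sort_values(by='uplift', ascending=False)
t = test_sorted['treatment'].to_numpy()
y = test_sorted['response'].to_numpy()
n_t = np.cumsum(t)            # treated customers among the top k
n_c = np.cumsum(1 - t)        # control customers among the top k
r_t = np.cumsum(y * t)        # treated responders among the top k
r_c = np.cumsum(y * (1 - t))  # control responders among the top k
# Rescale control responders to the treated group size; 0 until a control customer appears
qini = r_t - np.divide(r_c * n_t, n_c, out=np.zeros(len(n_c)), where=n_c > 0)
k = np.arange(1, len(test_sorted) + 1)

plt.plot(k, qini, label="Uplift Model")
# Random targeting is the straight line from the origin to the final Qini value
plt.plot([0, k[-1]], [0, qini[-1]], color='r', linestyle='--', label="Random Targeting")
plt.xlabel("Number of Customers Targeted (sorted by uplift score)")
plt.ylabel("Incremental Responders (Qini)")
plt.title("Qini Curve")
plt.legend()
plt.show()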
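To compare models with a single number, a Qini curve is often summarized by the area between the model's curve and the random-targeting diagonal. A minimal NumPy sketch, assuming arrays of outcomes `y`, treatment flags `t`, and predicted `uplift`; `qini_curve_points` and `qini_area` are hypothetical helper names, and since normalization conventions differ between libraries, the values will not necessarily match a library's Qini score.

```python
import numpy as np


def qini_curve_points(y, t, uplift):
    """Hypothetical helper: Qini values Q(k) for k = 0..n customers targeted."""
    order = np.argsort(-np.asarray(uplift), kind="stable")  # highest uplift first
    y, t = np.asarray(y)[order], np.asarray(t)[order]
    n_t = np.cumsum(t)
    n_c = np.cumsum(1 - t)
    r_t = np.cumsum(y * t)
    r_c = np.cumsum(y * (1 - t))
    # Rescale control responders to the treated group size; 0 until a control is seen
    scaled_c = np.divide(r_c * n_t, n_c, out=np.zeros(len(y)), where=n_c > 0)
    return np.concatenate([[0.0], r_t - scaled_c])


def qini_area(y, t, uplift):
    """Hypothetical helper: area between the Qini curve and the random diagonal."""
    q = qini_curve_points(y, t, uplift)
    k = np.arange(len(q))
    gap = q - q[-1] * k / k[-1]  # distance above the random-targeting line
    # Trapezoidal rule with unit spacing between consecutive k
    return float(np.sum((gap[1:] + gap[:-1]) / 2))


# Toy check: ranking treated responders first gives a positive area,
# reversing the same ranking gives a negative one
y = np.array([1, 1, 0, 0, 0, 1, 0, 0])
t = np.array([1, 1, 0, 0, 1, 0, 1, 0])
good = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
print(qini_area(y, t, good) > qini_area(y, t, -good))  # True
```

On the notebook's data, the same helpers could be called as `qini_area(test['response'], test['treatment'], test['uplift'])`.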
