## Uplift Modeling

Uplift modelling, also known as **incremental modelling, true lift modelling, or net modelling**, is a predictive modelling technique that directly models the incremental impact of a treatment (such as a direct marketing action) on an individual's behaviour.

Uplift modelling has applications in customer relationship management for up-sell, cross-sell and retention modelling. It has also been applied to political elections and personalised medicine. Unlike the related differential prediction concept in psychology, uplift modelling assumes an active agent.

One of the most critical jobs of a Growth Hacker is to be as efficient as possible. First, you need to be time-efficient: ideate, experiment, learn and iterate quickly. Second, you need to be cost-efficient: bring the maximum return for a given budget, time and effort.

Uplift modelling uses a randomised scientific control to not only measure the effectiveness of an action but also to build a predictive model that predicts the incremental response to the action. It is a data mining technique that has been applied predominantly in the financial services, telecommunications and retail direct marketing industries to up-sell, cross-sell, churn and retention activities.

Segmentation helps Growth Hackers to increase conversion and hence be cost-efficient.

Based on this approach, customers fall into four segments:

- Treatment Responders: Customers that will purchase only if they receive an offer
- Treatment Non-Responders: Customers that won’t purchase in any case
- Control Responders: Customers that will purchase without an offer
- Control Non-Responders: Customers that will not purchase if they don’t receive an offer

The picture is clear. You need to target Treatment Responders (TR) and Control Non-Responders (CN). Since they won’t purchase unless you give them an offer, these groups boost your uplift in promotional campaigns. On the other hand, you need to avoid targeting Treatment Non-Responders (TN) and Control Responders (CR). You will not benefit from targeting TN, and targeting CR will cannibalize sales you would have made anyway.
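The four segments above can be assigned from logged campaign data. The following is a minimal sketch, assuming a table with a treatment flag and a purchase flag; the column names `treated` and `purchased` and the helper `quadrant_label` are hypothetical and not taken from the dataset used later. Note that these labels come from observed outcomes, so they only approximate the "would purchase only if..." definitions, which describe behaviour that is never observed directly for a single customer.

```python
import pandas as pd

# Toy data; `treated` and `purchased` are hypothetical column names.
toy = pd.DataFrame({
    "treated":   [1, 1, 0, 0, 1, 0],
    "purchased": [1, 0, 1, 0, 1, 0],
})

def quadrant_label(treated: int, purchased: int) -> str:
    """Map an observed (treated, purchased) pair to TR, TN, CR or CN."""
    if treated:
        return "TR" if purchased else "TN"
    return "CR" if purchased else "CN"

# Label every row with its observed quadrant.
toy["segment"] = [
    quadrant_label(t, p) for t, p in zip(toy["treated"], toy["purchased"])
]
print(toy["segment"].value_counts().to_dict())
```

The resulting `segment` column is the kind of target a multi-class model could be trained on.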

Uplift Modeling has two simple steps:
- Predict the probabilities of being in each group for all customers: we are going to build a multi-class classification model for that.
- Calculate the uplift score. The uplift score formula is:

![medium_image](https://miro.medium.com/max/3000/1*AvNi0acKGyCs7jYsqcfl_w.png)

There are several approaches to uplift modeling:

- Traditional one-model approach: This method models only customers who received promotions and completed the offer or purchased. The issue with this approach is that it discards the control group data.

- Two-model uplift approach: Create two models to predict purchases in the treatment and the control group respectively. The difference between the predicted probabilities of these models represents the lift, which can be used as a cutoff for identifying customers responsive to promotions. The issue with this approach is that it models lift only indirectly.

- Four-quadrant approach: Split the customers into four groups (treatment response: TR, treatment no response: TN, control response: CR, control no response: CN), then build a model to predict which group a customer belongs to. Promotions can either be sent only to customers predicted as TR (those expected to respond to the promotion), or we can compute an uplift score from the probabilities of belonging to each group and use a cutoff to decide whether to send a promotion.
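To make the two-model idea concrete, here is a minimal sketch under strong simplifying assumptions. Each "model" is just the purchase rate per customer segment, fitted separately on the treatment and the control rows; a real project would use proper classifiers instead. The data and the column names `segment`, `treated` and `purchased` are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: one categorical feature, a treatment flag, an outcome.
data = pd.DataFrame({
    "segment":   ["a", "a", "a", "a", "b", "b", "b", "b"],
    "treated":   [1, 1, 0, 0, 1, 1, 0, 0],
    "purchased": [1, 1, 0, 1, 0, 1, 0, 1],
})

# "Model" 1: purchase rate per segment, fitted on the treatment group only.
p_treat = data[data["treated"] == 1].groupby("segment")["purchased"].mean()

# "Model" 2: purchase rate per segment, fitted on the control group only.
p_ctrl = data[data["treated"] == 0].groupby("segment")["purchased"].mean()

# Estimated lift = P(purchase | treated) - P(purchase | control).
lift = (p_treat - p_ctrl).rename("lift")
print(lift)
```

Because the lift is obtained by subtracting two separately fitted estimates, neither model is optimised for the difference itself, which is the indirectness mentioned above.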
    
**NIR (Net Incremental Revenue):** NIR depicts how much is made (or lost) by sending out the promotion. Mathematically, this is 10 times the total number of purchasers that received the promotion, minus 0.15 times the number of promotions sent out, minus 10 times the number of purchasers who were not given the promotion. **NIR = Uplift Score mentioned above.**
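The NIR arithmetic described above can be written as a small function. This is a sketch only: the function name, the parameter names and the example counts are hypothetical, while the constants 10 and 0.15 come from the description.

```python
def net_incremental_revenue(
    purch_treat: int,
    n_treat: int,
    purch_ctrl: int,
    revenue_per_purchase: float = 10.0,
    cost_per_promotion: float = 0.15,
) -> float:
    """Revenue from treated purchasers, minus promotion cost,
    minus revenue from purchasers who received no promotion."""
    return (
        revenue_per_purchase * purch_treat
        - cost_per_promotion * n_treat
        - revenue_per_purchase * purch_ctrl
    )

# Made-up counts: 1000 promotions sent, 50 treated purchasers,
# 30 purchasers who were not given the promotion.
nir = net_incremental_revenue(purch_treat=50, n_treat=1000, purch_ctrl=30)
print(round(nir, 2))
```

A negative value would mean the promotion cost more than the extra revenue it brought in.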

To compute the uplift score, we sum the probabilities of being TR and CN and subtract the probabilities of falling into the other two buckets (TN and CR). A higher score means higher uplift.
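As a sketch of that calculation, assume a four-class model has already produced one probability per segment for each customer. The probability matrix, the `CLASSES` ordering and the `uplift_score` helper below are illustrative assumptions; with a scikit-learn classifier, the column order would have to be read from the fitted model's `classes_` attribute.

```python
import numpy as np

# Hypothetical class probabilities; columns follow the order in CLASSES.
CLASSES = ["TR", "TN", "CR", "CN"]
proba = np.array([
    [0.50, 0.10, 0.10, 0.30],
    [0.10, 0.40, 0.40, 0.10],
    [0.25, 0.25, 0.25, 0.25],
])

def uplift_score(proba: np.ndarray, classes: list) -> np.ndarray:
    """Uplift score = P(TR) + P(CN) - P(TN) - P(CR), per customer."""
    idx = {name: i for i, name in enumerate(classes)}
    return (
        proba[:, idx["TR"]] + proba[:, idx["CN"]]
        - proba[:, idx["TN"]] - proba[:, idx["CR"]]
    )

scores = uplift_score(proba, CLASSES)

# Rank customers from highest to lowest uplift score.
ranking = np.argsort(-scores)
print(scores, ranking)
```

Customers at the top of `ranking` are the ones a cutoff on the uplift score would select for the promotion.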

[Source : Medium](https://towardsdatascience.com/uplift-modeling-e38f96b1ef60)


### Creating different datasets for different offer types.

*Since uplift modeling determines how targeted ads and offers work on each customer, we should split the dataset into multiple datasets, one per offer type.*

In [43]:
import pandas as pd
import numpy as np

%matplotlib inline

In [44]:
customer_transactions = pd.read_csv('data/customer_transaction_data_categorical.csv')

In [45]:
# convert became_member_on to datetime
customer_transactions.became_member_on = pd.to_datetime(customer_transactions.became_member_on)

In [46]:
customer_transactions.sample(5)

Unnamed: 0,customer_id,time_received,offer_id,reward,age,became_member_on,income,difficulty,duration,offer_type,...,revenue_cluster,frequency,frequency_cluster,overall_score,gender_female,gender_male,gender_others,RFM_segment_Low-Value,RFM_segment_Mid-Value,RFM_segment_High-Value
58208,e4a75f2ce99b45e78d8b42bfb31b2ef4,21,2298d6c36e964ae4a3e7e9706d1fb8c2,3.0,93,2016-11-12,115000.0,7.0,7.0,discount,...,2,8,3,8,1,0,0,1,0,0
37045,914b1197292542a8abfb958fd7b4efd4,17,5a8bc65990b245e5a138643cd4eb9837,0.0,27,2013-09-08,34000.0,0.0,3.0,informational,...,4,16,1,9,0,1,0,0,1,0
42235,a4d629a11c0746c69c01f660286dea7e,17,9b98b8c7a33c4b65b9aebfe6a799e6d9,5.0,62,2018-07-03,72000.0,5.0,7.0,bogo,...,3,5,3,9,1,0,0,0,1,0
7172,1d36a7ddbd114574a804c5c4eb2ecb4d,14,f19421c1d4aa40978ebb69ca19b0e20d,5.0,54,2016-12-05,104000.0,5.0,5.0,bogo,...,3,5,3,8,1,0,0,1,0,0
53223,d1221180e99a46acb2841673afd34c5e,7,3f207df678b143eea3cee63160fa8bed,0.0,65,2015-04-21,47000.0,0.0,4.0,informational,...,4,5,3,9,0,1,0,0,1,0


In [47]:
customer_transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 65020 entries, 0 to 65019
Data columns (total 46 columns):
 #   Column                       Non-Null Count  Dtype         
---  ------                       --------------  -----         
 0   customer_id                  65020 non-null  object        
 1   time_received                65020 non-null  int64         
 2   offer_id                     65020 non-null  object        
 3   reward                       65020 non-null  float64       
 4   age                          65020 non-null  int64         
 5   became_member_on             65020 non-null  datetime64[ns]
 6   income                       65020 non-null  float64       
 7   difficulty                   65020 non-null  float64       
 8   duration                     65020 non-null  float64       
 9   offer_type                   65020 non-null  object        
 10  email                        65020 non-null  float64       
 11  mobile                       65020 non-nu

#### Splitting with offer types and offer ids.

In [48]:
customer_transactions.offer_type.unique()

array(['discount', 'informational', 'bogo'], dtype=object)

In [49]:
customer_transactions.offer_id.unique()

array(['2906b810c7d4411798c6938adc9daaa5',
       '3f207df678b143eea3cee63160fa8bed',
       '5a8bc65990b245e5a138643cd4eb9837',
       'f19421c1d4aa40978ebb69ca19b0e20d',
       'fafdcd668e3743c1bb461111dcafc2a4',
       '0b1e1539f2cc45b7b9fa7c272da2e1d7',
       '2298d6c36e964ae4a3e7e9706d1fb8c2',
       '9b98b8c7a33c4b65b9aebfe6a799e6d9',
       '4d5c57ea9a6940dd891ad53e9dbe8da0',
       'ae264e3637204a6fb9bb56bc8210ddfd'], dtype=object)

In [50]:
# offer_ids with bogo
customer_transactions[customer_transactions.offer_type == 'bogo'].offer_id.unique()

array(['f19421c1d4aa40978ebb69ca19b0e20d',
       '9b98b8c7a33c4b65b9aebfe6a799e6d9',
       '4d5c57ea9a6940dd891ad53e9dbe8da0',
       'ae264e3637204a6fb9bb56bc8210ddfd'], dtype=object)

In [51]:
# offer_ids with informational
customer_transactions[customer_transactions.offer_type == 'informational'].offer_id.unique()

array(['3f207df678b143eea3cee63160fa8bed',
       '5a8bc65990b245e5a138643cd4eb9837'], dtype=object)

In [52]:
# offer_ids with discount
customer_transactions[customer_transactions.offer_type == 'discount'].offer_id.unique()

array(['2906b810c7d4411798c6938adc9daaa5',
       'fafdcd668e3743c1bb461111dcafc2a4',
       '0b1e1539f2cc45b7b9fa7c272da2e1d7',
       '2298d6c36e964ae4a3e7e9706d1fb8c2'], dtype=object)

**We need a way to distinguish between the multiple bogo offer ids. Since each offer id has a different combination of difficulty, duration and reward, we can use this information.**

In [61]:
# drop rows with reward == 0, which are records where a transaction was made but no reward was given
customer_transactions[customer_transactions.reward != 0][['offer_id', 'offer_type', 'difficulty', 'duration', 'reward']].drop_duplicates('offer_id').sort_values('offer_type').reset_index(drop = True)

Unnamed: 0,offer_id,offer_type,difficulty,duration,reward
0,f19421c1d4aa40978ebb69ca19b0e20d,bogo,5.0,5.0,5.0
1,9b98b8c7a33c4b65b9aebfe6a799e6d9,bogo,5.0,7.0,5.0
2,4d5c57ea9a6940dd891ad53e9dbe8da0,bogo,10.0,5.0,10.0
3,ae264e3637204a6fb9bb56bc8210ddfd,bogo,10.0,7.0,10.0
4,2906b810c7d4411798c6938adc9daaa5,discount,10.0,7.0,2.0
5,fafdcd668e3743c1bb461111dcafc2a4,discount,10.0,10.0,2.0
6,0b1e1539f2cc45b7b9fa7c272da2e1d7,discount,20.0,10.0,5.0
7,2298d6c36e964ae4a3e7e9706d1fb8c2,discount,7.0,7.0,3.0
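One possible way to use this information is to derive a readable label from each offer's type, difficulty, duration and reward. The sketch below uses three rows copied from the table above; the `offer_label` helper and its naming scheme are hypothetical choices, not part of the original notebook.

```python
import pandas as pd

# Three rows copied from the offer table above.
offers = pd.DataFrame({
    "offer_id": [
        "f19421c1d4aa40978ebb69ca19b0e20d",
        "9b98b8c7a33c4b65b9aebfe6a799e6d9",
        "2906b810c7d4411798c6938adc9daaa5",
    ],
    "offer_type": ["bogo", "bogo", "discount"],
    "difficulty": [5.0, 5.0, 10.0],
    "duration": [5.0, 7.0, 7.0],
    "reward": [5.0, 5.0, 2.0],
})

def offer_label(row: pd.Series) -> str:
    """Build a label such as 'bogo_d5_t7_r5' (hypothetical naming scheme)."""
    return (
        f"{row['offer_type']}_d{int(row['difficulty'])}"
        f"_t{int(row['duration'])}_r{int(row['reward'])}"
    )

# One label per offer, built row by row.
offers["offer_label"] = offers.apply(offer_label, axis=1)
print(offers[["offer_id", "offer_label"]])
```

The labels stay unique as long as no two offers share the same type, difficulty, duration and reward, which holds for the eight offers listed above.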


In [63]:
customer_transactions[customer_transactions.offer_type == 'informational'][['offer_id', 'offer_type', 'difficulty', 'duration', 'reward']].drop_duplicates('offer_id').sort_values('offer_type').reset_index(drop = True)

Unnamed: 0,offer_id,offer_type,difficulty,duration,reward
0,3f207df678b143eea3cee63160fa8bed,informational,0.0,4.0,0.0
1,5a8bc65990b245e5a138643cd4eb9837,informational,0.0,3.0,0.0
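The split itself could look like the following sketch, which builds one DataFrame per offer type and one per offer id with `groupby`. It runs on a small made-up frame so that it is self-contained; the customer and offer ids are placeholders, and the same two dictionary comprehensions would apply to `customer_transactions`.

```python
import pandas as pd

# Made-up stand-in for customer_transactions (placeholder ids).
toy = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3", "c4", "c5"],
    "offer_type": ["bogo", "discount", "bogo", "informational", "discount"],
    "offer_id": ["o1", "o2", "o3", "o4", "o2"],
})

# One dataset per offer type.
by_type = {
    name: group.reset_index(drop=True)
    for name, group in toy.groupby("offer_type")
}

# One dataset per individual offer id.
by_offer = {
    name: group.reset_index(drop=True)
    for name, group in toy.groupby("offer_id")
}

print({name: len(group) for name, group in by_type.items()})
```

Keeping the pieces in dictionaries keyed by offer type or offer id makes it straightforward to loop over them when fitting one uplift model per offer.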
