# Uplift Modeling and Causal Inference

<hr>

* <b>Uplift modeling</b> can help <b>optimize customer targeting</b> through online ads and email marketing with special offers, discounts, and coupon codes. Many businesses also use uplift modeling to upsell their products. In this notebook, we are going to see how causal inference can help us understand the behaviour of different customer segments, and how <b>causality</b> can help optimize customer targeting.


* Generally speaking, experiments need to be set up to capture 'causal effects' and not mere association. Establishing causal relationships is harder than establishing association, and understanding the direct <b>Cause & Effect</b> relationship can be tricky.


* Let us say we want to know the effect of advertisement on sale value. We split our customers into two segments (Control and Treatment). After running the ads for a few days, we have, for each customer, the <b>Individual Effect of Exposure on Outcome</b> as observed in their group, and we can also calculate the <b>Average Effect of Exposure on Outcome</b>. Please note that the outcome Y can be either binary (Bernoulli) or continuous. We can calculate the 'Lift' or 'Gain' as follows:


$$ E(Y \mid \text{Treatment}) - E(Y \mid \text{Control}) $$
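
As a rough illustration of the lift formula above, here is a minimal sketch that computes the difference in mean outcome between the two groups. The outcome lists are made-up numbers, and `average_lift` is a hypothetical helper name, not part of any library.

```python
from statistics import mean


def average_lift(treatment_outcomes, control_outcomes):
    """Difference in mean outcome between the treatment and control groups."""
    return mean(treatment_outcomes) - mean(control_outcomes)


# Hypothetical binary outcomes (1 = purchased, 0 = did not purchase).
treated = [1, 0, 1, 1, 0, 1, 0, 1]  # 5 of 8 purchased
control = [0, 0, 1, 0, 1, 0, 0, 1]  # 3 of 8 purchased

lift = average_lift(treated, control)
print(f"Observed lift: {lift:.3f}")
```

The same function works for a continuous outcome such as sale value, since it only takes a difference of means.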


<br>
However, can we really assume that the above equation captures the <b>True Causal Lift</b>? With the help of an example, let us understand the true power of Uplift Modeling.

## Amazon Prime Movies Membership Example

Let us say we add a new set of movies to our movie streaming service and want to market it to our customers. We want to target our customers such that the <b>ROI on Advertising Efforts is Maximized</b>. We have some information about our previously targeted customers and their purchasing patterns. We know their subscription service attributes, demographics, watch times, personal preferences, ratings, comments and other rich and informative attributes.

## 1). Random Experiment Model:

In this model, we split our customer base into two groups at random, independent of the X-features. We then look at the difference in average outcome between the two groups (whether or not the customer watched the movie after Ad exposure). Hopefully, customers in the treatment group have a positive outcome (Y=1, as shown in the <b>Shaded Region</b>).


We also need to consider the fact that both the treatment and control groups will contain both outcomes (Y=0 and Y=1). On the right, we know which customers were targeted along with their actual outcome. Hence, the gain in sale value due to treatment is as follows:

<br>
$$\text{Gain} = E(\text{AverageSales} \mid \text{Treatment}) - E(\text{AverageSales} \mid \text{Control}) $$

<img src="causal/img111.png">
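
The random split described in this section can be sketched as follows. This is only an illustration under simple assumptions: customers are identified by made-up integer IDs, the split is 50/50, and `random_split` is a hypothetical helper name. The key point is that the assignment never looks at any customer feature.

```python
import random


def random_split(customer_ids, treatment_fraction=0.5, seed=0):
    """Randomly assign customers to treatment and control, ignoring their features."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    n_treatment = int(len(shuffled) * treatment_fraction)
    return shuffled[:n_treatment], shuffled[n_treatment:]


customer_ids = list(range(1, 11))  # hypothetical customer IDs 1..10
treatment_group, control_group = random_split(customer_ids)
print("Treatment:", sorted(treatment_group))
print("Control:  ", sorted(control_group))
```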

<br>

## 2). Targeting using Propensity Modeling:

* As our goal is to maximize ROI, a better approach is to target customers based on the attributes we already have. <b>Propensity Modeling</b> can be really powerful in such cases. We can develop a <b>Probabilistic Classification Model</b> based on the mentioned demographic attributes, social/referral attributes, ratings, reviews and preferences, to predict the <b>probability of a given customer watching a new movie</b>.


* Let us say we have customer data from a previous product campaign, and we believe that our new offering is very similar to the previous one. For the previous treatment group, we have all features (X), the treatment label (T) and the outcome (Y). As shown below, we can use <b>Only Training Data</b> for modeling, with the outcome variable <b>Y</b> as the label (Binary Classification).


* We then sort customers by their <b>Propensity to Purchase/ Watch Movie</b> and assign them to <b>segments or Tiers</b>, based on the distribution of Propensity of Outcome.
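
The sorting and tiering step above can be sketched as follows, assuming the propensity scores have already been produced by some probabilistic classifier. The scores and customer IDs are made up, and splitting the ranking into equal-sized tiers is just one possible choice, since the text does not fix a tiering rule. `assign_tiers` is a hypothetical helper name.

```python
def assign_tiers(propensity_by_customer, n_tiers=3):
    """Rank customers by propensity (highest first) and cut the ranking
    into roughly equal-sized tiers, where tier 1 holds the highest scores."""
    ranked = sorted(propensity_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    n_customers = len(ranked)
    tiers = {}
    for rank, (customer_id, _score) in enumerate(ranked):
        tiers[customer_id] = rank * n_tiers // n_customers + 1
    return tiers


# Hypothetical predicted probabilities of watching the new movie.
scores = {"c1": 0.91, "c2": 0.15, "c3": 0.62, "c4": 0.48, "c5": 0.80, "c6": 0.05}
tiers = assign_tiers(scores)
for customer_id in sorted(tiers, key=lambda c: scores[c], reverse=True):
    print(customer_id, scores[customer_id], "tier", tiers[customer_id])
```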


<img src="causal/img112.png">


<br>


Targeting potential <b>High-Value</b> targets provides <b>better ROI</b> when compared to the random experiment model. Essentially, we are looking for customers with attributes that lead to a better conversion rate. However, we are still not in the <b>Universe of True Causality</b>. Propensity modeling can be extremely effective, but we are restricted to using only part of our customer base (we model the outcome only within the treatment group), as we want to know the <b>True Effect of Treatment Only</b>.

# Tending towards Causality

What we find from the traditional methods mentioned above is the <b>Average Effect of Treatment on Outcome</b>. What we would actually like to find is the <b>Average Treatment Effect: E(Y_treatment - Y_Control)</b>, as it compares what would happen if the same people were treated and not treated at the same time. Hence, what we were approximating before is not truly causal.

<br>
$$E(Y^1 - Y^0) \neq E(Y \mid A=1) - E(Y \mid A=0)$$

In Uplift Modeling, we try to approximate the <b>causal form</b> of Experimental Design, rather than using the <b>difference of Average Treatment and Control Effects.</b>
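
To see why the two sides of the inequality above can differ, here is a small worked example with made-up numbers. It assumes two hypothetical strata of customers ("heavy viewers" and "light viewers"), where heavy viewers are both more likely to be targeted and more likely to watch anyway, and where targeting within a stratum is unrelated to the potential outcomes. None of these figures come from real data.

```python
# Each stratum: size, number treated, P(Y^0 = 1) and P(Y^1 = 1).
strata = [
    {"name": "heavy viewers", "n": 100, "treated": 80, "p_y0": 0.6, "p_y1": 0.7},
    {"name": "light viewers", "n": 100, "treated": 20, "p_y0": 0.1, "p_y1": 0.2},
]

total = sum(s["n"] for s in strata)

# Average Treatment Effect: E(Y^1 - Y^0) over the whole population.
ate = sum(s["n"] * (s["p_y1"] - s["p_y0"]) for s in strata) / total

# Naive comparison: E(Y | A=1) - E(Y | A=0), using only observed outcomes.
n_treated = sum(s["treated"] for s in strata)
n_control = total - n_treated
mean_treated = sum(s["treated"] * s["p_y1"] for s in strata) / n_treated
mean_control = sum((s["n"] - s["treated"]) * s["p_y0"] for s in strata) / n_control
naive = mean_treated - mean_control

print(f"ATE:              {ate:.2f}")    # about 0.10
print(f"Naive difference: {naive:.2f}")  # about 0.40
```

In this toy setup the treatment adds 0.10 to the watch probability in both strata, yet the naive comparison reports roughly 0.40, because the treated group is dominated by heavy viewers who would have watched anyway.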

## Customer Segments

Customers come in different shapes and forms. Broadly speaking, we can segment customers into 4 groups (note that we might not have exactly 4 always).

1). <b>Malleables</b>: Customers in this segment are <b>ideal targets</b> and maximize our marketing ROI. These customers can truly be influenced to purchase/ consume our product.


2). <b>Positives</b>: Customers in this segment are <b>highly likely to purchase/ favourable to outcome, irrespective of Ad exposure</b>. We need not provide special offers or discounts to these customers, as doing so might hurt our profits. Instead, we need to provide <b>special offers or coupons to customers who would not have used the service without coupons.</b>


3). <b>Lost Causes</b>: Customers in this segment will <b>not purchase/ consume products irrespective of any type of Ad exposure.</b> We would rather not allocate funds to attract these customers as there is no chance for them to become potential customers in the near future.


4). <b>Do Not Disturb</b>: Customers in this segment <b>should not be disturbed/ bothered</b> with Ad exposure, emails or notifications, as they are highly likely to <b>stop using the product/ service</b> if contacted with new promos/ offers. Subscription services tend to look at these customers very carefully so as not to send them alerts.
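
One way to read the four segments above is in terms of each customer's two potential outcomes: what they would do without the Ad (Y0) and with the Ad (Y1). The sketch below encodes that reading for a binary outcome. It is only an interpretation of the descriptions above, `segment` is a hypothetical helper name, and in practice we never observe both outcomes for the same customer.

```python
def segment(y_control, y_treated):
    """Map a customer's two hypothetical potential outcomes (0 or 1) to a segment."""
    labels = {
        (0, 1): "Malleable",       # buys only if targeted
        (1, 1): "Positive",        # buys either way
        (0, 0): "Lost Cause",      # does not buy either way
        (1, 0): "Do Not Disturb",  # buys only if left alone
    }
    return labels[(y_control, y_treated)]


for y0, y1 in [(0, 1), (1, 1), (0, 0), (1, 0)]:
    print(f"Y0={y0}, Y1={y1}: {segment(y0, y1):<15} individual uplift = {y1 - y0:+d}")
```

Only Malleables have a positive individual uplift, while Do Not Disturb customers have a negative one.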

<hr>

Given that we know different customer segments, we <b> are not interested in Propensity of Outcome</b>. We are interested in <b> Maximizing Uplift across complete customer base.</b> Given that we also know the outcomes of customers from previous campaigns, we are looking to model the following:



<br>
<br>
$$ \text{Uplift} = P_t - P_c = P(Y=1 \mid \text{Treatment}) - P(Y=1 \mid \text{Control}) $$

<br>
However, note that for any given individual customer C<sub>i</sub>, there is no way for us to target and not target them at the same time. Hence, we need to make some modifications when modeling. Our <b>Uplift Model</b> outputs are going to be sorted by the uplift value itself, and our assumption here is that the model also takes care of the <b>different customer segments</b>. Hence, Malleables would rank at the top, followed by the other groups, with Do Not Disturbs ranking last.
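
Since uplift cannot be observed for an individual, one simple workaround is to estimate it for groups of similar customers and rank the groups. The sketch below does this with a plain stratified estimate on made-up campaign records. The group names and counts are hypothetical, `uplift_by_group` and `make_records` are illustrative helper names, and this is not meant to represent any specific uplift model from the literature.

```python
from collections import defaultdict


def uplift_by_group(records):
    """Estimate P(Y=1 | treated) - P(Y=1 | control) within each group.

    `records` holds (group, treated, outcome) tuples, with treated and
    outcome coded as 0/1.
    """
    # counts[group][arm] = [number of customers, number of positive outcomes]
    counts = defaultdict(lambda: {0: [0, 0], 1: [0, 0]})
    for group, treated, outcome in records:
        counts[group][treated][0] += 1
        counts[group][treated][1] += outcome
    uplift = {}
    for group, arms in counts.items():
        (n_c, pos_c), (n_t, pos_t) = arms[0], arms[1]
        if n_c == 0 or n_t == 0:
            continue  # uplift is undefined without both arms
        uplift[group] = pos_t / n_t - pos_c / n_c
    return uplift


def make_records(group, n_t, pos_t, n_c, pos_c):
    """Expand summary counts into individual (group, treated, outcome) records."""
    treated = [(group, 1, 1)] * pos_t + [(group, 1, 0)] * (n_t - pos_t)
    control = [(group, 0, 1)] * pos_c + [(group, 0, 0)] * (n_c - pos_c)
    return treated + control


# Hypothetical results from a previous campaign.
records = (
    make_records("new subscribers", 10, 6, 10, 2)
    + make_records("casual viewers", 10, 5, 10, 4)
    + make_records("loyal viewers", 10, 9, 10, 9)
    + make_records("notification-averse", 10, 3, 10, 6)
)

ranking = sorted(uplift_by_group(records).items(), key=lambda kv: kv[1], reverse=True)
for group, value in ranking:
    print(f"{group:<20} uplift = {value:+.2f}")
```

Note that the estimate uses both the treated and the control records of each group, and that a group with a negative estimate naturally falls to the bottom of the ranking.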
    
<br>
<br>
<img src="causal/img113.png">

<br>
<br>
In Uplift Modeling, we also <b>leverage Control Group Data from previous campaigns</b>, which is an excellent advantage over the previous models. As shown above, the final model prediction consists of rank-ordered customers (ranked in decreasing order of Uplift). Hence, the higher a customer ranks, the <b>better the ROI on that customer</b>. In the next notebook, we are going to look at various Uplift Models suggested in the literature.


### References:

1). Lo, Victor. 2002.
The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing. SIGKDD Explorations. 4. 78-86.

2). Gutierrez, P., & Gérardy, J. Y.
Causal Inference and Uplift Modelling: A Review of the Literature. In International Conference on Predictive Applications and APIs (pp. 1-13).

3). Maciej Jaskowski and Szymon Jaroszewicz.
Uplift modeling for clinical trial data. ICML Workshop on Clinical Data Analysis, 2012.