## Introduction 

This notebook walks through optimizing a single response variable using the MRUplift framework. It covers:

1. The Business Problem and Data Generating Process

2. Building / Gridsearching an uplift model 

3. Evaluating Model with out-of-sample ERUPT metric

4. Assigning Optimal Treatments for new observations 

In [2]:
import numpy as np
import pandas as pd

from mr_uplift.dataset.data_simulation import get_simple_uplift_data
from mr_uplift.mr_uplift import MRUplift
#from ggplot import *

### Business Problem

Imagine we are data scientists working for a startup that would like to be more profitable. As a tactic to increase user activity, the company gives all users an expensive bonus (referred to as the treatment). To reduce costs, we have been assigned the task of using data to find the subset of users who should continue receiving the costly treatment.

We are given explanatory variables for users $X$, a randomly assigned indicator $T$ of whether a user received the treatment, and a response variable $y$ measuring profitability.

We can use uplift models, and the MRUplift package specifically, to find the users who should receive the treatment.

### Uplift Problem Setup
The general setup for a lift model is:
    
$y$: Response variable of interest you'd like to maximize. Here it is profitability.

$X$: User level covariates. Includes things like previous activity per user.

$T$: The randomly assigned treatment. In this case it is whether or not to give a bonus to a particular user and is binary. Assume that the distribution and assignment of a treatment is uniform and random.

With the data $(y, X, T)$ the goal is to build a treatment assignment policy 𝜋(x) that uses $X$ to assign the $T$ that maximizes $y$. In this case, we want to use user history to decide whether to give each user a bonus so as to maximize profit.

A frequent practice is to model the expected outcome $y_i$ under different treatments and choose the treatment $T$ that maximizes $y_i$ for each user.


\begin{equation}
 \pi(x_i) = \operatorname{argmax}_{t \in T} E[y_i | X=x_i, T=t]
\end{equation}


There are several ways to do this; any general-purpose ML algorithm that can capture interactions between $X$ and $T$ will work. MRUplift uses a neural network.

To get the counterfactual for each treatment one needs to predict with different values of $t$. This calculation is closely related to creating an [ICE](https://arxiv.org/pdf/1309.6392.pdf) plot with the treatment variable.
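The policy above can be sketched in a few lines of NumPy. This is a minimal illustration, not MRUplift's implementation: it assumes simulated data, and it swaps the neural network for an ordinary least squares fit with explicit $x \cdot t$ interaction terms. The helper name `design` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Simulated data (an assumption for illustration): the effect of t depends on x.
x = rng.uniform(0, 1, size=(n, 2))
t = rng.binomial(1, 0.5, size=n)
y = (x[:, 0] - x[:, 1]) * t + rng.normal(0, 1, size=n)


def design(x, t):
    """Feature matrix with an intercept, x, t, and x*t interaction terms."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones((len(x), 1)), x, t, x * t])


# Fit y = f(t, x) by least squares (a stand-in for the neural network).
coef, *_ = np.linalg.lstsq(design(x, t), y, rcond=None)

# Predict the counterfactual outcome for every user under each treatment.
treatments = np.array([0, 1])
preds = np.column_stack(
    [design(x, np.full(n, tmt)) @ coef for tmt in treatments]
)

# pi(x_i): the treatment with the highest predicted outcome for each user.
policy = treatments[preds.argmax(axis=1)]
```

Each column of `preds` holds the predicted response had every user received that treatment, which is the same "vary one input, hold the rest fixed" idea behind an ICE plot.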

### Data Generating Process 


Below is the data generating process of the data we are given. 

\begin{equation}
x_1 \sim \mathrm{Uniform}(0,1)
\end{equation}
\begin{equation}
x_2 \sim \mathrm{Uniform}(0,1)
\end{equation}
\begin{equation}
e_1 \sim \mathcal{N}(0,1)
\end{equation}
\begin{equation}
e_2 \sim \mathcal{N}(0,1)
\end{equation}
\begin{equation}
t \sim \mathrm{Bernoulli}(0.5)
\end{equation}

\begin{equation}
\mathit{revenue} = x_1 \cdot t + e_1
\end{equation}
\begin{equation}
\mathit{costs} = x_2 \cdot t + e_2
\end{equation}
\begin{equation}
\mathit{profit} = \mathit{revenue} - \mathit{costs}
\end{equation}

(In this problem we are interested in only the response variable $profit$)
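As a rough sketch, the equations above can be simulated directly with NumPy. This is not the implementation of `get_simple_uplift_data`; the function name `simulate_profit_data` and its `seed` argument are hypothetical and exist only for illustration.

```python
import numpy as np


def simulate_profit_data(num_obs, seed=0):
    """Draw one sample from the data generating process written above."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, size=(num_obs, 2))   # columns are x_1 and x_2
    e = rng.normal(0, 1, size=(num_obs, 2))    # columns are e_1 and e_2
    t = rng.binomial(1, 0.5, size=num_obs)     # random binary treatment

    revenue = x[:, 0] * t + e[:, 0]
    costs = x[:, 1] * t + e[:, 1]
    profit = revenue - costs
    return x, t, revenue, costs, profit


x_sim, t_sim, revenue_sim, costs_sim, profit_sim = simulate_profit_data(10000)
```

Under this process the expected effect of the treatment on profit for a given user is $x_1 - x_2$, which averages to zero across users. The best possible policy therefore treats only users with $x_1 > x_2$, and its expected profit is $E[\max(x_1 - x_2, 0)] = 1/6 \approx 0.167$.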

In [3]:
y, x, t = get_simple_uplift_data(10000)

y = pd.DataFrame(y)
y.columns = ['revenue','cost', 'noise']
y['profit'] = y['revenue'] - y['cost']

### Model Building / Gridsearch
After instantiating the MRUplift class, the `.fit` function builds the model. It first separates the data into a train / test split and builds standard scaler transformers on all variables $x, y, t$.

Then it runs a gridsearch using a neural network model of the form $y = f(t,x)$ that minimizes the mean squared error. The user can supply a custom parameter grid.
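To get a feel for the size of a parameter grid, the sketch below expands one into its individual candidate configurations. It assumes the search evaluates every combination, as a conventional exhaustive grid search does; how MRUplift enumerates the grid internally is not shown here.

```python
from itertools import product

# Same shape of grid as used in this notebook: each key maps to candidate values.
param_grid = dict(num_nodes=[8], dropout=[0.1, 0.5], activation=["relu"],
                  num_layers=[1, 2], epochs=[25], batch_size=[30])

# The Cartesian product of the value lists gives one dict per candidate model.
names = list(param_grid)
combos = [dict(zip(names, values)) for values in product(*param_grid.values())]

print(len(combos))  # 4
for combo in combos:
    print(combo)
```

Only `dropout` and `num_layers` have more than one candidate value, so this grid produces 2 x 2 = 4 configurations.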


In [4]:
uplift_model = MRUplift()
param_grid = dict(num_nodes=[8], dropout=[.1, .5], activation=['relu'],
                  num_layers=[1, 2], epochs=[25], batch_size=[30])


uplift_model.fit(x, y[['profit']], t.reshape(-1, 1), param_grid=param_grid, n_jobs=1)

### Expected Response Under Proposed Treatments (ERUPT) Metric

After gridsearching we want to know how much better the model is than the current state. Evaluating out-of-sample performance is key to ensuring the model will perform as intended in production. While there are other metrics, such as the Qini metric, they are usually limited to the single-treatment case. ERUPT is the only metric I'm aware of that can be applied to multiple treatments and provides an unbiased estimate of what would happen if the model were applied.

#### ERUPT
Suppose you have an observation where 𝜋(x) proposes not giving a bonus, but the randomly assigned treatment was to give a bonus. Since these do not align, it's not clear we can say anything about it.

However, if the treatment the model proposes equals the assigned treatment, we can include that observation in our proposed treatment examples. We go through this exercise for all observations and calculate the mean response for only those where 𝜋(x) = assigned treatment. This is our estimated value of y under the model! Mathematically it is:

$$\frac{\sum_i y_i I(\pi(x_i) = t_i)} {\sum_i I(\pi(x_i)=t_i)}$$

Note that this formula assumes the treatment distribution is uniform (same number for each treatment) and randomly assigned. The functionality in this package does not require uniform treatments but does require them to be randomly assigned.
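The ERUPT formula above can be written as a short NumPy function. This is a minimal sketch of the uniform-treatment formula only, not the package's implementation; the function name `erupt` and the toy arrays are assumptions made for illustration.

```python
import numpy as np


def erupt(y, t_assigned, t_proposed):
    """Mean response over observations where the proposed treatment
    equals the randomly assigned one (uniform-treatment formula)."""
    y = np.asarray(y, dtype=float)
    match = np.asarray(t_assigned) == np.asarray(t_proposed)
    if not match.any():
        raise ValueError("no observation has proposed == assigned treatment")
    return y[match].mean()


# Toy example: the first two observations match, the last two do not.
y_toy = np.array([1.0, 2.0, 3.0, 4.0])
t_assigned_toy = np.array([0, 1, 1, 0])
t_proposed_toy = np.array([0, 1, 0, 1])

print(erupt(y_toy, t_assigned_toy, t_proposed_toy))  # 1.5
```

Only the matching observations contribute, so the estimate here is the mean of 1.0 and 2.0.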

For further information please see my blog post [here](https://medium.com/building-ibotta/erupt-expected-response-under-proposed-treatments-ff7dd45c84b4).

### Evaluating Model with out-of-sample ERUPT metric
Using the test dataset MRUplift will then evaluate the model using the ERUPT metric. This functionality gives the model builder insight into whether or not the model performs well out of sample. 

It outputs two dataframes:

1) The first dataframe shows the ERUPT metric and standard deviation for the model assignment. In this example it tells us the expected profit if we were to use this model. In addition we can also see a 'random' row under the assignment column. This uses the same distribution as $\pi(x)$ but shuffles the treatments so as to make it a random assignment. Looking at the difference between the Model and Random assignments should tell us if the model is learning the individual treatment effects well. 

Below we can see that the model performs much better than the randomized treatments, suggesting the model learned the heterogeneity of the treatment effects well. If we deployed the model, we would expect profit of about 0.16.

2) The second dataframe shows the distribution of treatments under the optimal assignment. In this example we can see about half are assigned the treatment and half are not.
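The idea behind the 'model' versus 'random' comparison, and the treatment distribution, can be sketched as follows. This is an illustration under stated assumptions rather than the output of `get_erupt_curves`: the data are simulated from this notebook's data generating process, and the rule "treat when $x_1 > x_2$" is a hypothetical stand-in for a fitted model's policy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Simulated held-out data following the data generating process above.
x = rng.uniform(0, 1, size=(n, 2))
e = rng.normal(0, 1, size=(n, 2))
t_assigned = rng.binomial(1, 0.5, size=n)
profit = (x[:, 0] * t_assigned + e[:, 0]) - (x[:, 1] * t_assigned + e[:, 1])

# Stand-in for the model's policy (an assumption, not fitted model output).
t_model = (x[:, 0] > x[:, 1]).astype(int)

# Random baseline: same treatment distribution, but shuffled across users.
t_random = rng.permutation(t_model)


def erupt(y, t_assigned, t_proposed):
    """Mean response where the proposed treatment equals the assigned one."""
    match = t_assigned == t_proposed
    return y[match].mean()


erupt_model = erupt(profit, t_assigned, t_model)
erupt_random = erupt(profit, t_assigned, t_random)

# Share of users proposed for each treatment (index 0 = no bonus, 1 = bonus).
treatment_share = np.bincount(t_model, minlength=2) / n

print(erupt_model, erupt_random, treatment_share)
```

Shuffling keeps the share of users proposed for each treatment fixed while breaking the link between a user's covariates and the proposal, so any gap between the two estimates reflects what the policy gains from targeting.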




In [5]:
erupt_curves, dists = uplift_model.get_erupt_curves()
erupt_curves

Using Test Data Set


| | mean | std | response_var_names | weights | assignment | treatment |
|---|---|---|---|---|---|---|
| 0 | 0.161196 | 0.004519 | profit | 1 | model | -1.0 |
| 0 | 0.002681 | 0.00542 | profit | 1 | random | -1.0 |
| 0 | -0.002317 | 0.002408 | profit | -1 | ate | 0.0 |
| 0 | 0.001036 | 0.007196 | profit | -1 | ate | 1.0 |


In [6]:
dists

| | num_observations | tmt | weights | percent_tmt |
|---|---|---|---|---|
| 0 | 3519 | 1.0 | 1 | 0.502714 |
| 1 | 3481 | 0.0 | 1 | 0.497286 |


### Assigning Optimal Treatments for New Observations
After building and evaluating an uplift model the modeler may deem it worthy of production. To assign new users the optimal treatment one can use the `predict_optimal_treatments` function as shown below.



In [6]:
# Generate 5 new observations
_, x_new, _ = get_simple_uplift_data(5)
uplift_model.predict_optimal_treatments(x_new)



array([[0],
       [1],
       [0],
       [1],
       [0]])