## Introduction 

This notebook walks through optimizing a single response variable with the IbottaUplift framework. It covers:

1. The data generating process

2. Building and grid searching an uplift model

3. Evaluating the model with the out-of-sample ERUPT metric

4. Assigning optimal treatments for new observations

In [1]:
import numpy as np
import pandas as pd

from dataset.data_simulation import get_simple_uplift_data
from ibotta_uplift.ibotta_uplift import IbottaUplift
from ggplot import *



### Data Generating Process 

Imagine we are data scientists working for a startup that would like to be more profitable. As a tactic to increase user activity, the company gives all users a potentially expensive treatment. To reduce costs, we have been assigned the task of using data to find the subset of users that should continue receiving the costly treatment.

We are given some explanatory variables for users $x$, a randomly assigned treatment $t$ indicating whether a user received the marketing campaign, and a response variable of profitability $y$.

We can use uplift models and the IbottaUplift framework to find specifically those users who should receive this treatment.

The data we have are generated by the following process.

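$$x_1 \sim \text{runif}(0,1)$$
$$x_2 \sim \text{runif}(0,1)$$
$$e_1 \sim \text{rnorm}(0,1)$$
$$e_2 \sim \text{rnorm}(0,1)$$
$$\text{noise} \sim \text{rnorm}(0,1)$$

$$t \sim \text{rbinom}(0.5)$$

$$\text{revenue} = x_1 \cdot t + e_1$$
$$\text{costs} = x_2 \cdot t + e_2$$
$$\text{profit} = \text{revenue} - \text{costs}$$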
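
As a rough illustration, the process above can be reproduced with NumPy alone. This is a minimal sketch, not the implementation of `get_simple_uplift_data`; the function name `simulate_uplift_data`, its return order, and the `_sim` variable names are assumptions made for this example.

```python
import numpy as np


def simulate_uplift_data(n, seed=0):
    """Hypothetical stand-in for the data generating process described above."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, size=(n, 2))   # x_1, x_2 ~ runif(0, 1)
    e = rng.normal(0, 1, size=(n, 2))    # e_1, e_2 ~ rnorm(0, 1)
    noise = rng.normal(0, 1, size=n)     # unrelated to x and t
    t = rng.binomial(1, 0.5, size=n)     # random treatment with probability 0.5
    revenue = x[:, 0] * t + e[:, 0]
    costs = x[:, 1] * t + e[:, 1]
    profit = revenue - costs
    return x, t, revenue, costs, noise, profit


x_sim, t_sim, revenue_sim, costs_sim, noise_sim, profit_sim = simulate_uplift_data(10000)
```

Since profit equals $(x_1 - x_2) \cdot t + e_1 - e_2$, the treatment raises expected profit only for users with $x_1 > x_2$, which is the heterogeneity an uplift model should pick up.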

In [2]:
y, x, t = get_simple_uplift_data(10000)

y = pd.DataFrame(y)
y.columns = ['revenue','cost', 'noise']
y['profit'] = y['revenue'] - y['cost']

### Model Building / Gridsearch
After instantiating the IbottaUplift class, the `.fit` function builds the model. It first separates the data into a train / test split, then fits standard scaler transformers on all variables $x, y, t$.

It then builds and runs a grid search using a Keras neural network model of the form $y = f(t,x)$ that minimizes the mean squared error. The user can supply a custom parameter grid.


In [3]:
uplift_model = IbottaUplift()
param_grid = dict(num_nodes=[8], dropout=[.1, .5], activation=['relu'],
                  num_layers=[1, 2], epochs=[25], batch_size=[30])


uplift_model.fit(x, y[['profit']], t.reshape(-1,1), param_grid = param_grid, n_jobs = 1)
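
To make the modeling form $y = f(t,x)$ concrete, here is a minimal NumPy sketch of the same ideas: a train / test split, scaling fitted on the training data, and a model fit by minimizing mean squared error. It swaps the Keras network for a linear model with treatment interactions and skips the grid search entirely, so it is not what `.fit` does internally. The 70/30 split, the `design` and `scale` helpers, and scaling only $x$ are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
x = rng.uniform(0, 1, size=(n, 2))
t = rng.binomial(1, 0.5, size=(n, 1))
# profit = (x_1 - x_2) * t + (e_1 - e_2); the difference of two N(0, 1) has variance 2
y = (x[:, [0]] - x[:, [1]]) * t + rng.normal(0, np.sqrt(2), size=(n, 1))

# Assumed 70/30 train / test split
idx = rng.permutation(n)
train, test = idx[: int(0.7 * n)], idx[int(0.7 * n):]

# Standardize x using training statistics only
x_mean, x_std = x[train].mean(axis=0), x[train].std(axis=0)


def scale(x_raw):
    return (x_raw - x_mean) / x_std


def design(x_scaled, tmt):
    """Features for y = f(t, x): intercept, x, t, and t-by-x interactions."""
    return np.hstack([np.ones((len(x_scaled), 1)), x_scaled, tmt, tmt * x_scaled])


# Least squares minimizes the mean squared error on the training set
coef, *_ = np.linalg.lstsq(design(scale(x[train]), t[train]), y[train], rcond=None)
test_mse = np.mean((design(scale(x[test]), t[test]) @ coef - y[test]) ** 2)
```

Because the simulated noise has variance 2, a test MSE near 2 is about the best any model of this form can achieve on this data.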



### Evaluating Model with out-of-sample ERUPT metric
Using the test dataset, IbottaUplift then evaluates the model with the ERUPT metric. This gives the model builder insight into whether the model performs well out of sample.

It outputs two dataframes:

1) The first shows the ERUPT metric and its standard deviation for the model assignment. In this example it tells us the expected profit if we were to use this model. There is also a 'random' row under the assignment column. This row uses the same distribution of treatments as the model assignment but shuffles them, making the assignment random.

Below we can see that the model performs much better than the randomized treatments, suggesting the model learned the heterogeneity of the treatment effects well. If we deployed the model, we would expect an average profit of about 0.16.

2) The second dataframe shows the distribution of treatments under the optimal assignment. In this example about half of the observations are assigned the treatment and half are not.




In [4]:
erupt_curves, dists = uplift_model.get_erupt_curves()
erupt_curves



|  | mean | std | response_var_names | weights | assignment |
|---|---|---|---|---|---|
| 0 | 0.162933 | 0.004572 | profit | 1 | model |
| 0 | 0.00892 | 0.005465 | profit | 1 | random |


In [5]:
dists

|  | num_observations | tmt | weights | percent_tmt |
|---|---|---|---|---|
| 0 | 3545 | 1 | 1 | 0.493571 |
| 1 | 3455 | 0 | 1 | 0.506429 |
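
For intuition about the numbers above, the core idea of ERUPT can be sketched in a few lines. When treatments were assigned uniformly at random, the mean response over the observations whose observed treatment happens to match the proposed treatment estimates the expected response under the proposed policy. The sketch below is not the library's implementation (it omits the standard deviation, for instance); the `erupt` helper and the oracle policy "treat when $x_1 > x_2$" are assumptions made for this example.

```python
import numpy as np


def erupt(y, t_observed, t_proposed):
    """Mean response over rows whose random treatment matches the proposal."""
    matched = t_observed == t_proposed
    return y[matched].mean()


rng = np.random.default_rng(0)
n = 10000
x = rng.uniform(0, 1, size=(n, 2))
t = rng.binomial(1, 0.5, size=n)
profit = (x[:, 0] - x[:, 1]) * t + rng.normal(0, 1, n) - rng.normal(0, 1, n)

# Oracle policy for this simulation: treat only when x_1 > x_2
proposed = (x[:, 0] > x[:, 1]).astype(int)
# Random baseline: same mix of treatments, shuffled across observations
shuffled = rng.permutation(proposed)

erupt_model = erupt(profit, t, proposed)
erupt_random = erupt(profit, t, shuffled)
```

Under this simulation the oracle policy has an expected profit of $1/6 \approx 0.167$ and the shuffled baseline has an expected profit of 0, which is in line with the `model` and `random` rows reported above.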


### Assigning Optimal Treatments for New Observations
After building and evaluating an uplift model, the modeler may deem it worthy of production. To assign new users the optimal treatment, use the `predict_optimal_treatments` function as shown below.



In [6]:
# Generate 5 new observations
_, x_new, _ = get_simple_uplift_data(5)
uplift_model.predict_optimal_treatments(x_new)



array([[0],
       [1],
       [1],
       [0],
       [1]])
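
One plausible way to arrive at such an assignment, given a fitted model of the form $y = f(t,x)$, is to predict the response under every candidate treatment and keep the treatment with the highest prediction. The sketch below illustrates that idea only and makes no claim about how `predict_optimal_treatments` works internally. `predict_profit` is a hypothetical stand-in for a fitted model (it uses the true mean of the simulated profit), and `assign_optimal_treatment` is likewise a name invented here.

```python
import numpy as np


def predict_profit(x, t):
    """Hypothetical fitted model: the true expected profit of the simulation."""
    return (x[:, 0] - x[:, 1]) * t


def assign_optimal_treatment(x, treatments=(0, 1)):
    # One column of predicted profit per candidate treatment
    preds = np.column_stack([predict_profit(x, tmt) for tmt in treatments])
    # Pick, row by row, the treatment with the highest predicted profit
    best = np.argmax(preds, axis=1)
    return np.asarray(treatments)[best].reshape(-1, 1)


x_demo = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]])
optimal = assign_optimal_treatment(x_demo)
# optimal is [[1], [0], [1]]: treat only where x_1 exceeds x_2
```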