In [2]:
cd ..


/Users/samweiss/src/ibotta_uplift


## Introduction 

This notebook walks through optimizing a single response variable with the IbottaUplift framework. It covers:

1. The data generating process

2. Building and grid searching an uplift model

3. Evaluating the model with the out-of-sample ERUPT metric

4. Assigning optimal treatments to new observations

In [20]:
import numpy as np
import pandas as pd

from dataset.data_simulation import get_simple_uplift_data
from ibotta_uplift.ibotta_uplift import IbottaUplift
from ggplot import *


### Data Generating Process 

Imagine we are running a potentially costly marketing campaign to users and are interested in maximizing profitability. A user's reaction to the treatment $t$ depends on their covariates: depending on those covariates, the treatment increases revenue and costs to varying degrees.


$$x_1 \sim \mathrm{runif}(0,1)$$
$$x_2 \sim \mathrm{runif}(0,1)$$
$$e_1 \sim \mathrm{rnorm}(0,1)$$
$$e_2 \sim \mathrm{rnorm}(0,1)$$
$$t \sim \mathrm{rbinom}(0.5)$$

$$revenue = x_1*t + e_1$$
$$costs = x_2*t + e_2$$

$$profit = revenue - costs $$
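
The equations above can be sketched directly in NumPy. This is only an illustration of the stated process, not the implementation of `get_simple_uplift_data`; the function name `simulate_uplift_data`, its `seed` argument, and the `*_sim` variable names are assumptions made for this example.

```python
import numpy as np

def simulate_uplift_data(n, seed=0):
    """Hypothetical helper that mirrors the equations above."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(0, 1, n)      # x_1 ~ runif(0,1)
    x2 = rng.uniform(0, 1, n)      # x_2 ~ runif(0,1)
    e1 = rng.normal(0, 1, n)       # e_1 ~ rnorm(0,1)
    e2 = rng.normal(0, 1, n)       # e_2 ~ rnorm(0,1)
    t = rng.binomial(1, 0.5, n)    # randomized treatment, p = 0.5
    revenue = x1 * t + e1
    costs = x2 * t + e2
    profit = revenue - costs
    x = np.column_stack([x1, x2])
    return x, t, revenue, costs, profit

x_sim, t_sim, revenue_sim, costs_sim, profit_sim = simulate_uplift_data(10000)
```

Since $profit = (x_1 - x_2) \cdot t + e_1 - e_2$, the treatment raises expected profit only when $x_1 > x_2$, which holds for roughly half of the users under this process.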



In [21]:
y, x, t = get_simple_uplift_data(10000)

y = pd.DataFrame(y)
y.columns = ['revenue','cost', 'noise']
y['profit'] = y['revenue'] - y['cost']

### Model Building / Gridsearch
After instantiating the IbottaUplift class, the `.fit` function will build the model.

This builds a keras neural network model and runs a grid search over it, minimizing mean squared error. The user can supply custom parameters as necessary.

This function applies transformers to all variables $x, y, t$ so that each has mean 0 and standard deviation 1 (I have found this useful for training neural networks).
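
As a rough illustration of this kind of scaling, the sketch below standardizes each column of an array and then inverts the transformation. It is not the library's own transformer; the helper `standardize` and the toy array `raw` are assumptions made for this example.

```python
import numpy as np

def standardize(a):
    """Scale each column to mean 0 and standard deviation 1.

    Also returns the column means and standard deviations so the
    transformation can be inverted later.
    """
    a = np.asarray(a, dtype=float)
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (a - mean) / std, mean, std

rng = np.random.default_rng(0)
raw = rng.uniform(0, 10, size=(1000, 2))

scaled, col_mean, col_std = standardize(raw)
restored = scaled * col_std + col_mean  # back on the original scale
```

Keeping the fitted means and standard deviations matters because predictions made on the scaled response have to be mapped back to the original units before they can be read as profit.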

In [22]:
uplift_model = IbottaUplift()
param_grid = dict(
    num_nodes=[8],
    dropout=[.1, .5],
    activation=['relu'],
    num_layers=[1, 2],
    epochs=[25],
    batch_size=[30],
)


uplift_model.fit(x, y[['profit']], t.reshape(-1,1), param_grid = param_grid, n_jobs = 1)
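
To get a sense of how much work this grid implies, the sketch below enumerates every parameter combination with `itertools.product`. How IbottaUplift iterates over the grid internally is not shown in this notebook, so treat this as an assumption about a plain exhaustive grid search; `demo_grid` simply repeats the values used above.

```python
from itertools import product

# Same values as the param_grid passed to .fit above
demo_grid = dict(num_nodes=[8], dropout=[.1, .5], activation=['relu'],
                 num_layers=[1, 2], epochs=[25], batch_size=[30])

names = list(demo_grid)
combinations = [dict(zip(names, values))
                for values in product(*demo_grid.values())]

for combo in combinations:
    print(combo)
```

Only `dropout` and `num_layers` take more than one value here, so an exhaustive search would cover 2 x 2 = 4 candidate models.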



### Evaluating Model with out-of-sample ERUPT metric
Using the test dataset, IbottaUplift then evaluates the model with the ERUPT metric. This functionality gives the model builder insight into whether the model performs well out of sample.

It outputs two dataframes:

1) The first shows the ERUPT metric and its standard deviation for the model's assignment. In this example it tells us the expected profit if we were to use this model. There is also a 'random' row under the assignment column. This uses the same distribution of treatments for ERUPT but shuffles them so that the assignment is random.

Below we can see that the model performs much better than the randomized treatments, which suggests the model learned the heterogeneity of the treatment effects well.

2) The second dataframe shows the distribution of treatments under the optimal assignment. In this example about half of the observations are assigned the treatment and half are not.
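
The idea behind the metric can be sketched as follows, assuming treatments were randomized with equal probability: average the observed response over the observations whose randomized treatment happens to match the proposed one. This is a simplified illustration, not the code behind `get_erupt_curves`; the function `erupt`, the simulated data, and the stand-in rule `t_model` are all assumptions made for this example.

```python
import numpy as np

def erupt(y, t_observed, t_proposed):
    """Mean response over rows where the randomized treatment
    equals the proposed treatment."""
    match = t_observed == t_proposed
    return y[match].mean()

# Simulated test set following the data generating process above
rng = np.random.default_rng(0)
n = 20000
x1, x2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
t_obs = rng.binomial(1, 0.5, n)
profit = (x1 - x2) * t_obs + rng.normal(0, 1, n) - rng.normal(0, 1, n)

# Stand-in for a fitted model: treat only when x1 > x2 (the true optimal rule)
t_model = (x1 > x2).astype(int)
# 'random' baseline: same treatment distribution, shuffled across users
t_random = rng.permutation(t_model)

erupt_model = erupt(profit, t_obs, t_model)
erupt_random = erupt(profit, t_obs, t_random)
```

Under this data generating process the best achievable expected profit is $E[\max(x_1 - x_2, 0)] = 1/6 \approx 0.167$, while a random assignment with the same treatment share has an expected profit of 0.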




In [23]:
erupt_curves, dists = uplift_model.get_erupt_curves()
erupt_curves



| | mean | std | response_var_names | weights | assignment |
|---|---|---|---|---|---|
| 0 | 0.16116 | 0.004676 | profit | 1 | model |
| 0 | -0.00052 | 0.005282 | profit | 1 | random |


In [24]:
dists

| | num_observations | tmt | weights | percent_tmt |
|---|---|---|---|---|
| 0 | 3809 | 0 | 1 | 0.544143 |
| 1 | 3191 | 1 | 1 | 0.455857 |


### Assigning Optimal Treatments for new observations
After building and evaluating an uplift model, the modeller may deem it worthy of production. To assign the optimal treatment to new users, call the `predict_optimal_treatments` function on their covariates.
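
Conceptually, choosing an optimal treatment amounts to predicting the response under every candidate treatment and keeping the best one for each user. The sketch below illustrates that idea only; it is not the internals of `predict_optimal_treatments`, and `pick_optimal_treatments`, `toy_predict`, and `x_demo` are hypothetical names introduced here.

```python
import numpy as np

def pick_optimal_treatments(predict_response, x, treatments):
    """Return, for each row of x, the treatment with the highest
    predicted response, as a column vector."""
    preds = np.column_stack([
        predict_response(x, np.full(len(x), tmt)) for tmt in treatments
    ])
    best = preds.argmax(axis=1)
    return np.asarray(treatments)[best].reshape(-1, 1)

def toy_predict(x, t):
    # Hypothetical stand-in for a fitted model: the true expected
    # profit under the data generating process, (x_1 - x_2) * t
    return (x[:, 0] - x[:, 1]) * t

x_demo = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.5]])
optimal = pick_optimal_treatments(toy_predict, x_demo, treatments=[0, 1])
print(optimal)
```

With this stand-in model, the first and third users (where $x_1 > x_2$) are assigned the treatment and the second is not.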



In [25]:
# generate 5 new observations
_, x_new ,_  = get_simple_uplift_data(5)
uplift_model.predict_optimal_treatments(x_new)



array([[0],
       [0],
       [0],
       [0],
       [1]])

0.9998957938535137

In [8]:
uplift_model.calibrate()
erupt_curves1, dists1 = uplift_model.get_erupt_curves(calibrator = True)
erupt_curves1



| | mean | std | response_var_names | weights | assignment |
|---|---|---|---|---|---|
| 0 | 0.153304 | 0.004541 | profit | 1 | model |
| 0 | -0.00452 | 0.004905 | profit | 1 | random |


In [None]:
# np.float was removed in NumPy 1.24; the built-in float behaves the same here
dists['weights_1'] = [float(x.split(',')[0]) for x in dists['weights']]
erupt_curves['weights_1'] = [float(x.split(',')[0]) for x in erupt_curves['weights']]
ggplot(aes(x='weights_1', y='mean', group = 'assignment', colour = 'assignment'), data=erupt_curves) +\
    geom_line()+\
    geom_point()+facet_grid("response_var_names")

In [None]:
ggplot(aes(x='weights_1', y='num_observations'), data=dists) +\
    geom_line()+\
    geom_point()+facet_wrap('tmt')

In [None]:
uplift_model.calibrate()


In [None]:
erupt_curves, dists = uplift_model.get_erupt_curves(calibrator = True)
# np.float was removed in NumPy 1.24; the built-in float behaves the same here
dists['weights_1'] = [float(x.split(',')[0]) for x in dists['weights']]
erupt_curves['weights_1'] = [float(x.split(',')[0]) for x in erupt_curves['weights']]
ggplot(aes(x='weights_1', y='mean', group = 'assignment', colour = 'assignment'), data=erupt_curves) +\
    geom_line()+\
    geom_point()+facet_wrap("response_var_names")

In [None]:
test  = uplift_model.copy()
test.x.shape


In [None]:
uplift_model.save('/users/samweiss/temp')

In [None]:
test = IbottaUplift()
test.load('/users/samweiss/')
test

In [None]:
test.x

In [None]:
uplift_model.copy().model == uplift_model.model

In [None]:
import copy
test_copy = copy.copy(uplift_model)


In [None]:
class test_class(object):
    def fit(self, x):
        self.x = x
test = test_class()
test.fit(1)
test.x


import dill

# Round-trip the object through dill; the context managers close the files
with open('/users/samweiss/downloads/test_load.pkl', "wb") as f:
    dill.dump(test, f)
with open('/users/samweiss/downloads/test_load.pkl', "rb") as f:
    test_load = dill.load(f)
test_load.x

In [None]:
erupt_curves, dists = uplift_model.get_erupt_curves(objective_weights = np.array([1,0,0,0]).reshape(1,-1))

In [None]:
erupt_curves