# Uplift Modeling 101
## Agenda

- An introduction to Uplift Modeling for beginners


### Hardware

In [1]:
%%bash
system_profiler SPHardwareDataType | grep -E \
"Model Identifier|Processor Name|Processor Speed|Number of Processors|Memory:"

      Model Identifier: MacBookPro13,1
      Processor Name: Dual-Core Intel Core i5
      Processor Speed: 2 GHz
      Number of Processors: 1
      Memory: 16 GB


In [2]:
!sw_vers

ProductName:	Mac OS X
ProductVersion:	10.15.4
BuildVersion:	19E287


### Python 

In [3]:
!python -V

Python 3.7.4


### Install Packages

In [4]:
pass

### Import

In [5]:
from functools import partial
import warnings

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import GammaRegressor, PoissonRegressor, TweedieRegressor
from sklearn.metrics import (
    auc,
    mean_absolute_error,
    mean_poisson_deviance,
    mean_squared_error,
    mean_tweedie_deviance,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import (
    FunctionTransformer,
    KBinsDiscretizer,
    OneHotEncoder,
    StandardScaler,
)

### Setting

In [6]:
pd.set_option('display.max_columns', None)

## 1. What is Uplift Modeling?

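Companies run marketing campaigns, such as issuing coupons or offering two-in-one prices, to raise consumers' willingness to buy and so increase sales and profits. To judge whether such a campaign is effective, you need to know which customers it actually works on. It matters whether the buyers are people who purchased because of the coupon (compliers) or people who would have bought anyway (always-takers).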
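Here is a small worked example of the always-taker/complier distinction. The segment sizes are made up and purely illustrative. Each hypothetical customer type is described by whether it buys with the coupon and whether it buys without it. "Never-takers" and "defiers" are the two remaining types in the same classification.

```python
# Hypothetical population of 1,000 customers split into four response types.
# Each entry: (number of customers, buys if treated, buys if not treated)
segments = {
    "always_takers": (300, 1, 1),  # buy with or without the coupon
    "compliers": (100, 1, 0),      # buy only because of the coupon
    "never_takers": (550, 0, 0),   # never buy
    "defiers": (50, 0, 1),         # the coupon stops them from buying
}

total = sum(n for n, _, _ in segments.values())

# Conversion rate if everyone received the coupon vs. if nobody did.
rate_treated = sum(n * y1 for n, y1, _ in segments.values()) / total
rate_control = sum(n * y0 for n, _, y0 in segments.values()) / total

# Uplift: incremental purchases caused by the coupon, per customer.
uplift = rate_treated - rate_control

print(f"conversion if treated:   {rate_treated:.2f}")
print(f"conversion if untreated: {rate_control:.2f}")
print(f"uplift:                  {uplift:.2f}")
```

The 40% conversion rate under the coupon looks impressive, but only 5 percentage points of it are caused by the coupon. The uplift equals compliers minus defiers, divided by the population. In practice each customer is observed in only one condition. That is why a randomly assigned control group is needed to estimate the untreated rate.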

Roughly speaking, Uplift Modeling builds a model that identifies `who will be most likely to buy because of the campaign`, that is, the compliers (also called Persuadables). Marketing activities are then targeted based on that model.

<img src = "https://github.com/RyoNakagami/omorikaizuka/blob/master/Econometrics/uplift_modeling_fig1.jpg?raw=true">
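One common way to build such a model is the two-model approach, sometimes called a T-learner. You fit one response model on treated customers and another on control customers. The uplift score is then the difference between their predicted purchase probabilities. The sketch below assumes synthetic data with a single feature `x` and randomized treatment. It illustrates the idea and is not necessarily the method used later in this notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Synthetic data (assumption): one feature x, random 50/50 treatment assignment.
x = rng.normal(size=(n, 1))
treat = rng.integers(0, 2, size=n)

# Baseline purchase probability rises with x; the coupon adds 0.2
# (clipped at 1) only for customers with x > 0.
p_base = 1 / (1 + np.exp(-(x[:, 0] - 1)))
p = np.clip(p_base + 0.2 * treat * (x[:, 0] > 0), 0, 1)
y = rng.binomial(1, p)

# Two-model approach: one classifier per group.
model_t = LogisticRegression().fit(x[treat == 1], y[treat == 1])
model_c = LogisticRegression().fit(x[treat == 0], y[treat == 0])


def predict_uplift(x_new):
    """Estimated uplift = P(buy | treated, x) - P(buy | control, x)."""
    return model_t.predict_proba(x_new)[:, 1] - model_c.predict_proba(x_new)[:, 1]


scores = predict_uplift(np.array([[-2.0], [2.0]]))
print(scores)
```

The estimate for `x = 2` should come out well above the one for `x = -2`, because the simulated coupon only helps customers with `x > 0`. A known weakness of this design is that each model is trained to predict purchases, not uplift directly. Errors in the two models can therefore add up in their difference.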



## 2. Uplift Modeling in Python: Example 1

- data: https://drive.google.com/u/0/uc?id=1fkxNmihuS15kk0PP0QcphL_Z3_z8LLeb&export=download

### Variables

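- treatment_flg: 1 if the client was in the treatment group (targeted by the campaign), 0 if in the control group
- target: probably the conversion flag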
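Together, `treatment_flg` and `target` already give a campaign-level uplift estimate: the treated group's conversion rate minus the control group's. The sketch below uses a made-up toy frame with the same column names as `df_train`. The estimate is only unbiased if treatment was randomly assigned, which is an assumption here.

```python
import pandas as pd

# Toy frame with the same columns as df_train (values are made up).
toy = pd.DataFrame(
    {
        "treatment_flg": [1, 1, 1, 1, 0, 0, 0, 0],
        "target":        [1, 1, 0, 1, 1, 0, 0, 1],
    },
    index=pd.Index(list("abcdefgh"), name="client_id"),
)

# Conversion rate within each group (index 0 = control, 1 = treated).
rates = toy.groupby("treatment_flg")["target"].mean()

# Average uplift of the campaign: treated rate minus control rate.
# Unbiased only if treatment was randomly assigned.
ate = rates.loc[1] - rates.loc[0]

print(rates)
print(f"estimated average uplift: {ate:.3f}")
```

On the real data the same expression would be `df_train.groupby('treatment_flg')['target'].mean()`. Uplift modeling goes one step further and estimates this difference per client from the features.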


In [7]:
# reading data
df_clients = pd.read_csv('./data/clients.csv', index_col='client_id')
df_train = pd.read_csv('./data/uplift_train.csv', index_col='client_id')
df_test = pd.read_csv('./data/uplift_test.csv', index_col='client_id')

In [8]:
df_clients.head()

| client_id | first_issue_date | first_redeem_date | age | gender |
|---|---|---|---|---|
| 000012768d | 2017-08-05 15:40:48 | 2018-01-04 19:30:07 | 45 | U |
| 000036f903 | 2017-04-10 13:54:23 | 2017-04-23 12:37:56 | 72 | F |
| 000048b7a6 | 2018-12-15 13:33:11 | NaN | 68 | F |
| 000073194a | 2017-05-23 12:56:14 | 2017-11-24 11:18:01 | 60 | F |
| 00007c7133 | 2017-05-22 16:17:08 | 2018-12-31 17:17:33 | 67 | U |


In [9]:
df_train.head()

| client_id | treatment_flg | target |
|---|---|---|
| 000012768d | 0 | 1 |
| 000036f903 | 1 | 1 |
| 00010925a5 | 1 | 1 |
| 0001f552b0 | 1 | 1 |
| 00020e7b18 | 1 | 1 |


In [10]:
df_test.head()

| client_id |
|---|
| 000048b7a6 |
| 000073194a |
| 00007c7133 |
| 00007f9014 |
| 0000a90cf7 |


### Pre-processing

In [11]:
# extracting features
df_features = df_clients.copy()
# convert timestamps to seconds since the Unix epoch (missing dates become NaN)
df_features['first_issue_time'] = (pd.to_datetime(df_features['first_issue_date']) - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
df_features['first_redeem_time'] = (pd.to_datetime(df_features['first_redeem_date']) - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
# seconds between first issue and first redeem (NaN if the client never redeemed)
df_features['issue_redeem_delay'] = df_features['first_redeem_time'] - df_features['first_issue_time']
# the raw date strings are no longer needed
df_features = df_features.drop(['first_issue_date', 'first_redeem_date'], axis=1)
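The feature cell converts each timestamp to seconds since the Unix epoch: it subtracts `1970-01-01` and floor-divides by one second. A quick check on two rows copied from `df_clients.head()` shows what happens to a client with no `first_redeem_date`. The missing value becomes `NaT`, and the derived columns become `NaN`.

```python
import pandas as pd

# Two rows copied from df_clients.head(); the second client never redeemed.
demo = pd.DataFrame(
    {
        "first_issue_date": ["2017-08-05 15:40:48", "2018-12-15 13:33:11"],
        "first_redeem_date": ["2018-01-04 19:30:07", None],
    }
)

epoch = pd.Timestamp("1970-01-01")
# Same arithmetic as the feature-extraction cell: seconds since the epoch.
issue = (pd.to_datetime(demo["first_issue_date"]) - epoch) // pd.Timedelta("1s")
redeem = (pd.to_datetime(demo["first_redeem_date"]) - epoch) // pd.Timedelta("1s")
delay = redeem - issue

print(pd.DataFrame({"issue": issue, "redeem": redeem, "delay": delay}))
```

Because `issue_redeem_delay` is `NaN` for clients who never redeemed, it has to be handled before fitting. You can impute it or use a model that accepts missing values. Estimators such as scikit-learn's `LogisticRegression` reject `NaN` inputs.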

In [12]:
# extract indices
indices_train = df_train.index
indices_test = df_test.index
indices_learn, indices_valid = train_test_split(df_train.index, test_size=0.3, random_state=123)
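The split above is purely random. A variant you may want to consider, which this notebook does not use, is to stratify on the combination of `treatment_flg` and `target`. The learning and validation sets then keep the same treated/control and converted/not-converted proportions. The sketch uses a synthetic frame in place of `df_train`. With the real data, the `strata` labels would be built from `df_train` the same way.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df_train (assumption): random flags for 1,000 clients.
rng = np.random.default_rng(123)
n = 1_000
df = pd.DataFrame(
    {"treatment_flg": rng.integers(0, 2, n), "target": rng.binomial(1, 0.6, n)},
    index=pd.Index([f"c{i:04d}" for i in range(n)], name="client_id"),
)

# Combine the two flags into one label with four classes: 0_0, 0_1, 1_0, 1_1.
strata = df["treatment_flg"].astype(str) + "_" + df["target"].astype(str)

# Same call as above, plus stratify= so each class is split 70/30.
idx_learn, idx_valid = train_test_split(
    df.index, test_size=0.3, random_state=123, stratify=strata
)

print(df.loc[idx_learn, "treatment_flg"].mean(), df.loc[idx_valid, "treatment_flg"].mean())
```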

In [13]:
# learning split (70% of the labelled clients)
X_train = df_features.loc[indices_learn, :]
y_train = df_train.loc[indices_learn, 'target']
treat_train = df_train.loc[indices_learn, 'treatment_flg']

# validation split (30% of the labelled clients)
X_val = df_features.loc[indices_valid, :]
y_val = df_train.loc[indices_valid, 'target']
treat_val = df_train.loc[indices_valid, 'treatment_flg']

# full labelled set, e.g. for refitting the final model
X_train_full = df_features.loc[indices_train, :]
y_train_full = df_train.loc[:, 'target']
treat_train_full = df_train.loc[:, 'treatment_flg']

# test clients (no target or treatment labels)
X_test = df_features.loc[indices_test, :]
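The validation split (`X_val`, `y_val`, `treat_val`) is where an uplift model's ranking can be checked. Plain accuracy does not apply, because no client's outcome is observed both with and without treatment. One simple metric is uplift@k. You rank clients by predicted uplift, keep the top share `k`, and compare the treated and control conversion rates inside that slice. The helper below is a hypothetical sketch of one variant. Libraries such as scikit-uplift implement their own versions with additional options.

```python
import numpy as np


def uplift_at_k(y_true, uplift_score, treatment, k=0.3):
    """Hypothetical helper: treated minus control conversion rate among the
    top-k share of clients ranked by predicted uplift (one simple variant)."""
    y_true = np.asarray(y_true)
    uplift_score = np.asarray(uplift_score)
    treatment = np.asarray(treatment)

    n_top = int(np.ceil(len(uplift_score) * k))
    # Indices of the n_top highest predicted uplift scores (stable sort).
    top = np.argsort(-uplift_score, kind="mergesort")[:n_top]

    y_top, t_top = y_true[top], treatment[top]
    if (t_top == 1).sum() == 0 or (t_top == 0).sum() == 0:
        raise ValueError("top-k slice needs both treated and control clients")
    return y_top[t_top == 1].mean() - y_top[t_top == 0].mean()


# Tiny illustrative example with made-up labels and scores.
y = [1, 0, 1, 0, 1, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4]
t = [1, 0, 1, 0, 0, 1, 1, 0]
print(uplift_at_k(y, s, t, k=0.5), uplift_at_k(y, s, t, k=1.0))
```

Suppose a model's scores for `X_val` are in an array `uplift_val` (a hypothetical name). The call would then be `uplift_at_k(y_val, uplift_val, treat_val, k=0.3)`.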