## Getting started tutorial

### This tutorial uses the Hillstrom Email Marketing dataset. You can find more information about the dataset on the [official site](http://minethatdata.com/Stochastic_Solutions_E-Mail_Challenge_2008.04.30.pdf)

```python
from pyuplift.variable_selection import Econometric
from pyuplift.datasets import load_hillstrom_email_marketing
from pyuplift.model_selection import train_test_split
```

### Load data from the Hillstrom Email Marketing dataset
The `load_raw_data` parameter lets you load either the raw data (the original dataset) or the preprocessed data (ready to use).

```python
# load_raw_data=False returns the preprocessed, ready-to-use data
data = load_hillstrom_email_marketing(load_raw_data=False)
```
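
With `load_raw_data=False`, the features come back as a single numeric array (see the `data` output below). Turning categorical columns into numbers is commonly done with one-hot encoding; the sketch below shows that technique with NumPy on a toy column. The `one_hot` helper and the `channel` values are hypothetical, and this illustrates the general idea only, not pyuplift's actual preprocessing.

```python
import numpy as np


def one_hot(values):
    """One-hot encode a 1-D array of category labels (hypothetical helper).

    Returns the sorted category names and a 0/1 float matrix with one
    column per category.
    """
    categories, codes = np.unique(values, return_inverse=True)
    return categories, np.eye(len(categories))[codes]


# Toy categorical column with made-up values, not rows from the real dataset
channel = np.array(["Web", "Phone", "Web", "Multichannel"])
names, encoded = one_hot(channel)
print(names)  # ['Multichannel' 'Phone' 'Web']
print(encoded)
```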

```python
data
```

```
{'description': 'This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test. 1/3 were randomly chosen to receive an e-mail campaign featuring Mens merchandise. 1/3 were randomly chosen to receive an e-mail campaign featuring Womens merchandise. 1/3 were randomly chosen to not receive an e-mail campaign. During a period of two weeks following the e-mail campaign, results were tracked. Your job is to tell the world if the Mens or Womens e-mail campaign was successful.',
 'data': array([[ 10.  , 142.44,   1.  , ...,   0.  ,   1.  ,   0.  ],
        [  6.  , 329.08,   1.  , ...,   0.  ,   0.  ,   1.  ],
        [  7.  , 180.65,   0.  , ...,   0.  ,   0.  ,   1.  ],
        ...,
        [  6.  ,  29.99,   1.  , ...,   0.  ,   1.  ,   0.  ],
        [  1.  , 552.94,   1.  , ...,   1.  ,   0.  ,   0.  ],
        [  1.  , 472.82,   0.  , ...,   0.  ,   0.  ,   1.  ]]),
 'feature_names': array(['recency', 'history', 'mens', 'wom
```

### The Hillstrom dataset has 3 types of treatment:
* 0, No E-Mail
* 1, Mens E-Mail
* 2, Womens E-Mail
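
The dataset description says each third of the customers was randomly assigned to one of these groups. A quick way to check group sizes is `np.bincount` over the treatment codes. The sketch below assumes integer codes 0, 1, 2 as listed above and uses a hypothetical `group_sizes` helper on a toy vector; on the real data you could pass `data['treatment'].astype(int)`.

```python
import numpy as np

# Treatment codes as listed in this tutorial
TREATMENT_NAMES = {0: "No E-Mail", 1: "Mens E-Mail", 2: "Womens E-Mail"}


def group_sizes(treatment):
    """Count customers per treatment code (hypothetical helper; expects non-negative ints)."""
    counts = np.bincount(treatment, minlength=len(TREATMENT_NAMES))
    return {TREATMENT_NAMES[code]: int(n) for code, n in enumerate(counts)}


# Toy treatment vector (made-up values, not the real data)
t_example = np.array([0, 1, 2, 2, 0, 1, 0])
print(group_sizes(t_example))
# {'No E-Mail': 3, 'Mens E-Mail': 2, 'Womens E-Mail': 2}
```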

### Let's keep two of them:
* No E-Mail (no treatment, control group)
* Mens E-Mail (treatment)

```python
# Boolean mask: keep No E-Mail (0) and Mens E-Mail (1) rows, drop Womens E-Mail (2)
ex_womens_indexes = data['treatment'] != 2
```

```python
X = data['data'][ex_womens_indexes]
y = data['target'][ex_womens_indexes]
t = data['treatment'][ex_womens_indexes]
```
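
Comparing the treatment array with 2 produces a boolean mask, and indexing the features, target and treatment with the same mask keeps their rows aligned. After filtering, the only treatment codes left are 0 (No E-Mail) and 1 (Mens E-Mail), so the treatment becomes binary: control versus treated. A toy illustration with made-up values:

```python
import numpy as np

# Toy stand-ins for data['data'], data['target'] and data['treatment'] (made-up values)
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
target = np.array([0, 1, 0, 1])
treatment = np.array([2, 0, 1, 2])

# True where the customer did NOT get the Womens E-Mail (code 2)
keep = treatment != 2

# The same mask indexes every array, so rows stay aligned across X, y and t
X_toy = features[keep]
y_toy = target[keep]
t_toy = treatment[keep]

print(keep)   # [False  True  True False]
print(X_toy)  # the 2nd and 3rd rows of features
print(t_toy)  # [0 1]
```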

### Randomly split the data into train (70%) and test (30%) sets

```python
X_train, X_test, y_train, y_test, t_train, t_test = train_test_split(X, y, t, train_share=0.7, random_state=123)
```
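
`train_test_split` here is pyuplift's own function. Conceptually, a random split shuffles the row indices once and applies the same indices to `X`, `y` and `t`, so each customer's features, outcome and treatment flag land in the same part. A minimal sketch of that idea, using a hypothetical `split_aligned` helper; it is not pyuplift's implementation and will not select the same rows for `random_state=123`.

```python
import numpy as np


def split_aligned(X, y, t, train_share=0.7, random_state=None):
    """Randomly split X, y, t into train/test parts using one shared permutation.

    Hypothetical helper for illustration; not pyuplift's train_test_split.
    Returns values in the same order as the tutorial's call:
    X_train, X_test, y_train, y_test, t_train, t_test.
    """
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))  # shuffled row indices
    n_train = int(round(train_share * len(X)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], X[test], y[train], y[test], t[train], t[test]


# Toy data: row i of X_demo is [2*i, 2*i + 1], y_demo[i] == i, t_demo[i] == i % 2
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)
t_demo = np.arange(10) % 2

X_tr, X_te, y_tr, y_te, t_tr, t_te = split_aligned(
    X_demo, y_demo, t_demo, train_share=0.7, random_state=123
)
print(len(X_tr), len(X_te))  # 7 3
```

Because the split is purely random, the share of treated customers can differ slightly between the train and test parts; with tens of thousands of rows the difference is usually small.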

### Create the uplift model with default parameters

```python
model = Econometric()
```

### Fit the model

```python
model.fit(X_train, y_train, t_train)
```

```
<pyuplift.variable_selection.econometric.Econometric at 0x214f0be0>
```
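
The tutorial uses `Econometric` as a black box. One common econometric-style formulation of uplift modeling is a single regression that includes the treatment flag and treatment-feature interaction terms, `y ≈ b0 + x·b + c·t + t·(x·d)`, so the predicted uplift for a customer is the prediction with `t = 1` minus the prediction with `t = 0`, which simplifies to `c + x·d`. The sketch below fits such a linear model with NumPy least squares on synthetic data; the `InteractionUpliftSketch` class is hypothetical and may differ from what pyuplift's `Econometric` actually fits.

```python
import numpy as np


class InteractionUpliftSketch:
    """Linear model y ~ b0 + X @ b + c * t + t * (X @ d), fit by least squares.

    Predicted uplift is f(x, t=1) - f(x, t=0) = c + x @ d.
    Hypothetical illustration, not pyuplift's Econometric class.
    """

    def fit(self, X, y, t):
        t_col = np.asarray(t, dtype=float).reshape(-1, 1)
        # Design matrix columns: intercept, features, treatment flag, treatment * features
        design = np.hstack([np.ones_like(t_col), X, t_col, t_col * X])
        self.coef_, *_ = np.linalg.lstsq(design, y, rcond=None)
        self.n_features_ = X.shape[1]
        return self

    def predict(self, X):
        c = self.coef_[1 + self.n_features_]   # coefficient on t
        d = self.coef_[2 + self.n_features_:]  # coefficients on t * X
        return c + X @ d


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 2))
t_demo = rng.integers(0, 2, size=500)
# Synthetic, noise-free outcome whose true uplift is 0.5 + 1.0 * x0
y_demo = 1.0 + X_demo @ np.array([0.3, -0.2]) + t_demo * (0.5 + 1.0 * X_demo[:, 0])

sketch = InteractionUpliftSketch().fit(X_demo, y_demo, t_demo)
print(sketch.predict(np.array([[0.0, 0.0], [1.0, 0.0]])))  # close to [0.5, 1.5]
```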

### Predict uplift for the test dataset

```python
uplift = model.predict(X_test)
```

```python
uplift[:20]
```

```
array([1.0615553 , 0.41391224, 0.26028002, 2.09681851, 0.42625385,
       1.94064929, 2.50369232, 0.52225684, 0.17712341, 0.91999936,
       0.54780214, 0.27353447, 0.74778451, 0.77815588, 0.89413281,
       0.50344916, 0.5541491 , 1.19713328, 1.62508446, 2.72094539])
```
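
Each value is the model's predicted uplift for one test customer, that is, the expected change in the target caused by the Mens e-mail. Because the e-mails were assigned at random, the ranking can be checked against the data: among customers with the highest predicted uplift, the observed difference in mean outcome between treated and control customers should be larger than in the test set as a whole. A minimal sketch of that check, with hypothetical helper names and toy values; with the tutorial's variables you could compare `uplift_at_top(uplift, y_test, t_test)` against `observed_uplift(y_test, t_test)`.

```python
import numpy as np


def observed_uplift(y, t):
    """Mean outcome of treated (t == 1) minus mean outcome of control (t == 0).

    Returns nan (with a RuntimeWarning) if either group is empty.
    """
    return y[t == 1].mean() - y[t == 0].mean()


def uplift_at_top(uplift, y, t, share=0.3):
    """Observed uplift among the `share` of customers with the highest predicted uplift."""
    n_top = max(1, int(share * len(uplift)))
    top = np.argsort(-uplift)[:n_top]  # indices sorted by descending predicted uplift
    return observed_uplift(y[top], t[top])


# Toy predictions, outcomes and treatment flags (made-up values)
pred = np.array([0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.05])
y_toy = np.array([1, 0, 1, 0, 0, 1, 0, 1])
t_toy = np.array([1, 0, 1, 0, 1, 0, 1, 0])

print(observed_uplift(y_toy, t_toy))                 # 0.0 (whole sample)
print(uplift_at_top(pred, y_toy, t_toy, share=0.5))  # 1.0 (top half by prediction)
```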