# Tutorial on using the nudging package

**Important note**: If you're using this notebook on Binder, be careful with uploading private datasets, since the data will be uploaded to cloud providers.

To install the package locally see: https://github.com/UtrechtUniversity/nudging.

The goal of this tutorial is to show how you can use the nudging package and apply machine learning methods to predict the Conditional Average Treatment Effect (CATE). Intuitively speaking, this is simply the expected difference in outcome for a person between the case where they are nudged and the case where they are not.
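For readers who prefer a formula, the CATE is commonly written in potential-outcomes notation. The symbols below are introduced here for illustration and are not taken from the package: $Y(1)$ is a person's outcome when nudged, $Y(0)$ the outcome when not nudged, and $X$ the person's features.

$$
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]
$$

Only one of $Y(1)$ and $Y(0)$ is ever observed for a given person, which is why the effect has to be estimated with a model rather than computed directly.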

## Step 1: Python imports

First import some modules that are needed later.

In [1]:
import pandas as pd
import numpy as np

from nudging.dataset.file import FileDataset

## Step 2: Read the file into a pandas dataframe

Pandas has a good number of file reading functions for tabular datasets, including Excel, Stata and CSV. Below is an example for a CSV file.

In [18]:
# Make sure that the file demonstration.csv is in the same folder as the notebook.
df = pd.read_csv("demonstration.csv")
df[:5]

Unnamed: 0,PassengerId,Name,Sex,Age,Parch,Fare,Cabin,Embarked,Birthday,Board time,Married since
0,1,"Braund, Mr. Owen Harris",male,22.0,0,7.25,,S,1922-03-23,14:57:38,2022-08-13 08:42:37
1,2,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,0,71.2833,C85,C,1938-03-22,17:16:30,2022-07-24 06:45:36
2,3,"Heikkinen, Miss. Laina",female,26.0,0,7.925,,S,1909-11-06,15:57:59,2022-07-21 02:32:58
3,4,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,0,53.1,C123,S,1915-10-14,14:17:36,2022-07-19 13:22:18
4,5,"Allen, Mr. William Henry",male,35.0,0,8.05,,S,1929-04-18,13:23:39,2022-07-26 11:47:42
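If your data comes in different formats, a small helper can pick the matching pandas reader from the file extension. This is a minimal sketch: `read_table` and `READERS` are hypothetical names, not part of the nudging package, and the file `example.csv` is created on the fly only to make the example runnable.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical helper (not part of the nudging package): map a file
# extension to the matching pandas reader.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # needs an Excel engine such as openpyxl
    ".dta": pd.read_stata,
}


def read_table(path):
    """Read a tabular file into a DataFrame, choosing the reader by extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix}")
    return READERS[suffix](path)


# Demonstrate with a tiny CSV written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "example.csv"
    csv_path.write_text("Age,Fare\n22,7.25\n38,71.2833\n")
    example_df = read_table(csv_path)

print(example_df)
```

Each reader accepts further keyword arguments (separator, sheet name, and so on), so for anything beyond the defaults it is usually simpler to call the pandas function directly.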


## Step 3: Adjust the columns 

After adjusting, the dataframe should contain at least two columns:

- `outcome`: the outcome for each person.
- `nudge`: whether a person was nudged or not. Should be 1 if nudged, 0 if not nudged.

All other columns are the features, such as age and gender. The features should be numerical, so one would, for example, convert gender to a value of 0 or 1.
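A feature with more than two categories cannot be captured in a single 0/1 column. One common option, shown here as a sketch rather than as something the nudging package requires, is one-hot encoding with `pd.get_dummies`. The values are the `Sex` and `Embarked` entries of the first five rows shown above; the names `sample` and `features` are only for illustration.

```python
import pandas as pd

# Sex and Embarked values of the first five rows of the demonstration data.
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "S", "S"],
})

# Binary category: a single 0/1 column is enough.
sample["gender"] = (sample["Sex"] == "male").astype(int)

# More than two possible categories: one 0/1 column per category.
embarked = pd.get_dummies(sample["Embarked"], prefix="Embarked").astype(int)

# Keep only the numerical columns as features.
features = pd.concat([sample[["gender"]], embarked], axis=1)
print(features)
```

Only categories that occur in the data get a column, so a value that is absent from this sample (here, any port other than S and C) would not appear.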

Below we adjust the columns of our demonstration dataset. Note that the dataset originally has nothing to do with nudging, so two extra columns are generated. The CATE is set equal to the fare, so that we can check that the CATE modeling works.

In [23]:
# Randomly nudge people
df["nudge"] = np.random.randint(2, size=len(df))

# Set the outcome to the Fare for nudged people (0 otherwise), plus a small random factor
df["outcome"] = df["Fare"]*df["nudge"] + np.random.randn(len(df))

# Convert the Sex column to one with a numerical value.
df["gender"] = (df["Sex"] == "male").astype(int)

# Drop all columns except the ones we train the model on.
new_df = df[["Age", "Parch", "Fare", "gender", "nudge", "outcome"]]
new_df

Unnamed: 0,Age,Parch,Fare,gender,nudge,outcome
0,22.0,0,7.2500,1,0,1.810222
1,38.0,0,71.2833,0,0,2.736628
2,26.0,0,7.9250,0,1,6.875718
3,35.0,0,53.1000,0,0,-0.974480
4,35.0,0,8.0500,1,0,0.645431
...,...,...,...,...,...,...
886,27.0,0,13.0000,1,1,12.670302
887,19.0,0,30.0000,0,1,28.572253
888,,2,23.4500,0,0,0.224022
889,26.0,0,30.0000,1,0,-1.363047
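To see why this construction makes the true CATE equal to the fare, consider many people who all paid the same fare: the nudged ones have an outcome around the fare, the others an outcome around 0, so the difference in means is the fare. The sketch below simulates this with NumPy. The fare of 30.0, the sample size and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000   # number of simulated people (arbitrary)
fare = 30.0   # everyone pays the same fare in this illustration

# Same recipe as in the cell above: random nudge, outcome = fare * nudge + noise.
nudge = rng.integers(0, 2, size=n)
outcome = fare * nudge + rng.standard_normal(n)

# Difference in mean outcome between nudged and non-nudged people.
effect = outcome[nudge == 1].mean() - outcome[nudge == 0].mean()
print(f"estimated effect: {effect:.2f} (fare: {fare})")
```

In the real dataset the fare differs per person, which is why a model that conditions on the features is needed instead of a single difference in means.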


## Step 4: Convert the pandas DataFrame to a nudging dataset

This is simply done with the `FileDataset` Python class.

In [24]:
dataset = FileDataset.from_dataframe(new_df)
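A quick check of the dataframe before the conversion can catch violations of the requirements from Step 3. The helper below is a hypothetical sketch and not part of the nudging package: it only tests that `outcome` and `nudge` exist, that `nudge` contains only 0 and 1, and that every column is numerical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_nudging_frame(frame):
    """Hypothetical helper: return a list of problems found in the dataframe."""
    problems = []
    for column in ("outcome", "nudge"):
        if column not in frame.columns:
            problems.append(f"missing column: {column}")
    if "nudge" in frame.columns and not frame["nudge"].isin([0, 1]).all():
        problems.append("nudge must contain only 0 and 1")
    for column in frame.columns:
        if not is_numeric_dtype(frame[column]):
            problems.append(f"non-numerical column: {column}")
    return problems


# A frame that satisfies the requirements and one that does not.
good = pd.DataFrame({"Age": [22.0, 38.0], "gender": [1, 0],
                     "nudge": [0, 1], "outcome": [1.8, 71.9]})
bad = pd.DataFrame({"Sex": ["male", "female"], "nudge": [0, 2]})

print(check_nudging_frame(good))
print(check_nudging_frame(bad))
```

In the notebook one would call `check_nudging_frame(new_df)` and expect an empty list.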

## Step 5: Predict the CATE

In [26]:
cate = dataset.predict_cate()

# Show the first 10 values of the CATE.
print(cate[:10])

[ 7.32102577 71.26129889  7.96796332 53.106404    8.07848699 51.82890797
 21.05305544 11.18699096 30.10977875 16.72217655]
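Because the outcome was constructed so that the true CATE equals the fare, the prediction can be checked against the `Fare` column. The sketch below does this for the first five people only, using the numbers printed in this notebook. It assumes the first five CATE values belong to the first five rows of the dataframe, which holds as long as none of those rows was dropped.

```python
# First five predicted CATE values, copied from the output above.
predicted_cate = [7.32102577, 71.26129889, 7.96796332, 53.106404, 8.07848699]

# Fare of the first five people, copied from the dataframe in Step 3.
fare = [7.25, 71.2833, 7.925, 53.1, 8.05]

# Absolute difference between prediction and true effect per person.
errors = [abs(c - f) for c, f in zip(predicted_cate, fare)]
max_error = max(errors)
print(f"largest absolute error: {max_error:.3f}")
```

For these five people the prediction stays close to the fare, which is what one would hope for given that the random factor added to the outcome is small compared with the fares.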


Int64Index([  0,   1,   2,   3,   4,   6,   7,   8,   9,  10,
            ...
            880, 881, 882, 883, 884, 885, 886, 887, 889, 890],
           dtype='int64', length=714)

Notice that the nudging dataset (714 rows) is shorter than the original dataframe. This is because the modeling does not use people with missing values (NA) in their features. To map the results back to the rows of the original dataframe, you can use the index: `dataset.standard_df.index`.
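One way to use this index is to wrap the CATE values in a pandas Series and let pandas align it with the original rows. The sketch below is self-contained: it uses the first ten CATE values and the start of the index as printed above, and `original` is a stand-in for the first eleven rows of the original dataframe.

```python
import pandas as pd

# Start of dataset.standard_df.index and the first ten CATE values,
# copied from the outputs above (row 5 is not in the index).
used_index = [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
cate_values = [7.32102577, 71.26129889, 7.96796332, 53.106404, 8.07848699,
               51.82890797, 21.05305544, 11.18699096, 30.10977875, 16.72217655]

# Stand-in for the first eleven rows of the original dataframe.
original = pd.DataFrame(index=range(11))

# Assigning a Series aligns on the index; rows the model skipped get NaN.
original["cate"] = pd.Series(cate_values, index=used_index)
print(original)
```

With the real objects this would read `df["cate"] = pd.Series(cate, index=dataset.standard_df.index)`, assuming `cate` holds one value per row of `dataset.standard_df` in the same order.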