# Tutorial on using the nudging package

**Important note**: If you're using this notebook on Binder, be careful with uploading private datasets, since the data will be uploaded to cloud providers.

To install the package locally see: https://github.com/UtrechtUniversity/nudging.

The goal of this tutorial is to show how you can use the nudging package and apply machine learning methods to predict the Conditional Average Treatment Effect (CATE). Intuitively speaking, this is simply the difference in outcome for each person between the case where they are nudged and the case where they are not.
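
The intuition above can be written down more precisely. The notation here is introduced for illustration only (the tutorial itself does not define symbols): $Y(1)$ and $Y(0)$ are a person's outcome with and without the nudge, and $X$ are their features.

$$
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]
$$

Only one of $Y(1)$ and $Y(0)$ is ever observed for a given person, which is why the effect has to be estimated with a model rather than computed directly.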

## Step 1: Python imports

First import some modules that are needed later.

In [3]:
import nudging.dataset

In [1]:
import pandas as pd
import numpy as np

from nudging.dataset.file import FileDataset

## Step 2: Read the file into a pandas dataframe

Pandas has a good number of file reading functions for tabular datasets, including Excel, Stata and CSV. Below is an example for a CSV file.

In [None]:
# Make sure that the file tutorial.csv is in the same folder as the notebook.
df = pd.read_csv("tutorial.csv")
df[:5]

In the above tutorial dataset, the "checked" column is the nudge, while the "survival chance" column is the outcome. This will be standardized in the next step (but obviously that can also be done before loading the CSV).
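
As a minimal sketch of the pandas readers mentioned in this step: `pd.read_csv`, `pd.read_excel` and `pd.read_stata` each return a DataFrame. The example reads CSV text from memory so that it runs without any file on disk. The column values are made up, and the file names in the comments are hypothetical.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file on disk.
csv_text = "Age,Sex,checked\n22,male,1\n38,female,0\n"

# read_csv accepts a path or any file-like object.
example_df = pd.read_csv(io.StringIO(csv_text))
print(example_df.shape)

# The other readers follow the same pattern (hypothetical file names):
# example_df = pd.read_excel("my_data.xlsx")  # needs an Excel engine such as openpyxl
# example_df = pd.read_stata("my_data.dta")
```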

## Step 3: Adjust the columns 

The end result of adjusting should be that there are at least two columns:

- `outcome`, which is a column with the outcome for each person.
- `nudge`, whether a person was nudged or not. Should be 1 if nudged, 0 if not nudged.

The other columns should hold the features, such as age and gender. The features should be numerical, so for example, one would convert gender to a value of 0 or 1.
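
The layout described above can be sketched with plain pandas. The data, the column names `treated` and `score`, and the helper `check_columns` are all made up for illustration; they are not part of the nudging package.

```python
import pandas as pd

# Made-up data with one text feature and a yes/no treatment column.
raw = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Sex": ["male", "female", "female"],
    "treated": ["yes", "no", "yes"],
    "score": [0.2, 0.7, 0.5],
})

prepared = pd.DataFrame({
    "Age": raw["Age"],
    # Binary text feature converted to 0/1.
    "gender": (raw["Sex"] == "male").astype(int),
    # Nudge indicator: 1 if nudged, 0 if not nudged.
    "nudge": raw["treated"].map({"yes": 1, "no": 0}),
    "outcome": raw["score"],
})


def check_columns(frame):
    """Raise ValueError if the frame does not follow the layout described above."""
    missing = {"nudge", "outcome"} - set(frame.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if not set(frame["nudge"].dropna().unique()) <= {0, 1}:
        raise ValueError("nudge must contain only 0 and 1")
    non_numeric = [c for c in frame.columns
                   if not pd.api.types.is_numeric_dtype(frame[c])]
    if non_numeric:
        raise ValueError(f"non-numeric columns: {non_numeric}")


# Passes silently when the layout is correct.
check_columns(prepared)
```

Running a check like this before building the dataset tends to surface problems (a leftover text column, a nudge coded as 1/2) earlier than a failure inside the modeling step would.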

Below we adjust the columns of our demonstration dataset. Note that the original data has nothing to do with nudging, so two extra columns were generated for it. The CATE is set equal to the fare, so that we can check that the CATE modeling works.

In [None]:
# Copy the checked column to the standardized name
df["nudge"] = df["checked"]

# Use the survival chance column as the outcome
df["outcome"] = df["survival chance"]

# Convert the Sex column to one with a numerical value.
df["gender"] = (df["Sex"] == "male").astype(int)

# Drop all columns except the ones we train the model on.
new_df = df[["Age", "Parch", "Fare", "gender", "nudge", "outcome"]]
new_df

## Step 4: Convert the pandas DataFrame to a nudging dataset

This is simply done with the `FileDataset` Python class.

In [None]:
dataset = FileDataset.from_dataframe(new_df)

## Step 5: Predict the CATE

In [None]:
cate = dataset.predict_cate()

# Show the first 10 values of the CATE.
print(cate[:10])
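
To give a feeling for what a CATE prediction involves, here is a minimal sketch of one common approach: fit one model on the nudged group, one on the control group, and take the difference of their predictions. This is an illustration on synthetic data with simple linear fits. It is not a description of what `predict_cate` does internally, and all names in it are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic data: one feature x, a random 0/1 nudge, and an outcome whose
# true treatment effect is 2 * x by construction.
x = rng.uniform(0.0, 1.0, size=n)
nudge = rng.integers(0, 2, size=n)
outcome = 1.0 + 0.5 * x + nudge * (2.0 * x) + rng.normal(0.0, 0.1, size=n)


def fit_linear(features, target):
    """Least-squares fit of target = a + b * features; returns (a, b)."""
    design = np.column_stack([np.ones_like(features), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Fit one model on the nudged group and one on the control group.
a1, b1 = fit_linear(x[nudge == 1], outcome[nudge == 1])
a0, b0 = fit_linear(x[nudge == 0], outcome[nudge == 0])

# Estimated CATE per person: predicted outcome if nudged minus if not nudged.
cate_hat = (a1 + b1 * x) - (a0 + b0 * x)

# Largest deviation from the true effect 2 * x.
print(np.abs(cate_hat - 2.0 * x).max())
```

Because the true effect is known here, the estimate can be compared against it directly, which is the same idea as setting the CATE equal to the fare in the tutorial dataset.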

Notice that the length of the nudging dataset is not the same as the length of the original dataframe. This is because the modeling does not use any people with NA values in their features. To map the results back to the original dataframe, you can use the index: `dataset.standard_df.index`.

In [None]:
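# Start with NaN for everyone, then fill in the people that were used in the modeling.
# This assumes that df still has its default index (0, 1, 2, ...).
cate_whole = np.full(len(df), np.nan, dtype=float)
cate_whole[dataset.standard_df.index] = cate
df["cate"] = cate_whole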
df[:10]
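
The index-based mapping used in this step can be tried out on a small made-up frame with plain pandas. The sketch assumes that dropping NA rows keeps the original index labels, which is how `DataFrame.dropna` behaves; whether `dataset.standard_df` is built the same way is an assumption here.

```python
import numpy as np
import pandas as pd

# Made-up frame with a missing feature value in row 1.
full = pd.DataFrame({"Age": [22.0, np.nan, 26.0, 35.0],
                     "outcome": [0.1, 0.4, 0.3, 0.9]})

# Dropping rows with NA keeps the original index labels (0, 2, 3).
complete = full.dropna()

# Stand-in for model output: one value per remaining row.
values = np.array([10.0, 20.0, 30.0])

# Assigning a Series aligns on the index, leaving NaN where a row was dropped.
full["cate"] = pd.Series(values, index=complete.index)
print(full["cate"].tolist())
```

Aligning on index labels also works when the dataframe does not have the default 0, 1, 2, ... index, whereas assigning into a NumPy array by position relies on it.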

### Save the results of the modeling to a new CSV file

In [None]:
df.to_csv("tutorial_with_cate.csv", index=False)