## Exercise set 5: causal forest

In this exercise set we will use the `econml` package to estimate a causal forest.

Another, more general implementation is the [generalized random forest](https://github.com/grf-labs/grf) (`grf`) package by Athey et al., which is written for the R programming language.

In [1]:
import matplotlib.pyplot as plt
import numpy as np 
import pandas as pd 
import seaborn as sns

from sklearn.datasets import make_classification

sns.set(style='darkgrid')

%matplotlib inline

To highlight the usefulness of causal forests, we will work with synthetic data in this exercise. In particular, we will add a synthetic treatment effect to a dataset that otherwise has none. Furthermore, we will make this effect heterogeneous by adding noise and by making it depend on a single continuous variable as well as a categorical variable.

>**Ex. 5.1.0:** Use the code below to simulate data according to
<br>
<br>
\begin{align}
T(X) &= \frac{1}{1+e^{-(X\delta+U)}} > 0.5 \\
\tau(X) &= \frac{10}{1+e^{-\gamma X_0}} + \nu \\
Y(T=0) &= X\beta + \epsilon \\
Y(T=1) &= Y(T=0) + \tau(X)
\end{align}
<br>
where $\epsilon$, $\nu$ and $U$ are noise terms distributed according to $\mathcal{N}(0,1)$, $\beta$ and $\delta$ are vectors of `N_FEATURES` random parameters, and $\gamma$ is a scalar parameter.


In [2]:
N_SAMPLES = 10000
N_FEATURES = 5
GAMMA = 1.2
BETA = np.random.RandomState(0).uniform(0,1, size = N_FEATURES)
DELTA = np.random.RandomState(1).uniform(0,1, size = N_FEATURES)

X = np.random.RandomState(2).normal(size = (N_SAMPLES, N_FEATURES))

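# Treatment assignment: logistic index of X plus noise, thresholded at 0.5
U = np.random.RandomState(3).normal(size = N_SAMPLES)
T = 1/(1 + np.exp(-(U + X.dot(DELTA)))) > .5

# Potential outcomes; the noise in tau is seeded (seed 4) so that results are reproducible
Y0 = X @ BETA + np.random.RandomState(5).normal(size = N_SAMPLES)
tau = 10/(1 + np.exp(-GAMMA*X[:,0])) + np.random.RandomState(4).normal(size = N_SAMPLES)
Y1 = Y0 + tau
# Observed outcome: Y1 for treated units, Y0 for untreated units
y = Y0 + T*(Y1 - Y0)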

> **Ex. 5.1.1:** Create a two-subplot figure, and plot $Y(0)$ and $Y(1)$ in one subplot against $X_0$. Plot $\tau(x)$ against $X_0$ in the other subplot. What do you see? Why do we observe $\tau=0$ in many cases?

In [3]:
# Your answer here

> **Ex. 5.1.2:** Is there a selection problem? Plot for each dimension of $X$ the relationship with treatment assignment.

In [70]:
# Your answer here
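
As a numerical complement to the plots asked for in Ex. 5.1.2, the sketch below compares the naive difference in mean outcomes between treated and untreated units with the average of the simulated effects, and reports the difference in covariate means between the two groups. It re-creates the simulation so that it runs on its own; the seed 4 for the noise in `tau` and the names `true_ate`, `naive` and `balance` are assumptions made here for illustration.

```python
import numpy as np

# Re-create the simulated data (seed 4 for the noise in tau is an assumption)
N_SAMPLES, N_FEATURES, GAMMA = 10000, 5, 1.2
BETA = np.random.RandomState(0).uniform(0, 1, size=N_FEATURES)
DELTA = np.random.RandomState(1).uniform(0, 1, size=N_FEATURES)
X = np.random.RandomState(2).normal(size=(N_SAMPLES, N_FEATURES))
U = np.random.RandomState(3).normal(size=N_SAMPLES)
T = 1 / (1 + np.exp(-(U + X @ DELTA))) > 0.5
Y0 = X @ BETA + np.random.RandomState(5).normal(size=N_SAMPLES)
tau = (10 / (1 + np.exp(-GAMMA * X[:, 0]))
       + np.random.RandomState(4).normal(size=N_SAMPLES))
y = Y0 + T * tau

# Average of the simulated individual effects
true_ate = tau.mean()

# Naive estimate: difference in mean observed outcomes, treated minus untreated
naive = y[T].mean() - y[~T].mean()

# Covariate balance: difference in feature means, treated minus untreated
balance = X[T].mean(axis=0) - X[~T].mean(axis=0)

print(f"average simulated effect: {true_ate:.2f}")
print(f"naive difference in means: {naive:.2f}")
print("difference in covariate means:", np.round(balance, 2))
```

If the covariate means differ between the groups, treatment is not assigned independently of $X$, and the naive comparison mixes the treatment effect with differences in $Y(0)$.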

>**Ex. 5.1.3:** Estimate a causal forest model using the `econml` package, and store the model in a new variable `cf`.
>
> To unconfound the treatment assignment, use gradient-boosted trees as the first-stage models in the "double machine learning" step. Then create a dataframe of predicted treatment effects on the same data that you trained the model on.
>> Hint: use the following setting
>>```python
>>discrete_treatment=True
>>```

In [70]:
# Your answer here

>**Ex. 5.1.4:** Make a scatterplot of the estimated individual treatment effects against the simulated "true" ITEs `tau` that you produced at the beginning of this exercise set.

In [70]:
# Your answer here