### $\textbf{QuantCo Causal Inference Project}$
##### $\textit{Wian Stipp}$

### Describing The Problem:

We are using a study to compare the quality of services provided by two physician groups for asthma patients in California. Let $Y_i(z)$ be the score (1 = satisfactory, 0 = non-satisfactory) that patient $i$ would give for the treatment received from physician group $z \in \{1,2\}$.
Besides the treatment variable and the outcome variable, we have a list of covariates: 'age', 'sex', 'education', 'insurance', 'drug coverage', 'severity', 'comorbidity', 'physical comorbidity', 'mental comorbidity'.

Comorbidity is defined as the presence of one or more additional conditions co-occurring with a primary condition; in the countable sense of the term, each additional condition is a comorbidity. The additional condition may also be a behavioral or mental disorder.

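We assume that treatment assignment is ignorable conditional on all the pretreatment variables: given the covariates, the physician group a patient is assigned to is independent of the patient's potential outcomes $Y_i(1)$ and $Y_i(2)$.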

#### The Objective

Our objective is to estimate the average treatment (causal) effect of one physician group relative to the other. That is, if $p_z = E[Y_i(z)]$ is the fraction of patients who would be satisfied with the service of group $z$ if all patients were treated by group $z$, then we want to find $p_1 - p_2$.
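Because assignment is assumed ignorable given the covariates, $p_z$ can in principle be estimated by stratification (standardization): compute the satisfaction rate of group $z$ within each covariate stratum, then average these rates weighted by the share of all patients in that stratum. The sketch below is an illustration of that idea, not the method used in this project. It assumes the data has already been parsed into a list of dicts with hypothetical keys `"group"` (physician group, 1 or 2) and `"y"` (satisfaction score); the real column names in `asthma.txt` may differ. It is only practical for a few discrete covariates: with all nine covariates most strata would contain too few patients, or none from one of the groups, and continuous covariates such as age would first need to be binned.

```python
from collections import defaultdict


def standardized_rates(rows, covariates, group_key="group", outcome_key="y"):
    """Estimate p_z = E[Y(z)] for every group z by standardizing over strata.

    Under ignorability given `covariates`,
        p_z = sum_s P(stratum = s) * E[Y | group = z, stratum = s].
    Requires overlap: every stratum must contain patients from every group.
    """
    # strata[s][z] -> list of outcomes of group-z patients in stratum s
    strata = defaultdict(lambda: defaultdict(list))
    for row in rows:
        s = tuple(row[c] for c in covariates)
        strata[s][row[group_key]].append(row[outcome_key])

    n = len(rows)
    groups = sorted({row[group_key] for row in rows})
    rates = {}
    for z in groups:
        total = 0.0
        for s, by_group in strata.items():
            outcomes = by_group.get(z)  # .get does not create an empty entry
            if not outcomes:
                raise ValueError(f"no patients of group {z} in stratum {s}")
            n_s = sum(len(v) for v in by_group.values())
            total += (n_s / n) * (sum(outcomes) / len(outcomes))
        rates[z] = total
    return rates


# Toy data (not from the study): sex is associated with both group and outcome.
toy = (
    [{"sex": 0, "group": 1, "y": 1} for _ in range(3)]
    + [{"sex": 0, "group": 2, "y": 1}]
    + [{"sex": 1, "group": 1, "y": 0}]
    + [{"sex": 1, "group": 2, "y": y} for y in (0, 0, 1)]
)
p = standardized_rates(toy, covariates=["sex"])
ate = p[1] - p[2]  # p[1] = 0.5, p[2] = 2/3, so ate is about -0.167
```

On this toy data the naive difference in satisfaction rates is $0.75 - 0.5 = 0.25$, while the standardized estimate is $0.5 - 2/3 \approx -0.17$: the sign flips once the groups are compared within strata of the confounder.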

There is, however, a more interesting question we can answer: which types of patients would be more satisfied with each of the physician groups? This is known as $\textit{heterogeneous treatment effect (HTE) estimation}$. In this project, we explore how a causal tree-based learning method can find heterogeneity in the treatment effects.


#### The Data

The data is sourced from http://www.biostat.jhsph.edu/~cfrangak/biostat_causal/asthma.txt. We split the data into training, validation and test subsets so that the model can be evaluated on out-of-sample data.
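A minimal sketch of such a split, assuming the file has already been parsed into a list of records (the parsing step is omitted because it depends on the file's layout). The function name `train_val_test_split` and the default 60/20/20 proportions are illustrative choices, not taken from the project.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle `rows` reproducibly and cut them into train/validation/test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # local RNG: same seed gives same split
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(10))
# len(train), len(val), len(test) == 6, 2, 2
```

With a binary treatment and a small sample, one might prefer to shuffle within each physician group separately, so that every subset contains patients from both groups.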

#### The Model

We primarily build on the work "Learning Triggers for Heterogeneous Treatment Effects" by Christopher Tran and Elena Zheleva, published in 2019.

### Appendix

#### Heterogeneous treatment effect estimation

We are trying to estimate the conditional average treatment effect (CATE) as a function of a set of features. We define the CATE as:

$$\tau(\textbf{x}) := E[Y_i(1) - Y_i(0) \mid \textbf{X}_i = \textbf{x}]$$
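Because the two potential outcomes $Y_i(1)$ and $Y_i(0)$ are never observed for the same patient, it helps to see why $\tau(\textbf{x})$ can be estimated from data at all. Writing $T_i \in \{0,1\}$ for the binary-coded treatment indicator, and assuming ignorability, $\{Y_i(1), Y_i(0)\} \perp T_i \mid \textbf{X}_i$, overlap, $0 < P(T_i = 1 \mid \textbf{X}_i = \textbf{x}) < 1$, and consistency, $Y_i = Y_i(T_i)$, the CATE equals a difference of observable conditional means:

$$
\begin{aligned}
\tau(\textbf{x}) &= E[Y_i(1) \mid \textbf{X}_i = \textbf{x}] - E[Y_i(0) \mid \textbf{X}_i = \textbf{x}] \\
&= E[Y_i(1) \mid T_i = 1, \textbf{X}_i = \textbf{x}] - E[Y_i(0) \mid T_i = 0, \textbf{X}_i = \textbf{x}] \\
&= E[Y_i \mid T_i = 1, \textbf{X}_i = \textbf{x}] - E[Y_i \mid T_i = 0, \textbf{X}_i = \textbf{x}]
\end{aligned}
$$

The first equality is linearity of expectation, the second uses ignorability (overlap guarantees that both conditional expectations are defined), and the third uses consistency. The partition-level means defined below are sample analogues of the last line, with the condition $\textbf{X}_i = \textbf{x}$ coarsened to $\textbf{X}_i \in \mathcal{X}_l$.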

We need to find an estimate $\hat{\tau}(\textbf{x})$ of the CATE. A tree-based method does this by partitioning the feature space into regions and estimating the treatment effect within each region. Our dataset, $S$, needs to have the form:

$$ S = \{ (\textbf{X}_i, Y_i, T_i) : \textbf{X}_i \in \mathcal{X} \} $$

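where $Y_i$ is the outcome, $T_i \in \{0,1\}$ is the binary treatment indicator (one physician group is coded as treatment, $T_i = 1$, and the other as control, $T_i = 0$), and $\textbf{X}_i$ is the feature vector containing the other covariates.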
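One way to hold $S$ in code is as a list of immutable records. In the sketch below, the class `Unit`, the helper `to_units`, and the record keys used in the example (`"age"`, `"sex"`, `"pg"`, `"y"`) are hypothetical; which physician group is coded as $T_i = 1$ is an arbitrary choice passed in as `treated_group`.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """One element (X_i, Y_i, T_i) of the dataset S."""
    x: Tuple[float, ...]  # feature vector X_i of pretreatment covariates
    y: int                # outcome Y_i: 1 = satisfactory, 0 = non-satisfactory
    t: int                # binary treatment indicator T_i

    def __post_init__(self) -> None:
        if self.y not in (0, 1) or self.t not in (0, 1):
            raise ValueError("Y_i and T_i must both be 0 or 1")


def to_units(records: Sequence[Dict], feature_keys: Sequence[str],
             outcome_key: str, group_key: str, treated_group) -> List[Unit]:
    """Recode raw records: patients of `treated_group` get T_i = 1, all others 0."""
    return [
        Unit(
            x=tuple(r[k] for k in feature_keys),
            y=int(r[outcome_key]),
            t=1 if r[group_key] == treated_group else 0,
        )
        for r in records
    ]


raw = [{"age": 34, "sex": 1, "pg": 1, "y": 1},
       {"age": 61, "sex": 0, "pg": 2, "y": 0}]
S = to_units(raw, ["age", "sex"], outcome_key="y", group_key="pg", treated_group=1)
# S == [Unit(x=(34, 1), y=1, t=1), Unit(x=(61, 0), y=0, t=0)]
```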

A partitioning of the feature space into $L$ partitions is defined as

$$ \mathcal{X} = \mathcal{X}_1 \cup \mathcal{X}_2 \cup \dots \cup \mathcal{X}_L $$

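The subsets should be mutually exclusive ($\mathcal{X}_j \cap \mathcal{X}_k = \emptyset$ for $j \neq k$) and collectively exhaustive of $\mathcal{X}$. The subset of the data falling in partition $\mathcal{X}_l$ is then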

$$ S_l = \{ (\textbf{X}_i, Y_i, T_i) \in S : \textbf{X}_i \in \mathcal{X}_l \} $$
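In code, a convenient way to guarantee that the $\mathcal{X}_l$ are mutually exclusive and collectively exhaustive is to represent the partition by a function that assigns every feature vector exactly one index $l$; a fitted decision tree induces such a function, with one index per leaf. The sketch uses plain `(x, y, t)` tuples, and the rule `age_sex_leaf` (with its age cutoff of 50) is a hypothetical example, not a learned partition.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Features = Tuple[float, ...]
Row = Tuple[Features, int, int]  # (X_i, Y_i, T_i)


def split_by_partition(data: Sequence[Row],
                       leaf: Callable[[Features], int]) -> Dict[int, List[Row]]:
    """Return {l: S_l}. Since `leaf` maps each x to exactly one l, the subsets
    are disjoint and their union is the whole dataset."""
    parts: Dict[int, List[Row]] = defaultdict(list)
    for row in data:
        parts[leaf(row[0])].append(row)
    return dict(parts)


def age_sex_leaf(x: Features) -> int:
    """Hypothetical partition into L = 4 regions by x = (age, sex)."""
    age, sex = x
    return 2 * int(age >= 50) + int(sex == 1)


S = [((34, 1), 1, 1), ((61, 0), 0, 0), ((45, 0), 1, 0), ((70, 1), 1, 1)]
parts = split_by_partition(S, age_sex_leaf)
# parts == {1: [((34, 1), 1, 1)], 2: [((61, 0), 0, 0)],
#           0: [((45, 0), 1, 0)], 3: [((70, 1), 1, 1)]}
```

Regions that receive no data points simply do not appear as keys in the result.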

The conditional mean outcomes of the treated and control units in a particular partition $\mathcal{X}_l$ are then estimated by:

$$ \hat{\mu}_t(S_l) = \frac{1}{N_{l,t}} \sum_{i \in S_l :\, T_i = t} Y_i $$

where $t \in \{0,1\}$, so $\hat{\mu}_1(S_l)$ and $\hat{\mu}_0(S_l)$ are the mean outcomes of the treated and control units in $S_l$, and $N_{l,t}$ is the number of units in $S_l$ with $T_i = t$. With these we can find the average causal effect (ACE) for a given partition, defined by:

$$ \hat{\tau}(S_l) = \hat{\mu}_1(S_l) - \hat{\mu}_0(S_l) $$
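The two estimators translate directly into code. This is a minimal sketch on `(x, y, t)` tuples with hypothetical function names; it raises an error when a partition has no treated or no control units, since $\hat{\tau}(S_l)$ is undefined in that case.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Tuple[float, ...], int, int]  # (X_i, Y_i, T_i)


def mu_hat(s_l: Sequence[Row], t: int) -> float:
    """Mean outcome of the units in S_l with T_i = t."""
    outcomes = [y for _, y, t_i in s_l if t_i == t]
    if not outcomes:
        raise ValueError(f"partition has no units with T = {t}")
    return sum(outcomes) / len(outcomes)


def tau_hat(s_l: Sequence[Row]) -> float:
    """Estimated effect in S_l: treated mean minus control mean."""
    return mu_hat(s_l, 1) - mu_hat(s_l, 0)


def partition_effects(parts: Dict[int, List[Row]]) -> Dict[int, float]:
    """Apply tau_hat to every subset S_l of a partitioned dataset."""
    return {leaf: tau_hat(s_l) for leaf, s_l in parts.items()}


# Toy partitioned data (not from the study)
parts = {
    0: [((30,), 1, 1), ((32,), 1, 1), ((31,), 0, 0), ((29,), 1, 0)],
    1: [((60,), 0, 1), ((65,), 1, 1), ((62,), 1, 0), ((70,), 1, 0)],
}
effects = partition_effects(parts)
# effects == {0: 0.5, 1: -0.5}
```

Within each $S_l$ the difference in means can be read causally only insofar as treatment assignment is roughly as good as random given membership in $\mathcal{X}_l$; with coarse partitions, confounding by covariates that still vary inside a region can remain.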