# Augmented Inverse Probability of Treatment Weights
In the last tutorial, we used the AIPTW estimator to calculate the average treatment effect for a binary outcome. In this tutorial, we will instead calculate the average treatment effect for a continuous outcome.

## AIPTW

As a reminder, the AIPTW estimator takes the following form
$$\hat{E}[Y^a] = \frac{1}{n} \sum_{i=1}^n \left(\frac{Y \times I(A=a)}{\widehat{\Pr}(A=a|L)} - \frac{\hat{E}[Y|A=a, L] \times (I(A=a) - \widehat{\Pr}(A=a|L))}{\widehat{\Pr}(A=a|L)}\right)$$
where $\widehat{\Pr}(A=a|L)$ comes from the treatment (IPTW) model and $\hat{E}[Y|A=a,L]$ comes from the outcome model (g-formula).
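To make the estimator concrete, here is a minimal NumPy sketch of the standard form in which both terms are divided by $\widehat{\Pr}(A=a|L)$. The function name `aiptw_mean` and the simulated data are illustrative assumptions and are not part of *zEpid*. For simplicity the sketch plugs in the true propensity scores and outcome means, whereas in practice both would be estimated from fitted models.

```python
import numpy as np


def aiptw_mean(y, a, pr_a, y_hat_a, level=1):
    """Hypothetical helper: AIPTW estimate of E[Y^a] for treatment `level`.

    y        observed outcomes
    a        observed binary treatment (0/1)
    pr_a     estimated Pr(A=level | L) for each individual
    y_hat_a  predicted E[Y | A=level, L] for each individual
    """
    y = np.asarray(y, dtype=float)
    ind = (np.asarray(a) == level).astype(float)   # I(A=a)
    pr_a = np.asarray(pr_a, dtype=float)
    y_hat_a = np.asarray(y_hat_a, dtype=float)
    ipw_term = y * ind / pr_a                      # weighted observed outcome
    aug_term = y_hat_a * (ind - pr_a) / pr_a       # augmentation term
    return np.mean(ipw_term - aug_term)


# Simulated data (assumption): one confounder L, true treatment effect of 3
rng = np.random.default_rng(0)
n = 20000
L = rng.normal(size=n)
ps = 1 / (1 + np.exp(-0.5 * L))                    # true Pr(A=1 | L)
A = rng.binomial(1, ps)
Y = 2 + 3 * A + L + rng.normal(size=n)

# True nuisance functions stand in for fitted models here
mean_y1 = aiptw_mean(Y, A, ps, 5 + L, level=1)
mean_y0 = aiptw_mean(Y, A, 1 - ps, 2 + L, level=0)
ate_hat = mean_y1 - mean_y0
print(ate_hat)   # should be close to the simulated effect of 3
```

The average treatment effect is the difference between the two estimated potential-outcome means, $\hat{E}[Y^1] - \hat{E}[Y^0]$.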

## Continuous Outcome example
To motivate our example, we will use a simulated data set included with *zEpid*. In the data set, we have a cohort of HIV-positive individuals. We are interested in the sample average treatment effect of antiretroviral therapy (ART) on CD4 T-cell count at 45 weeks. We will ignore competing risks and their implications in this example. Based on substantive background knowledge, we believe that the treated and untreated populations are exchangeable conditional on gender, age, baseline CD4 T-cell count, and detectable viral load.

In this tutorial, we will focus on a complete-case analysis. Therefore, we will drop the `dead` column and all observations with missing `cd4_wk45`. This leaves 460 observations with no missing data.

```python
import numpy as np
import pandas as pd

from zepid import load_sample_data, spline
from zepid.causal.doublyrobust import AIPTW

# Load the simulated cohort (False selects the version without time-varying data)
df = load_sample_data(False)

# Restricted quadratic splines for age and baseline CD4 count
df[['age_rs1', 'age_rs2']] = spline(df, 'age0', n_knots=3, term=2, restricted=True)
df[['cd4_rs1', 'cd4_rs2']] = spline(df, 'cd40', n_knots=3, term=2, restricted=True)

# Complete-case data: drop the `dead` column, then drop rows with missing values
dfcc = df.drop(columns=['dead']).dropna()
dfcc.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 460 entries, 1 to 546
Data columns (total 12 columns):
id          460 non-null int64
male        460 non-null int64
age0        460 non-null int64
cd40        460 non-null int64
dvl0        460 non-null int64
art         460 non-null int64
t           460 non-null float64
cd4_wk45    460 non-null float64
age_rs1     460 non-null float64
age_rs2     460 non-null float64
cd4_rs1     460 non-null float64
cd4_rs2     460 non-null float64
dtypes: float64(6), int64(6)
memory usage: 46.7 KB
```


Our data are now ready for a complete-case analysis using AIPTW. First, we initialize `AIPTW` with our complete-case data (`dfcc`), the treatment (`art`), and the outcome (`cd4_wk45`). In the background, `AIPTW` automatically recognizes that `cd4_wk45` is not a binary variable and treats it as a continuous outcome.

```python
aipw = AIPTW(dfcc, exposure='art', outcome='cd4_wk45')
```
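As a rough illustration of how an outcome could be classified as binary or continuous, the sketch below checks whether every non-missing value is 0 or 1. The helper `looks_binary` is hypothetical and is not a statement of how `AIPTW` performs its check internally.

```python
import pandas as pd


def looks_binary(values):
    """Hypothetical helper: True if all non-missing values are 0 or 1."""
    s = pd.Series(values).dropna()
    return bool(s.isin([0, 1]).all())


print(looks_binary([0, 1, 1, 0]))            # True: only 0/1 values
print(looks_binary([350.0, 512.5, 88.0]))    # False: treated as continuous
```

Under a rule like this, a column such as `cd4_wk45`, which holds cell counts rather than 0/1 indicators, would be handled as continuous.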

We now repeat the process of fitting the treatment and outcome models, then estimate the average treatment effect.

```python
aipw.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', print_results=False)
aipw.outcome_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', print_results=False)
aipw.fit()
aipw.summary()
```

```
           Augment Inverse Probability of Treatment Weights           
Average Treatment Effect:    225.138
95.0% two-sided CI: (118.647 , 331.629)
```


Our results indicate that ART increased CD4 T-cell count at week 45 by about 225 on average (95% CI: 118.6, 331.6). These results are similar to those from the other methods.
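The summary reports a two-sided 95% confidence interval alongside the point estimate. One common way to obtain a Wald-type interval for AIPTW uses the empirical variance of the per-individual doubly robust contributions. The sketch below illustrates that general technique on simulated data. The function name `aiptw_ate_with_ci` is hypothetical, and whether `AIPTW` computes its interval in exactly this way is an assumption here.

```python
import numpy as np


def aiptw_ate_with_ci(y, a, ps, y1_hat, y0_hat, z=1.96):
    """Hypothetical helper: AIPTW ATE with a Wald-type confidence interval.

    ps is the estimated Pr(A=1 | L); y1_hat and y0_hat are the predicted
    outcomes under treatment and no treatment for every individual.
    """
    y, a, ps, y1_hat, y0_hat = (
        np.asarray(v, dtype=float) for v in (y, a, ps, y1_hat, y0_hat)
    )
    # Per-individual doubly robust contributions for each treatment level
    dr1 = y * a / ps - y1_hat * (a - ps) / ps
    dr0 = y * (1 - a) / (1 - ps) + y0_hat * (a - ps) / (1 - ps)
    diff = dr1 - dr0
    ate = diff.mean()
    # Variance of the mean of the centered contributions
    se = np.sqrt(np.var(diff - ate, ddof=1) / len(diff))
    return ate, (ate - z * se, ate + z * se)


# Simulated data (assumption): true treatment effect of 2
rng = np.random.default_rng(7)
n = 10000
l = rng.normal(size=n)
ps = 1 / (1 + np.exp(-0.4 * l))
a = rng.binomial(1, ps)
y = 1 + 2 * a + 0.5 * l + rng.normal(size=n)

# True nuisance functions stand in for fitted models here
ate, ci = aiptw_ate_with_ci(y, a, ps, 3 + 0.5 * l, 1 + 0.5 * l)
print(ate, ci)
```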

## Poisson Distribution
By default, `AIPTW` assumes that a continuous outcome follows a normal distribution and fits the outcome model by ordinary least squares. We can instead request Poisson regression by specifying `continuous_distribution='poisson'` in the `outcome_model()` function. Let's look at an example.

```python
aipw = AIPTW(dfcc, exposure='art', outcome='cd4_wk45')
aipw.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', print_results=False)
aipw.outcome_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0',
                   continuous_distribution='poisson', print_results=False)
aipw.fit()
aipw.summary()
```

```
           Augment Inverse Probability of Treatment Weights           
Average Treatment Effect:    225.234
95.0% two-sided CI: (118.663 , 331.805)
```
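To show what a Poisson outcome model changes, the sketch below fits a log-link Poisson regression by iteratively reweighted least squares using NumPy only, then predicts outcomes with treatment set to 1 and to 0 for everyone. These predictions play the role of $\hat{E}[Y|A=a,L]$. The function `fit_poisson_irls` and the simulated data are illustrative assumptions, and this is not how *zEpid* fits the model internally.

```python
import numpy as np


def fit_poisson_irls(X, y, n_iter=25):
    """Hypothetical helper: log-link Poisson regression fitted by IRLS.

    Assumes the first column of X is an intercept column of ones.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta                    # linear predictor
        mu = np.exp(eta)                  # fitted mean under the log link
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # IRLS weights equal mu for Poisson
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta


# Simulated count outcome (assumption): log-mean is 1 + 0.5*a + 0.3*l
rng = np.random.default_rng(42)
n = 5000
l = rng.normal(size=n)
a = rng.binomial(1, 0.5, size=n)
X = np.column_stack([np.ones(n), a, l])
y = rng.poisson(np.exp(1.0 + 0.5 * a + 0.3 * l))

beta = fit_poisson_irls(X, y)

# Predicted outcomes with treatment set to 1 and to 0 for everyone
X1 = X.copy()
X1[:, 1] = 1
X0 = X.copy()
X0[:, 1] = 0
y1_hat = np.exp(X1 @ beta)
y0_hat = np.exp(X0 @ beta)
mean_diff = np.mean(y1_hat - y0_hat)
print(beta, mean_diff)
```

Because of the log link, predictions are always positive, which can be a natural fit for count-like outcomes such as cell counts.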


## Conclusion
In this tutorial, we demonstrated augmented IPTW for continuous outcomes with `AIPTW` using *zEpid*. Please view the other tutorials for information on other functionality within *zEpid*.

## References
Funk MJ, Westreich D, Wiesen C, Stürmer T, Brookhart MA, Davidian M. (2011). Doubly robust estimation of causal effects. *AJE*, 173(7), 761-767.

Lunceford JK, Davidian M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. *SiM*, 23(19), 2937-2960.

Keil AP et al. (2018). Resolving an apparent paradox in doubly robust estimators. *AJE*, 187(4), 891-892.

Robins JM, Rotnitzky A, Zhao LP. (1994). Estimation of regression coefficients when some regressors are not always observed. *JASA*, 89(427), 846-866.