In [None]:
import os
import tempfile

import pandas as pd

# Target Trial Emulation in Python

In this notebook we follow the tutorial [*Target Trial Emulation in R*](https://rpubs.com/alanyang0924/TTE) by Alan Yang and translate its **R** code to **Python**.

Before we start: the tutorial relies on the [`TrialEmulation`](https://github.com/Causal-LDA/TrialEmulation/blob/v0.0.4.2/R/) package, which was not available in Python when this notebook was written, so we first have to emulate its `trial_sequence` class. We do not reimplement the whole package, only the functions that the tutorial uses.

In [2]:
from trial_sequence import trial_sequence
from trial_sequence.utils import stats_glm_logit

The logic of [`trial_sequence`](trial_sequence) is kept outside this notebook for readability.

## 1. Setup

First, we have to decide which estimand to use. For simplicity we follow the tutorial and use **per-protocol (PP)** and **intention-to-treat (ITT)**; a third estimand, **as-treated (AT)**, also exists.

In [3]:
trial_pp = trial_sequence(estimand="PP")
trial_itt = trial_sequence(estimand="ITT")

Let's also make sure that we have dedicated directories to save the files for later inspection.

In [4]:
trial_pp_dir = os.path.join(tempfile.gettempdir(), "trial_pp")
os.makedirs(trial_pp_dir, exist_ok=True)

trial_itt_dir = os.path.join(tempfile.gettempdir(), "trial_itt")
os.makedirs(trial_itt_dir, exist_ok=True)

## 2. Data Preparation

In [5]:
data_censored = pd.read_csv("data_censored.csv")
data_censored.groupby("id").first().head()

id,period,treatment,x1,x2,x3,x4,age,age_s,outcome,censored,eligible
1,0,1,1,1.146148,0,0.734203,36,0.083333,0,0,1
2,0,0,1,-0.802142,0,-0.990794,26,-0.75,0,0,1
3,0,1,0,0.571029,1,0.391966,48,1.083333,0,0,1
4,0,0,0,-0.107079,1,-1.613258,29,-0.5,0,0,1
5,0,1,1,0.749092,0,1.62033,32,-0.25,0,0,1


In [6]:
trial_pp.set_data(
    id="id",
    period="period",
    outcome="outcome",
    eligible="eligible",
    treatment="treatment",
    data=data_censored
)

trial_itt.set_data(
    id="id",
    period="period",
    outcome="outcome",
    eligible="eligible",
    treatment="treatment",
    data=data_censored
)

trial_itt


Trial Sequence Object
Estimand: Intent-to-Treat

Data:
   id  period  treatment  x1        x2  x3        x4  age     age_s  outcome  \
0   1       0          1   1  1.146148   0  0.734203   36  0.083333        0   
1   1       1          1   1  0.002200   0  0.734203   37  0.166667        0   
2   1       2          1   0 -0.481762   0  0.734203   38  0.250000        0   
3   1       3          1   0  0.007872   0  0.734203   39  0.333333        0   
4   1       4          1   1  0.216054   0  0.734203   40  0.416667        0   

   ...  eligible  time_of_event  first  am_1  cumA  switch  regime_start  \
0  ...         1         9999.0   True   0.0   2.0       0             0   
1  ...         0         9999.0  False   1.0   3.0       0             1   
2  ...         0         9999.0  False   1.0   4.0       0             2   
3  ...         0         9999.0  False   1.0   5.0       0             3   
4  ...         0         9999.0  False   1.0   6.0       0             4   

   tim

## 3. Weight Models and Censoring

The tutorial used inverse probability of censoring weights (IPCW) to adjust for the effects of informative censoring. To estimate these weights, it constructed time-to-censoring event models and fit two sets of models: one for censoring due to deviation from the assigned treatment, and another for other forms of informative censoring.
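To make the weighting idea concrete, here is a minimal sketch of how stabilized censoring weights are commonly accumulated over time for one subject. It is a generic illustration and not the internals of `TrialEmulation` or `trial_sequence`; the function name `stabilized_ipcw` and the probabilities are made up for the example.

```python
from itertools import accumulate
from operator import mul


def stabilized_ipcw(p_num, p_den):
    """Cumulative stabilized censoring weights for one subject.

    p_num[k] and p_den[k] are the predicted probabilities of remaining
    uncensored in period k from the numerator and denominator models.
    (Hypothetical helper, for illustration only.)
    """
    if len(p_num) != len(p_den):
        raise ValueError("p_num and p_den must have the same length")
    # Per-period ratio of numerator to denominator probability.
    ratios = [n / d for n, d in zip(p_num, p_den)]
    # The weight at period k is the running product of ratios up to k.
    return list(accumulate(ratios, mul))


# Made-up probabilities for three follow-up periods.
p_num = [0.95, 0.90, 0.90]
p_den = [0.95, 0.75, 0.90]
weights = stabilized_ipcw(p_num, p_den)
print(weights)
```

A period in which the denominator model predicts a lower chance of staying uncensored than the numerator model pushes the weight above 1, so subjects who resemble those who were censored count for more.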

### 3.1 Censoring Due to Treatment Switching

The tutorial demonstrates how to set up model formulas for estimating the probability of receiving treatment in the current period. It fits separate models for patients who received treatment $(treatment = 1)$ and those who did not $(treatment = 0)$ in the previous period. To obtain stabilized weights, the approach involves fitting both numerator and denominator models.

The tutorial also describes optional arguments for specifying columns that include or exclude observations from the treatment models. This is useful when a patient cannot deviate from a particular treatment assignment during a given period.
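As a sketch of what the numerator and denominator models contribute, the stabilized weight for one period can be written as a ratio of two predicted probabilities. The notation is an assumption introduced here, not taken from the tutorial: $A_t$ is the treatment in period $t$, $V$ stands for the numerator covariates, and $L_t$ for the extra covariates that appear only in the denominator.

$$
w_t = \frac{\Pr(A_t = a_t \mid A_{t-1} = a_{t-1},\, V)}{\Pr(A_t = a_t \mid A_{t-1} = a_{t-1},\, V,\, L_t)}
$$

Conditioning on $A_{t-1}$ corresponds to fitting separate models by previous-period treatment. With the formulas used in this notebook, $V$ would be `age` and $L_t$ would be `x1` and `x3`.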

In [7]:
trial_pp.set_switch_weight_model(
    numerator="age",
    denominator="age + x1 + x3",
    model_fitter=stats_glm_logit(save_path=os.path.join(trial_pp_dir, "switch_models"))
)

trial_pp.switch_weights

Numerator formula: treatment ~ age
Denominator formula: treatment ~ age + x1 + x3
Model fitter type: te_stats_glm_logit

If we called this method on a trial with the ITT estimand, it would raise an error.
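The check can be pictured with the following minimal sketch. `TrialSketch` is a hypothetical stand-in written for this illustration; it is not the actual `trial_sequence` class, and the exception type and message are assumptions.

```python
class TrialSketch:
    """Hypothetical stand-in for trial_sequence (illustration only)."""

    def __init__(self, estimand):
        self.estimand = estimand
        self.switch_weights = None

    def set_switch_weight_model(self, numerator, denominator):
        # Under ITT, deviations from the assigned treatment are ignored,
        # so there is no switching-related censoring to model.
        if self.estimand == "ITT":
            raise ValueError(
                "Switch weight models are not used with the ITT estimand."
            )
        self.switch_weights = {
            "numerator": f"treatment ~ {numerator}",
            "denominator": f"treatment ~ {denominator}",
        }
        return self


try:
    TrialSketch("ITT").set_switch_weight_model("age", "age + x1 + x3")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```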

### 3.2 Other Informative Censoring

The tutorial notes that if the data contain additional informative censoring, similar models can be built to estimate the inverse probability of censoring weights (IPCW). This works for all estimands; you only need to pass the censoring indicator column as `censor_event`.

In [8]:
trial_pp.set_censor_weight_model(
    censor_event="censored",
    numerator="x2",
    denominator="x2 + x1",
    pool_models=None,
    model_fitter=stats_glm_logit(save_path=os.path.join(trial_pp_dir, "switch_models"))
)

trial_pp.censor_weights

Numerator formula: 1 - censored ~ x2
Denominator formula: 1 - censored ~ x2 + x1
Model fitter type: te_stats_glm_logit

In [9]:
trial_itt.set_censor_weight_model(
    censor_event="censored",
    numerator="x2",
    denominator="x2 + x1",
    pool_models="numerator",
    model_fitter=stats_glm_logit(save_path=os.path.join(trial_itt_dir, "switch_models"))
)

trial_itt.censor_weights

Numerator formula: 1 - censored ~ x2
Denominator formula: 1 - censored ~ x2 + x1
Numerator model is pooled across treatment arms. Denominator model is not pooled
Model fitter type: te_stats_glm_logit
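The difference between a pooled and an unpooled model can be illustrated with the simplest possible case, an intercept-only model, whose fitted probability is just the observed share of uncensored rows. This is a sketch under stated assumptions: the arms are taken to be defined by previous-period treatment, and the rows and the helper `uncensored_rate` are invented for the example.

```python
def uncensored_rate(rows):
    """Share of rows that remain uncensored (1 - censored)."""
    return sum(1 - r["censored"] for r in rows) / len(rows)


# Made-up person-period rows; prev_treatment marks the arm.
rows = [
    {"prev_treatment": 0, "censored": 0},
    {"prev_treatment": 0, "censored": 0},
    {"prev_treatment": 0, "censored": 0},
    {"prev_treatment": 0, "censored": 1},
    {"prev_treatment": 1, "censored": 0},
    {"prev_treatment": 1, "censored": 1},
    {"prev_treatment": 1, "censored": 1},
    {"prev_treatment": 1, "censored": 0},
]

# Pooled: one estimate shared by both arms.
pooled = uncensored_rate(rows)

# Not pooled: a separate estimate for each arm.
by_arm = {
    arm: uncensored_rate([r for r in rows if r["prev_treatment"] == arm])
    for arm in (0, 1)
}

print(pooled, by_arm)
```

Pooling uses every row for a single fit, while separate fits let the censoring mechanism differ between arms.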

## 4. Calculate Weights

The tutorial then demonstrates how to fit each individual model and merge them into a set of weights using the `calculate_weights()` function.
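Conceptually, merging comes down to multiplying the per-row weights from each fitted model. The sketch below assumes that a weight model that was never configured contributes a weight of 1; `combine_weights` is a hypothetical helper and does not reproduce what `calculate_weights()` does internally.

```python
def combine_weights(n_rows, switch_weights=None, censor_weights=None):
    """Multiply switch and censor weights row by row.

    A missing set of weights is treated as all ones, so it leaves the
    other weights unchanged. (Hypothetical helper, illustration only.)
    """
    if switch_weights is None:
        switch_weights = [1.0] * n_rows
    if censor_weights is None:
        censor_weights = [1.0] * n_rows
    if len(switch_weights) != n_rows or len(censor_weights) != n_rows:
        raise ValueError("weight lists must have n_rows entries")
    return [s * c for s, c in zip(switch_weights, censor_weights)]


# PP-style case: both kinds of weights are present.
pp_weights = combine_weights(3, [1.0, 1.5, 0.8], [1.0, 2.0, 1.0])

# ITT-style case: only censoring weights are present.
itt_weights = combine_weights(2, censor_weights=[1.1, 0.9])

print(pp_weights, itt_weights)
```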

In [10]:
trial_pp.calculate_weights()
trial_itt.calculate_weights()

AttributeError: module 'parsnip' has no attribute 'fit'