## **1. Setup**

Install Necessary Libraries: You'll need libraries such as pandas for data manipulation, lifelines for survival analysis, and numpy for numerical operations.

In [1]:
import pandas as pd
import numpy as np
from lifelines import KaplanMeierFitter, CoxPHFitter

## **2. Data Preparation**


Load Data: Import your observational dataset into a pandas DataFrame.

In [2]:
data = pd.read_csv('../Data/data_censored.csv')


Define Variables: Specify treatment, outcome, and covariates based on your study design.


In [3]:
treatment = 'treatment_column'
outcome = 'outcome_column'
covariates = ['covariate1', 'covariate2', 'covariate3']


## **3. Handling Censoring via Inverse Probability of Censoring Weights (IPCW)**


Fit a Censoring Model: Use a logistic regression model to estimate the probability of censoring.

In [6]:
from sklearn.linear_model import LogisticRegression

# Define actual covariates based on available columns
covariates = ["x1", "x2", "x3", "x4", "age", "age_s"]

# Ensure we only use covariates that exist in the dataset
existing_covariates = [col for col in covariates if col in data.columns]

if not existing_covariates:
    raise ValueError("None of the specified covariates exist in the dataset. Check column names.")

# Fit logistic regression for censoring probability
censoring_model = LogisticRegression()
censoring_model.fit(data[existing_covariates], data["censored"])  # "censored" is the target

# Compute inverse probability of censoring weights (IPCW)
data["censoring_prob"] = censoring_model.predict_proba(data[existing_covariates])[:, 1]
data["ipcw"] = 1 / data["censoring_prob"]


## **Step 4: Estimating Treatment Effects with IPCW**



In R, glm() was likely used for regression. In Python, we use statsmodels for weighted regression.

In [8]:
import statsmodels.api as sm

# Fit a weighted regression model for outcome prediction
X = data[["treatment"]]  # Only treatment variable
X = sm.add_constant(X)  # Add intercept
y = data["outcome"]

# Apply inverse probability of censoring weights (IPCW)
weighted_model = sm.WLS(y, X, weights=data["ipcw"]).fit()

# Print results
print(weighted_model.summary())


                            WLS Regression Results                            
Dep. Variable:                outcome   R-squared:                       0.010
Model:                            WLS   Adj. R-squared:                  0.009
Method:                 Least Squares   F-statistic:                     7.467
Date:                Sun, 09 Mar 2025   Prob (F-statistic):            0.00644
Time:                        22:01:16   Log-Likelihood:                 22.264
No. Observations:                 725   AIC:                            -40.53
Df Residuals:                     723   BIC:                            -31.36
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0361      0.007      5.125      0.0