# Precision-cancer

Packages:
- NumPy version: 1.20.0
- pandas version: 1.2.1
- scikit-learn version: 0.24.1
- lifelines version: 0.26.0

In [1]:
import pandas as pd
import numpy as np
from lifelines import CoxPHFitter, KaplanMeierFitter
from sklearn.linear_model import LogisticRegression
import warnings
warnings.simplefilter(action='ignore')

## Load Synthetic Data

For this notebook, we use the synthetic data in "synthetic_data.csv" to demonstrate our pipeline. It includes the following features:
- Confounders. This is a toy example; real applications should include more confounders and more categories.
    - <font color=darkblue>*Gender*</font>: whether the patient is female (=1) or not (=0).
    - <font color=darkblue>*Age*</font>: whether age >= 40 (=1) or not (=0).
    - <font color=darkblue>*ECOG*</font>: whether ECOG >=3 (=1) or not (=0).
- <font color=darkblue>*Mutation status*</font> for Gene A and Gene B.
    - Whether the gene is mutated (=1) or not (=0).
- <font color=darkblue>*Treatment status*</font> for the studied off-label drug (Treatment 1 - off-label), standard-of-care (Treatment 2 - standard-of-care), and other treatments.
    - Whether the patient received the treatment (=1) or not (=0).
- Survival information
    - <font color=darkblue>*duration*</font>: time from start of treatment to event (days).
    - <font color=darkblue>*event*</font>: death happened (=1) or right censored (=0)
    - <font color=darkblue>*fmi_test*</font>: time from start of treatment to FMI test (days).
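
The `fmi_test` column matters because the Cox models in this notebook pass it as the entry time (`entry_col='fmi_test'`), so a patient only counts as at risk after the test date. The sketch below illustrates that idea on a few made-up records. The helper `risk_set`, the patient ids, and the boundary convention (entry strictly before `t`, exit at or after `t`) are assumptions for illustration, not part of the pipeline.

```python
# Toy records with hypothetical values: entry = fmi_test, exit = duration (days).
patients = [
    {"id": "p1", "fmi_test": 0, "duration": 25, "event": 0},
    {"id": "p2", "fmi_test": 12, "duration": 225, "event": 1},
    {"id": "p3", "fmi_test": 27, "duration": 501, "event": 1},
    {"id": "p4", "fmi_test": 14, "duration": 70, "event": 1},
]


def risk_set(records, t):
    """Return ids of patients under observation at day t with delayed entry.

    A patient is at risk only after the FMI test and up to the end of follow-up.
    The strict/non-strict boundaries are an assumed convention.
    """
    return [r["id"] for r in records if r["fmi_test"] < t <= r["duration"]]


at_risk_day_20 = risk_set(patients, 20)
at_risk_day_100 = risk_set(patients, 100)
print(at_risk_day_20)
print(at_risk_day_100)
```

Without the entry time, `p3` would wrongly count as at risk on day 20, even though that patient could not have been observed before day 27.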

In [2]:
data_cox = pd.read_csv('synthetic_data.csv')
print('Number of patients: %d' % data_cox.shape[0])
cols_basic = ['duration', 'event', 'fmi_test'] # Basic information for survival analysis
confounders = ['Female', 'Age >= 40', 'ECOG >= 3'] # Confounders
data_cox.head()

Number of patients: 5000


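| Unnamed: 0 | Female | Age >= 40 | ECOG >= 3 | Gene A | Gene B | duration | event | fmi_test | Treatment 1 - off-label | Treatment 2 - standard-of-care | Treatment 3 | Treatment 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 1 | 501 | 1 | 27 | 0 | 1 | 0 | 0 |
| 1 | 0 | 1 | 1 | 0 | 1 | 225 | 1 | 12 | 1 | 0 | 0 | 0 |
| 2 | 1 | 1 | 1 | 0 | 1 | 70 | 1 | 14 | 0 | 0 | 1 | 0 |
| 3 | 0 | 1 | 1 | 0 | 1 | 16 | 0 | 5 | 0 | 0 | 0 | 1 |
| 4 | 1 | 0 | 0 | 0 | 1 | 25 | 0 | 0 | 0 | 1 | 0 | 0 |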


## Analysis

In [3]:
def process_HR(cph_summary, name_feature):
    '''
    Process result of CoxPH model to HR with 95% confidence intervals.
    '''
    result = cph_summary.loc[name_feature, ['exp(coef)', 'exp(coef) lower 95%', 'exp(coef) upper 95%']].values
    HR = '%.2f (%.2f, %.2f)' % tuple(result)   
    return HR
    
def Cox_IPTW(data_fit, name_feature):
    '''
    Uni-covariate CoxPH model with IPTW and left-truncation. 
    name_feature is the feature we are interested in (e.g., gene name or treatment name).
    Return the HR for name_feature.
    '''
    # Generate data used in Cox Regression with IPTW Weights
    data_confounders = data_fit.iloc[:, 3:].copy().drop(columns=[name_feature])
    df = pd.concat([data_fit.iloc[:, :3], data_fit[[name_feature]]], axis=1)
    X = data_confounders.values
    y = data_fit[name_feature].values
    # Propensity model: probability that name_feature == 1 given the confounders
    model = LogisticRegression(solver='lbfgs', n_jobs=-1, class_weight='balanced')
    model.fit(X, y)
    propensity_scores = model.predict_proba(X)[:, 1]
    # Inverse probability weights: 1/p for treated rows, 1/(1-p) for untreated rows
    df['weights'] = 0.0  # float column, so the weights are not assigned into an int column
    IP_treated = 1 / propensity_scores
    IP_untreated = 1 / (1 - propensity_scores)
    df.loc[df[name_feature]==1, 'weights'] = IP_treated[df[name_feature]==1]
    df.loc[df[name_feature]==0, 'weights'] = IP_untreated[df[name_feature]==0]
    # Cox Regression
    cph = CoxPHFitter()
    cph.fit(df, 'duration', 'event', weights_col='weights', entry_col='fmi_test', robust=True)
    HR = process_HR(cph.summary, name_feature)
    return HR
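
The weighting step inside `Cox_IPTW` can be looked at in isolation. The sketch below assumes the propensity scores are already available and uses made-up values. The helper name `iptw_weights` and the guard against scores of exactly 0 or 1 are additions for illustration and are not in the function above.

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights.

    treated: 0/1 indicators; propensity: estimated P(treated = 1 | confounders).
    Treated rows get 1/p, untreated rows get 1/(1 - p).
    """
    weights = []
    for t, p in zip(treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / p if t == 1 else 1.0 / (1.0 - p))
    return weights


treated = [1, 1, 0, 0]
propensity = [0.8, 0.25, 0.5, 0.2]  # hypothetical scores
weights = iptw_weights(treated, propensity)
print(weights)
```

A treated patient with a low propensity score (0.25 here) gets a large weight because such patients are under-represented in the treated group relative to their confounder profile.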

### 1. Treatment effect of off-label drug.

Here we study the treatment effect of the studied off-label drug (compared with standard-of-care) on overall survival (OS) and report the OS hazard ratio (HR).

In [4]:
data_fit = data_cox[cols_basic+confounders+['Treatment 1 - off-label']]
HR = Cox_IPTW(data_fit, 'Treatment 1 - off-label')
print('OS HR for the studied off-label drug: %s' % (HR))

OS HR for the studied off-label drug: 0.43 (0.40, 0.46)
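
The reported string has the form produced by `process_HR`: `exp(coef)` followed by its 95% bounds. As a minimal sketch of how such an interval relates to a log-hazard coefficient and its standard error, the example below assumes a normal approximation with z = 1.96. The function name `hazard_ratio_with_ci` and the input values are hypothetical and are not taken from the fitted model.

```python
import math


def hazard_ratio_with_ci(coef, se, z=1.96):
    """Format exp(coef) with a Wald-type interval exp(coef -/+ z * se)."""
    hr = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return '%.2f (%.2f, %.2f)' % (hr, lower, upper)


# Hypothetical coefficient and standard error on the log-hazard scale.
summary = hazard_ratio_with_ci(coef=-0.5, se=0.1)
print(summary)
```

An HR below 1 with an upper bound below 1 indicates a lower hazard in the group coded as 1, which is how the result above reads for the off-label drug.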


### 2. Gene-treatment interactions

Here we analyze the interaction between mutations of Gene A and the studied off-label drug.
- Interaction: exp(coef) for the gene-treatment interaction term in the Cox model, adjusted for patient confounders.
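
To read the interaction value, it can help to see how a product term changes the treatment HR between gene groups. The sketch below assumes a Cox model with main effects for gene and treatment plus their product, and uses hypothetical coefficients. The function name `treatment_hr_by_gene` is made up for illustration.

```python
import math


def treatment_hr_by_gene(coef_treatment, coef_interaction):
    """Treatment HR for non-mutated and mutated patients.

    With log-hazard = b_gene*G + b_trt*T + b_int*G*T, the treatment effect is
    exp(b_trt) when G = 0 and exp(b_trt + b_int) when G = 1.
    """
    hr_wild_type = math.exp(coef_treatment)
    hr_mutated = math.exp(coef_treatment + coef_interaction)
    return hr_wild_type, hr_mutated


# Hypothetical coefficients: treatment HR 0.5, interaction exp(coef) 0.8.
hr_wt, hr_mut = treatment_hr_by_gene(math.log(0.5), math.log(0.8))
print('%.2f %.2f' % (hr_wt, hr_mut))
```

An interaction exp(coef) below 1 therefore means the treatment HR is multiplied by a factor below 1 in mutated patients, that is, a stronger apparent benefit in that group.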

In [5]:
gene = 'Gene A'
treatment = 'Treatment 1 - off-label'

data_fit = data_cox[cols_basic+[gene, treatment]].copy() # Copy so the interaction column is added to an independent frame
col_inter = '%s+%s'%(gene, treatment)
data_fit[col_inter] = data_fit[gene]*data_fit[treatment]
cph = CoxPHFitter()
cph.fit(data_fit, 'duration', 'event', entry_col='fmi_test', robust=True)
HR = process_HR(cph.summary, col_inter)
print('Interaction between %s and %s is %s' % (gene, treatment, HR))

Interaction between Gene A and Treatment 1 - off-label is 0.79 (0.69, 0.90)
