# <div align='center'> Estimation of Survival Curve<br>
 $S(t) = Pr(T>t)$ 

## Requirements <br>
- ISLP package ([Follow for installation](https://pypi.org/project/ISLP/))

In [2]:
pip install ISLP


In [3]:
# Importing libraries
from ISLP.models import ModelSpec as MS
from ISLP import load_data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.pyplot import subplots
import seaborn as sns; sns.set()

from lifelines import KaplanMeierFitter, CoxPHFitter 
from lifelines.statistics import logrank_test, multivariate_logrank_test


# Brain Cancer dataset
- Contains the survival times for patients with primary brain tumors undergoing treatment with stereotactic radiation methods
- Conventional status code: 1 = uncensored observation (death) and 0 = censored observation

In [None]:
BrainCancer = load_data('BrainCancer')
BrainCancer.head()
# BrainCancer.shape

- Predictors: 
    1. gtv: Gross tumor volume (cm^3)
    2. sex
    3. diagnosis: Meningioma, LG glioma, HG glioma, or Other
    4. ki: Karnofsky index
    5. loc: Tumor location
    6. stereo: Stereotactic methods (SRS and SRT) <br>
- Here the time is $y_{i}$, the time to the *i*th patient's event (death or censoring)

In [None]:
BrainCancer['sex'].value_counts()

In [None]:
BrainCancer['diagnosis'].value_counts()

In [None]:
BrainCancer['status'].value_counts()

The status counts show that 35 patients died before the end of the study (status = 1); the remaining observations are censored (status = 0)

In [None]:
# Creating a new estimator object
km = KaplanMeierFitter()
# Fitting the data to the estimator and visualizing the survival curve
# The pointwise confidence interval is 95% by default (alpha=0.05)
km_brain = km.fit(BrainCancer['time'], BrainCancer['status'])
km_brain.plot(label = 'Kaplan-Meier estimate')
plt.xlabel('Time in months')
plt.ylabel('Estimated probability of Survival')
plt.title('Survival curve for Brain Cancer')
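
The Kaplan-Meier estimate multiplies, over the distinct death times up to $t$, the fraction of at-risk patients who survive each death time. The following is a minimal sketch of that product on hypothetical toy data (not the BrainCancer data); `kaplan_meier`, `toy_times` and `toy_events` are illustrative names, and the sketch omits the confidence intervals that `KaplanMeierFitter` also computes.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return {death_time: estimated S(t)} for right-censored data.

    times  -- observed time for each subject (death or censoring)
    events -- 1 if the subject died at that time, 0 if censored
    """
    # Number of deaths at each distinct death time
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = {}
    for d in sorted(deaths):
        # Risk set: every subject whose observed time is at least d
        at_risk = sum(1 for t in times if t >= d)
        # Multiply by the fraction of the risk set that survives past d
        survival *= (at_risk - deaths[d]) / at_risk
        curve[d] = survival
    return curve


# Hypothetical toy data: 5 subjects, 3 deaths, 2 censored observations
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
print(toy_curve)
```

Censored subjects never trigger a drop in the curve, but they still count in the risk set until their censoring time, which is why they cannot simply be discarded.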


# Kaplan-Meier survival curve - sex-stratified

In [None]:
# Initializing a dictionary to store data frames for each sex category
by_sex = {}

for sex, df in BrainCancer.groupby('sex'):
    by_sex[sex] = df
    # Fitting the data to the estimator and visualizing the survival curve
    km_sex = km.fit(df['time'], df['status'])
    km_sex.plot(label = 'Sex=%s' % sex)
    plt.xlabel('Time in months')
    plt.ylabel('Estimated probability of Survival')

In [None]:
by_sex
# by_sex['Female']
# by_sex['Male']

# Log-Rank Test <br>
Comparing survival distributions between two groups



In [None]:
logrank_test(by_sex['Male']['time'],
             by_sex['Female']['time'],
             event_observed_A=by_sex['Male']['status'],
             event_observed_B=by_sex['Female']['status'])

The p-value is 0.23, so we cannot reject the null hypothesis that the two groups have the same survival distribution: there is no significant difference in survival between females and males
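
To show what the log-rank test compares, here is a minimal sketch of the two-group statistic computed by hand: at each death time it contrasts the observed deaths in group A with the number expected if both groups shared one survival curve. The function name `logrank_by_hand` and the toy inputs are hypothetical. This is not the `lifelines` implementation, only the textbook form of the statistic.

```python
import math


def logrank_by_hand(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    death_times = sorted({t for t, e in pooled if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for d in death_times:
        r_a = sum(1 for t in times_a if t >= d)   # at risk in group A
        r = sum(1 for t, _ in pooled if t >= d)   # at risk overall
        q_a = sum(1 for t, e in zip(times_a, events_a) if t == d and e == 1)
        q = sum(1 for t, e in pooled if t == d and e == 1)
        # Expected deaths in A if both groups share the same hazard
        observed_minus_expected += q_a - r_a * q / r
        if r > 1:  # the variance term is undefined for a single subject
            variance += q * (r_a / r) * (1 - r_a / r) * (r - q) / (r - 1)
    if variance == 0:
        return 0.0, 1.0  # degenerate case: nothing to compare
    statistic = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical toy groups: A dies early, B dies late, no censoring
stat_diff, p_diff = logrank_by_hand([1, 2, 3, 4, 5], [1] * 5,
                                    [6, 7, 8, 9, 10], [1] * 5)
# Identical groups should give no evidence of a difference
stat_same, p_same = logrank_by_hand([1, 2, 3], [1] * 3, [1, 2, 3], [1] * 3)
print(stat_diff, p_diff, stat_same, p_same)
```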

# Cox Proportional Hazards Model <br>

In [None]:
coxph = CoxPHFitter
sex_df = BrainCancer[['time', 'status', 'sex']]
# Design matrix for the Cox model using model specification class (MS) with sex as the only predictor
model_df = MS(['time', 'status', 'sex'], intercept=False).fit_transform(sex_df)
cox_fit = coxph().fit(model_df, 'time', 'status')
# cox_fit.summary[['coef', 'se(coef)', 'p']]
cox_fit.summary

In [None]:
# Likelihood ratio test of the fitted model against the null model (with no features)
cox_fit.log_likelihood_ratio_test()

- The p-values from both the Cox model summary (one predictor, sex) and the likelihood ratio test against the null model suggest that adding the 'sex' predictor did not significantly improve the model's ability to explain the observed survival outcomes
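
As a reading aid for the summary table, the sketch below shows how a Cox coefficient and its standard error translate into a hazard ratio (the `exp(coef)` column), a Wald z-statistic and a two-sided p-value. The helper `wald_summary` is a hypothetical name, and the coefficient and standard error passed to it are made-up illustrative values, not results from the BrainCancer fit.

```python
import math


def wald_summary(coef, se, z_crit=1.96):
    """Hazard ratio, approximate 95% CI, z-statistic and two-sided p-value."""
    hazard_ratio = math.exp(coef)
    # The interval is built on the log-hazard scale, then exponentiated
    ci = (math.exp(coef - z_crit * se), math.exp(coef + z_crit * se))
    z = coef / se
    # Two-sided tail probability of a standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return hazard_ratio, ci, z, p_value


# Hypothetical inputs chosen only for illustration
hr, ci, z, p = wald_summary(coef=0.5, se=0.25)
print(hr, ci, z, p)
```

A hazard ratio above 1 means a higher estimated hazard for that group at every time point; if the confidence interval contains 1, the Wald test does not reject the hypothesis of no effect.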

### Fitting a model that makes use of additional predictors

In [None]:
# cleaning the data
cleaned = BrainCancer.dropna()
# Design matrix
all_MS = MS(cleaned.columns , intercept=False)
all_df = all_MS.fit_transform(cleaned)
fit_all = coxph().fit(all_df, 'time', 'status')
fit_all.summary
# fit_all.summary[['coef', 'se(coef)', 'p']]

### Survival curves for each diagnosis category
 - Partial dependence plots: show the relationship between a predictor variable and the model's prediction while holding the values of the other predictors constant (at their average or most common values)
 - The mode is used for categorical columns (e.g. 'diagnosis') and the mean for numerical columns

In [None]:
# Creating a list of all unique diagnoses
levels = cleaned['diagnosis'].unique()
# Representative value of a column: mode if categorical, mean if numerical
def representative(series):
    if hasattr(series.dtype , 'categories'):
        return pd.Series.mode(series)
    else:
        return series.mean()
modal_data = cleaned.apply(representative , axis=0)
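
To make the "representative patient" idea concrete, here is a small sketch on a hypothetical toy frame (two columns only, values invented). It uses a slightly different helper, `representative_row`, which loops over the columns explicitly instead of using `apply`; this is an alternative way to get the same kind of row, not the code used in the cells here.

```python
import pandas as pd

# Hypothetical toy data: one categorical and one numerical column
toy = pd.DataFrame({
    'diagnosis': pd.Categorical(['Meningioma', 'HG glioma',
                                 'Meningioma', 'LG glioma']),
    'ki': [80, 70, 90, 80],
})


def representative_row(df):
    """Mode for categorical columns, mean for numerical columns."""
    row = {}
    for name, col in df.items():
        if isinstance(col.dtype, pd.CategoricalDtype):
            row[name] = col.mode().iloc[0]  # first mode if there are ties
        else:
            row[name] = col.mean()
    return pd.Series(row)


toy_row = representative_row(toy)

# One copy of the representative row per diagnosis level,
# then overwrite 'diagnosis' so only that predictor varies
toy_levels = list(toy['diagnosis'].unique())
grid = pd.DataFrame([toy_row] * len(toy_levels))
grid['diagnosis'] = toy_levels
print(grid)
```

Every row of `grid` shares the same value of `ki`, so any difference in predictions between rows can be attributed to the diagnosis alone.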

In [None]:
modal_data.shape

In [None]:
modal_df = pd.DataFrame([modal_data.iloc[0] for _ in range(len(levels))])
modal_df['diagnosis'] = levels
# modal_df

In [None]:
modal_X = all_MS.transform(modal_df)
modal_X.index = levels
# modal_X

In [None]:
predicted_survival = fit_all.predict_survival_function(modal_X)
predicted_survival.plot()
plt.xlabel('Time in months')
plt.ylabel('Estimated probability of Survival')

# predicted_survival
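
Under the proportional hazards assumption, each predicted curve is the baseline survival curve raised to the power $e^{x^T\beta}$, which is why curves for different diagnoses are ordered and do not cross. The sketch below illustrates that relationship with invented baseline values and linear predictors. `survival_under_cox` is a hypothetical helper, and this is the textbook relationship rather than a description of how `lifelines` computes its predictions internally.

```python
import math


def survival_under_cox(baseline_survival, linear_predictor):
    """S(t | x) = S0(t) ** exp(x . beta) under proportional hazards."""
    scale = math.exp(linear_predictor)
    return [s0 ** scale for s0 in baseline_survival]


# Hypothetical baseline survival probabilities at four time points
baseline = [1.0, 0.9, 0.75, 0.6]

# Hypothetical linear predictors for a lower-risk and a higher-risk profile
low_risk = survival_under_cox(baseline, -0.5)
high_risk = survival_under_cox(baseline, 0.5)
print(low_risk)
print(high_risk)
```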

# Follow the link for the [Tutorial](https://drive.google.com/file/d/14QWqblU0LXq3ZM1_-VPBqFyFvso_UqHG/view?usp=drive_link) video