# Survival Analysis And Cox Regression Demonstration In Python

## Background

The bank holds demographic and transactional data on its loan customers. A robust model for predicting defaulters would let the bank allocate its resources more effectively.

- Sample size is 700
- Age group, years at current address, years with current employer, debt-to-income ratio, credit card debt, and other debt are the independent variables
- Status and Time are used to create survival objects. Status = 1 if the customer defaulted before 36 months, and 0 if no default was observed within 36 months (the observation is censored)
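
The Status/Time encoding described above can be sketched in a few lines. This is a minimal illustration, not the bank's actual preprocessing: `encode_survival`, `default_month`, and `OBSERVATION_MONTHS` are hypothetical names, and only the 36-month window comes from the description above.

```python
from typing import Optional, Tuple

OBSERVATION_MONTHS = 36  # follow-up window stated in the background (assumed fixed for all customers)

def encode_survival(default_month: Optional[float]) -> Tuple[float, int]:
    """Return (TIME, STATUS) for one customer.

    default_month is the month in which the customer defaulted,
    or None if no default was observed.
    """
    if default_month is not None and default_month < OBSERVATION_MONTHS:
        return float(default_month), 1    # event observed before 36 months
    return float(OBSERVATION_MONTHS), 0   # right-censored at the end of the window

print(encode_survival(12))    # (12.0, 1)
print(encode_survival(None))  # (36.0, 0)
```

A censored customer still contributes information: the model knows that this customer survived at least 36 months without defaulting.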



## Objective
To predict whether a customer applying for a loan will default, and to identify customers who are likely to default early.


## Data Description


| Columns | Description | Type |
|---------|-------------|------|
| AGE | Age Groups: 1 (<28 years), 2 (28-40 years), 3 (≥40 years) | Factor |
| EMPLOY | No. of Years the Customer Has Been with Their Current Employer | Numerical |
| ADDRESS | No. of Years the Customer Has Stayed at Their Current Address | Numerical |
| DEBTINC | Debt to Income Ratio | Numerical |
| CREDDEBT | Credit Card Debt | Numerical |
| OTHDEBT | Other Debt | Numerical |
| STATUS | 1 if the Customer Defaulted on the Loan, 0 if Censored at 36 Months | Binary |
| TIME | Time to Default or Censoring, in Months | Numerical |

### Import Libraries

In [9]:
import pandas as pd
from lifelines import CoxPHFitter

### Load Data

In [10]:
bankloan = pd.read_csv('BANK LOAN (COX).csv')
bankloan.head()

,SN,AGE,EMPLOY,ADDRESS,DEBTINC,CREDDEBT,OTHDEBT,STATUS,TIME
0,1,3,17,12,9.3,11.36,5.01,1,12.0
1,2,1,10,6,17.3,1.36,4.0,0,36.0
2,3,2,15,14,5.5,0.86,2.17,0,36.0
3,4,3,15,14,2.9,2.66,0.82,0,36.0
4,5,1,2,0,17.3,1.79,3.06,1,14.0


In [11]:
# Convert AGE to categorical (factor equivalent)
bankloan['AGE'] = bankloan['AGE'].astype('category')
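
Once AGE is categorical, a formula-based fit typically expands it into indicator columns measured against a reference level, which is why the model summary reports `AGE[T.2]` and `AGE[T.3]` but no term for group 1. The sketch below mimics that treatment coding in plain Python; `treatment_code` is a hypothetical helper written for illustration and is not part of pandas or lifelines.

```python
def treatment_code(age_group, levels=(1, 2, 3)):
    """Return indicator columns for every level except the first (reference) one."""
    if age_group not in levels:
        raise ValueError(f"unknown AGE level: {age_group}")
    others = levels[1:]  # levels[0] is the reference group and gets no column
    return {f"AGE[T.{level}]": int(age_group == level) for level in others}

for group in (1, 2, 3):
    print(group, treatment_code(group))
```

A customer in group 1 has both indicators set to 0, so the coefficients for groups 2 and 3 are read as differences relative to group 1.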

### Build Model

In [12]:
# Fit Cox proportional hazards model
cph = CoxPHFitter()

cph.fit(
    bankloan,
    duration_col='TIME',
    event_col='STATUS',
    formula="AGE + EMPLOY + ADDRESS + DEBTINC + CREDDEBT + OTHDEBT"
)

# Display model summary
cph.summary


covariate,coef,exp(coef),se(coef),coef lower 95%,coef upper 95%,exp(coef) lower 95%,exp(coef) upper 95%,cmp to,z,p,-log2(p)
AGE[T.2],0.306683,1.35891,0.187009,-0.059848,0.673213,0.941908,1.960527,0.0,1.639936,0.1010184,3.30731
AGE[T.3],0.540059,1.716108,0.252929,0.044326,1.035791,1.045323,2.817334,0.0,2.135215,0.03274342,4.932651
EMPLOY,-0.241766,0.78524,0.022379,-0.285628,-0.197904,0.751542,0.820448,0.0,-10.803267,3.321714e-27,87.960131
ADDRESS,-0.098246,0.906426,0.016344,-0.13028,-0.066212,0.87785,0.935932,0.0,-6.011143,1.842203e-09,29.015921
DEBTINC,0.05859,1.06034,0.013084,0.032946,0.084234,1.033495,1.087883,0.0,4.478039,7.533186e-06,17.018308
CREDDEBT,0.584825,1.794677,0.050202,0.486431,0.683219,1.626501,1.980242,0.0,11.649468,2.30896e-31,101.772528
OTHDEBT,0.064652,1.066788,0.031661,0.002598,0.126706,1.002602,1.135083,0.0,2.042026,0.04114895,4.603001
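
The columns of this summary are related by simple formulas. The sketch below reproduces several of them for the EMPLOY row from the reported `coef` and `se(coef)` alone. It assumes a Wald test with a normal approximation, which is consistent with the `z`, `p`, and 95% interval values shown; `summarize_coef` and `Z_95` are hypothetical names introduced for this example.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution

def summarize_coef(coef, se):
    """Derive hazard ratio, 95% interval, z statistic, and p-value from coef and se."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the normal approximation
    return {
        "hazard_ratio": math.exp(coef),   # the exp(coef) column
        "coef_lower": coef - Z_95 * se,
        "coef_upper": coef + Z_95 * se,
        "z": z,
        "p": p,
    }

employ = summarize_coef(-0.241766, 0.022379)
print(round(employ["hazard_ratio"], 4))
print(f"{1 - employ['hazard_ratio']:.1%} lower hazard per extra year with the employer")
```

Read this way, each additional year with the current employer is associated with roughly a 21% lower hazard of default, holding the other covariates fixed. A hazard ratio above 1, as for CREDDEBT, indicates a higher hazard instead.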


### Predict Survival Probabilities

In [13]:
bankloantest = pd.read_csv('BANK LOAN (COX) TEST.csv')

# Convert AGE to categorical, as was done for the training data
bankloantest['AGE'] = bankloantest['AGE'].astype('category')
bankloantest.head()

,SN,AGE,EMPLOY,ADDRESS,DEBTINC,CREDDEBT,OTHDEBT
0,701,3,17,12,9.4,11.38,5.01
1,702,2,10,6,17.3,1.36,4.0
2,703,3,15,13,5.5,0.86,2.17
3,704,2,15,14,2.9,2.66,0.82
4,705,1,2,0,17.6,1.79,3.06
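
For context on the prediction step, a Cox model ties each customer's survival curve to a shared baseline. In standard notation (the symbols are introduced here for illustration and do not appear in the notebook):

$$
h(t \mid x) = h_0(t)\,\exp(\beta^\top x), \qquad
S(t \mid x) = S_0(t)^{\exp(\beta^\top x)}
$$

Here $h_0(t)$ and $S_0(t)$ are the baseline hazard and baseline survival functions, $x$ is the customer's covariate vector, and $\beta$ holds the fitted coefficients. Implementations may centre the covariates before evaluating $\exp(\beta^\top x)$; that changes the baseline the curve is expressed against, not the predicted survival curve itself.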


In [14]:
# Survival probability at 24 months, i.e. the probability of no default by month 24
bankloantest['prob24'] = cph.predict_survival_function(bankloantest, times=[24]).T[24]
bankloantest.head()

,SN,AGE,EMPLOY,ADDRESS,DEBTINC,CREDDEBT,OTHDEBT,prob24
0,701,3,17,12,9.4,11.38,5.01,0.126708
1,702,2,10,6,17.3,1.36,4.0,0.934268
2,703,3,15,13,5.5,0.86,2.17,0.995728
3,704,2,15,14,2.9,2.66,0.82,0.993096
4,705,1,2,0,17.6,1.79,3.06,0.463657
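
Because `prob24` is a survival probability, the chance of default within 24 months is its complement. The sketch below converts the five values displayed above and flags likely early defaulters. The 0.5 cutoff and the names `default_by_24` and `EARLY_DEFAULT_CUTOFF` are illustrative assumptions, not values taken from this analysis.

```python
# Survival probabilities at 24 months, keyed by SN, copied from the output above
prob24 = {701: 0.126708, 702: 0.934268, 703: 0.995728, 704: 0.993096, 705: 0.463657}

EARLY_DEFAULT_CUTOFF = 0.5  # assumed threshold; choose one that fits the business cost of a default

# Probability of default within 24 months is 1 minus the survival probability
default_by_24 = {sn: 1.0 - survival for sn, survival in prob24.items()}

flagged = [sn for sn, p in default_by_24.items() if p >= EARLY_DEFAULT_CUTOFF]
print(flagged)  # [701, 705]
```

Customers 701 and 705 have less than an even chance of remaining default-free for 24 months, so under this assumed cutoff they would be treated as likely early defaulters.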
