In [None]:
!pip install lifelines


Collecting lifelines
  Downloading lifelines-0.30.0-py3-none-any.whl.metadata (3.2 kB)
Collecting autograd-gamma>=0.3 (from lifelines)
  Downloading autograd-gamma-0.5.0.tar.gz (4.0 kB)
  Preparing metadata (setup.py) ... done
Collecting formulaic>=0.2.2 (from lifelines)
  Downloading formulaic-1.0.2-py3-none-any.whl.metadata (6.8 kB)
Collecting interface-meta>=1.2.0 (from formulaic>=0.2.2->lifelines)
  Downloading interface_meta-1.3.0-py3-none-any.whl.metadata (6.7 kB)
Downloading lifelines-0.30.0-py3-none-any.whl (349 kB)
Downloading formulaic-1.0.2-py3-none-any.whl (94 kB)
Downloading interface_meta-1.3.0-py3-none-any.whl (14 kB)
Building wheels for collected packages: autograd-gamma
  Building wheel for autograd-gamma (setup

In [None]:
# prompt: I have installed lifelines, now load it

import lifelines
# now you can use lifelines
lifelines.__version__


'0.30.0'

In [None]:
import pandas as pd
from lifelines import CoxPHFitter

# Step 1: Create the dataset
data = {
    'Age': [50, 60, 70, 80, 55, 65, 75, 85, 60, 70],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'Survival_Time': [10, 12, 15, 18, 20, 22, 25, 28, 30, 35],
    'Event': [1, 1, 1, 1, 0, 0, 1, 0, 0, 1]
}

df = pd.DataFrame(data)


In [None]:
df

Unnamed: 0,Age,Gender,Survival_Time,Event
0,50,Male,10,1
1,60,Female,12,1
2,70,Male,15,1
3,80,Female,18,1
4,55,Male,20,0
5,65,Female,22,0
6,75,Male,25,1
7,85,Female,28,0
8,60,Male,30,0
9,70,Female,35,1


In [None]:
# Step 2: Convert categorical variables (Gender) to numerical
df['Gender'] = df['Gender'].apply(lambda x: 1 if x == 'Male' else 0)

In [None]:
# Step 3: Fit the Cox PH Model
cox_model = CoxPHFitter()
cox_model.fit(df, duration_col='Survival_Time', event_col='Event')



<lifelines.CoxPHFitter: fitted with 10 total observations, 4 right-censored observations>

In [None]:
# Step 4: Display the summary of the model
cox_model.print_summary()

0,1
model,lifelines.CoxPHFitter
duration col,'Survival_Time'
event col,'Event'
baseline estimation,breslow
number of observations,10
number of events observed,6
partial log-likelihood,-9.68
time fit was run,2024-11-30 09:22:34 UTC

Unnamed: 0,coef,exp(coef),se(coef),coef lower 95%,coef upper 95%,exp(coef) lower 95%,exp(coef) upper 95%,cmp to,z,p,-log2(p)
Age,-0.02,0.98,0.05,-0.13,0.09,0.88,1.09,0.0,-0.39,0.7,0.52
Gender,0.3,1.35,1.06,-1.77,2.37,0.17,10.74,0.0,0.29,0.77,0.37

0,1
Concordance,0.67
Partial AIC,23.35
log-likelihood ratio test,0.47 on 2 df
-log2(p) of ll-ratio test,0.34


1. **General Information**
   - **Model**: `lifelines.CoxPHFitter` – Indicates the use of the Cox Proportional Hazards model from the Lifelines library.
   - **Duration Col**: `Survival_Time` – The column representing survival times (in months).
   - **Event Col**: `Event` – The column indicating whether the event occurred (1 = event occurred, 0 = censored).
   - **Baseline Estimation**: `breslow` – Breslow's method, used to estimate the baseline hazard (and from it the baseline survival function).

2. **Data Summary**
   - **Number of Observations**: 10 – The total number of individuals in the dataset.
   - **Number of Events Observed**: 6 – Out of 10 individuals, 6 experienced the event (death), and 4 were censored (still alive at the end of observation).


3. **Model Fit Statistics**
   - **Partial Log-Likelihood**: -9.68 – Measures how well the model fits the data. Larger (less negative) values indicate a better fit.
   - **Concordance**: 0.67 – Indicates the model’s predictive accuracy:
     - A concordance of 0.67 means that, among comparable pairs of individuals, the model assigns the higher risk to the one who experiences the event earlier in about 67% of pairs.


### Partial Log-Likelihood (always negative)

**Definition:**
The partial log-likelihood measures how well the model fits the observed data based on the predictors while considering censoring. It focuses on comparing observed survival times against the predicted hazards.

**Scale:**
- The partial log-likelihood is negative because it is a sum of logs of conditional probabilities, each of which is at most 1.
- Larger values (closer to zero) indicate a better model fit. However, the scale is dataset-dependent:
  - A dataset with more observed events generally results in a more negative value, because each event adds one non-positive term.
  - As a rough guide, small datasets might give values between -10 and -50.
  - For large datasets, the value could reach the hundreds or thousands (e.g., -1000).

**How to Use:**
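Compare the partial log-likelihood across models fitted to the same dataset. The higher (less negative) the value, the better the model explains the data. Adding predictors can never lower the maximized value, so when models differ in the number of predictors, compare them with AIC or a likelihood ratio test instead.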
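
To make the quantity concrete, here is a minimal sketch that evaluates the Cox partial log-likelihood by hand on the notebook's ten rows. The function name `partial_log_likelihood` is illustrative and not part of lifelines, the formula assumes no tied event times (true for this dataset), and the coefficients passed in are the rounded values from the summary, so the result only approximately reproduces the reported -9.68.

```python
import math

# Notebook data: (age, gender, survival_time, event), with Male = 1 and Female = 0.
rows = [
    (50, 1, 10, 1), (60, 0, 12, 1), (70, 1, 15, 1), (80, 0, 18, 1),
    (55, 1, 20, 0), (65, 0, 22, 0), (75, 1, 25, 1), (85, 0, 28, 0),
    (60, 1, 30, 0), (70, 0, 35, 1),
]

def partial_log_likelihood(rows, beta_age, beta_gender):
    """Cox partial log-likelihood; this sketch assumes no tied event times."""
    total = 0.0
    for age_i, gender_i, time_i, event_i in rows:
        if event_i != 1:
            continue  # censored subjects contribute only through the risk sets
        eta_i = beta_age * age_i + beta_gender * gender_i
        # Risk set: everyone still under observation at time_i.
        risk_sum = sum(
            math.exp(beta_age * a + beta_gender * g)
            for a, g, t, _ in rows
            if t >= time_i
        )
        total += eta_i - math.log(risk_sum)
    return total

ll_null = partial_log_likelihood(rows, 0.0, 0.0)     # model with no predictors
ll_fit = partial_log_likelihood(rows, -0.02, 0.30)   # rounded coefficients from the summary
print(f"null model:   {ll_null:.2f}")
print(f"fitted model: {ll_fit:.2f}")
```

The null model comes out near -9.91 and the fitted model near -9.68, so the predictors move the partial log-likelihood only slightly toward zero.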

---

### Concordance Index (C-index)

**Definition:**
A measure of predictive accuracy, indicating how well the model predicts the order of survival times.

**Scale:**
- Ranges from 0 to 1.0, where 0.5 corresponds to random prediction and 1.0 to perfect prediction; values below 0.5 mean the ranking is worse than chance.
  - 0.6-0.7: Acceptable predictive accuracy.
  - 0.7-0.8: Good predictive accuracy.
  - >0.8: Excellent predictive accuracy.

**How to Use:**
Higher values indicate better discriminatory ability of the model.
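
The pairwise idea behind the C-index can be sketched in a few lines. This is a simplified version of Harrell's C written for illustration, not the lifelines implementation: it ignores tied survival times, and the toy `times`, `events`, and score lists are hypothetical values chosen to show the two reference points of the scale.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ordered correctly by risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # the higher-risk subject failed first
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    return concordant / comparable

# Hypothetical toy data: four subjects, the third one censored.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
good_scores = [2.0, 1.5, 1.0, 0.5]   # higher risk assigned to shorter survival
flat_scores = [1.0, 1.0, 1.0, 1.0]   # an uninformative model

print(concordance_index(times, events, good_scores))  # 1.0
print(concordance_index(times, events, flat_scores))  # 0.5
```

Here the risk score plays the role of the model's linear predictor: a higher score should go with an earlier event.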



### Model Fit Tests

- **Partial AIC**: 23.35 – A measure of model quality. Lower values indicate a better fit while balancing complexity.
- **Log-Likelihood Ratio Test**: 0.47 on 2 df – Tests whether the predictors improve the model compared to a null model (no predictors). The summary reports \( -\log_2(p) = 0.34 \), which corresponds to a p-value of about \( 2^{-0.34} \approx 0.79 \), so the predictors are not statistically significant overall.





### Akaike Information Criterion (AIC)

**What is it?**
AIC is a measure used to evaluate the quality of a statistical model while penalizing it for complexity (number of predictors). It balances goodness-of-fit and parsimony to avoid overfitting.

**Formula:**


\[ \text{AIC} = -2 \times \text{Log-Likelihood} + 2 \times \text{Number of Parameters} \]



**Interpretation:**
- Lower AIC values indicate a better trade-off between model fit and complexity.
- AIC is relative; it is used to compare multiple models rather than provide an absolute measure.

**Rule of thumb:**
- If two models differ by less than 2 in AIC, their performance is considered similar.
- A difference of 2 or more suggests the model with the lower AIC has meaningfully more support. This is a heuristic, not a formal significance test.

**Practical Example:**
- If you build two models:
  - **Model A**: AIC = 120
  - **Model B**: AIC = 130
  - **Model A** is the preferred model.
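
As a quick check of the formula against this notebook's output, the sketch below recomputes the partial AIC from the reported partial log-likelihood and compares it with a null model. The helper name `partial_aic` is illustrative. The null model's log-likelihood is derived here from the risk-set sizes at the six event times (10, 9, 8, 7, 4, 1 for this dataset), which is valid only because there are no tied event times.

```python
import math

def partial_aic(log_likelihood, n_parameters):
    """AIC = -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_parameters

# Fitted model: partial log-likelihood -9.68 with two coefficients (Age, Gender).
aic_full = partial_aic(-9.68, 2)

# Null model: no coefficients; each event contributes -log(size of its risk set).
risk_set_sizes = [10, 9, 8, 7, 4, 1]
ll_null = -sum(math.log(size) for size in risk_set_sizes)
aic_null = partial_aic(ll_null, 0)

print(f"full model AIC: {aic_full:.2f}")
print(f"null model AIC: {aic_null:.2f}")
```

The full model gives about 23.36 (the summary shows 23.35 because it uses the unrounded log-likelihood), while the null model gives about 19.82. The lower AIC belongs to the null model, which suggests Age and Gender do not earn their added complexity on this dataset.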


### Log-Likelihood Ratio Test

**What is it?**
The log-likelihood ratio test (also known as the likelihood ratio test) evaluates whether including predictors significantly improves the model compared to a null model (a model with no predictors).

**Formula:**
The test statistic is:


\[
\text{LRT} = -2 \left( \text{Log-Likelihood of Null Model} - \text{Log-Likelihood of Full Model} \right)
\]



**Degrees of Freedom (df):**
The degrees of freedom correspond to the difference in the number of predictors between the null and full models.

**P-value:**
The test statistic is compared to a chi-squared distribution to determine the p-value.
- If \( p < 0.05 \), the predictors in the model significantly improve the fit.

**Interpretation:**
- A small p-value (\( p < 0.05 \)) indicates the predictors add value to the model.
- A large p-value (\( p > 0.05 \)) suggests the model does not significantly improve over the null model.
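
The p-value for this notebook's test can be recovered from the reported statistic with the standard library alone. The sketch relies on the fact that the upper-tail probability of a chi-squared distribution with exactly 2 degrees of freedom is \( e^{-x/2} \); for other degrees of freedom a statistics library would be needed. The function name `lrt_p_value_2df` is illustrative.

```python
import math

def lrt_p_value_2df(statistic):
    """Upper-tail chi-squared probability; this closed form holds for 2 df only."""
    return math.exp(-statistic / 2.0)

lrt_statistic = 0.47  # "log-likelihood ratio test" row of the summary, on 2 df
p_value = lrt_p_value_2df(lrt_statistic)
print(f"p = {p_value:.2f}, -log2(p) = {-math.log2(p_value):.2f}")
# p = 0.79, -log2(p) = 0.34
```

The second number matches the `-log2(p) of ll-ratio test` row of the summary, which confirms that 0.34 is the transformed value and the p-value itself is about 0.79.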




# Coefficient Table


1. **Coefficients (\( \beta \))**
   - **Definition:** The coef column shows the change in the log hazard for a one-unit increase in the predictor (the log hazard ratio).
   - **Values:**
     - **Age:** \( \beta = -0.02 \)
       - A 1-year increase in Age lowers the log hazard by 0.02, which corresponds to a decrease in the hazard of about 2%.
     - **Gender:** \( \beta = 0.30 \)
       - Being Male (Gender = 1) raises the log hazard by 0.30 compared to Female (Gender = 0). On the hazard scale this is \( e^{0.30} \approx 1.35 \), an increase of about 35%.

2. **Hazard Ratios (\( \exp(\beta) \))**
   - **Definition:** The exp(coef) column shows the Hazard Ratios (HR), which are easier to interpret than \( \beta \).
   - **Values:**
     - **Age:** \( HR = 0.98 \)
       - A hazard ratio less than 1 means the estimated hazard falls as Age rises. Specifically:
       - For every 1-year increase in Age, the estimated hazard decreases by \( (1 - 0.98) \times 100 = 2\% \).
     - **Gender:** \( HR = 1.35 \)
       - A hazard ratio greater than 1 means being Male increases the risk of the event. Specifically:
       - Males have a 35% higher risk compared to females.

3. **Standard Error of Coefficients (\( \text{se(coef)} \))**
   - **Definition:** The se(coef) column shows the standard error of the coefficient estimate. Smaller values indicate more precise estimates.
   - **Values:**
     - **Age:** \( se = 0.05 \) – The estimate for Age is relatively precise.
     - **Gender:** \( se = 1.06 \) – The estimate for Gender is less precise (higher uncertainty).

4. **Confidence Intervals**
   - **For Coefficients (\( \beta \))**
     - The coef lower 95% and coef upper 95% columns show the 95% confidence interval for \( \beta \).
     - **Values:**
       - **Age:** CI = \([ -0.13, 0.09 ]\)
         - Since the interval includes 0, Age is not statistically significant.
       - **Gender:** CI = \([ -1.77, 2.37 ]\)
         - Since the interval includes 0, Gender is also not statistically significant.
   - **For Hazard Ratios (\( e^\beta \))**
     - The exp(coef) lower 95% and exp(coef) upper 95% columns show the 95% confidence interval for the hazard ratio.
     - **Values:**
       - **Age:** CI = \([ 0.88, 1.09 ]\)
         - Since the interval includes 1, Age is not statistically significant.
       - **Gender:** CI = \([ 0.17, 10.74 ]\)
         - Since the interval includes 1, Gender is also not statistically significant.

5. **Z-Statistic**
   - **Definition:** The z column shows the z-statistic for testing whether \( \beta = 0 \) (no effect of the predictor).
   - **Values:**
     - **Age:** \( z = -0.39 \) – Well inside \( \pm 1.96 \), so the effect of Age is not significant at the 5% level.
     - **Gender:** \( z = 0.29 \) – Well inside \( \pm 1.96 \), so the effect of Gender is not significant at the 5% level.

6. **P-Value**
   - **Definition:** The p column shows the p-value for testing whether the coefficient is significantly different from 0.
   - **Threshold:** A p-value less than 0.05 indicates statistical significance.
   - **Values:**
     - **Age:** \( p = 0.70 \) – Age is not statistically significant.
     - **Gender:** \( p = 0.77 \) – Gender is not statistically significant.
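
The columns described above are all derived from `coef` and `se(coef)`, which the following sketch illustrates. The function name `wald_summary` is illustrative and not part of lifelines, and the inputs are the rounded values from the table, so the results agree with the printed summary only to within rounding.

```python
import math

def wald_summary(coef, se, z_crit=1.96):
    """Hazard ratio, 95% CIs, z-statistic and two-sided p-value from coef and se."""
    lower, upper = coef - z_crit * se, coef + z_crit * se
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return {
        "hazard_ratio": math.exp(coef),
        "coef_ci": (lower, upper),
        "hr_ci": (math.exp(lower), math.exp(upper)),
        "z": z,
        "p": p,
    }

# Rounded (coef, se) pairs read from the summary table.
for name, coef, se in [("Age", -0.02, 0.05), ("Gender", 0.30, 1.06)]:
    s = wald_summary(coef, se)
    print(
        f"{name}: HR={s['hazard_ratio']:.2f}, "
        f"HR 95% CI=({s['hr_ci'][0]:.2f}, {s['hr_ci'][1]:.2f}), "
        f"z={s['z']:.2f}, p={s['p']:.2f}"
    )
```

The confidence interval for the hazard ratio is simply the exponentiated interval for the coefficient, which is why "includes 0" on the coefficient scale and "includes 1" on the hazard ratio scale always lead to the same conclusion.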

---

### Key Observations from Your Results

- **Age:**
  - **Hazard Ratio (\( HR = 0.98 \))**: A slight decrease in hazard per year of Age.
  - **P-value (\( p = 0.70 \))**: Not statistically significant.
  
- **Gender:**
  - **Hazard Ratio (\( HR = 1.35 \))**: Males have a 35% higher hazard compared to females.
  - **P-value (\( p = 0.77 \))**: Not statistically significant.

**Model Implications:**
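- Neither Age nor Gender shows a statistically significant association with the hazard (risk of the event occurring) in this dataset. With only 10 observations and 6 events, the confidence intervals are wide, so the data cannot rule out real effects in either direction.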
