![cancer_care_timeline.png](attachment:cancer_care_timeline.png)

# Cancer Care Timeline

## Mapping the Patient's Cancer Care Treatment Journey

## Solution Overview: 

**Survival time analysis**, also known as survival analysis or time-to-event analysis, is a statistical method used to analyze the time until an event of interest occurs, such as death, disease recurrence, or treatment failure. In healthcare, survival time analysis is a valuable tool for understanding the prognosis of patients, evaluating the effectiveness of treatments, and identifying factors that influence patient outcomes. By accounting for censored data and modeling time-dependent variables, survival time analysis provides insights into the natural history of diseases, aiding clinicians in making informed decisions about patient care and treatment strategies.

##  Steps Involved in Survival Time Analysis:

1. **Data Collection**: Gather relevant data on cancer patients, including demographic information, disease characteristics, treatment modalities, and survival outcomes. Ensure that the data include time-to-event information, such as time of diagnosis, time of treatment initiation, and time of event occurrence (e.g., death, disease recurrence).
2. **Data Preprocessing**: Clean and preprocess the data, addressing missing values, outliers, and inconsistencies. Convert time-to-event data into a suitable format for analysis, such as survival time or time intervals. Identify and handle censored observations, where the event of interest has not yet occurred by the end of the study period.
3. **Survival Analysis**: Perform survival analysis techniques to estimate survival probabilities and assess factors influencing survival outcomes. This includes the Kaplan-Meier curve, the log-rank test, and Cox proportional hazards regression (steps 4 to 6).
4. **Kaplan-Meier Curve**: Calculate and plot the Kaplan-Meier survival curve, which estimates the probability of survival over time for all patients or specific subgroups based on covariates.
5. **Log-Rank Test**: Conduct the log-rank test to compare survival curves between different groups (e.g., treatment groups, disease stages) and determine if there are statistically significant differences in survival times.
6. **Cox Proportional Hazards Regression**: Fit a Cox proportional hazards regression model to identify prognostic factors associated with survival outcomes while accounting for censoring. This allows for the estimation of hazard ratios and assessment of the effects of covariates on survival probability.
7. **Model Evaluation**: Evaluate the performance and validity of the survival analysis model, checking assumptions such as proportional hazards and model fit. Assess the robustness of results through sensitivity analyses and cross-validation techniques.
8. **Interpretation and Reporting**: Interpret the findings of the survival analysis, including the impact of covariates on survival outcomes and any significant differences observed between groups. Report results in a clear and concise manner, emphasizing clinical implications and recommendations for patient care and future research.
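
To make the Kaplan-Meier step concrete, here is a minimal from-scratch sketch of the product-limit estimate. It is an illustration only, not this notebook's implementation: the function name `kaplan_meier` and the follow-up data are hypothetical, and in practice a library such as lifelines would normally do this work.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time from T1 to the event or to last follow-up
    events:    1 if the event was observed, 0 if the record is censored
    """
    observed = Counter(t for t, e in zip(durations, events) if e == 1)
    exits = Counter(durations)   # every patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = observed.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients who survive time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Censored patients shrink the risk set without lowering the curve
        at_risk -= exits[t]
    return curve

# Hypothetical follow-up data in months; 0 marks a censored patient.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Censored patients still count in the risk set until their last follow-up, which is why the estimate differs from simply dividing survivors by the total.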

#### Ideation and Model Selection by Carolynne Bukenya

#### by Joe Eberle started on 05-01-2024 - https://github.com/JoeEberle/ - josepheberle@outlook.com

## Cancer Treatment Timelines:
- T1 - **De novo Cancer Diagnosis**  - The initial diagnosis of cancer in a patient, indicating the presence of malignant cells. 
- T2 - **Time from T1 to Pathology**  - The duration between the de novo cancer diagnosis (T1) and obtaining pathology results from tissue samples, confirming the cancer type and stage. 
- T3 - **Time from T1 to Biopsy**  - The interval between the de novo cancer diagnosis (T1) and the performance of a biopsy procedure to obtain tissue samples for diagnostic evaluation. 
- T4 - **Time from T1 to Cancer Screening**  - The period between the de novo cancer diagnosis (T1) and subsequent screening tests or imaging studies to evaluate cancer progression or recurrence. 
- T5 - **Time from T1 to Genomic Assay**  - The time elapsed between the de novo cancer diagnosis (T1) and conducting genomic assays or molecular testing to assess genetic alterations or biomarkers associated with the cancer. 
- T6 - **Time from T1 to Oncology Referral**  - The duration between the de novo cancer diagnosis (T1) and referral to an oncology specialist for further evaluation and treatment planning. 
- T7 - **Time from T1 to Care Plan**  - The period between the de novo cancer diagnosis (T1) and the development of a comprehensive care plan outlining the recommended treatment approach. 
- T8 - **Time from T1 to First Line Modality Treatment**  - The interval between the de novo cancer diagnosis (T1) and the initiation of the primary treatment modality, such as surgery, chemotherapy, or radiation therapy. 
- T9 - **Time from T1 to First Therapeutic Response**  - The time elapsed between the de novo cancer diagnosis (T1) and the initial therapeutic response, indicated by tumor shrinkage or reduction in malignancy. 
- T10 - **Time from T1 to Second Modality Treatment**  - The duration between the de novo cancer diagnosis (T1) and the initiation of additional treatment modalities, such as adjuvant therapy or targeted therapy, following the first-line treatment. 
- T11 - **Time from T1 to First Therapeutic Switch**  - The interval between the de novo cancer diagnosis (T1) and the decision to switch treatment modalities due to non-response or disease progression. 
- T12 - **Time from T1 to Remission**  - The period between the de novo cancer diagnosis (T1) and achieving remission, defined as the absence of detectable cancer cells for at least three months. 
- T13 - **Time from T1 to Cure**  - The duration between the de novo cancer diagnosis (T1) and achieving a state of cure, characterized by being cancer-free or having no evidence of cancer. 
- T14 - **Time from T1 to Death**  - The interval between the de novo cancer diagnosis (T1) and the occurrence of mortality, indicating the end of the patient's life due to cancer-related causes. 
- T15 - **Time from T1 to Palliation / Hospice**  - The time elapsed between the de novo cancer diagnosis (T1) and transitioning to palliative care or hospice services for end-of-life management. 
- T16 - **Time from T1 to Active Surveillance / Maintenance**  - The period between the de novo cancer diagnosis (T1) and the initiation of active surveillance or maintenance therapy to monitor disease progression and prevent recurrence. 
- T17 - **Time from T1 to Neoadjuvant Treatment**  - The duration between the de novo cancer diagnosis (T1) and the administration of neoadjuvant therapy, given before the primary treatment to improve outcomes. 
- T18 - **Time from T1 to Adjuvant Treatment**  - The interval between the de novo cancer diagnosis (T1) and the initiation of adjuvant therapy following primary treatment to eradicate residual disease and reduce the risk of recurrence. 
- T19 - **Time from T1 to Inpatient Hospitalization**  - The time elapsed between the de novo cancer diagnosis (T1) and hospital admission for inpatient care, such as surgery, chemotherapy, or complications management. 
- T20 - **Time from T1 to ED / Urgent Care Event**  - The period between the de novo cancer diagnosis (T1) and seeking emergency department or urgent care services for acute medical issues related to cancer or its treatment. 
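
Because T2 through T20 are all measured from T1, each interval reduces to a date subtraction. The sketch below shows one way this could be computed; the patient record, its dates, and the helper name `days_from_diagnosis` are hypothetical and do not come from the notebook's data.

```python
from datetime import date

# Hypothetical event dates for one patient; keys reuse the timing codes above.
patient_events = {
    "T1": date(2024, 1, 10),   # de novo cancer diagnosis
    "T3": date(2024, 1, 17),   # biopsy
    "T6": date(2024, 1, 24),   # oncology referral
    "T8": date(2024, 2, 14),   # first line modality treatment
    "T14": None,               # death not observed (censored)
}

def days_from_diagnosis(events):
    """Return days elapsed from T1 to every event that has a recorded date."""
    t1 = events["T1"]
    return {
        code: (when - t1).days
        for code, when in events.items()
        if code != "T1" and when is not None
    }

intervals = days_from_diagnosis(patient_events)
print(intervals)   # {'T3': 7, 'T6': 14, 'T8': 35}
```

Events with no recorded date are left out rather than set to zero, so they can later be treated as censored observations.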


In [35]:
first_install = False 
if first_install:
    !pip install lifelines

In [36]:
import os, json 
import schedule
from datetime import datetime
from lifelines import KaplanMeierFitter
from lifelines import CoxPHFitter
import matplotlib.pyplot as plt
import pandas as pd 
import numpy as np 
import quick_logger as ql
import talking_code as tc 
import file_manager as fm 
import time
print(f"Libraries Imported successfully on {datetime.now().date()} at {datetime.now().time()}") 

Libraries Imported successfully on 2024-05-01 at 17:20:00.205567


## Optional Step 0 - Initiate Configuration Settings and name the overall solution

In [3]:
import configparser 
config = configparser.ConfigParser()
cfg = config.read('config.ini')  

solution_name = 'cancer_care_timeline'
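
`ConfigParser.read` does not raise when the file is missing; it returns the list of files it actually parsed. A minimal sketch of checking that result and falling back to a default follows. The `[solution]` section and its `name` key are hypothetical, since the contents of `config.ini` are not shown here.

```python
import configparser

config = configparser.ConfigParser()
files_read = config.read("config.ini")   # list of files that were actually parsed
if not files_read:
    print("config.ini not found; falling back to defaults")

# Hypothetical section and key; fallback applies when either one is absent.
solution_name = config.get("solution", "name", fallback="cancer_care_timeline")
print(solution_name)
```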

## Optional Step 0 - Initiate Logging and debugging 

In [4]:
# Establish the Python Logger  
import logging # built in python library that does not need to be installed 
import quick_logger as ql

# Record the start time; it is passed to ql.calculate_process_performance at the end of the notebook
start_time = ql.set_start_time()
logging = ql.create_logger_start(solution_name, start_time) 
ql.set_speaking_log(False)
ql.set_speaking_steps(False)
ql.pvlog('info',f'Process {solution_name} Step 0 - Initializing and starting Logging Process.') 

Process solution_temple Step 0 - Initializing and starting Logging Process.


In [28]:
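def create_dataframe_markdown(df, term_column_number=0, definition_column_number=1, title=None, description_column_number=2):
    """Render one markdown bullet per DataFrame row, with an optional heading."""
    mark_down_text = ""
    # title defaults to None, so test truthiness before calling len()
    if title and len(title) >= 3:
        mark_down_text += f"## {title}\n"
    term, definition, description = (df.columns[i] for i in (term_column_number, definition_column_number, description_column_number))
    for _, row in df.iterrows():
        mark_down_text += f"- {row[term]} - **{row[definition]}**  - {row[description]} \n"
    return mark_down_text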

In [26]:
cancer_treatment_timelines = [
    {
      "timing_code": "T1",
      "timing_event": "De novo Cancer Diagnosis",
      "description": "The initial diagnosis of cancer in a patient, indicating the presence of malignant cells."
    },
    {
      "timing_code": "T2",
      "timing_event": "Time from T1 to Pathology",
      "description": "The duration between the de novo cancer diagnosis (T1) and obtaining pathology results from tissue samples, confirming the cancer type and stage."
    },
    {
      "timing_code": "T3",
      "timing_event": "Time from T1 to Biopsy",
      "description": "The interval between the de novo cancer diagnosis (T1) and the performance of a biopsy procedure to obtain tissue samples for diagnostic evaluation."
    },
    {
      "timing_code": "T4",
      "timing_event": "Time from T1 to Cancer Screening",
      "description": "The period between the de novo cancer diagnosis (T1) and subsequent screening tests or imaging studies to evaluate cancer progression or recurrence."
    },
    {
      "timing_code": "T5",
      "timing_event": "Time from T1 to Genomic Assay",
      "description": "The time elapsed between the de novo cancer diagnosis (T1) and conducting genomic assays or molecular testing to assess genetic alterations or biomarkers associated with the cancer."
    },
    {
      "timing_code": "T6",
      "timing_event": "Time from T1 to Oncology Referral",
      "description": "The duration between the de novo cancer diagnosis (T1) and referral to an oncology specialist for further evaluation and treatment planning."
    },
    {
      "timing_code": "T7",
      "timing_event": "Time from T1 to Care Plan",
      "description": "The period between the de novo cancer diagnosis (T1) and the development of a comprehensive care plan outlining the recommended treatment approach."
    },
    {
      "timing_code": "T8",
      "timing_event": "Time from T1 to First Line Modality Treatment",
      "description": "The interval between the de novo cancer diagnosis (T1) and the initiation of the primary treatment modality, such as surgery, chemotherapy, or radiation therapy."
    },
    {
      "timing_code": "T9",
      "timing_event": "Time from T1 to First Therapeutic Response",
      "description": "The time elapsed between the de novo cancer diagnosis (T1) and the initial therapeutic response, indicated by tumor shrinkage or reduction in malignancy."
    },
    {
      "timing_code": "T10",
      "timing_event": "Time from T1 to Second Modality Treatment",
      "description": "The duration between the de novo cancer diagnosis (T1) and the initiation of additional treatment modalities, such as adjuvant therapy or targeted therapy, following the first-line treatment."
    },
    {
      "timing_code": "T11",
      "timing_event": "Time from T1 to First Therapeutic Switch",
      "description": "The interval between the de novo cancer diagnosis (T1) and the decision to switch treatment modalities due to non-response or disease progression."
    },
    {
      "timing_code": "T12",
      "timing_event": "Time from T1 to Remission",
      "description": "The period between the de novo cancer diagnosis (T1) and achieving remission, defined as the absence of detectable cancer cells for at least three months."
    },
    {
      "timing_code": "T13",
      "timing_event": "Time from T1 to Cure",
      "description": "The duration between the de novo cancer diagnosis (T1) and achieving a state of cure, characterized by being cancer-free or having no evidence of cancer."
    },
    {
      "timing_code": "T14",
      "timing_event": "Time from T1 to Death",
      "description": "The interval between the de novo cancer diagnosis (T1) and the occurrence of mortality, indicating the end of the patient's life due to cancer-related causes."
    },
    {
      "timing_code": "T15",
      "timing_event": "Time from T1 to Palliation / Hospice",
      "description": "The time elapsed between the de novo cancer diagnosis (T1) and transitioning to palliative care or hospice services for end-of-life management."
    },
    {
      "timing_code": "T16",
      "timing_event": "Time from T1 to Active Surveillance / Maintenance",
      "description": "The period between the de novo cancer diagnosis (T1) and the initiation of active surveillance or maintenance therapy to monitor disease progression and prevent recurrence."
    },
    {
      "timing_code": "T17",
      "timing_event": "Time from T1 to Neoadjuvant Treatment",
      "description": "The duration between the de novo cancer diagnosis (T1) and the administration of neoadjuvant therapy, given before the primary treatment to improve outcomes."
    },
    {
      "timing_code": "T18",
      "timing_event": "Time from T1 to Adjuvant Treatment",
      "description": "The interval between the de novo cancer diagnosis (T1) and the initiation of adjuvant therapy following primary treatment to eradicate residual disease and reduce the risk of recurrence."
    },
    {
      "timing_code": "T19",
      "timing_event": "Time from T1 to Inpatient Hospitalization",
      "description": "The time elapsed between the de novo cancer diagnosis (T1) and hospital admission for inpatient care, such as surgery, chemotherapy, or complications management."
    },
    {
      "timing_code": "T20",
      "timing_event": "Time from T1 to ED / Urgent Care Event",
      "description": "The period between the de novo cancer diagnosis (T1) and seeking emergency department or urgent care services for acute medical issues related to cancer or its treatment."
    }
  ]
 

In [29]:
# Save the JSON variable to a file
with open("cancer_treatment_timelines.json", "w") as file:
    json.dump(cancer_treatment_timelines, file)

# Load the JSON file into a pandas DataFrame
with open("cancer_treatment_timelines.json", "r") as file:
    df = pd.DataFrame(json.load(file))
df.to_excel("cancer_treatment_timelines.xlsx") 
df_cancer_treatment_timelines = df
print(create_dataframe_markdown(df, 0, 1, "Cancer Treatment Timelines:"))

## Cancer Treatment Timelines:
- T1 - **De novo Cancer Diagnosis**  - The initial diagnosis of cancer in a patient, indicating the presence of malignant cells. 
- T2 - **Time from T1 to Pathology**  - The duration between the de novo cancer diagnosis (T1) and obtaining pathology results from tissue samples, confirming the cancer type and stage. 
- T3 - **Time from T1 to Biopsy**  - The interval between the de novo cancer diagnosis (T1) and the performance of a biopsy procedure to obtain tissue samples for diagnostic evaluation. 
- T4 - **Time from T1 to Cancer Screening**  - The period between the de novo cancer diagnosis (T1) and subsequent screening tests or imaging studies to evaluate cancer progression or recurrence. 
- T5 - **Time from T1 to Genomic Assay**  - The time elapsed between the de novo cancer diagnosis (T1) and conducting genomic assays or molecular testing to assess genetic alterations or biomarkers associated with the cancer. 
- T6 - **Time from T1 to Oncology Referra

In [21]:
definition = '''
## Solution Overview: 

**Survival time analysis**, also known as survival analysis or time-to-event analysis, is a statistical method used to analyze the time until an event of interest occurs, such as death, disease recurrence, or treatment failure. In healthcare, survival time analysis is a valuable tool for understanding the prognosis of patients, evaluating the effectiveness of treatments, and identifying factors that influence patient outcomes. By accounting for censored data and modeling time-dependent variables, survival time analysis provides insights into the natural history of diseases, aiding clinicians in making informed decisions about patient care and treatment strategies.
''' 
# Write the solution definition out to the solution_description.md file
file_name = "solution_description.md"
with open(file_name, 'w') as f:
    f.write(definition)

talking_code = False
if talking_code:
    tc.print_say(definition) 
else:
    print(definition)    


## Solution Overview: 

**Survival time analysis**, also known as survival analysis or time-to-event analysis, is a statistical method used to analyze the time until an event of interest occurs, such as death, disease recurrence, or treatment failure. In healthcare, survival time analysis is a valuable tool for understanding the prognosis of patients, evaluating the effectiveness of treatments, and identifying factors that influence patient outcomes. By accounting for censored data and modeling time-dependent variables, survival time analysis provides insights into the natural history of diseases, aiding clinicians in making informed decisions about patient care and treatment strategies.



In [31]:
steps_for_survival_analysis = [
    {
      "step": "Data Collection",
      "description": "Gather relevant data on cancer patients, including demographic information, disease characteristics, treatment modalities, and survival outcomes. Ensure that the data include time-to-event information, such as time of diagnosis, time of treatment initiation, and time of event occurrence (e.g., death, disease recurrence)."
    },
    {
      "step": "Data Preprocessing",
      "description": "Clean and preprocess the data, addressing missing values, outliers, and inconsistencies. Convert time-to-event data into a suitable format for analysis, such as survival time or time intervals. Identify and handle censored observations, where the event of interest has not yet occurred by the end of the study period."
    },
    {
      "step": "Survival Analysis",
      "description": "Perform survival analysis techniques to estimate survival probabilities and assess factors influencing survival outcomes, including Kaplan-Meier Curve, Log-Rank Test, and Cox Proportional Hazards Regression."
    },
    {
      "step": "Kaplan-Meier Curve",
      "description": "Calculate and plot the Kaplan-Meier survival curve, which estimates the probability of survival over time for all patients or specific subgroups based on covariates."
    },
    {
      "step": "Log-Rank Test",
      "description": "Conduct the log-rank test to compare survival curves between different groups (e.g., treatment groups, disease stages) and determine if there are statistically significant differences in survival times."
    },
    {
      "step": "Cox Proportional Hazards Regression",
      "description": "Fit a Cox proportional hazards regression model to identify prognostic factors associated with survival outcomes while accounting for censoring. This allows for the estimation of hazard ratios and assessment of the effects of covariates on survival probability."
    },
    {
      "step": "Model Evaluation",
      "description": "Evaluate the performance and validity of the survival analysis model, checking assumptions such as proportional hazards and model fit. Assess the robustness of results through sensitivity analyses and cross-validation techniques."
    },
    {
      "step": "Interpretation and Reporting",
      "description": "Interpret the findings of the survival analysis, including the impact of covariates on survival outcomes and any significant differences observed between groups. Report results in a clear and concise manner, emphasizing clinical implications and recommendations for patient care and future research."
    }
  ]
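
The Log-Rank Test step listed above compares the events observed in one group with the number expected if both groups shared the same survival curve. The following from-scratch sketch of the two-group test is meant only to show the arithmetic; the function name `logrank_test_sketch` and the sample data are hypothetical, and a library routine would normally be preferred.

```python
import math

def logrank_test_sketch(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value)."""
    group_a = list(zip(durations_a, events_a))
    group_b = list(zip(durations_b, events_b))
    event_times = sorted({t for t, e in group_a + group_b if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for d, _ in group_a if d >= t)   # still at risk in group A
        n_b = sum(1 for d, _ in group_b if d >= t)   # still at risk in group B
        d_a = sum(1 for d, e in group_a if d == t and e == 1)
        d_b = sum(1 for d, e in group_b if d == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n                    # expected events in A under no difference
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0                              # no information to separate the groups
    chi_square = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))   # chi-square tail, 1 degree of freedom
    return chi_square, p_value

# Hypothetical durations in months; 0 marks a censored patient.
chi_square, p_value = logrank_test_sketch(
    [1, 2, 3, 4], [1, 1, 1, 1],
    [5, 6, 7, 8], [1, 1, 0, 1],
)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the two survival curves differ, but the test says nothing about the size of the difference; that is what the hazard ratios from the Cox model are for.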

In [37]:
sample_file_name = 'survey lung cancer.csv'
df = pd.read_csv(sample_file_name)   # Kaggle data sets must be downloaded locally first
duplicate_rows = df.duplicated().sum()
print(f"The sample data file is '{sample_file_name}'")
print(f"The sample data set contains {df.shape[0]} rows and {df.shape[1]} columns")
print(f"The sample data set contains {duplicate_rows} duplicate rows, leaving {df.shape[0] - duplicate_rows} unique rows")
ql.pvlog('info', f'Process {solution_name} Step 1 - Read in Sample Data Set.')

df

FileNotFoundError: [Errno 2] No such file or directory: 'survey lung cancer.csv'
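
The error above means the sample file is not in the notebook's working directory. One way to fail more gracefully is to look for the file before reading it, as in the sketch below. The helper name `resolve_data_file` and the `data` folder are assumptions for illustration, not part of this project.

```python
from pathlib import Path

def resolve_data_file(file_name, search_dirs=(".", "data")):
    """Return the first existing path for file_name, or None if it is absent."""
    for folder in search_dirs:
        candidate = Path(folder) / file_name
        if candidate.is_file():
            return candidate
    return None

data_path = resolve_data_file("survey lung cancer.csv")
if data_path is None:
    print("Sample file not found; download it and place it next to the notebook.")
else:
    print(f"Reading {data_path}")
```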

## Step 0 - Process End - display log

In [5]:
# Calculate and classify the process performance 
status = ql.calculate_process_performance(solution_name, start_time) 
print(ql.append_log_file(solution_name))  

2024-03-15 10:39:07,381 - INFO - START solution_temple Start Time = 2024-03-15 10:39:07
2024-03-15 10:39:07,381 - INFO - solution_temple Step 0 - Initialize the configuration file parser
2024-03-15 10:39:07,382 - INFO - Process solution_temple Step 0 - Initializing and starting Logging Process.
2024-03-15 10:39:07,391 - INFO - PERFORMANCE solution_temple The total process duration was:0.01
2024-03-15 10:39:07,391 - INFO - PERFORMANCE solution_temple Stop Time = 2024-03-15 10:39:07
2024-03-15 10:39:07,391 - INFO - PERFORMANCE solution_temple Short process duration less than 3 Seconds:0.01
2024-03-15 10:39:07,391 - INFO - PERFORMANCE solution_temple Performance optimization is not reccomended



#### https://github.com/JoeEberle/ -- josepheberle@outlook.com