# **EVALUATION AND CONCLUSION**

## Objectives

* Evaluate the outcomes of the modelling phase.
* Summarise key findings from the exploratory analysis and predictive models.
* Reflect on the limitations of the analysis and opportunities for future improvement.
* Present the final interactive Tableau dashboard, guided by the original wireframe.

## Inputs

* Finalised cleaned datasets:  
  - `discovery_clean.csv`  
  - `validation_clean.csv`  
* Outputs from Notebook 3 (modelling results, feature importance plots).
* Dashboard wireframe (`images/clinical_survival_wireframe.png`).

## Outputs

* Consolidated summary of findings from Notebooks 1–3.
* Interpretation of modelling outcomes in context.
* Link to Tableau Public dashboard containing interactive visualisations.

## Additional Comments

* This notebook serves as a narrative “wrap-up” for the project, ensuring all work is presented in an accessible format for both technical and non-technical audiences.
* The wireframe created during the planning phase is used here as a reference to ensure that the final Tableau dashboard delivers the intended insights.
* The Tableau dashboard complements Python-based analysis by providing interactive and shareable visualisations.


---

# Change working directory

* The notebooks are stored in a subfolder, so the working directory must be changed when a notebook is run in the editor.

We need to change the working directory from its current folder to its parent folder.
* `os.getcwd()` returns the current directory

In [1]:
import os
current_dir = os.getcwd()
current_dir

'c:\\Users\\petal\\Downloads\\CI-DBC\\vscode-projects\\clinical-survival-analysis\\jupyter_notebooks'

We want to make the parent of the current directory the new current directory.
* `os.path.dirname()` gets the parent directory
* `os.chdir()` sets the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'c:\\Users\\petal\\Downloads\\CI-DBC\\vscode-projects\\clinical-survival-analysis'
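
Cell `In [2]` moves up one level every time it runs, so re-running it would step out of the project root. A minimal sketch of a guarded variant follows, assuming the notebooks live in a folder named `jupyter_notebooks` (as in the output above). The helper name `project_root_from` is hypothetical and not part of the project.

```python
import os
from pathlib import Path


def project_root_from(cwd: Path, notebooks_dir: str = "jupyter_notebooks") -> Path:
    """Return the parent of cwd if cwd is the notebooks folder, else cwd unchanged."""
    return cwd.parent if cwd.name == notebooks_dir else cwd


# Safe to re-run: once the cwd is the project root, the helper returns it as-is.
os.chdir(project_root_from(Path.cwd()))
```

With this guard the cell can be executed repeatedly without leaving the project folder, which keeps the notebook runnable top-down.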

# Summary of Key Findings

**Exploratory Analysis (Notebook 2)**  
* Survival times varied widely across patients, with simplified staging (Early vs Advanced) revealing a clear trend toward shorter survival in advanced stages.  
* KRAS mutation status was simplified for interpretability. Patients with mutations showed trends toward poorer outcomes, although differences were not statistically significant in the validation cohort.  
* Weak correlations (< 0.20) among continuous variables indicated minimal multicollinearity issues, supporting their use in modelling.
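
The 0.20 threshold mentioned in the last bullet can be checked mechanically. The sketch below is an illustration only: the column names and toy values are hypothetical and are not taken from `discovery_clean.csv`, and it uses plain Python rather than whatever correlation routine Notebook 2 relied on.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairs_above(columns, threshold=0.20):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical toy columns, for illustration only.
toy = {
    "age": [50, 60, 70, 80],
    "survival_months": [30, 10, 10, 30],
    "followup_visits": [5, 6, 7, 8],
}
for name_a, name_b, r in pairs_above(toy):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

An empty result from `pairs_above` on the real continuous variables would correspond to the "weak correlations" finding reported above.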

**Hypothesis Testing (Notebook 3)**  
* ANOVA confirmed a significant difference in survival time between early and advanced stages (p = 0.0167).  
* Kaplan–Meier survival analysis suggested lower survival probabilities in KRAS-mutated patients, but the log-rank test (p = 0.0917) indicated this was not statistically significant.  
* These results suggest that stage remains a stronger determinant of survival than KRAS mutation status in this dataset.
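
For readers unfamiliar with the Kaplan–Meier method, the estimator behind those survival curves can be sketched in a few lines. This is a plain-Python illustration of the product-limit formula, independent of whichever library Notebook 3 used. The function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate: (time, survival probability) at each observed event time.

    durations: follow-up time per patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy cohort: four patients, the second one censored at month 2.
for time, prob in kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]):
    print(f"month {time}: S(t) = {prob:.3f}")
```

Censored patients lower the number at risk without lowering the survival estimate, which is why the method suits cohorts with incomplete follow-up.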

**Predictive Modelling (Notebook 3)**  
* A full logistic regression model with multiple staging variables failed to converge due to multicollinearity and sparse data in advanced stages.  
* A reduced model including **Stage IIIA**, **KRAS wild-type status**, **Sex (Male)**, and **Age** provided interpretable results without convergence issues.  
* Stage IIIA was the strongest predictor of mortality (OR ≈ 3.56), but with p = 0.0614 it fell just short of significance at the conventional 0.05 level.  
* KRAS wild-type status suggested a protective trend (OR ≈ 0.45) but was not statistically significant.  
* Sex had a wide confidence interval and was not statistically significant, indicating uncertainty about its effect. 
* Age had an odds ratio very close to 1 with a narrow confidence interval, suggesting little to no influence on mortality in this cohort.
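
The link between an odds ratio, its p-value, and the width of its confidence interval can be made concrete. The sketch below backs out an approximate 95% interval from the reported Stage IIIA figures. It assumes the p-value comes from a two-sided Wald test on the logistic regression coefficient, which the summary above does not state, so the result is an approximation rather than the interval Notebook 3 produced. The function name is hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def wald_ci_from_or_and_p(odds_ratio, p_value, level=0.95):
    """Approximate CI for an odds ratio, assuming a two-sided Wald p-value."""
    std_normal = NormalDist()
    beta = log(odds_ratio)                       # coefficient on the log-odds scale
    z = std_normal.inv_cdf(1 - p_value / 2)      # |z| implied by the p-value
    se = abs(beta) / z                           # implied standard error
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)
    return exp(beta - z_crit * se), exp(beta + z_crit * se)


# Reported values for Stage IIIA: OR of about 3.56, p = 0.0614.
lower, upper = wald_ci_from_or_and_p(3.56, 0.0614)
print(f"Stage IIIA approx. 95% CI: {lower:.2f} to {upper:.2f}")
```

Because the p-value is above 0.05, the implied interval includes 1, which is another way of stating that the Stage IIIA effect is suggestive but not confirmed in this cohort.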

**Overall Insights**  
* Disease stage is the most consistent predictor of mortality risk in this dataset.  
* KRAS mutation status shows clinically plausible trends but requires a larger cohort for robust statistical confirmation.  
* The limited sample size reduces statistical power, making findings exploratory rather than definitive.


---

# Limitations and Future Work

## Limitations
This project analysed a relatively small dataset (125 patients in total, 95 of them in the validation cohort), which limits statistical power and the ability to detect smaller effects.
Key constraints included:
### Sample Size and Statistical Power
* Limited sample size increases the risk of overfitting and reduces the ability to detect subtle associations.
### Sparse Representation in Advanced Disease Stages
* Some staging categories were underrepresented, contributing to wide confidence intervals in model estimates.
### Data Source Transparency
* The Kaggle dataset did not specify whether it was based on real or simulated records, so it is uncertain how far the findings generalise to real-world populations.
### Missing or Unknown Values
* Certain variables (e.g., mutation status) contained missing data, potentially biasing estimates.

All analyses and models should therefore be interpreted as exploratory and hypothesis-generating, rather than definitive clinical evidence.

## Future Work
To strengthen findings and extend the project:
### Validation in Larger Cohorts
* Replicate analyses in larger, multi-centre datasets to improve statistical power and generalisability.
### Expanded Feature Set
* Include additional clinical, demographic, and genomic variables to refine predictive models.
### Advanced Modelling Techniques
* Explore penalised regression, random forests, or survival analysis methods (e.g., Cox proportional hazards) to improve robustness.
### Dashboard Enhancements
* Integrate interactive filtering and more granular drill-downs in Tableau for greater user engagement.

As a next step, these exploratory findings and predictive models are translated into an interactive Tableau dashboard, enabling stakeholders to explore the data, filter by key variables, and visually compare survival outcomes across patient subgroups.

---

# Tableau Dashboard

**Wireframe Reference**  
The planned design for the Tableau dashboard, shown below, served as the blueprint for the final interactive visualisations:  
![Clinical Survival Analysis Wireframe](images/clinical_survival_wireframe.png)  

**Live Dashboard Link**  
The interactive dashboard for this project is available on Tableau Public: [View Dashboard](INSERT-YOUR-LINK-HERE)

---

# Conclusion

*(Here you would conclude by tying the findings back to the original business requirements, briefly noting how the deliverables address them.)*
