# Pancreatic Cancer Survival Prediction

## Introduction and Background

Pancreatic cancer is one of the most lethal malignancies worldwide, with a five-year survival rate of less than 10%, driven largely by late diagnosis and aggressive progression. Accurately predicting patient survival at the time of diagnosis remains a critical challenge in oncology, and without reliable predictions clinicians are limited in their ability to personalize treatment and allocate resources effectively. The growing availability of clinical and genomic data creates an opportunity to use machine learning to estimate patient outcomes and support evidence-based decision-making. This project aims to develop a model that predicts overall survival status in pancreatic cancer patients from real-world clinical and molecular features. By identifying high-risk individuals early, such a tool could support treatment planning, improve patient counseling, and help select candidates for advanced therapies or clinical trials.


## Problem Statement

Pancreatic cancer has one of the poorest prognoses among all malignancies, largely because it is detected late and responds poorly to treatment. Clinicians often lack reliable tools to predict patient survival at the time of diagnosis, which makes it difficult to personalize care or prioritize treatment pathways. Despite the availability of rich clinical and genomic data, survival estimation still relies heavily on generic staging systems that do not capture many patient-specific risk factors. This project seeks to address this gap by developing a machine learning model that predicts overall survival status (alive vs. deceased) from clinical and genomic features, providing data-driven support for clinical decision-making and patient management.


## Project Objectives

1.	Filter and isolate pancreatic cancer patient data from a larger clinical-genomic dataset to create a focused, high-quality subset for analysis.
2.	Preprocess and engineer features from clinical, pathological, and genomic variables such as tumor mutational burden (TMB), tumor purity, disease stage, and demographic data.
3.	Build a machine learning classifier that predicts the overall survival status (alive vs. deceased) of pancreatic cancer patients from features available at diagnosis or sample collection.
4.	Evaluate model performance using classification metrics such as accuracy, precision, recall, F1-score, and ROC-AUC to assess the model's reliability and potential clinical relevance.
5.	Interpret model outputs to identify the most influential features contributing to survival predictions, thereby providing insights into potential prognostic biomarkers.
6.	Demonstrate potential clinical utility by outlining how the model could support risk stratification, personalized treatment planning, and early patient counseling in real-world oncology settings.
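Objectives 1, 3 and 4 can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the project's actual pipeline: each patient is assumed to be a dict, and the column names (`cancer_type`, `os_status`) and values (`"Pancreatic Cancer"`, `"DECEASED"`, `"LIVING"`) are hypothetical placeholders to be mapped onto the real dataset's schema. It filters the pancreatic cohort, encodes deceased as 1, and computes the listed metrics from model scores at an assumed 0.5 threshold. Model training is left out, and the scores in the usage example are made up for illustration.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]

# Hypothetical column names and values; adjust to the real dataset's schema.
CANCER_TYPE_COL = "cancer_type"
STATUS_COL = "os_status"
PANCREATIC_LABEL = "Pancreatic Cancer"
DECEASED, LIVING = "DECEASED", "LIVING"


def filter_pancreatic(records: List[Record]) -> List[Record]:
    """Objective 1: keep pancreatic cancer rows that have a usable survival label."""
    return [
        r for r in records
        if r.get(CANCER_TYPE_COL) == PANCREATIC_LABEL
        and r.get(STATUS_COL) in (DECEASED, LIVING)
    ]


def encode_status(records: List[Record]) -> List[int]:
    """Objective 3 target: 1 = deceased, 0 = alive."""
    return [1 if r[STATUS_COL] == DECEASED else 0 for r in records]


def classification_metrics(y_true: Sequence[int], y_score: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Objective 4: accuracy, precision, recall, F1 and ROC-AUC for a binary task."""
    # Turn scores into hard predictions at the (assumed) threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # ROC-AUC: probability that a random deceased patient is scored higher
    # than a random alive patient (ties count as half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
    auc = sum(pairs) / len(pairs) if pairs else float("nan")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "roc_auc": auc,
    }


# Usage with toy rows (not real patient data).
rows = [
    {"cancer_type": "Pancreatic Cancer", "os_status": "DECEASED"},
    {"cancer_type": "Pancreatic Cancer", "os_status": "LIVING"},
    {"cancer_type": "Breast Cancer", "os_status": "LIVING"},   # dropped: other cancer
    {"cancer_type": "Pancreatic Cancer", "os_status": ""},      # dropped: no label
    {"cancer_type": "Pancreatic Cancer", "os_status": "DECEASED"},
    {"cancer_type": "Pancreatic Cancer", "os_status": "LIVING"},
]
cohort = filter_pancreatic(rows)
y = encode_status(cohort)           # [1, 0, 1, 0]
scores = [0.9, 0.4, 0.3, 0.6]       # made-up model scores
print(classification_metrics(y, scores))  # every metric is 0.5 for these toy scores
```

Computing ROC-AUC as the fraction of correctly ranked deceased/alive pairs is equivalent to the area under the ROC curve, but it is quadratic in cohort size. In a full pipeline, `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`) would normally replace this hand-written version; spelling it out here just makes each metric's definition explicit.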


## 1.	Data Acquisition & Filtering