Skip to content

Predictive Modelling of Pathological Complete Response Classification and Relapse-Free Survival Regression in Cancer Patients

Notifications You must be signed in to change notification settings

chboey/PathoPredict

Repository files navigation

Predictive Modelling of Pathological Complete Response and Relapse-Free Survival in Cancer Patients

Overview

This repository contains the implementation of a machine learning project aimed at enhancing the accuracy of predicting Pathological Complete Response (PCR) and Relapse-Free Survival (RFS) in chemotherapy-treated breast cancer patients. The project analyzes a dataset with clinical and MRI features across 400 patients to provide additional prognosis information for personalized treatment recommendations.

Introduction

This project aims to improve the predictions of PCR and RFS in breast cancer patients using various machine learning techniques. Accurate predictions can guide treatment decisions and provide better prognostic information.

Dataset

The dataset used in this project is publicly available from The American College of Radiology Imaging Network.

Feature Type Number of Features
Clinical 10
MRI-based 107
Total Records 400

Methods

Data Preprocessing

  • Handling Missing Values: Rows with missing values denoted by '999' were removed. Median imputation was used for remaining missing values.
  • Feature Selection: ANOVA was used to assess the importance of each feature. Features with an f-score above 3.5 were selected.

Model Selection

Model Type Model
Classification Logistic Regression
AdaBoost Classifier
Support Vector Machine (SVM)
Voting Classifier
Model Type Model
Regression Random Forest Regressor
Support Vector Regressor (SVR)
LASSO Regressor
Ridge Regressor
AdaBoost Regressor

Model Evaluation

Evaluation Type Metric
Classification Balanced Accuracy
Precision
Regression Mean Absolute Error (MAE)
Tuning Grid Search

Results

The two best results from classification and regression task was evaluated.

Classification Results

Model Balanced Accuracy Precision
AdaBoost Classifier 73.43% 70.59%
Voting Classifier 70.68% 83.33%

Regression Results

Model Mean Absolute Error (MAE)
Random Forest Regressor 20.43
LASSO Regressor 20.84

Conclusion

The AdaBoost Classifier was the best model for predicting PCR due to its balanced accuracy and precision. The Random Forest Regressor was the most effective for predicting RFS with the lowest MAE and robust performance.

About

Predictive Modelling of Pathological Complete Response Classification and Relapse-Free Survival Regression in Cancer Patients

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages