Estimation of vital status of patients with ovarian cancer using Machine Learning models

Kontilenia/Machine-Learning

Machine Learning

This code is part of a team assignment for the Machine Learning course of the MSc programme in Data Science and Machine Learning at the National Technical University of Athens.

The objective of our paper is to compare different Machine Learning models for accurately predicting the vital status of patients with high-grade serous ovarian cancer. The dataset contains 488 patients, each described by 11 ovarian-cancer-related attributes and their vital status. The trained models are K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression, Random Forest and XGBoost classifiers.

Missing values were filled with KNN imputation, attributes were selected and reduced with a variance threshold and Principal Component Analysis (PCA), and features were normalized with Min-Max scaling. The models were validated with 5-fold cross-validation, and their hyperparameters were chosen with Grid Search.

Model performance is evaluated primarily with accuracy and the Area Under the ROC Curve (AUC); precision, recall and F1-score were examined only as secondary metrics. The best classifier in terms of accuracy is Logistic Regression, with a score of 77.3%, and in terms of AUC it is XGBoost, with a score of 73.47%.
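The workflow described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the real 488-patient dataset is not included here, so `make_synthetic_data` is a hypothetical helper that generates a synthetic stand-in of the same shape (488 rows, 11 attributes, binary label) with randomly inserted missing values. The step order (KNN imputation, variance threshold, Min-Max scaling, PCA) is an assumption, since the README lists the techniques but not their sequence, and the hyperparameter grids are hypothetical. The names `build_pipeline`, `MODELS`, `SCORING` and `run_all` are also illustrative. XGBoost is omitted to keep the sketch dependency-light; `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be added to `MODELS` the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def make_synthetic_data(seed=0, missing_rate=0.05):
    """Hypothetical stand-in for the real data: 488 rows, 11 attributes, binary label."""
    X, y = make_classification(n_samples=488, n_features=11, n_informative=6,
                               random_state=seed)
    rng = np.random.default_rng(seed)
    # Blank out random entries so the imputation step has something to do.
    X[rng.random(X.shape) < missing_rate] = np.nan
    return X, y


def build_pipeline(classifier):
    """Chain preprocessing and model so every step is refit inside each CV fold."""
    return Pipeline([
        ("impute", KNNImputer(n_neighbors=5)),           # fill NaNs from nearest rows
        ("variance", VarianceThreshold(threshold=0.0)),  # drop zero-variance attributes
        ("scale", MinMaxScaler()),                       # map to [0, 1] on the training fold
        ("pca", PCA()),                                  # n_components is tuned by the grid
        ("clf", classifier),
    ])


# Hypothetical hyperparameter grids; the project's actual grids are not given.
MODELS = {
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [5, 11, 21]}),
    "svm": (SVC(), {"clf__C": [0.1, 1.0, 10.0]}),
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "rf": (RandomForestClassifier(random_state=0), {"clf__n_estimators": [100, 200]}),
}

# Primary metrics (accuracy, AUC) plus the secondary ones mentioned in the README.
SCORING = {"accuracy": "accuracy", "roc_auc": "roc_auc",
           "precision": "precision", "recall": "recall", "f1": "f1"}


def run_all(X, y, refit="accuracy"):
    """Grid-search each model with 5-fold stratified CV; return mean CV scores."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    results = {}
    for name, (clf, grid) in MODELS.items():
        param_grid = {"pca__n_components": [5, 8, 11], **grid}
        search = GridSearchCV(build_pipeline(clf), param_grid, cv=cv,
                              scoring=SCORING, refit=refit)
        search.fit(X, y)
        best = search.best_index_  # best candidate according to the `refit` metric
        results[name] = {
            metric: float(search.cv_results_[f"mean_test_{metric}"][best])
            for metric in SCORING
        }
        results[name]["params"] = search.best_params_
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, res in run_all(X, y).items():
        print(f"{name}: accuracy={res['accuracy']:.3f}  AUC={res['roc_auc']:.3f}")
```

Wrapping the preprocessing in a `Pipeline` means imputation, scaling and PCA are fitted only on each training fold, so the validation folds do not leak into preprocessing. Passing `refit="roc_auc"` to `run_all` selects each model's hyperparameters by AUC instead of accuracy, which can lead to different chosen settings.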