arminZolfaghari/Diabetes-Classification-XGBoost

Diabetes Classification with XGBoost

This project implements a diabetes classifier trained with the XGBoost library.
Once trained, the model predicts whether or not a person has diabetes.

Diabetes Dataset

The dataset contains more than 70,000 records collected from patients.
The dataset has 22 columns:
Diabetes_binary, HighBP, High Cholesterol, Cholesterol Check, BMI, Smoker, Stroke, HeartDiseaseorAttack, Physical Activity, Fruits, Veggies, Heavy Alcohol Consumption, Any Health Care, No Doctor because of Cost, General Health, Mental Health, Physical Health, Difficulty Walking, Sex, Age, Education, Income.

XGBoost

XGBoost is a decision-tree-based ensemble machine learning algorithm that uses a gradient-boosting framework. In prediction problems involving unstructured data (images, text, etc.), artificial neural networks tend to outperform other algorithms and frameworks. For small-to-medium structured (tabular) data such as this dataset, however, decision-tree-based algorithms are widely considered among the best-performing options.

The project has six steps:

  1. Import libraries
  2. Getting the data
  3. Preprocessing: load the dataset, rename columns, handle null values, convert categorical features to numerical features with one-hot encoding, and normalize with Min-Max scaling.
  4. Build the XGBoost classifier model: create an XGBClassifier, train the model, print the accuracy, plot the confusion matrix, and plot the precision-recall curve.
  5. Tune hyperparameters (using GridSearchCV)
  6. Visualization

Check the full description (in Persian)

Contact

If you have any questions, feel free to ask me:
📩 arminzolfagharid@gmail.com

About

Data Mining Course Project - Diabetes Classification with XGBoost - Winter 2022
