A Jupyter notebook for binary classification of breast cancer using XGBoost with Bayesian optimization.

abhinav-neil/classification-xgboost-bayesopt

Classification with XGBoost and Bayesian Optimization

Data

The dataset contains numeric indicators of breast cancer, along with the diagnosis (benign or malignant). (Source) The task is to train a binary classification model that predicts the diagnosis from these indicators.

Model

We use XGBoost, with hyperparameters tuned by Bayesian optimization. The notebook proceeds in these stages:

  • Load breast cancer dataset
    • Cleaning & EDA
    • Visualization
  • Feature engineering
    • Prepare features and targets, and drop non-informative features
    • Split into train and test sets
  • Bayesian optimization
    • Define the XGBoost classifier, search space, evaluation metric & cross-validation strategy (stratified k-fold)
    • Run optimization for n iterations to find best parameters
  • Model training and analysis
    • Configure XGBoost model with best parameters and fit to train set
    • Draw tree graphs
    • Plot feature importance
  • Evaluation
    • Use fitted model to make predictions on test set
    • Compute mean accuracy
    • Draw confusion matrix
