
burcuozek/datamining-breastcancer


Leveraging Data Science and Machine Learning Algorithms To Predict Breast Cancer

Breast cancer happens when cells in your breast grow and divide uncontrollably, creating a mass of tissue called a tumor.

In the United States, breast cancer is the second-leading cause of cancer death in women, after lung cancer. It’s also the leading cause of cancer death among women ages 35 to 54.

In this project, the Breast Cancer Data Set from the UCI Machine Learning Repository is used to predict whether a cell is malignant. The data set has nine variables and 286 samples.

  • As a first step, the data are cleaned through missing-value and outlier analysis.
  • To better understand the data, univariate, multivariate, and correlation analyses are conducted.
  • Various visualization techniques are applied to the data (scatter plots, box plots, heat maps).
  • Variables are standardized.
  • Interaction terms are added to the model.
  • For dimension reduction, variables are selected for the logistic regression model using stepwise and backward selection.
  • Various machine learning models are applied to predict whether a cell is malignant or benign (that is, whether a person has breast cancer).
  • To select the best model, the confusion matrix is calculated and the ROC curve is drawn; from these, sensitivity, specificity, F-score, and accuracy are computed.
  • Logistic regression is chosen as the best model.
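The README does not say which cleaning rules were used. As a minimal sketch, assuming missing entries are stored as `None` and imputed with the column mode, that outliers are flagged with Tukey's 1.5 × IQR fences, and that standardization means z-scoring, the cleaning and standardization steps might look like this in plain Python (`impute_mode`, `iqr_outliers`, and `standardize` are hypothetical helper names, and the column values are made up):

```python
import statistics

def impute_mode(values):
    """Replace missing entries (None) with the most frequent observed value (assumed rule)."""
    observed = [v for v in values if v is not None]
    fill = statistics.mode(observed)
    return [fill if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

def standardize(values):
    """z-score each value: subtract the mean, divide by the sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

# Usage on a hypothetical numeric column with one missing entry and one extreme value
column = [3, 4, None, 4, 5, 6, 40]
filled = impute_mode(column)
print(filled)                # [3, 4, 4, 4, 5, 6, 40]
print(iqr_outliers(filled))  # [6]
print(standardize(filled))
```

Whether a flagged value is dropped, capped, or kept is a separate judgment; the sketch only identifies candidates.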
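The README names logistic regression as the selected model but does not show the fitting code or the library used. The following is an illustrative from-scratch sketch, not the author's implementation: it appends a pairwise interaction term (the product of two features) and fits a logistic regression by batch gradient descent on the mean log-loss. The feature values, the label coding (1 = malignant), the learning rate, and the epoch count are all assumptions, and the stepwise/backward selection step is not shown.

```python
import math

def add_interaction(rows, i, j):
    """Append the product of features i and j to every row (an interaction term)."""
    return [row + [row[i] * row[j]] for row in rows]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(b, w, X):
    """P(y = 1 | x) for each row under the model with bias b and weights w."""
    return [sigmoid(b + sum(wk * xk for wk, xk in zip(w, x))) for x in X]

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Minimize mean log-loss by batch gradient descent; returns (bias, weights)."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        probs = predict_proba(b, w, X)
        # Derivative of the log-loss with respect to each row's logit is (p_i - y_i)
        errs = [pi - yi for pi, yi in zip(probs, y)]
        b -= lr * sum(errs) / n
        w = [w[k] - lr * sum(e * x[k] for e, x in zip(errs, X)) / n for k in range(p)]
    return b, w

# Usage: two hypothetical standardized features plus their interaction
X = add_interaction([[-1.2, -0.8], [-0.5, -1.0], [-0.9, 0.3],
                     [0.4, 1.1], [1.0, 0.6], [1.3, 1.4]], 0, 1)
y = [0, 0, 0, 1, 1, 1]
b, w = fit_logistic(X, y)
probs = predict_proba(b, w, X)
print([round(p, 3) for p in probs])
```

Gradient descent is used here only because it fits in a few lines; statistical packages typically fit logistic regression by maximum likelihood with Newton-type iterations, which also yield the coefficient p-values that stepwise selection relies on.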
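The model-selection metrics can be sketched as follows, assuming label 1 means malignant, that "F-score" refers to the usual F1 score (β = 1), and that both classes appear in the data. Sensitivity, specificity, F1, and accuracy come from the confusion matrix at a single decision threshold; the ROC curve instead sweeps the threshold down through every predicted score, and its area (AUC) is computed here with the trapezoidal rule. The function names, labels, and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for 0/1 labels, treating 1 (malignant) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def _ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true negative rate
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "f1": f1, "accuracy": accuracy}

def roc_points(y_true, scores):
    """(FPR, TPR) pairs from lowering the threshold through each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        pred = [1 if s >= thr else 0 for s in scores]
        tp, fp, _, _ = confusion_counts(y_true, pred)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Usage with hypothetical labels and predicted probabilities, thresholded at 0.5
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(classification_metrics(y_true, y_pred))
pts = roc_points(y_true, scores)
print(auc(pts))
```

Comparing candidate models by AUC avoids committing to one threshold, while the threshold-based metrics show how a chosen cut-off trades sensitivity against specificity.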
