Machine Learning Diabetes Prediction using 4 Classifier Algorithms for Fitting the Data.

Xage0424/Diabetes-Prediction-ML

Machine Learning Diabetes Prediction

Algorithms Used:

  1. Logistic Regression
  2. Support Vector Classifier
  3. K-Nearest Neighbors Classifier
  4. Random Forest Classifier
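
As a rough illustration of how these four classifiers might be fit side by side with scikit-learn, here is a minimal sketch. The data is a synthetic stand-in produced by `make_classification` (an assumption, since the diabetes dataset itself is not shown here), and the hyperparameters are library defaults rather than the ones used in this repository.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic, mildly imbalanced stand-in data (assumption: not the real dataset).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.65, 0.35], random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The four algorithms listed above, with default settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Support Vector Classifier": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit each model on the training split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    scores[name] = model.score(x_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

On an imbalanced dataset, accuracy alone can be misleading, so metrics such as recall or F1 are worth checking as well.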

Python Libraries Used:

  1. NumPy
  2. Pandas
  3. Matplotlib
  4. Seaborn
  5. Scikit-learn

Exploratory Data Analysis and Visualization Summary:

  • There are no missing values in the dataset
  • The dataset is imbalanced
  • The correlation matrix shows no negative correlations and none close to 1.0
  • The features are skewed and have outliers
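
The checks summarized above could be reproduced with pandas along the following lines. This is only a sketch: the DataFrame is randomly generated, and the column names (`Glucose`, `BMI`, `Insulin`, `Outcome`) are assumptions chosen for illustration, not taken from the repository.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300

# Hypothetical stand-in frame; column names and distributions are assumptions.
df = pd.DataFrame({
    "Glucose": rng.normal(120, 30, n),
    "BMI": rng.normal(32, 7, n),
    "Insulin": rng.lognormal(4, 1, n),  # deliberately right-skewed
    "Outcome": rng.choice([0, 1], size=n, p=[0.65, 0.35]),
})

# Missing values per column.
missing_per_column = df.isnull().sum()

# Class balance of the target, as fractions.
class_share = df["Outcome"].value_counts(normalize=True)

# Pairwise Pearson correlations.
corr = df.corr()

# Skewness of each feature (values far from 0 indicate skew).
features = df.drop(columns="Outcome")
skewness = features.skew()

# Outlier counts per feature using the 1.5 * IQR rule.
q1, q3 = features.quantile(0.25), features.quantile(0.75)
iqr = q3 - q1
outlier_counts = ((features < q1 - 1.5 * iqr) | (features > q3 + 1.5 * iqr)).sum()

print(missing_per_column, class_share, corr, skewness, outlier_counts, sep="\n\n")
```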

Conclusion (Comments):

  • It seems that the SVC model has reached its maximum potential on an imbalanced dataset
  • Another option is to penalize the selected models or apply regularization and gradient descent techniques to them
  • Another option is to collect more data to address the class imbalance, or to use sampling techniques
  • After that, select a few more classification models based on the defined problem, type of data, and the expected outcome
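
Two of the ideas above, penalizing the model and resampling, might look like this in scikit-learn. This is a sketch under assumptions: the data is synthetic, `class_weight="balanced"` is one possible way to penalize errors on the minority class, and random oversampling with `sklearn.utils.resample` is one possible sampling technique. Neither is confirmed as what this repository used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.utils import resample

# Synthetic imbalanced data (assumption: roughly 80% negative, 20% positive).
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Option 1: penalize mistakes on the minority class more heavily.
weighted_svc = SVC(class_weight="balanced").fit(x_train, y_train)
weighted_recall = recall_score(y_test, weighted_svc.predict(x_test))

# Option 2: randomly oversample the minority class, in the training split only.
majority = x_train[y_train == 0]
minority = x_train[y_train == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
x_balanced = np.vstack([majority, minority_up])
y_balanced = np.concatenate([
    np.zeros(len(majority), dtype=int),
    np.ones(len(minority_up), dtype=int),
])
resampled_svc = SVC().fit(x_balanced, y_balanced)
resampled_recall = recall_score(y_test, resampled_svc.predict(x_test))

print(f"recall with class weights: {weighted_recall:.3f}")
print(f"recall with oversampling:  {resampled_recall:.3f}")
```

Resampling is applied after the split so that duplicated minority samples never leak into the test set.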

Noticed Mistakes:

  • Scaled all the features too early
  • Correct approach: fit the StandardScaler only on x_train, and only transform x_test with it
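
The scaling fix noted above can be sketched as follows: split first, then fit the scaler on the training split and reuse its statistics for the test split. The data is synthetic (an assumption); the variable names `x_train` and `x_test` follow the ones mentioned in the list.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the real dataset).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Split BEFORE scaling so test-set statistics never influence the scaler.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learn mean/std from train only
x_test_scaled = scaler.transform(x_test)        # reuse the train mean/std

# Train features are centered at 0; test features are only approximately so.
print(np.round(x_train_scaled.mean(axis=0), 6))
print(np.round(x_test_scaled.mean(axis=0), 3))
```

Wrapping the scaler and the classifier in a scikit-learn `Pipeline` is another way to get the same behavior, and it keeps cross-validation free of leakage as well.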