- Logistic Regression
- Support Vector Classifier
- K-Nearest Neighbors Classifier
- Random Forest Classifier
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- There are no missing values in the dataset
- The dataset is imbalanced
- The correlation matrix shows no negative correlations and none close to 1.0
- The features are skewed and have outliers
- The SVC model seems to have reached its maximum potential on an imbalanced dataset
- Another option is to apply penalization, regularization, or gradient descent techniques to the selected models
- Another option is to collect more data to reduce the class imbalance, or to use sampling techniques
- After that, select a few more classification models based on the defined problem, type of data, and the expected outcome
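
One way to penalize mistakes on the minority class is class weighting. The sketch below is only an illustration under stated assumptions: it uses a synthetic dataset from `make_classification` (not the project's data) and scikit-learn's `class_weight="balanced"` option on `SVC`. The 90/10 class split and all variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced data (assumption): roughly 90% class 0, 10% class 1
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Stratify so both splits keep the same class ratio
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline SVC versus an SVC that weights classes inversely to their frequency
plain = SVC().fit(x_train, y_train)
weighted = SVC(class_weight="balanced").fit(x_train, y_train)

# Recall on the minority class (label 1) is usually the metric that suffers most
recall_plain = recall_score(y_test, plain.predict(x_test))
recall_weighted = recall_score(y_test, weighted.predict(x_test))
print(f"minority recall, plain SVC:    {recall_plain:.2f}")
print(f"minority recall, weighted SVC: {recall_weighted:.2f}")
```

Whether class weighting actually helps depends on the data, so it is worth comparing it against resampling on a held-out set before settling on either.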
- Scaled all the features too early, before splitting the data into training and test sets
- Fit the StandardScaler on x_train only, and only transform x_test with it
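
The fit/transform order can be sketched as follows. This is a minimal example on synthetic data, not the project's dataset. The names `x_train` and `x_test` follow the text above; the array shapes, the split ratio, and the random seed are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 3 features, binary target
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

# Split first, so the test set never influences the scaling statistics
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)  # learn mean/std from x_train only
x_test_scaled = scaler.transform(x_test)        # reuse the training mean/std
```

Scaling before the split lets test-set statistics leak into training, which tends to make evaluation scores look better than they would on genuinely unseen data.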