Protein Sequence Multiclass Classification (Machine Learning Project)
- Dataset: https://www.kaggle.com/shahir/protein-data-set
- Kaggle Notebook: https://www.kaggle.com/ammarhelali/protein-sequence-eda-multiclassification
- 48% GaussianNB
- 91% KNN
- 91.7% KNN with SMOTE
- 94% RandomForest (60 estimators)
- 98.5% RandomForest (225 estimators)
- 98.4% RandomForest (225 estimators) with SMOTE
- 98.1% RandomForest (150 estimators) with feature selection
- 75.6% XGBoost (300 estimators) with SMOTE
- 81.2% XGBoost (600 estimators) with SMOTE
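
A comparison like the one listed above could be set up roughly as follows. This is a minimal sketch and not the notebook's actual code: the toy sequences, the character n-gram features from `CountVectorizer`, the 75/25 split, and the use of accuracy as the score are all assumptions made for illustration. The helper names `make_toy_sequences` and `compare_models` are hypothetical. SMOTE and XGBoost are left out because they need the separate `imbalanced-learn` and `xgboost` packages; only the 225-estimator RandomForest setting is taken from the list.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


def make_toy_sequences(n_per_class=60, length=40, seed=0):
    """Build hypothetical amino-acid strings (NOT the Kaggle data).

    Each made-up class draws most residues from its own subset of letters,
    so the classes are separable by residue composition.
    """
    rng = random.Random(seed)
    favoured = {"classA": "ACDEFG", "classB": "HIKLMN", "classC": "PQRSTV"}
    background = "ACDEFGHIKLMNPQRSTVWY"
    seqs, labels = [], []
    for label, letters in favoured.items():
        for _ in range(n_per_class):
            seq = "".join(
                rng.choice(letters) if rng.random() < 0.7 else rng.choice(background)
                for _ in range(length)
            )
            seqs.append(seq)
            labels.append(label)
    return seqs, labels


def compare_models(seqs, labels, seed=0):
    """Fit several classifiers on character n-gram counts and return test accuracy."""
    train_s, test_s, y_train, y_test = train_test_split(
        seqs, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    # Assumed featurisation: counts of single residues and residue pairs.
    vec = CountVectorizer(analyzer="char", ngram_range=(1, 2), lowercase=False)
    # GaussianNB needs dense input, so convert the sparse count matrices.
    X_train = vec.fit_transform(train_s).toarray()
    X_test = vec.transform(test_s).toarray()

    models = {
        "GaussianNB": GaussianNB(),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "RandomForest (225 estimators)": RandomForestClassifier(
            n_estimators=225, random_state=seed
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    seqs, labels = make_toy_sequences()
    for name, acc in compare_models(seqs, labels).items():
        print(f"{acc:.1%} {name}")
```

Scores from this toy data say nothing about the real dataset; the sketch only shows the shape of the experiment, with every model fitted on the same split so the numbers are comparable.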