The following details classification on the Iris dataset using a range of ML models:
k-Nearest Neighbours (k-NN), Naive Bayes, Stochastic Gradient Descent, Decision Trees, Random Forest, SVM, Logistic Regression, Neural Networks
I do the same for regression on the USA House Pricing dataset using:
Linear Regression (with Polynomial Regression, Support Vector Regression (SVR), Random Forest Regression and regularised regression models (Ridge, Lasso) to follow)
All content for classification on the Iris dataset can be found in the following notebook:
The Iris dataset comprises three iris species, each sample described by four features: petal length, petal width, sepal width and sepal length. Some exploratory data analysis (EDA) is performed using a simple pairplot to indicate any groupings in the dataset.
Using k-Nearest Neighbours to predict the iris type from features of the plant
When using a k-NN model, it can be useful to find the optimal value of k. The model was trained for every k from 1 to 40, and the best value for this dataset was found to be k = 4.
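The search over k could be sketched as follows with scikit-learn. The sketch assumes a 70/30 train/test split, which gives 45 test samples and is consistent with the supports in the reports. The `random_state=42` seed is a guess. The sketch uses test-set error rate as the selection criterion, so with a different split or criterion the best k may not come out as 4. Choosing k on the test set also makes that set part of model selection. A separate validation split or cross-validation gives a less optimistic estimate.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# 30% of 150 samples -> 45 test samples; the seed is an assumption
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

error_rates = []
for k in range(1, 41):  # k = 1..40 inclusive
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    error_rates.append(np.mean(y_pred != y_test))  # fraction misclassified

# argmin returns the first minimum, so ties resolve to the smallest k
best_k = int(np.argmin(error_rates)) + 1
print(f"best k = {best_k}, error rate = {error_rates[best_k - 1]:.3f}")
```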
The model with a k of 4 achieved perfect accuracy:
                 precision    recall  f1-score   support
    Iris-setosa       1.00      1.00      1.00        11
Iris-versicolor       1.00      1.00      1.00        19
 Iris-virginica       1.00      1.00      1.00        15
       accuracy                           1.00        45
      macro avg       1.00      1.00      1.00        45
   weighted avg       1.00      1.00      1.00        45
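The following sketch fits the final model with k = 4 and produces a per-class report in this format using scikit-learn's `classification_report`. The split and seed are assumptions, so the supports and scores may not match these exact figures.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42  # assumed split
)

knn = KNeighborsClassifier(n_neighbors=4)  # k chosen by the search over 1..40
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Per-class precision, recall, F1 and support, plus accuracy and averages
report = classification_report(y_test, y_pred, target_names=list(iris.target_names))
print(report)
acc = accuracy_score(y_test, y_pred)
```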
Using Naive Bayes to predict the iris type from features of the plant
The Naive Bayes model's accuracy is high (0.98) but not perfect:
              precision    recall  f1-score   support
           1       1.00      1.00      1.00        19
           2       1.00      0.92      0.96        13
           3       0.93      1.00      0.96        13
    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
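The source does not say which Naive Bayes variant was used. For continuous measurements like these, Gaussian Naive Bayes is the usual choice, so this sketch assumes scikit-learn's `GaussianNB`. That variant models each feature, within each class, as an independent normal distribution. The split and seed are assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42  # assumed split
)

# Fits one mean and one variance per feature per class; features are
# treated as conditionally independent given the class
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

print(np.round(gnb.theta_, 2))  # per-class feature means, shape (3, 4)
print(classification_report(y_test, y_pred))
acc = accuracy_score(y_test, y_pred)
```

The independence assumption ignores the strong correlation between petal length and petal width. This is one plausible reason a Naive Bayes model can misclassify a few borderline samples.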
Using Stochastic Gradient Descent to predict the iris type from features of the plant
              precision    recall  f1-score   support
           1       1.00      1.00      1.00        19
           2       1.00      0.69      0.82        13
           3       0.76      1.00      0.87        13
    accuracy                           0.91        45
   macro avg       0.92      0.90      0.89        45
weighted avg       0.93      0.91      0.91        45
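scikit-learn's `SGDClassifier` fits a linear model by stochastic gradient descent and is sensitive to feature scale. This sketch therefore standardises the features in a pipeline. The source does not show its preprocessing, loss or seed, so all three are assumptions here. With the default hinge loss, this amounts to a linear SVM trained with SGD. Results can vary with `random_state` because the training samples are shuffled each epoch.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42  # assumed split
)

# Standardise features, then fit a linear classifier with SGD (default hinge loss);
# inside the pipeline the scaler is fitted on the training data only
sgd = make_pipeline(StandardScaler(), SGDClassifier(random_state=42))
sgd.fit(X_train, y_train)
y_pred = sgd.predict(X_test)

print(classification_report(y_test, y_pred))
acc = accuracy_score(y_test, y_pred)
```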
Using a Decision Tree to predict the iris type from features of the plant
              precision    recall  f1-score   support
           1       1.00      1.00      1.00        19
           2       1.00      1.00      1.00        13
           3       1.00      1.00      1.00        13
    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
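This sketch uses scikit-learn's `DecisionTreeClassifier` and assumes default hyperparameters and the same assumed split as above. A tree is a sequence of threshold tests, so `export_text` can print the learned rules. This shows which measurements the tree uses to separate the species.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42  # assumed split
)

# The seed makes the choice among equally good splits reproducible
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

# Human-readable if/else rules learned from the training data
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
acc = accuracy_score(y_test, y_pred)
```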
Using a Random Forest to predict the iris type from features of the plant
              precision    recall  f1-score   support
           1       1.00      1.00      1.00        19
           2       1.00      1.00      1.00        13
           3       1.00      1.00      1.00        13
    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
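A random forest averages many decision trees. Each tree is trained on a bootstrap sample and considers a random subset of features at each split. This sketch assumes scikit-learn's `RandomForestClassifier` with 100 trees (the library default) and the same assumed split. It prints the impurity-based feature importances.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42  # assumed split
)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

# Mean decrease in impurity per feature, normalised to sum to 1
for name, importance in zip(iris.feature_names, forest.feature_importances_):
    print(f"{name:<20} {importance:.3f}")
acc = accuracy_score(y_test, y_pred)
```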
Using an SVM to predict the iris type from features of the plant
              precision    recall  f1-score   support
           1       1.00      1.00      1.00        19
           2       1.00      1.00      1.00        13
           3       1.00      1.00      1.00        13
    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
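This sketch assumes scikit-learn's `SVC` with its default RBF kernel. It is wrapped in a pipeline with `StandardScaler` because SVMs are sensitive to feature scale. The source states neither the kernel nor the scaling, so both are assumptions, as is the split. `SVC` handles the three classes internally with a one-vs-one scheme. A confusion matrix shows exactly which classes get mixed up.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42  # assumed split
)

# RBF kernel with the library defaults C=1.0 and gamma="scale"
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)

# Rows are true classes, columns predicted classes; off-diagonal entries are errors
cm = confusion_matrix(y_test, y_pred)
print(cm)
acc = accuracy_score(y_test, y_pred)
```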
Using Logistic Regression to predict the iris type from features of the plant
              precision    recall  f1-score   support
           1       1.00      1.00      1.00        19
           2       1.00      1.00      1.00        13
           3       1.00      1.00      1.00        13
    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
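This sketch assumes scikit-learn's `LogisticRegression`. In recent versions, with the default `lbfgs` solver, it fits a multinomial (softmax) model for multiclass targets. `max_iter` is raised from its default of 100 because lbfgs may not converge on unscaled features. The value 1000 is an arbitrary assumption, as is the split. `predict_proba` exposes the class probabilities behind each prediction.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42  # assumed split
)

logreg = LogisticRegression(max_iter=1000)  # extra iterations to help lbfgs converge
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)

# Each row holds the predicted probability of each of the three classes
proba = logreg.predict_proba(X_test)
print(proba[:5].round(3))
acc = accuracy_score(y_test, y_pred)
```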
Using a Neural Network with an L-BFGS (lbfgs) optimiser to predict the iris type from features of the plant
              precision    recall  f1-score   support
           1       1.00      1.00      1.00        19
           2       0.93      1.00      0.96        13
           3       1.00      0.92      0.96        13
    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
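This sketch assumes scikit-learn's `MLPClassifier` with `solver="lbfgs"`, the optimiser named for this model. The single hidden layer of 10 units, `max_iter=1000`, feature scaling, seed and split are all hypothetical choices. L-BFGS is a full-batch quasi-Newton method. The scikit-learn documentation notes that it can converge faster and perform better than `adam` on small datasets like this one.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42  # assumed split
)

# One hidden layer of 10 units (hypothetical size), trained with full-batch L-BFGS
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(10,), solver="lbfgs", max_iter=1000, random_state=42
    ),
)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)

print(classification_report(y_test, y_pred))
acc = accuracy_score(y_test, y_pred)
```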