It is a multiclass classification problem. The data is used irish flower dataset. In this dataset the number of observations for each class is balanced. There are 210 observations with 4 input variables and 1 output variable. The variable names are as follows:
- Sepal length in cm
- Sepal width in cm
- Petal length in cm
- Petal width in cm
- Species
Let’s use this standard iris dataset to predict the 3 different species of the flowers using 4 different features: sepal length, sepal width, petal length, and petal width. The aim of this model was to predict the type of Iris flower based on the dimensions of their sepals and petals. This data is work well with the Gaussian Naive Bayes classifier with an accuracy of 95%.
Initially, we divided the datasets into training (keeping 70% data for training purposes) and test data (the rest 30% data for testing purposes). After that most important thing is to check which type of classification model work well this dataset. There are three types of Naive Bayes Classifications:
- Gaussian Naive Bayes
- Multinomial Naive Bayes
- Bernoulli Naive Bayes
We train the model with the above classification methods and then verify the accuracy score with the help of test data. The bar chart of the model accuracy is shown below;
After running the model, the accuracy score obtained for the different types of the Naive Bayes Classifications models are;
- Gaussian Naive Bayes >> 95%
- Multinomial Naive Bayes >> 93%
- Bernoulli Naive Bayes >> 25%
From the above bar chart, it is clear that the most accurate model is Gaussian Naive Bayes which work well with an accuracy of 95%. Now, working with this classification type, the precision, recall and f1- score is estimated as shown in the above report. It indicates the model accurately predict the setosa flower whereas the versicolor and virginica flowers gives a slight error during the prediction.
The above confusion matrix predicts well-given flowers based on their different features such as sepal length, sepal width, petal length, and petal width. The model accurately predict setosa flower 16 times after training the data. Similarly, the versicolor and virginica flowers model accurately predict 22 time. However, only two times model predict versicolor flower as virginica flower and one time predict virginica flower as versicolor flower. Hence, it accurately predict the flowers 60 times whereas unable to predict 3 times. Therefore, it work well as shown in the report.
Now, let's summarize the classification model and its accuracy. From the above confusion matrix, it is noticeable that only two times model predict versicolor flower as virginica flower and one time predict virginica flower as versicolor flower. Otherwise, classifier works well and predict flower same as actual flower depending upon different features such as sepal length, sepal width, petal length, and petal width. The accuracy score of the Gaussian Naive Bayes model is obtained 95%. Also, the precision, recall and f1- score are estimated as shown in the above report.