## **Project Description:**
**Evaluating Gaussian Naive Bayes for Digit Classification with Error Analysis**<br><br>
This project evaluates a Gaussian Naive Bayes (GNB) model for classifying handwritten digits from scikit-learn's `load_digits` dataset. It uses 5-fold cross-validation to estimate the overall error rate and analyzes the confusion matrix to find the digits with the highest misclassification rates. This shows where the model struggles and where improvements would matter most.

---

**Expected Outcome:**

- An estimate of the GNB classifier's error rate on digit classification, averaged over 5 cross-validation folds.
- A confusion matrix that tabulates, for each true digit (row), how often each digit was predicted (column).
- A per-digit error rate that identifies the digits the model classifies worst, which can guide further investigation and model improvements.
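
As a compact way to read the per-digit error analysis, the error rate of a digit can be written in terms of the confusion matrix. The symbols $C$ (confusion matrix, rows as true digits and columns as predicted digits) and $e_i$ (error rate of digit $i$) are notation assumed here for illustration, not names used elsewhere in this notebook:

$$
e_i = 1 - \frac{C_{ii}}{\sum_{j=0}^{9} C_{ij}}
$$

Under this definition, $e_i$ is the share of samples whose true label is $i$ that received any other label, that is, one minus the per-class recall.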

In [None]:
import numpy as np
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix

# Load the 8x8 digit images (flattened to 64 features) and their labels 0-9.
digit = load_digits()
X = digit.data
y = digit.target

# Mean 5-fold cross-validated error (1 - accuracy) on the 10-class problem.
cls = GaussianNB()
score = cross_val_score(cls, X, y, cv=5)
print("Average Error", 1 - score.mean(), "\n")

# Out-of-fold predictions for every sample.
# Matrix rows are true digits, columns are predicted digits.
y_predict = cross_val_predict(cls, X, y, cv=5)
matrix = confusion_matrix(y, y_predict)
print("Confusion Matrix", "\n", matrix, "\n")

# Per-digit error rate: share of each true digit's samples that were misclassified.
error_ = []
for i in range(10):
    error = 1 - matrix[i, i] / matrix[i, :].sum()
    error_.append(error)
    print(f"Error rate for digit {i}: {error}")

# Index of the largest error rate is the hardest digit.
max_error_digit = int(np.argmax(error_))
print("\n", f"Digit with highest error rate: {max_error_digit}")

Average Error 0.1930718043949241 

Confusion Matrix 
 [[174   0   0   0   2   0   0   1   0   1]
 [  0 137   8   0   0   0   5   4  18  10]
 [  0  13 112   1   1   2   1   0  45   2]
 [  0   2   6 133   0   8   0   7  22   5]
 [  3   2   2   0 142   1   3  25   3   0]
 [  0   1   0   3   2 158   1   8   5   4]
 [  0   1   1   0   1   3 174   0   1   0]
 [  0   0   1   0   2   1   0 174   1   0]
 [  0  20   3   0   1   5   0  10 133   2]
 [  1  11   0   8   2   4   1  17  23 113]] 

Error rate for digit 0: 0.022471910112359605
Error rate for digit 1: 0.24725274725274726
Error rate for digit 2: 0.36723163841807904
Error rate for digit 3: 0.27322404371584696
Error rate for digit 4: 0.2154696132596685
Error rate for digit 5: 0.13186813186813184
Error rate for digit 6: 0.03867403314917128
Error rate for digit 7: 0.027932960893854775
Error rate for digit 8: 0.23563218390804597
Error rate for digit 9: 0.37222222222222223

 Digit with highest error rate: 9
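
The per-digit error rates show how often each digit is missed, but not what it is mistaken for. The following is a minimal sketch of one possible follow-up, assuming the confusion matrix printed above is copied in by hand; the names `top_confusion`, `top_confusions`, and `absorbed_errors` are hypothetical helpers introduced only for this illustration.

```python
# Confusion matrix copied from the printed output above.
# Rows are true digits, columns are predicted digits.
matrix = [
    [174, 0, 0, 0, 2, 0, 0, 1, 0, 1],
    [0, 137, 8, 0, 0, 0, 5, 4, 18, 10],
    [0, 13, 112, 1, 1, 2, 1, 0, 45, 2],
    [0, 2, 6, 133, 0, 8, 0, 7, 22, 5],
    [3, 2, 2, 0, 142, 1, 3, 25, 3, 0],
    [0, 1, 0, 3, 2, 158, 1, 8, 5, 4],
    [0, 1, 1, 0, 1, 3, 174, 0, 1, 0],
    [0, 0, 1, 0, 2, 1, 0, 174, 1, 0],
    [0, 20, 3, 0, 1, 5, 0, 10, 133, 2],
    [1, 11, 0, 8, 2, 4, 1, 17, 23, 113],
]


def top_confusion(true_digit, row):
    """Return (predicted_digit, count) for the largest off-diagonal entry of a row.

    Hypothetical helper; ties resolve to the smallest predicted digit.
    """
    off_diagonal = [(pred, count) for pred, count in enumerate(row) if pred != true_digit]
    return max(off_diagonal, key=lambda pair: pair[1])


# Most frequent wrong prediction for each true digit.
top_confusions = {digit: top_confusion(digit, row) for digit, row in enumerate(matrix)}

# Number of misclassified samples that each predicted label receives
# (column sum without the diagonal entry).
absorbed_errors = {
    pred: sum(matrix[true][pred] for true in range(10) if true != pred)
    for pred in range(10)
}

for digit, (pred, count) in top_confusions.items():
    print(f"Digit {digit} is most often misclassified as {pred} ({count} samples)")

worst_target = max(absorbed_errors, key=absorbed_errors.get)
print(f"Label receiving the most wrong predictions: {worst_target} "
      f"({absorbed_errors[worst_target]} samples)")
```

On this matrix, digits 1, 2, 3, and 9 are all most often mistaken for 8, and the label 8 receives 118 of the 347 misclassified samples, so the confusion with 8 is a natural place to start further investigation.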
