# Classification Metrics

A classification metric is a number that measures how well a machine learning model assigns observations to classes.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
link = 'https://raw.githubusercontent.com/kb22/Heart-Disease-Prediction/master/dataset.csv'
df = pd.read_csv(link)

df.sample(5)

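| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 161 | 55 | 0 | 1 | 132 | 342 | 0 | 1 | 166 | 0 | 1.2 | 2 | 0 | 2 | 1 |
| 10 | 54 | 1 | 0 | 140 | 239 | 0 | 1 | 160 | 0 | 1.2 | 2 | 0 | 2 | 1 |
| 6 | 56 | 0 | 1 | 140 | 294 | 0 | 0 | 153 | 0 | 1.3 | 1 | 0 | 2 | 1 |
| 145 | 70 | 1 | 1 | 156 | 245 | 0 | 0 | 143 | 0 | 0.0 | 2 | 0 | 2 | 1 |
| 270 | 46 | 1 | 0 | 120 | 249 | 0 | 0 | 144 | 0 | 0.8 | 2 | 0 | 3 | 0 |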


In [3]:
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=2002)

In [5]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

In [6]:
lr_clf = LogisticRegression()
dt_clf = DecisionTreeClassifier()

In [7]:
lr_clf.fit(X_train,y_train)
dt_clf.fit(X_train,y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [8]:
lr_pred = lr_clf.predict(X_test)
dt_pred = dt_clf.predict(X_test)

## Types of Classification Metrics

1. Accuracy
2. Confusion Matrix
3. Precision
4. Recall
5. F1 Score
6. ROC Curve
7. AUC (Area Under the ROC Curve)
8. PR Curve

![](https://ai-ml-analytics.com/wp-content/uploads/2020/10/Classification-matrix.png)

## Accuracy

Accuracy measures how often the classifier predicts correctly. It is the ratio of the number of correct predictions to the total number of predictions.

$$\text{Accuracy } = \frac{\text{Correct Predictions}}{\text{Total Predictions}} = \frac{TP + TN}{TP+ TN + FP + FN}$$
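
As a quick check of the formula, the sketch below plugs in the four counts from the logistic regression confusion matrix printed later in this notebook. It assumes scikit-learn's layout `[[TN, FP], [FN, TP]]` (rows are actual classes, columns are predicted classes); the variable names are illustrative only.

```python
# Counts read from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]
tn, fp = 31, 13
fn, tp = 7, 40

correct = tp + tn             # predictions that match the actual class
total = tp + tn + fp + fn     # all predictions
accuracy = correct / total

print(f"Accuracy = {correct}/{total} = {accuracy:.4f}")  # Accuracy = 71/91 = 0.7802
```

The result agrees with the value `accuracy_score` reports for logistic regression.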

In [9]:
from sklearn.metrics import accuracy_score, confusion_matrix

In [10]:
print(f'Accuracy of Logistic Regression {accuracy_score(y_test,lr_pred)}')
print(f'Accuracy of Decision Tree {accuracy_score(y_test,dt_pred)}')

Accuracy of Logistic Regression 0.7802197802197802
Accuracy of Decision Tree 0.6923076923076923


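**Note:** Accuracy paints a misleading picture when the dataset is imbalanced: a model that always predicts the majority class can score highly without ever identifying the minority class.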
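
To make the imbalance problem concrete, here is a minimal sketch with made-up labels (95 negatives, 5 positives) and a hypothetical "model" that always predicts the majority class. None of these numbers come from the datasets used in this notebook.

```python
# Hypothetical imbalanced test set: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5

# A "model" that ignores its input and always predicts the majority class
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
positives_found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(accuracy)         # 0.95
print(positives_found)  # 0
```

The accuracy looks strong even though not a single positive case is detected, which is why the metrics in the following sections are needed.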

## Confusion Matrix

A confusion matrix summarizes the performance of a classification model whose output can be two or more classes. It is a table that counts every combination of predicted and actual class.

- True Positive (TP): We predicted positive and the actual class is positive.
- True Negative (TN): We predicted negative and the actual class is negative.
- False Positive (FP, Type 1 Error): We predicted positive but the actual class is negative.
- False Negative (FN, Type 2 Error): We predicted negative but the actual class is positive.

![](https://cdn.analyticsvidhya.com/wp-content/uploads/2020/04/Example-Confusion-matrix.png)

[Machine Learning Fundamentals: The Confusion Matrix](https://youtu.be/Kdsp6soqA7o?si=rrvZ6a1zlgwha56W)
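
The four outcomes listed above can be tallied by hand. The sketch below uses made-up labels and only the standard library; it arranges the counts with rows as actual classes and columns as predicted classes, which is the layout `confusion_matrix` returns for binary labels `0` and `1`.

```python
from collections import Counter

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# Count each (actual, predicted) pair
pairs = Counter(zip(y_true, y_pred))
tp = pairs[(1, 1)]  # predicted positive, actually positive
tn = pairs[(0, 0)]  # predicted negative, actually negative
fp = pairs[(0, 1)]  # predicted positive, actually negative (Type 1 error)
fn = pairs[(1, 0)]  # predicted negative, actually positive (Type 2 error)

# Rows = actual class, columns = predicted class
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)  # [[3, 1], [1, 3]]
```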

In [11]:
print('Confusion Matrix for Logistic Regression')
confusion_matrix(y_test,lr_pred)

Confusion Matrix for Logistic Regression


array([[31, 13],
       [ 7, 40]], dtype=int64)

In [12]:
print('Confusion Matrix for Decision Tree')
confusion_matrix(y_test,dt_pred)

Confusion Matrix for Decision Tree


array([[31, 13],
       [15, 32]], dtype=int64)

**Multiclass Classification**

In [13]:
link = 'https://raw.githubusercontent.com/wehrley/Kaggle-Digit-Recognizer/master/train.csv'
df = pd.read_csv(link)

df.sample(3)

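| | label | pixel0 | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | ... | pixel774 | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13630 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3170 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 33382 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |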


In [14]:
X = df.iloc[:,1:]
y = df.iloc[:,0]

In [15]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=2002)

In [16]:
lr_clf.fit(X_train,y_train)
dt_clf.fit(X_train,y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [17]:
lr_pred = lr_clf.predict(X_test)
dt_pred = dt_clf.predict(X_test)

In [18]:
print(f'Accuracy of Logistic Regression {accuracy_score(y_test,lr_pred)}')
print(f'Accuracy of Decision Tree {accuracy_score(y_test,dt_pred)}')

Accuracy of Logistic Regression 0.9146031746031746
Accuracy of Decision Tree 0.8538095238095238


In [19]:
print('Confusion Matrix for Logistic Regression')
pd.DataFrame(confusion_matrix(y_test,lr_pred))

Confusion Matrix for Logistic Regression


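| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1186 | 0 | 5 | 3 | 1 | 7 | 6 | 3 | 9 | 1 |
| 1 | 0 | 1413 | 5 | 9 | 1 | 5 | 2 | 1 | 11 | 1 |
| 2 | 6 | 11 | 1127 | 23 | 12 | 9 | 13 | 13 | 32 | 10 |
| 3 | 5 | 4 | 30 | 1156 | 1 | 47 | 7 | 13 | 29 | 15 |
| 4 | 4 | 3 | 11 | 0 | 1115 | 1 | 9 | 2 | 8 | 41 |
| 5 | 18 | 5 | 10 | 53 | 8 | 970 | 26 | 4 | 44 | 24 |
| 6 | 11 | 6 | 11 | 1 | 9 | 18 | 1161 | 0 | 7 | 0 |
| 7 | 4 | 8 | 24 | 9 | 12 | 2 | 1 | 1209 | 3 | 51 |
| 8 | 15 | 22 | 7 | 42 | 1 | 42 | 9 | 5 | 1091 | 17 |
| 9 | 6 | 7 | 7 | 9 | 43 | 4 | 0 | 31 | 11 | 1096 |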


In [20]:
print('Confusion Matrix for Decision Tree')
pd.DataFrame(confusion_matrix(y_test,dt_pred))

Confusion Matrix for Decision Tree


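| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1102 | 1 | 11 | 16 | 11 | 22 | 19 | 7 | 17 | 15 |
| 1 | 1 | 1364 | 28 | 14 | 3 | 6 | 5 | 8 | 14 | 5 |
| 2 | 18 | 17 | 1040 | 45 | 18 | 9 | 20 | 42 | 40 | 7 |
| 3 | 12 | 9 | 53 | 1038 | 13 | 67 | 3 | 29 | 53 | 30 |
| 4 | 5 | 7 | 18 | 10 | 1016 | 6 | 20 | 12 | 41 | 59 |
| 5 | 18 | 13 | 9 | 61 | 26 | 936 | 28 | 7 | 46 | 18 |
| 6 | 17 | 9 | 26 | 10 | 25 | 28 | 1086 | 2 | 20 | 1 |
| 7 | 5 | 16 | 28 | 23 | 28 | 4 | 1 | 1177 | 5 | 36 |
| 8 | 17 | 29 | 37 | 49 | 29 | 28 | 21 | 18 | 984 | 39 |
| 9 | 10 | 8 | 13 | 20 | 49 | 23 | 8 | 33 | 35 | 1015 |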


## Precision

Precision shows how often an ML model is correct when it predicts the target class. It is the fraction of positive predictions that are actually positive.

$$\text{Precision } = \frac{\text{TP}}{\text{TP + FP}}$$
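
A small worked example may help here. The counts below are the true positives and false positives from the logistic regression confusion matrix on the heart-disease test set shown earlier, assuming class `1` is the positive class.

```python
# Logistic regression on the heart-disease test set:
# 40 true positives, 13 false positives
tp, fp = 40, 13

# Of everything predicted positive, how much really was positive?
precision = tp / (tp + fp)
print(f"Precision = {tp}/{tp + fp} = {precision:.4f}")  # Precision = 40/53 = 0.7547
```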

## Recall

Recall shows whether an ML model can find all objects of the target class. It is the fraction of actual positives that the model correctly identifies.

$$\text{Recall } = \frac{\text{TP}}{\text{TP + FN}}$$
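
Recall on its own is easy to game, which the following sketch illustrates with made-up labels: a hypothetical model that predicts "positive" for everything reaches perfect recall while its precision collapses. The helper `precision_recall` is illustrative, not a library function.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class, labelled 1.

    Assumes at least one predicted positive and one actual positive,
    so neither denominator is zero.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical labels: 3 positives among 10 observations
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# Predicting "positive" for everything finds every positive case
always_positive = [1] * 10

precision, recall = precision_recall(y_true, always_positive)
print(recall)     # 1.0
print(precision)  # 0.3
```

This trade-off is the reason precision and recall are usually reported together.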

## F1 Score

The F1 score is the harmonic mean of precision and recall, taking both false positives and false negatives into account. It ranges from 0 to 1, where 1 indicates perfect precision and recall, and 0 means that precision or recall is zero (the model found no true positives).

$$\text{F1 Score } = \frac{2 \cdot P \cdot R}{P + R}$$

where $P$ is precision and $R$ is recall.
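
The harmonic mean is pulled toward the smaller of its two inputs. The sketch below, with an illustrative helper `f1`, first reproduces the logistic regression score from its counts (TP = 40, FP = 13, FN = 7) and then scores a hypothetical lopsided model whose arithmetic mean of precision and recall would be 0.55.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Logistic regression on the heart-disease test set: TP=40, FP=13, FN=7
p, r = 40 / 53, 40 / 47
lr_f1 = f1(p, r)

# Hypothetical lopsided model: perfect precision, very low recall
lopsided_f1 = f1(1.0, 0.1)

print(round(lr_f1, 4), round(lopsided_f1, 4))  # 0.8 0.1818
```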

In [21]:
from sklearn.metrics import precision_score, recall_score, f1_score

In [33]:
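# Binary metrics: y_test and lr_pred must come from the heart-disease split
# (re-run cells 2-8 first). With the 10-class digit labels, the default
# average='binary' in precision_score/recall_score/f1_score raises a ValueError.
print('Logistic Regression Model')
print('-'*50)
cdf = pd.DataFrame(confusion_matrix(y_test,lr_pred))
print(cdf)
print('-'*50)
print('Precision',precision_score(y_test,lr_pred))
print('Recall',recall_score(y_test,lr_pred))
print('F1 Score',f1_score(y_test,lr_pred))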

Logistic Regression Model
--------------------------------------------------
    0   1
0  31  13
1   7  40
--------------------------------------------------
Precision 0.7547169811320755
Recall 0.851063829787234
F1 Score 0.8


In [34]:
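# dt_clf was refit for this run; without a fixed random_state the fitted tree,
# and so this matrix, can differ slightly from the one printed earlier.
print('Decision Tree Model')
print('-'*50)
cdf = pd.DataFrame(confusion_matrix(y_test,dt_pred))
print(cdf)
print('-'*50)
print('Precision',precision_score(y_test,dt_pred))
print('Recall',recall_score(y_test,dt_pred))
print('F1 Score',f1_score(y_test,dt_pred))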

Decision Tree Model
--------------------------------------------------
    0   1
0  30  14
1  14  33
--------------------------------------------------
Precision 0.7021276595744681
Recall 0.7021276595744681
F1 Score 0.7021276595744681


**Multiclass Classification**

In [27]:
from sklearn.metrics import classification_report

In [23]:
precision_score(y_test,lr_pred,average=None)

array([0.94501992, 0.95537525, 0.91107518, 0.88582375, 0.92684954,
       0.87782805, 0.94084279, 0.94379391, 0.87630522, 0.87261146])

In [24]:
precision_score(y_test,lr_pred,average='macro')

0.9135525092198697

In [25]:
precision_score(y_test,lr_pred,average='weighted')

0.9144339988259119

In [26]:
precision_score(y_test,lr_pred,average='micro')

0.9146031746031746
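
The `average` options differ only in how the per-class precisions are combined. The sketch below works through them on a small made-up 3-class confusion matrix (rows are actual classes, columns are predicted classes) using plain Python; it is meant as an illustration of the arithmetic, not as scikit-learn's implementation.

```python
# Hypothetical 3-class confusion matrix: rows = actual, columns = predicted
cm = [[8, 1, 1],
      [2, 5, 3],
      [0, 0, 30]]
n = len(cm)

tp = [cm[k][k] for k in range(n)]                                # diagonal
predicted = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # column sums
support = [sum(cm[k]) for k in range(n)]                         # row sums

# average=None: one precision per class
per_class = [tp[k] / predicted[k] for k in range(n)]

# average='macro': unweighted mean of the per-class values
macro = sum(per_class) / n

# average='weighted': mean weighted by each class's support
weighted = sum(p * s for p, s in zip(per_class, support)) / sum(support)

# average='micro': pool the counts over all classes first
micro = sum(tp) / sum(predicted)

print([round(p, 4) for p in per_class], round(macro, 4),
      round(weighted, 4), round(micro, 4))
```

For single-label multiclass problems the micro average pools every prediction, so it equals accuracy. That is why the micro-averaged precision above matches the logistic regression accuracy reported earlier for the digit data.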

In [31]:
print(classification_report(y_test,lr_pred))

              precision    recall  f1-score   support

           0       0.95      0.97      0.96      1221
           1       0.96      0.98      0.97      1448
           2       0.91      0.90      0.90      1256
           3       0.89      0.88      0.89      1307
           4       0.93      0.93      0.93      1194
           5       0.88      0.83      0.86      1162
           6       0.94      0.95      0.94      1224
           7       0.94      0.91      0.93      1323
           8       0.88      0.87      0.87      1251
           9       0.87      0.90      0.89      1214

    accuracy                           0.91     12600
   macro avg       0.91      0.91      0.91     12600
weighted avg       0.91      0.91      0.91     12600

