# Classification Metrics

## Classification metrics quantify how well a classification model predicts categorical (discrete) outcomes. Commonly used classification metrics include:

## Accuracy: Accuracy measures the proportion of correctly classified instances out of the total number of instances. It provides an overall measure of the model's correctness. However, accuracy can be misleading when the classes are imbalanced.

## Precision: Precision is the proportion of predicted positive instances that are actually positive: true positives / (true positives + false positives). It indicates how trustworthy the model's positive predictions are and is useful when the focus is on minimizing false positives.

## Recall (Sensitivity or True Positive Rate): Recall calculates the proportion of correctly predicted positive instances (true positives) out of the total actual positive instances (true positives + false negatives). Recall indicates how well the model captures the positive instances and is useful when the focus is on minimizing false negatives.

## F1 Score: The F1 score is the harmonic mean of precision and recall, so it is high only when both are high. It is useful when the class distribution is imbalanced and you want a single number that accounts for both false positives and false negatives.

## Specificity (True Negative Rate): Specificity calculates the proportion of correctly predicted negative instances (true negatives) out of the total actual negative instances (true negatives + false positives). It indicates how well the model captures the negative instances and is useful when the focus is on minimizing false positives.
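
Written in terms of the confusion-matrix counts (TP = true positives, TN = true negatives, FP = false positives, FN = false negatives), the definitions above become the following standard formulas:

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
F_1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}
\end{aligned}
$$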

## Area Under the Receiver Operating Characteristic Curve (AUC-ROC): The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) as the classification threshold varies, and AUC-ROC is the area under that curve. It summarizes the model's discriminatory power across all thresholds rather than at one particular threshold, and it requires predicted scores or probabilities rather than hard class labels.
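
A minimal sketch of AUC-ROC with scikit-learn's `roc_auc_score`, using made-up labels and scores (in this notebook the scores would come from something like `clf1.predict_proba(X_test)[:, 1]`). It also checks a useful interpretation: AUC equals the fraction of (positive, negative) pairs in which the positive instance receives the higher score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical true labels and predicted positive-class scores
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

auc = roc_auc_score(y_true, y_score)

# Pairwise check: fraction of (positive, negative) pairs ranked correctly.
# This toy data has no tied scores; a tie would count as 0.5.
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
pairwise = np.mean([p > n for p in pos for n in neg])

print(auc)       # 0.75
print(pairwise)  # 0.75
```

Three of the four (positive, negative) pairs are ordered correctly, which is where 0.75 comes from; no threshold is involved.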

## Log Loss (Logistic Loss): Log loss evaluates a probabilistic classifier by taking the negative log of the probability the model assigned to each instance's true class, averaged over all instances, so confident wrong predictions are penalized heavily. It requires predicted probabilities rather than class labels.
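
For binary labels the formula is LogLoss = -(1/N) Σ [y·log(p) + (1 - y)·log(1 - p)], where p is the predicted probability of class 1. The sketch below, with made-up values, compares scikit-learn's `log_loss` against that formula computed by hand:

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical labels and predicted probabilities of class 1
y_true = np.array([0, 1, 1])
p_pos = np.array([0.1, 0.8, 0.6])

# Probability the model assigned to each instance's *true* class
p_true_class = np.where(y_true == 1, p_pos, 1 - p_pos)
manual = -np.mean(np.log(p_true_class))

print(log_loss(y_true, p_pos))  # about 0.2798
print(manual)                   # same value
```

Passing a 1-D array tells `log_loss` that the values are probabilities of the positive class; a 2-D array like the output of `predict_proba` also works.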

## These classification metrics provide different perspectives on the performance of a classification model, focusing on aspects such as overall accuracy, precision, recall, and the trade-off between false positives and false negatives.

# Let's Implement Accuracy and the Confusion Matrix

In [61]:
import numpy as np 
import pandas as pd 

In [62]:
df=pd.read_csv('OneDrive/Desktop/SAMIYA/Samiya_learn/Datasets/heart.csv')

In [63]:
df.head()

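| | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52 | 1 | 0 | 125 | 212 | 0 | 1 | 168 | 0 | 1.0 | 2 | 2 | 3 | 0 |
| 1 | 53 | 1 | 0 | 140 | 203 | 1 | 0 | 155 | 1 | 3.1 | 0 | 0 | 3 | 0 |
| 2 | 70 | 1 | 0 | 145 | 174 | 0 | 1 | 125 | 1 | 2.6 | 0 | 0 | 3 | 0 |
| 3 | 61 | 1 | 0 | 148 | 203 | 0 | 1 | 161 | 0 | 0.0 | 2 | 1 | 3 | 0 |
| 4 | 62 | 0 | 0 | 138 | 294 | 1 | 1 | 106 | 0 | 1.9 | 1 | 3 | 2 | 0 |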


In [64]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1025 non-null   int64  
 1   sex       1025 non-null   int64  
 2   cp        1025 non-null   int64  
 3   trestbps  1025 non-null   int64  
 4   chol      1025 non-null   int64  
 5   fbs       1025 non-null   int64  
 6   restecg   1025 non-null   int64  
 7   thalach   1025 non-null   int64  
 8   exang     1025 non-null   int64  
 9   oldpeak   1025 non-null   float64
 10  slope     1025 non-null   int64  
 11  ca        1025 non-null   int64  
 12  thal      1025 non-null   int64  
 13  target    1025 non-null   int64  
dtypes: float64(1), int64(13)
memory usage: 112.2 KB


In [65]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(df.iloc[:,0:-1],df.iloc[:,-1],test_size=0.2,random_state=2)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

In [66]:
clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()
clf1.fit(X_train,y_train)
clf2.fit(X_train,y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
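
The warning above means the default `lbfgs` solver stopped at its iteration limit (`max_iter=100`) before converging. As the warning itself suggests, standardizing the features and/or raising `max_iter` usually fixes it. The sketch below uses scikit-learn's bundled breast-cancer dataset as a stand-in for `heart.csv` (an assumption, so its numbers will not match this notebook). Putting the scaler in a `Pipeline` ensures it is fitted on the training split only.

```python
import warnings

from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in binary dataset (NOT heart.csv)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

# Scaler and classifier in one pipeline: the scaler learns mean/std from X_train only
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Record any warnings raised during fitting
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model.fit(X_train, y_train)

convergence_warnings = [w for w in caught if issubclass(w.category, ConvergenceWarning)]
print("Convergence warnings:", len(convergence_warnings))
print("Test accuracy:", model.score(X_test, y_test))
```

Applying the same change to `clf1` here would likely change the logistic-regression results shown below.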


In [67]:
y_pred1 = clf1.predict(X_test)
y_pred2 = clf2.predict(X_test)

In [68]:
from sklearn.metrics import accuracy_score,confusion_matrix
print("Accuracy of Logistic Regression",accuracy_score(y_test,y_pred1))
print("Accuracy of Decision Trees",accuracy_score(y_test,y_pred2))

Accuracy of Logistic Regression 0.8536585365853658
Accuracy of Decision Trees 0.9804878048780488


## NOTE: Whether an accuracy score is good enough depends entirely on the objective and the problem statement.

## While accuracy is a commonly used metric for evaluating classification models, it has some limitations and may not always provide a complete picture of the model's performance. Here are a few problems associated with accuracy:

## - Imbalanced Classes: Accuracy can be misleading when dealing with imbalanced class distributions, where one class has significantly more instances than the other(s). In such cases, a model that simply predicts the majority class for all instances can achieve a high accuracy even though it fails to properly classify the minority class. Accuracy alone may not reflect the model's ability to correctly identify the minority class, which is often of more interest in imbalanced scenarios.

## - Misinterpretation of Errors: Accuracy treats all misclassifications equally, regardless of the type of error made. However, in certain applications, the consequences of different types of errors may vary. For example, in a medical diagnosis setting, false negatives (missed cases) and false positives (false alarms) may have different implications. Accuracy alone does not provide insight into the specific types of errors made by the model.

## - Threshold Sensitivity: Accuracy is computed from hard class labels produced at a single threshold (for binary `LogisticRegression.predict`, a predicted probability of 0.5). Adjusting the threshold can significantly change the model's behavior: it shifts the balance between precision and recall, and therefore the trade-off between false positives and false negatives. A single accuracy number does not reveal this sensitivity.
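
To make the threshold point concrete, here is a small sketch with made-up probabilities. Lowering the threshold from 0.5 to 0.3 leaves accuracy unchanged but trades precision for recall, which accuracy alone would hide. (The sketch thresholds with `>=`; this is an illustrative choice.)

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels and predicted probabilities of the positive class
y_true = np.array([0, 0, 1, 1, 1, 0])
proba = np.array([0.2, 0.6, 0.4, 0.7, 0.9, 0.35])

results = {}
for threshold in (0.5, 0.3):
    # Convert probabilities to hard labels at this threshold
    y_pred = (proba >= threshold).astype(int)
    results[threshold] = (
        accuracy_score(y_true, y_pred),
        precision_score(y_true, y_pred),
        recall_score(y_true, y_pred),
    )
    acc, prec, rec = results[threshold]
    print(f"threshold={threshold}: accuracy={acc:.3f}, precision={prec:.3f}, recall={rec:.3f}")
```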

## Confusion matrix and related metrics are often preferred over accuracy alone because they provide more detailed and informative insights into the performance of a classification model. 

In [69]:
confusion_matrix(y_test,y_pred1)

array([[82, 23],
       [ 7, 93]], dtype=int64)
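
In scikit-learn's `confusion_matrix`, rows are the actual labels and columns are the predicted labels, both in sorted label order. For a binary problem, `.ravel()` therefore unpacks the matrix as `tn, fp, fn, tp`. Applied to the logistic-regression matrix above (values copied by hand so the snippet runs on its own):

```python
import numpy as np
import pandas as pd

# Logistic-regression confusion matrix from the cell above (heart dataset)
cm = np.array([[82, 23],
               [ 7, 93]])

# Row = actual class, column = predicted class
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")

# A labelled view is easier to read than integer column names
labelled = pd.DataFrame(cm,
                        index=["Actual 0", "Actual 1"],
                        columns=["Predicted 0", "Predicted 1"])
print(labelled)
```

So the logistic-regression model raised 23 false alarms (FP) and missed 7 positive cases (FN).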

In [70]:
print("Logistic Regression Confusion Matrix\n")
pd.DataFrame(confusion_matrix(y_test,y_pred1),columns=list(range(0,2)))

Logistic Regression Confusion Matrix



| | 0 | 1 |
|---|---|---|
| 0 | 82 | 23 |
| 1 | 7 | 93 |


In [71]:
print("Decision Tree Confusion Matrix\n")
pd.DataFrame(confusion_matrix(y_test,y_pred2),columns=list(range(0,2)))

Decision Tree Confusion Matrix



| | 0 | 1 |
|---|---|---|
| 0 | 101 | 4 |
| 1 | 0 | 100 |


In [72]:
result = pd.DataFrame()
result['Actual Label'] = y_test
result['Logistic Regression Prediction'] = y_pred1
result['Decision Tree Prediction'] = y_pred2

In [73]:
result.sample(10)

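| | Actual Label | Logistic Regression Prediction | Decision Tree Prediction |
|---|---|---|---|
| 643 | 1 | 0 | 1 |
| 438 | 1 | 1 | 1 |
| 37 | 1 | 1 | 1 |
| 917 | 1 | 1 | 1 |
| 867 | 1 | 1 | 1 |
| 735 | 1 | 1 | 1 |
| 183 | 1 | 1 | 1 |
| 933 | 0 | 1 | 0 |
| 291 | 0 | 0 | 0 |
| 322 | 0 | 0 | 0 |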
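
`result.sample(10)` only shows a random handful of rows. To list every test row a model got wrong, a boolean mask on the frame works. The sketch below uses a small hypothetical frame with the same column names so that it runs on its own; with the real `result` frame, the filtering lines are the same.

```python
import pandas as pd

# Hypothetical stand-in for `result` (same column names, made-up rows)
result = pd.DataFrame({
    "Actual Label": [1, 1, 0, 0, 1],
    "Logistic Regression Prediction": [0, 1, 1, 0, 1],
    "Decision Tree Prediction": [1, 1, 0, 0, 1],
})

# Rows where logistic regression disagrees with the true label
lr_errors = result[result["Actual Label"] != result["Logistic Regression Prediction"]]
print(lr_errors)

# Number of misclassified rows per model
for col in ["Logistic Regression Prediction", "Decision Tree Prediction"]:
    print(col, "errors:", (result["Actual Label"] != result[col]).sum())
```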


In [40]:
# accuracy = (TP + TN) / (TP + TN + FP + FN)

FP (False Positive) is a Type I error.

FN (False Negative) is a Type II error.
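
As a check, plugging the heart-dataset confusion-matrix counts into accuracy = (TP + TN) / (TP + TN + FP + FN) reproduces the `accuracy_score` values printed earlier (counts copied by hand from the matrices above):

```python
# Confusion-matrix counts for the heart dataset, copied from the outputs above
counts = {
    "Logistic Regression": {"tn": 82, "fp": 23, "fn": 7, "tp": 93},
    "Decision Tree": {"tn": 101, "fp": 4, "fn": 0, "tp": 100},
}

accuracies = {}
for name, c in counts.items():
    # Correct predictions (TP + TN) divided by all predictions
    accuracies[name] = (c["tp"] + c["tn"]) / (c["tp"] + c["tn"] + c["fp"] + c["fn"])
    print(name, accuracies[name])
```

Logistic regression gets 175 of 205 test rows right and the decision tree 201 of 205.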

In [80]:
dv=pd.read_csv("OneDrive/Desktop/SAMIYA/Samiya_learn/Datasets/iris.csv")

In [81]:
dv.head()

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [82]:
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
dv['Species'] = encoder.fit_transform(dv['Species'])
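
`LabelEncoder` assigns integer codes in sorted order of the class names, which is what the rows and columns 0, 1, 2 of the iris confusion matrices refer to. A small standalone sketch of reading the mapping back out of `classes_`:

```python
from sklearn.preprocessing import LabelEncoder

# A few species strings in the same format as iris.csv
species = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

# classes_ is sorted; each class's position is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)   # {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
print(codes)     # [0 1 2 0]

# inverse_transform turns codes back into the original names
print(encoder.inverse_transform([2, 0]))
```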

In [83]:
dv.head()

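| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |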


In [84]:
from sklearn.model_selection import train_test_split
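# Note: dv.iloc[:, 0:-1] includes the 'Id' column. The rows of iris.csv appear to be ordered by species,
# so 'Id' can leak the label; dv.drop(columns=['Id', 'Species']) would be a safer feature set.
X_train,X_test,y_train,y_test = train_test_split(dv.iloc[:,0:-1],dv.iloc[:,-1],test_size=0.2,random_state=1)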
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

In [85]:
clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()

In [86]:
clf1.fit(X_train,y_train)
clf2.fit(X_train,y_train)


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [87]:
y_pred1 = clf1.predict(X_test)
y_pred2 = clf2.predict(X_test)

In [88]:
from sklearn.metrics import accuracy_score,confusion_matrix
print("Accuracy of Logistic Regression",accuracy_score(y_test,y_pred1))
print("Accuracy of Decision Trees",accuracy_score(y_test,y_pred2))

Accuracy of Logistic Regression 1.0
Accuracy of Decision Trees 0.9666666666666667


In [89]:
print("Logistic Regression Confusion Matrix\n")
pd.DataFrame(confusion_matrix(y_test,y_pred1),columns=list(range(0,3)))

Logistic Regression Confusion Matrix



| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 11 | 0 | 0 |
| 1 | 0 | 13 | 0 |
| 2 | 0 | 0 | 6 |


In [90]:
print("Decision Tree Confusion Matrix\n")
pd.DataFrame(confusion_matrix(y_test,y_pred2),columns=list(range(0,3)))

Decision Tree Confusion Matrix



| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 11 | 0 | 0 |
| 1 | 0 | 12 | 1 |
| 2 | 0 | 0 | 6 |
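
In the multi-class case each class is treated one-vs-rest: a class's precision is its diagonal entry divided by its column sum (everything predicted as that class), and its recall is the diagonal entry divided by its row sum (everything that actually belongs to it). A sketch using the decision-tree matrix above (copied by hand):

```python
import numpy as np

# Decision-tree confusion matrix on the iris test split, from the output above
cm = np.array([[11,  0, 0],
               [ 0, 12, 1],
               [ 0,  0, 6]])

# Rows = actual class, columns = predicted class
per_class_precision = cm.diagonal() / cm.sum(axis=0)  # correct / predicted as that class
per_class_recall = cm.diagonal() / cm.sum(axis=1)     # correct / actually in that class
accuracy = cm.diagonal().sum() / cm.sum()             # all correct / all predictions

print("precision:", per_class_precision)
print("recall:   ", per_class_recall)
print("accuracy: ", accuracy)
```

The single versicolor (class 1) sample predicted as virginica (class 2) lowers class 1's recall to 12/13 and class 2's precision to 6/7; the accuracy, 29/30, matches the 0.9667 printed earlier.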


In [91]:
result = pd.DataFrame()
result['Actual Label'] = y_test
result['Logistic Regression Prediction'] = y_pred1
result['Decision Tree Prediction'] = y_pred2
result.sample(10)

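| | Actual Label | Logistic Regression Prediction | Decision Tree Prediction |
|---|---|---|---|
| 120 | 2 | 2 | 2 |
| 78 | 1 | 1 | 1 |
| 94 | 1 | 1 | 1 |
| 14 | 0 | 0 | 0 |
| 141 | 2 | 2 | 2 |
| 77 | 1 | 1 | 1 |
| 131 | 2 | 2 | 2 |
| 42 | 0 | 0 | 0 |
| 56 | 1 | 1 | 1 |
| 35 | 0 | 0 | 0 |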


# Let's Implement Precision, Recall and F1 Score

In [92]:
from sklearn.metrics import precision_score,recall_score
precision_score(y_test,y_pred1,average=None)
# iris dataset

array([1., 1., 1.])

In [93]:
recall_score(y_test,y_pred1,average=None)

array([1., 1., 1.])
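
`average=None` returns one score per class. To collapse them into a single number, `precision_score` (like `recall_score` and `f1_score`) accepts `average='macro'` (unweighted mean over classes), `'weighted'` (mean weighted by each class's support) or `'micro'` (computed from the global TP and FP counts; for single-label multi-class data this equals accuracy). On this iris split every class scores 1.0, so the sketch uses small made-up labels where the options differ:

```python
from sklearn.metrics import precision_score

# Hypothetical 3-class labels (supports: class 0 -> 2, class 1 -> 3, class 2 -> 1)
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

scores = {}
for avg in [None, "macro", "micro", "weighted"]:
    scores[avg] = precision_score(y_true, y_pred, average=avg)
    print(avg, scores[avg])
```

Per class the precisions are 1.0, 2/3 and 1/2, so the macro average is about 0.722, the weighted average is (2·1.0 + 3·2/3 + 1·0.5) / 6 = 0.75, and the micro average is 4/6, the same as the accuracy.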

In [74]:
from sklearn.metrics import recall_score,precision_score,f1_score

In [94]:
from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred1))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00         6

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



In [75]:
print("For Logistic regression Model")
print("-"*50)
cdf = pd.DataFrame(confusion_matrix(y_test,y_pred1),columns=list(range(0,2)))
print(cdf)
print("-"*50)
print("Precision - ",precision_score(y_test,y_pred1))
print("Recall - ",recall_score(y_test,y_pred1))
print("F1 score - ",f1_score(y_test,y_pred1))
# heart dataset: this cell ran before the iris cells (lower execution count), so y_test is the heart test split

For Logistic regression Model
--------------------------------------------------
    0   1
0  82  23
1   7  93
--------------------------------------------------
Precision -  0.8017241379310345
Recall -  0.93
F1 score -  0.8611111111111112


In [76]:
print("For DT Model")
print("-"*50)
cdf = pd.DataFrame(confusion_matrix(y_test,y_pred2),columns=list(range(0,2)))
print(cdf)
print("-"*50)
print("Precision - ",precision_score(y_test,y_pred2))
print("Recall - ",recall_score(y_test,y_pred2))
print("F1 score - ",f1_score(y_test,y_pred2))
# heart dataset: this cell ran before the iris cells (lower execution count), so y_test is the heart test split

For DT Model
--------------------------------------------------
     0    1
0  101    4
1    0  100
--------------------------------------------------
Precision -  0.9615384615384616
Recall -  1.0
F1 score -  0.9803921568627451
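
These precision, recall and F1 values follow directly from the confusion-matrix counts, with class 1 as the positive class (scikit-learn's default `pos_label=1`). Recomputing them by hand from the printed heart-dataset matrices:

```python
# Heart-dataset confusion-matrix counts, copied from the outputs above
counts = {
    "Logistic Regression": {"tn": 82, "fp": 23, "fn": 7, "tp": 93},
    "Decision Tree": {"tn": 101, "fp": 4, "fn": 0, "tp": 100},
}

metrics = {}
for name, c in counts.items():
    precision = c["tp"] / (c["tp"] + c["fp"])  # how many predicted positives are real
    recall = c["tp"] / (c["tp"] + c["fn"])     # how many real positives were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    print(f"{name}: precision={precision:.4f}, recall={recall:.4f}, f1={f1:.4f}")
```

Logistic regression's 23 false positives are what pull its precision down to about 0.80, while its recall stays at 0.93.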


In [77]:
precision_score(y_test,y_pred1,average=None)

array([0.92134831, 0.80172414])

In [78]:
precision_score(y_test,y_pred2,average=None)

array([1.        , 0.96153846])

In [79]:
recall_score(y_test,y_pred2,average=None)

array([0.96190476, 1.        ])
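
scikit-learn has no dedicated specificity function, but specificity, TN / (TN + FP), is simply the recall of the negative class, so `recall_score(..., pos_label=0)` computes it. This is also why the class-0 entry of `recall_score(..., average=None)` above equals the decision tree's specificity. The sketch rebuilds label arrays from the decision-tree counts (TN=101, FP=4, FN=0, TP=100) so it runs on its own:

```python
import numpy as np
from sklearn.metrics import recall_score

# Rebuild label arrays from the decision-tree counts (heart dataset)
tn, fp, fn, tp = 101, 4, 0, 100
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

# Specificity = recall of the negative class
specificity = recall_score(y_true, y_pred, pos_label=0)
print("Specificity:", specificity)  # 101 / 105
print("By hand:    ", tn / (tn + fp))
```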