Unlike regression, classification models are not meant for scenarios where the target variable is continuous. They are used when the target output assigns each record to one of a set of categories. The target can be either binary or non-binary:

    -   Binary Classification examples
        -   True
        -   False
    -   Non Binary Classification examples
        -   small
        -   medium
        -   large
        -   extra large 
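
As a rough illustration of the distinction above, the following sketch labels a target column as binary or non-binary by counting its distinct values. The helper name `classification_kind` is hypothetical and only meant for illustration.

```python
def classification_kind(labels):
    """Return 'binary' for exactly two distinct labels, 'non-binary' for more.

    Hypothetical helper for illustration only.
    """
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("a classification target needs at least two distinct labels")
    return "binary" if len(distinct) == 2 else "non-binary"


binary_target = [True, False, True, True, False]
size_target = ["small", "medium", "large", "extra large", "medium"]

print(classification_kind(binary_target))  # binary
print(classification_kind(size_target))    # non-binary
```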

There are various models and algorithms used for the different kinds of classification, as listed below:

    -   Models for Binary Classification
        -   Naive Bayes
        -   Logistic Regression
        -   K-Nearest Neighbors
        -   Support Vector Machine
        -   Decision Tree
        -   Random Forest

    -   Models for Non-Binary (Multiclass) Classification
        -   Naive Bayes
        -   Gradient Boosting
        -   K-Nearest Neighbors
        -   Decision Tree
        -   Random Forest

For these models, we commonly use the following evaluation metrics:

    -   Accuracy
    -   Precision
    -   Recall
    -   F1 Score
    -   ROC AUC
    -   Confusion Matrix
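
ROC AUC is the one metric in this list that is computed from predicted scores or probabilities rather than hard 0/1 labels. One way to read it: the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal pure-Python sketch of that pairwise definition follows. The function name `roc_auc_pairwise` and the sample scores are made up for illustration; in practice `sklearn.metrics.roc_auc_score` would normally be used instead.

```python
def roc_auc_pairwise(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Illustrative O(n^2) implementation, not meant for large datasets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc_pairwise(y_true, scores))  # 0.75
```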

Accuracy, precision, recall, and F1 score are commonly used metrics for evaluating a classification model. Below, I provide code examples and explanations of how to calculate these metrics in Python.

First, let's define a binary classification problem, where we are trying to classify whether a person has a disease or not based on some input features. For this example, let's assume we have a dataset with the following features: age, blood pressure, and cholesterol level. The label column indicates whether the person has the disease or not.

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Create a dummy dataset
data = pd.DataFrame({
    'age': [30, 40, 50, 60, 70, 80, 90, 25, 35, 45, 55, 65, 75, 85],
    'blood_pressure': [120, 130, 140, 150, 160, 170, 180, 110, 125, 135, 145, 155, 165, 175],
    'cholesterol': [200, 210, 220, 230, 240, 250, 260, 190, 205, 215, 225, 235, 245, 255],
    'label': [0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
})

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data[['age', 'blood_pressure', 'cholesterol']], data['label'], test_size=0.2, random_state=42)

# Fit a logistic regression model to the training data
clf = LogisticRegression(random_state=42)
clf.fit(X_train, y_train)

# Predict the labels for the test data
y_pred = clf.predict(X_test)


In [24]:
# We fitted a logistic regression model above. The predicted labels are stored in y_pred
# and the actual labels in y_test. Let's now go through the formulas for
# accuracy, precision and recall.

# Accuracy  - (TN+TP)/(TN+FP+TP+FN) - can be misleading on imbalanced datasets
# Precision - TP/(TP+FP) - equals 1 when there are no false positives
# Recall    - TP/(TP+FN) - equals 1 when there are no false negatives
# F1 Score  - 2*((precision*recall)/(precision+recall)) - the harmonic mean of precision and recall; it equals 1 only when both are 1

# Let's find out TP, FP, TN and FN
print(y_test)

9     0
11    1
0     0
Name: label, dtype: int64


In [26]:
print(y_pred)

[1 1 0]


In [44]:
# ypred = 1 and ytest = 1 -> TP
# ypred = 0 and ytest = 0 -> TN
# ypred = 1 and ytest = 0 -> FP
# ypred = 0 and ytest = 1 -> FN
print(type(y_test))
print(type(y_pred))

<class 'pandas.core.series.Series'>
<class 'numpy.ndarray'>


In [45]:
ytest = np.array(y_test)
ypred = np.array(y_pred)

In [51]:
TP = 0
TN = 0
FP = 0
FN = 0
for i in range(len(ypred)):
    if ypred[i] == ytest[i] == 1:
        TP += 1
    elif ypred[i] == ytest[i] == 0:
        TN += 1
    elif ypred[i] == 1 and ytest[i] == 0:
        FP += 1
    elif ypred[i] == 0 and ytest[i] == 1:
        FN += 1

In [52]:
print(f"True positive is {TP}")
print(f"False positive is {FP}")
print(f"True negative is {TN}")
print(f"False negative is {FN}")

True positive is 1
False positive is 1
True negative is 1
False negative is 0
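
The same four counts can be obtained without an explicit if/elif chain by counting (actual, predicted) pairs. This is only a sketch; the lists below hard-code the `y_test` and `y_pred` values printed earlier so that the snippet runs on its own.

```python
from collections import Counter

# Values copied from the y_test and y_pred outputs shown above
y_true = [0, 1, 0]
y_hat = [1, 1, 0]

# Each key is an (actual, predicted) pair; a Counter returns 0 for missing pairs
pair_counts = Counter(zip(y_true, y_hat))

tp = pair_counts[(1, 1)]
tn = pair_counts[(0, 0)]
fp = pair_counts[(0, 1)]  # actual 0, predicted 1
fn = pair_counts[(1, 0)]  # actual 1, predicted 0

print(tp, fp, tn, fn)  # 1 1 1 0
```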


In [53]:
# Let's now apply the formulas to compute all four metrics
accuracy = (TN + TP) / (TN + FP + TP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
# Named f1 (not f1_score) so it does not clash with sklearn's f1_score function used later
f1 = 2 * ((precision * recall) / (precision + recall))
print(f"accuracy is {accuracy}")
print(f"precision is {precision}")
print(f"recall is {recall}")
print(f"f1 score is {f1}")


accuracy is 0.6666666666666666
precision is 0.5
recall is 1.0
f1 score is 0.6666666666666666
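
The formulas above divide by `TP + FP` and `TP + FN`, which raises a `ZeroDivisionError` when a model predicts no positives or when the test set contains none. A minimal sketch of a guarded version is shown below. The function names `safe_divide` and `safe_metrics`, and the choice to return 0.0 for an undefined metric, are assumptions made for illustration.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def safe_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from the four counts."""
    accuracy = safe_divide(tp + tn, tp + fp + tn + fn)
    precision = safe_divide(tp, tp + fp)
    recall = safe_divide(tp, tp + fn)
    f1 = safe_divide(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from the example above: TP=1, FP=1, TN=1, FN=0
print(safe_metrics(tp=1, fp=1, tn=1, fn=0))

# A model that never predicts the positive class: precision is reported as 0.0
print(safe_metrics(tp=0, fp=0, tn=2, fn=1))
```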


In [56]:
# Now let's compute the same metrics with sklearn.
# sklearn metric functions expect the true labels first, then the predictions.
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import f1_score
print(f"sklearn accuracy is {accuracy_score(ytest, ypred)}")
print(f"sklearn precision is {precision_score(ytest, ypred)}")
print(f"sklearn recall is {recall_score(ytest, ypred)}")
print(f"sklearn f1 score is {f1_score(ytest, ypred)}")

sklearn accuracy is 0.6666666666666666
sklearn precision is 0.5
sklearn recall is 1.0
sklearn f1 score is 0.6666666666666666
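
One detail worth keeping in mind with these functions: sklearn's metric functions take the true labels as the first argument and the predictions as the second (`y_true, y_pred`). Accuracy and binary F1 are unchanged if the two are swapped, but precision and recall trade places. The sketch below hard-codes the test labels and predictions from this example to show the effect.

```python
from sklearn.metrics import precision_score, recall_score

# Values copied from the y_test and y_pred outputs shown earlier
y_true = [0, 1, 0]
y_hat = [1, 1, 0]

# Documented order: true labels first, predictions second
print(precision_score(y_true, y_hat))  # 0.5
print(recall_score(y_true, y_hat))     # 1.0

# Swapped order: the two values trade places
print(precision_score(y_hat, y_true))  # 1.0
print(recall_score(y_hat, y_true))     # 0.5
```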


#### Accuracy

Classification metrics are built from a few basic terms, one of which is True Positive. Let's understand it with an example. Suppose our model is trying to predict whether it will rain today in different areas of a city, so there are 2 possible outcomes: "Rain" and "No Rain". Suppose there are 10 records in our data, one per area.

    -    Model prediction
        -   4 Rain
        -   6 No Rain

    -   Actual Status
        -   6 Rain
        -   4 No Rain

Out of the 4 "Rain" predictions, 3 turned out to be correct, so our true positive count is 3. The code below walks through this.

A True Negative is any record where the prediction and the actual value are both "No Rain". The totals alone do not give this count; we have to compare prediction and actual value record by record.
Of the 6 "No Rain" predictions, 3 match an actual "No Rain", so our true negative count is 3.

In [2]:
# importing libraries
import pandas as pd
import numpy as np

In [3]:
rain_df = pd.DataFrame(data = [["area 1","yes","Yes"],
                               ["area 2","No","Yes"],
                               ["area 3","No","No"],
                               ["area 4","yes","Yes"],
                               ["area 5","yes","No"],
                               ["area 6","No","Yes"],
                               ["area 7","No","Yes"],
                               ["area 8","No","No"],
                               ["area 9","No","No"],
                               ["area 10","yes","Yes"]],columns = ["Area Code","Prediction","Actual"])

In [4]:
rain_df.head()

  Area Code Prediction Actual
0    area 1        yes    Yes
1    area 2         No    Yes
2    area 3         No     No
3    area 4        yes    Yes
4    area 5        yes     No


In [5]:
# Convert yes/no into 1/0 with LabelEncoder
from sklearn.preprocessing import LabelEncoder

In [6]:
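# Normalise casing first, then fit a single encoder so both columns share one mapping (no -> 0, yes -> 1)
le = LabelEncoder()
le.fit(["no", "yes"])
rain_df["Prediction"] = le.transform(rain_df["Prediction"].str.lower())
rain_df["Actual"] = le.transform(rain_df["Actual"].str.lower())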


In [7]:
# Let's look into our new dataframe
rain_df.head()

  Area Code  Prediction  Actual
0    area 1           1       1
1    area 2           0       1
2    area 3           0       0
3    area 4           1       1
4    area 5           1       0


In [8]:
# Every record where both Prediction and Actual are 1 is a true positive; where both are 0, it is a true negative
rain_df["TP Status"] = np.where((rain_df["Prediction"] == 1) & (rain_df["Actual"] == 1) ,"true positive","other")
rain_df["TN Status"] = np.where((rain_df["Prediction"] == 0) & (rain_df["Actual"] == 0) ,"true negative","other")

In [9]:
rain_df.head()

  Area Code  Prediction  Actual      TP Status      TN Status
0    area 1           1       1  true positive          other
1    area 2           0       1          other          other
2    area 3           0       0          other  true negative
3    area 4           1       1  true positive          other
4    area 5           1       0          other          other


In [10]:
# Now let's find the total number of records in the data and store it in a variable
total_record = len(rain_df)
print(total_record)

10


In [11]:
# Now let's count the records whose status is true positive and true negative
tp_count = len(rain_df[rain_df["TP Status"]=="true positive"])
print(tp_count)


3


In [12]:
tn_count = len(rain_df[rain_df["TN Status"]=="true negative"])
print(tn_count)

3


In [13]:
# Accuracy is (true positives + true negatives) divided by the total number of records
accuracy = (tp_count+tn_count)/total_record
print(f"The accuracy is {accuracy}")

The accuracy is 0.6


In [14]:
# Now that we have computed the accuracy by hand, let's use the sklearn function for it
from sklearn.metrics import accuracy_score

In [15]:
# sklearn expects the actual values first, then the predictions
accuracy_sklearn = accuracy_score(rain_df['Actual'], rain_df['Prediction'])
print(f"the accuracy based on sklearn metrics is {accuracy_sklearn}")

the accuracy based on sklearn metrics is 0.6


Hence, our manual calculation matches the value from the sklearn metric.
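
Accuracy on its own can be misleading when one class dominates, which is why it is not ideal for imbalanced datasets. The sketch below uses made-up data (95 "No Rain" records and 5 "Rain" records) and a trivial model that always predicts "No Rain": accuracy looks high even though not a single rainy area is detected.

```python
# Made-up imbalanced data: 1 = Rain, 0 = No Rain
actual = [1] * 5 + [0] * 95
# A trivial "model" that always predicts the majority class
predicted = [0] * len(actual)

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# Recall for the Rain class: detected rainy records / all rainy records
true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
recall = true_positives / sum(actual)

print(f"accuracy is {accuracy}")  # accuracy is 0.95
print(f"recall is {recall}")      # recall is 0.0
```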

Now let's talk about false positive and false negative as well

    -   A record where the prediction says "Rain" but the actual data says "No Rain" is a False Positive
    -   A record where the prediction says "No Rain" but the actual data says "Rain" is a False Negative

In [16]:
rain_df["FP Status"] = np.where((rain_df["Prediction"] == 1) & (rain_df["Actual"] == 0) ,"false positive","other")
rain_df["FN Status"] = np.where((rain_df["Prediction"] == 0) & (rain_df["Actual"] == 1) ,"false negative","other")

In [17]:
rain_df.head(10)

  Area Code  Prediction  Actual      TP Status      TN Status       FP Status       FN Status
0    area 1           1       1  true positive          other           other           other
1    area 2           0       1          other          other           other  false negative
2    area 3           0       0          other  true negative           other           other
3    area 4           1       1  true positive          other           other           other
4    area 5           1       0          other          other  false positive           other
5    area 6           0       1          other          other           other  false negative
6    area 7           0       1          other          other           other  false negative
7    area 8           0       0          other  true negative           other           other
8    area 9           0       0          other  true negative           other           other
9   area 10           1       1  true positive          other           other           other
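
The four status columns can also be summarised in a single 2x2 table. The sketch below uses `pd.crosstab` on a fresh copy of the encoded values so that it runs on its own; the frame name `encoded_df` is introduced here only for illustration. Rows are the actual values and columns are the predictions.

```python
import pandas as pd

# Encoded values copied from the table above (1 = rain, 0 = no rain)
encoded_df = pd.DataFrame({
    "Prediction": [1, 0, 0, 1, 1, 0, 0, 0, 0, 1],
    "Actual":     [1, 1, 0, 1, 0, 1, 1, 0, 0, 1],
})

# Rows: actual value, columns: predicted value
table = pd.crosstab(encoded_df["Actual"], encoded_df["Prediction"])
print(table)

tn = table.loc[0, 0]  # actual 0, predicted 0
fp = table.loc[0, 1]  # actual 0, predicted 1
fn = table.loc[1, 0]  # actual 1, predicted 0
tp = table.loc[1, 1]  # actual 1, predicted 1
print(tp, tn, fp, fn)  # 3 3 1 3
```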


#### Precision

Precision is calculated with the formula below:

    -   TruePositives / (TruePositives + FalsePositives) 

In [18]:
# To compute precision from our data, first count the false positives and false negatives
fp_count = len(rain_df[rain_df["FP Status"]=="false positive"])
fn_count = len(rain_df[rain_df["FN Status"]=="false negative"])

In [19]:
print(f"the true positive is {tp_count}")
print(f"the true negative is {tn_count}")
print(f"the false positive is {fp_count}")
print(f"the false negative is {fn_count}")

the true positive is 3
the true negative is 3
the false positive is 1
the false negative is 3


In [20]:
# Apply the formula to get the precision
precision = tp_count/(tp_count+fp_count)
print(f"the precision is {precision}")

the precision is 0.75


In [21]:
# Now that we have computed the precision by hand, let's use the sklearn function for it
from sklearn.metrics import precision_score

In [22]:
# sklearn expects the actual values first, then the predictions;
# swapping them would return recall (0.5) instead of precision
precision_sklearn = precision_score(rain_df['Actual'], rain_df['Prediction'])
print(f"the precision based on sklearn metrics is {precision_sklearn}")

the precision based on sklearn metrics is 0.75
