### Precision

Precision is the proportion of true positives among all positive predictions made by the model. It measures how trustworthy the model's positive predictions are.

Mathematically, precision is calculated as:

`Precision = True Positives / (True Positives + False Positives)`

In other words, precision answers the question: `"Of all the items the model predicted as positive, how many were actually positive?"`

High precision indicates that the model makes few false positive predictions: when it predicts positive, it is usually right. However, high precision does not mean the model finds most of the positive cases; it can still miss many of them (false negatives), which is what recall measures.

Precision is often used in combination with other metrics like recall, F1 score, or accuracy to evaluate the performance of a classification model comprehensively.

### Example

Let's consider a binary classification problem where we're trying to predict whether emails are spam or not spam (ham). 

Let's say we have a dataset with 100 emails, out of which 30 are actually spam and 70 are not spam. Now, our classifier predicts 20 emails as spam, out of which 15 are correctly classified as spam (True Positives) and 5 are incorrectly classified as spam (False Positives).

In this example:

- True Positives (TP) = 15
- False Positives (FP) = 5

Using the formula:

`Precision = True Positives / (True Positives + False Positives)`

`Precision = 15 / (15 + 5) = 15 / 20 = 0.75`

So, the precision of our classifier for predicting spam emails is 0.75 or 75%.

This means that out of all the emails our classifier predicted as spam, 75% of them were actually spam. In other words, when our classifier says an email is spam, it is correct 75% of the time.
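The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the helper name `precision` is made up for illustration, and returning 0.0 when the model predicts no positives is an assumed convention, not something the example above defines.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of positive predictions that are actually positive."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # Assumed convention: no positive predictions -> precision of 0.0
        return 0.0
    return tp / predicted_positive


# Spam example: 20 emails flagged as spam, 15 correctly and 5 incorrectly
spam_precision = precision(tp=15, fp=5)
print(f"Precision = {spam_precision:.2f}")  # prints "Precision = 0.75"
```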

![Formula](https://miro.medium.com/v2/resize:fit:640/format:webp/1*7J08ekAwupLBegeUI8muHA.png)

### Recall

Recall, also known as sensitivity or true positive rate, is the proportion of actual positives that the model correctly identifies. It measures the model's ability to find all relevant instances.

Mathematically, recall is calculated as:

`Recall = True Positives / (True Positives + False Negatives)`

In simple terms, recall answers the question: `"Of all the actual positive cases, how many did the model correctly identify as positive?"`

A high recall indicates that the model is good at capturing positive instances, minimizing the number of false negatives (instances wrongly classified as negative). However, a high recall may come at the expense of precision, where the model may also classify many negatives as positives, leading to false positives.

Recall is often used in combination with precision, F1 score, or accuracy to provide a comprehensive evaluation of a classification model's performance.

### Example

Let's consider a simplified example using a cancer dataset. Suppose we have a dataset of 100 patients who underwent cancer testing, out of which 30 patients have cancer (positive cases) and 70 do not have cancer (negative cases). Our classifier predicts 40 patients as having cancer.

Out of these predictions, let's say 25 patients who actually have cancer are correctly identified by the classifier (True Positives), but 5 patients with cancer are missed by the classifier (False Negatives).

In this example:

- True Positives (TP) = 25
- False Negatives (FN) = 5

Using the formula for recall:

`Recall = True Positives / (True Positives + False Negatives)`

`Recall = 25 / (25 + 5) = 25 / 30 ≈ 0.83`

So, the recall of our classifier for predicting cancer cases is 0.83 or 83%.

This means that out of all the patients who actually have cancer, our classifier correctly identified 83% of them. In other words, when a patient truly has cancer, the classifier was able to detect it correctly 83% of the time.
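The same numbers can be reproduced in Python. The sketch below uses a hypothetical helper named `recall`, and also derives the precision implied by this example: the classifier flagged 40 patients and 25 of those were true positives, so the remaining 15 must be false positives.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model found."""
    actual_positive = tp + fn
    if actual_positive == 0:
        # Assumed convention: no actual positives -> recall of 0.0
        return 0.0
    return tp / actual_positive


tp, fn = 25, 5
predicted_positive = 40
fp = predicted_positive - tp  # 15 patients flagged who do not have cancer

cancer_recall = recall(tp, fn)
cancer_precision = tp / (tp + fp)

print(f"Recall    = {cancer_recall:.2f}")     # prints "Recall    = 0.83"
print(f"Precision = {cancer_precision:.3f}")  # prints "Precision = 0.625"
```

The classifier finds most of the cancer cases (high recall), but more than a third of its positive predictions are wrong (lower precision).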

### Precision & Recall questions

These two questions address different aspects of the model's performance in a classification task:

1. **"Of all the actual positive cases, how many did the model correctly identify as positive?"** - This question is related to the concept of `recall`. It focuses on evaluating how well the model identifies instances of the positive class from the entire pool of positive instances in the dataset. It measures the model's ability to capture all relevant positive cases and avoid missing any `(i.e., minimizing false negatives)`.

2. **"Of all the items the model predicted as positive, how many were actually positive?"** - This question pertains to `precision`. It assesses the accuracy of the model's positive predictions among all instances that it classified as positive. It measures the model's ability to avoid incorrectly labeling negative instances as positive `(i.e., minimizing false positives)`.

In summary, while recall emphasizes the model's ability to find all relevant instances of the positive class, precision focuses on the model's accuracy in labeling instances as positive. Both metrics are essential in evaluating the overall effectiveness of a classification model, as they provide complementary insights into its performance.
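To make the two questions concrete, here is a small sketch that counts true positives, false positives, and false negatives directly from label lists. The `y_true` and `y_pred` values are made-up toy data, not results from any model in this notebook.

```python
# Toy labels (hypothetical): 1 = positive, 0 = negative
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives found
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives flagged as positive
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed

# "Of all the actual positive cases, how many did the model identify?"
recall_value = tp / (tp + fn)
# "Of all the items predicted as positive, how many were actually positive?"
precision_value = tp / (tp + fp)

print(f"TP={tp} FP={fp} FN={fn}")
print(f"Recall    = {recall_value:.3f}")     # 2 / 4
print(f"Precision = {precision_value:.3f}")  # 2 / 3
```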

### F1 score

The F1 score is a metric used to evaluate the performance of a classification model. It is the harmonic mean of precision and recall, providing a balance between these two metrics.

`When it is hard to decide whether to prioritize minimizing false positives or false negatives, we use the F1 score.`

The formula for the F1 score is:

`F1 Score = 2 * (Precision * Recall) / (Precision + Recall)`

or equivalently:

`F1 Score = 2 * True Positives / (2 * True Positives + False Positives + False Negatives)`

The F1 score takes both false positives and false negatives into account, making it a useful metric when the class distribution is imbalanced. It reaches its best value at 1 and worst value at 0.

In summary, the F1 score provides a single value that represents the balance between precision and recall, making it a useful metric for evaluating the overall performance of a classification model.
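As a quick check that the two formulas agree, the sketch below evaluates both on the counts from the cancer example (TP = 25, FN = 5, and FP = 15, the latter derived from the 40 positive predictions). The second part uses made-up precision and recall values to show how the harmonic mean differs from a plain average.

```python
import math

tp, fp, fn = 25, 15, 5

precision = tp / (tp + fp)  # 0.625
recall = tp / (tp + fn)     # about 0.833

# Formula 1: harmonic mean of precision and recall
f1_from_pr = 2 * (precision * recall) / (precision + recall)
# Formula 2: directly from the counts
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(f"F1 (precision/recall) = {f1_from_pr:.4f}")
print(f"F1 (counts)           = {f1_from_counts:.4f}")
print("Formulas agree:", math.isclose(f1_from_pr, f1_from_counts))

# Hypothetical lopsided model: perfect precision, very low recall
p, r = 1.0, 0.1
arithmetic_mean = (p + r) / 2
harmonic_mean = 2 * p * r / (p + r)
print(f"Arithmetic mean = {arithmetic_mean:.2f}, F1 = {harmonic_mean:.2f}")
```

Because it is a harmonic mean, the F1 score stays close to the smaller of the two values, so a model cannot score well by being strong on only one of them.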

### Why use the F1 score

While precision and recall are useful metrics individually, they may not always provide a complete picture of the performance of a classification model, especially when dealing with imbalanced datasets.

Here are some reasons why the F1 score, which combines precision and recall, is commonly used:

1. **Balance between Precision and Recall**: The F1 score provides a balance between precision and recall. It takes into account both false positives (which affect precision) and false negatives (which affect recall), and weights them equally. This is particularly useful when both kinds of error matter and neither is clearly more costly; when one is far more costly than the other, a weighted variant such as the F-beta score is a better fit.

2. **Imbalanced Datasets**: In datasets where one class is significantly more prevalent than the other, accuracy can look high even when the model rarely finds the minority class, and precision or recall alone may not adequately reflect the model's performance. The F1 score considers both false positives and false negatives and ignores true negatives, making it more suitable when the positive class is rare.

3. **Single Metric**: Using the F1 score allows for a single metric to summarize the performance of the model. This simplifies the evaluation process, especially when comparing multiple models or tuning hyperparameters.

4. **Trade-off between Precision and Recall**: The F1 score captures the trade-off between precision and recall. In applications such as information retrieval or medical diagnostics, improving one usually lowers the other, and the F1 score drops sharply when either of them is low.

5. **Threshold Selection**: The F1 score can be useful for selecting the optimal classification threshold. By evaluating the F1 score at different thresholds, one can find the threshold that maximizes the balance between precision and recall.

In summary, while precision and recall are informative on their own, the F1 score provides a more comprehensive evaluation of a classification model's performance, especially in scenarios involving imbalanced datasets or when there's a need to balance precision and recall.
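The threshold-selection idea from point 5 can be sketched in plain Python. Everything here is illustrative: the labels, the predicted scores, the candidate thresholds, and the helper `f1_at_threshold` are assumptions made for this example, not values from the heart dataset used later.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 score when every score >= threshold is predicted positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical true labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
scores = [0.05, 0.20, 0.30, 0.40, 0.45, 0.60, 0.65, 0.80, 0.90, 0.95]

candidate_thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]
f1_by_threshold = {
    t: f1_at_threshold(y_true, scores, t) for t in candidate_thresholds
}

for t, f1 in f1_by_threshold.items():
    print(f"threshold={t:.1f}  F1={f1:.3f}")

best_threshold = max(f1_by_threshold, key=f1_by_threshold.get)
print("Best threshold:", best_threshold)  # 0.3 on this toy data
```

In practice the threshold would be chosen on a validation set rather than on the test set, so that the reported test metrics are not biased by the selection.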

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('heart.csv')

In [3]:
df.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,52,1,0,125,212,0,1,168,0,1.0,2,2,3,0
1,53,1,0,140,203,1,0,155,1,3.1,0,0,3,0
2,70,1,0,145,174,0,1,125,1,2.6,0,0,3,0
3,61,1,0,148,203,0,1,161,0,0.0,2,1,3,0
4,62,0,0,138,294,1,1,106,0,1.9,1,3,2,0


In [4]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(df.iloc[:,0:-1],df.iloc[:,-1],test_size=0.2,random_state=2)

In [5]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

In [6]:
clf1 = LogisticRegression()
clf2 = DecisionTreeClassifier()

In [7]:
clf1.fit(X_train,y_train)
clf2.fit(X_train,y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [8]:
y_pred1 = clf1.predict(X_test)
y_pred2 = clf2.predict(X_test)

In [9]:
from sklearn.metrics import confusion_matrix

In [10]:
confusion_matrix(y_test, y_pred1)

array([[82, 23],
       [10, 90]], dtype=int64)

In [11]:
confusion_matrix(y_test, y_pred2)

array([[101,   4],
       [  0, 100]], dtype=int64)

In [12]:
from sklearn.metrics import recall_score,precision_score,f1_score

In [13]:
print("For Logistic regression Model")
print("-"*50)
cdf = pd.DataFrame(confusion_matrix(y_test,y_pred1),columns=list(range(0,2)))
print(cdf)
print("-"*50)
print("Precision - ",precision_score(y_test,y_pred1))
print("Recall - ",recall_score(y_test,y_pred1))
print("F1 score - ",f1_score(y_test,y_pred1))

For Logistic regression Model
--------------------------------------------------
    0   1
0  82  23
1  10  90
--------------------------------------------------
Precision -  0.7964601769911505
Recall -  0.9
F1 score -  0.8450704225352113


In [14]:
print("For DT Model")
print("-"*50)
cdf = pd.DataFrame(confusion_matrix(y_test,y_pred2),columns=list(range(0,2)))
print(cdf)
print("-"*50)
print("Precision - ",precision_score(y_test,y_pred2))
print("Recall - ",recall_score(y_test,y_pred2))
print("F1 score - ",f1_score(y_test,y_pred2))

For DT Model
--------------------------------------------------
     0    1
0  101    4
1    0  100
--------------------------------------------------
Precision -  0.9615384615384616
Recall -  1.0
F1 score -  0.9803921568627451
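The scores printed above can be reproduced by hand from the two confusion matrices. The sketch below relies on scikit-learn's layout for binary labels, where rows are true classes and columns are predicted classes, so the matrix reads `[[TN, FP], [FN, TP]]`. The helper name `metrics_from_confusion` is made up for this example.

```python
def metrics_from_confusion(cm):
    """Precision, recall, and F1 from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1


# Confusion matrices copied from the outputs above
logreg_cm = [[82, 23], [10, 90]]
tree_cm = [[101, 4], [0, 100]]

results = {}
for name, cm in [("Logistic regression", logreg_cm), ("Decision tree", tree_cm)]:
    p, r, f1 = metrics_from_confusion(cm)
    results[name] = (p, r, f1)
    print(f"{name}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

For logistic regression, for example, precision is 90 / (90 + 23) and recall is 90 / (90 + 10), which matches the values reported by `precision_score` and `recall_score`.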


### Multi-Class Precision and Recall

In the context of multi-class classification, precision, recall, and F1 score can be extended to evaluate the performance of the classifier across multiple classes. Here's how these metrics are defined for multi-class classification:

1. **Precision**: Precision for a particular class `C_i` is calculated as the ratio of true positives for class `C_i` to the sum of true positives and false positives for class `C_i`.

   `Precision(C_i) = TP_C_i / (TP_C_i + FP_C_i)`

2. **Recall**: Recall for a particular class `C_i` is calculated as the ratio of true positives for class `C_i` to the sum of true positives and false negatives for class `C_i`.

   `Recall(C_i) = TP_C_i / (TP_C_i + FN_C_i)`

3. **F1 Score**: The F1 score for a particular class `C_i` is the harmonic mean of precision and recall for that class.

   `F1_Score(C_i) = 2 * Precision(C_i) * Recall(C_i) / (Precision(C_i) + Recall(C_i))`
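The per-class formulas can be applied to a multi-class confusion matrix by treating each class in turn as the positive class. The sketch below uses a made-up 3-class matrix and assumes rows are true classes and columns are predicted classes (the same layout scikit-learn's `confusion_matrix` uses).

```python
# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class
cm = [
    [5, 1, 0],
    [2, 6, 2],
    [0, 1, 8],
]
n_classes = len(cm)

per_class = {}
for i in range(n_classes):
    tp = cm[i][i]                                      # predicted i and truly i
    fp = sum(cm[r][i] for r in range(n_classes)) - tp  # predicted i, truly another class
    fn = sum(cm[i]) - tp                               # truly i, predicted another class

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    per_class[i] = {"tp": tp, "fp": fp, "fn": fn,
                    "precision": precision, "recall": recall, "f1": f1}
    print(f"class {i}: TP={tp} FP={fp} FN={fn} "
          f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

False positives for a class come from its column, and false negatives come from its row.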

To compute these metrics for multi-class classification, treat each class in turn as the positive class and all other classes as negative (one-vs-rest), and count the true positives, false positives, and false negatives for that class. You can then report precision, recall, and F1 score for each class individually, or summarize them across all classes with a macro-, micro-, or weighted average.

- **Macro-average**: Computes the metric independently for each class and then takes the average. It gives equal weight to each class, regardless of class size.

- **Micro-average**: Sums the true positives, false positives, and false negatives over all classes and computes the metric from those totals. It gives equal weight to each instance, regardless of its class.

- **Weighted-average**: Computes the average metric weighted by the number of true instances in each class. It gives more weight to larger classes.

These metrics provide insight into how well the classifier performs for each individual class and overall across all classes in the multi-class classification problem.
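The three averaging schemes can be compared on a small example. The per-class counts below are hypothetical (TP, FP, FN for three classes), and the variable names are chosen for illustration. The three schemes correspond to the `'macro'`, `'micro'`, and `'weighted'` options of the `average` parameter in scikit-learn's `f1_score`.

```python
# Hypothetical per-class counts: class -> (TP, FP, FN)
counts = {
    0: (5, 2, 1),
    1: (6, 2, 4),
    2: (8, 2, 1),
}


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


per_class_f1 = {c: f1(*v) for c, v in counts.items()}
support = {c: tp + fn for c, (tp, fp, fn) in counts.items()}  # true instances per class

# Macro: plain mean of the per-class scores
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Micro: pool the counts over all classes, then compute one score
total_tp = sum(tp for tp, _, _ in counts.values())
total_fp = sum(fp for _, fp, _ in counts.values())
total_fn = sum(fn for _, _, fn in counts.values())
micro_f1 = f1(total_tp, total_fp, total_fn)

# Weighted: mean of the per-class scores, weighted by support
weighted_f1 = (
    sum(per_class_f1[c] * support[c] for c in counts) / sum(support.values())
)

print(f"macro F1    = {macro_f1:.4f}")
print(f"micro F1    = {micro_f1:.4f}")
print(f"weighted F1 = {weighted_f1:.4f}")
```

When every instance has exactly one true label and one predicted label, the pooled false positives equal the pooled false negatives, so micro-averaged precision, recall, and F1 all equal the overall accuracy.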

For a worked multi-class calculation, see: https://youtu.be/iK-kdhJ-7yI?si=8xrN6Oen-ivRJ05P&t=1295

![](attachment:4fe074ab-4295-485b-a667-9745e8cdf3ac.png)



![](https://miro.medium.com/v2/resize:fit:828/format:webp/1*DdVtgn3uHgBp3a9qjU9hqg.png)