# F1 Score

**Definition:**  
The F1 score is the harmonic mean of precision and recall, combining both metrics into a single number. It is particularly useful when the class distribution is uneven, because it depends only on true positives, false positives, and false negatives. The count of true negatives, which is often very large on imbalanced data, does not affect it.

**Formula:**

$$
\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
$$
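
As a minimal sketch, the formula translates directly into a small Python function. The name `f1_from_precision_recall` is illustrative, and returning 0.0 when both inputs are zero is an assumed convention to avoid division by zero, not something the formula itself specifies.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    denominator = precision + recall
    if denominator == 0:
        # Assumed convention: treat F1 as 0.0 when precision and recall are both 0.
        return 0.0
    return 2 * precision * recall / denominator


# When precision and recall are equal, F1 equals that shared value.
print(f1_from_precision_recall(0.5, 0.5))
# The degenerate case falls back to the assumed convention.
print(f1_from_precision_recall(0.0, 0.0))
```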

**Importance of the F1 Score:**
The F1 score is an important metric in classification tasks where both precision and recall are crucial. Improving one of the two often comes at the expense of the other, and the F1 score summarizes that trade-off in a single balanced number. This is particularly relevant in fields such as:

- **Medical Diagnosis:** It is vital to identify as many patients with a disease as possible (high recall) while ensuring that those flagged as positive actually have the disease (high precision).
  
- **Information Retrieval:** In search engines and recommendation systems, it’s important to return relevant results (high precision) while also retrieving as many relevant documents as possible (high recall).
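
To illustrate why this matters when positives are rare, the sketch below uses made-up numbers: 1,000 hypothetical patients, 10 of whom have the disease, and a degenerate model that predicts "no disease" for everyone. The helper `evaluate` and its zero-division convention are assumptions for illustration only.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, precision, recall, f1


# Hypothetical data: 10 positives, 990 negatives, and a model that always predicts 0.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000

accuracy, precision, recall, f1 = evaluate(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(f"Recall:   {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
```

Under these assumptions the model scores 99% accuracy while finding none of the sick patients, and the F1 score of 0 makes that failure visible.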

**Interpretation:**
- **High F1 Score:** A high F1 score (close to 1) indicates a good balance between precision and recall, suggesting that the model is effective in making accurate predictions with minimal false positives and false negatives.
  
- **Low F1 Score:** A low F1 score indicates poor model performance, suggesting that either precision, recall, or both are low, leading to a large number of false predictions.
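
A low F1 score flags a weakness in either metric because the harmonic mean is pulled toward the smaller of its two inputs. The short sketch below compares it with the arithmetic mean, using illustrative values that are not taken from any real model.

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative numbers; 0.0 if both are zero (assumed convention)."""
    return 2 * a * b / (a + b) if (a + b) else 0.0


# Illustrative (precision, recall) pairs, from balanced to very lopsided.
pairs = [(0.8, 0.8), (1.0, 0.5), (1.0, 0.1)]

for precision, recall in pairs:
    arithmetic = (precision + recall) / 2
    f1 = harmonic_mean(precision, recall)
    print(
        f"precision={precision:.1f} recall={recall:.1f} "
        f"arithmetic mean={arithmetic:.2f} F1={f1:.2f}"
    )
```

The more lopsided the pair, the further the F1 score falls below the arithmetic mean, so a model cannot earn a high F1 score by excelling at only one of the two metrics.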

**Example:**
Consider a binary classification problem where we are predicting whether patients have a particular disease. Suppose we have the following results:
- True Positives (TP): 60 (patients who actually have the disease and were correctly identified)
- False Positives (FP): 10 (patients who do not have the disease but were incorrectly identified as having it)
- False Negatives (FN): 30 (patients who have the disease but were incorrectly identified as not having it)

First, we calculate precision and recall:

$$
\text{Precision} = \frac{TP}{TP + FP} = \frac{60}{60 + 10} = \frac{60}{70} \approx 0.857
$$

$$
\text{Recall} = \frac{TP}{TP + FN} = \frac{60}{60 + 30} = \frac{60}{90} \approx 0.667
$$

Now we can calculate the F1 score:

$$
\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} = 2 \times \frac{0.857 \times 0.667}{0.857 + 0.667} \approx 0.75
$$

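An F1 score of 0.75 lies between the precision (0.857) and the recall (0.667), slightly closer to the lower of the two. The model's positive predictions are mostly reliable, but it misses a third of the patients who actually have the disease, so recall is the main area for improvement.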
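
The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using the counts given above. It also evaluates the algebraically equivalent form 2TP / (2TP + FP + FN), which skips the intermediate rounding of precision and recall.

```python
# Counts from the worked example.
tp, fp, fn = 60, 10, 30

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1 as the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Equivalent form computed directly from the counts.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(f"Precision: {precision:.3f}")            # 0.857
print(f"Recall:    {recall:.3f}")               # 0.667
print(f"F1 Score:  {f1:.3f}")                   # 0.750
print(f"F1 (from counts): {f1_from_counts:.3f}")  # 0.750
```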

**Relation to Other Metrics:**
The F1 score is often discussed alongside precision and recall:
- **Precision:** The fraction of predicted positives that are actually positive, TP / (TP + FP).
  
- **Recall:** The fraction of actual positives that the model correctly identifies, TP / (TP + FN).

**Conclusion:**
The F1 score is a crucial metric for evaluating classification models, especially in cases where precision and recall are equally important. Understanding the F1 score helps practitioners make informed decisions about model performance and suitability for specific applications. By considering the F1 score along with precision and recall, stakeholders can obtain a more comprehensive view of model effectiveness.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix

# Ground-truth labels and model predictions (1 = positive, 0 = negative)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

print(f"Predicted Labels: {y_pred}")
print(f"True Labels: {y_true}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")

f1 = f1_score(y_true, y_pred)

print(f"F1 Score: {f1:.2f}")

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
conf_matrix = confusion_matrix(y_true, y_pred)

print("\nConfusion Matrix:")
print(conf_matrix)

TP = conf_matrix[1, 1]  # True Positives
FP = conf_matrix[0, 1]  # False Positives
FN = conf_matrix[1, 0]  # False Negatives

print(f"\nTrue Positives (TP): {TP}")
print(f"False Positives (FP): {FP}")
print(f"False Negatives (FN): {FN}")
```

```text
Predicted Labels: [1 0 1 0 0 1 1 0 1 0]
True Labels: [1 0 1 1 0 1 0 0 1 0]
Precision: 0.80
Recall: 0.80
F1 Score: 0.80

Confusion Matrix:
[[4 1]
 [1 4]]

True Positives (TP): 4
False Positives (FP): 1
False Negatives (FN): 1
```
