# Precision

**Definition:**  
Precision is the ratio of correctly predicted positive observations to the total predicted positives. It answers the question: *Of all the instances that were predicted as positive, how many were actually positive?* Precision is a key metric in evaluating the performance of classification models, particularly in cases where the cost of false positives is high.

**Formula:**

$$
\text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}}
$$
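
The formula is undefined when the model makes no positive predictions at all (TP + FP = 0). The sketch below shows one way to compute precision from raw counts; the function name `precision_from_counts` and the choice to return a fallback value in the undefined case are assumptions made for illustration, not part of the definition.

```python
def precision_from_counts(tp: int, fp: int, zero_division: float = 0.0) -> float:
    """Return TP / (TP + FP), or `zero_division` when nothing was predicted positive."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        # Assumption: return a caller-supplied fallback instead of raising an error.
        return zero_division
    return tp / predicted_positives


print(precision_from_counts(tp=8, fp=2))  # 0.8
print(precision_from_counts(tp=0, fp=0))  # 0.0 (undefined case, fallback value)
```

Returning a fallback keeps batch evaluations from crashing, but it can hide the fact that a model never predicted the positive class, so it is worth logging that case separately.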

**Key Terms:**
- **True Positives (TP):** These are cases where the model correctly predicts the positive class. For instance, in a spam detection system, true positives would be emails that are correctly identified as spam.
  
- **False Positives (FP):** These are cases where the model incorrectly predicts the positive class. Continuing with the spam example, false positives would be legitimate emails that the model mistakenly identifies as spam.
  
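- **False Negatives (FN):** These are cases where the model predicts the negative class for an instance that is actually positive. In our spam detection example, false negatives would be spam emails that the model incorrectly classifies as not spam. False negatives do not appear in the precision formula, but they are needed for recall.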
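
To make these three terms concrete, here is a minimal sketch that counts them for the spam scenario. The eight labels are hypothetical values invented for illustration (1 = spam, 0 = not spam), and the variable names are assumptions.

```python
# Hypothetical labels for eight emails: 1 = spam (positive), 0 = not spam.
actual    = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # spam flagged as spam
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # legitimate mail flagged as spam
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # spam that slipped through

print(tp, fp, fn)      # 3 1 1
print(tp / (tp + fp))  # 0.75
```

Only `tp` and `fp` enter the precision calculation; `fn` is counted here because it is used for recall.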

**Importance of Precision:**
Precision is particularly crucial in applications where false positives can lead to significant negative consequences. For example:

- **Medical Diagnoses:** In a cancer screening test, a false positive could lead to unnecessary biopsies and anxiety for patients.
  
- **Fraud Detection:** In financial transactions, a false positive may result in blocking legitimate transactions, causing inconvenience for customers.

High precision indicates that the model makes reliable positive predictions, which is essential in such sensitive applications.

**Interpretation:**
- **High Precision:** A high precision value (close to 1) means that when the model predicts the positive class, it is very likely to be correct: few of its positive predictions are false positives. Note that this is a statement about the predicted positives only; it is not the same as a low false positive rate, which is measured against all actual negatives.

- **Low Precision:** A low precision value means that a large share of the model's positive predictions are false positives, which can mislead stakeholders and lead to unnecessary actions.

**Example:**
Let's consider a binary classification problem in a medical testing scenario where the task is to identify patients with a specific disease.

Suppose a model predicts 100 patients as having the disease, with the following results:
- True Positives (TP): 70 (patients who actually have the disease and were correctly identified)
- False Positives (FP): 30 (patients who do not have the disease but were incorrectly identified as having it)

Using the precision formula, we calculate precision as follows:

$$
\text{Precision} = \frac{TP}{TP + FP} = \frac{70}{70 + 30} = \frac{70}{100} = 0.7
$$

This indicates that 70% of the patients predicted to have the disease actually have it. The remaining 30 of the 100 positive predictions are false positives. Whether a precision of 0.7 is acceptable depends on the application: in a medical setting, each of those 30 patients may face unnecessary follow-up tests.
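
The arithmetic can be checked with a few lines of Python. This is only a sketch of the example's numbers; the variable name `false_alarm_share` is an assumption used here for the fraction of positive predictions that are wrong.

```python
tp, fp = 70, 30  # counts from the example above
predicted_positives = tp + fp

precision = tp / predicted_positives
false_alarm_share = fp / predicted_positives  # complement of precision

print(f"Precision: {precision:.2f}")                  # Precision: 0.70
print(f"False-alarm share: {false_alarm_share:.2f}")  # False-alarm share: 0.30
```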

**Relation to Other Metrics:**
Precision is often discussed alongside other important metrics, such as recall and F1 score:

- **Recall (Sensitivity):** Recall measures the proportion of actual positives that were correctly identified by the model. It answers the question: *Of all the actual positive instances, how many did we correctly predict?*

$$
\text{Recall} = \frac{TP}{TP + FN}
$$

- **F1 Score:** The F1 score is the harmonic mean of precision and recall, providing a single score that balances both metrics. It is particularly useful in situations with imbalanced datasets, where one class may be significantly more prevalent than the other.

$$
\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
$$
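
To see how the three metrics relate, the sketch below reuses the counts from the medical testing example (TP = 70, FP = 30). That example does not state the number of false negatives, so `fn = 20` is a hypothetical value chosen purely for illustration.

```python
tp, fp = 70, 30  # from the medical testing example
fn = 20          # assumption: not given in the example, chosen for illustration

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.3f}")  # Precision: 0.700
print(f"Recall:    {recall:.3f}")     # Recall:    0.778
print(f"F1 score:  {f1:.3f}")         # F1 score:  0.737
```

Because the F1 score is a harmonic mean, it lies between precision and recall and is pulled toward the lower of the two. Written in terms of counts, the same value is 2·TP / (2·TP + FP + FN).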

**Conclusion:**
Precision is a valuable metric for evaluating classification models, especially when false positives are costly. On its own it says nothing about the positives the model misses, so it should be read together with recall and, where a single summary number is needed, the F1 score.


```python
import numpy as np
from sklearn.metrics import precision_score, confusion_matrix

# Ground-truth and predicted labels (1 = positive, 0 = negative)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# Precision for the positive class (label 1)
precision = precision_score(y_true, y_pred)

print(f"Predicted Labels: {y_pred}")
print(f"True Labels: {y_true}")
print(f"Precision: {precision:.2f}")

# Rows are true labels, columns are predicted labels, both ordered [0, 1]
conf_matrix = confusion_matrix(y_true, y_pred)

print("\nConfusion Matrix:")
print(conf_matrix)

TP = conf_matrix[1, 1]  # True Positives: actual 1, predicted 1
FP = conf_matrix[0, 1]  # False Positives: actual 0, predicted 1
FN = conf_matrix[1, 0]  # False Negatives: actual 1, predicted 0

print(f"\nTrue Positives (TP): {TP}")
print(f"False Positives (FP): {FP}")
print(f"False Negatives (FN): {FN}")
```

**Output:**

```text
Predicted Labels: [1 0 1 0 0 1 1 0 1 0]
True Labels: [1 0 1 1 0 1 0 0 1 0]
Precision: 0.80

Confusion Matrix:
[[4 1]
 [1 4]]

True Positives (TP): 4
False Positives (FP): 1
False Negatives (FN): 1
```
