# Accuracy

**Definition:**  
Accuracy is the ratio of correctly predicted observations to the total observations. It measures how often the classifier is correct across all classes. Accuracy provides a general sense of model performance, making it a commonly used metric for evaluating classification models.

**Formula:**

$$
\text{Accuracy} = \frac{\text{True Positives (TP)} + \text{True Negatives (TN)}}{\text{Total Observations}}
$$

where:
- **True Positives (TP):** Cases where the model correctly predicts the positive class.
- **True Negatives (TN):** Cases where the model correctly predicts the negative class.
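- **Total Observations:** The total number of instances in the dataset. In the binary case this equals TP + TN + FP + FN, where FP and FN are the false positives and false negatives.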
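
The formula above can be sketched as a small helper. This is a minimal illustration rather than a library function: the name `accuracy_from_counts` and the counts passed to it are hypothetical, chosen only to show the arithmetic.

```python
def accuracy_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return (TP + TN) / (TP + TN + FP + FN) for a binary classifier."""
    total = tp + tn + fp + fn
    if total == 0:
        # Accuracy is undefined when there is nothing to evaluate.
        raise ValueError("accuracy is undefined when there are no observations")
    return (tp + tn) / total


# Hypothetical counts, used only for illustration.
print(accuracy_from_counts(tp=8, tn=6, fp=3, fn=3))  # 14 / 20 = 0.7
```

Raising on an empty input is one possible design choice; returning `float("nan")` would be another.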

**Importance of Accuracy:**
Accuracy is particularly useful in situations where the class distribution is balanced and there are equal costs for false positives and false negatives. It is widely used in various applications, such as:

- **Image Classification:** In identifying objects in images, accuracy can provide a straightforward measure of performance.
- **Spam Detection:** In email classification, high accuracy indicates that most spam and most legitimate emails are being sorted correctly, provided the two classes are reasonably balanced.

**Interpretation:**
- **High Accuracy:** A high accuracy value (close to 1) indicates that the model predicts nearly all instances correctly.
- **Low Accuracy:** A low accuracy value means the model is wrong on a significant portion of the instances, which can point to issues such as underfitting or, when measured on held-out data, overfitting. Class imbalance usually has the opposite effect: it can make accuracy look misleadingly high, because always predicting the majority class is already correct most of the time.
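
A high accuracy value deserves a second look when the classes are imbalanced. The sketch below uses a made-up dataset (95 negatives, 5 positives) and a trivial "model" that always predicts the negative class; both are assumptions for illustration and do not come from the example in this section.

```python
# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A trivial "model" that always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Share of actual positives that were found (recall for the positive class).
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.95
print(f"Recall:   {recall:.2f}")    # Recall:   0.00
```

The model scores 0.95 on accuracy while missing every positive case, which is why accuracy is usually read together with class-specific metrics.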

**Example:**
Consider a binary classification problem where we are predicting whether patients have a specific disease. Suppose we have the following results from our predictions:
- True Positives (TP): 50 (patients who actually have the disease and were correctly identified)
- True Negatives (TN): 30 (patients who do not have the disease and were correctly identified)
- False Positives (FP): 10 (patients who do not have the disease but were incorrectly identified as having it)
- False Negatives (FN): 10 (patients who have the disease but were incorrectly identified as not having it)

The total observations can be calculated as:

$$
\text{Total Observations} = TP + TN + FP + FN = 50 + 30 + 10 + 10 = 100
$$

Now we can calculate the accuracy:

$$
\text{Accuracy} = \frac{TP + TN}{\text{Total Observations}} = \frac{50 + 30}{100} = \frac{80}{100} = 0.80
$$

This indicates that the model correctly identifies 80% of all cases.
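
The same result can be checked at the level of individual labels. The sketch below rebuilds label lists that match the four counts in the example (the ordering of the patients is arbitrary and assumed here) and counts how many predictions agree with the true labels.

```python
# Counts taken from the worked example.
TP, TN, FP, FN = 50, 30, 10, 10

# Rebuild label lists consistent with those counts (1 = disease, 0 = no disease).
y_true = [1] * TP + [0] * TN + [0] * FP + [1] * FN
y_pred = [1] * TP + [0] * TN + [1] * FP + [0] * FN

# Accuracy is the fraction of positions where prediction and truth agree.
matches = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = matches / len(y_true)

print(f"Matches: {matches} of {len(y_true)}")  # Matches: 80 of 100
print(f"Accuracy: {accuracy:.2f}")             # Accuracy: 0.80
```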

**Relation to Other Metrics:**
While accuracy is a useful metric, it is essential to consider it alongside other metrics, especially in cases of class imbalance:
- **Precision:** The fraction of positive predictions that are actually positive, TP / (TP + FP).
- **Recall:** The fraction of actual positive instances that the model identifies, TP / (TP + FN).
- **F1 Score:** The harmonic mean of precision and recall, providing a single score that balances both metrics.
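
To see how these metrics sit next to accuracy, the sketch below computes all four from the counts in the disease example (TP = 50, TN = 30, FP = 10, FN = 10). It assumes every denominator is nonzero, which holds for these counts but would need a guard in general.

```python
# Counts taken from the disease example in this section.
TP, TN, FP, FN = 50, 30, 10, 10

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)  # of the predicted positives, how many are correct
recall = TP / (TP + FN)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"Accuracy:  {accuracy:.2f}")   # Accuracy:  0.80
print(f"Precision: {precision:.2f}")  # Precision: 0.83
print(f"Recall:    {recall:.2f}")     # Recall:    0.83
print(f"F1 Score:  {f1:.2f}")         # F1 Score:  0.83
```

Because FP and FN happen to be equal in this example, precision, recall, and F1 coincide; with unequal error counts they would diverge.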

**Conclusion:**
Accuracy is a fundamental metric for evaluating classification models, providing a straightforward assessment of overall performance. However, it should be read alongside other metrics, particularly on imbalanced datasets or in applications where false positives and false negatives carry different costs. By considering accuracy together with precision, recall, and the F1 score, practitioners can gain a more complete view of model effectiveness.


```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Ground-truth and predicted labels for 10 samples (1 = positive, 0 = negative)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# Fraction of predictions that match the true labels
accuracy = accuracy_score(y_true, y_pred)

print(f"Predicted Labels: {y_pred}")
print(f"True Labels: {y_true}")
print(f"Accuracy: {accuracy:.2f}")

# Rows are true labels, columns are predicted labels, both in sorted order (0, 1),
# so the layout is [[TN, FP], [FN, TP]]
conf_matrix = confusion_matrix(y_true, y_pred)

print("\nConfusion Matrix:")
print(conf_matrix)

TP = conf_matrix[1, 1]  # True Positives: true 1, predicted 1
TN = conf_matrix[0, 0]  # True Negatives: true 0, predicted 0
FP = conf_matrix[0, 1]  # False Positives: true 0, predicted 1
FN = conf_matrix[1, 0]  # False Negatives: true 1, predicted 0

print(f"\nTrue Positives (TP): {TP}")
print(f"True Negatives (TN): {TN}")
print(f"False Positives (FP): {FP}")
print(f"False Negatives (FN): {FN}")
```
**Output:**

```text
Predicted Labels: [1 0 1 0 0 1 1 0 1 0]
True Labels: [1 0 1 1 0 1 0 0 1 0]
Accuracy: 0.80

Confusion Matrix:
[[4 1]
 [1 4]]

True Positives (TP): 4
True Negatives (TN): 4
False Positives (FP): 1
False Negatives (FN): 1
```
