
This walkthrough computes each metric by hand, step by step, before reproducing the results in Python.

We have a **3-class confusion matrix**:

| System \ Gold      | Cat | Dog | Rabbit | **Row sum (Predicted)** |
| ------------------ | --- | --- | ------ | ----------------------- |
| **Cat**            | 5   | 10  | 5      | 20                      |
| **Dog**            | 15  | 20  | 10     | 45                      |
| **Rabbit**         | 0   | 15  | 10     | 25                      |
| **Col sum (Gold)** | 20  | 45  | 25     | 90                      |
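As a quick sanity check, the marginals in the table can be recomputed from its nine cells. The sketch below uses plain Python with no dependencies; the names `matrix`, `row_sums`, and `col_sums` are illustrative and not part of the original notebook.

```python
# Rows are system predictions, columns are gold labels (same layout as the table).
labels = ["Cat", "Dog", "Rabbit"]
matrix = [
    [5, 10, 5],    # predicted Cat
    [15, 20, 10],  # predicted Dog
    [0, 15, 10],   # predicted Rabbit
]

row_sums = [sum(row) for row in matrix]        # predicted counts per class
col_sums = [sum(col) for col in zip(*matrix)]  # gold counts per class
total = sum(row_sums)

for label, predicted, gold in zip(labels, row_sums, col_sums):
    print(f"{label}: predicted={predicted}, gold={gold}")
print("total:", total)
```

In this matrix each class has the same row sum and column sum, which is why precision and recall coincide for every class in the sections that follow.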


## **1. Per-Class Metrics**

**Definitions:**

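* Precision(c) = TP / (TP + FP), where TP + FP is the number of items the system labeled c (the row sum for c).
* Recall(c) = TP / (TP + FN), where TP + FN is the number of items whose gold label is c (the column sum for c).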
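The two definitions translate directly into a small helper. This is a minimal sketch: the function name `precision_recall` and the example counts are hypothetical, and returning 0.0 for an empty denominator is one common convention rather than the only option.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall); a zero denominator yields 0.0 by convention."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts, chosen only to illustrate the formulas.
p, r = precision_recall(tp=8, fp=2, fn=8)
print(p, r)  # precision = 8/10, recall = 8/16
```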


### **Cat**

* TP = 5.
* Predicted Cat = 20 → FP = 20 − 5 = 15.
* Actual Cat = 20 → FN = 20 − 5 = 15.
* Precision = 5/20 = **0.25**.
* Recall = 5/20 = **0.25**.


### **Dog**

* TP = 20.
* Predicted Dog = 45 → FP = 45 − 20 = 25.
* Actual Dog = 45 → FN = 45 − 20 = 25.
* Precision = 20/45 ≈ **0.4444**.
* Recall = 20/45 ≈ **0.4444**.


### **Rabbit**

* TP = 10.
* Predicted Rabbit = 25 → FP = 25 − 10 = 15.
* Actual Rabbit = 25 → FN = 25 − 10 = 15.
* Precision = 10/25 = **0.4**.
* Recall = 10/25 = **0.4**.


## **2. Per-Class Results**

* Cat: Precision = 0.25, Recall = 0.25
* Dog: Precision ≈ 0.444, Recall ≈ 0.444
* Rabbit: Precision = 0.40, Recall = 0.40


## **3. Macro vs. Micro Averaging**

### **Macro-Averaged Precision/Recall**

* Precision\_macro = (0.25 + 0.4444 + 0.40) / 3 = 1.0944 / 3 ≈ **0.3648**.
* Recall\_macro ≈ **0.3648** as well, because every class in this matrix has equal row and column sums, so per-class precision equals per-class recall.
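Averaging per-class values that have already been rounded can shift the last digit of the result, so it can help to check the macro average with exact fractions. The sketch below uses the standard-library `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

# Per-class precision as exact fractions (TP / predicted count), taken from the table.
per_class = [Fraction(5, 20), Fraction(20, 45), Fraction(10, 25)]

macro = sum(per_class) / len(per_class)
print(macro)                   # exact fraction
print(round(float(macro), 4))  # decimal, rounded once at the end
```

The exact value is 197/540, which rounds to 0.3648 at four decimal places.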


### **Micro-Averaged Precision/Recall**

* Micro-averaging pools the counts over all classes before dividing: precision = total TP / total predicted, recall = total TP / total actual.
* Total TP = 5 + 20 + 10 = 35.
* Total predicted = 90, total actual = 90.
* Precision\_micro = Recall\_micro = 35/90 ≈ **0.3889**.
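The pooled counts can also be written in terms of TP, FP, and FN. Every off-diagonal cell is counted once as a false positive (for its row class) and once as a false negative (for its column class), so the pooled FP and FN totals match. In a single-label setting like this one, micro-precision, micro-recall, and accuracy are therefore the same number. A minimal sketch in plain Python, with illustrative variable names:

```python
matrix = [
    [5, 10, 5],    # predicted Cat
    [15, 20, 10],  # predicted Dog
    [0, 15, 10],   # predicted Rabbit
]
n = len(matrix)

# Pool TP, FP and FN over all classes before dividing.
tp = sum(matrix[i][i] for i in range(n))
fp = sum(sum(matrix[i]) - matrix[i][i] for i in range(n))                 # off-diagonal, by row
fn = sum(sum(row[i] for row in matrix) - matrix[i][i] for i in range(n))  # off-diagonal, by column

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
accuracy = tp / sum(map(sum, matrix))
print(round(micro_precision, 4), round(micro_recall, 4), round(accuracy, 4))
```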


### **Interpretation**

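* **Macro averaging**: weights every class equally regardless of its size, so a rare class counts as much as a frequent one (useful when performance on small classes matters).
* **Micro averaging**: pools counts over all instances, so larger classes (Dog here) have more influence; in single-label classification it equals overall accuracy (useful when overall correctness matters).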
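The gap between the two averages grows with class imbalance. The sketch below uses a hypothetical two-class matrix, invented purely for illustration and not taken from the notebook, in which a frequent class is predicted well and a rare class poorly. The helper name `macro_micro_precision` is likewise an assumption.

```python
def macro_micro_precision(matrix):
    """Return (macro, micro) precision; rows = predicted, columns = gold."""
    n = len(matrix)
    per_class = []
    for i in range(n):
        predicted = sum(matrix[i])
        per_class.append(matrix[i][i] / predicted if predicted > 0 else 0.0)
    macro = sum(per_class) / n
    micro = sum(matrix[i][i] for i in range(n)) / sum(map(sum, matrix))
    return macro, micro


# Hypothetical, deliberately imbalanced matrix: class A is frequent, class B is rare.
imbalanced = [
    [90, 0],  # predicted A: all 90 correct
    [8, 2],   # predicted B: only 2 of 10 correct
]
macro, micro = macro_micro_precision(imbalanced)
print(round(macro, 4), round(micro, 4))
```

Here the macro average is pulled down by the poorly predicted rare class (precision 0.2), while the micro average stays close to the frequent class's performance.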






```python
import numpy as np

# Confusion matrix: rows are system predictions, columns are gold labels
confusion = np.array([
    [5, 10, 5],    # Predicted Cat
    [15, 20, 10],  # Predicted Dog
    [0, 15, 10],   # Predicted Rabbit
])

classes = ["Cat", "Dog", "Rabbit"]
num_classes = len(classes)

# Per-class precision and recall
precisions = []
recalls = []

for i in range(num_classes):
    TP = confusion[i, i]
    FP = confusion[i, :].sum() - TP  # predicted as class i, gold is another class
    FN = confusion[:, i].sum() - TP  # gold is class i, predicted as another class

    precision = TP / (TP + FP) if (TP + FP) > 0 else 0.0
    recall = TP / (TP + FN) if (TP + FN) > 0 else 0.0

    precisions.append(precision)
    recalls.append(recall)
    print(f"{classes[i]} -> Precision: {precision:.4f}, Recall: {recall:.4f}")

# Macro averages: unweighted mean of the per-class values
macro_precision = np.mean(precisions)
macro_recall = np.mean(recalls)

# Micro averages: pool counts over all classes before dividing
TP_total = np.trace(confusion)
total_pred = confusion.sum()  # every instance receives exactly one prediction
total_gold = confusion.sum()  # and has exactly one gold label, so the totals match
micro_precision = TP_total / total_pred
micro_recall = TP_total / total_gold

print("\nMacro-Averaged Precision:", round(macro_precision, 4))
print("Macro-Averaged Recall:", round(macro_recall, 4))
print("Micro-Averaged Precision:", round(micro_precision, 4))
print("Micro-Averaged Recall:", round(micro_recall, 4))
```


Output:

```
Cat -> Precision: 0.2500, Recall: 0.2500
Dog -> Precision: 0.4444, Recall: 0.4444
Rabbit -> Precision: 0.4000, Recall: 0.4000

Macro-Averaged Precision: 0.3648
Macro-Averaged Recall: 0.3648
Micro-Averaged Precision: 0.3889
Micro-Averaged Recall: 0.3889
```
