# ⚡ Quick Dip ⚡

Here we are looking **inside** the `multiclass_f1_score` function from `torcheval`, the evaluation library for PyTorch.  
Specifically, the **'macro'** score 🧠.

### 🎯 Macro F1:
> Calculate metrics for each class separately and return their unweighted mean.  
> Classes with 0 true and predicted instances are ignored 🚫.

I **want** the result to be given as a **log string** in Python,  
but the string is the **raw markdown code** 📜.

They claim that the function will **ignore non-present classes**,  
however, we are about to **check** here 🕵️‍♂️.

We are checking because **`torcheval`** likes to throw these warnings ⚠️:
```
WARNING:root:Warning: Some classes do not exist in the target. F1 scores for these classes will be cast to zeros.
```


And if we cast the F1 score to zero and still counted the class,  
that would royally **mess up** the scores! 😱 So we will **double-check** here 🔍.

In [20]:
import torch
from torcheval.metrics.functional import multiclass_f1_score


In [21]:
target = torch.tensor([2,2,2,2,1,1,1,0,0])
pred = torch.tensor([2,2,2,1,1,1,2,0,0])

multiclass_f1_score(pred, target, num_classes=3, average=None)

tensor([1.0000, 0.6667, 0.7500])

In [22]:
target = torch.tensor([2,2,2,2,1,1,1,0,0])
pred = torch.tensor([2,2,2,1,1,1,2,0,0])

multiclass_f1_score(pred, target, num_classes=3, average="macro")

tensor(0.8056)

So if we remove all the class-0 instances here (so that class 0 has no true and no predicted samples), then the **macro F1** should be:

$$
\frac{(0.75 + 0.6667)}{2} = 0.70833
$$

and not:

$$
\frac{(0.75 + 0.6667 + 0)}{3} = 0.4722
$$
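
The two candidate averages above can also be reproduced by hand, without `torcheval`. This is a minimal sketch in plain Python, not the library's implementation: `per_class_counts`, `macro_f1`, and the `ignore_absent` flag are hypothetical names made up for illustration. It uses the identity F1 = 2·TP / (predicted count + true count) for each class and skips classes that have neither true nor predicted instances.

```python
def per_class_counts(pred, target, num_classes):
    """Count true positives, predictions, and true labels for each class."""
    tp = [0] * num_classes
    n_pred = [0] * num_classes
    n_true = [0] * num_classes
    for p, t in zip(pred, target):
        n_pred[p] += 1
        n_true[t] += 1
        if p == t:
            tp[p] += 1
    return tp, n_pred, n_true


def macro_f1(pred, target, num_classes, ignore_absent=True):
    """Unweighted mean of per-class F1 (hypothetical helper, not torcheval)."""
    tp, n_pred, n_true = per_class_counts(pred, target, num_classes)
    scores = []
    for c in range(num_classes):
        denom = n_pred[c] + n_true[c]  # equals 2*TP + FP + FN
        if denom == 0:
            if ignore_absent:
                continue           # class never appears: leave it out of the mean
            scores.append(0.0)     # alternative: count it as a zero
        else:
            scores.append(2 * tp[c] / denom)
    return sum(scores) / len(scores)


# Same data as the notebook, with the class-0 samples removed.
target = [2, 2, 2, 2, 1, 1, 1]
pred = [2, 2, 2, 1, 1, 1, 2]

ignored = macro_f1(pred, target, num_classes=3)
counted = macro_f1(pred, target, num_classes=3, ignore_absent=False)
print(round(ignored, 4), round(counted, 4))
```

If the library really ignores the absent class, its result should match `ignored` (about 0.7083) rather than `counted` (about 0.4722).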

In [23]:
target = torch.tensor([2,2,2,2,1,1,1])
pred = torch.tensor([2,2,2,1,1,1,2])

multiclass_f1_score(pred, target, num_classes=3, average='macro')



tensor(0.7083)

So it works properly!! WHOOP WHOOP

---

Let's double-check whether averaging the macro F1 of separate batches leads to the same result as scoring all the data at once.

In [24]:
target0 = torch.tensor([2,2,2,2,1,1,1,0,0])
pred0  = torch.tensor([2,2,2,1,1,1,2,1,1])
target1 = torch.tensor([2,2,2,2,1,1,1])
pred1 = torch.tensor([2,2,2,1,1,1,2])

target_cat = torch.concat((target0, target1))
pred_cat = torch.concat((pred0, pred1))
multiclass_f1_score(pred_cat, target_cat, num_classes=3, average='macro')

tensor(0.4405)

In [25]:
f1_0 = multiclass_f1_score(pred0, target0, num_classes=3, average='macro')
f1_1 = multiclass_f1_score(pred1, target1, num_classes=3, average='macro')

print(f'f1_0:{f1_0}')
print(f'f1_1:{f1_1}')
print(f'Combined: {(f1_0 + f1_1) / 2}')



f1_0:0.4166666567325592
f1_1:0.7083333730697632
Combined: 0.5625


So this is not perfect: the average of the per-batch scores (0.5625) is more optimistic than the score on the combined data (0.4405). But is it better than micro?
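
The gap appears because class 0 is absent from the second batch, so that batch's mean is taken over two classes only and the zero for class 0 is diluted. One way around this, sketched here in plain Python under the assumption that per-class counts can be kept between batches, is to accumulate the counts first and compute the macro F1 once at the end. The helper names `accumulate_counts` and `macro_f1_from_counts` are hypothetical.

```python
def accumulate_counts(batches, num_classes):
    """Sum per-class TP, prediction, and label counts over all batches."""
    tp = [0] * num_classes
    n_pred = [0] * num_classes
    n_true = [0] * num_classes
    for pred, target in batches:
        for p, t in zip(pred, target):
            n_pred[p] += 1
            n_true[t] += 1
            if p == t:
                tp[p] += 1
    return tp, n_pred, n_true


def macro_f1_from_counts(tp, n_pred, n_true):
    """Macro F1 over classes that have at least one true or predicted sample."""
    scores = [
        2 * tp[c] / (n_pred[c] + n_true[c])
        for c in range(len(tp))
        if n_pred[c] + n_true[c] > 0
    ]
    return sum(scores) / len(scores)


# (pred, target) pairs, same values as pred0/target0 and pred1/target1.
batches = [
    ([2, 2, 2, 1, 1, 1, 2, 1, 1], [2, 2, 2, 2, 1, 1, 1, 0, 0]),
    ([2, 2, 2, 1, 1, 1, 2], [2, 2, 2, 2, 1, 1, 1]),
]

global_f1 = macro_f1_from_counts(*accumulate_counts(batches, num_classes=3))
print(round(global_f1, 4))
```

Accumulating counts gives the same value as scoring the concatenated tensors (about 0.4405), whatever the batch boundaries are.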

In [26]:
target0 = torch.tensor([2,2,2,2,1,1,1,0,0])
pred0  = torch.tensor([2,2,2,1,1,1,2,1,1])
target1 = torch.tensor([2,2,2,2,1,1,1])
pred1 = torch.tensor([2,2,2,1,1,1,2])

target_cat = torch.concat((target0, target1))
pred_cat = torch.concat((pred0, pred1))
multiclass_f1_score(pred_cat, target_cat, num_classes=3, average='micro')

tensor(0.6250)

In [27]:
f1_0 = multiclass_f1_score(pred0, target0, num_classes=3, average='micro')
f1_1 = multiclass_f1_score(pred1, target1, num_classes=3, average='micro')

print(f'f1_0:{f1_0}')
print(f'f1_1:{f1_1}')
print(f'Combined: {(f1_0 + f1_1) / 2}')

f1_0:0.5555555820465088
f1_1:0.714285671710968
Combined: 0.634920597076416


So it's better than micro (0.5625 is closer to the real macro F1 of 0.4405 than either micro number), but still more optimistic than the real macro F1 score.
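
As a side note on the micro numbers: with one label per sample, every wrong prediction is one false positive and one false negative, so micro F1 equals plain accuracy. The small gap between 0.6250 and 0.6349 therefore comes only from the batches having different sizes. The sketch below (plain Python, with `micro_f1` as a hypothetical helper) shows that weighting each batch by its sample count recovers the combined score.

```python
def micro_f1(pred, target):
    """Micro F1 for single-label multiclass data, which equals accuracy."""
    correct = sum(p == t for p, t in zip(pred, target))
    return correct / len(target)


# (pred, target) pairs, same values as pred0/target0 and pred1/target1.
batches = [
    ([2, 2, 2, 1, 1, 1, 2, 1, 1], [2, 2, 2, 2, 1, 1, 1, 0, 0]),
    ([2, 2, 2, 1, 1, 1, 2], [2, 2, 2, 2, 1, 1, 1]),
]

scores = [micro_f1(p, t) for p, t in batches]
sizes = [len(t) for _, t in batches]

unweighted = sum(scores) / len(scores)
weighted = sum(s * n for s, n in zip(scores, sizes)) / sum(sizes)
print(round(unweighted, 4), round(weighted, 4))
```

No such reweighting fixes the macro case, because the per-class F1 is not a simple average over samples.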