## Bite size #1 

- Difference between binary cross-entropy (BCE) and cross-entropy (CE)
- How they behave with two classes versus more than two

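Binary cross-entropy loss, where $y_i \in \{0, 1\}$ is the true label of sample $i$ and $p(y_i)$ is the predicted probability that sample $i$ belongs to class 1: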

$\text{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p(y_i)) + (1 - y_i) \log(1 - p(y_i)) \right]$
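As a quick sanity check of the formula, here is a minimal NumPy sketch that evaluates BCE for two hand-picked samples. The labels and probabilities are illustrative values, not data from the experiment below. Each sample contributes $-\log(0.7)$, so the mean is about 0.3567.

```python
import numpy as np

def bce(y, p):
    """Binary cross-entropy averaged over samples, following the formula above."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# A positive sample predicted at 0.7 and a negative sample predicted at 0.3:
# both terms reduce to log(0.7), so the loss equals -log(0.7).
y = [1, 0]
p = [0.7, 0.3]
loss = bce(y, p)
print(loss)
```

The same inputs reappear in Bite size #2, where PyTorch's `F.binary_cross_entropy` gives the same value up to float32 precision.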

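Multi-class cross-entropy loss over $K$ classes, where $y_{ik}$ is 1 if sample $i$ belongs to class $k$ and 0 otherwise (one-hot), and $p(y_{ik})$ is the predicted probability of class $k$ for sample $i$: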

$\text{CE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log(p(y_{ik}))$
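Because the one-hot $y_{ik}$ is zero for every class except the true one, the inner sum keeps a single term: minus the log-probability of the true class. This small sketch uses made-up probabilities to compare the full one-hot form with picking that term directly by index.

```python
import numpy as np

# Predicted class probabilities for 3 samples over K = 3 classes (rows sum to 1).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
])
labels = np.array([0, 1, 2])  # true class index of each sample

# Full formula: one-hot targets times log-probabilities, summed over k, averaged over i.
onehot = np.eye(probs.shape[1])[labels]
ce_onehot = -np.mean(np.sum(onehot * np.log(probs), axis=1))

# Shortcut: select the log-probability of the true class for each sample.
ce_indexed = -np.mean(np.log(probs[np.arange(len(labels)), labels]))

print(ce_onehot, ce_indexed)
```

The indexed version is what loss functions that take integer class labels effectively compute, with no one-hot matrix ever built.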


```python
import numpy as np

# Generate random predictions and actual labels for N instances
N = 1000
y_pred = np.random.rand(N)  # Predicted probabilities for class 1
y_actual = np.random.randint(0, 2, N)  # Actual labels (0 or 1)

# Calculate Binary Cross-Entropy (BCE)
bce_loss = -np.mean(y_actual * np.log(y_pred) + (1 - y_actual) * np.log(1 - y_pred))

# Calculate Cross-Entropy (CE) for 2 classes
y_pred_0 = 1 - y_pred  # Predicted probabilities for class 0
y_actual_0 = 1 - y_actual  # Actual labels for class 0
ce_loss_2class = -np.mean(y_actual * np.log(y_pred) + y_actual_0 * np.log(y_pred_0))

# Generate predictions and actual labels for a 3-class problem
y_pred_3class = np.random.rand(N, 3)  # Unnormalized random scores for 3 classes
y_pred_3class = y_pred_3class / np.sum(y_pred_3class, axis=1, keepdims=True)  # Normalize each row to sum to 1
y_actual_3class = np.random.randint(0, 3, N)  # Actual labels (0, 1, or 2)
# np.eye(3) is the 3x3 identity matrix ("eye" is a pun on "I", the usual symbol for it);
# indexing its rows by label gives the one-hot encoding
y_actual_3class_onehot = np.eye(3)[y_actual_3class]

# Calculate Cross-Entropy (CE) for 3 classes
ce_loss_3class = -np.mean(np.sum(y_actual_3class_onehot * np.log(y_pred_3class), axis=1))

bce_loss, ce_loss_2class, ce_loss_3class
```


```
(1.0261140715602373, 1.0261140715602373, 1.3430935021291543)
```
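`np.random.rand` samples from $[0, 1)$, so a prediction of exactly 0 is possible in principle, and the hand-written `np.log` version then yields `-inf` or `nan`. A common guard is to clip probabilities away from 0 and 1 before taking logs. The sketch below assumes an arbitrary `eps = 1e-12` (not a value from the original), and `bce_clipped` is a hypothetical helper name.

```python
import numpy as np

def bce_clipped(y, p, eps=1e-12):
    """Hypothetical helper: BCE with predictions clipped to [eps, 1 - eps] so log() stays finite."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 0])
p = np.array([0.9, 0.0, 0.2])  # the second prediction is exactly 0

with np.errstate(divide="ignore", invalid="ignore"):
    # Unclipped: for the second sample, y * log(p) = 0 * (-inf) = nan, which poisons the mean.
    naive = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(naive)              # nan
print(bce_clipped(y, p))  # finite
```

Clipping slightly biases the loss for extreme predictions. `xlogy` (used in the next cell) takes a different approach and only handles the $0 \log 0$ case.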

```python
from scipy.special import xlogy
from sklearn.metrics import log_loss

# Reuses np and the arrays defined in the previous cell.

# Calculate Binary Cross-Entropy (BCE) using SciPy's xlogy, which computes x * log(y) with 0 * log(0) = 0
bce_loss_scipy = -np.mean(xlogy(y_actual, y_pred) + xlogy(1 - y_actual, 1 - y_pred))

# Calculate Cross-Entropy (CE) for 2 classes using scikit-learn
# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html
ce_loss_2class_sklearn = log_loss(y_actual, np.column_stack([y_pred_0, y_pred]))

# Calculate Cross-Entropy (CE) for 3 classes using scikit-learn
ce_loss_3class_sklearn = log_loss(y_actual_3class, y_pred_3class)

bce_loss_scipy, ce_loss_2class_sklearn, ce_loss_3class_sklearn
```


```
(1.0261140715602373, 1.0261140715602373, 1.3430935021291541)
```
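`scipy.special.xlogy(x, y)` computes `x * log(y)` but returns 0 wherever `x == 0`. This matches the convention $0 \log 0 = 0$ that the cross-entropy sum relies on. Here is a small comparison with the naive product, using illustrative arrays:

```python
import numpy as np
from scipy.special import xlogy

x = np.array([0.0, 1.0, 0.0])
y = np.array([0.0, 0.5, 0.3])

with np.errstate(divide="ignore", invalid="ignore"):
    naive = x * np.log(y)   # first entry is 0 * log(0) = 0 * (-inf) = nan

safe = xlogy(x, y)          # first entry is 0; the others match x * log(y)

print(naive)
print(safe)
```

`xlogy` does not clip, though. `xlogy(1, 0)` is still `-inf`, so a confidently wrong prediction still produces an infinite loss.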

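With only two classes (class 0 and class 1), write $y$ for the class-1 label, so the one-hot targets are $y_1 = y$ and $y_0 = 1 - y$. Similarly, write $p$ for the predicted probability of class 1. Because the predicted probabilities (e.g. the output of a softmax) sum to 1, $p_1 = p$ and $p_0 = 1 - p$. Substituting these into the inner sum of the multi-class formula gives $-\sum_{k \in \{0,1\}} y_k \log p_k = -\left[ y \log p + (1 - y) \log(1 - p) \right]$, which is exactly the per-sample BCE term. This is why `bce_loss` and `ce_loss_2class` are identical in the outputs above.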
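The same equivalence holds at the logit level. For two logits $z_0, z_1$, the softmax probability of class 1 is $e^{z_1} / (e^{z_0} + e^{z_1}) = \sigma(z_1 - z_0)$. A two-column softmax cross-entropy therefore equals BCE on the sigmoid of the logit difference. The NumPy sketch below checks this on random logits; the seed and sample count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
logits = rng.normal(size=(N, 2))      # columns: z_0, z_1
labels = rng.integers(0, 2, size=N)   # true class, 0 or 1

# Two-class softmax, with the max subtracted per row for numerical stability.
shifted = logits - logits.max(axis=1, keepdims=True)
softmax = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Sigmoid of the logit difference gives the probability of class 1.
p1 = 1.0 / (1.0 + np.exp(-(logits[:, 1] - logits[:, 0])))

# Multi-class CE over the 2 columns vs. binary CE on p1.
ce_val = -np.mean(np.log(softmax[np.arange(N), labels]))
bce_val = -np.mean(labels * np.log(p1) + (1 - labels) * np.log(1 - p1))

print(np.allclose(softmax[:, 1], p1), np.isclose(ce_val, bce_val))  # True True
```

A two-class model can therefore output a single logit with a sigmoid instead of two logits with a softmax. Only the difference $z_1 - z_0$ affects the loss.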

## Bite size #2
- Difference between the inputs expected by `F.binary_cross_entropy` and `F.cross_entropy`
- https://pytorch.org/docs/stable/generated/torch.nn.functional.binary_cross_entropy.html
- https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html


```python
import torch
import torch.nn.functional as F

# Binary classification: F.binary_cross_entropy expects probabilities in [0, 1]
# (e.g. the output of torch.sigmoid) and float targets of the same shape
# input_prob = torch.tensor([1.7, 0.3], requires_grad=True)  # 1.7 is outside [0, 1] -> raises an error
input_prob = torch.tensor([0.7, 0.3], requires_grad=True)
target = torch.tensor([1., 0.])
loss_binary = F.binary_cross_entropy(input_prob, target)
print(f"Binary Cross Entropy Loss: {loss_binary.item()}")

# Multi-class classification: F.cross_entropy expects raw, unnormalized logits of shape (N, C)
# (it applies log_softmax internally) and integer class indices of shape (N,) as targets
input_logits = torch.tensor([[1.5, -0.5], [0.5, 1.5]], requires_grad=True)
target = torch.tensor([0, 1])
loss_cross = F.cross_entropy(input_logits, target)
print(f"Cross Entropy Loss: {loss_cross.item()}")
```


```
Binary Cross Entropy Loss: 0.3566749691963196
Cross Entropy Loss: 0.22009485960006714
```
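Two related PyTorch functions make this input contract explicit. The sketch below checks two equivalences:

- `F.binary_cross_entropy_with_logits` on raw logits matches `F.binary_cross_entropy` on `torch.sigmoid` of the same logits.
- `F.cross_entropy` matches `F.nll_loss` applied to `F.log_softmax`.

The binary logits are arbitrary illustrative values. The multi-class tensors repeat the ones used in the cell above.

```python
import torch
import torch.nn.functional as F

# Binary case: raw logits (any real value) vs. probabilities obtained with sigmoid.
logits_bin = torch.tensor([0.85, -0.85])   # illustrative logits
target_bin = torch.tensor([1., 0.])
loss_with_logits = F.binary_cross_entropy_with_logits(logits_bin, target_bin)
loss_on_probs = F.binary_cross_entropy(torch.sigmoid(logits_bin), target_bin)

# Multi-class case: cross_entropy is log_softmax over the class dimension followed by nll_loss.
input_logits = torch.tensor([[1.5, -0.5], [0.5, 1.5]])
target_cls = torch.tensor([0, 1])
loss_ce = F.cross_entropy(input_logits, target_cls)
loss_nll = F.nll_loss(F.log_softmax(input_logits, dim=1), target_cls)

print(torch.allclose(loss_with_logits, loss_on_probs))
print(torch.allclose(loss_ce, loss_nll))
```

In practice the `_with_logits` variant is usually preferred over a separate sigmoid followed by `F.binary_cross_entropy`. It fuses the two steps in a numerically stable way, much as `F.cross_entropy` fuses `log_softmax` and `nll_loss`.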
