# Binary Cross-Entropy Loss

**Binary Cross-Entropy** (BCE) is a loss function commonly used in binary classification problems.   
It <u>measures the dissimilarity</u> between predicted probabilities and true binary labels.

$$H_p(q) = - \frac{1}{N}\sum_{i=1}^{N}\left[y_{i}\cdot \log(p(y_{i})) + (1 - y_{i}) \cdot \log(1 - p(y_{i}))\right]$$

In this formula, $N$ represents the number of instances in the dataset, $y_{i}$ is the true binary label for instance $i$, and $p(y_{i})$ is the predicted probability that instance $i$ belongs to the positive class.
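To make the formula concrete, here is a minimal sketch of the per-instance term, written with the standard library only. The helper name `bce_term` is made up for this illustration.

```python
import math

def bce_term(y, p):
    """Per-instance BCE term: -(y * log(p) + (1 - y) * log(1 - p))."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A positive instance (y = 1) predicted with increasing confidence
for p in (0.1, 0.5, 0.9):
    print(f"y=1, p={p}: loss={bce_term(1, p):.4f}")
# y=1, p=0.1: loss=2.3026
# y=1, p=0.5: loss=0.6931
# y=1, p=0.9: loss=0.1054
```

The loss shrinks toward 0 as the predicted probability approaches the true label, and grows quickly when the model is confidently wrong.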

The **BCE loss** is available in machine learning frameworks like *PyTorch*, *TensorFlow*, and *scikit-learn*. It serves as the *objective function* during training: the model's parameters are updated to minimize the loss, which pushes its predictions toward the true labels.  

To train a binary classification model, the BCE loss is calculated for each data instance in the training set. The overall loss is obtained by taking the average or sum of the individual losses, depending on the implementation.
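The difference between the two reductions can be sketched in a few lines of plain Python. The names `labels`, `probs`, and `losses` are illustrative. In the libraries used later, this choice is exposed as the `reduction` argument of PyTorch's `nn.BCELoss` (default `'mean'`) and the `normalize` argument of scikit-learn's `log_loss` (default `True`, which averages).

```python
import math

labels = [1, 0, 1, 0]
probs = [0.7, 0.2, 0.4, 0.3]

# Per-instance losses
losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
          for y, p in zip(labels, probs)]

loss_sum = sum(losses)              # grows with the number of instances
loss_mean = loss_sum / len(losses)  # comparable across dataset sizes
print(f"sum={loss_sum:.4f}, mean={loss_mean:.4f}")
```

Averaging keeps the loss scale, and therefore the gradient scale, independent of how many instances are in the batch, which is one reason it is a common default.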

By <font color="green" size="2"><b>minimizing the BCE loss</b></font>, the model learns to <font color="orange" size="2"><b>assign higher probabilities to positive instances (belonging to the positive class) and lower probabilities to negative instances (belonging to the negative class)</b></font>. This helps the model make more accurate predictions on unseen data.

Let's compute BCE Loss using different methods and compare the results.

In [1]:
import numpy as np

import torch
import torch.nn as nn

from sklearn.metrics import log_loss

### Compute Binary Cross-Entropy Loss in PyTorch

Below, the `y_true` variable represents the true binary labels, and `y_proba` represents the predicted probabilities.

In [2]:
# Simulated binary labels and predicted probabilities
y_true = torch.tensor([1, 0, 1, 0, 0, 1, 1, 1, 0, 0], dtype=torch.float32)  # True binary labels
y_proba = torch.tensor([0.7, 0.2, 0.4, 0.3, 0.8, 0.6, 0.5, 0.9, 0.6, 0.2], dtype=torch.float32)  # Predicted probabilities

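In practice, a model's final layer usually outputs raw scores (logits), and we apply an activation function (sigmoid for binary classification tasks) to map them to probabilities between 0 and 1. The simulated `y_proba` values above are already probabilities, so this step is shown for illustration only and is not run here:
```python
# Applying sigmoid activation to raw model outputs (logits) to obtain probabilities
logits = torch.tensor([2.0, -1.0, 0.5])  # illustrative values, not from a real model
probs = torch.sigmoid(logits)
```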

Next, we use PyTorch's `nn.BCELoss` class to compute the Binary Cross-Entropy loss.

In [3]:
# Compute Binary Cross-Entropy loss
bce_loss = nn.BCELoss()(y_proba, y_true)

# Extract the loss value as a Python scalar using the .item() method
pytorch_bce_loss = round(bce_loss.item(), 5)

print("Binary Cross-Entropy Loss:", pytorch_bce_loss)

Binary Cross-Entropy Loss: 0.5911


### Compute Binary Cross-Entropy Loss with scikit-learn

As in the example above, `y_true` represents the true binary labels and `y_proba` represents the predicted probabilities, this time written as NumPy arrays.

In [4]:
# Simulated binary labels and predicted probabilities
y_true = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 0])
y_proba = np.array([0.7, 0.2, 0.4, 0.3, 0.8, 0.6, 0.5, 0.9, 0.6, 0.2])

# Compute Binary Cross-Entropy Loss via sklearn's log_loss
sklearn_bce_loss = round(log_loss(y_true, y_proba), 5)
print("Average Binary Cross-Entropy Loss:", sklearn_bce_loss)

Average Binary Cross-Entropy Loss: 0.5911


### Compute Binary Cross-Entropy Loss with NumPy

As in the example above, `y_true` represents the true binary labels and `y_proba` represents the predicted probabilities, written as NumPy arrays.

In [5]:
# Simulated binary labels and predicted probabilities
y_true = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 0])
y_proba = np.array([0.7, 0.2, 0.4, 0.3, 0.8, 0.6, 0.5, 0.9, 0.6, 0.2])

Again, in practice we would apply an activation function to the model's raw outputs to obtain probabilities between 0 and 1. For binary classification tasks, a simple sigmoid function does this. The simulated values above are already probabilities, so the snippet below is illustrative and is not run here.

```python
def sigmoid(arr):
    """
    Apply the sigmoid activation function element-wise to an array.

    Parameters:
        arr (numpy.ndarray): Array of numeric values.

    Returns:
        numpy.ndarray: Array with sigmoid activation applied element-wise.
    """
    return 1 / (1 + np.exp(-arr))

logits = np.array([2.0, -1.0, 0.5])  # illustrative raw model outputs
probs = sigmoid(logits)
```

Let's define a custom function to compute Binary Cross-Entropy Loss:

In [6]:
def binary_cross_entropy(y_true, y_pred):
    """
    Compute Binary Cross-Entropy (BCE) loss.

    Parameters:
        y_true (numpy.ndarray): Array of true binary labels (0 or 1).
        y_pred (numpy.ndarray): Array of predicted probabilities between 0 and 1.

    Returns:
        numpy.ndarray: Array of per-instance Binary Cross-Entropy (BCE) loss values.
    """
    # NumPy broadcasting handles the scalar 1, so no explicit array of ones is needed
    bce_loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

    return bce_loss

In [7]:
# Compute Binary Cross-Entropy Loss element-wise
bce_loss = binary_cross_entropy(y_true, y_proba)

# Compute average Binary Cross-Entropy Loss
numpy_bce_loss = round(np.mean(bce_loss), 5)

print("Average Binary Cross-Entropy Loss:", numpy_bce_loss)

Average Binary Cross-Entropy Loss: 0.5911
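
One caveat with the custom function above: if a predicted probability is exactly 0 or 1, `np.log` receives 0 and the result becomes `inf` or `nan`. A common workaround is to clip predictions into a small open interval first. The sketch below assumes a tolerance of `eps=1e-7`; that value, the function name `binary_cross_entropy_clipped`, and the sample arrays are choices made for this illustration, not part of any library.

```python
import numpy as np

def binary_cross_entropy_clipped(y_true, y_pred, eps=1e-7):
    """Element-wise BCE with predictions clipped away from 0 and 1."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

labels = np.array([1, 0, 1])
probs = np.array([1.0, 0.0, 0.4])  # two predictions sit exactly on the boundary

losses = binary_cross_entropy_clipped(labels, probs)
print(np.isfinite(losses).all())  # True
```

Clipping slightly changes the loss for boundary predictions, so the tolerance should be small relative to the probabilities the model actually produces.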


### Comparison of the BCE computation results between PyTorch, scikit-learn and NumPy

Recall that we previously saved the Binary Cross-Entropy results, rounded to 5 decimal places, in the variables  
`pytorch_bce_loss`, `sklearn_bce_loss` and `numpy_bce_loss`.

In [8]:
# Check for equality:
pytorch_bce_loss == sklearn_bce_loss == numpy_bce_loss

True
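
The exact `==` check works here because all three values were rounded first. Without rounding, results computed in different floating-point precisions generally differ in the last digits, so a tolerance-based comparison is a safer habit. The sketch below is a standard-library illustration under one assumption: the float32 result is imitated by round-tripping the double-precision loss through 4 bytes with `struct`, which mimics only how the final value is stored, not float32 arithmetic throughout. The names `labels`, `probs`, `loss64`, and `loss32` are illustrative.

```python
import math
import struct

labels = [1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
probs = [0.7, 0.2, 0.4, 0.3, 0.8, 0.6, 0.5, 0.9, 0.6, 0.2]

# Mean BCE computed in double precision
loss64 = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
              for y, p in zip(labels, probs)) / len(labels)

# Round-trip through 4 bytes to imitate storing the result as float32
loss32 = struct.unpack("f", struct.pack("f", loss64))[0]

print(loss64, loss32)
print(math.isclose(loss64, loss32, rel_tol=1e-6))  # True
```

With the notebook's variables, the same idea could be written with `np.isclose` on the unrounded values.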