# Introduction to Loss Functions

In [3]:
import torch
import torch.nn as nn
from math import log

## Loss Functions for Regression

### Mean Squared Error (MSE) Loss in PyTorch

Mean Squared Error (MSE) is a common loss function for regression problems. It averages the squared differences between the predicted and actual values, so large errors are penalized more heavily than small ones. Below is an example of computing the MSE loss with PyTorch's built-in `nn.MSELoss`:

In [None]:
predictions = torch.tensor([1.5, 2.5, 3.5])
targets = torch.tensor([1.0, 2.0, 3.0])

# Mean Squared Error Loss
mse_loss = nn.MSELoss()

# Calculate loss
loss = mse_loss(predictions, targets)
print(f"MSE Loss: {loss.item()}")
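As a sanity check, the same value can be recomputed directly from the definition. This is a minimal sketch that reuses the tensors from the cell above; `manual_mse` and `builtin_mse` are illustrative names, not part of the PyTorch API.

```python
import torch
import torch.nn as nn

predictions = torch.tensor([1.5, 2.5, 3.5])
targets = torch.tensor([1.0, 2.0, 3.0])

# MSE from the definition: mean of the squared differences
manual_mse = ((predictions - targets) ** 2).mean()
builtin_mse = nn.MSELoss()(predictions, targets)

print(manual_mse.item(), builtin_mse.item())  # both 0.25
```

Every prediction is off by 0.5, so each squared error is 0.25 and the mean is 0.25 as well.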

### Mean Absolute Error (MAE) Loss in PyTorch

Mean Absolute Error (MAE), also known as L1 loss, is another common loss function for regression tasks. It averages the absolute differences between the predicted and actual values. Here is an example using PyTorch's built-in `nn.L1Loss`:

In [None]:
predictions = torch.tensor([1.5, 2.5, 3.5])
targets = torch.tensor([1.0, 2.0, 3.0])

# Mean Absolute Error Loss
mae_loss = nn.L1Loss()

# Calculate loss
loss = mae_loss(predictions, targets)
print(f"MAE Loss: {loss.item()}")
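The sketch below recomputes MAE from its definition and then compares MAE and MSE on a prediction with one large error. The `outlier_predictions` tensor is a made-up example chosen only to illustrate how the two losses react to an outlier.

```python
import torch
import torch.nn as nn

targets = torch.tensor([1.0, 2.0, 3.0])
predictions = torch.tensor([1.5, 2.5, 3.5])

# MAE from the definition: mean of the absolute differences
manual_mae = (predictions - targets).abs().mean()
builtin_mae = nn.L1Loss()(predictions, targets)
print(manual_mae.item(), builtin_mae.item())  # both 0.5

# Hypothetical prediction with one large error (10.0 on the last element)
outlier_predictions = torch.tensor([1.5, 2.5, 13.0])
mae_outlier = nn.L1Loss()(outlier_predictions, targets)
mse_outlier = nn.MSELoss()(outlier_predictions, targets)
print(mae_outlier.item())  # about 3.67
print(mse_outlier.item())  # about 33.5
```

Because MSE squares each error, the single outlier dominates it far more than it dominates MAE.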

## Loss Functions for Classification

Below are PyTorch code examples for Binary Cross-Entropy Loss, Cross-Entropy Loss, Kullback-Leibler Divergence Loss, and Negative Log Likelihood Loss:

### 1. Binary Cross-Entropy Loss

For binary classification problems:

In [19]:
# Example predictions and labels
predictions = torch.sigmoid(torch.randn(4))  # Sigmoid to get probabilities
labels = torch.tensor([1, 0, 1, 0], dtype=torch.float32)

# Binary Cross-Entropy Loss
criterion = nn.BCELoss()
loss = criterion(predictions, labels)

In [20]:
predictions

tensor([0.6882, 0.4268, 0.6981, 0.6192])

In [24]:
labels

tensor([1., 0., 1., 0.])

In [25]:
loss

tensor(0.5638)

In [23]:
-1/len(labels)*sum(labels*torch.log(predictions)+(1-labels)*torch.log(1-predictions))

tensor(0.5638)

```python
criterion = nn.BCELoss()
loss = criterion(predictions, labels)
```

- **Inputs**: The `predictions` should be probabilities (0 to 1), typically obtained with a sigmoid function.
- **Labels Format**: The `labels` should be binary (0 or 1) and match the shape of predictions.
- **Considerations**: Use `nn.BCEWithLogitsLoss` if the output layer of your model does not include a sigmoid.
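To illustrate the last point, the following sketch compares `nn.BCELoss` applied to sigmoid outputs with `nn.BCEWithLogitsLoss` applied to the raw logits. The seed and the variable names are arbitrary choices made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
logits = torch.randn(4)  # raw model outputs, no sigmoid applied
labels = torch.tensor([1, 0, 1, 0], dtype=torch.float32)

# Route 1: apply sigmoid explicitly, then BCELoss
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), labels)

# Route 2: pass the logits directly; the sigmoid is applied inside the loss
loss_from_logits = nn.BCEWithLogitsLoss()(logits, labels)

print(loss_from_probs.item(), loss_from_logits.item())
```

The two values should agree up to floating-point error. The fused version is generally preferred when the model outputs logits, because it combines the sigmoid and the logarithm in a numerically stable way.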




### 2. Cross-Entropy Loss

For multi-class classification problems:

In [60]:
# Example predictions and labels
predictions = torch.randn(4, 5)  # 4 samples, 5 class predictions
labels = torch.tensor([1, 0, 3, 2], dtype=torch.long)  # Class labels for each sample

# Cross-Entropy Loss
criterion = nn.CrossEntropyLoss()
loss = criterion(predictions, labels)

In [61]:
predictions

tensor([[ 0.7114, -0.9152, -0.3899,  0.9435, -0.1819],
        [-0.6162, -0.0346, -0.5723, -1.1487, -1.4400],
        [ 0.3388, -0.9662, -0.2876, -0.1088, -0.1857],
        [-1.3066,  1.5789, -0.3382, -0.6364,  2.9867]])

In [62]:
labels

tensor([1, 0, 3, 2])

In [63]:
loss

tensor(2.3831)

In [66]:
softmax_predictions = torch.softmax(predictions, dim=1)
softmax_predictions

tensor([[0.3125, 0.0614, 0.1039, 0.3942, 0.1279],
        [0.2058, 0.3681, 0.2150, 0.1208, 0.0903],
        [0.3293, 0.0893, 0.1760, 0.2105, 0.1949],
        [0.0103, 0.1852, 0.0272, 0.0202, 0.7570]])

In [69]:
-1/4*sum(torch.tensor([torch.log(softmax_predictions[i][labels[i]]) for i in range(len(labels))]))

tensor(2.3831)

```python
criterion = nn.CrossEntropyLoss()
loss = criterion(predictions, labels)
```

- **Inputs**: The `predictions` (also known as logits) should not be passed through a softmax layer before this loss function, as `nn.CrossEntropyLoss` applies softmax internally.
- **Labels Format**: The `labels` should contain the class indices (not one-hot encoded) and should be of type `torch.long`.
- **Considerations**: This loss function is more numerically stable than using a separate softmax followed by a negative log likelihood loss.
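The stability point can be seen with a deliberately extreme input. The sketch below uses a hypothetical pair of logits chosen so that one softmax probability underflows to zero in float32.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1000.0, 0.0]])  # deliberately extreme logits
label = torch.tensor([1])

# Naive route: softmax, then log, then NLL. The probability of class 1
# underflows to 0, so its log is -inf and the loss becomes inf.
naive_loss = nn.NLLLoss()(torch.log(torch.softmax(logits, dim=1)), label)

# Fused route: the log-softmax is computed internally in a stable way.
stable_loss = nn.CrossEntropyLoss()(logits, label)

print(naive_loss.item(), stable_loss.item())
```

The naive route returns `inf`, while `nn.CrossEntropyLoss` returns a finite value of 1000, which is the gap between the two logits.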

### 3. Kullback-Leibler Divergence Loss

For comparing two probability distributions:

In [77]:
# Example predicted and target distributions
predicted_log_probs = torch.log_softmax(torch.randn(4, 5), dim=1)  # Log probabilities
true_probs = torch.softmax(torch.randn(4, 5), dim=1)

# Kullback-Leibler Divergence Loss
criterion = nn.KLDivLoss(reduction='batchmean')
loss = criterion(predicted_log_probs, true_probs)

In [78]:
predicted_log_probs

tensor([[-4.4963, -1.2036, -1.5853, -1.4722, -1.3688],
        [-2.6413, -2.2889, -2.2535, -0.5027, -2.1419],
        [-2.3875, -1.0028, -2.9582, -3.1580, -0.8055],
        [-2.3296, -0.4341, -2.2907, -2.1956, -3.1619]])

In [79]:
true_probs

tensor([[0.0345, 0.1775, 0.1944, 0.2313, 0.3622],
        [0.1333, 0.6237, 0.1228, 0.0795, 0.0406],
        [0.3026, 0.1969, 0.0661, 0.2365, 0.1979],
        [0.1164, 0.4287, 0.2206, 0.1695, 0.0648]])

In [80]:
loss

tensor(0.4276)

In [93]:
1/predicted_log_probs.shape[0]*sum(sum(true_probs*(torch.log(true_probs)-predicted_log_probs)))

tensor(0.4276)

```python
criterion = nn.KLDivLoss(reduction='batchmean')
loss = criterion(predicted_log_probs, true_probs)
```

- **Inputs**: `predicted_log_probs` should be log probabilities (use `torch.log_softmax`). The `true_probs` should be probabilities (use `torch.softmax` or equivalent).
- **Output Range**: KL divergence is non-negative and has no upper bound, so its scale can differ considerably from that of other loss functions, which may affect the choice of learning rate and convergence.
- **Considerations**: Ensure the dimensionality over which softmax and log-softmax are applied matches the expected dimensionality of the loss function.
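The following sketch checks two properties of `nn.KLDivLoss` on random data: `reduction='batchmean'` equals the summed divergence divided by the batch size, and the divergence of a distribution from itself is zero. The seed and the names `log_q` and `p` are assumptions made for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
log_q = torch.log_softmax(torch.randn(4, 5), dim=1)  # predicted log probabilities
p = torch.softmax(torch.randn(4, 5), dim=1)          # target probabilities

kl_batchmean = nn.KLDivLoss(reduction='batchmean')(log_q, p)
kl_sum = nn.KLDivLoss(reduction='sum')(log_q, p)

# 'batchmean' divides the summed divergence by the batch size (4 here)
print(kl_batchmean.item(), (kl_sum / log_q.shape[0]).item())

# Comparing a distribution with itself gives (numerically) zero
kl_self = nn.KLDivLoss(reduction='batchmean')(torch.log(p), p)
print(kl_self.item())
```

Note the argument order: the first argument holds log probabilities of the prediction, the second holds probabilities of the target.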

### 4. Negative Log Likelihood Loss

Often used in combination with a log-softmax layer:

In [94]:
# Example log probabilities and labels
log_probs = torch.log_softmax(torch.randn(4, 5), dim=1)  # Log probabilities from log-softmax
labels = torch.tensor([1, 0, 3, 2])  # Class labels for each sample

# Negative Log Likelihood Loss
criterion = nn.NLLLoss()
loss = criterion(log_probs, labels)

In [95]:
log_probs

tensor([[-2.1952, -2.0639, -0.3655, -4.2204, -2.9338],
        [-2.2309, -1.6147, -1.2787, -1.6363, -1.5118],
        [-2.1938, -3.0970, -1.5740, -1.3624, -0.9675],
        [-1.9533, -1.1320, -3.2088, -1.2513, -1.5640]])

In [96]:
labels

tensor([1, 0, 3, 2])

In [97]:
loss

tensor(2.2165)

In [103]:
-1/4*(-2.0639-2.2309-1.3624-3.2088)

2.2165

In [102]:
criterion(torch.log_softmax(predictions, dim=1), labels)

tensor(2.3831)

Compare this output with the cross-entropy loss result above. What do you find?

```python
criterion = nn.NLLLoss()
loss = criterion(log_probs, labels)
```

- **Inputs**: The input should be log probabilities, typically obtained by applying `torch.log_softmax` to the neural network's output.
- **Labels Format**: Similar to `nn.CrossEntropyLoss`, the `labels` should contain class indices and be of type `torch.long`.
- **Usage Scenario**: Often used with a network whose final layer is `nn.LogSoftmax` (or that applies `torch.log_softmax`). It is essentially `nn.CrossEntropyLoss` without the built-in log-softmax step, so the preceding layer must output log probabilities.

When implementing these loss functions, it's crucial to ensure that the input data and network architecture are compatible with the specific requirements of each function. Additionally, understanding the underlying mathematical principles can aid in debugging and fine-tuning the model training process.

## Create Custom Loss Function in PyTorch

### Creating a custom loss function as a Python function

Below, we'll create a simple Cross-Entropy Loss function:

In [5]:
def custom_cross_entropy_loss(y_pred, y_true):
  """
  Custom Cross-Entropy Loss implementation using PyTorch.
  :param y_pred: PyTorch tensor of predicted logits (not probabilities).
  :param y_true: PyTorch tensor of true labels.
  :return: Cross-entropy loss.
  """
  #Specifying the batch size
  my_batch_size = y_pred.size()[0]

  #Get the log probabilities values
  log_probabilities = torch.log_softmax(y_pred, dim=1)

  # Pick the log probabilities corresponding to the true labels
  relevant_log_probs = log_probabilities[range(my_batch_size), y_true]

  #Take the negative and mean of these log probabilities
  loss = -torch.mean(relevant_log_probs)
  return loss

In [6]:
# Example usage
y_pred = torch.tensor([[1.5, 0.5, -0.5], [-0.5, 1.5, 0.5], [0.5, -0.5, 1.5]])  # Predicted logits for 3 classes
y_true = torch.tensor([0, 1, 2])  # True labels

loss = custom_cross_entropy_loss(y_pred, y_true)
print("Custom Cross-Entropy Loss:", loss.item())

Custom Cross-Entropy Loss: 0.40760597586631775


In [7]:
criterion = nn.CrossEntropyLoss()
loss_fn = criterion(y_pred, y_true)
print("Cross-Entropy Loss using nn.CrossEntropyLoss():",loss_fn.item())

Cross-Entropy Loss using nn.CrossEntropyLoss(): 0.40760597586631775
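A custom loss written with differentiable PyTorch operations also works with autograd without extra effort. The sketch below is a variant of the function above that selects the true-class entries with `gather` instead of index lists, then calls `backward()`. The name `cross_entropy_with_gather` is made up for this example.

```python
import torch

def cross_entropy_with_gather(y_pred, y_true):
    # Log probabilities for every class
    log_probs = torch.log_softmax(y_pred, dim=1)
    # Select the log probability of the true class in each row
    picked = log_probs.gather(1, y_true.unsqueeze(1)).squeeze(1)
    # Negative mean log likelihood
    return -picked.mean()

y_pred = torch.tensor([[1.5, 0.5, -0.5],
                       [-0.5, 1.5, 0.5],
                       [0.5, -0.5, 1.5]], requires_grad=True)
y_true = torch.tensor([0, 1, 2])

loss = cross_entropy_with_gather(y_pred, y_true)
loss.backward()  # gradients flow through the custom loss

print(loss.item())   # about 0.4076, matching the values above
print(y_pred.grad)   # one gradient row per sample
```

Each row of the gradient equals (softmax minus one-hot label) divided by the batch size, so every row sums to zero.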


### Creating custom loss function with a class definition

A custom loss can also be written as a subclass of `nn.Module`. Below, the same cross-entropy loss is implemented as a class:

In [8]:
class CustomCrossEntropyLoss(nn.Module):
    def __init__(self):
        """
        Constructor for the custom Cross-Entropy Loss.
        """
        super(CustomCrossEntropyLoss, self).__init__()

    def forward(self, y_pred, y_true):
        """
        Forward pass for the custom Cross-Entropy Loss.

        :param y_pred: Tensor of predicted logits (not probabilities).
        :param y_true: Tensor of ground truth labels.
        :return: Computed Cross-Entropy Loss.
        """
        # Ensuring the predicted values are in log form probabilities
        log_probs = torch.log_softmax(y_pred, dim=1)

        # Picking the log probabilities corresponding to true labels
        relevant_log_probs = log_probs[range(len(y_true)), y_true]

        # Negative log likelihood loss
        loss = -torch.mean(relevant_log_probs)
        return loss

Let's evaluate this class-based cross-entropy loss on the same `y_pred` and `y_true` used for the function-based version, and compare the result with the losses computed by `custom_cross_entropy_loss` and `nn.CrossEntropyLoss`.

In [9]:
loss_function = CustomCrossEntropyLoss()
loss = loss_function(y_pred, y_true)
print("Custom Cross-Entropy Loss:", loss.item())

Custom Cross-Entropy Loss: 0.40760597586631775


## Dice Loss function for Image Segmentation

Image segmentation is a critical task in computer vision, where the objective is to classify each pixel of an image into a particular class. A common challenge in image segmentation is class imbalance, which can adversely affect the performance of segmentation models. Cross-entropy loss, the usual choice for multi-class classification, averages over all pixels and can therefore be dominated by the majority class. Dice Loss has become a popular way to tackle this issue, especially in medical image segmentation.

Dice Loss is based on the Dice Coefficient, a statistical tool used to gauge the similarity of two samples. This coefficient is particularly useful in evaluating the similarity between the predicted segmentation and the ground truth.

### Dice Coefficient

The Dice Coefficient, also known as the Sørensen-Dice index or Dice's Coefficient, is defined as:

$$\small{\text{Dice Coefficient}=\frac{2\times|X\cap Y|}{|X|+|Y|}}$$

where $\small X$ and $\small Y$ are two sets of samples. In the context of image segmentation, $\small X$ can be the predicted segmentation and $\small Y$ the ground truth.

When estimating a Dice coefficient on predicted segmentation masks, we can approximate $\small|X\cap Y|$ as the element-wise product of the prediction and target masks, summed over all elements. The prediction mask is usually the predicted probability output by a `sigmoid` or `softmax` function, ranging from 0 to 1. The terms $\small|X|$ and $\small|Y|$ are approximated either by the sum of each mask's elements or, in a common variant, by the sum of their squares; the implementation in this notebook uses the squared form.

A small constant (`smooth`) is added to both the numerator and the denominator for numerical stability, in particular to avoid division by zero when both masks are empty.
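A small worked example may help make the approximation concrete. The tensors below are made-up values for a four-pixel mask, and `dice_plain` and `dice_squared` are illustrative names for the two denominator variants.

```python
import torch

pred = torch.tensor([0.9, 0.8, 0.1, 0.2])    # predicted probabilities
target = torch.tensor([1.0, 1.0, 0.0, 0.0])  # binary ground truth
smooth = 1.0

# Soft intersection: 0.9 * 1 + 0.8 * 1 = 1.7
intersection = (pred * target).sum()

# |X| + |Y| approximated by plain sums: (3.4 + 1) / (2 + 2 + 1)
dice_plain = (2 * intersection + smooth) / (pred.sum() + target.sum() + smooth)

# |X| + |Y| approximated by sums of squares: (3.4 + 1) / (1.5 + 2 + 1)
dice_squared = (2 * intersection + smooth) / (
    pred.square().sum() + target.square().sum() + smooth
)

print(dice_plain.item())    # approximately 0.88
print(dice_squared.item())  # approximately 0.978
```

Both variants approach 1 as the prediction approaches the target; they differ in how strongly they reward confident predictions.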


### Dice Loss

Dice Loss is formulated as:

$$\small\text{Dice Loss}=1-\text{Dice Coefficient}$$

**This loss function is particularly useful for datasets with class imbalance**, because it measures overlap with the target region rather than averaging per-pixel errors, which makes it less prone to being dominated by the majority class.
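The effect can be illustrated on a toy imbalanced mask. In the sketch below, all tensors are hypothetical: 5 of 100 pixels are foreground, and the prediction assigns a low probability everywhere. The Dice term uses the squared-sum denominator with `smooth = 1.0`.

```python
import torch
import torch.nn as nn

# 100 pixels, only 5 of them foreground (hypothetical imbalanced mask)
target = torch.zeros(100)
target[:5] = 1.0

# A prediction that calls almost everything background
pred = torch.full((100,), 0.05)

# Per-pixel loss: dominated by the 95 easy background pixels
bce = nn.BCELoss()(pred, target)

# Overlap-based loss: penalizes the missed foreground region
smooth = 1.0
intersection = (pred * target).sum()
dice = (2 * intersection + smooth) / (
    pred.square().sum() + target.square().sum() + smooth
)
dice_loss = 1 - dice

print(f"BCE: {bce.item():.4f}, Dice loss: {dice_loss.item():.4f}")
```

In this example the binary cross-entropy is about 0.20 even though every foreground pixel is missed, whereas the Dice loss is 0.76. This is a single illustrative case, not a general guarantee about either loss.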


### Dice Loss Function Implementation in PyTorch

Let's create a custom Dice Loss as an `nn.Module`. It measures the overlap between a predicted mask and a ground-truth mask and serves as a loss function for binary segmentation problems:

In [4]:
class DiceLoss(nn.Module):
    def __init__(self):
        """
        Constructor for Dice Loss.
        """
        super(DiceLoss, self).__init__()

    def forward(self, y_pred, y_true, smooth=1.0):
        """
        Forward pass for Dice Loss.

        :param y_pred: Tensor of predicted outputs (after activation).
        :param y_true: Tensor of ground truth labels.
        :param smooth: A smoothing constant to avoid division by zero.
        :return: Computed Dice Loss.
        """
        # Check the sizes of y_pred and y_true are consistent
        assert y_pred.size() == y_true.size(), "Predicted and ground truth labels have different shapes"

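        # Flatten label and prediction tensors (reshape also handles non-contiguous inputs)
        y_pred = y_pred.reshape(-1)
        y_true = y_true.reshape(-1).to(y_pred.dtype)

        # Soft intersection and squared-sum denominator
        intersection = (y_pred * y_true).sum()
        dice_coeff = (2. * intersection + smooth) / (y_pred.square().sum() + y_true.square().sum() + smooth)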

        return 1.0 - dice_coeff

In [5]:
# Example usage
y_pred = torch.sigmoid(torch.randn(1, 1, 5, 5))  # Example predicted mask
y_true = torch.tensor([[[[1, 0, 0, 0, 1], [0, 1, 0, 0, 1], [0, 0, 1, 0, 1], [0, 0, 0, 1, 1], [1, 1, 1, 1, 1]]]])  # Example true mask

dice_loss = DiceLoss()
loss = dice_loss(y_pred, y_true)
print("Dice Loss:", loss.item())

Dice Loss: 0.3842737078666687


In [6]:
y_pred

tensor([[[[0.2829, 0.6562, 0.3265, 0.1825, 0.5295],
          [0.7689, 0.6150, 0.3891, 0.7714, 0.6671],
          [0.5337, 0.5505, 0.9265, 0.2700, 0.8181],
          [0.4257, 0.7626, 0.5778, 0.6394, 0.4290],
          [0.8714, 0.5705, 0.6270, 0.5504, 0.7423]]]])

In [7]:
y_true

tensor([[[[1, 0, 0, 0, 1],
          [0, 1, 0, 0, 1],
          [0, 0, 1, 0, 1],
          [0, 0, 0, 1, 1],
          [1, 1, 1, 1, 1]]]])
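Two boundary cases give a quick sanity check of the formula: a prediction identical to the mask and a prediction that is its exact inverse. The sketch below restates the same Dice formula as a standalone function, `dice_loss_fn`, a name introduced here only so the block runs on its own.

```python
import torch

def dice_loss_fn(y_pred, y_true, smooth=1.0):
    # Same formula as the DiceLoss class above, written as a function
    y_pred = y_pred.reshape(-1)
    y_true = y_true.reshape(-1).to(y_pred.dtype)
    intersection = (y_pred * y_true).sum()
    dice = (2.0 * intersection + smooth) / (
        y_pred.square().sum() + y_true.square().sum() + smooth
    )
    return 1.0 - dice

y_true = torch.tensor([[[[1, 0, 0, 0, 1],
                         [0, 1, 0, 0, 1],
                         [0, 0, 1, 0, 1],
                         [0, 0, 0, 1, 1],
                         [1, 1, 1, 1, 1]]]]).float()

perfect = dice_loss_fn(y_true, y_true)         # prediction equals the mask
inverted = dice_loss_fn(1.0 - y_true, y_true)  # every pixel is wrong

print(perfect.item())   # 0.0
print(inverted.item())  # about 0.9615, i.e. 1 - 1/26
```

With `smooth = 1.0` the worst case stays slightly below 1, because the smoothing constant appears in both the numerator and the denominator.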

### Dice Loss Function Usage Example - Image Segmentation

In this section, we use synthetic data to build a minimal image segmentation example. We'll create random tensors to simulate images and segmentation masks, and then train a tiny model with the Dice Loss function.

The script below shows how the custom Dice Loss is used in a PyTorch training loop for a segmentation task. Keep in mind that this is a simplified example for illustrative purposes: because the masks are random noise, the model has nothing meaningful to learn, so the loss is expected to change only slightly between epochs.

In [13]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Dice Loss Class Definition
class DiceLoss(nn.Module):
    def __init__(self):
        super(DiceLoss, self).__init__()

    def forward(self, inputs, targets, smooth=1):
        inputs = inputs.view(-1)
        targets = targets.view(-1)

        intersection = (inputs * targets).sum()
        dice = (2.*intersection + smooth)/(inputs.square().sum() + targets.square().sum() + smooth)

        return 1 - dice

# Dummy Convolutional Neural Network for Segmentation
class DummySegmentationModel(nn.Module):
    def __init__(self):
        super(DummySegmentationModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.conv1(x)
        x = self.sigmoid(x)
        return x

# Synthetic Dataset Generation
def generate_synthetic_data(batch_size=10, height=256, width=256):
    # Generating random images and masks
    images = torch.rand(batch_size, 1, height, width)  # 1 channel images
    masks = torch.randint(0, 2, (batch_size, 1, height, width)).float()  # Binary masks
    return images, masks

# Prepare Data Loader
batch_size = 5
images, masks = generate_synthetic_data(batch_size)
dataset = TensorDataset(images, masks)
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

# Model, Loss Function, and Optimizer
model = DummySegmentationModel()
dice_loss = DiceLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training Loop
num_epochs = 2
for epoch in range(num_epochs):
    for batch_images, batch_masks in data_loader:
        optimizer.zero_grad()

        # Forward pass
        outputs = model(batch_images)

        # Loss computation
        loss = dice_loss(outputs, batch_masks)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

print("Done.")

Epoch [1/2], Loss: 0.3873
Epoch [2/2], Loss: 0.3864
Done.
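The loop above reports only the training loss. A common follow-up is to threshold the predicted probabilities and compute a hard Dice score as an evaluation metric. The sketch below is one possible way to do that; `hard_dice_score`, the 0.5 threshold, and `eps` are assumptions of this example, and the random `probs` tensor stands in for a model's sigmoid output.

```python
import torch

def hard_dice_score(probs, masks, threshold=0.5, eps=1e-7):
    # Binarize the probabilities, then compute Dice on the binary masks
    preds = (probs > threshold).float()
    intersection = (preds * masks).sum()
    return (2.0 * intersection + eps) / (preds.sum() + masks.sum() + eps)

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
masks = torch.randint(0, 2, (5, 1, 8, 8)).float()  # synthetic binary masks
probs = torch.rand(5, 1, 8, 8)  # stand-in for model(batch_images)

random_score = hard_dice_score(probs, masks)
perfect_score = hard_dice_score(masks, masks)

print(random_score.item())   # random predictions against random masks
print(perfect_score.item())  # 1.0 for a perfect prediction
```

Unlike the loss, this metric is not differentiable because of the thresholding step, so it is suited to evaluation rather than training.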
