
# Mean Squared Error (MSE) Loss
Mean Squared Error (MSE) loss is a commonly used loss function in regression problems, where the goal is to predict a continuous variable. The loss is calculated as the average of the squared differences between the predicted and true values. The formula for MSE loss is:
\begin{align}
\text{MSE loss} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{pred,i}-y_{true,i}\right)^{2}
\end{align}


*   n is the number of samples in the dataset
*   y_pred is the predicted value of the target variable
*   y_true is the true value of the target variable


Numpy version

In [None]:
import numpy as np

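def mse_loss(y_pred, y_true):
    """
    Calculates the mean squared error (MSE) loss between predicted and true values.

    Args:
    - y_pred: predicted values (array-like)
    - y_true: true values (array-like, same shape as y_pred)

    Returns:
    - mse_loss: mean squared error loss
    """
    # convert to float arrays so that plain Python lists are accepted as well
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    # average over every element, so multi-dimensional inputs are handled correctly
    mse_loss = np.mean((y_pred - y_true) ** 2)
    return mse_loss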
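
As a quick check of the formula, the sketch below evaluates the MSE on four hand-picked values. The numbers are illustrative assumptions, not data from this notebook. The squared errors are 0.25, 0.25, 0 and 1, so the mean should come out as 0.375.

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean of the squared differences over all elements."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

# illustrative values (assumed for this example)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

squared_errors = (y_pred - y_true) ** 2  # 0.25, 0.25, 0.0, 1.0
mse_example = mse_loss(y_pred, y_true)
print(squared_errors)
print(mse_example)  # 0.375
```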

TensorFlow version

In [None]:
import tensorflow as tf

def mse_loss(y_pred, y_true):
    """
    Calculates the mean squared error (MSE) loss between predicted and true values.
    
    Args:
    - y_pred: predicted values
    - y_true: true values
    
    Returns:
    - mse_loss: mean squared error loss
    """
    mse = tf.keras.losses.MeanSquaredError()
    mse_loss = mse(y_true, y_pred)
    return mse_loss

Pytorch version

In [None]:
import torch

def mse_loss(y_pred, y_true):
    """
    Calculates the mean squared error (MSE) loss between predicted and true values.
    
    Args:
    - y_pred: predicted values
    - y_true: true values
    
    Returns:
    - mse_loss: mean squared error loss
    """
    mse = torch.nn.MSELoss()
    mse_loss = mse(y_pred, y_true)
    return mse_loss

# Binary Cross-Entropy Loss
Binary Cross-Entropy loss, also known as log loss, is a common loss function used in binary classification problems. It measures the difference between the predicted probability distribution and the actual binary label distribution.
\begin{align}
L(y, \hat{y}) = -\left[y \log(\hat{y}) + (1-y)\log(1-\hat{y})\right]
\end{align}
where $y$ is the true binary label (0 or 1), $\hat{y}$ is the predicted probability (ranging from 0 to 1), and log is the natural logarithm.

The first term of the equation calculates the loss when the true label is 1, and the second term calculates the loss when the true label is 0. For a single sample only one of the two terms is non-zero, and the loss over a dataset is the mean of the per-sample losses.

Numpy version

In [None]:
import numpy as np

# define true labels and predicted probabilities
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.1, 0.9, 0.8, 0.3])

# calculate the binary cross-entropy loss
loss = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)).mean()

# print the loss
print(loss)

0.19763488164214868
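
The NumPy expression above returns `inf` or `nan` as soon as a prediction is exactly 0 or 1, because `log(0)` is undefined. A minimal sketch of a guarded version follows. The helper name `binary_cross_entropy` and the clipping constant `eps=1e-7` are assumptions made for this example.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; eps is an assumed clipping constant."""
    y_true = np.asarray(y_true, dtype=float)
    # keep predictions strictly inside (0, 1) so that log() stays finite
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return per_sample.mean()

y_true = np.array([0, 1, 1, 0])
bce_regular = binary_cross_entropy(y_true, [0.1, 0.9, 0.8, 0.3])
bce_extreme = binary_cross_entropy(y_true, [0.0, 1.0, 1.0, 0.0])  # finite thanks to clipping
print(bce_regular)
print(bce_extreme)
```

For predictions away from 0 and 1 the clipping has no effect, so the first value agrees with the unclipped computation.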


Tensorflow version

In [None]:
import tensorflow as tf

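# define true labels and predicted probabilities
# float16 has limited precision, which is why the printed loss differs slightly from the NumPy result
y_true = tf.constant([0, 1, 1, 0], dtype=tf.float16)
y_pred = tf.constant([0.1, 0.9, 0.8, 0.3], dtype=tf.float16)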

# define the loss function
bce_loss = tf.keras.losses.BinaryCrossentropy()

# calculate the loss
loss = bce_loss(y_true, y_pred)

# print the loss
print(loss)

tf.Tensor(0.1978, shape=(), dtype=float16)


Pytorch version

In [None]:
import torch

# define true labels and predicted probabilities
y_true = torch.tensor([0, 1, 1, 0], dtype=torch.float32)
y_pred = torch.tensor([0.1, 0.9, 0.8, 0.3], dtype=torch.float32)

# define the loss function
bce_loss = torch.nn.BCELoss()

# calculate the loss
loss = bce_loss(y_pred, y_true)

# print the loss
print(loss)

tensor(0.1976)


# Weighted Binary Cross-Entropy Loss
Weighted Binary Cross-Entropy loss is a variation of the binary cross-entropy loss that allows for assigning different weights to positive and negative examples. This can be useful when dealing with imbalanced datasets, where one class is significantly underrepresented compared to the other.
\begin{align}
L(y, \hat{y}) = -\left[w_{pos}\, y \log(\hat{y}) + w_{neg}\,(1-y)\log(1-\hat{y})\right]
\end{align}
The positive and negative weights can be chosen based on the relative importance of each class: whichever class matters more, or is underrepresented, is given the higher weight. With $w_{pos} = w_{neg} = 1$ the loss reduces to the ordinary binary cross-entropy.
When the predicted probability is close to the true label, the loss is low, and when the predicted probability is far from the true label, the loss is high. This loss function is commonly used in neural network models that use sigmoid activation functions in the output layer to predict binary labels.
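
The weighted formula can be written directly in NumPy. The sketch below works on probabilities and averages over the samples. The helper name `weighted_bce` and the weight value 1.5 are assumptions chosen for illustration.

```python
import numpy as np

def weighted_bce(y_true, y_pred, w_pos=1.0, w_neg=1.0):
    """Mean weighted binary cross-entropy on probabilities (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pos_term = w_pos * y_true * np.log(y_pred)
    neg_term = w_neg * (1 - y_true) * np.log(1 - y_pred)
    return -(pos_term + neg_term).mean()

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.1, 0.9, 0.8, 0.3])

unweighted = weighted_bce(y_true, y_pred)             # both weights are 1: plain BCE
upweighted = weighted_bce(y_true, y_pred, w_pos=1.5)  # positive samples count 1.5x
print(unweighted, upweighted)
```

Raising `w_pos` increases the loss only through the samples whose true label is 1, so mistakes on the positive class are penalized more heavily.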

Tensorflow version

In [None]:
import tensorflow as tf


# define true labels and raw model outputs (logits)
# note: this function applies the sigmoid internally, so the values in y_pred are
# interpreted as logits here, not as probabilities
y_true = tf.constant([0, 1, 1, 0], dtype=tf.float32)
y_pred = tf.constant([0.1, 0.9, 0.8, 0.3], dtype=tf.float32)
pos_weight = 1.5

# this is a plain function rather than a loss class, so no loss object is created;
# it returns one loss value per element
# pos_weight scales the loss of the positive class (i.e. y_true=1): pos_weight>1 penalizes
# false negatives more and so favors recall, pos_weight<1 favors precision
weighted_bce = tf.nn.weighted_cross_entropy_with_logits(labels=y_true, logits=y_pred, pos_weight=pos_weight)

print(weighted_bce)

tf.Tensor([0.7443967 0.5117308 0.556651  0.8543553], shape=(4,), dtype=float32)
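
The four values printed above can be reproduced by hand. The sketch below assumes that the function applies a sigmoid to its inputs and scales only the positive term by `pos_weight`. The helper name `weighted_bce_with_logits` is hypothetical.

```python
import numpy as np

def weighted_bce_with_logits(labels, logits, pos_weight):
    """Per-element weighted cross-entropy on logits (NumPy sketch)."""
    labels = np.asarray(labels, dtype=float)
    logits = np.asarray(logits, dtype=float)
    # -log(sigmoid(x)) = softplus(-x) and -log(1 - sigmoid(x)) = softplus(x);
    # np.logaddexp(0, x) computes softplus(x) = log(1 + exp(x)) stably
    pos_term = pos_weight * labels * np.logaddexp(0.0, -logits)
    neg_term = (1.0 - labels) * np.logaddexp(0.0, logits)
    return pos_term + neg_term

per_element = weighted_bce_with_logits([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.3], pos_weight=1.5)
print(per_element)
```

Because the inputs are treated as logits, 0.9 stands for a probability of about 0.71 rather than 0.9, which explains why these losses are larger than the ones computed from probabilities.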


Pytorch version

In [None]:
import torch

# pos_weight scales only the positive term of the loss; here a positive case has 3X weight
# (the `weight` argument would instead rescale the loss of every element)
pos_weight = torch.tensor([3.0], dtype=torch.float32)
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# y_pred holds logits, because BCEWithLogitsLoss applies the sigmoid internally
y_true = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)
y_pred = torch.tensor([[0.1], [0.9], [0.8], [-0.05]], dtype=torch.float32)

loss = criterion(y_pred, y_true)
print(loss)

tensor(0.8874)
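
`BCEWithLogitsLoss` accepts both a `weight` and a `pos_weight` argument, and the two are easy to confuse. The NumPy sketch below recomputes the mean loss both ways for the labels and logits of this example, assuming a factor of 3 in each case.

```python
import numpy as np

y_true = np.array([0.0, 1.0, 1.0, 0.0])
logits = np.array([0.1, 0.9, 0.8, -0.05])

# unweighted per-element terms on logits, using softplus(x) = log(1 + exp(x))
pos_term = y_true * np.logaddexp(0.0, -logits)
neg_term = (1.0 - y_true) * np.logaddexp(0.0, logits)

# weight=3 multiplies every element, positive or negative
loss_weight = np.mean(3.0 * (pos_term + neg_term))
# pos_weight=3 multiplies only the positive term
loss_pos_weight = np.mean(3.0 * pos_term + neg_term)

print(round(loss_weight, 4))      # 1.5938
print(round(loss_pos_weight, 4))  # 0.8874
```

Only the second variant changes the balance between the two classes; the first merely rescales the whole loss.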


# Categorical Cross-Entropy Loss

The categorical cross-entropy loss is a popular loss function used in multi-class classification problems. It measures the dissimilarity between the true labels and the predicted probabilities for each class.
\begin{align}
L(Y, \hat{Y}) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} Y_{ic}\log(\hat{Y}_{ic})
\end{align}

where $Y$ is a matrix of true labels in one-hot encoding format, $\hat{Y}$ is a matrix of predicted probabilities for each class, $N$ is the number of samples, $C$ is the number of classes, and log represents the natural logarithm.

Numpy version

In [None]:
import numpy as np

# define true labels and predicted probabilities as NumPy arrays
y_true = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_pred = np.array([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

# calculate the loss
loss = -1/len(y_true) * np.sum(np.sum(y_true * np.log(y_pred)))

# print the loss
print(loss)

1.7661057888493454


Tensorflow version

In [None]:
import tensorflow as tf

# define true labels and predicted probabilities as TensorFlow Tensors
y_true = tf.constant([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_pred = tf.constant([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

# create the loss object
cce_loss = tf.keras.losses.CategoricalCrossentropy()

# calculate the loss
loss = cce_loss(y_true, y_pred)

# print the loss
print(loss.numpy())

1.7661058


Note that the CategoricalCrossentropy class expects the true labels in one-hot encoding format and does not convert integer labels internally. When the labels are class indices, use SparseCategoricalCrossentropy, which takes the integers directly and gives the same loss.

In [None]:
import tensorflow as tf

# define true labels as class indices and predicted probabilities as TensorFlow Tensors
y_true = tf.constant([1, 2, 0])
y_pred = tf.constant([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

# create the loss object
scce_loss = tf.keras.losses.SparseCategoricalCrossentropy()

# calculate the loss
loss = scce_loss(y_true, y_pred)

# print the loss, rounded to four decimals
print(f"{loss.numpy():.4f}")

1.7661


Pytorch version

In [None]:
import torch

# define true labels and predicted logits as PyTorch Tensors
# the CrossEntropyLoss class combines the softmax activation function and the categorical cross-entropy loss into a single operation, so you don't need to apply softmax separately.
# because the values below are treated as logits rather than probabilities, the result differs from the NumPy and TensorFlow examples
y_true = torch.LongTensor([1, 2, 0]) # the true labels are given as class indices here, NOT in one-hot encoding format
y_logits = torch.tensor([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])

# create the loss object
ce_loss = torch.nn.CrossEntropyLoss()

# calculate the loss
loss = ce_loss(y_logits, y_true)

# print the loss
print(loss.item())

1.2276147603988647
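
The PyTorch result differs from the NumPy and TensorFlow results above because the same numbers are passed through a softmax first. A minimal NumPy sketch of that computation, under the assumption that the loss is log-softmax followed by the negative log-likelihood averaged over the batch:

```python
import numpy as np

labels = np.array([1, 2, 0])
logits = np.array([[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.6, 0.3]])
rows = np.arange(len(labels))

# log-softmax along the class axis (shifted by the row maximum for stability)
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# pick the log-probability of the true class of each sample and average
ce_from_logits = -log_probs[rows, labels].mean()

# for comparison: the same numbers used directly as probabilities
ce_from_probs = -np.log(logits[rows, labels]).mean()

print(ce_from_logits)  # about 1.2276
print(ce_from_probs)   # about 1.7661
```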


# Sparse Categorical Cross-Entropy Loss
The sparse categorical cross-entropy loss is similar to the categorical cross-entropy loss, but it is used when the true labels are provided as integers rather than one-hot encoding.
\begin{align}
L(y, \hat{Y}) = -\frac{1}{N}\sum_{i=1}^{N}\log(\hat{Y}_{i, y_i})
\end{align}
where $\hat{Y}_{i, y_i}$ is the predicted probability that sample $i$ belongs to its true class $y_i$, and $N$ is the number of samples.

The sparse categorical cross-entropy loss uses integer labels directly. The true label for each sample is represented as a single integer value $y_i$ between 0 and $C-1$, where $C$ is the number of classes.

Numpy version

In [None]:
import numpy as np

def sparse_categorical_crossentropy(y_true, y_pred):
    # convert true labels to one-hot encoding
    y_true_onehot = np.zeros_like(y_pred)
    y_true_onehot[np.arange(len(y_true)), y_true] = 1

    # calculate loss
    loss = -np.mean(np.sum(y_true_onehot * np.log(y_pred), axis=-1))

    return loss

# define true labels as integers and predicted probabilities as an array
y_true = np.array([1, 2, 0]) # integer class indices in [0, C-1]
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]]) # predicted probabilities, one row per sample

loss = sparse_categorical_crossentropy(y_true, y_pred)
print(loss)

0.6108604879161034
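
Since the integer label already says which column to read, the one-hot conversion can be skipped. The sketch below picks the probability of the true class by indexing. The helper name `sparse_cce_indexed` is hypothetical, and it returns the per-sample losses alongside their mean.

```python
import numpy as np

def sparse_cce_indexed(y_true, y_pred):
    """Sparse categorical cross-entropy via direct indexing (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred, dtype=float)
    # probability assigned to the true class of each sample
    picked = y_pred[np.arange(len(y_true)), y_true]
    per_sample = -np.log(picked)
    return per_sample, per_sample.mean()

y_true = np.array([1, 2, 0])
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

per_sample, mean_loss = sparse_cce_indexed(y_true, y_pred)
print(per_sample)
print(mean_loss)
```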


Tensorflow version

In [None]:
import tensorflow as tf

def sparse_categorical_crossentropy(y_true, y_pred):
    # from_logits=False states that y_pred already contains probabilities rather than logit values.
    # the functional form returns one loss value per sample; apply tf.reduce_mean for a scalar.
    loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred, from_logits=False)
    return loss

# define true labels as integers and predicted probabilities as a tensor
y_true = tf.constant([1, 2, 0])
y_pred = tf.constant([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

# calculate the loss
loss = sparse_categorical_crossentropy(y_true, y_pred)

# print the loss
print(loss.numpy())

[0.22314355 0.6931472  0.91629076]


Pytorch version

In [None]:
import torch.nn.functional as F
import torch

def sparse_categorical_crossentropy(y_true, y_pred):
    loss = F.cross_entropy(y_pred, y_true)
    return loss

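# define true labels as integers and predicted logits as a tensor
# F.cross_entropy applies log-softmax internally, so y_pred is treated as logits and the result differs from the NumPy value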
y_true = torch.tensor([1, 2, 0])
y_pred = torch.tensor([[0.1, 0.8, 0.1], [0.3, 0.2, 0.5], [0.4, 0.3, 0.3]])

# calculate the loss
loss = sparse_categorical_crossentropy(y_true, y_pred)

# print the loss
print(loss.item())

0.887542188167572


# Dice Loss
Dice loss is derived from the Sørensen–Dice coefficient (equivalent to the F1 score for binary masks) and is used in image segmentation tasks to measure the overlap between the predicted segmentation and the ground truth. The Dice coefficient ranges from 0 to 1, where 0 indicates no overlap and 1 indicates perfect overlap. The Dice loss is one minus the coefficient, so a loss of 0 indicates perfect overlap and a loss of 1 indicates no overlap.
\begin{align}
\text{Dice Loss} = 1 - \frac{2 \sum{(y_{true} \cdot y_{pred})} + \text{smooth}}{\sum{y_{pred}^{2}} + \sum{y_{true}^{2}} + \text{smooth}}
\end{align}

Numpy version

In [None]:
import numpy as np

def dice_loss(y_true, y_pred, smooth=1e-5):
  intersection = np.sum(y_true * y_pred, axis=(1,2,3))
  sum_of_squares_pred = np.sum(np.square(y_pred), axis=(1,2,3))
  sum_of_squares_true = np.sum(np.square(y_true), axis=(1,2,3))
  dice = 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)
  return dice

Tensorflow version

In [None]:
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-5):
    intersection = tf.reduce_sum(y_true * y_pred, axis=(1,2,3))
    sum_of_squares_pred = tf.reduce_sum(tf.square(y_pred), axis=(1,2,3))
    sum_of_squares_true = tf.reduce_sum(tf.square(y_true), axis=(1,2,3))
    dice = 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)
    return dice


Pytorch version

In [None]:
import torch

def dice_loss(y_true, y_pred, smooth=1e-5):
    intersection = torch.sum(y_true * y_pred, dim=(1,2,3))
    sum_of_squares_pred = torch.sum(torch.square(y_pred), dim=(1,2,3))
    sum_of_squares_true = torch.sum(torch.square(y_true), dim=(1,2,3))
    dice = 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)
    return dice

We assume that y_true and y_pred are 4D tensors with dimensions (batch_size, num_classes, height, width). Each function returns one loss value per sample in the batch.
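
To see the two extremes of the loss, the sketch below runs the NumPy version on a toy batch of two 2x2 masks with a single class. The mask values are assumptions chosen for illustration: the first prediction matches the ground truth exactly and the second does not overlap with it at all.

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=1e-5):
    # sum over the class, height and width axes; one loss value per sample
    intersection = np.sum(y_true * y_pred, axis=(1, 2, 3))
    sum_of_squares_pred = np.sum(np.square(y_pred), axis=(1, 2, 3))
    sum_of_squares_true = np.sum(np.square(y_true), axis=(1, 2, 3))
    return 1 - (2 * intersection + smooth) / (sum_of_squares_pred + sum_of_squares_true + smooth)

# toy masks of shape (batch_size=2, num_classes=1, height=2, width=2), assumed for illustration
y_true = np.array([[[[1.0, 1.0], [0.0, 0.0]]],
                   [[[1.0, 1.0], [0.0, 0.0]]]])
y_pred = np.array([[[[1.0, 1.0], [0.0, 0.0]]],   # sample 0: identical to the ground truth
                   [[[0.0, 0.0], [1.0, 1.0]]]])  # sample 1: no overlap at all

dice_values = dice_loss(y_true, y_pred)
print(dice_values)  # close to [0, 1]
```

The `smooth` term keeps the fraction defined when both masks are empty, at the cost of the no-overlap loss being very slightly below 1.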

# KL Divergence Loss
KL (Kullback-Leibler) divergence loss is a measure of how different two probability distributions are from each other. In the context of machine learning, it is often used as a loss function to train models that generate new samples from a given distribution.
\begin{align}
\text{KL}(p \,\|\, q) = \sum_{x} p(x) \log\left(\frac{p(x)}{q(x)}\right)
\end{align}
where p represents the true distribution and q represents the predicted distribution. The KL divergence loss measures how well the predicted distribution matches the true distribution.

Numpy version

In [None]:
import numpy as np

def kl_divergence_loss(p, q):
    return np.sum(p * np.log(p / q))
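
A short usage sketch for two three-class distributions follows. The helper `kl_divergence` is a hypothetical variant that skips entries where p(x) = 0, which would otherwise produce `nan`. It also shows that the KL divergence is not symmetric.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p(x) = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # skip zero-probability entries to avoid log(0)
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.3, 0.3])

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)  # swapping the arguments gives a different value
print(kl_pq)  # about 0.1168
print(kl_qp)  # about 0.1240
```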

Tensorflow version

p and q are TensorFlow tensors representing the true distribution and predicted distribution, respectively. The tf.keras.losses.KLDivergence class is used to compute the KL divergence loss between p and q. The result is a scalar tensor that represents the loss value.

In [None]:
import tensorflow as tf

# define true distribution and predicted distribution
p = tf.constant([0.2, 0.3, 0.5])
q = tf.constant([0.4, 0.3, 0.3])

kl_loss = tf.keras.losses.KLDivergence()
loss = kl_loss(p, q)
print(loss)

tf.Tensor(0.11678335, shape=(), dtype=float32)


In [None]:
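# KL divergence is only meaningful for probability distributions, so the random
# scores are normalized with a softmax along the last axis first
p = tf.nn.softmax(tf.random.normal([224, 224]), axis=-1)
q = tf.nn.softmax(tf.random.normal([224, 224]), axis=-1)
kl_loss = tf.keras.losses.KLDivergence()
loss = kl_loss(p, q)  # averaged over the 224 rows; the value changes on every run
print(loss)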


Pytorch version

In [None]:
import torch
import torch.nn.functional as F


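criterion = torch.nn.KLDivLoss(reduction='batchmean')

# KLDivLoss expects the predicted distribution as log-probabilities (first argument)
# and the true distribution as probabilities (second argument)
log_q = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1) # predicted distribution, in log space
p = F.softmax(torch.rand(3, 5), dim=1) # true distribution
loss = criterion(log_q, p)
print(loss.item())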

0.5797662138938904
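
The `batchmean` reduction sums the pointwise terms over all elements and divides by the batch size. A NumPy sketch with fixed toy distributions (assumed for illustration, so the result is reproducible, unlike the random tensors above):

```python
import numpy as np

# fixed toy distributions (assumed for illustration), one row per sample
p = np.array([[0.2, 0.3, 0.5],
              [0.1, 0.6, 0.3]])  # true (target) probabilities
q = np.array([[0.4, 0.3, 0.3],
              [0.3, 0.4, 0.3]])  # predicted probabilities
log_q = np.log(q)                # the loss receives predictions in log space

# pointwise term p * (log p - log q), summed over everything, divided by the batch size
batchmean_kl = np.sum(p * (np.log(p) - log_q)) / p.shape[0]
print(batchmean_kl)  # about 0.1251
```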
