
# The Art of Loss Function Selection in Deep Learning: A Comprehensive Guide



## Mean Squared Error (MSE) Loss:

MSE is a widely used loss function for regression problems. It calculates the average squared difference between the true and predicted values. Because each error is squared, a few large errors dominate the average, so MSE is sensitive to outliers. This makes it a good fit for problems where large errors should be penalized disproportionately, and a poor fit when the targets contain noisy outliers.

In [1]:
def mean_squared_error(y_true, y_pred):
    n = len(y_true)
    mse = sum((y_true[i] - y_pred[i])**2 for i in range(n)) / n
    return mse

y_true = [1, 2, 3, 4, 5]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.1]
mse = mean_squared_error(y_true, y_pred)
print("MSE:", mse)


MSE: 0.02200000000000002
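
To make the outlier sensitivity concrete, the sketch below compares MSE with mean absolute error (MAE) on the same predictions, before and after one target is corrupted. The `mean_absolute_error` helper and the outlier value of 15 are illustrative assumptions, not part of the original example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute residuals (hypothetical helper, for comparison only)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_pred = [1.1, 1.9, 3.2, 3.8, 5.1]
y_clean = [1, 2, 3, 4, 5]
y_outlier = [1, 2, 3, 4, 15]  # last target replaced by an assumed outlier

for name, y in [("clean", y_clean), ("outlier", y_outlier)]:
    print(name,
          "MSE:", mean_squared_error(y, y_pred),
          "MAE:", mean_absolute_error(y, y_pred))
```

With these numbers the single outlier raises MAE by a factor of 15 (0.14 to 2.1) but MSE by a factor of roughly 890 (0.022 to about 19.6).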


## Cross-Entropy Loss (Binary Classification):

Cross-entropy loss measures the dissimilarity between two probability distributions: here, the true labels and the predicted probabilities of a binary classifier. The penalty grows without bound as a prediction becomes both confident and wrong, which makes the loss suitable for classification tasks where probability outputs are required. Predicted probabilities must lie strictly between 0 and 1, because the logarithm is undefined at 0.

In [2]:
import math

def binary_cross_entropy(y_true, y_pred):
    n = len(y_true)
    bce = -sum(y_true[i]*math.log(y_pred[i]) + (1-y_true[i])*math.log(1-y_pred[i]) for i in range(n)) / n
    return bce

y_true = [0, 1, 1, 0, 1]
y_pred = [0.1, 0.9, 0.8, 0.3, 0.7]
bce = binary_cross_entropy(y_true, y_pred)
print("Binary Cross-Entropy:", bce)


Binary Cross-Entropy: 0.22944289410146546
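
The following sketch isolates the per-example penalty to show how it behaves as a prediction moves from confident and correct to confident and wrong. The helper name `bce_single` and the clipping constant `eps` are assumptions introduced here; clipping is one common way to keep the logarithm finite when a prediction is exactly 0 or 1.

```python
import math

def bce_single(y, p, eps=1e-12):
    """Binary cross-entropy for one example; eps is an assumed clipping constant."""
    p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# True label is 1; predictions range from confident-correct to confident-wrong.
for p in [0.99, 0.6, 0.4, 0.01, 0.0]:
    print(f"p={p:<5} loss={bce_single(1, p):.4f}")
```

For a true label of 1, the loss rises from about 0.01 at `p=0.99` to about 4.6 at `p=0.01`, and stays finite at `p=0.0` only because of the clipping.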


## Hinge Loss (Binary Classification):

Hinge loss is used in binary classification problems, particularly in support vector machines (SVMs). Given a true label `y` in {-1, +1} and a raw model score `f(x)`, the loss is `max(0, 1 - y * f(x))`. It is zero only when the prediction is correct with a margin of at least 1. Otherwise it grows linearly as the score moves inside the margin or onto the wrong side of the decision boundary. Minimizing it therefore encourages a large margin between the classes.

In [3]:
def hinge_loss(y_true, y_pred):
    n = len(y_true)
    hl = sum(max(0, 1 - y_true[i]*y_pred[i]) for i in range(n)) / n
    return hl

y_true = [-1, 1, 1, -1, 1]
y_pred = [-0.9, 0.8, 0.9, -0.7, 0.6]
hl = hinge_loss(y_true, y_pred)
print("Hinge Loss:", hl)


Hinge Loss: 0.22000000000000003
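
A per-example view makes the role of the margin clearer. In the sketch below the label is always +1 and only the raw score changes; the helper name `hinge_single` and the chosen scores are illustrative assumptions.

```python
def hinge_single(y, score):
    """Hinge loss for one example with label y in {-1, +1} and a raw score."""
    return max(0.0, 1.0 - y * score)

# Label is +1 in every case; only the raw score changes.
cases = {
    "correct, outside margin": 2.0,
    "correct, on the margin": 1.0,
    "correct, inside margin": 0.5,
    "on the decision boundary": 0.0,
    "misclassified": -1.0,
}
for name, score in cases.items():
    print(f"{name:<26} score={score:+.1f} loss={hinge_single(1, score):.1f}")
```

A correctly classified example with a score of 0.5 still incurs a loss of 0.5, which is why every prediction in the example above contributes to the average even though all of them have the right sign.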


## Triplet Loss:

Triplet loss is used for learning embeddings in similarity-based tasks, where the goal is to learn a representation that places similar data points closer together in the feature space. The loss function takes three inputs: anchor, positive, and negative. The anchor and positive inputs are of the same class, while the negative input is from a different class. The goal is to make the anchor-negative distance exceed the anchor-positive distance by at least a margin; once a triplet satisfies that margin, its loss is zero. The implementation below uses squared Euclidean distances.

In [4]:
def triplet_loss(anchor, positive, negative, margin=0.5):
    ap_distance = sum((anchor[i] - positive[i])**2 for i in range(len(anchor)))
    an_distance = sum((anchor[i] - negative[i])**2 for i in range(len(anchor)))
    tl = max(ap_distance - an_distance + margin, 0)
    return tl

anchor = [1.0, 1.1, 1.2]
positive = [1.0, 1.0, 1.3]
negative = [2.1, 2.3, 2.4]
tl = triplet_loss(anchor, positive, negative)
print("Triplet Loss:", tl)


Triplet Loss: 0
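
The triplet above produces zero loss because its negative is already far from the anchor. The sketch below adds a second, closer negative to show a triplet that violates the margin. The `squared_distance` helper and the `hard_negative` vector are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """max(d(a, p) - d(a, n) + margin, 0) with squared Euclidean distances."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)

anchor = [1.0, 1.1, 1.2]
positive = [1.0, 1.0, 1.3]
easy_negative = [2.1, 2.3, 2.4]  # far from the anchor: margin already satisfied
hard_negative = [1.2, 1.3, 1.4]  # assumed example: close to the anchor

print("easy negative:", triplet_loss(anchor, positive, easy_negative))
print("hard negative:", triplet_loss(anchor, positive, hard_negative))
```

Only the hard negative yields a non-zero loss (about 0.4 here), so only triplets like it contribute a gradient during training.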


## Log-Cosh Loss (Regression):

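Log-Cosh loss is a smooth alternative to the absolute error for regression problems. It calculates the average logarithm of the hyperbolic cosine of the difference between true and predicted values. For small errors it behaves like half the squared error (`x**2 / 2`), and for large errors it behaves like the absolute error minus a constant (`|x| - log(2)`). Because it grows only linearly for large errors, it is less sensitive to outliers than the Mean Squared Error.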

In [5]:
import math

def log_cosh_loss(y_true, y_pred):
    n = len(y_true)
    lcl = sum(math.log(math.cosh(y_pred[i] - y_true[i])) for i in range(n)) / n
    return lcl

y_true = [1, 2, 3, 4, 5]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.1]
lcl = log_cosh_loss(y_true, y_pred)
print("Log-Cosh Loss:", lcl)


Log-Cosh Loss: 0.010942242028990806
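
One practical caveat is that `math.cosh` overflows for large arguments, so the direct formula fails on large residuals. The sketch below uses the identity log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2), which is one way to avoid the overflow, and prints the two limiting approximations for comparison. The function name `log_cosh` and the test values are assumptions.

```python
import math

def log_cosh(x):
    """Numerically stable log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2)."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

for x in [0.01, 0.1, 1.0, 10.0, 1000.0]:
    print(f"x={x:<7} log_cosh={log_cosh(x):.6f}  "
          f"x^2/2={x * x / 2:.6f}  |x|-log2={abs(x) - math.log(2):.6f}")
```

For small `x` the result tracks `x^2/2`, and for large `x` it tracks `|x| - log(2)`; the value at `x=1000.0` is computed without raising an `OverflowError`.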


## Dice Loss (Binary Segmentation):

Dice loss is commonly used for binary segmentation problems, particularly in medical image segmentation. It measures the overlap between the true and predicted segmentation masks through the Dice coefficient: twice the size of the intersection divided by the sum of the two mask sizes. The coefficient ranges from 0 (no overlap) to 1 (perfect overlap), and the Dice loss is 1 minus this coefficient. A small smoothing constant keeps the ratio defined when both masks are empty.

In [6]:
def dice_loss(y_true, y_pred, smooth=1e-7):
    # Soft intersection: elementwise product of the two masks
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    # Sum of both mask sizes (the Dice denominator, not the set union)
    total = sum(y_true) + sum(y_pred)
    # smooth avoids division by zero when both masks are empty
    dl = 1 - (2 * intersection + smooth) / (total + smooth)
    return dl

y_true = [1, 0, 1, 0, 1]
y_pred = [0.9, 0.1, 0.8, 0.2, 0.7]
dl = dice_loss(y_true, y_pred)
print("Dice Loss:", dl)


Dice Loss: 0.15789473407202215
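
The boundary cases are a quick check that the loss spans the stated range. The sketch below evaluates a perfect prediction, a fully disjoint prediction, and two empty masks; the mask values are made up for illustration.

```python
def dice_loss(y_true, y_pred, smooth=1e-7):
    """1 - Dice coefficient; smooth guards against division by zero."""
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    total = sum(y_true) + sum(y_pred)  # sum of both mask sizes
    return 1 - (2 * intersection + smooth) / (total + smooth)

mask = [1, 0, 1, 0, 1]
perfect = dice_loss(mask, [1, 0, 1, 0, 1])
disjoint = dice_loss(mask, [0, 1, 0, 1, 0])
both_empty = dice_loss([0, 0, 0], [0, 0, 0])

print("perfect overlap:", perfect)
print("no overlap:     ", disjoint)
print("both empty:     ", both_empty)
```

Without the `smooth` term the last case would divide zero by zero; with it, two empty masks are treated as a perfect match.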


## Focal Loss (Binary Classification):

Focal loss is an extension of the cross-entropy loss designed to address the class imbalance problem in object detection tasks. It multiplies each cross-entropy term by a modulating factor, `(1 - p)**gamma` for positives and `p**gamma` for negatives, which shrinks the loss for well-classified examples so that training focuses on hard-to-classify instances. It has two hyperparameters: alpha, which balances the importance of positive and negative examples, and gamma, which controls how strongly easy examples are down-weighted. With gamma set to 0 the modulating factor disappears and the loss reduces to an alpha-weighted cross-entropy.

In [7]:
import math

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25):
    n = len(y_true)
    fl = -sum(alpha * y_true[i] * (1 - y_pred[i])**gamma * math.log(y_pred[i]) +
              (1 - alpha) * (1 - y_true[i]) * y_pred[i]**gamma * math.log(1 - y_pred[i]) for i in range(n)) / n
    return fl

y_true = [0, 1, 1, 0, 1]
y_pred = [0.1, 0.9, 0.8, 0.3, 0.7]
fl = focal_loss(y_true, y_pred)
print("Focal Loss:", fl)


Focal Loss: 0.007077157124841256
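
Two quick checks help build intuition for the hyperparameters. The sketch below first confirms that `gamma=0` with `alpha=0.5` gives exactly half the binary cross-entropy, then shows how raising `gamma` suppresses an easy positive (`p=0.9`) faster than a hard one (`p=0.3`). The single-example inputs and the choice of gamma values are assumptions for illustration.

```python
import math

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25):
    """Mean binary focal loss, mirroring the definition used above."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        total += alpha * y * (1 - p) ** gamma * math.log(p)
        total += (1 - alpha) * (1 - y) * p ** gamma * math.log(1 - p)
    return -total / len(y_true)

def binary_cross_entropy(y_true, y_pred):
    """Mean binary cross-entropy, for comparison."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0, 1, 1, 0, 1]
y_pred = [0.1, 0.9, 0.8, 0.3, 0.7]

# With gamma=0 and alpha=0.5 the modulating term vanishes, leaving 0.5 * BCE.
print("focal (gamma=0, alpha=0.5):", focal_loss(y_true, y_pred, gamma=0.0, alpha=0.5))
print("0.5 * BCE:                 ", 0.5 * binary_cross_entropy(y_true, y_pred))

# Compare one easy positive (p=0.9) with one hard positive (p=0.3).
for gamma in [0.0, 1.0, 2.0, 5.0]:
    easy = focal_loss([1], [0.9], gamma=gamma)
    hard = focal_loss([1], [0.3], gamma=gamma)
    print(f"gamma={gamma}: easy={easy:.6f} hard={hard:.6f} ratio={hard / easy:.1f}")
```

The hard-to-easy ratio grows with `gamma`, which is the mechanism that shifts training effort toward hard examples.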


## Kullback-Leibler (KL) Divergence Loss:

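KL divergence loss measures how much a predicted probability distribution differs from a true (reference) distribution, which can be useful in tasks such as unsupervised learning or generative modeling. It is the expected value, under the true distribution, of the log-ratio between true and predicted probabilities. KL divergence is not symmetric, meaning that swapping the true and predicted distributions changes the result. The standard definition sums over outcomes; the implementation below additionally divides by the number of outcomes `n`, which only rescales the value by a constant factor.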

In [8]:
import math

def kl_divergence(y_true, y_pred):
    n = len(y_true)
    kld = sum(y_true[i] * math.log(y_true[i] / y_pred[i]) for i in range(n)) / n
    return kld

y_true = [0.2, 0.3, 0.1, 0.4]
y_pred = [0.25, 0.35, 0.15, 0.25]
kld = kl_divergence(y_true, y_pred)
print("Kullback-Leibler Divergence Loss:", kld)


Kullback-Leibler Divergence Loss: 0.014145256669114606
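
The asymmetry is easy to verify numerically. The sketch below uses the standard sum form of the divergence (without the division by `n`) and evaluates it in both directions on the same two distributions. Skipping terms where the first distribution is zero follows the usual convention that `0 * log(0)` counts as 0; that handling is an addition made here.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), skipping terms where p_i == 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.2, 0.3, 0.1, 0.4]
q = [0.25, 0.35, 0.15, 0.25]

print("KL(p || q):", kl_divergence(p, q))
print("KL(q || p):", kl_divergence(q, p))
print("KL(p || p):", kl_divergence(p, p))
```

The two directions give different values (roughly 0.057 and 0.053), and the divergence of a distribution from itself is zero. The first value is 4 times the averaged result printed above, since `n = 4`.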
