# Loss Functions

Loss functions, also known as cost functions, quantify how far a model's predicted output is from the actual (ground-truth) output. Training a machine learning model amounts to minimizing this quantity.

## Mean Squared Error (MSE) Loss

**Description**: It measures the average of the squared errors, i.e., the average squared difference between the predicted values and the actual values. Because errors are squared, large errors are penalized disproportionately.

$$\text{MSE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$


where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for sample $i$, and $n$ is the number of samples.

## Mean Absolute Error (MAE) Loss

**Description**: It measures the average of the absolute errors between the predicted values and the actual values. Because each error contributes linearly rather than quadratically, it is less sensitive to outliers than MSE.

$$\text{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
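The difference in outlier sensitivity can be illustrated with a small sketch. The helper names `mse` and `mae` and the sample values are illustrative assumptions, not part of the original notebook; the only change between the two prediction arrays is that the last prediction is off by 10.

```python
import numpy as np

def mse(y, y_hat):
    # Mean of squared element-wise errors
    return float(np.mean(np.square(y - y_hat)))

def mae(y, y_hat):
    # Mean of absolute element-wise errors
    return float(np.mean(np.abs(y - y_hat)))

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_clean = np.array([1.5, 2.5, 3.0, 3.5, 4.5])
y_outlier = np.array([1.5, 2.5, 3.0, 3.5, 15.0])  # last prediction is off by 10

print(f"clean:   MSE={mse(y_true, y_clean):.4f}  MAE={mae(y_true, y_clean):.4f}")
print(f"outlier: MSE={mse(y_true, y_outlier):.4f}  MAE={mae(y_true, y_outlier):.4f}")
```

With these values a single bad prediction raises MSE from 0.2 to 20.15 (roughly 100 times), while MAE rises from 0.4 to 2.3 (under 6 times).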

## Cross-Entropy Loss (or Log Loss)

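**Description**: Commonly used in classification problems. It measures the performance of a classification model whose output is a probability between 0 and 1, and it grows as the predicted probability diverges from the actual label.

For binary classification, where $y_i \in \{0, 1\}$ and $\hat{y}_i$ is the predicted probability of the positive class:

$$\text{Cross-Entropy}(y, \hat{y}) = -\frac{1}{n} \sum_{i=1}^{n} [y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)]$$

For multi-class classification:

$$\text{Cross-Entropy}(y, \hat{y}) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{c=1}^{M} y_{ic} \log(\hat{y}_{ic})$$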

where $M$ is the number of classes, $y_{ic}$ is a binary indicator of whether class label $c$ is the correct classification for observation $i$, and $\hat{y}_{ic}$ is the predicted probability that observation $i$ is of class $c$.
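The binary formula is the $M = 2$ special case of the multi-class one. The following is a minimal NumPy sketch of that equivalence; the function names, the `epsilon` clipping value, and the two-column encoding are assumptions made for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, epsilon=1e-15):
    # Clip so that log() never receives exactly 0
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

def categorical_cross_entropy(y_true_onehot, y_pred, epsilon=1e-15):
    # y_true_onehot and y_pred have shape (n_samples, n_classes)
    y_pred = np.clip(y_pred, epsilon, 1.0)
    return float(-np.mean(np.sum(y_true_onehot * np.log(y_pred), axis=1)))

y = np.array([0, 1, 1, 0, 1])
p = np.array([0.1, 0.9, 0.8, 0.4, 0.5])  # predicted probability of class 1

# Two-column form: column 0 is class 0, column 1 is class 1
y_onehot = np.stack([1 - y, y], axis=1)
p_two_class = np.stack([1 - p, p], axis=1)

bce = binary_cross_entropy(y, p)
cce = categorical_cross_entropy(y_onehot, p_two_class)
print(f"binary: {bce:.6f}  two-class categorical: {cce:.6f}")
```

Both calls should print the same value, since each one-hot row selects exactly the $\log$ term that the binary formula keeps for that sample.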

## Huber Loss

**Description**: Often used in regression problems. It is quadratic (like MSE) for small errors and linear (like MAE) for large ones, which makes it less sensitive to outliers than MSE. A threshold parameter controls where it switches between the two.

$$\text{Huber Loss}(y, \hat{y}) = \begin{cases}
\frac{1}{2} (y - \hat{y})^2 & \text{for } |y - \hat{y}| \le \delta, \\
\delta |y - \hat{y}| - \frac{1}{2} \delta^2 & \text{otherwise.}
\end{cases}
$$

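where $\delta > 0$ is a hyperparameter chosen by the user. The expression gives the loss for a single sample; as with MSE and MAE, it is typically averaged over all $n$ samples.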
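The piecewise definition can be written directly in NumPy. This is a minimal sketch assuming the per-sample losses are averaged; the name `huber_loss` and the sample values are illustrative.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    error = y_true - y_pred
    abs_error = np.abs(error)
    quadratic = 0.5 * np.square(error)             # branch for |error| <= delta
    linear = delta * abs_error - 0.5 * delta ** 2  # branch for |error| > delta
    # Pick the branch per sample, then average over samples
    return float(np.mean(np.where(abs_error <= delta, quadratic, linear)))

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.5, 2.5, 3.0, 3.5, 4.5])

print(f"delta=1.00: {huber_loss(y_true, y_pred, delta=1.0):.4f}")   # all errors in the quadratic branch
print(f"delta=0.25: {huber_loss(y_true, y_pred, delta=0.25):.4f}")  # non-zero errors in the linear branch
```

With `delta=1.0` every absolute error (at most 0.5) falls in the quadratic branch, giving 0.1. With `delta=0.25` the four non-zero errors use the linear branch, giving 0.075.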

## Hinge Loss

**Description**: Commonly used for binary classification problems, such as with Support Vector Machines.

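$$\text{Hinge Loss}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i)$$

where $y_i \in \{-1, 1\}$ is the actual label and $\hat{y}_i$ is the model's raw prediction score for sample $i$ (a real-valued decision value, not a hard label).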
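Hinge loss is usually applied to real-valued scores, and it penalizes correct predictions whose margin $y \cdot \hat{y}$ is below 1 as well as incorrect ones. A small sketch with illustrative values, assuming a score of 0 or more maps to the label 1:

```python
import numpy as np

y_true = np.array([-1, 1, 1, -1, 1])
scores = np.array([-0.8, 0.8, 0.3, -0.5, -0.1])  # raw model scores

# Per-sample hinge loss: zero only when the margin y * score is at least 1
per_sample = np.maximum(0.0, 1.0 - y_true * scores)

# Hard labels obtained from the sign of the score (assumed convention)
predicted_labels = np.where(scores >= 0, 1, -1)
accuracy = float(np.mean(predicted_labels == y_true))

print("per-sample loss:", np.round(per_sample, 2))
print(f"mean hinge loss: {per_sample.mean():.2f}")
print(f"accuracy: {accuracy:.2f}")
```

Four of the five samples are classified correctly, yet all of them incur some loss because none reaches a margin of 1. The misclassified last sample incurs the largest loss, 1.1.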

## Categorical Crossentropy

**Description**: The multi-class form of Cross-Entropy Loss given above, used when the target is one of $M$ categories encoded as a one-hot vector.



In [1]:
import numpy as np
import tensorflow as tf

## Mean Squared Error (MSE) Loss

In [2]:
# Mean Squared Error (MSE) Loss
def MSELoss(groundTruth: np.ndarray, predictions: np.ndarray) -> float:
  # Average of the squared element-wise differences (a scalar)
  return np.mean(np.square(groundTruth - predictions))

In [37]:
groundTruth = np.array([1,2,3,4,5])
predictions = np.array([1.5,2.5,3,3.5,4.5])
print(f"Mean Squared Error: {MSELoss(groundTruth,predictions)}")

Mean Squared Error: 0.2


## Mean Absolute Error (MAE) Loss

In [3]:
# Mean Absolute Error (MAE) Loss
def MAELoss(groundTruth: np.ndarray, predictions: np.ndarray) -> float:
  # Average of the absolute element-wise differences (a scalar)
  return np.mean(np.abs(groundTruth - predictions))

In [36]:
groundTruth = np.array([1,2,3,4,5])
predictions = np.array([1.5,2.5,3,3.5,4.5])
print(f"Mean Absolute Error: {MAELoss(groundTruth,predictions)}")

Mean Absolute Error: 0.4


## Cross-Entropy Loss (Binary Classification)

In [4]:
# Cross-Entropy Loss (Binary Classification)
def BinaryCrossEntropy(groundTruth: np.ndarray, predictions: np.ndarray) -> float:
  # Clip probabilities away from 0 and 1 so that log() never receives 0
  epsilon = 1e-15
  predictions = np.clip(predictions, epsilon, 1 - epsilon)
  return -np.mean(groundTruth * np.log(predictions) + (1 - groundTruth) * np.log(1 - predictions))

In [34]:
groundTruth = np.array([0,1,1,0,1])
predictions = np.array([0.1,0.9,0.8,0.4,0.5])
print(f"Binary Cross-Entropy Loss: {BinaryCrossEntropy(groundTruth,predictions)}")

Binary Cross-Entropy Loss: 0.32756747739115966
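The `epsilon` clipping in `BinaryCrossEntropy` matters when a prediction is exactly 0 or 1. The sketch below uses illustrative values (not from the original notebook) to compare the unclipped and clipped computations for two confidently wrong predictions.

```python
import numpy as np

y_true = np.array([1.0, 0.0])
y_pred = np.array([0.0, 1.0])  # confidently wrong on both samples

# Without clipping, log(0) evaluates to -inf and the loss becomes infinite
with np.errstate(divide="ignore", invalid="ignore"):
    unclipped = -np.mean(y_true * np.log(y_pred)
                         + (1 - y_true) * np.log(1 - y_pred))

# With clipping, the loss is large but finite
epsilon = 1e-15
clipped_pred = np.clip(y_pred, epsilon, 1 - epsilon)
clipped = -np.mean(y_true * np.log(clipped_pred)
                   + (1 - y_true) * np.log(1 - clipped_pred))

print(f"unclipped: {unclipped}")
print(f"clipped:   {clipped:.2f}")
```

The clipped loss stays finite (about 34.5 here, close to $-\log(10^{-15})$), so a single extreme prediction does not turn the whole batch loss into `inf` or `nan`.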


## Huber Loss

In [31]:
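# Huber Loss
def HuberLossWrapper(deltaValue: int | float):
  # Build the Keras loss object once and reuse it on every call
  huber = tf.keras.losses.Huber(delta=deltaValue)
  def HuberLoss(groundTruth: tf.Tensor, predictions: tf.Tensor) -> tf.Tensor:
    # Returns a scalar tensor: the Huber loss averaged over all samples
    return huber(groundTruth, predictions)
  return HuberLoss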

In [32]:
groundTruth = tf.constant(
    [1,2,3,4,5],
    dtype=tf.float32
)
predictions = tf.constant(
    [1.5,2.5,3,3.5,4.5],
    dtype=tf.float32
)

In [33]:
huberLoss = HuberLossWrapper(deltaValue=1.0)
print(f"Huber Loss:\n{huberLoss(groundTruth,predictions).numpy()}")

Huber Loss:
0.10000000149011612


## Hinge Loss

In [6]:
# Hinge Loss
def HingeLoss(groundTruth: np.ndarray, predictions: np.ndarray) -> float:
  # groundTruth holds labels in {-1, 1}; predictions holds raw scores
  return np.mean(np.maximum(0, 1 - groundTruth * predictions))

In [11]:
groundTruth = np.array([-1,1,1,-1,1])
predictions = np.array([-0.8,0.8,0.3,-0.5,-0.1])

In [12]:
print(f"Hinge Loss:\n{HingeLoss(groundTruth,predictions)}")

Hinge Loss:
0.54


## Categorical Crossentropy

In [17]:
groundTruth = tf.constant(
    [
        [0,1,0],
        [1,0,0],
        [0,0,1]
    ],
    dtype=tf.float32
)
predictions = tf.constant(
    [
        [0.05,0.95,0],
        [0.9,0.05,0.05],
        [0.1,0.1,0.8]
    ],
    dtype=tf.float32
)
print(f"Shape of Ground Truth: {groundTruth.shape}")
print(f"Shape of predictions: {predictions.shape}")

Shape of Ground Truth: (3, 3)
Shape of predictions: (3, 3)


In [18]:
loss = tf.keras.losses.CategoricalCrossentropy()
print(f"Categorical Cross-Entropy: {loss(groundTruth,predictions).numpy()}")

Categorical Cross-Entropy: 0.12659913301467896
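As a cross-check, the value above can be reproduced by hand in NumPy from the same one-hot targets and predicted probabilities. This sketch assumes each prediction row already sums to 1. Keras may apply additional clipping or normalization internally and computes in `float32`, so agreement is expected only to several decimal places.

```python
import numpy as np

y_true = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]], dtype=float)
y_pred = np.array([[0.05, 0.95, 0.0],
                   [0.9, 0.05, 0.05],
                   [0.1, 0.1, 0.8]])

# Probability assigned to the correct class of each sample: 0.95, 0.9, 0.8
true_class_prob = np.sum(y_true * y_pred, axis=1)

# Per-sample loss is -log of that probability; the final loss is the mean
per_sample = -np.log(true_class_prob)
loss = float(np.mean(per_sample))

print(f"Categorical Cross-Entropy (NumPy): {loss:.4f}")  # 0.1266
```

Only the probability of the correct class enters the loss, so the zero probability in the first row is harmless here because it belongs to an incorrect class.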
