<h1 style="text-align: center; font-size: 36px;">Cost/Loss Functions</h1>

---

<h2 style="text-align: center; font-size: 24px;">Table of Contents</h2>

1. Cross-Entropy Loss Function
2. The Mean Square Error / Quadratic Loss / L2 Loss Function
3. TODO: Add more functions


## Cross-Entropy Loss Function

In deep learning and machine learning, the cross-entropy loss 
is a loss function that measures the dissimilarity between two 
probability distributions `P` and `Q`, written `H(P, Q)`. It 
is commonly used in classification problems, where `P` encodes the 
true labels and `Q` holds the predicted probabilities. 
The loss is small when `Q` assigns high probability to the outcomes that `P` says are likely, and it grows as the two distributions diverge.

The cross-entropy loss is defined as follows:

$$
\begin{equation}
L = H(P, Q) = -\mathbb{E}_{z \sim P(z)}\left[\log Q(z)\right] = -\int_{i}^{N} P(z) \log Q(z) \, dz.
\end{equation}
$$

Where:
- ( L ) is the cross-entropy loss
- ( H ) is the cross-entropy function
- ( P ) is the true probability distribution
- ( Q ) is the predicted probability distribution
- ( z ) is the random variable
- ( 𝔼 ) is the expected value
- ( $\log$ ) is the natural logarithm
- ( ∫ ) is the integral over the interval (i, N)
- ( i ) is the lower limit of integration
- ( N ) is the upper limit of integration
- ( dz ) is the differential of the random variable
- ( P(z) ) is the true probability density evaluated at z
- ( Q(z) ) is the predicted probability density evaluated at z
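
For a discrete random variable the integral becomes a sum over the possible outcomes. The following is a minimal sketch of that discrete case; the function name `cross_entropy` and the two example distributions are illustrative assumptions, not part of this notebook's `functions.losses` module.

```python
import numpy as np

def cross_entropy(p, q):
    """Discrete cross-entropy H(P, Q) = -sum_z P(z) * log Q(z), in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Outcomes with P(z) = 0 contribute nothing, so skip them to avoid 0 * log(0).
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(q[mask])))

# Hypothetical distributions over three outcomes.
p_true = np.array([0.7, 0.2, 0.1])
q_pred = np.array([0.6, 0.3, 0.1])

h_pq = cross_entropy(p_true, q_pred)
h_pp = cross_entropy(p_true, p_true)  # entropy of P, the smallest value H(P, Q) can take

print(f"H(P, Q) = {h_pq:.4f}")
print(f"H(P, P) = {h_pp:.4f}")
```

The cross-entropy is never smaller than the entropy of `P` itself, and the two are equal only when `Q` matches `P`.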

<h3 id="mathematical-description-of-the-cross-entropy-equation" 
style="text-align: center; font-size: 24px;">
Mathematical Description of the Cross-Entropy Equation</h3>

The cross-entropy loss function can be described mathematically as follows:

$$
\begin{equation*}
L = H(P, Q) = -\frac{1}{N} \sum_{i=1}^{N} \left( P_i \log Q_i + (1 - P_i) \log (1 - Q_i) \right)
\end{equation*}
$$

Where:
- $L$ is the cross-entropy loss.
- $H(P, Q)$ is the cross-entropy function.
- $P_i$ is the true label for the i-th data point (1 for the positive class, 0 for the negative class).
- $Q_i$ is the predicted probability that the i-th data point belongs to the positive class (between 0 and 1).
- $N$ is the total number of data points.
- $\log$ is typically the natural logarithm (base $e$).
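
The binary formula can be sketched in vectorized NumPy as follows. The function name `binary_cross_entropy` and the `eps` clipping parameter are assumptions made for this illustration. Clipping is a common guard against `log(0)` and is not part of the formula itself. The function returns both the per-sample terms inside the sum and their average over the `N` samples.

```python
import numpy as np

def binary_cross_entropy(p, q, eps=1e-12):
    """Return (per-sample losses, mean loss) for labels p and probabilities q."""
    p = np.asarray(p, dtype=float)
    # Keep probabilities strictly inside (0, 1) so that log() stays finite.
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0 - eps)
    per_sample = -(p * np.log(q) + (1.0 - p) * np.log(1.0 - q))
    return per_sample, float(per_sample.mean())

labels = np.array([1, 0, 1, 0])
probs = np.array([0.9, 0.1, 0.8, 0.3])

per_sample, mean_loss = binary_cross_entropy(labels, probs)
print(np.round(per_sample, 4))  # [0.1054 0.1054 0.2231 0.3567]
print(round(mean_loss, 4))      # 0.1976
```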


<h3 id="laymans-description-of-cross-entropy-loss"
style="text-align: center; font-size: 24px;">
Layman's Description of Cross-Entropy Loss</h3>

The cross-entropy loss measures how well our model's predictions match the actual labels `(AKA "truth" or "ground truth" )` for a set of data. Here's a simplified way to describe it:

1. **Prediction vs. Reality**: Imagine you have a model that predicts the likelihood of something happening, like whether an email is spam or not. The model gives you a probability for each email being spam.

2. **True Labels**: In reality, each email is either spam `(label 1)` or not spam `(label 0)`. These are the true labels.

3. **Comparison**: The cross-entropy loss compares the model's predicted probabilities to the true labels. It checks:
   - How close the predicted probability for spam is to `1`, if the email is actually spam.
   - How close the predicted probability for not spam is to `1`, if the email is not spam.

4. **Penalizing Bad Predictions**: If the model predicts a probability far from the true label (e.g., predicting 0.2 for an email that is actually spam), the loss is higher. This penalizes the model for being wrong.

5. **Summing It Up**: The cross-entropy loss combines the penalties for every email (or data point) in the dataset into a single number, typically by averaging them. The goal is to minimize this loss, meaning you want the model to make predictions that are as close to the true labels as possible.
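
To make the penalty in step 4 concrete, here is a small sketch that evaluates the loss for one email that really is spam (label 1) under several predicted probabilities. The candidate probabilities are made up for illustration.

```python
import math

# Predicted spam probabilities for an email whose true label is 1 (spam).
candidates = [0.99, 0.9, 0.5, 0.2, 0.01]

# With label 1, only the -log(Q) term of the loss remains.
losses = [-math.log(q) for q in candidates]

for q, loss in zip(candidates, losses):
    print(f"predicted {q:>4}: loss = {loss:.4f}")
```

The loss grows slowly while the prediction is near the truth and very quickly as a confident prediction turns out to be wrong.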


<h3 id="real-world-analogy-cross-entropy-loss-in-a-binary-classification-problem"
style="text-align: center; font-size: 24px;">
Real-World Analogy: Cross-Entropy Loss in a Binary Classification Problem</h3>

Let's consider a binary classification problem where we want to classify whether 
an email is spam or not. We have a model that predicts the probability that an 
email is spam. The true label $P$ for each email is either 1 (spam) or 0 (not spam), 
and the predicted probability $Q$ is the model's output.

Suppose we have the following true labels and predicted probabilities:

| Email ID | True Label ( $P$ ) | Predicted Probability ( $Q$ )  |
|----------|------------------|--------------------------------|
| 1        | 1                | 0.9                            |
| 2        | 0                | 0.1                            |
| 3        | 1                | 0.8                            |
| 4        | 0                | 0.3                            |

The cross-entropy loss over all $N$ emails is the average of the per-email losses $L_i$:

$$
\begin{align*}
L &= H(P, Q) = -\frac{1}{N} \sum_{i=1}^{N} \left( P_i \log Q_i + (1 - P_i) \log (1 - Q_i) \right) = \frac{1}{N} \sum_{i=1}^{N} L_i
\end{align*}
$$

For each email, the loss is calculated as:

$$
\begin{align*}
L_1 &= -\left( 1 \times \log 0.9 + (1 - 1) \times \log (1 - 0.9) \right) = 0.1054 \\
L_2 &= -\left( 0 \times \log 0.1 + (1 - 0) \times \log (1 - 0.1) \right) = 0.1054 \\
L_3 &= -\left( 1 \times \log 0.8 + (1 - 1) \times \log (1 - 0.8) \right) = 0.2231 \\
L_4 &= -\left( 0 \times \log 0.3 + (1 - 0) \times \log (1 - 0.3) \right) = 0.3567
\end{align*}
$$

Averaging the four losses gives:

$$
\begin{align*}
L &= \frac{1}{4} \times (0.1054 + 0.1054 + 0.2231 + 0.3567) \\
&= 0.19765
\end{align*}
$$

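The average cross-entropy loss for these emails is 0.19765. For comparison, 
a model that always predicts 0.5 would score $\log 2 \approx 0.693$, so these 
predictions are reasonably good, though not perfect (a perfect model would 
approach a loss of 0). The goal is to minimize this loss by adjusting the 
model's parameters during training.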
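
One way to put the average loss in context is to compare it with an uninformative baseline that predicts 0.5 for every email. The sketch below does that for the four emails in the table; the helper name `mean_bce` is an assumption for this illustration.

```python
import numpy as np

labels = np.array([1, 0, 1, 0])
probs = np.array([0.9, 0.1, 0.8, 0.3])

def mean_bce(p, q):
    """Average binary cross-entropy over all samples."""
    return float(np.mean(-(p * np.log(q) + (1 - p) * np.log(1 - q))))

model_loss = mean_bce(labels, probs)
# Baseline: ignore the email entirely and always predict 0.5.
baseline_loss = mean_bce(labels, np.full(labels.shape, 0.5))

print(f"model loss:    {model_loss:.4f}")
print(f"baseline loss: {baseline_loss:.4f}")  # equals log(2)
```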

#### Graphing the Cross-Entropy Loss Function

![Cross-Entropy Loss Function](../assets/images/cross-entropy-loss-charts.png)

#### Key Points:
- The cross-entropy loss measures the difference between the true labels and the predicted probabilities; a lower loss indicates better predictions.
- It penalizes incorrect predictions, and it penalizes confident incorrect predictions most heavily.
- The loss over a dataset is calculated as the average of the losses for each sample.
- During training the loss guides learning by providing feedback on the model's predictions, and it is typically minimized using optimization algorithms like gradient descent.
- It is commonly used in supervised classification tasks, including image classification and natural language processing.
- Besides its role in training neural networks, it is also used to evaluate model performance.
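
As a rough illustration of minimizing the loss with gradient descent, the sketch below fits a one-feature logistic regression to synthetic data. The data, learning rate, and number of steps are all arbitrary assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): one feature, label 1 when the feature is positive.
x = rng.normal(size=200)
y = (x > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_bce(y, q, eps=1e-12):
    q = np.clip(q, eps, 1 - eps)  # guard against log(0)
    return float(np.mean(-(y * np.log(q) + (1 - y) * np.log(1 - q))))

w, b, lr = 0.0, 0.0, 0.5
history = []
for step in range(100):
    q = sigmoid(w * x + b)          # predicted probabilities
    history.append(mean_bce(y, q))  # loss before this update
    # For a sigmoid output, the gradient of the mean loss w.r.t. the logit is (q - y) / N.
    grad_w = np.mean((q - y) * x)
    grad_b = np.mean(q - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"loss at step 0: {history[0]:.4f}, loss at step 99: {history[-1]:.4f}")
```

The first recorded loss is `log(2)` because the untrained model predicts 0.5 for every sample; each update then moves the parameters in the direction that lowers the loss.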


```python
import numpy as np

# Example usage
true_labels = np.array([1, 0, 1, 0])

# This is the output of a model after applying the sigmoid function
predicted_probs = np.array([0.9, 0.1, 0.8, 0.3])

# Manual application of the cross-entropy loss function, one sample at a time
losses = []
for P, Q in zip(true_labels, predicted_probs):
    loss = -(P * np.log(Q) + (1 - P) * np.log(1 - Q))
    losses.append(loss)
    print(f"True Label: {P}, Predicted Probability: {Q}, Cross-Entropy Loss: {loss:.4f}")
print(f"Average Cross-Entropy Loss: {sum(losses)/len(losses):.4f}")
```

```
True Label: 1, Predicted Probability: 0.9, Cross-Entropy Loss: 0.1054
True Label: 0, Predicted Probability: 0.1, Cross-Entropy Loss: 0.1054
True Label: 1, Predicted Probability: 0.8, Cross-Entropy Loss: 0.2231
True Label: 0, Predicted Probability: 0.3, Cross-Entropy Loss: 0.3567
Average Cross-Entropy Loss: 0.1976
```


```python
import numpy as np

# Project-local helper from this repository's functions/losses module
from functions.losses import sort_cross_entropy_loss

# Example usage: predicted probabilities and their per-sample losses
predicted_probs = np.array([0.9, 0.1, 0.8, 0.3])
cross_entropy_losses = np.array([0.1054, 0.1054, 0.2231, 0.3567])

sorted_probs, sorted_losses = sort_cross_entropy_loss(predicted_probs, cross_entropy_losses)
print("Sorted Predicted Probabilities:", sorted_probs)
print("Sorted Cross-Entropy Losses:", sorted_losses)
```

```
Sorted Predicted Probabilities: [0.1 0.3 0.8 0.9]
Sorted Cross-Entropy Losses: [0.1054 0.3567 0.2231 0.1054]
```
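
The source of `sort_cross_entropy_loss` is not shown here. Judging only from the output above, it appears to order the probabilities ascending and reorder the losses to match. A hypothetical stand-in with that behavior (the name `sort_by_probability` is invented for this sketch) could look like this:

```python
import numpy as np

def sort_by_probability(probs, losses):
    """Hypothetical stand-in: sort probs ascending and reorder losses to match."""
    probs = np.asarray(probs)
    losses = np.asarray(losses)
    order = np.argsort(probs)  # indices that would sort the probabilities
    return probs[order], losses[order]

sorted_probs, sorted_losses = sort_by_probability(
    [0.9, 0.1, 0.8, 0.3], [0.1054, 0.1054, 0.2231, 0.3567])
print(sorted_probs)   # [0.1 0.3 0.8 0.9]
print(sorted_losses)  # [0.1054 0.3567 0.2231 0.1054]
```

Sorting both arrays with the same index order keeps each loss paired with its probability, which is what a line plot of loss against probability needs.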


```python
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Project-local helpers from this repository's functions/losses module
from functions.losses import cross_entropy_loss, sort_cross_entropy_loss

# Example usage
true_labels = np.array([1, 0, 1, 0])

# This is the output of a model after applying the sigmoid function
predicted_probs = np.array([0.9, 0.1, 0.8, 0.3])
email_ids = ["Email 1", "Email 2", "Email 3", "Email 4"]

# Calculate the cross-entropy loss using the function, then sort by probability
loss, losses = cross_entropy_loss(true_labels, predicted_probs)
sorted_probs, sorted_losses = sort_cross_entropy_loss(predicted_probs, losses)
print(f"""Cross-entropy loss: {loss:.4f}. \n
The sorted probabilities are: {sorted_probs}.\n
The sorted losses are: {sorted_losses}.\n
""")

# One row with three panels: true labels, predicted probabilities, and loss
fig = make_subplots(
    rows=1, cols=3,
    subplot_titles=("True Labels", "Predicted Probabilities", "Cross-Entropy Loss"))

# Bar chart of the true labels
fig.add_trace(
    go.Bar(x=email_ids, y=true_labels, name="True Labels"),
    row=1, col=1)

# Bar chart of the predicted probabilities
fig.add_trace(
    go.Bar(x=email_ids, y=predicted_probs, name="Predicted Probabilities"),
    row=1, col=2)

# Line plot of the per-sample loss against the sorted predicted probabilities
fig.add_trace(
    go.Scatter(
        x=sorted_probs,
        y=sorted_losses,
        mode="lines+markers",
        name="Cross-Entropy Loss",
        marker=dict(color="red", line=dict(color="black", width=2))),
    row=1, col=3)

# Axis labels for the two bar charts
fig.update_xaxes(title_text="Email ID", row=1, col=1)
fig.update_xaxes(title_text="Email ID", row=1, col=2)
fig.update_yaxes(title_text="Label", row=1, col=1)
fig.update_yaxes(title_text="Probability", row=1, col=2)

# Axis labels for the loss panel, with one tick per predicted probability
fig.update_xaxes(title_text="Predicted Probability",
                 row=1, col=3, tickvals=sorted_probs,
                 ticktext=[f"{prob:.1f}" for prob in sorted_probs])
fig.update_yaxes(title_text="Cross-Entropy Losses", row=1, col=3)

# Update layout
fig.update_layout(
    title="Cross-Entropy Loss for Each Email, Comparing True Labels and Predicted Probabilities",
    showlegend=True)

# Show the plot
fig.show()
```

```
Cross-entropy loss: 0.1976. 

The sorted probabilities are: [0.1 0.3 0.8 0.9].

The sorted losses are: [0.10536052 0.35667494 0.22314355 0.10536052].
```




---

## The Mean Square Error / Quadratic Loss / L2 Loss Function

The Mean Squared Error (MSE) is a widely used and straightforward loss 
function for regression tasks. It measures the average squared difference 
between the predicted values and the actual values in the dataset. The 
formula for MSE is as follows:

$$
\begin{align*}
J &= \frac{1}{N} \sum_{i=1}^{N} L_i = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{Y}_{i} - Y_{i} \right)^2
\end{align*}
$$

#### Where:

- $J$ is the cost function, i.e. the mean loss over the dataset
- $N$ is the number of samples
- $L_i$ is the loss for the i-th sample, here the squared error $(\hat{Y}_{i} - Y_{i})^2$
- $i$ is the index of the sample
- $\sum$ is the summation over all $N$ samples
- $\hat{Y}_{i}$ is the predicted value for the i-th sample
- $Y_{i}$ is the actual value for the i-th sample
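
A minimal NumPy sketch of the MSE formula follows. The function name `mse` and the sample values are assumptions for illustration and are independent of the `mean_square_error` helper in `functions.losses`.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between predictions and targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))

# Hypothetical targets and predictions; the errors are -1, 0 and 2.
value = mse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(f"MSE = {value:.4f}")  # (1 + 0 + 4) / 3 = 1.6667
```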


#### Advantages:

- Easy to interpret: The MSE provides a clear measure of the average squared difference between the predictions and the actual values, allowing for easy understanding and comparison.
- Always differentiable: The squared error is a smooth function of the prediction, so the loss is differentiable everywhere, which is essential for optimizing the model parameters using gradient-based methods.
- No spurious local minima for linear models: MSE is convex in the predictions, so for a linear model every local minimum is a global minimum, making it easier to train the model and converge to an optimal solution. For nonlinear models such as neural networks, the loss surface over the parameters can still contain several local minima.
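
The differentiability point can be checked numerically. The sketch below compares the analytic gradient of the MSE with respect to the predictions, $\frac{2}{N}(\hat{Y}_i - Y_i)$, against a central finite-difference estimate. The function names and sample values are assumptions for this illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_pred - y_true) ** 2))

def mse_grad(y_true, y_pred):
    """Analytic gradient of the MSE with respect to each prediction."""
    return 2.0 * (y_pred - y_true) / y_true.size

y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])

analytic = mse_grad(y_true, y_pred)

# Central finite differences: nudge one prediction at a time by +/- h.
h = 1e-6
numeric = np.zeros_like(y_pred)
for i in range(y_pred.size):
    step = np.zeros_like(y_pred)
    step[i] = h
    numeric[i] = (mse(y_true, y_pred + step) - mse(y_true, y_pred - step)) / (2 * h)

print("analytic:", analytic)
print("numeric: ", np.round(numeric, 6))
```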

#### Disadvantages:

- Squared error units: The MSE is expressed in the square of the target's unit, which makes its magnitude hard to interpret in real-world terms. Taking the square root (the root mean squared error, RMSE) restores the original unit.
- Not robust to outliers: MSE assigns high importance to large errors due to the squaring operation, making it sensitive to outliers in the dataset. Outliers can disproportionately influence the loss and impact the model’s performance.

**Note**: In regression tasks, it is common to use a linear activation function in the output neuron to directly predict the continuous target variable.
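
The sensitivity to outliers listed under Disadvantages can be seen in a small experiment. The sketch below corrupts a single target value and compares how the MSE reacts with how the mean absolute error (MAE) reacts. MAE is used here only as a point of comparison, and the data are made up for the example.

```python
import numpy as np

def mse(y, y_hat):
    return float(np.mean((y_hat - y) ** 2))

def mae(y, y_hat):
    return float(np.mean(np.abs(y_hat - y)))

y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y_pred = np.array([12.0, 18.0, 32.0, 38.0, 48.0])

# Corrupt one target so that a single sample has an error of 102.
y_outlier = y_true.copy()
y_outlier[-1] = 150.0

print(f"clean:   MSE = {mse(y_true, y_pred):.1f}, MAE = {mae(y_true, y_pred):.1f}")
print(f"outlier: MSE = {mse(y_outlier, y_pred):.1f}, MAE = {mae(y_outlier, y_pred):.1f}")
```

A single corrupted sample raises the MSE from 4 to 2084, while the MAE rises only from 2 to 22, because squaring magnifies the one large error.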



```python
import numpy as np

# Project-local helper from this repository's functions/losses module
from functions.losses import mean_square_error

# Example usage: every prediction is off by exactly 2, so each squared error is 4
__y_true = np.array([10, 20, 30, 40, 50])
__y_pred = np.array([12, 18, 32, 38, 48])
mse = mean_square_error(__y_true, __y_pred)
print(f"Mean Square Error (MSE): {mse:.4f}")
```

```
4.0
Mean Square Error (MSE): 4.0000
```
