# neuralthreads
[medium](https://neuralthreads.medium.com/i-was-not-satisfied-by-any-deep-learning-tutorials-online-37c5e9f4bea1)

## Chapter 4 — Losses and their derivatives

Categorical cross-entropy loss — The most important loss function

### 4.3 What is Categorical cross-entropy loss and how to compute the gradients?

This post is the most important post in the fourth chapter. Here we will talk about Categorical cross-entropy loss and what it means.

Suppose I ask you ‘Who is this actor?’

![Alt text](image.png)

And I give you 3 options.

1. Chris Evans
2. RDJ
3. Chris Hemsworth

We all know he is RDJ. So, the correct answer ‘y_true’ is

In [4]:
%%latex
\begin{gather*}
\newcommand{\arraystretch}{1.5}
    y\_true = 
    \begin{bmatrix*}
    0 \\
    1 \\
    0
    \end{bmatrix*}
\end{gather*}


But, you don’t know who he is and you guess the answer with probabilities ‘y_pred’

In [3]:
%%latex
\begin{gather*}
\newcommand{\arraystretch}{1.5}
    y\_pred =  
    \begin{bmatrix*}
    0.2 \\
    0.45 \\
    0.35 
    \end{bmatrix*}
\end{gather*}


Two things,

**First**, in both answers, i.e., correct and predicted, the entries sum to 1, because each is a probability distribution over the options for the same question.

**Second**, we can be optimistic and take the entry with the highest probability as the answer. In that case, your prediction is correct because the highest probability, 0.45, belongs to the second entry, i.e., RDJ. But there is still an error because the predicted probability for RDJ is not 1 or close to 1.
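The two observations can be checked with a few lines of NumPy. This is only a small sketch: the `options` list and the variable names `sums_ok`, `predicted`, and `correct` are introduced here for illustration, and the numbers are the ones from the example.

```python
import numpy as np

# Option order from the question: Chris Evans, RDJ, Chris Hemsworth.
options = ["Chris Evans", "RDJ", "Chris Hemsworth"]
y_true = np.array([[0.0], [1.0], [0.0]])
y_pred = np.array([[0.2], [0.45], [0.35]])

# First: both vectors are probability distributions, so their entries sum to 1.
sums_ok = bool(np.isclose(y_true.sum(), 1.0) and np.isclose(y_pred.sum(), 1.0))

# Second: the entry with the highest probability is taken as the answer.
predicted = options[int(np.argmax(y_pred))]
correct = options[int(np.argmax(y_true))]
print(sums_ok, predicted, correct)
```

The prediction matches the correct option, yet 0.45 is far from 1, which is the gap a loss function should measure.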

Here, we will use Categorical cross-entropy loss.

Suppose we have true values,

In [7]:
%%latex
\begin{gather*}
\newcommand{\arraystretch}{1.5}
    y\_true = y =
    \begin{bmatrix*}
    y_1 \\
    y_2 \\
    y_3
    \end{bmatrix*}
\end{gather*}



and predicted values,

In [8]:
%%latex
\begin{gather*}
\newcommand{\arraystretch}{1.5}
    y\_pred = \hat{y} =
    \begin{bmatrix*}
    \hat{y_1} \\
    \hat{y_2} \\
    \hat{y_3}
    \end{bmatrix*}
\end{gather*}


Then Categorical cross-entropy loss is calculated as follows:

In [9]:
%%latex
\begin{gather*}
    CCE = -\sum_{i=1}^{3} y_i \log(\hat{y_i})
\end{gather*}
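In its standard form, Categorical cross-entropy sums -y_i log(ŷ_i) over all entries, and its partial derivative with respect to each ŷ_i is -y_i / ŷ_i. Below is a minimal NumPy sketch of both, applied to the actor example. The function names `cce` and `cce_grad` and the `eps` parameter are assumptions made here for illustration, not definitions from this post. The small constant mirrors the stability trick used elsewhere in this chapter.

```python
import numpy as np

def cce(y_true, y_pred, eps=1e-100):
    # Sum of -y_i * log(y_hat_i); eps (an assumed constant) guards against log(0).
    return -np.sum(y_true * np.log(y_pred + eps))

def cce_grad(y_true, y_pred, eps=1e-100):
    # d(CCE)/d(y_hat_i) = -y_i / y_hat_i, one entry per prediction.
    return -y_true / (y_pred + eps)

y_true = np.array([[0.0], [1.0], [0.0]])
y_pred = np.array([[0.2], [0.45], [0.35]])

loss = cce(y_true, y_pred)       # only the RDJ entry contributes: -log(0.45)
grad = cce_grad(y_true, y_pred)  # non-zero only at the RDJ entry: -1 / 0.45
print(loss)
print(grad)
```

Because y_true is one-hot, the loss depends only on the probability assigned to the correct class, and it shrinks toward 0 as that probability approaches 1.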

We can easily calculate Mean Absolute Error in Python like this.

In [5]:
import numpy as np                             # importing NumPy
np.random.seed(42)

def mae(y_true, y_pred):                     # MAE
    return np.mean(np.abs(y_true - y_pred))

Now, we know that

In [7]:
%%latex
\begin{gather*}
    MAE = f(\hat{y_1},\hat{y_2},\hat{y_3})
\end{gather*}


So, as with MSE, we have a **Jacobian** for MAE.

In [8]:
%%latex
\begin{gather*}
    \newcommand{\arraystretch}{2.5}
    J = \dfrac{\partial{(MAE)}}{\partial(\hat{y_1},\hat{y_2},\hat{y_3})} =
    \begin{bmatrix*}
    \dfrac{\partial{(MAE)}}{\partial(\hat{y_1})} \\
    \dfrac{\partial{(MAE)}}{\partial(\hat{y_2})} \\
    \dfrac{\partial{(MAE)}}{\partial(\hat{y_3})} 
    \end{bmatrix*}
    
\end{gather*}



We can easily find each term in this Jacobian, because the derivative of |u| with respect to u is u/|u| for any non-zero u.

In [17]:
%%latex
\begin{gather*}
    \newcommand{\arraystretch}{3}
    \Rightarrow J =  
    \begin{bmatrix*}
      \frac{-(y_1 - \hat{y_1})}{3|y_1 - \hat{y_1}|} \\
      \frac{-(y_2 - \hat{y_2})}{3|y_2 - \hat{y_2}|} \\ 
      \frac{-(y_3 - \hat{y_3})}{3|y_3 - \hat{y_3}|} \\
    \end{bmatrix*} \Rightarrow  \\
    \newcommand{\arraystretch}{1.5}
    \Rightarrow J =  - \frac{1}{3} (
    \begin{bmatrix*}
      y_{1} - \hat{y_{1}} \\
      y_{2} - \hat{y_{2}} \\ 
      y_{3} - \hat{y_{3}} \\
    \end{bmatrix*} / 
    \begin{bmatrix*}
      |y_1 - \hat{y_1}| \\
      |y_2 - \hat{y_2}| \\ 
      |y_3 - \hat{y_3}| \\
    \end{bmatrix*}
    ) \Rightarrow  \\
  \Rightarrow J = - \frac{1}{3} \frac{y\_true - y\_pred}{|y\_true - y\_pred|}
\end{gather*}


> Note: Here, 3 represents ‘N’, i.e., the number of entries in y_true and y_pred.

We can reduce it to define the MAE Jacobian in Python like this.

In [19]:
def mae_grad(y_true, y_pred):
    N = y_true.shape[0]
    return -((y_true - y_pred) / (abs(y_true - y_pred)+ 10**-100))/N

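> Note: 10**-100 is added to the denominator for stability. It avoids a division by zero when an entry of y_pred is exactly equal to the matching entry of y_true.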
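Since (y - ŷ) / |y - ŷ| is just the sign of the difference, the same Jacobian can also be written with `np.sign`. The sketch below compares the two forms on made-up values that include an exact match. `mae_grad_sign` is a hypothetical name used only here, and `mae_grad` is repeated so the snippet runs on its own.

```python
import numpy as np

def mae_grad(y_true, y_pred):
    # Definition from the cell above, repeated so this snippet is self-contained.
    N = y_true.shape[0]
    return -((y_true - y_pred) / (abs(y_true - y_pred) + 10**-100)) / N

def mae_grad_sign(y_true, y_pred):
    # Hypothetical alternative: np.sign returns -1, 0, or +1, so no small constant is needed.
    N = y_true.shape[0]
    return -np.sign(y_true - y_pred) / N

y_true = np.array([[1.0], [2.0], [3.0]])
y_pred = np.array([[0.5], [2.0], [4.0]])   # the second entry is an exact match
same = bool(np.allclose(mae_grad(y_true, y_pred), mae_grad_sign(y_true, y_pred)))
print(same)
```

Both forms give a zero gradient where the prediction already equals the target, so the choice between them is mostly a matter of taste.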

Let us have a look at an example.

In [20]:
y_true = np.array([[1.5], [0.2], [3.9], [6.2], [5.2]])
print(y_true)
y_pred = np.array([[1.2], [0.5], [3.2], [4.2], [3.2]])
print(y_pred)
print(y_pred.shape)

[[1.5]
 [0.2]
 [3.9]
 [6.2]
 [5.2]]
[[1.2]
 [0.5]
 [3.2]
 [4.2]
 [3.2]]
(5, 1)


In [21]:
mae(y_true, y_pred)

1.06

In [22]:
mae_grad(y_true, y_pred)

array([[-0.2],
       [ 0.2],
       [-0.2],
       [-0.2],
       [-0.2]])
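One way to gain confidence in the analytic Jacobian is to compare it with a numerical estimate. The sketch below uses central finite differences on the same example. `numerical_grad` and its step size `h` are assumptions introduced here, and `mae` and `mae_grad` are repeated so the snippet runs on its own.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mae_grad(y_true, y_pred):
    N = y_true.shape[0]
    return -((y_true - y_pred) / (np.abs(y_true - y_pred) + 10**-100)) / N

def numerical_grad(loss, y_true, y_pred, h=1e-6):
    # Hypothetical helper: central differences, perturbing one entry of y_pred at a time.
    grad = np.zeros_like(y_pred)
    for i in range(y_pred.shape[0]):
        step = np.zeros_like(y_pred)
        step[i] = h
        grad[i] = (loss(y_true, y_pred + step) - loss(y_true, y_pred - step)) / (2 * h)
    return grad

y_true = np.array([[1.5], [0.2], [3.9], [6.2], [5.2]])
y_pred = np.array([[1.2], [0.5], [3.2], [4.2], [3.2]])

analytic = mae_grad(y_true, y_pred)
numeric = numerical_grad(mae, y_true, y_pred)
print(np.allclose(analytic, numeric, atol=1e-6))
```

The check is only meaningful away from points where y_pred equals y_true, because the absolute value is not differentiable there.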

I hope you now understand how to implement Mean Absolute Error.