# neuralthreads
[medium](https://neuralthreads.medium.com/i-was-not-satisfied-by-any-deep-learning-tutorials-online-37c5e9f4bea1)

## Chapter 4 — Losses and their derivatives

Mean Square Error — The most used Regression loss

Let us start the fourth chapter, Losses and their gradients or derivatives, with Mean Square Error. This loss is generally used in regression problems.

### 4.1 What is Mean Square error and how to compute its gradients?

Suppose we have true values,

In [11]:
%%latex
\begin{gather*}
\newcommand{\arraystretch}{1.5}
    y\_true = y = 
    \begin{bmatrix*}
    y_1 \\
    y_2 \\
    y_3 
    \end{bmatrix*}
\end{gather*}


and predicted values,

In [12]:
%%latex
\begin{gather*}
\newcommand{\arraystretch}{1.5}
    y\_pred = \hat{y} = 
    \begin{bmatrix*}
    \hat{y_1} \\
    \hat{y_2} \\
    \hat{y_3} 
    \end{bmatrix*}
\end{gather*}


Then Mean Square Error is calculated as follows:

In [17]:
%%latex
\begin{gather*}
\newcommand{\arraystretch}{1.5}
   MSE = \dfrac{1}{N}\sum_{i=1}^{i=N}(y\_true_i - y\_pred_i)^2 \Rightarrow \\
    \\
\Rightarrow
   MSE = \dfrac{1}{N}\sum_{i=1}^{i=N}(y_i - \hat{y_i})^2 \Rightarrow \\
    \\
\Rightarrow MSE = \dfrac{1}{3}[(y_1 - \hat{y_1})^2 + (y_2 - \hat{y_2})^2 + (y_3 - \hat{y_3})^2] \\

\end{gather*}


We can easily calculate Mean Square Error in Python like this.

In [83]:
import numpy as np                             # importing NumPy
np.random.seed(42)

def mse(y_true, y_pred):                     # MSE
    return np.mean((y_true - y_pred)**2)
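
As a quick worked example, here is a minimal sketch that evaluates `mse` on three hand-picked values. The numbers are illustrative assumptions, not data from the original notebook; they are chosen so the result can be checked by hand against the formula above.

```python
import numpy as np

def mse(y_true, y_pred):                     # same definition as above
    return np.mean((y_true - y_pred)**2)

# illustrative values, not taken from the text
y_true = np.array([[1.0], [2.0], [3.0]])
y_pred = np.array([[1.5], [2.0], [2.0]])

# squared errors are 0.25, 0.0 and 1.0, so the mean is 1.25 / 3
loss = mse(y_true, y_pred)
print(loss)
```

The second prediction is exact, so it contributes nothing to the loss; the third is off by 1 and dominates it.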

Now, we know that

In [18]:
%%latex
\begin{gather*}
    MSE = f(\hat{y_1},\hat{y_2},\hat{y_3})
\end{gather*}


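So, like the [**Softmax**](./20_activation_softmax.ipynb) activation function, we have a **Jacobian** for MSE. Because MSE is a single scalar, this Jacobian reduces to a gradient vector with one entry per predicted value.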

In [23]:
%%latex
\begin{gather*}
    \newcommand{\arraystretch}{2.5}
    J = \dfrac{\partial{(MSE)}}{\partial(\hat{y_1},\hat{y_2},\hat{y_3})} =     
    \begin{bmatrix*}
    \dfrac{\partial{(MSE)}}{\partial(\hat{y_1})} \\
    \dfrac{\partial{(MSE)}}{\partial(\hat{y_2})} \\
    \dfrac{\partial{(MSE)}}{\partial(\hat{y_3})} 
    \end{bmatrix*}
    
\end{gather*}



We can easily find each term in this Jacobian.
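
Each term follows from differentiating the sum with respect to one prediction: only the $i$-th squared term depends on $\hat{y_i}$, so $\partial(MSE)/\partial(\hat{y_i}) = -\frac{2}{N}(y_i - \hat{y_i})$. Below is a minimal sketch of this gradient, checked against central finite differences. The name `mse_grad` and the sample values are assumptions for illustration and do not come from the original notebook.

```python
import numpy as np

def mse(y_true, y_pred):                     # same definition as above
    return np.mean((y_true - y_pred)**2)

def mse_grad(y_true, y_pred):                # hypothetical helper name
    # d(MSE)/d(y_pred_i) = -2 * (y_true_i - y_pred_i) / N
    N = y_true.size                          # matches the N that np.mean divides by
    return -2 * (y_true - y_pred) / N

# illustrative values, not taken from the text
y_true = np.array([[1.0], [2.0], [3.0]])
y_pred = np.array([[1.5], [2.0], [2.0]])

analytic = mse_grad(y_true, y_pred)

# central finite differences, one prediction at a time
eps = 1e-6
numeric = np.zeros_like(y_pred)
for i in range(y_pred.shape[0]):
    plus = y_pred.copy()
    minus = y_pred.copy()
    plus[i, 0] += eps
    minus[i, 0] -= eps
    numeric[i, 0] = (mse(y_true, plus) - mse(y_true, minus)) / (2 * eps)

print(analytic)
print(np.allclose(analytic, numeric))
```

The gradient has the same shape as `y_pred`, and each entry depends only on its own error, unlike the Softmax Jacobian where every output depends on every input.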

In [77]:
%%latex
\begin{gather*}
    \newcommand{\arraystretch}{2}
    J =  
    \begin{bmatrix*}
      y_1(1-y_1)    & -y_1 y_2      & -y_1 y_3      & -y_1 y_4   \\
      -y_1 y_2      & y_2(1-y_2)    & -y_2 y_3      & -y_2 y_4   \\
      -y_1 y_3      & -y_2 y_3      & y_3(1-y_3)    & -y_3 y_4   \\
      -y_1 y_4      & -y_2 y_4      & -y_3 y_4      & y_4(1-y_4) \\
    \end{bmatrix*}\\
\end{gather*}


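The matrix above is the Softmax Jacobian for four outputs. We can reduce it to define the Softmax Jacobian in Python like this, where * denotes element-wise multiplication with broadcasting (as in NumPy), not the matrix product.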

In [82]:
%%latex
\begin{gather*}
    \newcommand{\arraystretch}{2}
    J =  \begin{bmatrix*}
      y_1 \\
      y_2 \\
      y_3 \\
      y_4 \\
    \end{bmatrix*} * 
    \begin{bmatrix*}
      1-y_1   & -y_2    & -y_3     & -y_4   \\
      -y_1    & 1-y_2   & -y_3     & -y_4   \\
      -y_1    & -y_2    & 1-y_3    & -y_4   \\
      -y_1    & -y_2    & -y_3     & 1-y_4  \\
    \end{bmatrix*}\\
    \\
    J =  \begin{bmatrix*}
      y_1 \\
      y_2 \\
      y_3 \\
      y_4 \\
    \end{bmatrix*} * (
    \begin{bmatrix*}
      1   & 0    & 0    & 0   \\
      0   & 1    & 0    & 0   \\
      0   & 0    & 1    & 0   \\
      0   & 0    & 0    & 1  \\
    \end{bmatrix*} - 
    \begin{bmatrix*}
      y_1   & y_2    & y_3     & y_4   \\
    \end{bmatrix*} ) \Rightarrow \\
    \\
\Rightarrow J = softmax(x) * (I - softmax(x)^{T})
\end{gather*}


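You must have noticed that the diagonal entries y_i(1 - y_i) have the same form as the derivative of the Sigmoid function, while the off-diagonal entries -y_i y_j have no Sigmoid counterpart, so the two are similar but not exactly the same.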

In [84]:
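def softmax_dash(x):                           # Softmax Jacobian
    
    # softmax() is the function defined in the Softmax chapter
    I = np.eye(x.shape[0])                     # identity matrix, shape (n, n)
    s = softmax(x)                             # column vector, shape (n, 1)
    
    return s * (I - s.T)                       # broadcasting gives the (n, n) Jacobian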
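
`softmax_dash` calls `softmax`, which is defined in the Softmax chapter rather than in this notebook. Below is a minimal sketch assuming the standard definition. Subtracting the maximum before exponentiating is a common numerical-stability trick and an assumption here, not necessarily how the original notebook implements it.

```python
import numpy as np

def softmax(x):
    # assumed standard definition: exp(x_i) / sum_j exp(x_j)
    # subtracting max(x) leaves the result unchanged but avoids overflow
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

x = np.array([[0.25], [-1], [2.3], [-0.2], [1]])
y = softmax(x)
print(y)
```

With this definition in scope, `softmax_dash(x)` can be run as written.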

Let us have a look at an example.

In [86]:
x = np.array([[0.25], [-1], [2.3], [-0.2], [1]])
print(x)
print(softmax(x))
print(np.sum(softmax(x)))

[[ 0.25]
 [-1.  ]
 [ 2.3 ]
 [-0.2 ]
 [ 1.  ]]
[[0.08468093]
 [0.02426149]
 [0.6577931 ]
 [0.05399495]
 [0.17926953]]
0.9999999999999999


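You must have noticed that the entries of the softmax output ‘y’ sum to 1 (the printed 0.9999999999999999 differs from 1 only by floating-point rounding).
Why? By definition, each exponential is divided by the sum of all the exponentials, so the outputs add up to 1.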

y1, y2, y3, y4… can be treated as probabilities or answers to a question because they are all positive and their sum is equal to 1. We will see more on this later when we talk about the Categorical Cross-entropy loss function.

In [87]:
print(softmax_dash(x))
softmax_dash(x) == softmax_dash(x).T

[[ 0.07751007 -0.00205449 -0.05570253 -0.00457234 -0.01518071]
 [-0.00205449  0.02367287 -0.01595904 -0.00131    -0.00434935]
 [-0.05570253 -0.01595904  0.22510134 -0.0355175  -0.11792226]
 [-0.00457234 -0.00131    -0.0355175   0.05107949 -0.00967965]
 [-0.01518071 -0.00434935 -0.11792226 -0.00967965  0.14713197]]


array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])

We can see that the Jacobian of the Softmax function is a symmetric matrix.
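
The symmetry is not a coincidence: the broadcast product $y * (I - y^{T})$ equals $\mathrm{diag}(y) - y y^{T}$, and both of those terms are symmetric. The sketch below checks this identity numerically, and also checks that each column sums to zero, which follows from the outputs summing to 1. The `softmax` helper is an assumed standard definition, not code from the original notebook.

```python
import numpy as np

def softmax(x):
    # assumed standard definition, with max subtraction for stability
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

x = np.array([[0.25], [-1], [2.3], [-0.2], [1]])
s = softmax(x)                                # column vector, shape (5, 1)

J_broadcast = s * (np.eye(x.shape[0]) - s.T)  # form used in softmax_dash
J_outer = np.diag(s.ravel()) - s @ s.T        # diag(y) - y y^T

print(np.allclose(J_broadcast, J_outer))
print(np.allclose(J_outer, J_outer.T))
print(np.allclose(J_outer.sum(axis=0), 0.0))
```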

I hope that now you understand the Softmax function and its Jacobian.

With this, the third chapter is over. In the next post, we will start the Fourth Chapter — Losses and their derivatives with Mean Square Error.