# neuralthreads
[medium](https://neuralthreads.medium.com/i-was-not-satisfied-by-any-deep-learning-tutorials-online-37c5e9f4bea1)

## Chapter 4 — Losses and their derivatives

Mean Square Error — The most used Regression loss

Let us start the fourth chapter, Losses and their gradients or derivatives, with Mean Square Error. This loss is generally used in regression problems.

### 4.1 What is Mean Square error and how to compute its gradients?

Suppose we have true values,

In [11]:
%%latex
\begin{gather*}
\newcommand{\arraystretch}{1.5}
    y\_true = y = 
    \begin{bmatrix*}
    y_1 \\
    y_2 \\
    y_3 
    \end{bmatrix*}
\end{gather*}


and predicted values,

In [2]:
%%latex
\begin{gather*}
\newcommand{\arraystretch}{1.5}
    y\_pred = \hat{y} = 
    \begin{bmatrix*}
    \hat{y_1} \\
    \hat{y_2} \\
    \hat{y_3} 
    \end{bmatrix*}
\end{gather*}


Then Mean Square Error is calculated as follows:

In [17]:
%%latex
\begin{gather*}
\newcommand{\arraystretch}{1.5}
   MSE = \dfrac{1}{N}\sum_{i=1}^{i=N}(y\_true_i - y\_pred_i)^2 \Rightarrow \\
    \\
\Rightarrow
   MSE = \dfrac{1}{N}\sum_{i=1}^{i=N}(y_i - \hat{y_i})^2 \Rightarrow \\
    \\
\Rightarrow MSE = \dfrac{1}{3}[(y_1 - \hat{y_1})^2 + (y_2 - \hat{y_2})^2 + (y_3 - \hat{y_3})^2] \\

\end{gather*}


We can easily calculate Mean Square Error in Python like this.

In [29]:
import numpy as np                             # importing NumPy
np.random.seed(42)

def mse(y_true, y_pred):                     # MSE
    return np.mean((y_true - y_pred)**2)
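One thing worth checking before using `mse` is that both arrays have the same shape. The sketch below assumes one array is flat, shape `(3,)`, and the other is a column, shape `(3, 1)`; NumPy then broadcasts the subtraction to a `(3, 3)` matrix and the mean is taken over all pairwise differences. `mse_checked` is a hypothetical helper added here for illustration, not part of the original notebook.

```python
import numpy as np

def mse(y_true, y_pred):
    # Same definition as above: mean of the squared differences
    return np.mean((y_true - y_pred)**2)

def mse_checked(y_true, y_pred):
    # Hypothetical helper (an assumption, not from the notebook):
    # refuse mismatched shapes instead of letting NumPy broadcast them
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    return mse(y_true, y_pred)

y_flat = np.array([1.0, 2.0, 3.0])         # shape (3,)
y_col = np.array([[1.0], [2.0], [3.0]])    # shape (3, 1)

# (3,) minus (3, 1) broadcasts to a (3, 3) matrix of pairwise differences
print((y_flat - y_col).shape)      # (3, 3)
print(mse(y_flat, y_col))          # nonzero, although the values are identical
print(mse_checked(y_col, y_col))   # 0.0
```

Keeping `y_true` and `y_pred` as column vectors of the same shape, as in the rest of this chapter, avoids the problem.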

Now, we know that

In [18]:
%%latex
\begin{gather*}
    MSE = f(\hat{y_1},\hat{y_2},\hat{y_3})
\end{gather*}


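So, like the [**Softmax**](./20_activation_softmax.ipynb) activation function, we have a **Jacobian** for MSE. Since MSE is a scalar, this Jacobian is just the vector of its partial derivatives with respect to each prediction.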

In [23]:
%%latex
\begin{gather*}
    \newcommand{\arraystretch}{2.5}
    J = \dfrac{\partial{(MSE)}}{\partial(\hat{y_1},\hat{y_2},\hat{y_3})} =     
    \begin{bmatrix*}
    \dfrac{\partial{(MSE)}}{\partial(\hat{y_1})} \\
    \dfrac{\partial{(MSE)}}{\partial(\hat{y_2})} \\
    \dfrac{\partial{(MSE)}}{\partial(\hat{y_3})} 
    \end{bmatrix*}
    
\end{gather*}



We can easily find each term in this Jacobian.

In [26]:
%%latex
\begin{gather*}
    \newcommand{\arraystretch}{3}
    \Rightarrow J =  
    \begin{bmatrix*}
      \frac{-2(y_1 - \hat{y_1})}{3} \\
      \frac{-2(y_2 - \hat{y_2})}{3} \\ 
      \frac{-2(y_3 - \hat{y_3})}{3} \\
    \end{bmatrix*} \Rightarrow  \\
    \newcommand{\arraystretch}{1.5}
    \Rightarrow J =  - \frac{2}{3} (
    \begin{bmatrix*}
      y_{1} - \hat{y_{1}} \\
      y_{2} - \hat{y_{2}} \\ 
      y_{3} - \hat{y_{3}} \\
    \end{bmatrix*} ) \Rightarrow  \\
     \newcommand{\arraystretch}{1.5}
    \Rightarrow J =  - \frac{2}{3} (
    \begin{bmatrix*}
      y_{1} \\
      y_{2} \\ 
      y_{3} \\
    \end{bmatrix*} - 
    \begin{bmatrix*}
      \hat{y_{1}} \\
      \hat{y_{2}} \\ 
      \hat{y_{3}} \\
    \end{bmatrix*}) \Rightarrow  \\
    \\
  \Rightarrow J = - \frac{2}{3} (y\_true - y\_pred)
\end{gather*}


> Note: Here, 3 is N, i.e., the number of entries in y_true and y_pred
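Each entry of J follows from the chain rule, because only the i-th term of the sum depends on the i-th prediction. As a worked step, here is the first entry for N = 3, followed by the same result written for general N:

```latex
\begin{gather*}
    \dfrac{\partial(MSE)}{\partial(\hat{y_1})}
    = \dfrac{1}{3} \cdot \dfrac{\partial}{\partial(\hat{y_1})} (y_1 - \hat{y_1})^2
    = \dfrac{1}{3} \cdot 2(y_1 - \hat{y_1}) \cdot (-1)
    = \dfrac{-2(y_1 - \hat{y_1})}{3} \\
    \\
    \dfrac{\partial(MSE)}{\partial(\hat{y_i})} = \dfrac{-2(y_i - \hat{y_i})}{N}
\end{gather*}
```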

This reduces to a short definition of the MSE Jacobian in Python.

In [37]:
def mse_grad(y_true, y_pred):
    N = y_true.shape[0]
    return -2*(y_true - y_pred)/N

Let us have a look at an example.

In [38]:
y_true = np.array([[1.5], [0.2], [3.9], [6.2], [5.2]])
print(y_true)
y_pred = np.array([[1.2], [0.5], [3.2], [4.2], [3.2]])
print(y_pred)
print(y_pred.shape)

[[1.5]
 [0.2]
 [3.9]
 [6.2]
 [5.2]]
[[1.2]
 [0.5]
 [3.2]
 [4.2]
 [3.2]]
(5, 1)


In [39]:
mse(y_true, y_pred)

1.734
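This value can be checked by hand from the two arrays above, using N = 5:

```latex
\begin{gather*}
    y - \hat{y} = [0.3,\ -0.3,\ 0.7,\ 2.0,\ 2.0]^T \\
    MSE = \dfrac{1}{5}(0.09 + 0.09 + 0.49 + 4 + 4) = \dfrac{8.67}{5} = 1.734
\end{gather*}
```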

In [40]:
mse_grad(y_true, y_pred)

array([[-0.12],
       [ 0.12],
       [-0.28],
       [-0.8 ],
       [-0.8 ]])
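A quick way to gain confidence in `mse_grad` is to compare it with a numerical gradient. The sketch below uses central differences on the same example; `numerical_grad` and the step size `eps` are assumptions made for this check, not part of the original notebook.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred)**2)

def mse_grad(y_true, y_pred):
    N = y_true.shape[0]
    return -2*(y_true - y_pred)/N

def numerical_grad(y_true, y_pred, eps=1e-6):
    # Hypothetical helper: nudge one entry of y_pred at a time and
    # measure the change in MSE with a central difference
    grad = np.zeros_like(y_pred, dtype=float)
    for idx in np.ndindex(y_pred.shape):
        plus = y_pred.astype(float)     # astype returns a fresh copy
        minus = y_pred.astype(float)
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (mse(y_true, plus) - mse(y_true, minus)) / (2*eps)
    return grad

y_true = np.array([[1.5], [0.2], [3.9], [6.2], [5.2]])
y_pred = np.array([[1.2], [0.5], [3.2], [4.2], [3.2]])

analytic = mse_grad(y_true, y_pred)
numeric = numerical_grad(y_true, y_pred)
print(np.allclose(analytic, numeric, atol=1e-6))   # True
```

Because MSE is quadratic in each prediction, the central difference matches the analytic gradient up to floating-point rounding.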