## MLP

In [6]:
import graphviz; graphviz.Source('''digraph { rankdir=LR; splines=false; node[shape=circle fontsize=12]
{rank=same x0[label=1] x1[label=<<i>x<sub>1</sub></i>>] xvdots[shape=none label="&#8942;"] 
xn[label=<<i>x<sub>n</sub></i>>]}
{rank=same h0[label=1] h1[label=<<i>h<sub>1</sub></i>>] hvdots[shape=none label="&#8942;"] 
hm[label=<<i>h<sub>m</sub></i>>]}
y[label=y] {x0 x1 xn} -> {h1 hm} -> y; h0 -> y
edge[style=invis] x0->x1->xvdots->xn h0->h1->hvdots->hm x0->h0
}''').render(filename='0000_03_MLP', format='svg');

<div><table border-collapse: collapse><tr>
<td style="border: none; text-align:left; vertical-align:top; padding:0; margin:0;" width=500>

**Input-output:** $\qquad\boldsymbol{x}=(1,1,1)^t\qquad y=1$

**Hidden layer:** $\qquad\boldsymbol{h}=\boldsymbol{\sigma}(\boldsymbol{z})\qquad$ with $\qquad\boldsymbol{z}=\mathbf{W}\boldsymbol{x}+\boldsymbol{b}_1$
$$\mathbf{W}=\begin{pmatrix}1&1&1\\1&1&1\end{pmatrix}\qquad\boldsymbol{b}_1=\begin{pmatrix}1\\1\end{pmatrix}$$

**Output layer:** $\qquad\hat{y}=\mathbf{V}\boldsymbol{h}+b_2$
$$\mathbf{V}=\begin{pmatrix}1&1\end{pmatrix}\qquad b_2=1$$

**Squared loss:** $\qquad\mathcal{L}=\frac{1}{2}(y-\hat{y})^2$

</td><td style="border: none; padding:0; margin:0;" width=10><br></td>
<td style="border: none; text-align:left; vertical-align:top; padding:0; margin:0;" width=400>

<img src="0000_03_MLP.svg" width=400>

</td></tr></table></div>

**Forward:** $\;$ pre-activations, activations, and loss

In [7]:
import numpy as np; np.set_printoptions(precision=4)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))  # logistic sigmoid, applied elementwise
x = np.array([1, 1, 1]); y = 1                # input and target
W = np.array([[1, 1, 1], [1, 1, 1]]); b1 = np.array([1, 1])  # hidden-layer parameters
V = np.array([1, 1]); b2 = 1                  # output-layer parameters
z = W @ x + b1;                      print("z =", z)        # hidden pre-activation
h = sigmoid(z);                      print("h =", h)        # hidden activation
haty = V @ h + b2;                   print("haty =", haty)  # prediction
loss = .5 * np.square(y-haty).sum(); print("loss =", round(loss, 4))  # squared loss

z = [4 4]
h = [0.982 0.982]
haty = 2.964027580075817
loss = 1.9287


**Backward:** $\;$ Jacobians of the loss with respect to $\ldots$
$$\begin{align*}
u=&\dfrac{\partial\mathcal{L}}{\partial\hat{y}}=\hat{y}-y
&&\text{the prediction (output-layer activation)}\\
\boldsymbol{g}_{\mathbf{V}}=u&\dfrac{\partial\hat{y}}{\partial\mathbf{V}}=\boldsymbol{h}u
&&\text{the output-layer linear transformation}\\
g_{b_2}=u&\dfrac{\partial\hat{y}}{\partial b_2}=u
&&\text{the output-layer bias}\\
\boldsymbol{u}^t=u&\dfrac{\partial\hat{y}}{\partial\boldsymbol{h}}=u\mathbf{V}
&&\text{the hidden-layer activation}\\
\boldsymbol{u}^t=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{h}}{\partial\boldsymbol{z}}=\boldsymbol{u}^t\operatorname{diag}(\boldsymbol{\sigma}'(\boldsymbol{z}))
&&\text{the hidden-layer pre-activation}\\
\boldsymbol{g}_{\mathbf{W}}=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{z}}{\partial\mathbf{W}}=\boldsymbol{x}\boldsymbol{u}^t
&&\text{the hidden-layer linear transformation}\\
\boldsymbol{g}_{\boldsymbol{b}_1}=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{z}}{\partial\boldsymbol{b}_1}=\boldsymbol{u}^t
&&\text{the hidden-layer bias}
\end{align*}$$
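
The factor $\boldsymbol{\sigma}'(\boldsymbol{z})$ in the hidden-layer pre-activation step has a closed form when $\sigma$ is the logistic sigmoid, as it is in the code of this example. The short derivation below is a standard identity added here for reference; it explains why the derivative can be evaluated as `sigmoid(z) * sigmoid(-z)`.

$$\sigma(z)=\frac{1}{1+e^{-z}}
\quad\Longrightarrow\quad
\sigma'(z)=\frac{e^{-z}}{(1+e^{-z})^2}=\sigma(z)\,\bigl(1-\sigma(z)\bigr)=\sigma(z)\,\sigma(-z),
\qquad\text{since}\quad 1-\sigma(z)=\frac{e^{-z}}{1+e^{-z}}=\frac{1}{1+e^{z}}=\sigma(-z)$$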

In [8]:
J_haty = haty-y;                         print("J_haty =", round(J_haty, 4))  # u = dL/dhaty
J_V = np.outer(h, J_haty);               print("J_V =", J_V)                  # g_V = h u, shape (2, 1)
J_b2 = J_haty;                           print("J_b2 =", round(J_b2, 4))      # g_b2 = u
J_h = J_haty * V;                        print("J_h =", J_h)                  # u^t = u V
J_z = J_h * sigmoid(z) * sigmoid(-z);    print("J_z =", J_z)                  # sigma'(z) = sigma(z) sigma(-z)
J_W = np.outer(x, J_z);                  print("J_W =", J_W)                  # g_W = x u^t, shape (3, 2)
J_b1 = J_z;                              print("J_b1 =", J_b1)                # g_b1 = u^t

J_haty = 1.964
J_V = [[1.9287]
 [1.9287]]
J_b2 = 1.964
J_h = [1.964 1.964]
J_z = [0.0347 0.0347]
J_W = [[0.0347 0.0347]
 [0.0347 0.0347]
 [0.0347 0.0347]]
J_b1 = [0.0347 0.0347]
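
A common way to sanity-check hand-derived gradients such as these is to compare them with central finite differences. The sketch below is not part of the original notebook: `loss_fn`, `numeric_grad`, and the step size `eps = 1e-6` are illustrative names and choices, and the parameters are re-created as floats so that they can be perturbed.

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def loss_fn(W, b1, V, b2, x, y):
    """Forward pass of the example network; returns the squared loss."""
    h = sigmoid(W @ x + b1)
    return 0.5 * (y - (V @ h + b2)) ** 2

def numeric_grad(f, theta, eps=1e-6):
    """Central finite differences of the scalar f over every entry of theta."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        step = np.zeros_like(theta)
        step[idx] = eps
        grad[idx] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Same values as in the example, stored as floats so they can be perturbed
x = np.ones(3); y = 1.0
W = np.ones((2, 3)); b1 = np.ones(2)
V = np.ones(2); b2 = 1.0

# Analytic gradients, following the backward equations
z = W @ x + b1
h = sigmoid(z)
u = (V @ h + b2) - y
J_V = h * u
J_z = u * V * sigmoid(z) * sigmoid(-z)
J_W = np.outer(x, J_z)  # shape (3, 2): the transpose of W's shape

# Numerical gradients, one parameter group at a time
num_V = numeric_grad(lambda v: loss_fn(W, b1, v, b2, x, y), V)
num_W = numeric_grad(lambda w: loss_fn(w, b1, V, b2, x, y), W)
num_b1 = numeric_grad(lambda b: loss_fn(W, b, V, b2, x, y), b1)
eps = 1e-6  # b2 is a plain scalar, so it is perturbed directly
num_b2 = (loss_fn(W, b1, V, b2 + eps, x, y)
          - loss_fn(W, b1, V, b2 - eps, x, y)) / (2 * eps)

print("V  matches:", np.allclose(J_V, num_V, atol=1e-6))
print("W  matches:", np.allclose(J_W.T, num_W, atol=1e-6))
print("b1 matches:", np.allclose(J_z, num_b1, atol=1e-6))
print("b2 matches:", np.isclose(u, num_b2, atol=1e-6))
```

Note that `J_W` is compared through its transpose, because the backward pass stores it with shape (3, 2) while `W` has shape (2, 3).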


**Parameter update:** $\quad\boldsymbol{\theta}=\boldsymbol{\theta}-\rho\boldsymbol{g}_{\boldsymbol{\theta}}^t\quad$ (the code uses learning rate $\rho=1$)

In [9]:
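# rho = 1.0; .T applies the transpose from the update rule
# (V - J_V.T broadcasts, so V comes out with shape (1, 2))
V  = V  - 1.0 * J_V.T; print("V =", V)
b2 = b2 - 1.0 * J_b2;  print("b2 =", round(b2, 4))
W  = W  - 1.0 * J_W.T; print("W =", W)
b1 = b1 - 1.0 * J_b1;  print("b1 =", b1)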

V = [[-0.9287 -0.9287]]
b2 = -0.964
W = [[0.9653 0.9653 0.9653]
 [0.9653 0.9653 0.9653]]
b1 = [0.9653 0.9653]
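
To see what a single update does to the loss, the forward pass can be repeated with the updated parameters. The sketch below is an illustration under stated assumptions, not part of the original notebook: `forward` and `sgd_step` are hypothetical helper names, and the second learning rate, 0.1, is an arbitrary value chosen only for comparison.

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def forward(W, b1, V, b2, x, y):
    """Return (z, h, haty, loss) for the example network."""
    z = W @ x + b1
    h = sigmoid(z)
    haty = V @ h + b2
    return z, h, haty, 0.5 * (y - haty) ** 2

def sgd_step(W, b1, V, b2, x, y, rho):
    """One gradient step with learning rate rho; returns the updated parameters."""
    z, h, haty, _ = forward(W, b1, V, b2, x, y)
    u = haty - y                 # dL/dhaty
    J_V = h * u                  # gradient w.r.t. V, kept 1-D like V
    J_z = u * V * h * (1.0 - h)  # uses sigma'(z) = h (1 - h)
    J_W = np.outer(J_z, x)       # built directly with W's shape (2, 3)
    return W - rho * J_W, b1 - rho * J_z, V - rho * J_V, b2 - rho * u

# Same initial values as in the example
x = np.ones(3); y = 1.0
params = (np.ones((2, 3)), np.ones(2), np.ones(2), 1.0)

loss_before = forward(*params, x, y)[-1]
loss_rho_1 = forward(*sgd_step(*params, x, y, rho=1.0), x, y)[-1]
loss_rho_01 = forward(*sgd_step(*params, x, y, rho=0.1), x, y)[-1]

print("loss before update:  ", round(loss_before, 4))
print("loss after rho = 1.0:", round(loss_rho_1, 4))
print("loss after rho = 0.1:", round(loss_rho_01, 4))
```

With these values the loss after the $\rho=1$ step comes out larger than before the update (about 7.16 against 1.93), which suggests that a unit step overshoots in this example, whereas the smaller rate lowers the loss.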
