## MLP 1d

In [1]:
import graphviz; graphviz.Source('''digraph { rankdir=LR; splines=false;
{rank=same x0[label=1] x[label=x]}
{rank=same h0[label=1] h[label="h"]}
{rank=same y0[shape=none label=""] y[label=y]}
x0->h[label=<<i>b<sub>1</sub></i>>] x->h[label=<<i>w</i>>]
h0->y[label=<<i>b<sub>2</sub></i>>] h->y[label=<<i>v</i>>]
edge[style=invis] x0->x h0->h y0->y x0->h0 h0->y0
}''').render(filename='0000_02_MLP1d', format='svg');

<div><table style="border-collapse: collapse"><tr>
<td style="border: none; text-align:left; vertical-align:top; padding:0; margin:0;" width=500>

**Hidden layer:** $\quad h=\sigma(z)\quad$ with $\quad z=wx+b_1\quad w=b_1=1$

**Output layer:** $\qquad \hat{y}=vh+b_2\quad$ with $\quad v=b_2=1$

**Quadratic loss:** $\quad\mathcal{L}=\frac{1}{2}(y-\hat{y})^2\quad$ for $\quad x=1\quad y=1$

</td><td style="border: none; padding:0; margin:0;" width=50><br></td>
<td style="border: none; text-align:left; vertical-align:top; padding:0; margin:0;" width=450>

<img src="0000_02_MLP1d.svg" width=400>

</td></tr></table></div>

**Forward:** $\;$ pre-activations, activations, and loss

In [2]:
import numpy as np; np.set_printoptions(precision=4)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x));
x = 1.0; y = 1.0; w = b1 = 1.0; v = b2 = 1.0;
z = w * x + b1;                print("z =", round(z, 4))
h = sigmoid(z);                print("h =", round(h, 4))
haty = v * h + b2;             print("haty =", round(haty, 4))
loss = .5 * np.square(y-haty); print("loss =", round(loss, 4))

z = 2.0
h = 0.8808
haty = 1.8808
loss = 0.3879


**Backward:** $\;$ Jacobians (derivatives) of the loss with respect to $\ldots$
$$\begin{align*}
u=&\dfrac{d\mathcal{L}}{d\hat{y}}=\hat{y}-y
&&\text{the prediction (output-layer activation)}\\
g_{v}=u&\dfrac{d\hat{y}}{dv}=hu
&&\text{the linear transformation of the output layer}\\
g_{b_2}=u&\dfrac{d\hat{y}}{db_2}=u
&&\text{the bias of the output layer}\\
u=u&\dfrac{d\hat{y}}{dh}=uv
&&\text{the hidden-layer activation}\\
u=u&\dfrac{dh}{dz}=u\sigma'(z)
&&\text{the hidden-layer pre-activation}\\
g_{w}=u&\dfrac{dz}{dw}=xu
&&\text{the linear transformation of the hidden layer}\\
g_{b_1}=u&\dfrac{dz}{db_1}=u
&&\text{the bias of the hidden layer}
\end{align*}$$

Here $u$ is the running upstream derivative: each assignment to $u$ overwrites the previous value, and $\sigma'(z)=\sigma(z)\,\sigma(-z)$.

In [3]:
J_haty = haty-y;                         print("J_haty =", round(J_haty, 4))
J_v = h * J_haty;                        print("J_v =", round(J_v, 4))
J_b2 = J_haty;                           print("J_b2 =", round(J_b2, 4))
J_h = J_haty * v;                        print("J_h =", round(J_h, 4))
J_z = J_h * sigmoid(z) * sigmoid(-z);    print("J_z =", round(J_z, 4))
J_w = x * J_z;                           print("J_w =", round(J_w, 4))
J_b1 = J_z;                              print("J_b1 =", round(J_b1, 4));

J_haty = 0.8808
J_v = 0.7758
J_b2 = 0.8808
J_h = 0.8808
J_z = 0.0925
J_w = 0.0925
J_b1 = 0.0925


**Parameter update:** $\quad\boldsymbol{\theta}=\boldsymbol{\theta}-\rho\boldsymbol{g}_{\boldsymbol{\theta}}^t\quad$ with $\quad\rho=1$

In [4]:
v  = v  - 1.0 * J_v;  print("v =", round(v, 4))
b2 = b2 - 1.0 * J_b2; print("b2 =", round(b2, 4))
w  = w  - 1.0 * J_w;  print("w =", round(w, 4))
b1 = b1 - 1.0 * J_b1; print("b1 =", round(b1, 4))

v = 0.2242
b2 = 0.1192
w = 0.9075
b1 = 0.9075
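
A natural follow-up is to ask whether this single step with $\rho=1$ actually lowers the loss. The sketch below is one possible check, assuming the model stated at the top of the section, $\hat{y}=v\,\sigma(wx+b_1)+b_2$, with all parameters initialised to 1. The helper names `forward_loss` and `backward` are illustrative and are not part of the cells above.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def forward_loss(w, b1, v, b2, x=1.0, y=1.0):
    """Quadratic loss of the 1D MLP haty = v * sigmoid(w * x + b1) + b2."""
    h = sigmoid(w * x + b1)
    haty = v * h + b2
    return 0.5 * (y - haty) ** 2

def backward(w, b1, v, b2, x=1.0, y=1.0):
    """Analytic gradients, returned in the same order as the parameters."""
    h = sigmoid(w * x + b1)
    haty = v * h + b2
    u = haty - y                 # dL/dhaty
    g_v, g_b2 = h * u, u         # output-layer gradients
    u = u * v * h * (1.0 - h)    # dL/dz, using sigma'(z) = h * (1 - h)
    return x * u, u, g_v, g_b2   # g_w, g_b1, g_v, g_b2

rho = 1.0                        # learning rate used in the section
params = (1.0, 1.0, 1.0, 1.0)    # (w, b1, v, b2), assumed initial values
loss_before = forward_loss(*params)
new_params = tuple(p - rho * g for p, g in zip(params, backward(*params)))
loss_after = forward_loss(*new_params)
print("loss before =", round(loss_before, 4))
print("loss after  =", round(loss_after, 4))
```

If the step is well behaved, `loss_after` should come out smaller than `loss_before`. A larger value would suggest that $\rho$ is too big for this starting point.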

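
As an independent check on the backward pass, the analytic derivatives can be compared against central finite differences. This is a minimal sketch under the same assumption about the model, $\hat{y}=v\,\sigma(wx+b_1)+b_2$ with $x=y=1$ and all parameters equal to 1. The step size `eps=1e-6` and the function names are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_fn(theta, x=1.0, y=1.0):
    """Quadratic loss as a function of theta = [w, b1, v, b2]."""
    w, b1, v, b2 = theta
    haty = v * sigmoid(w * x + b1) + b2
    return 0.5 * (y - haty) ** 2

def analytic_grad(theta, x=1.0, y=1.0):
    """Backpropagated gradient, following the chain rule step by step."""
    w, b1, v, b2 = theta
    z = w * x + b1
    h = sigmoid(z)
    u = (v * h + b2) - y                 # dL/dhaty
    g_v, g_b2 = h * u, u
    u = u * v * sigmoid(z) * sigmoid(-z) # dL/dz
    return np.array([x * u, u, g_v, g_b2])

def numeric_grad(theta, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (loss_fn(theta + step) - loss_fn(theta - step)) / (2 * eps)
    return grad

theta0 = np.ones(4)                      # assumed initial parameters
g_analytic = analytic_grad(theta0)
g_numeric = numeric_grad(theta0)
max_abs_diff = np.max(np.abs(g_analytic - g_numeric))
print("analytic:", np.round(g_analytic, 4))
print("numeric: ", np.round(g_numeric, 4))
print("max |diff| =", max_abs_diff)
```

The two gradient vectors should agree to several decimal places. A visible mismatch in one coordinate usually points to an error in the corresponding line of the backward pass.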