**Hidden layer:** $\quad h=\sigma(z)\quad$ with $\quad z=\mathbf{W}\boldsymbol{x}+b_1\quad\mathbf{W}=\begin{pmatrix}-1&-1\end{pmatrix}\quad b_1=1$

**Output layer:** $\quad\hat{\boldsymbol{y}}=\boldsymbol{\sigma}(\boldsymbol{a})\quad$ with $\quad\boldsymbol{a}=\mathbf{V}h+\boldsymbol{b}_2\quad\mathbf{V}=\begin{pmatrix}-1\\-1\end{pmatrix}\quad\boldsymbol{b}_2=\begin{pmatrix}1\\1\end{pmatrix}$

**Squared loss for one input-output pair:** $\quad\mathcal{L}=\frac{1}{2}\lVert\boldsymbol{y}-\hat{\boldsymbol{y}}\rVert_2^2\quad$ for $\quad\boldsymbol{x}=(-1,-1)^t\quad\boldsymbol{y}=(1,0)^t$

**Forward:** $\;$ pre-activations, activations, and loss

In [1]:
import numpy as np; np.set_printoptions(precision=4)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x));
x = np.array([-1, -1]); y = np.array([1, 0])
W = np.array([-1, -1]); b1 = 1
V = np.array([-1, -1]); b2 = np.array([1, 1])
z = (W @ x + b1);                    print("z =", round(z, 4))
h = sigmoid(z);                      print("h =", round(h, 4))
a = np.dot(V, h) + b2;               print("a =", a)
haty = sigmoid(a);                   print("haty =", haty)
loss = .5 * np.square(y-haty).sum(); print("loss =", round(loss, 4))

z = 3
h = 0.9526
a = [0.0474 0.0474]
haty = [0.5119 0.5119]
loss = 0.2501
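
The printed values can be reproduced by hand. The following worked arithmetic is a sketch using only the definitions above, with every intermediate value rounded to four decimals:

$$\begin{align*}
z&=(-1)(-1)+(-1)(-1)+1=3\\
h&=\sigma(3)=\dfrac{1}{1+e^{-3}}\approx 0.9526\\
\boldsymbol{a}&=\begin{pmatrix}-1\\-1\end{pmatrix}0.9526+\begin{pmatrix}1\\1\end{pmatrix}\approx\begin{pmatrix}0.0474\\0.0474\end{pmatrix}\\
\hat{\boldsymbol{y}}&=\boldsymbol{\sigma}(\boldsymbol{a})\approx\begin{pmatrix}0.5119\\0.5119\end{pmatrix}\\
\mathcal{L}&\approx\tfrac{1}{2}\left[(1-0.5119)^2+(0-0.5119)^2\right]\approx\tfrac{1}{2}(0.2383+0.2620)\approx 0.2501
\end{align*}$$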


**Backward:** $\;$ Jacobians of the loss with respect to $\ldots$
$$\begin{align*}
\boldsymbol{u}^t=&\dfrac{\partial\mathcal{L}}{\partial\hat{\boldsymbol{y}}}=(\hat{\boldsymbol{y}}-\boldsymbol{y})^t
&&\text{the prediction (output-layer activation)}\\
\boldsymbol{u}^t=\boldsymbol{u}^t&\dfrac{\partial\hat{\boldsymbol{y}}}{\partial\boldsymbol{a}}=\boldsymbol{u}^t\operatorname{diag}(\boldsymbol{\sigma}'(\boldsymbol{a}))
&&\text{the output-layer pre-activation}\\
\boldsymbol{g}_{\mathbf{V}}=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{a}}{\partial\mathbf{V}}=h\boldsymbol{u}^t
&&\text{the output-layer linear transformation}\\
\boldsymbol{g}_{\boldsymbol{b}_2}=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{a}}{\partial\boldsymbol{b}_2}=\boldsymbol{u}^t
&&\text{the output-layer bias}\\
u=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{a}}{\partial h}=\boldsymbol{u}^t\mathbf{V}
&&\text{the hidden-layer activation}\\
u=u&\dfrac{\partial h}{\partial z}=u\sigma'(z)
&&\text{the hidden-layer pre-activation}\\
\boldsymbol{g}_{\mathbf{W}}=u&\dfrac{\partial z}{\partial\mathbf{W}}=\boldsymbol{x}u
&&\text{the hidden-layer linear transformation}\\
g_{b_1}=u&\dfrac{\partial z}{\partial b_1}=u
&&\text{the hidden-layer bias}
\end{align*}$$
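
The code that follows evaluates the sigmoid derivative as `sigmoid(a) * sigmoid(-a)` and `sigmoid(z) * sigmoid(-z)`. A short derivation, assuming only the definition $\sigma(x)=1/(1+e^{-x})$ used in the forward cell, shows why this product equals $\sigma'(x)$:

$$\begin{align*}
\sigma'(x)&=\dfrac{e^{-x}}{(1+e^{-x})^2}=\dfrac{1}{1+e^{-x}}\cdot\dfrac{e^{-x}}{1+e^{-x}}=\sigma(x)\,(1-\sigma(x))\\
1-\sigma(x)&=\dfrac{e^{-x}}{1+e^{-x}}=\dfrac{1}{1+e^{x}}=\sigma(-x)
\quad\Rightarrow\quad\sigma'(x)=\sigma(x)\,\sigma(-x)
\end{align*}$$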

In [2]:
J_haty = haty-y;                         print("J_haty =", J_haty)
J_a = J_haty * sigmoid(a) * sigmoid(-a); print("J_a =", J_a)
J_V = np.outer(h, J_a);                  print("J_V =", J_V)
J_b2 = J_a;                              print("J_b2 =", J_b2)
J_h = J_a @ V;                           print("J_h =", round(J_h, 4))
J_z = J_h * sigmoid(z) * sigmoid(-z);    print("J_z =", round(J_z, 4))
J_W = np.outer(x, J_z);                  print("J_W =", J_W)
J_b1 = J_z;                              print("J_b1 =", round(J_b1, 4));

J_haty = [-0.4881  0.5119]
J_a = [-0.122   0.1279]
J_V = [[-0.1162  0.1218]]
J_b2 = [-0.122   0.1279]
J_h = -0.0059
J_z = -0.0003
J_W = [[0.0003]
 [0.0003]]
J_b1 = -0.0003
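
One way to gain confidence in these Jacobians is a finite-difference check. The sketch below is not part of the original notebook: `loss_fn` and `numeric_grad` are hypothetical helper names, and the step size `eps=1e-6` is an assumed value. It recomputes the loss from scratch and approximates each gradient by central differences.

```python
import numpy as np

np.set_printoptions(precision=4)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def loss_fn(W, b1, V, b2, x, y):
    """Forward pass of the 2-1-2 network; returns the squared loss."""
    h = sigmoid(W @ x + b1)       # scalar hidden activation
    haty = sigmoid(V * h + b2)    # two output activations
    return 0.5 * np.sum((y - haty) ** 2)

def numeric_grad(f, p, eps=1e-6):
    """Central-difference gradient of f at the 1-D point p (hypothetical helper)."""
    p = np.asarray(p, dtype=float)
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2 * eps)
    return g

# Same data and initial parameters as the forward cell, stored as floats.
x = np.array([-1.0, -1.0]); y = np.array([1.0, 0.0])
W = np.array([-1.0, -1.0]); b1 = 1.0
V = np.array([-1.0, -1.0]); b2 = np.array([1.0, 1.0])

# Perturb one parameter group at a time, holding the others fixed.
num_V  = numeric_grad(lambda p: loss_fn(W, b1, p, b2, x, y), V)
num_b2 = numeric_grad(lambda p: loss_fn(W, b1, V, p, x, y), b2)
num_W  = numeric_grad(lambda p: loss_fn(p, b1, V, b2, x, y), W)
num_b1 = numeric_grad(lambda p: loss_fn(W, p[0], V, b2, x, y), [b1])[0]

print("num_V  =", num_V)
print("num_b2 =", num_b2)
print("num_W  =", num_W)
print("num_b1 =", round(num_b1, 4))
```

The four numerical gradients should agree with `J_V`, `J_b2`, `J_W`, and `J_b1` above to the printed precision, up to the array shapes: the numerical versions are flat vectors, while `np.outer` returns 2-D arrays.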


**Parameter update** (with learning rate $\eta=1$)**:** $\quad\mathbf{V}=\mathbf{V}-\eta\boldsymbol{g}_{\mathbf{V}}^t\qquad\boldsymbol{b}_2=\boldsymbol{b}_2-\eta\boldsymbol{g}_{\boldsymbol{b}_2}^t\qquad\mathbf{W}=\mathbf{W}-\eta\boldsymbol{g}_{\mathbf{W}}^t\qquad b_1=b_1-\eta g_{b_1}$


In [3]:
V  = V  - 1.0 * J_V.ravel(); print("V =", V)   # flatten the (1,2) gradient to V's shape (2,)
b2 = b2 - 1.0 * J_b2;        print("b2 =", b2)
W  = W  - 1.0 * J_W.ravel(); print("W =", W)   # flatten the (2,1) gradient to W's shape (2,)
b1 = b1 - 1.0 * J_b1;        print("b1 =", round(b1, 4))

V = [-0.8838 -1.1218]
b2 = [1.122  0.8721]
W = [-1.0003 -1.0003]
b1 = 1.0003
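
A quick sanity check on the update is to confirm that the loss on the same training pair goes down. The sketch below is an illustration rather than part of the original notebook: `forward`, `loss`, and `sgd_step` are hypothetical helper names, and every parameter is kept as a 1-D array or scalar so that each gradient has the same shape as the parameter it updates.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def forward(W, b1, V, b2, x):
    """Return the hidden activation h and the prediction haty."""
    h = sigmoid(W @ x + b1)       # scalar
    haty = sigmoid(V * h + b2)    # shape (2,)
    return h, haty

def loss(W, b1, V, b2, x, y):
    return 0.5 * np.sum((y - forward(W, b1, V, b2, x)[1]) ** 2)

def sgd_step(W, b1, V, b2, x, y, eta=1.0):
    """One gradient-descent step on a single (x, y) pair."""
    h, haty = forward(W, b1, V, b2, x)
    u_a = (haty - y) * haty * (1.0 - haty)   # dL/da, shape (2,)
    u_z = (u_a @ V) * h * (1.0 - h)          # dL/dz, scalar
    return (W - eta * u_z * x,    # g_W  = x * u_z
            b1 - eta * u_z,       # g_b1 = u_z
            V - eta * h * u_a,    # g_V  = h * u_a
            b2 - eta * u_a)       # g_b2 = u_a

# Same data and initial parameters as the forward cell, stored as floats.
x = np.array([-1.0, -1.0]); y = np.array([1.0, 0.0])
W = np.array([-1.0, -1.0]); b1 = 1.0
V = np.array([-1.0, -1.0]); b2 = np.array([1.0, 1.0])

loss_before = loss(W, b1, V, b2, x, y)
W_new, b1_new, V_new, b2_new = sgd_step(W, b1, V, b2, x, y, eta=1.0)
loss_after = loss(W_new, b1_new, V_new, b2_new, x, y)

print("loss before =", round(loss_before, 4))
print("loss after  =", round(loss_after, 4))
```

With $\eta=1$ the loss after the step should be smaller than the initial 0.2501. A larger step size could overshoot, so the comparison is worth rerunning whenever `eta` changes.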
