**Hidden layer:** $\quad\boldsymbol{h}=\boldsymbol{\sigma}(\boldsymbol{z})\quad$ with $\quad\boldsymbol{z}=\mathbf{W}\boldsymbol{x}+\boldsymbol{b}_1\quad$ ($\boldsymbol{\sigma}$: elementwise logistic sigmoid)
$$\mathbf{W}=\begin{pmatrix}-1.5&-1&0.5\\1&1&-0.5\end{pmatrix}\qquad\boldsymbol{b}_1=\begin{pmatrix}1\\-2\end{pmatrix}$$

**Output layer:** $\quad\hat{\boldsymbol{y}}=\mathbf{V}\boldsymbol{h}+\boldsymbol{b}_2$
$$\mathbf{V}=\begin{pmatrix}-1.5&1\\0.5&1\\-0.5&-1\end{pmatrix}\qquad\boldsymbol{b}_2=\begin{pmatrix}0.5\\1\\-0.5\end{pmatrix}$$

**Quadratic loss (for one input-output pair):** $\quad\mathcal{L}=\frac{1}{2}\lVert\boldsymbol{y}-\hat{\boldsymbol{y}}\rVert_2^2$

**Input-output pair:** $\quad\boldsymbol{x}=(1,0,2)^t\qquad\boldsymbol{y}=(-1,1,-2)^t$

**Forward:** $\;$ pre-activations, activations, and loss

In [1]:
import numpy as np; np.set_printoptions(precision=4)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))  # elementwise logistic activation
# Input-output pair and initial parameters
x = np.array([1, 0, 2]); y = np.array([-1, 1, -2])
W = np.array([[-1.5, -1, 0.5], [1, 1, -0.5]]); b1 = np.array([1, -2])
V = np.array([[-1.5, 1], [0.5, 1], [-0.5, -1]]); b2 = np.array([0.5, 1, -0.5])
# Forward pass: pre-activation, activation, prediction, loss
z = W @ x + b1;                      print("z =", z)
h = sigmoid(z);                      print("h =", h)
haty = V @ h + b2;                   print("haty =", haty)
loss = .5 * np.square(y-haty).sum(); print("loss =", round(loss, 4))

z = [ 0.5 -2. ]
h = [0.6225 0.1192]
haty = [-0.3145  1.4304 -0.9304]
loss = 0.8996
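
The printed values can be reproduced by hand. As a worked check of the forward pass, using only the matrices and vectors defined above (decimals rounded to four places):

$$\begin{align*}
\boldsymbol{z}&=\mathbf{W}\boldsymbol{x}+\boldsymbol{b}_1
=\begin{pmatrix}-1.5\cdot1-1\cdot0+0.5\cdot2\\1\cdot1+1\cdot0-0.5\cdot2\end{pmatrix}+\begin{pmatrix}1\\-2\end{pmatrix}
=\begin{pmatrix}-0.5\\0\end{pmatrix}+\begin{pmatrix}1\\-2\end{pmatrix}
=\begin{pmatrix}0.5\\-2\end{pmatrix}\\
\boldsymbol{h}&=\begin{pmatrix}\sigma(0.5)\\\sigma(-2)\end{pmatrix}
\approx\begin{pmatrix}0.6225\\0.1192\end{pmatrix}\\
\hat{\boldsymbol{y}}&=\mathbf{V}\boldsymbol{h}+\boldsymbol{b}_2
\approx\begin{pmatrix}-1.5\cdot0.6225+1\cdot0.1192+0.5\\0.5\cdot0.6225+1\cdot0.1192+1\\-0.5\cdot0.6225-1\cdot0.1192-0.5\end{pmatrix}
\approx\begin{pmatrix}-0.3145\\1.4304\\-0.9304\end{pmatrix}\\
\mathcal{L}&=\tfrac{1}{2}\lVert\boldsymbol{y}-\hat{\boldsymbol{y}}\rVert_2^2
\approx\tfrac{1}{2}\left(0.6855^2+0.4304^2+1.0696^2\right)\approx0.8996
\end{align*}$$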


**Backward:** $\;$ Jacobians of the loss with respect to $\ldots$
$$\begin{align*}
\boldsymbol{u}^t=&\dfrac{\partial\mathcal{L}}{\partial\hat{\boldsymbol{y}}}=(\hat{\boldsymbol{y}}-\boldsymbol{y})^t
&&\text{the prediction (output-layer activation)}\\
\boldsymbol{g}_{\mathbf{V}}=\boldsymbol{u}^t&\dfrac{\partial\hat{\boldsymbol{y}}}{\partial\mathbf{V}}=\boldsymbol{h}\boldsymbol{u}^t
&&\text{the linear map of the output layer}\\
\boldsymbol{g}_{\boldsymbol{b}_2}=\boldsymbol{u}^t&\dfrac{\partial\hat{\boldsymbol{y}}}{\partial\boldsymbol{b}_2}=\boldsymbol{u}^t
&&\text{the bias of the output layer}\\
\boldsymbol{u}^t=\boldsymbol{u}^t&\dfrac{\partial\hat{\boldsymbol{y}}}{\partial\boldsymbol{h}}=\boldsymbol{u}^t\mathbf{V}
&&\text{the activation of the hidden layer}\\
\boldsymbol{u}^t=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{h}}{\partial\boldsymbol{z}}=\boldsymbol{u}^t\operatorname{diag}(\boldsymbol{\sigma}'(\boldsymbol{z}))
&&\text{the pre-activation of the hidden layer}\\
\boldsymbol{g}_{\mathbf{W}}=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{z}}{\partial\mathbf{W}}=\boldsymbol{x}\boldsymbol{u}^t
&&\text{the linear map of the hidden layer}\\
\boldsymbol{g}_{\boldsymbol{b}_1}=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{z}}{\partial\boldsymbol{b}_1}=\boldsymbol{u}^t
&&\text{the bias of the hidden layer}
\end{align*}$$

Here $\boldsymbol{u}^t$ is reassigned as the gradient propagates backward: on each line, the $\boldsymbol{u}^t$ on the right-hand side is the value set by the most recent assignment above it.
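
The factor $\operatorname{diag}(\boldsymbol{\sigma}'(\boldsymbol{z}))$ in the pre-activation step needs the derivative of the activation. Assuming the logistic sigmoid $\sigma(z)=1/(1+e^{-z})$ that the code cells use, a short worked derivation gives:

$$\begin{align*}
\sigma'(z)&=\frac{e^{-z}}{(1+e^{-z})^2}=\sigma(z)\,\frac{e^{-z}}{1+e^{-z}}=\sigma(z)\,(1-\sigma(z))\\
1-\sigma(z)&=\frac{e^{-z}}{1+e^{-z}}=\frac{1}{1+e^{z}}=\sigma(-z)
\quad\Longrightarrow\quad\sigma'(z)=\sigma(z)\,\sigma(-z)\\
\boldsymbol{\sigma}'(\boldsymbol{z})&\approx\begin{pmatrix}0.6225\cdot0.3775\\0.1192\cdot0.8808\end{pmatrix}\approx\begin{pmatrix}0.2350\\0.1050\end{pmatrix}
\qquad\text{for }\boldsymbol{z}=(0.5,-2)^t
\end{align*}$$

Because the Jacobian is diagonal, multiplying $\boldsymbol{u}^t$ by it reduces to an elementwise product, which is why the code computes `J_z` as `J_h * sigmoid(z) * sigmoid(-z)`.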

In [2]:
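# Backward pass: each J_* holds the Jacobian of the loss w.r.t. the named quantity
J_haty = haty-y;                         print("J_haty =", J_haty)
J_V = np.outer(h, J_haty);               print("J_V =", J_V)
J_b2 = J_haty;                           print("J_b2 =", J_b2)
J_h = J_haty @ V;                        print("J_h =", J_h)
# sigma'(z) = sigma(z) * sigma(-z), applied elementwise (diagonal Jacobian)
J_z = J_h * sigmoid(z) * sigmoid(-z);    print("J_z =", J_z)
J_W = np.outer(x, J_z);                  print("J_W =", J_W)
J_b1 = J_z;                              print("J_b1 =", J_b1)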

J_haty = [0.6855 0.4304 1.0696]
J_V = [[0.4267 0.2679 0.6658]
 [0.0817 0.0513 0.1275]]
J_b2 = [0.6855 0.4304 1.0696]
J_h = [-1.3478  0.0464]
J_z = [-0.3167  0.0049]
J_W = [[-0.3167  0.0049]
 [-0.      0.    ]
 [-0.6335  0.0097]]
J_b1 = [-0.3167  0.0049]
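
One way to gain confidence in the analytic Jacobians is to compare them with central finite differences of the loss. The sketch below is self-contained and rebuilds the same network. The helper names (`loss_fn`, `numerical_grad`), the `0`-suffixed variables, the step `eps=1e-6`, and the tolerance `atol=1e-6` are assumptions chosen for illustration, not part of the original notebook.

```python
import numpy as np

def sigmoid(a):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

def loss_fn(W, b1, V, b2, x, y):
    """Quadratic loss of the two-layer network for one input-output pair."""
    h = sigmoid(W @ x + b1)
    haty = V @ h + b2
    return 0.5 * np.sum((y - haty) ** 2)

def numerical_grad(f, p, eps=1e-6):
    """Central-difference gradient of f() w.r.t. array p (perturbed in place, then restored)."""
    g = np.zeros_like(p)
    for idx in np.ndindex(*p.shape):
        saved = p[idx]
        p[idx] = saved + eps
        f_plus = f()
        p[idx] = saved - eps
        f_minus = f()
        p[idx] = saved  # restore the original entry
        g[idx] = (f_plus - f_minus) / (2.0 * eps)
    return g

# Same data and initial parameters as above, stored as floats so they can be perturbed
x0 = np.array([1.0, 0.0, 2.0]); y0 = np.array([-1.0, 1.0, -2.0])
W0 = np.array([[-1.5, -1.0, 0.5], [1.0, 1.0, -0.5]]); b1_0 = np.array([1.0, -2.0])
V0 = np.array([[-1.5, 1.0], [0.5, 1.0], [-0.5, -1.0]]); b2_0 = np.array([0.5, 1.0, -0.5])

# Analytic Jacobians, following the backward formulas
z0 = W0 @ x0 + b1_0
h0 = sigmoid(z0)
u = (V0 @ h0 + b2_0) - y0                  # dL/dhaty
g_V, g_b2 = np.outer(h0, u), u
u = (u @ V0) * sigmoid(z0) * sigmoid(-z0)  # dL/dz
g_W, g_b1 = np.outer(x0, u), u

# g_V and g_W are transposed relative to V and W, hence the .T
f = lambda: loss_fn(W0, b1_0, V0, b2_0, x0, y0)
checks = {
    "V": np.allclose(g_V.T, numerical_grad(f, V0), atol=1e-6),
    "b2": np.allclose(g_b2, numerical_grad(f, b2_0), atol=1e-6),
    "W": np.allclose(g_W.T, numerical_grad(f, W0), atol=1e-6),
    "b1": np.allclose(g_b1, numerical_grad(f, b1_0), atol=1e-6),
}
print(checks)
```

If the analytic formulas and the numerical estimates agree, every entry of `checks` is `True`. A `False` entry would point at the corresponding Jacobian.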


**Parameter update** (learning rate $\eta$; the code below uses $\eta=1$): $\quad\mathbf{V}=\mathbf{V}-\eta\boldsymbol{g}_{\mathbf{V}}^t\qquad\boldsymbol{b}_2=\boldsymbol{b}_2-\eta\boldsymbol{g}_{\boldsymbol{b}_2}^t\qquad\mathbf{W}=\mathbf{W}-\eta\boldsymbol{g}_{\mathbf{W}}^t\qquad\boldsymbol{b}_1=\boldsymbol{b}_1-\eta\boldsymbol{g}_{\boldsymbol{b}_1}^t$


In [3]:
eta = 1.0  # learning rate
# J_V (2x3) and J_W (3x2) are transposed relative to V (3x2) and W (2x3)
V  = V  - eta * J_V.T; print("V =", V)
b2 = b2 - eta * J_b2;  print("b2 =", b2)
W  = W  - eta * J_W.T; print("W =", W)
b1 = b1 - eta * J_b1;  print("b1 =", b1)

V = [[-1.9267  0.9183]
 [ 0.2321  0.9487]
 [-1.1658 -1.1275]]
b2 = [-0.1855  0.5696 -1.5696]
W = [[-1.1833 -1.      1.1335]
 [ 0.9951  1.     -0.5097]]
b1 = [ 1.3167 -2.0049]
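
To see the effect of the update, the loss on the same input-output pair can be evaluated before and after the step. The following is a minimal, self-contained sketch. The function names (`forward`, `quad_loss`, `gd_step`) and the `0`-suffixed variables are hypothetical helpers introduced here, and the learning rate of 1.0 mirrors the cell above.

```python
import numpy as np

def sigmoid(a):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-a))

def forward(W, b1, V, b2, x):
    """Return pre-activation z, hidden activation h, and prediction haty."""
    z = W @ x + b1
    h = sigmoid(z)
    return z, h, V @ h + b2

def quad_loss(y, haty):
    """Quadratic loss for one input-output pair."""
    return 0.5 * np.sum((y - haty) ** 2)

def gd_step(W, b1, V, b2, x, y, eta):
    """One gradient-descent step; returns new parameters and leaves the inputs untouched."""
    z, h, haty = forward(W, b1, V, b2, x)
    u = haty - y                             # dL/dhaty
    g_V, g_b2 = np.outer(h, u), u
    u = (u @ V) * sigmoid(z) * sigmoid(-z)   # dL/dz
    g_W, g_b1 = np.outer(x, u), u
    return W - eta * g_W.T, b1 - eta * g_b1, V - eta * g_V.T, b2 - eta * g_b2

# Same data and initial parameters as in the forward cell
x0 = np.array([1.0, 0.0, 2.0]); y0 = np.array([-1.0, 1.0, -2.0])
W0 = np.array([[-1.5, -1.0, 0.5], [1.0, 1.0, -0.5]]); b1_0 = np.array([1.0, -2.0])
V0 = np.array([[-1.5, 1.0], [0.5, 1.0], [-0.5, -1.0]]); b2_0 = np.array([0.5, 1.0, -0.5])

params0 = (W0, b1_0, V0, b2_0)
params1 = gd_step(*params0, x0, y0, eta=1.0)

loss_before = quad_loss(y0, forward(*params0, x0)[2])
loss_after = quad_loss(y0, forward(*params1, x0)[2])
print("loss before:", round(loss_before, 4))
print("loss after: ", round(loss_after, 4))
```

For this example, the loss after the step should come out lower than before it. That is not guaranteed in general, since a learning rate as large as 1 can overshoot.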
