**Hidden layer:** $\quad\boldsymbol{h}=\boldsymbol{\sigma}(\boldsymbol{z})\quad$ with $\quad\boldsymbol{z}=\mathbf{W}\boldsymbol{x}+\boldsymbol{b}_1$
$$\mathbf{W}=\begin{pmatrix}-1&-1\\1&-1\end{pmatrix}\qquad\boldsymbol{b}_1=\begin{pmatrix}1\\-1\end{pmatrix}$$

**Output layer:** $\quad\hat{y}=\mathbf{V}\boldsymbol{h}+b_2$
$$\mathbf{V}=\begin{pmatrix}1&-1\end{pmatrix}\qquad b_2=1$$

**Squared loss (for one input-output pair):** $\quad\mathcal{L}=\frac{1}{2}(y-\hat{y})^2$

**Input-output pair:** $\quad\boldsymbol{x}=(2,2)^t\qquad y=-1$

**Forward:** $\;$ pre-activations, activations, and loss

In [1]:
import numpy as np; np.set_printoptions(precision=4)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))                # logistic activation
x = np.array([2, 2]); y = -1                                # input-output pair
W = np.array([[-1, -1], [1, -1]]); b1 = np.array([1, -1])   # hidden layer
V = np.array([1, -1]); b2 = 1                               # output layer
z = (W @ x + b1); print("z =", z)                           # pre-activations
h = sigmoid(z); print("h =", h)                             # activations
haty = V @ h + b2; print("haty =", round(haty, 4))          # prediction
loss = .5 * np.square(y-haty).sum(); print("loss =", round(loss, 4))

z = [-3 -1]
h = [0.0474 0.2689]
haty = 0.7785
loss = 1.5815
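
The printed values can be reproduced by hand from the definitions above. The following worked computation is a sketch that assumes $\sigma$ is the logistic sigmoid, as in the code cell, and rounds to four decimals:

$$\begin{align*}
\boldsymbol{z}&=\begin{pmatrix}-1&-1\\1&-1\end{pmatrix}\begin{pmatrix}2\\2\end{pmatrix}+\begin{pmatrix}1\\-1\end{pmatrix}=\begin{pmatrix}-4\\0\end{pmatrix}+\begin{pmatrix}1\\-1\end{pmatrix}=\begin{pmatrix}-3\\-1\end{pmatrix}\\
\boldsymbol{h}&=\begin{pmatrix}\sigma(-3)\\\sigma(-1)\end{pmatrix}\approx\begin{pmatrix}0.0474\\0.2689\end{pmatrix}\\
\hat{y}&\approx 1\cdot 0.0474-1\cdot 0.2689+1=0.7785\\
\mathcal{L}&\approx\tfrac{1}{2}(-1-0.7785)^2\approx 1.5815
\end{align*}$$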


**Backward:** $\;$ Jacobians of the loss with respect to $\ldots$
$$\begin{align*}
u=&\dfrac{\partial\mathcal{L}}{\partial\hat{y}}=\hat{y}-y
&&\text{the prediction (activation of the output layer)}\\
\boldsymbol{g}_{\mathbf{V}}=u&\dfrac{\partial\hat{y}}{\partial\mathbf{V}}=\boldsymbol{h}u
&&\text{the linear transformation of the output layer}\\
g_{b_2}=u&\dfrac{\partial\hat{y}}{\partial b_2}=u
&&\text{the bias of the output layer}\\
\boldsymbol{u}^t=u&\dfrac{\partial\hat{y}}{\partial\boldsymbol{h}}=u\mathbf{V}
&&\text{the activation of the hidden layer}\\
\boldsymbol{u}^t=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{h}}{\partial\boldsymbol{z}}=\boldsymbol{u}^t\operatorname{diag}(\boldsymbol{\sigma}'(\boldsymbol{z}))
&&\text{the pre-activation of the hidden layer}\\
\boldsymbol{g}_{\mathbf{W}}=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{z}}{\partial\mathbf{W}}=\boldsymbol{x}\boldsymbol{u}^t
&&\text{the linear transformation of the hidden layer}\\
\boldsymbol{g}_{\boldsymbol{b}_1}=\boldsymbol{u}^t&\dfrac{\partial\boldsymbol{z}}{\partial\boldsymbol{b}_1}=\boldsymbol{u}^t
&&\text{the bias of the hidden layer}
\end{align*}$$
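
The factor $\operatorname{diag}(\boldsymbol{\sigma}'(\boldsymbol{z}))$ needs the derivative of the activation. Assuming $\sigma$ is the logistic sigmoid, as the code in this example does, a short derivation gives a form that only requires evaluating $\sigma$ itself:

$$\sigma(z)=\frac{1}{1+e^{-z}}\quad\Rightarrow\quad\sigma'(z)=\frac{e^{-z}}{(1+e^{-z})^2}=\sigma(z)\,\bigl(1-\sigma(z)\bigr)=\sigma(z)\,\sigma(-z)$$

The last step uses $1-\sigma(z)=\dfrac{e^{-z}}{1+e^{-z}}=\dfrac{1}{1+e^{z}}=\sigma(-z)$. For the pre-activations of this example, $\sigma'(-3)\approx 0.0452$ and $\sigma'(-1)\approx 0.1966$.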

In [2]:
J_haty = haty-y;                         print("J_haty =", J_haty)
J_V = np.outer(h, J_haty);               print("J_V =", J_V)
J_b2 = J_haty;                           print("J_b2 =", J_b2)
J_h = J_haty * V;                        print("J_h =", J_h)
J_z = J_h * sigmoid(z) * sigmoid(-z);    print("J_z =", J_z)
J_W = np.outer(x, J_z);                  print("J_W =", J_W)
J_b1 = J_z;                              print("J_b1 =", J_b1)

J_haty = 1.7784844518075715
J_V = [[0.0843]
 [0.4783]]
J_b2 = 1.7784844518075715
J_h = [ 1.7785 -1.7785]
J_z = [ 0.0803 -0.3497]
J_W = [[ 0.1607 -0.6993]
 [ 0.1607 -0.6993]]
J_b1 = [ 0.0803 -0.3497]
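
One common way to sanity-check Jacobians like these is to compare them against central finite differences of the loss. The sketch below is not part of the original notebook: the helpers `loss_fn` and `numerical_grad`, the step `eps`, and the tolerance are assumptions chosen for illustration, and the parameters are re-declared as floats so the block runs on its own.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss_fn(W, b1, V, b2, x, y):
    """Forward pass of the two-layer network, returning the squared loss."""
    h = sigmoid(W @ x + b1)
    haty = V @ h + b2
    return 0.5 * (y - haty) ** 2

def numerical_grad(f, p, eps=1e-6):
    """Central finite differences of a scalar function f with respect to array p."""
    g = np.zeros_like(p)
    for idx in np.ndindex(p.shape):
        step = np.zeros_like(p)
        step[idx] = eps
        g[idx] = (f(p + step) - f(p - step)) / (2 * eps)
    return g

# Same parameters and data as the example, stored as floats.
x = np.array([2.0, 2.0]); y = -1.0
W = np.array([[-1.0, -1.0], [1.0, -1.0]]); b1 = np.array([1.0, -1.0])
V = np.array([1.0, -1.0]); b2 = 1.0

# Analytic Jacobians, following the backward pass of the text.
z = W @ x + b1
h = sigmoid(z)
u = (V @ h + b2) - y
J_V = h * u
J_z = u * V * sigmoid(z) * sigmoid(-z)
J_W = np.outer(x, J_z)   # transposed layout: entry [j, i] is dL/dW[i, j]
J_b1 = J_z

# Numerical counterparts, perturbing one parameter array at a time.
num_V = numerical_grad(lambda p: loss_fn(W, b1, p, b2, x, y), V)
num_W = numerical_grad(lambda p: loss_fn(p, b1, V, b2, x, y), W)
num_b1 = numerical_grad(lambda p: loss_fn(W, p, V, b2, x, y), b1)

print("V ok: ", np.allclose(J_V, num_V, atol=1e-6))
print("W ok: ", np.allclose(J_W.T, num_W, atol=1e-6))
print("b1 ok:", np.allclose(J_b1, num_b1, atol=1e-6))
```

Note that `J_W` is compared through its transpose, since the text stores $\boldsymbol{g}_{\mathbf{W}}=\boldsymbol{x}\boldsymbol{u}^t$ in the transposed layout.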


**Parameter update (learning rate $\eta$):** $\quad\mathbf{V}=\mathbf{V}-\eta\boldsymbol{g}_{\mathbf{V}}^t\qquad b_2=b_2-\eta g_{b_2}\qquad\mathbf{W}=\mathbf{W}-\eta\boldsymbol{g}_{\mathbf{W}}^t\qquad\boldsymbol{b}_1=\boldsymbol{b}_1-\eta\boldsymbol{g}_{\boldsymbol{b}_1}^t$


In [3]:
eta = 1.0  # learning rate
V  = V  - eta * J_V.T; print("V =", V)
b2 = b2 - eta * J_b2;  print("b2 =", b2)
W  = W  - eta * J_W.T; print("W =", W)
b1 = b1 - eta * J_b1;  print("b1 =", b1)

V = [[ 0.9157 -1.4783]]
b2 = -0.7784844518075715
W = [[-1.1607 -1.1607]
 [ 1.6993 -0.3007]]
b1 = [ 0.9197 -0.6503]
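
A natural follow-up is to re-run the forward pass with the updated parameters and compare the loss before and after the step. The sketch below is an illustration, not part of the original notebook: the helper names `forward_loss` and `sgd_step` are hypothetical, and the step size `eta=1.0` mirrors the value used in the update cell.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward_loss(W, b1, V, b2, x, y):
    """Squared loss of the two-layer network on a single pair (x, y)."""
    h = sigmoid(W @ x + b1)
    haty = V @ h + b2
    return 0.5 * (y - haty) ** 2

def sgd_step(W, b1, V, b2, x, y, eta):
    """One gradient step; every gradient is computed from the old parameters."""
    z = W @ x + b1
    h = sigmoid(z)
    u = (V @ h + b2) - y                      # dL/dhaty
    u_z = u * V * sigmoid(z) * sigmoid(-z)    # dL/dz
    return (W - eta * np.outer(u_z, x),       # dL/dW[i, j] = u_z[i] * x[j]
            b1 - eta * u_z,
            V - eta * u * h,
            b2 - eta * u)

# Same parameters and data as the example, stored as floats.
x = np.array([2.0, 2.0]); y = -1.0
params = (np.array([[-1.0, -1.0], [1.0, -1.0]]), np.array([1.0, -1.0]),
          np.array([1.0, -1.0]), 1.0)

loss_before = forward_loss(*params, x, y)
new_params = sgd_step(*params, x, y, eta=1.0)
loss_after = forward_loss(*new_params, x, y)

print("loss before:", round(float(loss_before), 4))
print("loss after: ", round(float(loss_after), 4))
```

With a step size as large as 1.0, a decrease in the loss is not guaranteed in general, so printing both values is a cheap check that the step helped on this pair.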
