# 13.3.3 Vector-Jacobian products for layers

**Reminder on the Jacobian of $\boldsymbol{f}:\mathbb{R}^n\to\mathbb{R}^m$:** $\quad\mathbf{J}_{\boldsymbol{f}}(\boldsymbol{x})\in\mathbb{R}^{m\times n}$
$$\mathbf{J}_{\boldsymbol{f}}(\boldsymbol{x})%
=\frac{\partial\boldsymbol{f}(\boldsymbol{x})}{\partial\boldsymbol{x}}%
=\begin{bmatrix}%
\frac{\partial f_1}{\partial x_1}&\cdots&\frac{\partial f_1}{\partial x_n}\\%
\vdots&\ddots&\vdots\\%
\frac{\partial f_m}{\partial x_1}&\cdots&\frac{\partial f_m}{\partial x_n}%
\end{bmatrix}%
=\begin{bmatrix}
\frac{\partial f_1}{\partial\boldsymbol{x}}\\%
\vdots\\%
\frac{\partial f_m}{\partial\boldsymbol{x}}%
\end{bmatrix}%
=\begin{bmatrix}
\frac{\partial\boldsymbol{f}}{\partial x_1},\ldots,\frac{\partial\boldsymbol{f}}{\partial x_n}%
\end{bmatrix}$$
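As a quick numerical check of the definition, the following sketch builds $\mathbf{J}_{\boldsymbol{f}}(\boldsymbol{x})$ column by column with central finite differences (column $j$ approximates $\partial\boldsymbol{f}/\partial x_j$). It then compares the result with the exact Jacobian of an affine map $\boldsymbol{f}(\boldsymbol{x})=\mathbf{W}\boldsymbol{x}+\boldsymbol{b}$, which is $\mathbf{W}$. The helper `numerical_jacobian` and the example matrices are illustrative assumptions, not part of the original material.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the m x n Jacobian of f at x with central differences.

    Column j holds the partial derivative of f with respect to x_j.
    """
    x = np.asarray(x, dtype=float)
    m = np.asarray(f(x)).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

# Hypothetical affine map f: R^2 -> R^3, f(x) = W x + b; its Jacobian is W.
W = np.array([[1., 2.], [0., -1.], [3., 0.5]])
b = np.array([0.1, -0.2, 0.3])
f = lambda x: W @ x + b

J = numerical_jacobian(f, np.array([0.5, -1.0]))
print(J.shape)  # (3, 2): m rows, n columns
print(np.allclose(J, W))
```

For an affine map the central difference is exact up to rounding, so the comparison holds at NumPy's default tolerances.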

**VJPs for layers:** $\;$ the backward pass of backpropagation requires computing vector-Jacobian products $\boldsymbol{u}^t\mathbf{J}$ for each type of layer:
* Cross-entropy layer
* Elementwise nonlinearity layer
* Linear layer
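A minimal sketch of the three VJPs, assuming column-vector conventions like those of the table below. The cross-entropy case is taken together with the softmax, $L=-\boldsymbol{y}^t\log\mathcal{S}(\boldsymbol{a})$, whose VJP with respect to $\boldsymbol{a}$ is $\hat{\boldsymbol{y}}-\boldsymbol{y}$ when $\boldsymbol{y}$ sums to 1. The nonlinearity is an elementwise ReLU, and the linear layer is $\boldsymbol{z}=\mathbf{W}\boldsymbol{x}+\boldsymbol{b}$. None of them needs the Jacobian explicitly. The ReLU Jacobian is diagonal, so $\boldsymbol{u}^t\mathbf{J}=\boldsymbol{u}\odot\mathbb{1}[\boldsymbol{z}>0]$, and the linear layer gives $\boldsymbol{u}^t\mathbf{W}$. The function names are hypothetical; the code checks each VJP against the explicit product $\boldsymbol{u}^t\mathbf{J}$.

```python
import numpy as np

def softmax(a):
    """Softmax of a 1-D vector, shifted by max(a) for numerical stability."""
    e = np.exp(a - a.max())
    return e / e.sum()

def vjp_softmax_cross_entropy(a, y):
    """Gradient of L = -y^t log(softmax(a)) w.r.t. a, assuming sum(y) == 1."""
    return softmax(a) - y

def vjp_relu(u, z):
    """u^t J for h = ReLU(z); J is diagonal with entries 1[z_i > 0]."""
    return u * (z > 0)

def vjp_linear(u, W, x):
    """For z = W x + b and u = dL/dz, return (dL/dx, dL/dW, dL/db)."""
    return W.T @ u, np.outer(u, x), u

# Compare each VJP with the explicit product u^t J.
rng = np.random.default_rng(0)
u = rng.normal(size=3)

z = rng.normal(size=3)
J_relu = np.diag((z > 0).astype(float))
ok_relu = np.allclose(vjp_relu(u, z), u @ J_relu)

W = rng.normal(size=(3, 2)); x = rng.normal(size=2)
dx, dW, db = vjp_linear(u, W, x)
ok_linear = np.allclose(dx, u @ W)  # the Jacobian of W x + b w.r.t. x is W

a = rng.normal(size=2); y = np.array([1., 0.])
p = softmax(a)
J_softmax = np.diag(p) - np.outer(p, p)  # Jacobian of softmax at a
g = -y / p                               # dL/dy_hat for the cross-entropy
ok_ce = np.allclose(vjp_softmax_cross_entropy(a, y), g @ J_softmax)

print(ok_relu, ok_linear, ok_ce)
```

Fusing softmax and cross-entropy into a single VJP avoids forming the $m\times m$ softmax Jacobian and the division by $\hat{\boldsymbol{y}}$.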

## Forward pass with the initial XOR model

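The VJPs of the XOR model's layers (backward pass) depend on the intermediate values $\boldsymbol{z}$, $\boldsymbol{h}$, $\boldsymbol{a}$ and $\hat{\boldsymbol{y}}$, so the forward pass must be run first. With the initial parameters it gives: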

<div align="center">

|$x_1$|$x_2$|$\boldsymbol{z}=\mathbf{W}\boldsymbol{x}+\boldsymbol{b}_1$|$\boldsymbol{h}=\operatorname{ReLU}(\boldsymbol{z})$|$\boldsymbol{a}=\mathbf{V}\boldsymbol{h}+\boldsymbol{b}_2$|$\hat{\boldsymbol{y}}=\mathcal{S}(\boldsymbol{a})$|$-\boldsymbol{y}\odot\log(\hat{\boldsymbol{y}})$|
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|$0$|$0$|$(-1, 0.5)^t$|$(0, 0.5)^t$|$(0.5, -0.5)^t$|$(0.7311, 0.2689)^t$|$(0.3133, 0)^t$|
|$0$|$1$|$(0, 1.5)^t$ |$(0, 1.5)^t$|$(-0.5, 0.5)^t$|$(0.2689, 0.7311)^t$|$(0, 0.3133)^t$|
|$1$|$0$|$(0, 1.5)^t$ |$(0, 1.5)^t$|$(-0.5, 0.5)^t$|$(0.2689, 0.7311)^t$|$(0, 0.3133)^t$|
|$1$|$1$|$(1, 2.5)^t$ |$(1, 2.5)^t$|$(-0.5, 0.5)^t$|$(0.2689, 0.7311)^t$|$(1.3133, 0)^t$|

</div>

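```python
import numpy as np
np.set_printoptions(precision=4)

# XOR data: inputs X (one example per row) and one-hot targets y
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

# Hidden layer: z = X W + b1 (row form of W x + b1; W is symmetric), h = ReLU(z)
W = np.array([[1, 1], [1, 1]]); b1 = np.array([-1, .5])
z = X @ W + b1; print('z =', str(z).replace('\n', ''))
h = np.maximum(0, z); print('h =', str(h).replace('\n', ''))

# Output layer: a = h V + b2 (V is symmetric too), then a row-wise softmax
V = np.array([[1, -1], [-1, 1]]); b2 = np.array([1, -1])
a = h @ V + b2; print('a =', str(a).replace('\n', ''))
p = np.exp(a); p = p / p.sum(axis=1, keepdims=True); print('p =', str(p).replace('\n', ''))

# Elementwise cross-entropy terms and mean loss over the 4 examples
Ln = -y * np.log(p); print('Ln =', str(Ln).replace('\n', ''))
print('loss =', np.sum(Ln) / 4.)
```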

```text
z = [[-1.   0.5] [ 0.   1.5] [ 0.   1.5] [ 1.   2.5]]
h = [[0.  0.5] [0.  1.5] [0.  1.5] [1.  2.5]]
a = [[ 0.5 -0.5] [-0.5  0.5] [-0.5  0.5] [-0.5  0.5]]
p = [[0.7311 0.2689] [0.2689 0.7311] [0.2689 0.7311] [0.2689 0.7311]]
Ln = [[ 0.3133 -0.    ] [-0.      0.3133] [-0.      0.3133] [ 1.3133 -0.    ]]
loss = 0.5632616875182226
```
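With the forward values stored, the backward pass chains the layer VJPs from the loss back to the parameters. The sketch below does this for the whole XOR batch in the same row-vector convention as the cell above, so the linear-layer VJPs become `h.T @ da` and `da @ V.T`. It uses the standard result that the VJP of softmax followed by cross-entropy with respect to $\boldsymbol{a}$ is $\hat{\boldsymbol{y}}-\boldsymbol{y}$, and it divides by 4 because the loss is the mean over the examples. Two choices are assumptions of this sketch. First, the ReLU derivative at $z=0$ is taken as $0$, since the inputs $(0,1)$ and $(1,0)$ land exactly on that kink. Second, only the gradient with respect to $\mathbf{V}$ is checked by finite differences, because at this point the loss is not differentiable in some entries of $\mathbf{W}$ and $\boldsymbol{b}_1$.

```python
import numpy as np

# Forward pass, repeated from the cell above (row-vector convention).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
W = np.array([[1., 1.], [1., 1.]]); b1 = np.array([-1., .5])
V = np.array([[1., -1.], [-1., 1.]]); b2 = np.array([1., -1.])

def forward(W, b1, V, b2):
    z = X @ W + b1
    h = np.maximum(0, z)
    a = h @ V + b2
    e = np.exp(a)
    p = e / e.sum(axis=1, keepdims=True)       # row-wise softmax
    loss = -(y * np.log(p)).sum() / len(X)     # mean cross-entropy
    return z, h, p, loss

z, h, p, loss = forward(W, b1, V, b2)

# Backward pass: chain the VJPs from the loss back to the parameters.
da = (p - y) / len(X)              # softmax + cross-entropy, divided by batch size
dV, db2 = h.T @ da, da.sum(axis=0)
dh = da @ V.T                      # linear layer, VJP w.r.t. its input h
dz = dh * (z > 0)                  # ReLU; derivative at z == 0 taken as 0 (assumption)
dW, db1 = X.T @ dz, dz.sum(axis=0)
print('dV =', str(dV).replace('\n', ''))
print('dW =', str(dW).replace('\n', ''))

# Finite-difference check on V (the loss is smooth in V, unlike W at the ReLU kinks).
eps = 1e-6
dV_num = np.zeros_like(V)
for i in range(2):
    for j in range(2):
        E = np.zeros_like(V); E[i, j] = eps
        dV_num[i, j] = (forward(W, b1, V + E, b2)[3]
                        - forward(W, b1, V - E, b2)[3]) / (2 * eps)
print(np.allclose(dV, dV_num, atol=1e-6))
```

Each gradient only needs the VJP of one layer applied to the incoming vector, so no Jacobian matrix is ever formed.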
