# 13.3.3 Vector-Jacobian products for layers

**Recall the Jacobian of $\boldsymbol{f}:\mathbb{R}^n\to\mathbb{R}^m$:** $\quad\mathbf{J}_{\boldsymbol{f}}(\boldsymbol{x})\in\mathbb{R}^{m\times n}$
$$\mathbf{J}_{\boldsymbol{f}}(\boldsymbol{x})%
=\frac{\partial\boldsymbol{f}(\boldsymbol{x})}{\partial\boldsymbol{x}}%
=\begin{bmatrix}%
\frac{\partial f_1}{\partial x_1}&\cdots&\frac{\partial f_1}{\partial x_n}\\%
\vdots&\ddots&\vdots\\%
\frac{\partial f_m}{\partial x_1}&\cdots&\frac{\partial f_m}{\partial x_n}%
\end{bmatrix}%
=\begin{bmatrix}
\frac{\partial f_1}{\partial\boldsymbol{x}}\\%
\vdots\\%
\frac{\partial f_m}{\partial\boldsymbol{x}}%
\end{bmatrix}%
=\begin{bmatrix}
\frac{\partial\boldsymbol{f}}{\partial x_1},\ldots,\frac{\partial\boldsymbol{f}}{\partial x_n}%
\end{bmatrix}$$
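The definition can be checked numerically. The minimal sketch below is plain Python, and both the helper `numerical_jacobian` and the example function `f` are hypothetical illustrations. It approximates $\mathbf{J}_{\boldsymbol{f}}(\boldsymbol{x})$ one column at a time with central differences. As a result, row $i$ holds $\partial f_i/\partial\boldsymbol{x}$ and column $j$ holds $\partial\boldsymbol{f}/\partial x_j$.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def numerical_jacobian(f: Callable[[Vector], Vector], x: Vector, eps: float = 1e-6) -> Matrix:
    """Approximate J_f(x) in R^{m x n} with central differences.

    Row i holds the partials of f_i; column j holds df/dx_j.
    """
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        # Fill column j of J with the difference quotient of every output f_i
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return J


# Hypothetical example f: R^2 -> R^3, f(x) = (x1 * x2, sin(x1), x2^2)
def f(x: Vector) -> Vector:
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]


# Analytic Jacobian at (1, 2): [[x2, x1], [cos x1, 0], [0, 2 x2]] = [[2, 1], [cos 1, 0], [0, 4]]
J = numerical_jacobian(f, [1.0, 2.0])
```

The result is a $3\times 2$ matrix, which matches $m\times n$ for $\boldsymbol{f}:\mathbb{R}^2\to\mathbb{R}^3$.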

**VJPs for layers:** $\;$ the backward pass of backpropagation requires computing vector-Jacobian products $\boldsymbol{u}^t\mathbf{J}$ for each type of layer:
* Cross-entropy layer
* Elementwise nonlinearity layer
* Linear layer
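The sketch below shows how these three VJPs chain together in one backward pass. It runs through a hypothetical two-layer network, $\boldsymbol{z}_1=\mathbf{W}_1\boldsymbol{x}$, $\boldsymbol{h}=\operatorname{ReLU}(\boldsymbol{z}_1)$, $\boldsymbol{a}=\mathbf{W}_2\boldsymbol{h}$, $\mathcal{L}=\operatorname{CrossEntropy}(\boldsymbol{y},\boldsymbol{a})$, and uses the per-layer formulas derived in 13.3.3.1 to 13.3.3.3 below. The network, its weights, and every function name are illustrative assumptions. The weight gradients are checked against central finite differences.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """z = W x for W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(a: Vector) -> Vector:
    """Softmax with max(a) subtracted for numerical stability."""
    a_max = max(a)
    e = [math.exp(v - a_max) for v in a]
    s = sum(e)
    return [v / s for v in e]


def forward(W1: Matrix, W2: Matrix, x: Vector, c: int) -> float:
    """Hypothetical network: z1 = W1 x, h = ReLU(z1), a = W2 h, L = CE(one-hot(c), a)."""
    h = [max(v, 0.0) for v in matvec(W1, x)]
    p = softmax(matvec(W2, h))
    return -math.log(p[c])


def backward(W1: Matrix, W2: Matrix, x: Vector, c: int):
    """Propagate u from the loss back through each layer using its VJP."""
    z1 = matvec(W1, x)
    h = [max(v, 0.0) for v in z1]
    p = softmax(matvec(W2, h))
    # Cross-entropy layer: u_a = p - y
    u_a = [p_i - (1.0 if i == c else 0.0) for i, p_i in enumerate(p)]
    # Linear layer a = W2 h: VJP w.r.t. W2 is u_a h^t, VJP w.r.t. h is u_a^t W2
    dW2 = [[ui * hj for hj in h] for ui in u_a]
    u_h = [sum(u_a[i] * W2[i][j] for i in range(len(u_a))) for j in range(len(h))]
    # ReLU layer: u_z1 = u_h * I(z1 > 0), elementwise
    u_z1 = [ui * (1.0 if zi > 0 else 0.0) for ui, zi in zip(u_h, z1)]
    # Linear layer z1 = W1 x: VJP w.r.t. W1 is u_z1 x^t
    dW1 = [[ui * xj for xj in x] for ui in u_z1]
    return dW1, dW2


def fd_grad(P: Matrix, loss_of) -> Matrix:
    """Central-difference gradient of loss_of(P) w.r.t. every entry of the matrix P."""
    eps = 1e-6
    g = [[0.0] * len(P[0]) for _ in P]
    for i in range(len(P)):
        for j in range(len(P[0])):
            plus = [row[:] for row in P]
            minus = [row[:] for row in P]
            plus[i][j] += eps
            minus[i][j] -= eps
            g[i][j] = (loss_of(plus) - loss_of(minus)) / (2 * eps)
    return g


# Example values chosen so that no entry of z1 = W1 x is near the ReLU kink at 0
W1 = [[0.5, -1.0], [1.0, 1.0], [-0.5, 0.5]]   # 3 x 2, so z1 = [-1.5, 3.0, 0.5]
W2 = [[1.0, -1.0, 0.5], [0.2, 0.3, -0.4]]     # 2 x 3
x = [1.0, 2.0]
c = 1

dW1, dW2 = backward(W1, W2, x, c)
fd1 = fd_grad(W1, lambda P: forward(P, W2, x, c))
fd2 = fd_grad(W2, lambda P: forward(W1, P, x, c))
```

No Jacobian matrix is ever formed. Each layer only maps the incoming vector $\boldsymbol{u}$ to the next one. The first row of `dW1` is zero because $z_{1,1}<0$ blocks the gradient at the ReLU.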

# 13.3.3.1 Cross-entropy layer

**Cross-entropy layer:** $\;$ takes logits $\boldsymbol{a}\in\mathbb{R}^C$ and (one-hot) labels $\boldsymbol{y}\in\{0,1\}^C$ and returns a scalar
$$\mathcal{L}=\operatorname{CrossEntropy}(\boldsymbol{y},\boldsymbol{a})=-\sum_cy_c\log p_c%
\qquad\text{with}\qquad%
p_c=\mathcal{S}(\boldsymbol{a})_c=\dfrac{\exp(a_c)}{\sum_{c'}\exp(a_{c'})}$$
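Here is a minimal sketch of the forward computation in plain Python, with hypothetical function names. The softmax subtracts $\max_c a_c$ before exponentiating. This does not change $\boldsymbol{p}$, because the common factor cancels in the ratio, but it avoids overflow for large logits.

```python
import math
from typing import List


def softmax(a: List[float]) -> List[float]:
    """p_c = exp(a_c) / sum_c' exp(a_c'), computed after shifting a by max(a)."""
    a_max = max(a)
    exps = [math.exp(a_c - a_max) for a_c in a]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y: List[float], a: List[float]) -> float:
    """L = -sum_c y_c log p_c.

    Terms with y_c = 0 contribute nothing and are skipped, which also
    avoids log(0) if some p_c underflows to zero.
    """
    p = softmax(a)
    return -sum(y_c * math.log(p_c) for y_c, p_c in zip(y, p) if y_c != 0)


a = [2.0, 1.0, 0.1]   # logits, C = 3
y = [1.0, 0.0, 0.0]   # one-hot label for class 0
loss = cross_entropy(y, a)
```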

**Jacobian with respect to $\boldsymbol{a}$:** $\quad\displaystyle\mathbf{J}=\frac{\partial \mathcal{L}}{\partial\boldsymbol{a}}=(\boldsymbol{p}-\boldsymbol{y})^t\in\mathbb{R}^{1\times C}$

* If $\,\boldsymbol{y}=\operatorname{one-hot}(c):\qquad\displaystyle \mathcal{L}=-\log(p_c)=-\log\biggl[\frac{\exp(a_c)}{\sum_j\exp(a_j)}\biggr]=\log\biggl[\sum\nolimits_j\exp(a_j)\biggr]-a_c$

* For all $i:\qquad\displaystyle \frac{\partial \mathcal{L}}{\partial a_i}=\frac{\partial}{\partial a_i}\log\sum_j\exp(a_j)-\frac{\partial}{\partial a_i}a_c=\frac{\exp(a_i)}{\sum_j\exp(a_j)}-\mathbb{I}(i=c)=p_i-\mathbb{I}(i=c)$
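The result $\partial\mathcal{L}/\partial\boldsymbol{a}=(\boldsymbol{p}-\boldsymbol{y})^t$ can be checked against finite differences of $\mathcal{L}=\log\sum_j\exp(a_j)-a_c$. The sketch below is a plain-Python illustration with hypothetical names. Because the layer's output is a scalar, its VJP simply scales this row vector by the upstream scalar $u$, which is $1$ at the top of the network.

```python
import math
from typing import List


def softmax(a: List[float]) -> List[float]:
    a_max = max(a)
    exps = [math.exp(v - a_max) for v in a]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy_onehot(c: int, a: List[float]) -> float:
    """L = log(sum_j exp(a_j)) - a_c, with max(a) shifted out for stability."""
    a_max = max(a)
    lse = a_max + math.log(sum(math.exp(v - a_max) for v in a))
    return lse - a[c]


def cross_entropy_vjp(c: int, a: List[float], u: float = 1.0) -> List[float]:
    """u * dL/da = u * (p - y) with y = one-hot(c)."""
    p = softmax(a)
    return [u * (p_i - (1.0 if i == c else 0.0)) for i, p_i in enumerate(p)]


a = [2.0, 1.0, 0.1]
c = 0
grad = cross_entropy_vjp(c, a)

# Central finite differences of L with respect to each logit a_i
eps = 1e-6
fd = []
for i in range(len(a)):
    a_plus = list(a)
    a_minus = list(a)
    a_plus[i] += eps
    a_minus[i] -= eps
    fd.append((cross_entropy_onehot(c, a_plus) - cross_entropy_onehot(c, a_minus)) / (2 * eps))
```

Since $\sum_i p_i=\sum_i y_i=1$, the entries of $\boldsymbol{p}-\boldsymbol{y}$ always sum to zero.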

# 13.3.3.2 Elementwise nonlinearity layer

**Elementwise nonlinearity layer:** $\qquad\boldsymbol{h}=\boldsymbol{f}(\boldsymbol{z})=\varphi(\boldsymbol{z})$, where the scalar function $\varphi$ is applied to each component: $h_i=\varphi(z_i)$

**Jacobian with respect to $\boldsymbol{z}$:** $\qquad\displaystyle\mathbf{J}=\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{z}}=\operatorname{diag}(\varphi'(\boldsymbol{z}))$

* For every row $i$ and column $j$: $\qquad\displaystyle\frac{\partial h_i}{\partial z_j}=\mathbb{I}(i=j)\,\frac{d}{dz_i}\varphi(z_i)=\mathbb{I}(i=j)\,\varphi'(z_i)$

* If $\;\varphi(z)=\operatorname{ReLU}(z)=\max(z,0):\qquad \varphi'(z)=H(z)=\mathbb{I}(z>0)$, where $H$ is the Heaviside step function (the derivative at $z=0$ is taken to be $0$)
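Because the Jacobian is diagonal, the VJP reduces to an elementwise product: $\boldsymbol{u}^t\operatorname{diag}(\varphi'(\boldsymbol{z}))=(\boldsymbol{u}\odot\varphi'(\boldsymbol{z}))^t$. The minimal sketch below does this for ReLU and compares the result with $\boldsymbol{u}^t\mathbf{J}$ computed from the full diagonal matrix. The function names are hypothetical, and the dense version exists only as a reference.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu_vjp(u: Vector, z: Vector) -> Vector:
    """u^t diag(H(z)) = (u * H(z))^t with H(z) = I(z > 0), so the value at z = 0 is 0."""
    return [u_i * (1.0 if z_i > 0 else 0.0) for u_i, z_i in zip(u, z)]


def dense_vjp(u: Vector, J: Matrix) -> Vector:
    """Reference: u^t J computed with the full Jacobian matrix."""
    n = len(J[0])
    return [sum(u[i] * J[i][j] for i in range(len(u))) for j in range(n)]


z = [-1.5, 0.0, 2.0, 3.0]
u = [0.1, 0.2, 0.3, 0.4]
# Full Jacobian diag(H(z)), built only for comparison
J = [[(1.0 if z[i] > 0 else 0.0) if i == j else 0.0 for j in range(len(z))]
     for i in range(len(z))]
v = relu_vjp(u, z)
```

The elementwise form costs $O(n)$ and needs no $n\times n$ matrix, whereas `dense_vjp` costs $O(n^2)$.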

# 13.3.3.3 Linear layer

**Linear layer:** $\qquad\boldsymbol{z}=\boldsymbol{f}(\boldsymbol{x},\mathbf{W})=\mathbf{W}\boldsymbol{x}\quad\text{where}\quad\boldsymbol{x}\in\mathbb{R}^n,\;\mathbf{W}\in\mathbb{R}^{m\times n},\;\boldsymbol{z}\in\mathbb{R}^m$

## VJP with the Jacobian with respect to $\boldsymbol{x}$

**Jacobian with respect to $\boldsymbol{x}$:** $\qquad\displaystyle\mathbf{J}=\frac{\partial\boldsymbol{z}}{\partial\boldsymbol{x}}=\mathbf{W}$

* For all $i:\qquad\displaystyle z_i=\sum_{k=1}^nW_{ik}x_k$

* For every row $i$ and column $j$: $\qquad\displaystyle\frac{\partial z_i}{\partial x_j}=\frac{\partial}{\partial x_j}\sum_{k=1}^nW_{ik}x_k=\sum_{k=1}^nW_{ik}\frac{\partial x_k}{\partial x_j}=\sum_{k=1}^nW_{ik}\mathbb{I}(k=j)=W_{ij}$

**VJP of $\boldsymbol{u}^t\in\mathbb{R}^{1\times m}$ with the Jacobian with respect to $\boldsymbol{x}$:** $\qquad\displaystyle\boldsymbol{u}^t\frac{\partial\boldsymbol{z}}{\partial\boldsymbol{x}}=\boldsymbol{u}^t\mathbf{W}\in\mathbb{R}^{1\times n}$
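A minimal plain-Python sketch of this VJP follows, with hypothetical names. Entry $j$ of $\boldsymbol{u}^t\mathbf{W}$ is $\sum_i u_iW_{ij}$. As a check, the result should equal the gradient of the scalar $s(\boldsymbol{x})=\boldsymbol{u}^t\mathbf{W}\boldsymbol{x}$, which is estimated here with central differences.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(W: Matrix, x: Vector) -> Vector:
    """z = W x, with W stored as m rows of length n."""
    return [sum(w_ik * x_k for w_ik, x_k in zip(row, x)) for row in W]


def linear_vjp_x(u: Vector, W: Matrix) -> Vector:
    """u^t (dz/dx) = u^t W in R^{1 x n}; entry j is sum_i u_i W_ij."""
    m, n = len(W), len(W[0])
    return [sum(u[i] * W[i][j] for i in range(m)) for j in range(n)]


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # m = 2, n = 3
x = [1.0, -1.0, 2.0]
u = [0.5, -1.0]
v = linear_vjp_x(u, W)   # [0.5*1 - 4, 0.5*2 - 5, 0.5*3 - 6] = [-3.5, -4.0, -4.5]

# Check: v is the gradient of s(x) = u^t (W x)
eps = 1e-6
fd = []
for j in range(len(x)):
    x_plus = list(x)
    x_minus = list(x)
    x_plus[j] += eps
    x_minus[j] -= eps
    s_plus = sum(a * b for a, b in zip(u, linear(W, x_plus)))
    s_minus = sum(a * b for a, b in zip(u, linear(W, x_minus)))
    fd.append((s_plus - s_minus) / (2 * eps))
```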

## VJP with the Jacobian with respect to $\mathbf{W}$

**Jacobian with respect to a weight $W_{ij}$:** $\qquad\displaystyle\frac{\partial\boldsymbol{z}}{\partial W_{ij}}=\begin{pmatrix}\boldsymbol{0}\\x_j\\\boldsymbol{0}\end{pmatrix}\in\mathbb{R}^m\quad$ ($x_j$ in position $i$)

* For all $k:\qquad\displaystyle z_k=\sum_{l=1}^nW_{kl}x_l$

* For all $i$, $j$ and $k$: $\qquad\displaystyle\frac{\partial z_k}{\partial W_{ij}}=\frac{\partial}{\partial W_{ij}}\sum_{l=1}^nx_lW_{kl}=\sum_{l=1}^nx_l\frac{\partial W_{kl}}{\partial W_{ij}}=\sum_{l=1}^nx_l\mathbb{I}(i=k)\mathbb{I}(j=l)=x_j\mathbb{I}(i=k)$

**Jacobian with respect to $\mathbf{W}$:** $\qquad\displaystyle\mathbf{J}=\frac{\partial\boldsymbol{z}}{\partial\mathbf{W}}=\begin{bmatrix}\frac{\partial\boldsymbol{z}}{\partial W_{1,1}},\ldots,\frac{\partial\boldsymbol{z}}{\partial W_{m,n}}\end{bmatrix}\in\mathbb{R}^{m\times mn}$

* Layout: $\qquad\displaystyle\frac{\partial\boldsymbol{z}}{\partial\mathbf{W}}=\begin{bmatrix}\frac{\partial z_1}{\partial\mathbf{W}}\\\vdots\\\frac{\partial z_m}{\partial\mathbf{W}}\end{bmatrix}\in\mathbb{R}^{m\times mn}\rightarrow\left[\begin{bmatrix}\frac{\partial z_1}{\partial W_{11}}&\cdots&\frac{\partial z_1}{\partial W_{1n}}\\\vdots&\ddots&\vdots\\\frac{\partial z_1}{\partial W_{m1}}&\cdots&\frac{\partial z_1}{\partial W_{mn}}\end{bmatrix},\dotsc,\begin{bmatrix}\frac{\partial z_m}{\partial W_{11}}&\cdots&\frac{\partial z_m}{\partial W_{1n}}\\\vdots&\ddots&\vdots\\\frac{\partial z_m}{\partial W_{m1}}&\cdots&\frac{\partial z_m}{\partial W_{mn}}\end{bmatrix}\right]^t\in\mathbb{R}^{m\times(m\times n)}$, that is, each row $\frac{\partial z_k}{\partial\mathbf{W}}\in\mathbb{R}^{1\times mn}$ is reshaped into an $m\times n$ matrix
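To make the layout concrete, the sketch below builds the explicit $m\times mn$ Jacobian from $\partial z_k/\partial W_{ij}=x_j\,\mathbb{I}(i=k)$. It then reshapes each row $\partial z_k/\partial\mathbf{W}$ into an $m\times n$ matrix. It assumes that column $i\cdot n+j$ (0-based) corresponds to $W_{ij}$, which is row-major flattening and one natural reading of the ordering $W_{1,1},\ldots,W_{m,n}$. The helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear_jacobian_W(x: Vector, m: int) -> Matrix:
    """Explicit Jacobian dz/dW in R^{m x (m*n)} for z = W x.

    Assumes column i*n + j holds W_ij (row-major flattening of W);
    each entry is dz_k/dW_ij = x_j * I(i == k).
    """
    n = len(x)
    J = [[0.0] * (m * n) for _ in range(m)]
    for k in range(m):
        for i in range(m):
            for j in range(n):
                J[k][i * n + j] = x[j] if i == k else 0.0
    return J


def reshape_row(row: Vector, m: int, n: int) -> Matrix:
    """View a flat row of length m*n as an m x n matrix (undoes the row-major flattening)."""
    return [row[i * n:(i + 1) * n] for i in range(m)]


x = [1.0, -1.0, 2.0]
m, n = 2, len(x)
J = linear_jacobian_W(x, m)
# Row k reshaped to m x n is dz_k/dW: x^t in row k, zeros elsewhere
dz0_dW = reshape_row(J[0], m, n)
dz1_dW = reshape_row(J[1], m, n)
```

Most of the $m^2n$ entries are zero, which is one reason why in practice the VJP is computed directly instead of through this matrix.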

**VJP of $\boldsymbol{u}^t\in\mathbb{R}^{1\times m}$ with the Jacobian with respect to $\mathbf{W}$:** $\qquad\displaystyle\boldsymbol{u}^t\frac{\partial\boldsymbol{z}}{\partial\mathbf{W}}=\boldsymbol{u}\boldsymbol{x}^t\in\mathbb{R}^{m\times n}\quad$ (the $1\times mn$ row vector $\boldsymbol{u}^t\mathbf{J}$ reshaped into an $m\times n$ matrix)

* Layout: $\qquad\displaystyle\boldsymbol{u}^t\frac{\partial\boldsymbol{z}}{\partial\mathbf{W}}=\left[u_1\begin{bmatrix}\frac{\partial z_1}{\partial W_{11}}&\cdots&\frac{\partial z_1}{\partial W_{1n}}\\\vdots&\ddots&\vdots\\\frac{\partial z_1}{\partial W_{m1}}&\cdots&\frac{\partial z_1}{\partial W_{mn}}\end{bmatrix}+\cdots+u_m\begin{bmatrix}\frac{\partial z_m}{\partial W_{11}}&\cdots&\frac{\partial z_m}{\partial W_{1n}}\\\vdots&\ddots&\vdots\\\frac{\partial z_m}{\partial W_{m1}}&\cdots&\frac{\partial z_m}{\partial W_{mn}}\end{bmatrix}\right]$, a weighted sum of the $m\times n$ matrices $\frac{\partial z_k}{\partial\mathbf{W}}$

* For all $i$ and $j$: $\qquad\displaystyle\boldsymbol{u}^t\frac{\partial\boldsymbol{z}}{\partial W_{ij}}=\sum_{k=1}^mu_k\frac{\partial z_k}{\partial W_{ij}}=\sum_{k=1}^mu_kx_j\mathbb{I}(i=k)=u_ix_j$, which is the $(i,j)$ entry of $\boldsymbol{u}\boldsymbol{x}^t$
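In code, the VJP with respect to $\mathbf{W}$ is just the outer product $\boldsymbol{u}\boldsymbol{x}^t$. The minimal sketch below (hypothetical names, plain Python) checks each entry $u_ix_j$ against central finite differences of the scalar $s(\mathbf{W})=\boldsymbol{u}^t\mathbf{W}\boldsymbol{x}$. That scalar's gradient with respect to $\mathbf{W}$ is exactly this VJP.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear_vjp_W(u: Vector, x: Vector) -> Matrix:
    """u^t dz/dW reshaped to m x n: the outer product u x^t, entry (i, j) = u_i x_j."""
    return [[u_i * x_j for x_j in x] for u_i in u]


def scalar_objective(W: Matrix, x: Vector, u: Vector) -> float:
    """s(W) = u^t (W x)."""
    z = [sum(w * x_l for w, x_l in zip(row, x)) for row in W]
    return sum(u_k * z_k for u_k, z_k in zip(u, z))


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # m = 2, n = 3
x = [1.0, -1.0, 2.0]
u = [0.5, -1.0]
G = linear_vjp_W(u, x)

# Central finite differences on each W_ij as an independent check
eps = 1e-6
G_fd = [[0.0] * len(x) for _ in u]
for i in range(len(u)):
    for j in range(len(x)):
        W_plus = [row[:] for row in W]
        W_minus = [row[:] for row in W]
        W_plus[i][j] += eps
        W_minus[i][j] -= eps
        G_fd[i][j] = (scalar_objective(W_plus, x, u) - scalar_objective(W_minus, x, u)) / (2 * eps)
```

The outer product costs $O(mn)$ and never forms the $m\times mn$ Jacobian, which would have $m^2n$ entries.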