# Neural networks in general


A neural network can be represented from two perspectives:

- Scalar-centered, in which every individual weight and unit has its own node.
- Vector-centered, in which weights that are treated the same way in the network are grouped together.

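For illustration purposes, neural networks are often depicted in their scalar form but implemented using vectors. For large networks the vector representation has practical advantages: a whole layer is written as one vector-matrix product, which keeps the notation compact and maps directly onto optimized linear algebra routines.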
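
As a minimal sketch of the two perspectives, the snippet below computes one layer twice: once edge by edge and once as a vector-matrix product. The function names, the example weights, and the choice of `tanh` as activation are illustrative assumptions rather than anything fixed by these notes, and plain Python lists stand in for an array library.

```python
import math

def layer_scalar(x, W, g):
    """Scalar-centered: visit every edge x[i] -> unit j one at a time."""
    outputs = []
    for j in range(len(W[0])):        # one output unit per column of W
        total = 0.0
        for i in range(len(x)):       # one incoming edge per input
            total += x[i] * W[i][j]
        outputs.append(g(total))
    return outputs

def vecmat(x, W):
    """Row vector times matrix; an array library would offer this as one call."""
    return [sum(xi * wi for xi, wi in zip(x, column)) for column in zip(*W)]

def layer_vector(x, W, g):
    """Vector-centered: the whole layer is g applied elementwise to x W."""
    return [g(a) for a in vecmat(x, W)]

x = [0.5, -1.0]
W = [[0.1, 0.2],    # W[i][j] is the weight from input i to unit j
     [0.3, -0.4]]

print(layer_scalar(x, W, math.tanh))
print(layer_vector(x, W, math.tanh))  # same numbers as the line above
```

Both functions perform the same arithmetic; the difference is only in how the computation is grouped and described.
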

 

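Feed-forward neural networks could alternatively be called *sequential non-linear vector projections*: each layer projects its input vector with a weight matrix, applies a non-linearity, and passes the result on to the next layer.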
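
One way to write this down, as a sketch in notation that is assumed here rather than taken from these notes ($\mathbf{h}_\ell$ for the output vector of layer $\ell$, $W_\ell$ for its weight matrix, $g_\ell$ for its elementwise non-linearity, $L$ for the number of layers, and row vectors multiplied from the left):

$$
\begin{align}
\mathbf{h}_0 &= \mathbf{x}\\
\mathbf{h}_\ell &= g_\ell(\mathbf{h}_{\ell-1} W_\ell), \quad \ell = 1, \dots, L\\
\mathbf{y} &= \mathbf{h}_L
\end{align}
$$

Each step is one projection ($\mathbf{h}_{\ell-1} W_\ell$) followed by one non-linearity ($g_\ell$), and the steps are applied in sequence.
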


## The purpose of non-linearities

## Forward pass

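For a network with two inputs $x_0, x_1$, two hidden layers $r$ and $s$ with two units each, and one output $y$, the forward pass is computed layer by layer. Here $W_{a, b}$ is the weight on the edge from unit $a$ to unit $b$, and $g_r$, $g_s$, $g_y$ are the activation functions of the respective layers.

$$
\begin{align}
r_0 &= g_r(x_0 W_{x_0, r_0} + x_1 W_{x_1, r_0})\\
r_1 &= g_r(x_0 W_{x_0, r_1} + x_1 W_{x_1, r_1})\\
s_0 &= g_s(r_0 W_{r_0, s_0} + r_1 W_{r_1, s_0})\\
s_1 &= g_s(r_0 W_{r_0, s_1} + r_1 W_{r_1, s_1})\\
y &= g_y(s_0 W_{s_0, y} + s_1 W_{s_1, y})
\end{align}
$$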
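
The five equations translate almost line by line into code. The following is a minimal sketch, assuming `tanh` for $g_r$ and $g_s$ and the identity for $g_y$ (the notes do not fix the activation functions); the dictionary layout and the weight values are hypothetical.

```python
import math

def forward(x0, x1, W, g_r=math.tanh, g_s=math.tanh, g_y=lambda a: a):
    """Evaluate the network layer by layer; W maps (source, target) to a weight."""
    r0 = g_r(x0 * W["x0", "r0"] + x1 * W["x1", "r0"])
    r1 = g_r(x0 * W["x0", "r1"] + x1 * W["x1", "r1"])
    s0 = g_s(r0 * W["r0", "s0"] + r1 * W["r1", "s0"])
    s1 = g_s(r0 * W["r0", "s1"] + r1 * W["r1", "s1"])
    return g_y(s0 * W["s0", "y"] + s1 * W["s1", "y"])

# Arbitrary example weights, one per edge of the network.
W = {
    ("x0", "r0"): 0.1, ("x1", "r0"): -0.2,
    ("x0", "r1"): 0.4, ("x1", "r1"): 0.3,
    ("r0", "s0"): -0.5, ("r1", "s0"): 0.6,
    ("r0", "s1"): 0.7, ("r1", "s1"): -0.8,
    ("s0", "y"): 0.9, ("s1", "y"): 1.0,
}

print(forward(0.5, -1.0, W))
```

Each intermediate value is computed once and then reused by every unit of the next layer.
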

Substituting the hidden activations into the output gives a single nested expression for $y$:

$$
\begin{align}
y = g_y\Big(& g_s\big(g_r(x_0 W_{x_0, r_0} + x_1 W_{x_1, r_0}) W_{r_0, s_0} + g_r(x_0 W_{x_0, r_1} + x_1 W_{x_1, r_1}) W_{r_1, s_0}\big) W_{s_0, y} +\\& g_s\big(g_r(x_0 W_{x_0, r_0} + x_1 W_{x_1, r_0}) W_{r_0, s_1} + g_r(x_0 W_{x_0, r_1} + x_1 W_{x_1, r_1}) W_{r_1, s_1}\big) W_{s_1, y}\Big)
\end{align}
$$

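The same expression with the first-layer activations $r_0$ and $r_1$ highlighted in blue; each of them appears twice:

$$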
\begin{align}
y = g_y \Big(& g_s \big( \color{blue}{g_r(x_0 W_{x_0, r_0} + x_1 W_{x_1, r_0})} W_{r_0, s_0} + \color{blue}{g_r(x_0 W_{x_0, r_1} + x_1 W_{x_1, r_1})} W_{r_1, s_0} \big) W_{s_0, y} +\\& g_s \big( \color{blue}{g_r(x_0 W_{x_0, r_0} + x_1 W_{x_1, r_0})} W_{r_0, s_1} + \color{blue}{g_r(x_0 W_{x_0, r_1} + x_1 W_{x_1, r_1})} W_{r_1, s_1} \big) W_{s_1, y} \Big)
\end{align}
$$
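
The repeated blue terms are the reason to evaluate the network layer by layer instead of as one nested expression. The sketch below counts how often $g_r$ is evaluated in each style; the counter, the function names, the weights, and the choice of `tanh` and identity activations are assumptions made for illustration.

```python
import math

calls = {"g_r": 0}

def g_r(a):
    calls["g_r"] += 1          # count every evaluation of a blue term
    return math.tanh(a)

g_s = math.tanh

def g_y(a):
    return a

def forward_expanded(x0, x1, W):
    """Single nested expression: every occurrence of a blue term is re-evaluated."""
    return g_y(
        g_s(
            g_r(x0 * W["x0", "r0"] + x1 * W["x1", "r0"]) * W["r0", "s0"]
            + g_r(x0 * W["x0", "r1"] + x1 * W["x1", "r1"]) * W["r1", "s0"]
        ) * W["s0", "y"]
        + g_s(
            g_r(x0 * W["x0", "r0"] + x1 * W["x1", "r0"]) * W["r0", "s1"]
            + g_r(x0 * W["x0", "r1"] + x1 * W["x1", "r1"]) * W["r1", "s1"]
        ) * W["s1", "y"]
    )

def forward_layered(x0, x1, W):
    """Layer by layer: r0 and r1 are computed once and shared by s0 and s1."""
    r0 = g_r(x0 * W["x0", "r0"] + x1 * W["x1", "r0"])
    r1 = g_r(x0 * W["x0", "r1"] + x1 * W["x1", "r1"])
    s0 = g_s(r0 * W["r0", "s0"] + r1 * W["r1", "s0"])
    s1 = g_s(r0 * W["r0", "s1"] + r1 * W["r1", "s1"])
    return g_y(s0 * W["s0", "y"] + s1 * W["s1", "y"])

def run(forward_fn, x0, x1, W):
    """Return the output together with the number of g_r evaluations it needed."""
    calls["g_r"] = 0
    y = forward_fn(x0, x1, W)
    return y, calls["g_r"]

W = {
    ("x0", "r0"): 0.1, ("x1", "r0"): -0.2,
    ("x0", "r1"): 0.4, ("x1", "r1"): 0.3,
    ("r0", "s0"): -0.5, ("r1", "s0"): 0.6,
    ("r0", "s1"): 0.7, ("r1", "s1"): -0.8,
    ("s0", "y"): 0.9, ("s1", "y"): 1.0,
}

print(run(forward_expanded, 0.5, -1.0, W)[1])  # 4 evaluations of g_r
print(run(forward_layered, 0.5, -1.0, W)[1])   # 2 evaluations of g_r
```

In this small network the saving is only a factor of two, but the duplication in the nested form compounds with every additional layer.
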

## Backward pass


The gradients are written per edge, with $(a \to b)$ standing for the weight on the edge from unit $a$ to unit $b$. The blue terms in the first-layer gradients are the right-hand sides already obtained for the $r \to s$ edges, so they can be reused instead of being recomputed.

$$
\begin{align}
%\frac{\partial f}{\partial w(s_0 \to y)} = & f'(s_0 \to y)\\
%\frac{\partial f}{\partial w(s_1 \to y)} = & f'(s_1 \to y)\\
\frac{\partial f}{\partial (r_0 \to s_0)} = & \frac{\partial f}{\partial (s_0 \to y)} + r_0 \frac{\partial g_s}{\partial (r_0 \to s_0)}\\
\frac{\partial f}{\partial (r_0 \to s_1)} = & \frac{\partial f}{\partial (s_1 \to y)} + r_0 \frac{\partial g_s}{\partial (r_0 \to s_1)}\\
\frac{\partial f}{\partial (r_1 \to s_0)} = & \frac{\partial f}{\partial (s_0 \to y)} + r_1 \frac{\partial g_s}{\partial (r_1 \to s_0)}\\
\frac{\partial f}{\partial (r_1 \to s_1)} = & \frac{\partial f}{\partial (s_1 \to y)} + r_1 \frac{\partial g_s}{\partial (r_1 \to s_1)}\\
\frac{\partial f}{\partial (x_0 \to r_0)} = & x_0 \frac{\partial g_r}{\partial (x_0 \to r_0)} \Bigg( \color{blue}{\Big( \frac{\partial f}{\partial (s_0 \to y)} + r_0 \frac{\partial g_s}{\partial (r_0 \to s_0)} \Big)} + \color{blue}{\Big( \frac{\partial f}{\partial (s_1 \to y)} + r_0 \frac{\partial g_s}{\partial (r_0 \to s_1)} \Big)} \Bigg)\\
\frac{\partial f}{\partial (x_0 \to r_1)} = & x_0 \frac{\partial g_r}{\partial (x_0 \to r_1)} \Bigg( \color{blue}{\Big( \frac{\partial f}{\partial (s_0 \to y)} + r_1 \frac{\partial g_s}{\partial (r_1 \to s_0)} \Big)} + \color{blue}{\Big( \frac{\partial f}{\partial (s_1 \to y)} + r_1 \frac{\partial g_s}{\partial (r_1 \to s_1)} \Big)} \Bigg)\\
\frac{\partial f}{\partial (x_1 \to r_0)} = & x_1 \frac{\partial g_r}{\partial (x_1 \to r_0)} \Bigg( \color{blue}{\Big( \frac{\partial f}{\partial (s_0 \to y)} + r_0 \frac{\partial g_s}{\partial (r_0 \to s_0)} \Big)} + \color{blue}{\Big( \frac{\partial f}{\partial (s_1 \to y)} + r_0 \frac{\partial g_s}{\partial (r_0 \to s_1)} \Big)} \Bigg)\\
\frac{\partial f}{\partial (x_1 \to r_1)} = & x_1 \frac{\partial g_r}{\partial (x_1 \to r_1)} \Bigg( \color{blue}{\Big( \frac{\partial f}{\partial (s_0 \to y)} + r_1 \frac{\partial g_s}{\partial (r_1 \to s_0)} \Big)} + \color{blue}{\Big( \frac{\partial f}{\partial (s_1 \to y)} + r_1 \frac{\partial g_s}{\partial (r_1 \to s_1)} \Big)} \Bigg)\\
\end{align}
$$
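
The edge notation above is schematic. As a concrete check, the sketch below spells out the chain rule for the same network, where the factors along one path from a weight to the output multiply and the contributions of the two paths through $s_0$ and $s_1$ add. It assumes that $f$ is the network output $y$, that $g_r$ and $g_s$ are `tanh`, and that $g_y$ is the identity; none of this is fixed by the notes, and all function and variable names are hypothetical. A finite-difference estimate is included to test the analytic gradients.

```python
import math

HIDDEN_R = ("r0", "r1")
HIDDEN_S = ("s0", "s1")
INPUTS = ("x0", "x1")

def forward(x, W):
    """Forward pass that also returns every activation for use in backward()."""
    act = {"x0": x[0], "x1": x[1]}
    for r in HIDDEN_R:
        act[r] = math.tanh(sum(act[i] * W[i, r] for i in INPUTS))
    for s in HIDDEN_S:
        act[s] = math.tanh(sum(act[r] * W[r, s] for r in HIDDEN_R))
    y = sum(act[s] * W[s, "y"] for s in HIDDEN_S)   # identity output (assumption)
    return y, act

def backward(W, act):
    """Gradient of y with respect to every weight, via the chain rule."""
    # delta[u] = dy / d(pre-activation of u); tanh'(a) = 1 - tanh(a)**2
    delta_y = 1.0
    delta_s = {s: delta_y * W[s, "y"] * (1.0 - act[s] ** 2) for s in HIDDEN_S}
    delta_r = {
        r: sum(delta_s[s] * W[r, s] for s in HIDDEN_S) * (1.0 - act[r] ** 2)
        for r in HIDDEN_R
    }
    grad = {}
    for s in HIDDEN_S:
        grad[s, "y"] = act[s] * delta_y
    for r in HIDDEN_R:
        for s in HIDDEN_S:
            grad[r, s] = act[r] * delta_s[s]        # reuses delta_s
    for i in INPUTS:
        for r in HIDDEN_R:
            grad[i, r] = act[i] * delta_r[r]        # delta_r sums over both paths
    return grad

def numeric_grad(x, W, key, eps=1e-6):
    """Central finite difference of y with respect to one weight."""
    W_plus, W_minus = dict(W), dict(W)
    W_plus[key] += eps
    W_minus[key] -= eps
    return (forward(x, W_plus)[0] - forward(x, W_minus)[0]) / (2 * eps)

W = {
    ("x0", "r0"): 0.1, ("x1", "r0"): -0.2,
    ("x0", "r1"): 0.4, ("x1", "r1"): 0.3,
    ("r0", "s0"): -0.5, ("r1", "s0"): 0.6,
    ("r0", "s1"): 0.7, ("r1", "s1"): -0.8,
    ("s0", "y"): 0.9, ("s1", "y"): 1.0,
}
x = (0.5, -1.0)

y, act = forward(x, W)
grad = backward(W, act)
for key in sorted(W):
    print(key, grad[key], numeric_grad(x, W, key))
```

The `delta_s` values are computed once and reused for all four $r \to s$ weights and again inside `delta_r`, which mirrors the reuse of the blue terms in the equations above.
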