## Model

### Assumptions
* Neuron values are either 0 or 1 (binary). 
* Weights are real-valued and possibly bounded.
* The bias is fixed for now, to keep things concrete (subject to change).

### Model overview
Let $H(x)$ be the binary step function 
$$
H(x):=\begin{cases}1 \quad x>0\\0 \quad x\leq 0\end{cases}.
$$

The *forward pass* of one layer reads 
$$
L_i(x,w) := H\left( \sum_{j}  w[i,j]\cdot x[j] - B \right),
$$
where $x$ are the inputs, or neuron values, and $w$ are the weights, that is, the synaptic values. The bias $B$ is a positive constant. We write $L(x,w) := (L_i(x,w))_i$ for the output of the whole layer.
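
The layer formula can be sketched in plain Python as follows. The names `H` and `layer` mirror $H$ and the layer map; the concrete weights and the bias `B = 0.5` are illustrative assumptions, not values fixed by the model.

```python
def H(v):
    """Binary step function: 1 if v > 0, else 0."""
    return 1 if v > 0 else 0


def layer(x, w, B):
    """Forward pass of one layer; w[i][j] connects input j to output i."""
    return [H(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) - B) for row in w]


# Illustrative values (assumptions): two inputs, two outputs, bias 0.5.
x = [1, 0]
w = [[0.8, 0.3],
     [0.2, 0.9]]
y = layer(x, w, B=0.5)
print(y)  # [1, 0]
```

Since $B>0$, an all-zero input always produces an all-zero output in this sketch.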

From inputs to final outputs with $n$ layers, the forward pass looks like 
$$
x_0 \xrightarrow{w_0} x_1 \xrightarrow{w_1} x_2 \rightarrow \dots \rightarrow x_{n-1} \xrightarrow{w_{n-1}} x_n,
$$
with 
$$
x_i = L(x_{i-1}, w_{i-1}), \quad x_0 = \text{given inputs}.
$$
Again, $x$ has only entries in $\{0, 1\}$, and $w$ has floats as entries (bounded?).  
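
A minimal sketch of the chained forward pass, assuming each layer is stored as a list of rows and that all intermediate activations are kept. The helper name `forward` and the numeric values are assumptions made for illustration.

```python
def H(v):
    """Binary step function."""
    return 1 if v > 0 else 0


def layer(x, w, B):
    """One layer: w[i][j] connects input j to output i."""
    return [H(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) - B) for row in w]


def forward(x0, weights, B):
    """Return [x_0, x_1, ..., x_n] for weights = [w_0, ..., w_{n-1}]."""
    xs = [x0]
    for w in weights:
        xs.append(layer(xs[-1], w, B))
    return xs


# Illustrative chain of two 1x1 layers (assumed values).
weights = [[[0.8]], [[0.7]]]
xs = forward([1], weights, B=0.5)
print(xs)  # [[1], [1], [1]]
```

Keeping every $x_i$ is a design choice made here because the backward pass uses both $x_{i+1}$ and $x_i$ at each step.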

The *backward pass* is given by
$$
gx_n \xrightarrow{gw_{n-1}} gx_{n-1} \xrightarrow{gw_{n-2}} gx_{n-2} \rightarrow \dots \rightarrow gx_{1} \xrightarrow{gw_{0}} gx_0,
$$
where 
$$
\begin{split}
gx_i &= G(gx_{i+1}, x_{i+1}, x_{i}, w_{i})\\
gw_i &= W(gx_{i+1}, x_{i+1}, x_i) \\
gx_n &= Eq(\text{desired outputs}, x_n).
\end{split}
$$
We define $Eq$, $W$, and $G$ below.

As in the forward pass, $gx$ has entries in $\{0, 1\}$, and $gw$ has floats as entries. Moreover, the shape of $x_i$ equals the shape of $gx_i$, and the shape of $w_i$ equals the shape of $gw_i$.
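
The recursion can be sketched in Python as below. The compact `Eq`, `W`, and `G` follow the formulas in the section "Backward pass formulas"; the function name `backward`, the argument `x_prev` (standing for the input side of a layer), and the values `B = 0.5`, `Q = 0.5` are assumptions for illustration only.

```python
def H(v):
    return 1 if v > 0 else 0


def Eq(x, y):
    return [0 if a == b else 1 for a, b in zip(x, y)]


def P(a, b, Q):
    # P(0,1)=1, P(1,1)=-1, P(0,0)=Q, P(1,0)=-Q
    return (1.0 if a == 0 else -1.0) * (1.0 if b == 1 else Q)


def W(gx, x, x_prev, Q):
    return [[P(xi, xj, Q) if g else 0.0 for xj in x_prev]
            for g, xi in zip(gx, x)]


def G(gx, x, x_prev, w, B):
    return [H(sum(g * (1 if xi == xj else -1) * row[j]
                  for g, xi, row in zip(gx, x, w)) - B)
            for j, xj in enumerate(x_prev)]


def backward(xs, weights, desired, B, Q):
    """xs = [x_0, ..., x_n]; returns ([gx_0, ..., gx_n], [gw_0, ..., gw_{n-1}])."""
    n = len(weights)
    gxs = [None] * (n + 1)
    gws = [None] * n
    gxs[n] = Eq(desired, xs[n])
    for i in reversed(range(n)):
        gws[i] = W(gxs[i + 1], xs[i + 1], xs[i], Q)
        gxs[i] = G(gxs[i + 1], xs[i + 1], xs[i], weights[i], B)
    return gxs, gws


# Illustrative two-layer chain whose forward pass gives xs = [[1], [0], [0]].
xs = [[1], [0], [0]]
weights = [[[0.3]], [[0.8]]]
gxs, gws = backward(xs, weights, desired=[1], B=0.5, Q=0.5)
print(gxs)  # [[0], [1], [1]]
print(gws)  # [[[1.0]], [[0.5]]]
```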

### Backward pass formulas

#### Eq
For $x, y \in \{0, 1\}$, we set
$$
Eq(x, y) = \begin{cases} 0 \quad  &x=y, \\ 1 \quad &x\neq y. \end{cases} 
$$
This is generalized to $x$ and $y$ of the same shape by applying entry by entry. The signature is
```
Eq: (bool, shape) x (bool, shape) -> (bool, shape)  
```
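
As a small sketch, assuming vectors are stored as flat Python lists of 0/1 values, $Eq$ can be written entry by entry. On $\{0, 1\}$ it coincides with XOR.

```python
def Eq(x, y):
    """Entrywise mismatch indicator for equal-length 0/1 sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same shape")
    return [0 if a == b else 1 for a, b in zip(x, y)]


desired = [1, 0, 1]
actual = [1, 1, 0]
print(Eq(desired, actual))  # [0, 1, 1]
```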

#### W
Let $n$ be the shape of $x$ and $gx$, and let $m$ be the shape of $x'$. We define $W(gx, x, x')$, which is of shape $n\times m$, by 
$$
W(gx, x, x') := \left( gx[i] \cdot P(x[i], x'[j]) \right)_{i,j},
$$
where 
$$
P(0, 1) = 1, \quad P(1, 1) = -1, \quad P(0, 0) = Q, \quad P(1, 0) = -Q,  
$$
with a positive constant $Q<1$.

The signature is
```
W: (bool, n) x (bool, n) x (bool, m) -> ({-1, -Q, 0, Q, 1}, n x m).  
```
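
A minimal Python sketch of $W$, assuming flat lists for the vectors and a list of rows for the result. The argument name `x_prev` stands for $x'$, and `Q = 0.5` is an assumed value.

```python
def P(a, b, Q):
    """P(0,1)=1, P(1,1)=-1, P(0,0)=Q, P(1,0)=-Q."""
    sign = 1.0 if a == 0 else -1.0
    magnitude = 1.0 if b == 1 else Q
    return sign * magnitude


def W(gx, x, x_prev, Q):
    """Weight gradient of shape n x m; x_prev plays the role of x'."""
    return [[P(x_i, x_j, Q) if g_i else 0.0 for x_j in x_prev]
            for g_i, x_i in zip(gx, x)]


# Illustrative call with the assumed value Q = 0.5.
gw = W(gx=[1, 0], x=[0, 1], x_prev=[1, 0], Q=0.5)
print(gw)  # [[1.0, 0.5], [0.0, 0.0]]
```

Rows with $gx[i]=0$ are zero, so only outputs flagged as wrong contribute to the weight gradient.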

#### G
Let $n$ be the shape of $x$ and $gx$, and $m$ the shape of $x'$. Then $w'$ must have shape $n\times m$. We define 
$$
G(gx, x, x', w') = \left(H\left( \sum_{i} gx[i]\cdot Seq(x[i], x'[j])  \cdot w'[i,j] - B\right) \right)_{j},
$$
where 
$$
Seq(x,y) = \begin{cases} 1 \quad  &x=y, \\ -1 \quad &x\neq y. \end{cases}
$$
The signature is 
```
G: (bool, n) x (bool, n) x (bool, m) x (float, n x m) -> (bool, m).
```
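
A minimal Python sketch of $G$, assuming flat lists for vectors and a list of rows for the weights. The argument name `x_prev` stands for $x'$, and the numeric values are assumptions.

```python
def H(v):
    return 1 if v > 0 else 0


def Seq(a, b):
    """+1 if the two bits agree, -1 otherwise."""
    return 1 if a == b else -1


def G(gx, x, x_prev, w, B):
    """Propagate the error flags gx (shape n) back to the inputs (shape m)."""
    n, m = len(x), len(x_prev)
    return [H(sum(gx[i] * Seq(x[i], x_prev[j]) * w[i][j] for i in range(n)) - B)
            for j in range(m)]


# Illustrative values (assumptions): one output flagged as wrong, two inputs.
w = [[0.8, 0.8]]
print(G(gx=[1], x=[0], x_prev=[0, 1], w=w, B=0.5))  # [1, 0]
```

In this call only the input that agrees with the wrong output is flagged, because $Seq$ turns the contribution of the disagreeing input negative.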

### Examples

All examples start with random initialisation of weights in the range $(0, 1)$.

#### Copy one neuron

Architecture: `input_neuron -> output_neuron` 

Desired behavior: `0->0, 1->1`

Explicitly, the forward pass is $y=H(x\cdot w-B)$. If $x=0$, then $y=0$, which is the desired behavior. Suppose $x=1$ and $y=0$, while the desired value is $y'=1$. In the backward pass, the first gradient is $gy= Eq(y', y)=1$. The weight gradient reads $gw = W(gy, y, x)= gy\cdot P(0, 1) = 1$, which points toward increasing $w$, the correct behavior. 

Once correct behavior is established, the weight gradients are zero. 
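
The example can be simulated with a short loop. This is only a sketch under assumptions that are not fixed by the text: the update rule `w += lr * gw`, the learning rate `lr = 0.1`, the constants `B = 0.5` and `Q = 0.5`, and a fixed starting weight `0.2` in place of a random one, so that the run is reproducible.

```python
def H(v):
    return 1 if v > 0 else 0


def P(a, b, Q):
    return (1.0 if a == 0 else -1.0) * (1.0 if b == 1 else Q)


def train_copy(w, B=0.5, Q=0.5, lr=0.1, epochs=20):
    """Train a single weight so that the output copies the input."""
    for _ in range(epochs):
        for x, desired in [(0, 0), (1, 1)]:
            y = H(x * w - B)               # forward pass
            gy = 0 if desired == y else 1  # Eq(desired, y)
            gw = gy * P(y, x, Q)           # W(gy, y, x)
            w += lr * gw                   # assumed update rule
    return w


w = train_copy(w=0.2)
print([H(x * w - 0.5) for x in (0, 1)])  # [0, 1]
```

Calling `train_copy` again on the trained weight leaves it unchanged, since every $gy$ is then zero.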

#### Copy one neuron with a neuron in between

Architecture: `input_neuron -> other_neuron -> output_neuron` 

Desired behavior: `0->0, 1->1`

The backpropagation of 
$$
x_\text{in} \xrightarrow{w_\text{in}} x_\text{other} \xrightarrow{w_\text{other}} x_{\text{out}}
$$
is computed as follows:
$$
gx_\text{out} \xrightarrow{gw_\text{other}} gx_\text{other} \xrightarrow{gw_\text{in}} gx_\text{in}.
$$
Independent of the value of $x_{\text{other}}$, $gw_\text{other}$ is positive whenever $x_\text{out}=0$ but the desired output is $1$, since $P(0,\cdot)>0$. Once $w_\text{other}$ exceeds $B$, $gx_\text{other}$ becomes $1$ in the case that $x_\text{out}=0$ is not desired and $x_{\text{other}}=0$. This makes $gw_\text{in}$ positive, which fixes $w_\text{in}$, so the model solves the task.
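
The two-stage argument can be checked with a small simulation. It is a sketch under assumptions not fixed by the text: the update rule `w += lr * gw`, `lr = 0.1`, `B = 0.5`, `Q = 0.5`, and fixed starting weights `0.2` and `0.3` instead of random ones.

```python
def H(v):
    return 1 if v > 0 else 0


def P(a, b, Q):
    return (1.0 if a == 0 else -1.0) * (1.0 if b == 1 else Q)


def Seq(a, b):
    return 1 if a == b else -1


def predict(x_in, w_in, w_other, B):
    """Forward pass; returns (x_other, x_out)."""
    x_other = H(x_in * w_in - B)
    return x_other, H(x_other * w_other - B)


def train_chain(w_in, w_other, B=0.5, Q=0.5, lr=0.1, epochs=50):
    for _ in range(epochs):
        for x_in, desired in [(0, 0), (1, 1)]:
            x_other, x_out = predict(x_in, w_in, w_other, B)
            gx_out = 0 if desired == x_out else 1                      # Eq
            gw_other = gx_out * P(x_out, x_other, Q)                   # W
            gx_other = H(gx_out * Seq(x_out, x_other) * w_other - B)   # G
            gw_in = gx_other * P(x_other, x_in, Q)                     # W
            w_other += lr * gw_other   # assumed update rule
            w_in += lr * gw_in
    return w_in, w_other


w_in, w_other = train_chain(w_in=0.2, w_other=0.3)
print([predict(x, w_in, w_other, 0.5)[1] for x in (0, 1)])  # [0, 1]
```

In this run $w_\text{other}$ grows first, in steps of `lr * Q`, and $w_\text{in}$ starts to move only after $w_\text{other}$ has passed $B$.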



#### Always disagree with one neuron

Architecture: `input_neuron -> output_neuron` 

Desired behavior: `1->0`. Note that `0->0` is automatic. 

Explicitly, the forward pass is $y=H(x\cdot w-B)$. If $x=1$ and $y=1$, then $gy = Eq(0, 1) = 1$ and the weight gradient is $gw = W(gy, y, x) = gy\cdot P(1, 1) = -1$, pushing the weight in the negative direction, which is the correct behavior. 

#### Switch positions

Architecture: `input_neuron_0, input_neuron_1  -> output_neuron_0, output_neuron_1` 

Desired behavior: `I:(1,0)->(0,1), II:(0,1)->(1,0), III:(1,1)->(1,1)`

Explicitly, for forward $y_i=H(x_0\cdot w_{i,0} + x_1\cdot w_{i,1} - B)$. Let's consider the weights one by one:
* $w_{0,0}$: `I` and `III` send contradictory impulses at first, so there is no net change.
* $w_{0,1}$: increases, which fixes `II` and contributes strongly to `III`. This allows $w_{0,0}$ to focus on `I` and fix it. 
* $w_{1,0}$: same as $w_{0,1}$.
* $w_{1,1}$: same as $w_{0,0}$.
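
A single run of this example can be reproduced with the sketch below. It assumes the update rule `w += lr * gw`, the constants `lr = 0.1`, `B = 0.5`, `Q = 0.5`, and one fixed choice of starting weights in $(0, 1)$. It illustrates one trajectory and is not a convergence proof.

```python
def H(v):
    return 1 if v > 0 else 0


def P(a, b, Q):
    return (1.0 if a == 0 else -1.0) * (1.0 if b == 1 else Q)


def forward(x, w, B):
    return [H(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) - B) for row in w]


SAMPLES = [([1, 0], [0, 1]),   # I
           ([0, 1], [1, 0]),   # II
           ([1, 1], [1, 1])]   # III


def train_switch(w, B=0.5, Q=0.5, lr=0.1, epochs=50):
    """Online training: one weight update per sample, in the order I, II, III."""
    for _ in range(epochs):
        for x, desired in SAMPLES:
            y = forward(x, w, B)
            gy = [0 if d == y_i else 1 for d, y_i in zip(desired, y)]  # Eq
            for i, row in enumerate(w):
                for j in range(len(row)):
                    row[j] += lr * gy[i] * P(y[i], x[j], Q)  # assumed update
    return w


w = train_switch([[0.62, 0.23], [0.31, 0.74]])
print([forward(x, w, 0.5) for x, _ in SAMPLES])  # [[0, 1], [1, 0], [1, 1]]
```

With these starting values the off-diagonal weights $w_{0,1}$ and $w_{1,0}$ end above $B$ and the diagonal weights end below it.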