# EE 304 - Neuromorphics: Brains in Silicon


##  Synaptic Weights: Deterministic versus Probabilistic

#### The story thus far

We reviewed how much the four different architectures increase energy consumption and reduce peak throughput when they save area by sharing hardware elements among neural elements. 

We found that Shared Dendrite was the only architecture that could match the Fully Dedicated architecture's AET cost. It amortized its energy and time costs by delivering synaptic input to the $O(\surd N)$ neurons neighboring the one targeted and by using $O(\surd N)$-size RAM banks to store the weights.

####  Outline for this lecture

<img src="files/lecture13/NEF+ClassicNets.pdf" width="840">

We will study how to implement synaptic weights in architectures that use shared synapses (i.e. Shared Synapse and Shared Dendrite).

We will find that a probabilistic approach is compatible with shared synapses. 

However, this compatibility comes at the cost of trading a linear scaling of SNR with spike-rate for a square-root scaling.

## Deterministic Approach

<img src="files/lecture13/DeterministicWeights.pdf" width="1320">

Consider a group of neurons (indexed by $j$) connected to another group of neurons (indexed by $i$) with a weight matrix ${\bf W}$ (indexed by $ij$). Given that neuron $j$ fires its $k^{\rm th}$ spike at $t^j_k$, the input current neuron $i$ receives at any time $t$ is expected to be  

\begin{eqnarray*}
    {\rm E}[I_i(t)] 
           & = & {\rm E}\!\!\left[\sum_j w_{ij}\left(h(t)\!*\!\!\sum_k \delta(t-t_k^j)\right)\right]\\
           & = & \sum_j w_{ij}{\rm E}\!\!\left[\left(\sum_k h(t-t_k^j)\right)\right]\\
           & = & \sum_j w_{ij}\lambda_j(t)
\end{eqnarray*}

when the spikes are filtered by decaying-exponential synapses with time-constant $\tau$ (i.e., $h(t)=u(t)e^{-t/\tau}/\tau$). Here, $\lambda_j(t)$ is neuron $j$'s mean spike-rate over an interval long enough to capture most of $h(t)$'s area. This result follows because $h(t)$ has unit area: the unit areas of the individual spikes sum to a total area equal to the number of spikes observed over that interval, so the total area divided by the elapsed time (the average value) equals the mean spike-rate. For decaying-exponential synapses, an interval of $3\tau$ captures 95% of $h(t)$'s area, since $1-e^{-3}\approx0.95$.  
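The unit-area argument can be checked numerically. The sketch below assumes a single periodic spike train and discretizes the exponential synapse with an exact per-step decay; the helper name `filtered_periodic_train` and the step size `dt` are illustrative choices, not part of the lecture's setup.

```python
import numpy as np

def filtered_periodic_train(rate, tau, t_end, dt=1e-5):
    """Sample h(t) * (periodic spike train) with h(t) = u(t) exp(-t/tau) / tau.

    Hypothetical helper: each spike adds 1/tau to the state, which then decays by
    exp(-dt/tau) per step, so every spike contributes (almost exactly) unit area.
    """
    n_steps = int(round(t_end / dt))
    spikes = np.zeros(n_steps)
    period_steps = int(round(1.0 / (rate * dt)))
    spikes[::period_steps] = 1.0 / tau          # one unit-area kernel per spike
    decay = np.exp(-dt / tau)
    current = np.empty(n_steps)
    state = 0.0
    for n in range(n_steps):
        state = state * decay + spikes[n]
        current[n] = state
    return current

rate, tau, dt = 50.0, 0.02, 1e-5                # 50 spikes/s, 20 ms synapse
current = filtered_periodic_train(rate, tau, t_end=2.0, dt=dt)
settled = current[int(round(10 * tau / dt)):]   # drop the start-up transient
print(settled.mean())                           # close to rate = 50
print(1.0 - np.exp(-3.0))                       # area of h(t) within 3*tau: about 0.95
```

The time-average is taken over a whole number of periods after the start-up transient has decayed, which is why it lands on the spike-rate.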

% Add two pictures: First one will explain this using box-shaped impulse responses. Second one will extend this to exponentially decaying impulse responses

### Variance of Input Current  

<img src="files/lecture13/SumOfUniformSpikeTrains.png" width="1400">

Assuming each neuron $j$'s spike-train is described by a periodic point-process, and that different neurons' spike-trains are independent (so their variances add), the variance in neuron $i$'s input current is
\begin{eqnarray*}
    {\rm Var}[I_i(t)] 
    & = & {\rm Var}\!\!\left[\sum_j w_{ij}\left(h(t)\!*\!\!\sum_k \delta(t-t_k^j)\right)\right]\\
    & = & \sum_j w_{ij}^2{\rm Var}\!\!\left[\left(\sum_k h(t-t_k^j)\right)\right]\\
    & = & \sum_j w_{ij}^2\!\!\left\{\lambda_j^2\!\left(\!\frac{1}{2\tau \lambda_j}\coth\!\left(\!\frac{1}{2\tau\lambda_j}\!\right)-1\!\right)\right\}
\end{eqnarray*}
where $\lambda_j(t)$'s time-dependence is suppressed for brevity. For details, see <a href="http://nbviewer.ipython.org/github/fragapanagos/notebooks/blob/master/synapse/synapse_SNR_periodic.ipynb">Sam's Periodic Spikes Notebook</a>. The expression in curly brackets is plotted above (middle panel; the $j$ index is dropped). Note that it increases linearly with $\lambda$ when $\tau\lambda<0.1$ and asymptotes to $1/(12\tau^2)$ ($\approx0.0833/\tau^2$) when $\tau\lambda>1$.
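The closed-form variance can be cross-checked by sampling. This sketch assumes a single periodic train with a uniformly random phase (so the time $s$ since the last spike is uniform over one period), in which case all past spikes sum to a geometric series of kernels; the function names are hypothetical.

```python
import numpy as np

def periodic_variance(rate, tau):
    """Closed form: rate^2 * (x*coth(x) - 1) with x = 1/(2*tau*rate)."""
    x = 1.0 / (2.0 * tau * rate)
    return rate**2 * (x / np.tanh(x) - 1.0)

def periodic_variance_mc(rate, tau, n_samples=200_000, seed=0):
    """Monte Carlo over a uniformly random phase of the periodic train."""
    rng = np.random.default_rng(seed)
    period = 1.0 / rate
    s = rng.uniform(0.0, period, n_samples)   # time since the most recent spike
    # Spikes at ages s, s+T, s+2T, ... sum to a geometric series of kernels.
    current = np.exp(-s / tau) / (tau * (1.0 - np.exp(-period / tau)))
    return current.var()

tau = 0.01                                    # 10 ms synapse
for tau_rate in (0.1, 1.0, 10.0):
    rate = tau_rate / tau
    print(tau_rate, periodic_variance(rate, tau), periodic_variance_mc(rate, tau))

# High-rate asymptote: the variance approaches 1/(12*tau^2).
print(periodic_variance(100.0 / tau, tau), 1.0 / (12.0 * tau**2))
```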

% Mention rule for how variances sum and stress that variables must be independent

### Asymptotic Approximations

In the two asymptotic cases, where neuron $j$ fires much more or much less than one spike in $\tau$ seconds, the variance's expression can be simplified:

$$
{\rm Var}[I_i(t)] \approx 
\left\{ 
  \begin{matrix}
    \frac{1}{12\tau^2}\!\sum_j w_{ij}^2 & & \tau\lambda_j \gg 1\\
    \frac{1}{2\tau}\!\sum_j \lambda_j w_{ij}^2 & & \tau \lambda_j \ll 1
  \end{matrix} \right.
$$

% To show the first result, write the coth as exponentials, multiply the numerator and denominator by exp(-1/2\tau\lambda), and Taylor-series expand the numerator and denominator up to second and third order, respectively. 

% Point out that second result is what is expected for Poisson and link to Sam's Poisson notebook: http://nbviewer.ipython.org/github/fragapanagos/notebooks/blob/master/synapse/synapse_SNR_poisson.ipynb
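One way to obtain these limits, writing $x = 1/(2\tau\lambda_j)$ as a shorthand introduced here only for brevity, uses the standard series $x\coth x = 1 + x^2/3 - x^4/45 + \cdots$ for small $x$ and $\coth x \to 1$ for large $x$:

\begin{eqnarray*}
    \tau\lambda_j \gg 1 \;(x \ll 1): \quad
    \lambda_j^2\left(x\coth x - 1\right) & \approx & \lambda_j^2\,\frac{x^2}{3} = \frac{\lambda_j^2}{3\cdot4\tau^2\lambda_j^2} = \frac{1}{12\tau^2}\\
    \tau\lambda_j \ll 1 \;(x \gg 1): \quad
    \lambda_j^2\left(x\coth x - 1\right) & \approx & \lambda_j^2\,(x - 1) \approx \lambda_j^2\,x = \frac{\lambda_j}{2\tau}
\end{eqnarray*}

Multiplying each limit by $w_{ij}^2$ and summing over $j$ gives the two cases of the approximation.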

To gain some intuition for what is going on, assume all the weights are equal ($w_{ij}=w$) and compute the signal-to-noise ratio (SNR), defined as the mean divided by the standard deviation. In both cases, express the total spike-rate as the product of the population's average spike-rate, $\lambda$, and its neuron count, $N$: 

- For the **high spike-rate** regime 
$$
    \frac{{\rm E}[I_i(t)]}{\sqrt{{\rm Var}[I_i(t)]}} 
    \approx \frac{w\!\sum_j\!\lambda_j}{\sqrt{N w^2/(12\tau^2)}} 
    = 2\tau\sqrt\frac{3}{N}\!\sum_{j=1}^{N}\!\lambda_j
    = 2\tau\lambda\sqrt{3N}
$$
    - Increases linearly with $\lambda$, the average spike-rate
    - Increases as the square-root of $N$, the neuron count 

- For the **low spike-rate** regime 
$$
    \frac{{\rm E}[I_i(t)]}{\sqrt{{\rm Var}[I_i(t)]}} 
    \approx \frac{w\!\sum_j\!\lambda_j}{\sqrt{w^2\!\sum_j\!\lambda_j/(2\tau)}} 
    = \sqrt{2\tau}\surd\!\sum_{j=1}^{N}\!\lambda_j
    = \sqrt{2\tau\lambda N}
$$
    - Increases as the square-root of $\lambda$, the average spike-rate
    - Increases as the square-root of $N$, the neuron count
        - Doubling $\lambda$ while keeping $N$ constant is equivalent to doubling $N$ while keeping $\lambda$ constant

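Because the high-rate SNR grows linearly with $\lambda$, rather than as its square root, dividing $2\tau\lambda\sqrt{3N}$ by $\sqrt{2\tau\lambda N}$ shows that the high-rate expression is $\sqrt{6\tau\lambda}$ times larger, a large factor since $\tau\lambda\gg1$ in that regime. 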
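The two asymptotic SNR formulas can be compared against the exact expression built from the periodic variance. This sketch assumes $N$ independent periodic trains with equal weights (so $w$ cancels out of the SNR); the function names and the example values of $\tau$ and $N$ are hypothetical.

```python
import numpy as np

def snr_deterministic(rate, n_neurons, tau):
    """Exact SNR for N independent periodic trains with equal weights (w cancels)."""
    x = 1.0 / (2.0 * tau * rate)
    var_per_train = rate**2 * (x / np.tanh(x) - 1.0)
    return n_neurons * rate / np.sqrt(n_neurons * var_per_train)

def snr_high_rate(rate, n_neurons, tau):
    return 2.0 * tau * rate * np.sqrt(3.0 * n_neurons)

def snr_low_rate(rate, n_neurons, tau):
    return np.sqrt(2.0 * tau * rate * n_neurons)

tau, n_neurons = 0.01, 100
for rate in (1.0, 10_000.0):                  # tau*rate = 0.01 and 100
    print(rate, snr_deterministic(rate, n_neurons, tau),
          snr_high_rate(rate, n_neurons, tau), snr_low_rate(rate, n_neurons, tau))

# Ratio of the two asymptotic formulas equals sqrt(6*tau*rate).
rate = 10_000.0
print(snr_high_rate(rate, n_neurons, tau) / snr_low_rate(rate, n_neurons, tau),
      np.sqrt(6.0 * tau * rate))
```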

% Add plots to develop intuition for the two cases (a family of SNR vs. \lambda plots for different values of N) and work through some design examples 

## Probabilistic Approach

<img src="files/lecture13/ProbabilisticWeights.pdf" width="1320">

An alternative approach to implementing synaptic weights is to deliver neuron $j$'s spike to neuron $i$ with probability equal to $w_{ij}$. Spikes that get through are weighted equally, making this scheme compatible with hardware that uses shared-synapse circuits. Let $I_{ij}^k \in \{0,1\}$ indicate whether neuron $j$'s $k^{\rm th}$ spike made it through to neuron $i$. Then the current neuron $i$ receives at any time $t$ is expected to be  

\begin{eqnarray*}
    {\rm E}[I_i(t)] 
           & = & {\rm E}\!\!\left[\sum_j \left(h(t)\!*\!\!\sum_k \!I_{ij}^k\delta(t-t_k^j)\right)\right]\\
           & = & \sum_j {\rm E}\!\!\left[\left(\sum_k \!I_{ij}^kh(t-t_k^j)\right)\right]\\
           & = & \sum_j {\rm E}[I_{ij}^k]\lambda_j(t)
\end{eqnarray*}

Therefore, neuron $j$'s spike-rate is weighted correctly if $I_{ij}^k$ takes on the value 1 with probability $w_{ij}$ and the value 0 with probability $1-w_{ij}$. Mathematicians have a name for this process: **p-thinning**. Note that, for this to work, no weight may exceed 1 (and, being probabilities, none may be negative). If this is not the case, the weights have to be rescaled appropriately and the scale factor accounted for in the sum above.   
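p-thinning is straightforward to simulate. The sketch below assumes a periodic source train and uses NumPy's random generator in place of the hardware's random-number source; `p_thin` is a hypothetical helper name.

```python
import numpy as np

def p_thin(spike_times, p, rng):
    """Keep each spike independently with probability p (hypothetical helper)."""
    keep = rng.random(len(spike_times)) < p
    return spike_times[keep]

rng = np.random.default_rng(1)
rate, t_end, w = 100.0, 100.0, 0.3
spikes = np.arange(0.0, t_end, 1.0 / rate)    # periodic train, about 10,000 spikes
thinned = p_thin(spikes, w, rng)
print(len(spikes), len(thinned), len(thinned) / t_end)   # thinned rate near w*rate = 30
```

Each spike gets its own independent keep/drop decision, which is what makes the expected weighted rate $w\lambda$.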

% Add picture with original spike train, values of indicator variable, and thinned spike-train, all lined up vertically.  

### Variance of Input Current

<img src="files/lecture13/p-thinned-uniform_09.png" width="1000">
<img src="files/lecture13/p-thinned-uniform_05.png" width="1000">
<img src="files/lecture13/p-thinned-uniform_01.png" width="1000">

Assuming neuron $j$'s spike-train is periodic with rate $\lambda_j$ and $w_{ij}$-thinned, and that different neurons' spike-trains and thinning decisions are independent, the variance in neuron $i$'s input current is
\begin{eqnarray*}
    {\rm Var}[I_i(t)] 
    & = & {\rm Var}\!\!\left[\sum_j \left(h(t)\!*\!\!\sum_k I_{ij}^k\delta(t-t_k^j)\right)\right]\\
    & = & \sum_j {\rm Var}\!\!\left[\left(\sum_k I_{ij}^k h(t-t_k^j)\right)\right]\\
    & = & \sum_j \left[w_{ij}^2\lambda_j^2\!\left(\!\frac{1}{2\tau\lambda_j}\coth\!\left(\!\frac{1}{2\tau\lambda_j}\!\right)-1\!\right)
          + w_{ij}(1-w_{ij})\!\left(\!\frac{\lambda_j}{2\tau}\!\right)\right]
\end{eqnarray*}
where $\lambda_j(t)$'s time-dependence is suppressed. For details, see <a href="http://nbviewer.ipython.org/github/fragapanagos/notebooks/blob/master/synapse/synapse_SNR_periodic_pthinned.ipynb">Sam's p-Thinned Periodic Spikes Notebook</a>. One component of the sum is plotted above (middle column; $w_{ij}$ is replaced by $p=0.9$, $0.5$, and $0.1$). 


In each of the sum's components, which correspond to the contribution of a single spike-train, the first term is identical to the expression we obtained previously. Thus, the second term accounts entirely for the variance added by $p$-thinning. This term tends to zero when thinning is light (i.e., $w_{ij}$ approaches 1) and, when thinning is heavy (i.e., $w_{ij}$ approaches 0), tends to $w_{ij}\lambda_j/(2\tau)$, the variance of a Poisson point-process with rate $w_{ij}\lambda_j$. These observations explain the trends in the plots above: $p$-thinned Periodic approaches Periodic when $p$ is high and approaches Poisson when $p$ is low (the figure labels Periodic as Uniform).   
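The thinned-periodic variance can be cross-checked by Monte Carlo. This sketch assumes a uniformly random phase, an independent keep/drop decision for every past spike, and truncates the history once spikes are about $15\tau$ old; the function names are hypothetical.

```python
import numpy as np

def thinned_periodic_variance(rate, tau, w):
    """Closed form: periodic term scaled by w^2 plus the thinning term w(1-w)*rate/(2*tau)."""
    x = 1.0 / (2.0 * tau * rate)
    periodic_term = w**2 * rate**2 * (x / np.tanh(x) - 1.0)
    thinning_term = w * (1.0 - w) * rate / (2.0 * tau)
    return periodic_term + thinning_term

def thinned_periodic_variance_mc(rate, tau, w, n_samples=200_000, seed=0):
    """Monte Carlo: random phase, independent keep/drop decision for each past spike."""
    rng = np.random.default_rng(seed)
    period = 1.0 / rate
    n_past = int(np.ceil(15.0 * tau * rate)) + 1        # history covers at least 15*tau
    s = rng.uniform(0.0, period, (n_samples, 1))        # time since the most recent spike
    ages = s + period * np.arange(n_past)               # ages of the n_past latest spikes
    passed = rng.random((n_samples, n_past)) < w        # indicator I^k for each spike
    current = (passed * np.exp(-ages / tau) / tau).sum(axis=1)
    return current.var()

tau = 0.01
rate = 1.0 / tau                          # tau*rate = 1
for w in (0.9, 0.5, 0.1):                 # same p values as the figures above
    print(w, thinned_periodic_variance(rate, tau, w),
          thinned_periodic_variance_mc(rate, tau, w))
```

The truncation keeps memory proportional to $\tau\lambda$, so this sketch is only practical for moderate $\tau\lambda$.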

### Asymptotic Approximations

Replacing the first term with the asymptotic approximations we obtained in the periodic case yields:

$$
{\rm Var}[I_i(t)] \approx 
\left\{ 
  \begin{matrix}
    \sum_j \left[w_{ij}^2\left(\!\frac{1}{12\tau^2}\!\right) + w_{ij}(1-w_{ij})\!\left(\!\frac{\lambda_j}{2\tau}\!\right)\right] & & \tau\lambda_j \gg 1\\
    \sum_j \left[w_{ij}^2\left(\!\frac{\lambda_j}{2\tau}\!\right) + w_{ij}(1-w_{ij})\!\left(\!\frac{\lambda_j}{2\tau}\!\right)\right] & & \tau \lambda_j \ll 1
  \end{matrix} \right.
$$

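In the first result, the first term is negligible: its ratio to the second term is $w_{ij}/(6\tau\lambda_j(1-w_{ij}))$, which is small because $\tau\lambda_j \gg 1$ (provided $w_{ij}$ is not close to 1). In the second result, the two terms combine because $w_{ij}^2 + w_{ij}(1-w_{ij}) = w_{ij}$. This yields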

$$
{\rm Var}[I_i(t)] \approx 
\left\{ 
  \begin{matrix}
     \frac{1}{2\tau}\!\sum_j (1-w_{ij})w_{ij}\lambda_j & & \tau\lambda_j \gg 1\\
    \frac{1}{2\tau}\!\sum_j w_{ij}\lambda_j & & \tau \lambda_j \ll 1
  \end{matrix} \right.
$$

In these cases, assuming equal weights, we find that: 

- For the **high spike-rate** regime 
$$
    \frac{{\rm E}[I_i(t)]}{\sqrt{{\rm Var}[I_i(t)]}} 
    \approx \frac{\!\sum_j\!w_{ij}\lambda_j}{\sqrt{\frac{1}{2\tau}\!\sum_j (1-w_{ij})w_{ij}\lambda_j}} 
    = \sqrt{2\tau}\sqrt\frac{w}{1-w}\surd\!\sum_{j=1}^{N}\!\lambda_j
    = \sqrt\frac{2\tau w\lambda N}{1-w}
$$
  
- For the **low spike-rate** regime 
$$
    \frac{{\rm E}[I_i(t)]}{\sqrt{{\rm Var}[I_i(t)]}} 
    \approx \frac{ \! \sum_j \! w_{ij}\lambda_j}{\sqrt{\frac{1}{2\tau}\!\sum_j w_{ij}\lambda_j}} 
    = \sqrt{2\tau}\surd\!\sum_{j=1}^N w_{ij}\lambda_j
    = \sqrt{2\tau w\lambda N}
$$
 
Thus, apart from a factor of $\sqrt{1-w}$ (the high-rate SNR equals the low-rate expression divided by $\sqrt{1-w}$), these regimes behave similarly: SNR increases as the square-root of the **total** spike-rate **after** thinning. 

We conclude that the Poisson assumption is a good one even in the high-rate regime, since the weights tend to be small. For $w<0.1$, the high-rate SNR exceeds $\sqrt{2\tau w\lambda N}$ by less than 5.4%, because $1/\sqrt{1-w}<1/\sqrt{0.9}\approx1.054$. 
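The 5.4% figure, and how close the exact thinned SNR comes to the high-rate formula, can be checked directly. This sketch assumes $N$ independent, equally weighted, $w$-thinned periodic trains; the function names and the example design values are hypothetical.

```python
import numpy as np

def snr_thinned(rate, n_neurons, tau, w):
    """Exact SNR for N independent, equally weighted, w-thinned periodic trains."""
    x = 1.0 / (2.0 * tau * rate)
    var_per_train = (w**2 * rate**2 * (x / np.tanh(x) - 1.0)
                     + w * (1.0 - w) * rate / (2.0 * tau))
    return n_neurons * w * rate / np.sqrt(n_neurons * var_per_train)

def snr_poisson(rate, n_neurons, tau, w):
    """Poisson (low-rate) approximation sqrt(2*tau*w*rate*N)."""
    return np.sqrt(2.0 * tau * w * rate * n_neurons)

print(1.0 / np.sqrt(1.0 - 0.1) - 1.0)        # about 0.054: the 5.4% bound at w = 0.1

tau, n_neurons, w = 0.01, 100, 0.1
rate = 10_000.0                              # tau*rate = 100 (high-rate regime)
print(snr_thinned(rate, n_neurons, tau, w) / snr_poisson(rate, n_neurons, tau, w))
```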

### Implementation

<img src="files/lecture13/ImplementProbabilisticWeight.pdf" width="720">

Probabilistic weights are implemented by adding a thinning circuit to the pathway from neuron $j$ to neuron $i$. This pathway consists of:

- <b>Transmitter</b>: Encodes neuron $j$'s spike as a unique address 
- <b>RAM</b>: Uses this address to retrieve neuron $i$'s address and $w_{ij}$, the weight.  
- <b>Thinner</b>: Passes neuron $i$'s address on only if $X<w_{ij}$
- <b>Receiver</b>: Decodes neuron $i$'s address and feeds a spike to that location

The uniformly-distributed random number $X$ ($0<X<1$) is not truly random. It is generated by a linear-feedback shift register (LFSR). A maximal-length $b$-bit LFSR steps through all $2^b-1$ nonzero $b$-bit numbers and then repeats this exact sequence forever. Hence, the numbers it produces are said to be **pseudorandom**. We find that this works well enough for our purposes. Our weights are 8-bit and we use a ??-bit shift-register. 
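A software model of an LFSR-driven thinner helps show why a full LFSR cycle gives the right pass probability. The register width (16 bits), the toggle mask `0xB400` (taps 16, 14, 13, 11, a standard maximal-length choice), and the seed are assumptions for illustration only; the lecture does not specify its register width.

```python
LFSR_BITS = 16         # hypothetical width for illustration (the lecture's is unspecified)
TOGGLE_MASK = 0xB400   # taps 16, 14, 13, 11: a maximal-length choice for 16 bits

def lfsr_step(state):
    """Advance a right-shifting Galois LFSR by one step."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TOGGLE_MASK
    return state

def lfsr_period(seed):
    """Number of steps until the register returns to `seed`."""
    state, steps = lfsr_step(seed), 1
    while state != seed:
        state, steps = lfsr_step(state), steps + 1
    return steps

def thinner_passes(state, weight8):
    """Pass iff X < w, with X = state / 2**16 and w = weight8 / 256 (8-bit weight)."""
    return (state >> (LFSR_BITS - 8)) < weight8

seed = 0xACE1
period = lfsr_period(seed)            # 2**16 - 1 for a maximal-length register

state, passed = seed, 0
for _ in range(period):               # one full cycle visits every nonzero state once
    state = lfsr_step(state)
    passed += thinner_passes(state, 64)   # weight8 = 64, i.e. w = 0.25
print(period, passed / period)
```

Comparing the top 8 bits of the register against the 8-bit weight is equivalent to the test $X<w_{ij}$, so over one full cycle the pass fraction is within about $1/2^{16}$ of $w_{ij}$.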

### Scaling

| Rate/neuron | Probabilistic (naive) | Probabilistic (conditional) | Deterministic                 |
|-------------|-----------------------|-----------------------------|-------------------------------|
| Look-ups    | ${r^2\over2\tau w}$   | ${\log(N)r^2\over2\tau}$    | ${r\over2\tau}\surd{N\over3}$ |
| Traffic     | ${r^2\over2\tau}$     | ${r^2\over2\tau}$           | ${r\over2\tau}\surd{N\over3}$ |

For each spike, the last three operations above (RAM, Thinner, Receiver) are repeated for $i = 1 \ldots N$. 

- For every spike
    - $N$ look-ups are made
    - $w N$ addresses are transmitted 
- Thus, for every neuron
    - look-up rate is $\lambda N$ 
    - address-transmission rate is $w\lambda N$ 
- For SNR of $r$ at each neuron's input
    - $w\lambda N = r^2/2\tau$
    - Implies that $w \propto 1/N$
- With this spec
    - look-up rate is $r^2/2\tau w$ 
    - address-transmission rate is $r^2/2\tau$ 
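
The per-spike counts can be illustrated with a toy model of the pathway. In this sketch the RAM is a Python dict mapping a source address to a list of (target address, weight) pairs, and NumPy's generator stands in for the LFSR; the layout, the name `route_spike`, and the values of $N$ and $w$ are hypothetical.

```python
import numpy as np

def route_spike(source, ram, rng):
    """Route one spike: one RAM look-up per target, each thinned by its weight.

    `ram` maps a source address to a list of (target address, weight) pairs;
    this layout and the function name are hypothetical.
    """
    lookups, delivered = 0, []
    for target, weight in ram[source]:     # RAM read: target address and w_ij
        lookups += 1
        if rng.random() < weight:          # Thinner: pass iff X < w_ij
            delivered.append(target)       # Receiver: spike fed to this target
    return lookups, delivered

n_neurons, w = 1000, 0.02
ram = {0: [(i, w) for i in range(n_neurons)]}   # source 0 projects to all N targets
rng = np.random.default_rng(2)

n_spikes = 200
results = [route_spike(0, ram, rng) for _ in range(n_spikes)]
lookups_per_spike = results[0][0]                          # always N
mean_delivered = np.mean([len(d) for _, d in results])     # about w*N = 20
print(lookups_per_spike, mean_delivered)
```

Every spike costs $N$ look-ups no matter how few addresses survive thinning, which is the inefficiency conditional probabilities target.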
    
Look-ups may be reduced dramatically by using **conditional probabilities**

- look-ups/spike drop from $N$ to $w N\log(N)$
- look-ups/neuron drop from $\lambda N$ to $w\lambda N\log(N)=r^2\log(N)/2\tau$
- Thus, $\log(N)$ replaces $1/w \propto N$
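
The scaling table can be evaluated at a concrete design point. This sketch simply codes the table's formulas; it assumes base-2 logarithms for the conditional scheme (the lecture writes $\log(N)$ without a base), and the design point itself is hypothetical.

```python
import math

def per_neuron_rates(r, tau, n, w):
    """Look-up and traffic rates per neuron for a target input SNR r (scaling table).

    Assumes log base 2 for the conditional scheme; the lecture writes log(N).
    """
    prob_traffic = r**2 / (2 * tau)
    det_rate = (r / (2 * tau)) * math.sqrt(n / 3)
    return {
        "naive lookups": prob_traffic / w,
        "conditional lookups": math.log2(n) * prob_traffic,
        "deterministic lookups": det_rate,
        "probabilistic traffic": prob_traffic,
        "deterministic traffic": det_rate,
    }

# Hypothetical design point: SNR 10, 10 ms synapses, 10,000 neurons, w = 0.01.
for name, value in per_neuron_rates(r=10, tau=0.01, n=10_000, w=0.01).items():
    print(f"{name:>22}: {value:,.0f} per second")
```

At this design point the probabilistic scheme needs $w\lambda N = r^2/2\tau = 5000$ delivered spikes/s, i.e. $\lambda = 50$ spikes/s per source neuron.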