## Model of how translation invariance and weight covariance relate ##

I am interested in the following question: given a set of overlapping subunit receptive fields, each with its own degree of translation invariance (TI), what is the TI of their summed output?


### Two fully overlapped subunit receptive fields ###
We will start with the simplest possible non-trivial case: two completely overlapping subunit receptive fields.

The variables we will be dealing with are:

$\vec{r}_{c, p}$: the vector (over $n$ stimuli) of responses from the $c$th subunit at the $p$th position. Here we assume it has unit length.

$\vec{R}_{p}$: the sum over subunits at position $p$.

$W \in [-1, 1]$: the weight covariance, i.e. the correlation between the two subunits' responses across all stimuli.

$TI_i \in [-1, 1]$: the TI of subunit $i$. Here we consider only two positions, so it is the population (all stimuli) correlation between the responses at the two positions.

so:

$TI_1 = \vec{r}_{1, 1} \vec{r}_{1,2}$

$TI_2 = \vec{r}_{2, 1} \vec{r}_{2,2}$

Then the response is just the sum of the two so:

$\vec{R}_{1} = \vec{r}_{c=1, p=1} + \vec{r}_{c=2, p=1}$

$\vec{R}_{2} = \vec{r}_{c=1, p=2} + \vec{r}_{c=2, p=2}$

Then the numerator of $TI_R$ (the product of the two summed responses, before normalization) is:

$num(TI_R) =  (\vec{R}_{1} \vec{R}_{2}) = (\vec{r}_{c=1, p=1} + \vec{r}_{c=2, p=1}) (\vec{r}_{c=1, p=2} + \vec{r}_{c=2, p=2})$

since:

$(a+b)(c+d) = ac + ad + bc + bd$

$num(TI_R) = \vec{r}_{1,1}\vec{r}_{1,2} + \vec{r}_{1,1} \vec{r}_{2,2} + \vec{r}_{2,1} \vec{r}_{1,2} + \vec{r}_{2,1}\vec{r}_{2,2}$

$= TI_1 + TI_2 + $ the correlations between different subunits' responses to the stimuli at different positions.
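The expansion above can be checked numerically. The following is a minimal sketch, assuming arbitrary random unit vectors stand in for the four response vectors; the names `r`, `unit` and `cross` are illustrative and not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # number of stimuli (arbitrary choice for this check)

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

# r[c, p]: unit-length response vector of subunit c at position p (0-indexed)
r = np.array([[unit(rng.standard_normal(n)) for p in range(2)]
              for c in range(2)])

R1 = r[0, 0] + r[1, 0]  # summed response at position 1
R2 = r[0, 1] + r[1, 1]  # summed response at position 2

TI_1 = r[0, 0] @ r[0, 1]
TI_2 = r[1, 0] @ r[1, 1]
# cross-channel, cross-position terms
cross = r[0, 0] @ r[1, 1] + r[1, 0] @ r[0, 1]

num = R1 @ R2
print(num, TI_1 + TI_2 + cross)  # the two values should agree
```

The identity holds for any four vectors, so the check does not depend on how the responses were generated.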


Let's call this cross-channel, cross-position product $cp$, for example:

$\vec{r}_{c=1, p=1} \vec{r}_{c=2, p=2}$

It is the only real unknown, and we will need to make some assumptions to estimate it. Clearly it will be a function of $(W, TI_1, TI_2)$.

We can think of the act of translating the stimuli, or of changing spatial channels, as first multiplying the response vector by the sign of the correlation and then perturbing it with spherical Gaussian noise. So we begin with a fixed random vector, say $\vec{r}_{c=1, p=1}$, which is perturbed so that the result, $\vec{r}_{c=2, p=1}$, has on average a correlation of $W$ with the original. Then $\vec{r}_{c=2, p=1}$ is in turn perturbed by spherical Gaussian noise so that the result has on average a correlation of $TI_2$ with it; this is $\vec{r}_{c=2, p=2}$.
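The perturbation picture can be sketched in a few lines. This is only an illustration under stated assumptions: the helper `perturb`, the noise levels `sigma_W` and `sigma_TI`, and the choice to rescale each perturbed vector back to unit length are all hypothetical and not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # number of stimuli (arbitrary)

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def perturb(r, sign, sigma, rng):
    """Multiply by the sign of the correlation, add spherical Gaussian
    noise with per-coordinate standard deviation sigma, and rescale to
    unit length (the rescaling is an assumption of this sketch)."""
    return unit(sign * r + sigma * rng.standard_normal(r.shape))

sigma_W = 0.025   # hypothetical noise level for changing channel
sigma_TI = 0.015  # hypothetical noise level for translating

r11 = unit(rng.standard_normal(n))      # fixed starting vector
r21 = perturb(r11, +1, sigma_W, rng)    # change channel at p = 1
r22 = perturb(r21, +1, sigma_TI, rng)   # then translate to p = 2

W = r11 @ r21     # realized weight covariance at p = 1
TI_2 = r21 @ r22  # realized TI of subunit 2
cp = r11 @ r22    # cross-channel, cross-position product
print(W, TI_2, cp)
```

Because `r22` is two perturbations away from `r11`, the cross product `cp` comes out smaller than either `W` or `TI_2` in this setup.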

Assuming these perturbations are independent, we must then determine the noise variances $\sigma^2_{TI}$ and $\sigma^2_{W}$ that produce TI and W on average. That is, we need to determine the form of the function

$E[\hat{R}(\vec{r}, sgn(W) \vec{r} + N(0, \sigma^2))]$. The random variable inside the expectation is the signed square root of a non-central beta random variable, $\hat{R}^2 \sim Beta(\frac{1}{2}, \frac{n}{2}, \frac{|\vec{r}|}{\sigma^2} )$, so we take the square root and, because of symmetry, we can just multiply by the sign.

Its expectation is a complicated function of these three values, and its absolute value decreases as $\sigma^2$ increases. Let's call this function:

$E[\hat{R}(\vec{r}, \vec{r} + N(0, \sigma^2))] = \nu{(\sigma^2)}$. 

So if we want the amount of noise associated with a given W or TI:

$\sigma_W^2 = \nu^{-1}(W)$

$\sigma_{TI}^2 = \nu^{-1}(TI)$
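Since $\nu$ has no simple closed form, one option is to estimate it by Monte Carlo and invert it numerically. The sketch below assumes $\sigma^2$ is the per-coordinate noise variance and uses the uncentered correlation (cosine) as $\hat{R}$; the function names `nu` and `nu_inv`, the bisection bounds and the sample sizes are illustrative choices. It only handles positive targets, so a negative $W$ or $TI$ would be passed in as its absolute value, with the sign applied separately.

```python
import numpy as np

def nu(sigma2, n=200, trials=2000, seed=0):
    """Monte Carlo estimate of E[cos(r, r + N(0, sigma2 * I))] for unit r."""
    rng = np.random.default_rng(seed)  # fixed seed: same draws on every call
    r = np.zeros(n)
    r[0] = 1.0  # by rotational symmetry, any unit vector would do
    noisy = r + np.sqrt(sigma2) * rng.standard_normal((trials, n))
    cos = noisy @ r / np.linalg.norm(noisy, axis=1)
    return cos.mean()

def nu_inv(target, lo=0.0, hi=10.0, iters=30, **kwargs):
    """Bisection for the sigma2 at which nu(sigma2) is close to target.

    Assumes 0 < target < 1 and that nu decreases with sigma2 on [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if nu(mid, **kwargs) > target:
            lo = mid  # too little noise: correlation still above target
        else:
            hi = mid
    return 0.5 * (lo + hi)

sigma2_W = nu_inv(0.5)  # noise variance giving an average correlation of 0.5
print(sigma2_W, nu(sigma2_W))
```

Reusing the same seed in every call makes the estimate a deterministic function of `sigma2`, which keeps the bisection stable.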

Thus the expected value of the correlation after switching positions and feature channels is:

$E[\vec{r}_{c=1, p=1} \vec{r}_{c=2, p=2}] = \nu{(\sigma_W^2  +\sigma_{TI}^2 )}$

$$ num(TI_R) = TI_1 + TI_2 + \nu{(\sigma_W^2  +\sigma_{TI_1}^2 )} +\nu{(\sigma_W^2  +\sigma_{TI_2}^2 )}$$

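$den(TI_R) = |\vec{r}_{c=1, p=1} + \vec{r}_{c=2, p=1}| |\vec{r}_{c=1, p=2} + \vec{r}_{c=2, p=2}| $

Since each $\vec{r}_{c,p}$ has unit length, and assuming the same weight covariance $W$ at both positions, the squared norms are:

$|\vec{r}_{c=1, p=1} + \vec{r}_{c=2, p=1}|^2 = 2 + 2\vec{r}_{c=1, p=1}\vec{r}_{c=2, p=1} = 2 + 2W $

$|\vec{r}_{c=1, p=2} + \vec{r}_{c=2, p=2}|^2 = 2 + 2\vec{r}_{c=1, p=2}\vec{r}_{c=2, p=2} = 2 + 2W$

so $den(TI_R) = \sqrt{2 + 2W} \sqrt{2 + 2W} = 2 + 2W$.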
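A quick numerical check of the norm of a sum of two unit vectors: it is the squared norm that equals $2 + 2W$, so each factor of the denominator is $\sqrt{2 + 2W}$. This is a minimal sketch with arbitrary random unit vectors; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100  # number of stimuli (arbitrary)

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

r1 = unit(rng.standard_normal(n))  # subunit 1 at some position
r2 = unit(rng.standard_normal(n))  # subunit 2 at the same position
W = r1 @ r2                        # correlation between the two subunits

norm_sq = np.linalg.norm(r1 + r2) ** 2
print(norm_sq, 2 + 2 * W)  # the two values should agree
```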

Putting it all together, approximately:

$$E[\hat{TI}] = \frac{TI_1 + TI_2 + \nu{(\sigma_W^2  +\sigma_{TI_1}^2 )} +\nu{(\sigma_W^2  +\sigma_{TI_2}^2 )}}{|2 + 2W|}$$

Two of the four terms in the numerator are the TI of the subunits, i.e. TI inherited from the previous layer; the remaining two terms depend on the weight covariance.

Now let's try simulating this:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

c_len = 5000       # number of samples to draw
filter_width = 11  # make this odd so there is a center to the cov matrix

# need to generate a covariance matrix with some fall-off;
# we do this by convolving each row of the identity with a triangular kernel

eye = np.eye(filter_width)
kernel = (filter_width/2. - np.abs(np.linspace(-filter_width/2., filter_width/2., filter_width))
         )/(filter_width/2.)
plt.figure()
plt.stem(kernel)
eye_c = np.array([np.convolve(kernel, a_cov_slice, mode='same') 
         for a_cov_slice in eye])

plt.figure()
plt.imshow(eye_c);plt.colorbar();

# need to generate two filters with some prescribed auto-correlation (non mean subtracted).
x = np.random.multivariate_normal(np.zeros(filter_width), eye_c, size=c_len)
print(x.shape)

plt.figure()  # new figure so the covariance plot above is not drawn over
plt.imshow(np.corrcoef(x.T));plt.colorbar();


#need to generate a set of stimuli at two positions and record responses from filter.
#need to calculate TI 
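
One possible way to carry out the two remaining steps is sketched below. Everything here is an assumption rather than the intended simulation: white-noise stimuli, a one-sample `shift` between the two positions, two filters `f1` and `f2` drawn with a triangular fall-off covariance, and TI measured with `np.corrcoef` (which subtracts the mean, a small difference from the uncentered products used in the derivation since the responses are zero-mean).

```python
import numpy as np

rng = np.random.default_rng(0)
filter_width = 11
n_stim = 5000  # number of stimuli
shift = 1      # hypothetical translation between positions, in samples

# Triangular fall-off covariance between filter taps (assumed form).
idx = np.arange(filter_width)
lags = np.abs(idx[:, None] - idx[None, :])
cov = np.clip(1 - lags / (filter_width // 2), 0, None)

# Two subunit filters drawn with that covariance.
f1, f2 = rng.multivariate_normal(np.zeros(filter_width), cov, size=2)

# White-noise stimuli, long enough to read a window at each position.
stim = rng.standard_normal((n_stim, filter_width + shift))
win1 = stim[:, :filter_width]               # stimulus seen at position 1
win2 = stim[:, shift:shift + filter_width]  # stimulus seen at position 2

def ti(resp_p1, resp_p2):
    """TI as the correlation across stimuli between two positions."""
    return np.corrcoef(resp_p1, resp_p2)[0, 1]

TI_1 = ti(win1 @ f1, win2 @ f1)
TI_2 = ti(win1 @ f2, win2 @ f2)
TI_R = ti(win1 @ (f1 + f2), win2 @ (f1 + f2))  # TI of the summed response
W = np.corrcoef(win1 @ f1, win1 @ f2)[0, 1]    # weight covariance at p = 1
print(TI_1, TI_2, W, TI_R)
```

With white-noise stimuli, each subunit's TI approaches the normalized autocorrelation of its filter at lag `shift`, which gives a simple sanity check on the simulated values.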



### Scratch ###
Things we want to take into account:

Size of stimuli.
Positions tested.
RF overlap.
TI of subunits.
Preference for stimuli. 

Where variability in the relationship comes from: non-preferred stimuli.

You are taking a weighted sum of vectors drifting at different rates proportional to translation invariance.

It's the distribution of a set of random walks with different distances moved at each step and from different starting points. Is this a radial distribution in the limit?

Max pooling gives you an average jump in TI because... there was the LeCun pooling paper. It's a sparsity prior: you're saying that the max is the most likely to be the true signal, as opposed to thinking that the signal is spread throughout.