# Are we just in the perturbative regime?

In the 2009 paper [Pairwise Maximum Entropy Models for Studying Large Biological Systems: When They Can Work and When They Can't](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000380), Roudi et al. suggest that in the perturbative regime, characterised by a small mean probability of observing a neuron spike and a small number of neurons $N$, the pairwise maxent model can appear to be a good model for a distribution. However, we cannot extrapolate the behaviour of the pairwise model to larger $N$ and predict that it will remain a good fit outside the perturbative regime. We try to investigate these claims computationally.

The perturbative regime is defined as $N\bar{v}\delta t \ll 1$, where $\bar{v}$ is the mean firing rate and $\delta t$ is the size of the time bin. For time bins small enough that we observe at most one spike per bin, we can identify $\bar{v}\delta t$ with the mean probability $\bar{p}$ of observing a neuron fire in a bin. Thus, for $N=5$ neurons, we should be in the perturbative regime when $\bar{p}=\bar{v}\delta t \ll 0.2$. With 5 neurons, it is possible to sum over all $2^5=32$ states, so we can work out quantities such as the KL divergence $D_{KL}(p\| q) = \sum_s p(s) \ln \frac{p(s)}{q(s)}$ exactly.
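
As a minimal sketch of that exact computation, the block below enumerates all 32 states and evaluates the KL divergence by direct summation. The helper names `kl_divergence` and `independent`, and the two toy distributions (independent neurons firing with probability 0.1 and 0.12), are assumptions made for illustration only; they are not part of `NumericIsing`.

```python
import itertools
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) in nats, using the convention 0 * ln(0) = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # states with p(s) = 0 contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

N = 5
# All 2^N binary states, one per row: shape (32, 5)
states = np.array(list(itertools.product([0, 1], repeat=N)))

def independent(prob):
    """Toy distribution: every neuron fires independently with probability `prob`."""
    return np.prod(np.where(states == 1, prob, 1 - prob), axis=1)

p = independent(0.10)  # hypothetical "true" distribution
q = independent(0.12)  # hypothetical "model" distribution
print(kl_divergence(p, q))
```

If `q` assigns zero probability to a state that `p` supports, the divergence is infinite, which NumPy reports as `inf` together with a runtime warning.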

In [1]:
import numpy as np
from NumericIsing import Ising

## All or nothing model
We start by considering a very simple distribution that has higher-order correlations. Suppose we have 5 neurons which always fire in sync. Thus $p(1,1,1,1,1) = c$, $p(0,0,0,0,0)=1-c$, and all other events have probability 0. The mean firing probability of each neuron is $c$, as is the probability that any two neurons fire together (the pairwise correlation). We will vary the probability $c$ of all of them firing, and see whether the pairwise model appears to be a good fit.

In [44]:
N = 5
c = 0.5
avgs = c*np.ones(N) # prob of each neuron firing in a window is c = 0.5
corrs = c*np.triu(np.ones((N,N)),1) # prob of 2 neurons firing in the same window is also c = 0.5
print(avgs,corrs, sep="\n")

[0.5 0.5 0.5 0.5 0.5]
[[0.  0.5 0.5 0.5 0.5]
 [0.  0.  0.5 0.5 0.5]
 [0.  0.  0.  0.5 0.5]
 [0.  0.  0.  0.  0.5]
 [0.  0.  0.  0.  0. ]]


In [45]:
p_wise = Ising(N, avgs, corrs, lr=0.5) 

In [58]:
p_wise.gradient_ascent() # 100 steps of gradient ascent. Repeat until accurate. 
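
`NumericIsing` is a local module whose source is not shown here. As a rough, self-contained sketch of what a fit of this kind might do, the block below performs exact gradient ascent on the log-likelihood of a pairwise model $p(s) \propto \exp(\sum_i h_i s_i + \sum_{i<j} J_{ij} s_i s_j)$ by enumerating all states. The function name `fit_pairwise_maxent` and its parameters are hypothetical and are not the module's API; it assumes `corrs` is an $N \times N$ array whose upper triangle holds the target correlations.

```python
import itertools
import numpy as np

def fit_pairwise_maxent(avgs, corrs, lr=0.5, steps=2000):
    """Hypothetical helper: fit fields h and couplings J so that the model's
    means and pairwise correlations approach the targets. Returns (states, p)."""
    avgs = np.asarray(avgs, dtype=float)
    n = len(avgs)
    states = np.array(list(itertools.product([0.0, 1.0], repeat=n)))  # (2^n, n)
    iu = np.triu_indices(n, k=1)                    # index pairs with i < j
    pair_feats = states[:, iu[0]] * states[:, iu[1]]  # s_i * s_j for each pair
    target_pairs = np.asarray(corrs, dtype=float)[iu]

    def model_probs(h, J):
        log_w = states @ h + pair_feats @ J
        w = np.exp(log_w - log_w.max())             # subtract max for stability
        return w / w.sum()

    h = np.zeros(n)
    J = np.zeros(len(iu[0]))
    for _ in range(steps):
        p = model_probs(h, J)
        # Log-likelihood gradient: target moments minus model moments
        h += lr * (avgs - p @ states)
        J += lr * (target_pairs - p @ pair_feats)
    return states, model_probs(h, J)

# Usage on a two-neuron toy target (values chosen for illustration)
states, p = fit_pairwise_maxent([0.3, 0.5], [[0.0, 0.2], [0.0, 0.0]], steps=5000)
print(states, np.round(p, 3), sep="\n")
```

When the target distribution gives some states probability exactly zero, as in the all-or-nothing model, the optimal couplings are infinite, so a fit like this only approaches the target as the number of steps grows. That is consistent with having to rerun the cell until the moments are accurate.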

In [59]:
print("Predicted averages:", p_wise.averages(), "Predicted correlations:", p_wise.correlations(),sep="\n")

Predicted averages:
[0.50380264 0.50383075 0.50452295 0.50528602 0.50601873]
Predicted correlations:
[[0.50380264 0.50217338 0.50154196 0.50113466 0.50072423]
 [0.         0.50383075 0.5007258  0.50031627 0.49993876]
 [0.         0.         0.50452295 0.49972255 0.49937287]
 [0.         0.         0.         0.50528602 0.49898707]
 [0.         0.         0.         0.         0.50601873]]


Now that we have trained a maximum entropy model, let us see what it thinks the true probability distribution looks like.

In [61]:
for state in [p_wise.states[0],p_wise.states[-1]]:
    print(state,np.round(p_wise.p(state),2))

[0. 0. 0. 0. 0.] 0.48
[1. 1. 1. 1. 1.] 0.5


Interestingly, the pairwise model accurately recovers the full probability distribution of the 'all or nothing' model (here 0.48 and 0.5, against true values of 0.5 each), and it does so for different values of $c$. I honestly wasn't sure what to expect here, and would be interested in relating this to the results from the Roudi et al. paper. We will have to consider slightly more complex distributions to 'break' the pairwise model.
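
One way to see why the fit succeeds here (a short side calculation, not a result taken from Roudi et al.) is that the constraints themselves force every pair of neurons to agree. For binary variables $\sigma_i, \sigma_j \in \{0,1\}$ with $\langle \sigma_i \rangle = \langle \sigma_i \sigma_j \rangle = c$,

$$
P(\sigma_i = 1, \sigma_j = 0) = \langle \sigma_i \rangle - \langle \sigma_i \sigma_j \rangle = c - c = 0 .
$$

Any distribution that matches the means and pairwise correlations can therefore only put mass on $(0,0,0,0,0)$ and $(1,1,1,1,1)$, and matching $\langle \sigma_i \rangle = c$ then fixes $p(1,1,1,1,1) = c$. In this example the first and second moments already determine the distribution uniquely, so the maximum entropy model has no freedom to differ from it.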

## A couple of states or nothing
The next model that came to mind takes on two firing patterns in addition to silence:

- $p(0,1,1,1,1)=a$
- $p(1,1,1,1,0)=b$
- $p(0,0,0,0,0)=1-(a+b)$

We define $a+b \doteq c$

The expectations of the five neurons will be:

    (b, c, c, c, a)

The pairwise correlations will be:

        1 2 3 4 5
      1   b b b 0
      2     c c a    
      3       c a
      4         a
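
As a quick check on the hand-computed table, the moments can also be obtained directly from the three-state distribution. This is a minimal sketch; the helper `moments` and the dictionary representation of the distribution are assumptions made for illustration.

```python
import numpy as np

def moments(dist):
    """Means and upper-triangular pairwise correlations <s_i s_j> of a
    distribution given as {state tuple: probability} (hypothetical helper)."""
    states = np.array(list(dist.keys()), dtype=float)   # (n_states, N)
    probs = np.array(list(dist.values()), dtype=float)  # (n_states,)
    avgs = probs @ states
    second = states.T @ (probs[:, None] * states)        # full <s_i s_j> matrix
    return avgs, np.triu(second, k=1)                    # keep only i < j

a, b = 0.2, 0.4
dist = {
    (0, 1, 1, 1, 1): a,
    (1, 1, 1, 1, 0): b,
    (0, 0, 0, 0, 0): 1 - (a + b),
}
avgs, corrs = moments(dist)
print(avgs, corrs, sep="\n")
```

The function returns a full $N \times N$ upper-triangular array, so the last row is all zeros.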

In [74]:
N = 5
a = 0.2
b = 0.4
c = a + b
avgs = np.array([b, c, c, c, a])
corrs = np.array([[0,b,b,b,0],
                  [0,0,c,c,a],
                  [0,0,0,c,a],
                  [0,0,0,0,a]])
print(avgs,corrs, sep="\n")

[0.4 0.6 0.6 0.6 0.2]
[[0.  0.4 0.4 0.4 0. ]
 [0.  0.  0.6 0.6 0.2]
 [0.  0.  0.  0.6 0.2]
 [0.  0.  0.  0.  0.2]]


In [75]:
p_wise = Ising(N, avgs, corrs, lr=0.5) 

In [100]:
p_wise.gradient_ascent() # 100 steps of gradient ascent. Repeat until accurate. 

In [101]:
print("Predicted averages:", p_wise.averages(), "Predicted correlations:", p_wise.correlations(),sep="\n")

Predicted averages:
[0.40120964 0.60166855 0.60201396 0.60243361 0.20180175]
Predicted correlations:
[[0.40120964 0.40017571 0.40004125 0.39989105 0.00277631]
 [0.         0.60166855 0.60019171 0.59988656 0.19942072]
 [0.         0.         0.60201396 0.59956424 0.19928799]
 [0.         0.         0.         0.60243361 0.19916612]
 [0.         0.         0.         0.         0.20180175]]


In [103]:
for state in p_wise.states:
    print(state,np.round(p_wise.p(state),3))

[0. 0. 0. 0. 0.] 0.392
[0. 0. 0. 0. 1.] 0.002
[0. 0. 0. 1. 0.] 0.002
[0. 0. 0. 1. 1.] 0.0
[0. 0. 1. 0. 0.] 0.001
[0. 0. 1. 0. 1.] 0.0
[0. 0. 1. 1. 0.] 0.0
[0. 0. 1. 1. 1.] 0.0
[0. 1. 0. 0. 0.] 0.0
[0. 1. 0. 0. 1.] 0.0
[0. 1. 0. 1. 0.] 0.0
[0. 1. 0. 1. 1.] 0.0
[0. 1. 1. 0. 0.] 0.0
[0. 1. 1. 0. 1.] 0.001
[0. 1. 1. 1. 0.] 0.004
[0. 1. 1. 1. 1.] 0.196
[1. 0. 0. 0. 0.] 0.001
[1. 0. 0. 0. 1.] 0.0
[1. 0. 0. 1. 0.] 0.0
[1. 0. 0. 1. 1.] 0.0
[1. 0. 1. 0. 0.] 0.0
[1. 0. 1. 0. 1.] 0.0
[1. 0. 1. 1. 0.] 0.0
[1. 0. 1. 1. 1.] 0.0
[1. 1. 0. 0. 0.] 0.0
[1. 1. 0. 0. 1.] 0.0
[1. 1. 0. 1. 0.] 0.0
[1. 1. 0. 1. 1.] 0.0
[1. 1. 1. 0. 0.] 0.001
[1. 1. 1. 0. 1.] 0.0
[1. 1. 1. 1. 0.] 0.396
[1. 1. 1. 1. 1.] 0.003


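Again, the pairwise model is able to capture the probability distribution: the three states in the support receive probabilities 0.392, 0.196 and 0.396, against true values of 0.4, 0.2 and 0.4.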

## XOR 

In their 2003 paper *Network Information and Connected Correlations*, Schneidman et al. write:

> if $\sigma_3$ is formed as the exclusive OR (XOR) of the variables $\sigma_1$ and $\sigma_2$, then the essential structure of $p(\sigma_1,\sigma_2,\sigma_3)$ is contained in a three–spin interaction. 

This might give us a simple example of something the Ising model can't capture.

Let us say that $\sigma_1$ and $\sigma_2$ fire independently with probabilities $p(\sigma_1{=}1)=a$ and $p(\sigma_2{=}1)=b$, and that $\sigma_3$ is their XOR.

        s_1 s_2 s_3  p(s_1, s_2, s_3)
        0   0   0    (1-a)(1-b)
        0   1   1    (1-a)b
        1   0   1    a(1-b)
        1   1   0    ab

Thus, the averages are:

        (a, b, b+a-2ab)

And the correlations are:

        s_1 s_2   s_1 s_3   s_2 s_3
        ab        a(1-b)    (1-a)b
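
Each entry follows by summing the probabilities of the rows of the state table in which the relevant spins are all 1. Spelled out:

$$
\begin{aligned}
\langle \sigma_3 \rangle &= p(0,1,1) + p(1,0,1) = (1-a)b + a(1-b) = a + b - 2ab, \\
\langle \sigma_1 \sigma_2 \rangle &= p(1,1,0) = ab, \\
\langle \sigma_1 \sigma_3 \rangle &= p(1,0,1) = a(1-b), \\
\langle \sigma_2 \sigma_3 \rangle &= p(0,1,1) = (1-a)b .
\end{aligned}
$$

For $a=0.2$ and $b=0.4$ this gives $\langle \sigma_3 \rangle = 0.44$ and correlations $0.08$, $0.12$ and $0.32$.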

In [2]:
N = 3
a = 0.2
b = 0.4
avgs = np.array([a,b,b+a-2*a*b])
corrs = np.array([[0,a*b,a*(1-b)],
                  [0,0,(1-a)*b]])
print(avgs,corrs, sep="\n")

[0.2  0.4  0.44]
[[0.   0.08 0.12]
 [0.   0.   0.32]]


In [3]:
p_wise = Ising(3, avgs, corrs, lr=0.5) 

In [18]:
p_wise.gradient_ascent() # 100 steps of gradient ascent. Repeat until accurate. 

In [19]:
print("Predicted averages:", p_wise.averages(), "Predicted correlations:", p_wise.correlations(),sep="\n")

Predicted averages:
[0.20000128 0.40000042 0.44000421]
Predicted correlations:
[[0.20000128 0.08000851 0.11999224]
 [0.         0.40000042 0.31999632]
 [0.         0.         0.44000421]]


In [20]:
for state in p_wise.states:
    print(state,np.round(p_wise.p(state),3))

[0. 0. 0.] 0.406
[0. 0. 1.] 0.074
[0. 1. 0.] 0.074
[0. 1. 1.] 0.246
[1. 0. 0.] 0.074
[1. 0. 1.] 0.046
[1. 1. 0.] 0.006
[1. 1. 1.] 0.074


In [22]:
print([0,0,0], (1-a)*(1-b))
print([0,1,1], (1-a)*b)
print([1,0,1], a*(1-b))
print([1,1,0], a*b)

[0, 0, 0] 0.48
[0, 1, 1] 0.32000000000000006
[1, 0, 1] 0.12
[1, 1, 0] 0.08000000000000002


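Notice how the events `[0,0,1]`, `[0,1,0]`, `[1,0,0]` and `[1,1,1]` are each assigned probability 0.074, when they should in fact be zero. The allowed states are off as well: for example, `[1,1,0]` gets 0.006 instead of 0.08, and `[0,0,0]` gets 0.406 instead of 0.48. The pairwise model matches the means and correlations almost exactly, yet it fails to reproduce the XOR distribution.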
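
To put a number on the mismatch, the KL divergence from the introduction can be evaluated between the true XOR distribution and the fitted model. The sketch below copies the model probabilities from the printed output above, so they are rounded to three decimal places and the result is only approximate; the variable names are chosen for illustration.

```python
import numpy as np

a, b = 0.2, 0.4
# True XOR distribution, states ordered [0,0,0], [0,0,1], ..., [1,1,1]
p_true = np.array([(1 - a) * (1 - b), 0, 0, (1 - a) * b,
                   0, a * (1 - b), a * b, 0])
# Fitted pairwise probabilities, copied from the printed output (3 d.p.)
q_model = np.array([0.406, 0.074, 0.074, 0.246, 0.074, 0.046, 0.006, 0.074])

allowed = p_true > 0  # states the XOR distribution can actually produce
# D_KL(p || q) in nats, summing only over states with p(s) > 0
kl = np.sum(p_true[allowed] * np.log(p_true[allowed] / q_model[allowed]))
# Probability mass the model places on states that never occur
forbidden_mass = q_model[~allowed].sum()
print(kl, forbidden_mass)
```

The second number is the total probability the pairwise model puts on the four states that XOR forbids.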