# Timing the different ways of working out averages

We want to work out empirical expectations, such as correlations. Given a set of empirical observations $D$, containing $M$ samples, this involves the sum:
$$
    \langle \sigma_i \sigma_j \rangle_D = \frac{1}{M}\sum_{\boldsymbol{\sigma}\in D} \sigma_i\sigma_j y(\boldsymbol{\sigma})
$$
We can think of this as selecting all elements of the vector $\boldsymbol{y}$ for which $\sigma_i$ and $\sigma_j$ are both 1, summing them up and dividing by $M$. There are a couple of different ways to implement this, so let's try them out and time each one.
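As a quick sanity check of that reading, here is a minimal sketch on a made-up four-sample dataset (the values in `X` and `Y` are purely illustrative, not taken from any real data). It shows that selecting entries of $\boldsymbol{y}$ with a mask gives the same result as weighting by the product $\sigma_i\sigma_j$.

```python
import numpy as np

# Toy data: M = 4 samples of N = 2 binary units (illustrative values only)
X = np.array([[1, 1],
              [1, 0],
              [0, 1],
              [1, 1]])
Y = np.array([0.5, 1.0, 2.0, 1.5])
M = X.shape[0]

# Selection view: keep only the entries of Y where both units are 1
mask_avg = Y[(X[:, 0] == 1) & (X[:, 1] == 1)].sum() / M

# Product view: weight every entry of Y by sigma_i * sigma_j
prod_avg = (X[:, 0] * X[:, 1] * Y).sum() / M

print(mask_avg, prod_avg)  # 0.5 0.5
```

Only the first and last samples have both units on, so the sum is $0.5 + 1.5 = 2$ and the average is $2/4 = 0.5$.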

In [1]:
import numpy as np 
import matplotlib.pyplot as plt
from numba import njit

In [17]:
N = 150 #number of neurons
M = 20000 #number of samples
X = np.random.randint(2,size=(M,N)) #matrix (M x N) of binary random variables
Y = np.random.rand(M) #vector of size M of random numbers 

In [18]:
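@njit(cache=True)
def sample_corrs(X,Y):
    """
    X is an M x N matrix of binary states
    Y is an M vector of values for each state
    This way relies on numba to make the for loops more efficient.
    It selects the entries of Y where both units are 1 before the summation.
    Only the upper triangle (i < j) of the result is filled.
    """
    # Read the sizes from the input: numba freezes globals at compile time,
    # so relying on the global N and M can give stale values (especially with cache=True)
    M, N = X.shape
    corrs = np.zeros((N,N))
    for i in range(N-1):
        for j in range(i+1,N):
            corrs[i,j] = np.sum( Y[ X[:,i]*X[:,j] == 1 ] ) / M
    return corrs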

In [19]:
corr_numba = sample_corrs(X,Y)

In [20]:
# An alternative way is to implement this as a series of matrix products
# Note: np.diag(Y) builds a dense M x M matrix (about 3.2 GB of float64 for M = 20000)
corr_mat = np.triu(X.T.dot(np.diag(Y)).dot(X),1)/M

In [21]:
np.testing.assert_array_almost_equal(corr_numba,corr_mat)

Let's time the two approaches using the IPython `%timeit` magic command:

In [29]:
%timeit sample_corrs(X,Y)
%timeit np.triu(X.T.dot(np.diag(Y)).dot(X),1)/M

3.57 s ± 361 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
7.44 s ± 816 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
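A likely reason the matrix-product version is slow is that `np.diag(Y)` materialises a dense $M \times M$ matrix and then multiplies through it. The sketch below is one possible alternative, not something timed in this notebook: it scales the rows of `X` by `Y` with broadcasting, so no $M \times M$ matrix is ever formed. The names `Xf`, `corr_diag` and `corr_bcast` are introduced here for illustration, and the sizes are reduced so the reference computation stays cheap. Casting `X` to float is an assumption about performance: NumPy only hands floating-point matrix products to BLAS.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 2000, 30                      # smaller than the notebook's sizes
X = rng.integers(2, size=(M, N))     # binary states
Y = rng.random(M)                    # one value per sample

# Reference: the np.diag formulation used in the notebook
corr_diag = np.triu(X.T.dot(np.diag(Y)).dot(X), 1) / M

# Broadcasting: multiply row m of X by Y[m], then take a single matrix product
Xf = X.astype(np.float64)            # float input so the product can use BLAS
corr_bcast = np.triu((Xf * Y[:, None]).T.dot(Xf), 1) / M

np.testing.assert_allclose(corr_diag, corr_bcast)
```

Both expressions compute $\sum_m X_{mi} Y_m X_{mj} / M$; the broadcast form needs only an extra $M \times N$ array instead of an $M \times M$ one. Whether it beats the numba loop on a given machine would need to be measured with `%timeit`.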


## Sample averages

In [27]:
@njit(cache=True)
def sample_avgs(X,Y):
    """
    X is an M x N matrix of binary states
    Y is an M vector of values for each state
    This way relies on numba to make the for loops more efficient.
    It selects the entries of Y where the unit is 1 before the summation.
    """
    # Read the sizes from the input rather than from the globals N and M,
    # which numba would freeze at compile time
    M, N = X.shape
    avgs = np.zeros(N)
    for i in range(N):
        avgs[i] = np.sum( Y[ X[:,i] == 1 ] )
    avgs /= M
    return avgs

In [30]:
avgs_numba = sample_avgs(X,Y)

In [36]:
avgs_mat = X.T.dot(Y)/M

In [37]:
np.testing.assert_array_almost_equal(avgs_numba,avgs_mat)

In [38]:
%timeit sample_avgs(X,Y)
%timeit X.T.dot(Y)/M

35 ms ± 3.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
9.75 ms ± 328 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
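The `%timeit` magic is only available inside IPython or Jupyter. For running the same comparison from a plain script, a rough equivalent can be sketched with the standard-library `timeit` module. The helper name `avgs_matrix` and the repeat counts are choices made here for illustration, and the numbers it prints will differ from the ones above.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
M, N = 20000, 150
X = rng.integers(2, size=(M, N))   # binary states
Y = rng.random(M)                  # one value per sample

def avgs_matrix(X, Y):
    """Sample averages as a single matrix-vector product."""
    return X.T.dot(Y) / X.shape[0]

# Run the call 10 times per measurement, and take 3 measurements
number = 10
times = timeit.repeat(lambda: avgs_matrix(X, Y), repeat=3, number=number)

# The minimum is the measurement least disturbed by other processes
best_ms = min(times) / number * 1e3
print(f"best of 3: {best_ms:.2f} ms per call")
```

When timing a numba function this way, it is worth calling it once beforehand so that compilation time is not included in the measurement.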
