## Tutorial: applying pCCA-FA to neural population activity

In [None]:
import sys

# add the path where the dependencies (FA and CCA) are stored
# make sure their code is in folders named "fa" and "cca", respectively
sys.path.append('../')

# Imports and params
import numpy as np
import pcca_fa_mdl as pf
import sim_pcca_fa as spf
from timeit import default_timer as timer
import matplotlib.pyplot as plt

# random seed for reproducibility
rand_seed = 10

# plot colors
color_map = {
    'across':np.array([255,76,178])/255, # pink - across-area
    'within1':np.array([111,192,255])/255, # light blue - within-area X
    'within2':np.array([0,87,154])/255, # dark blue - within-area Y
}

First, we need data to feed into the pCCA-FA model. In the following cell, we simulate data according to the generative model specified by pCCA-FA. However, any two spike count matrices of size (number_of_observations x number_of_neurons), one per area and with matching observations, can be used in place of the simulated data.

In [None]:
# parameters for simulating data
xDim,yDim = 30,30 # number of neurons in area X and area Y
zDim,zxDim,zyDim = 3,2,1 # number of latent variables: across-area, within-area X, within-area Y
n_trials = 5000 # number of trials or observations

# simulate data according to generative model pCCA-FA
pf_simulator = spf.sim_pcca_fa(xDim,yDim,zDim,zxDim,zyDim,rand_seed=rand_seed)
X,Y = pf_simulator.sim_data(n_trials,rand_seed=rand_seed)
sim_params = pf_simulator.get_params()
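To make the generative model more concrete, here is a minimal NumPy sketch of how data of this kind could be drawn. It assumes the usual linear-Gaussian form in which each area's activity is the sum of an across-area part, a within-area part, a mean, and independent noise. All names and parameter values below are hypothetical and are not taken from `sim_pcca_fa`, whose internals may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical sizes, mirroring the simulation cell above
n_obs, x_dim, y_dim = 1000, 30, 30
z_dim, zx_dim, zy_dim = 3, 2, 1

# hypothetical parameters (random here; not the ones drawn by sim_pcca_fa)
W_x, W_y = rng.normal(size=(x_dim, z_dim)), rng.normal(size=(y_dim, z_dim))
L_x, L_y = rng.normal(size=(x_dim, zx_dim)), rng.normal(size=(y_dim, zy_dim))
mu_x, mu_y = rng.normal(size=x_dim), rng.normal(size=y_dim)
psi_x = rng.uniform(0.5, 1.5, size=x_dim)
psi_y = rng.uniform(0.5, 1.5, size=y_dim)

# latent variables: z is shared by both areas, z_x and z_y are private
z = rng.normal(size=(n_obs, z_dim))
z_x = rng.normal(size=(n_obs, zx_dim))
z_y = rng.normal(size=(n_obs, zy_dim))

# observation = across-area part + within-area part + mean + independent noise
X_demo = z @ W_x.T + z_x @ L_x.T + mu_x + rng.normal(size=(n_obs, x_dim)) * np.sqrt(psi_x)
Y_demo = z @ W_y.T + z_y @ L_y.T + mu_y + rng.normal(size=(n_obs, y_dim)) * np.sqrt(psi_y)

print(X_demo.shape, Y_demo.shape)  # (1000, 30) (1000, 30)
```

Note that data drawn this way are continuous-valued, whereas real spike counts are non-negative integers; the model treats both as Gaussian.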

Now the variables X and Y hold the spike count matrices of our two areas, each of shape (number_of_observations x number_of_neurons). The next step is to fit the model. To do so, we need to decide which latent dimensionalities to test; these are the candidate values evaluated during cross-validation.

In [None]:
# candidate dimensionalities to test (integers; here 0 through 5)
zDim_list = np.arange(0,6)

Now we get to train the model! First, we initialize it, then fit it by cross-validating the latent dimensionalities for within- and across-area dimensions. Note, this takes a while...

In [None]:
# train model
model = pf.pcca_fa()
start = timer()
LL_curves = model.crossvalidate(X,Y,rand_seed=rand_seed,verbose=True,zDim_list=zDim_list,zxDim_list=zDim_list,zyDim_list=zDim_list,parallelize=True,n_folds=5)
end = timer()
cv_z,cv_zx,cv_zy = LL_curves['zDim'],LL_curves['zxDim'],LL_curves['zyDim']
print(f'{end-start:.2f} seconds elapsed...')
print(f'Identified dimensionalities - across-area: {cv_z:d}, within-area X: {cv_zx:d}, within-area Y: {cv_zy:d}')
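To get a feel for why cross-validation is slow, the sketch below counts the model fits an exhaustive search would require. It assumes every combination of the three candidate lists is fitted once per fold; whether `crossvalidate` actually searches the full grid this way is an assumption, so treat the number as a rough upper bound.

```python
import itertools
import numpy as np

dims = np.arange(0, 6)  # same candidates as zDim_list above
n_folds = 5

# every (zDim, zxDim, zyDim) combination in a full grid
grid = list(itertools.product(dims, dims, dims))
n_fits = len(grid) * n_folds

print(len(grid), n_fits)  # 216 1080
```

Narrowing the candidate lists is the most direct way to cut the run time under this assumption, since the grid size is the product of the three list lengths.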

In place of cross-validating (or after we have completed a round of cross-validation), we can save time by simply fitting a model of given dimensionality. We can do this below:

In [None]:
model = pf.pcca_fa()
start = timer()
model.train(X,Y,zDim,zxDim,zyDim)
end = timer()
print(f'{end-start:.2f} seconds elapsed...')

Now we have a fitted model. What does it all mean? <code>model_params</code>, retrieved in the next cell, is a dictionary where each key is the name of a model parameter and each value is the fitted value of that parameter. <br>
<b>mu</b>: mean firing rate of each neuron <br>
<b>L_total</b>: combined loadings for across- and within-area latent variables <br>
<b>W</b>: loadings for across-area variance, for each neuron and each across-area latent variable <br>
<b>L</b>: loadings for within-area variance, for each neuron and each within-area latent variable <br>
<b>psi</b>: independent variance of each neuron <br>
<b>zDim</b>: across-area dimensionality <br>
<b>zxDim</b>: within-area dimensionality for area X <br>
<b>zyDim</b>: within-area dimensionality for area Y <br>

In [None]:
model_params = model.get_params()
print(model_params.keys())
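One way to see how these parameters fit together is through the covariance they imply. The sketch below assumes that the combined loading matrix stacks the across-area loadings next to the two within-area blocks, with zeros where a within-area latent does not reach the other area. This block layout and all array names are assumptions for illustration, not the documented layout of `L_total` in `pcca_fa_mdl`.

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, y_dim, z_dim, zx_dim, zy_dim = 4, 3, 2, 1, 1

# hypothetical parameter values
W_x = rng.normal(size=(x_dim, z_dim))    # across-area loadings, area X
W_y = rng.normal(size=(y_dim, z_dim))    # across-area loadings, area Y
L_x = rng.normal(size=(x_dim, zx_dim))   # within-area loadings, area X
L_y = rng.normal(size=(y_dim, zy_dim))   # within-area loadings, area Y
psi = rng.uniform(0.5, 1.5, size=x_dim + y_dim)  # independent variances

# assumed block layout: columns are [across | within X | within Y]
L_total_demo = np.block([
    [W_x, L_x, np.zeros((x_dim, zy_dim))],
    [W_y, np.zeros((y_dim, zx_dim)), L_y],
])

# covariance of the stacked activity [x; y] implied by the parameters
cov = L_total_demo @ L_total_demo.T + np.diag(psi)

# only the across-area loadings contribute to the X-Y cross-covariance
cross_cov = cov[:x_dim, x_dim:]
print(np.allclose(cross_cov, W_x @ W_y.T))  # True
```

Under this layout, the within-area loadings and `psi` affect only the diagonal blocks, which is what lets the model separate across-area from within-area variance.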

We can use the model to compute metrics of the data, such as the percent of shared variance (%sv) that the model explains.

In [None]:
# compute metrics - from ground truth parameters
sim_model = pf.pcca_fa()
sim_model.set_params(sim_params)
true_psv = sim_model.compute_metrics(cutoff_thresh=0.95)['psv']

# compute cross-validated percent of shared variance using bootstrapping
train_psv,test_psv = model.compute_cv_psv(X,Y,zDim,zxDim,zyDim,n_boots=50,rand_seed=rand_seed,return_each=True,test_size=0.1,verbose=True)
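As a rough guide to what a percent-shared-variance number expresses, the sketch below splits the total variance of one area into across-area, within-area, and independent parts, then reports the first two as percentages. This definition and the helper name `percent_shared_variance` are assumptions for illustration; `compute_metrics` may define `psv` differently (for example, per neuron, or with its `cutoff_thresh` applied).

```python
import numpy as np

def percent_shared_variance(W, L, psi):
    """Return (across-area %sv, within-area %sv) for one area (illustrative definition)."""
    across = np.sum(W ** 2)   # variance carried by across-area latents
    within = np.sum(L ** 2)   # variance carried by within-area latents
    total = across + within + np.sum(psi)
    return 100 * across / total, 100 * within / total

# small hand-checkable example: across = 5, within = 5, independent = 10
W_demo = np.array([[2.0], [1.0]])
L_demo = np.array([[1.0], [2.0]])
psi_demo = np.array([4.0, 6.0])

psv_across, psv_within = percent_shared_variance(W_demo, L_demo, psi_demo)
print(psv_across, psv_within)  # 25.0 25.0
```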

In [None]:
# plot the results
pad = 3 # half-height of the y-axis window around the true value
fig,ax = plt.subplots(2,2, tight_layout=True, sharex=True)

# one entry per panel: (row, col, metric key, y-axis label, color key)
panels = [
    (0,0,'psv_x','across-area %sv','across'),
    (0,1,'psv_priv_x','within-area %sv','within1'),
    (1,0,'psv_y','across-area %sv','across'),
    (1,1,'psv_priv_y','within-area %sv','within2'),
]

for row,col,key,ylabel,color_key in panels:
    a,color = ax[row,col],color_map[color_key]
    # mean +/- std of the training (x=1) and heldout (x=2) estimates
    a.errorbar(1,np.mean(train_psv[key]),yerr=np.std(train_psv[key]),fmt='o',label='training',color=color)
    a.errorbar(2,np.mean(test_psv[key]),yerr=np.std(test_psv[key]),fmt='o',label='heldout',color=color)
    a.set_xlim([0.5,2.5])
    a.set_ylim([true_psv[key]-pad,true_psv[key]+pad])
    # dashed line marks the ground-truth value
    a.plot(a.get_xlim(),np.ones(2)*true_psv[key],'k--',label='true')
    a.set_ylabel(ylabel, color=color)
    a.set_title(f'area {row+1}')
    if row == 0:
        a.set_xticks([])
    else:
        a.set_xticks([1,2])
        a.set_xticklabels(['training','heldout'])

ax[0,0].legend()

plt.show(block=True)
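The training and heldout error bars summarize estimates obtained from repeated resampling of the data. The sketch below shows one simple way such train/test partitions could be generated: repeated random splits with 10% of the observations held out. The helper `random_splits` is hypothetical, and the resampling scheme actually used inside `compute_cv_psv` may differ.

```python
import numpy as np

def random_splits(n_obs, n_splits=50, test_size=0.1, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random partitions."""
    rng = np.random.default_rng(seed)
    n_test = int(round(test_size * n_obs))
    for _ in range(n_splits):
        perm = rng.permutation(n_obs)
        # first n_test shuffled indices are held out, the rest are for training
        yield perm[n_test:], perm[:n_test]

splits = list(random_splits(5000))
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))  # 50 4500 500
```

Fitting on each training set and evaluating the metric on both halves would give one training and one heldout value per split; their mean and standard deviation are the quantities an error-bar plot like the one above displays.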

The pCCA-FA parameters also yield the canonical correlations, as would be identified by applying traditional CCA to the neural activity. Using the model we trained, we can obtain the canonical directions and canonical correlations as defined in CCA.

In [None]:
(canonical_dirs_x, canonical_dirs_y), rho = model.get_canonical_directions()

xdata = np.arange(zDim)+1
fig,ax = plt.subplots()
ax.plot(xdata, rho, marker='o', color='gray')
ax.set_ylim(0,1)
ax.set_ylabel(r'canonical correlation ($\rho$)')
ax.set_xticks(xdata)
ax.set_xlabel('canonical pair number')
plt.show()
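For comparison, sample canonical correlations can also be computed directly from two data matrices with classical CCA. The sketch below uses the standard QR-plus-SVD approach in NumPy; it is an independent illustration and not the code behind `get_canonical_directions`. The function name and the demo data are hypothetical, and the method assumes more observations than neurons with full-rank data.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Sample canonical correlations between the columns of X and Y."""
    # center each neuron's activity
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # orthonormal bases for the column spaces of the centered data
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # singular values of Qx^T Qy are the canonical correlations (descending)
    rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)

# hypothetical data with a single latent variable shared by both areas
rng = np.random.default_rng(0)
n = 2000
shared = rng.normal(size=(n, 1))
X_cca = shared @ rng.normal(size=(1, 5)) + rng.normal(size=(n, 5))
Y_cca = shared @ rng.normal(size=(1, 4)) + rng.normal(size=(n, 4))

rho_demo = canonical_correlations(X_cca, Y_cca)
print(rho_demo.shape)  # (4,)
```

With real recordings, these sample estimates can be inflated when the number of observations is small relative to the number of neurons, which is one motivation for a model-based estimate.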