In [1]:
import sys
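# Optional: point Python at a local development copy of cepy.
# This line is not needed if cepy is already installed (e.g. via pip).
sys.path.insert(1, '/media/galia-lab/Data1/users/gidonl/connectome_embed/cepy')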
import numpy as np
import cepy as ce
import os
import time

## Learn embeddings of the Enhanced Nathan Kline Institute Rockland Sample


The purpose of this notebook is to learn connectome embeddings (CEs) for a large group of subjects from the
Enhanced Nathan Kline Institute Rockland Sample (eNKI-RS; Nooner et al., 2012). This is a large, open
lifespan dataset (n=542) that includes diffusion-weighted imaging, resting-state
fMRI, and demographics. We recommend running this notebook on a multi-core machine
to save time.

Let's start with loading the relevant data:

In [2]:
!wget -O NKI_200_schaefer_sc_matrices.npz 'https://github.com/GidLev/cepy/blob/master/examples/NKI_200_schaefer_sc_matrices.npz?raw=true';
sc_matrices = np.load('NKI_200_schaefer_sc_matrices.npz')['matrices']
print(sc_matrices.shape)

(542, 200, 200)


Notice that the shape of the structural connectivity matrices is (*n_subjects*, *n_nodes*, *n_nodes*).

Now we set the hyper-parameters of the embedding algorithm:

In [3]:
# node2vec and word2vec initialization parameters
# 'sg': 0 selects the CBOW training algorithm; 'min_count': 0 keeps every node
word2vec_params = {'sg': 0, 'min_count': 0}
parms = {'dimensions': 30, 'walk_length': 20, 'num_walks': 800,
         'workers': 8, 'p': 0.1, 'q': 1.6, 'seed': 1, 'window': 3,
         'iter': 1, 'verbosity': 0, 'permutations': 10, 
         'word2vec_kws': word2vec_params}
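
To make the roles of `p` and `q` more concrete, the following is a minimal sketch of the second-order transition rule used by node2vec random walks: the edge weight is divided by `p` when the walk steps back to the node it came from, left unchanged when the next node is also a neighbor of the previous node, and divided by `q` otherwise. The helper `node2vec_step_probs` and the toy graph are hypothetical and only illustrate the rule; they are not part of cepy, whose internal implementation may differ.

```python
import numpy as np

def node2vec_step_probs(adj, prev, curr, p, q):
    """Probability of stepping from `curr` to each node, given that the
    walk arrived at `curr` from `prev` (hypothetical helper, not cepy API)."""
    adj = np.asarray(adj, dtype=float)
    weights = adj[curr].copy()              # start from the edge weights of curr
    for nxt in np.flatnonzero(weights):
        if nxt == prev:
            weights[nxt] /= p               # step back to the previous node
        elif adj[prev, nxt] == 0:
            weights[nxt] /= q               # step away from the previous node
        # otherwise nxt is also a neighbor of prev: weight stays unchanged
    return weights / weights.sum()

# Toy undirected graph with unit weights: edges 0-1, 0-2, 1-2, 1-3
toy = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 0],
                [0, 1, 0, 0]])
probs = node2vec_step_probs(toy, prev=0, curr=1, p=0.1, q=1.6)
print(probs.round(3))
```

With the values used in this notebook, the small `p` (0.1) makes returning to the previous node very likely, and `q` above 1 discourages steps that move away from it, so the walks tend to stay within a local neighborhood.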

Loop over all subjects: initialize, fit, and save the node embeddings of each subject (in a CE object):

In [4]:
# make sure the output directory exists before saving
os.makedirs('save_ces', exist_ok=True)

subjects_ce = []
for subject_i in range(sc_matrices.shape[0]):
    subject_ce = ce.CE(**parms)  # one CE model per subject
    subject_ce.fit(sc_matrices[subject_i, ...])
    subject_ce.save_model('save_ces/ce_subject' + str(subject_i) + '.pkl.gz')
    subjects_ce.append(subject_ce)
    print('Done with subject', subject_i, '.')

Done with subject 0 .
Done with subject 1 .
Done with subject 2 .
Done with subject 3 .
Done with subject 4 .
Done with subject 5 .
Done with subject 6 .
Done with subject 7 .
Done with subject 8 .
Done with subject 9 .
Done with subject 10 .
Done with subject 11 .
Done with subject 12 .
Done with subject 13 .
Done with subject 14 .
Done with subject 15 .
Done with subject 16 .
Done with subject 17 .
Done with subject 18 .
Done with subject 19 .
Done with subject 20 .
Done with subject 21 .
Done with subject 22 .
Done with subject 23 .
Done with subject 24 .
Done with subject 25 .
Done with subject 26 .
Done with subject 27 .
Done with subject 28 .
Done with subject 29 .
Done with subject 30 .
Done with subject 31 .
Done with subject 32 .
Done with subject 33 .
Done with subject 34 .
Done with subject 35 .
Done with subject 36 .
Done with subject 37 .
Done with subject 38 .
Done with subject 39 .
Done with subject 40 .
Done with subject 41 .
Done with subject 42 .
Done with subject 43 .
...
Done with subject 346 .
Done with subject 347 .
Done with subject 348 .
Done with subject 349 .
Done with subject 350 .
Done with subject 351 .
Done with subject 352 .
Done with subject 353 .
Done with subject 354 .
Done with subject 355 .
Done with subject 356 .
Done with subject 357 .
Done with subject 358 .
Done with subject 359 .
Done with subject 360 .
Done with subject 361 .
Done with subject 362 .
Done with subject 363 .
Done with subject 364 .
Done with subject 365 .
Done with subject 366 .
Done with subject 367 .
Done with subject 368 .
Done with subject 369 .
Done with subject 370 .
Done with subject 371 .
Done with subject 372 .
Done with subject 373 .
Done with subject 374 .
Done with subject 375 .
Done with subject 376 .
Done with subject 377 .
Done with subject 378 .
Done with subject 379 .
Done with subject 380 .
Done with subject 381 .
Done with subject 382 .
Done with subject 383 .
Done with subject 384 .
Done with subject 385 .
Done with subject 386 .
...

Load a consensus matrix computed from all training subjects, fit a CE to it, and use that CE as a common reference space for aligning the subject embeddings:

In [5]:
sc_consensus_matrix = np.load('NKI_200_schaefer_sc_train_consensus_mat.npy')
print(sc_consensus_matrix.shape)

group_ce = ce.CE(**parms)
group_ce.fit(sc_consensus_matrix)
group_ce.save_model('group_ce.pkl.gz')

(200, 200)
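
The cell above assumes that `NKI_200_schaefer_sc_train_consensus_mat.npy` is already present in the working directory. If it is not, one simple stand-in is to average the subject matrices and keep only the edges that are present in a sufficient fraction of subjects. The sketch below illustrates that idea on synthetic data; the helper `simple_consensus` and its `min_fraction` threshold are assumptions made for illustration, and this is not necessarily how the file used in this notebook was generated.

```python
import numpy as np

def simple_consensus(matrices, min_fraction=0.5):
    """Mean connectivity across subjects, keeping only edges that are
    non-zero in at least `min_fraction` of them (hypothetical helper)."""
    matrices = np.asarray(matrices, dtype=float)
    present = (matrices > 0).mean(axis=0)   # fraction of subjects with each edge
    consensus = matrices.mean(axis=0)
    consensus[present < min_fraction] = 0.0
    return consensus

# Synthetic stand-in for (n_subjects, n_nodes, n_nodes) connectivity matrices
rng = np.random.default_rng(0)
demo = rng.random((5, 4, 4))
demo = (demo + demo.transpose(0, 2, 1)) / 2   # make each matrix symmetric
demo[demo < 0.3] = 0                          # drop weak edges to create sparsity
consensus = simple_consensus(demo)
print(consensus.shape)
```

In this notebook the same call would be applied to the training subjects' slice of `sc_matrices`; which subjects belong to the training set is not specified here.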


Finally, align every subject's CE to the consensus space and save the output:

In [7]:
n_subjects = sc_matrices.shape[0]
w_subjects_aligned = None
for subject_i in range(n_subjects):
    ce_subject_aligned = ce.align(group_ce, subjects_ce[subject_i])
    # mean embedding over the fitted permutations, normalized
    w_subject = ce_subject_aligned.weights.get_w_mean(norm=True)
    if w_subjects_aligned is None:
        # allocate the output array once the embedding shape is known
        w_subjects_aligned = np.zeros((n_subjects, w_subject.shape[0], w_subject.shape[1]))
    w_subjects_aligned[subject_i, ...] = w_subject

# save the aligned embeddings of all subjects in a single compressed file
np.savez_compressed('NKI_200_schaefer_subjects_ce.npz', x=w_subjects_aligned)
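
Alignment is needed because embeddings fitted independently are not directly comparable: their coordinate axes are arbitrary, so two fits can differ by a rotation of the latent space. A common way to bring one embedding into the space of another is an orthogonal Procrustes rotation, sketched below on synthetic data. Whether `ce.align` uses exactly this procedure is an assumption; the helper `procrustes_align` is hypothetical and only illustrates the general technique.

```python
import numpy as np

def procrustes_align(reference, target):
    """Rotate `target` (n_nodes x dims) so that it matches `reference`
    as closely as possible in the least-squares sense."""
    u, _, vt = np.linalg.svd(target.T @ reference)
    rotation = u @ vt                       # optimal orthogonal matrix
    return target @ rotation

rng = np.random.default_rng(1)
reference = rng.standard_normal((200, 30))          # e.g. a group embedding
q_mat, _ = np.linalg.qr(rng.standard_normal((30, 30)))
target = reference @ q_mat                          # same embedding, rotated
aligned = procrustes_align(reference, target)
print(np.allclose(aligned, reference))
```

In this toy case the target is an exact rotation of the reference, so the alignment recovers the reference up to numerical precision; with real subject embeddings a residual difference remains, and that residual carries the individual variation.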


### Reference

* Nooner, K. B., Colcombe, S. J., Tobe, R. H., Mennes, M., Benedict, M. M., Moreno, A. L., ... Milham, M. P. (2012). The NKI-Rockland Sample: A Model for Accelerating the Pace of Discovery Science in Psychiatry. Frontiers in Neuroscience, 6, 152. https://doi.org/10.3389/fnins.2012.00152