# Individual Parcellation Example
This Jupyter notebook demonstrates a minimal example of generating individual cerebellar parcellations from a new individual localization dataset. Individual data are usually collected within a relatively short period (e.g. 10 minutes). Parcellations derived from such data alone with traditional methods tend to be poor and very noisy. In the `HierarchBayesParcel` framework, individual parcellations are instead generated by an optimal integration of a common group prior and the individual localizing data. The main idea is to let the group prior "fill in" those areas where the individual data are uncertain.

The pipeline has two steps: 
* Train a new emission model for the particular individual localization data. This step can be skipped if you already have a pretrained model for your specific task or resting-state dataset and atlas. 
* Derive the individual parcellations based on the trained emission model and the group prior.
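
To make the idea of integrating a group prior with individual data concrete, here is a toy sketch that applies Bayes' rule independently at each voxel. All numbers are invented for illustration, and this is not the code path used by `HierarchBayesParcel`; it only shows how a prior can dominate where the data are uninformative and be overridden where the data are strong.

```python
import numpy as np

# Toy example: K=3 parcels (rows), P=4 voxels (columns). All values are made up.
# Group prior p(U_i = k); each column sums to 1.
prior = np.array([[0.8, 0.6, 0.2, 0.1],
                  [0.1, 0.3, 0.6, 0.1],
                  [0.1, 0.1, 0.2, 0.8]])

# Individual data likelihood p(Y_i | U_i = k), up to a constant per voxel.
# The second voxel has completely uninformative data (flat likelihood).
lik = np.array([[0.05, 1.0, 0.7, 0.2],
                [0.90, 1.0, 0.2, 0.1],
                [0.05, 1.0, 0.1, 0.7]])

# Bayes' rule per voxel: posterior is proportional to prior * likelihood
post = prior * lik
post = post / post.sum(axis=0, keepdims=True)

print(np.round(post, 2))
print('winner under prior:    ', prior.argmax(axis=0))
print('winner under posterior:', post.argmax(axis=0))
```

In the voxel with the flat likelihood, the posterior equals the group prior, which is the "fill-in" behaviour described above.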

For data import and export, we use the [Functional_Fusion](https://github.com/DiedrichsenLab/Functional_Fusion) framework, which needs to be installed in addition to the `HierarchBayesParcel` package.

In [10]:
import numpy as np
import torch as pt
import nibabel as nb
import nitools as nt
import matplotlib.pyplot as plt
import Functional_Fusion.atlas_map as am
import Functional_Fusion.dataset as ds
import HierarchBayesParcel.arrangements as ar
import HierarchBayesParcel.full_model as fm

## Step 1: Define the space in which to generate the individual parcellations
This step defines the atlas space (e.g. fs32k, SUIT, MNISymC3). An atlas in Functional_Fusion defines a specific set of brain locations (grayordinates) that are sampled. Both the probabilistic atlas and the data need to be read into this space.

In [11]:
atlas, _ = am.get_atlas('MNISymC3')

## Step 2: Load the probabilistic group atlas
First, we sample the probabilistic group atlas U from a `_probseg.nii` file at the required brain locations. The resulting matrix U has shape (K by P), where K is the number of parcels and P is the number of brain locations (voxels).

In [12]:
# Sample the probabilistic atlas at the specific atlas grayordinates
atlas_fname = 'atl-NettekovenAsym32_space-MNI152NLin2009cSymC_probseg.nii.gz'
U = atlas.read_data(atlas_fname)
U = U.T
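
Before building a model on top of U, it can help to inspect it. The helper below is a hypothetical sketch (`check_prob_atlas` is not part of either package) that summarises a K by P probability matrix: how many voxels contain NaN, how many have no probability mass, and how many columns sum to one. It runs on a toy array here; on real data you would pass the `U` obtained above.

```python
import numpy as np

def check_prob_atlas(U, tol=1e-3):
    """Summarise a (K x P) probabilistic atlas. Hypothetical helper for illustration."""
    U = np.asarray(U, dtype=float)
    K, P = U.shape
    col_sum = np.nansum(U, axis=0)  # NaN entries are treated as zero in the sum
    return {
        'n_parcels': K,
        'n_voxels': P,
        'n_nan_voxels': int(np.isnan(U).any(axis=0).sum()),
        'n_empty_voxels': int((col_sum == 0).sum()),
        'n_normalized': int((np.abs(col_sum - 1) < tol).sum()),
    }

# Toy atlas: one valid voxel, one all-zero voxel, one all-NaN voxel
U_toy = np.array([[0.7, 0.0, np.nan],
                  [0.3, 0.0, np.nan]])
summary = check_prob_atlas(U_toy)
print(summary)
```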

## Step 3: Build an arrangement model
In `HierarchBayesParcel`, the probabilistic atlas is encoded in the arrangement model. Depending on whether you want asymmetric or symmetric individual parcellations, you can choose an `ArrangeIndependent` or an `ArrangeIndependentSymmetric` model, respectively. The utility function `build_arrangement_model` simply initializes the arrangement model, making sure that NaN and zero values in the `probseg.nii` files are handled correctly.

In [13]:
# Build the arrangement model - the parameters are the log-probabilities of the atlas 
ar_model = ar.build_arrangement_model(U, prior_type='prob', atlas=atlas)
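
To see why NaN and zero values need care when probabilities are turned into log-probabilities, consider the following sketch. It is one plausible way of doing the cleanup and is an assumption, not the actual implementation of `build_arrangement_model`; the function name `safe_log_prior` and the floor value `eps` are hypothetical.

```python
import numpy as np

def safe_log_prior(U, eps=1e-8):
    """Convert a (K x P) probability matrix into finite log-probabilities.

    Hypothetical sketch: NaN is replaced by 0, values are floored at eps,
    and each column is renormalised before taking the log.
    """
    U = np.nan_to_num(np.asarray(U, dtype=float), nan=0.0)
    U = np.clip(U, eps, None)                # avoid log(0) = -inf
    U = U / U.sum(axis=0, keepdims=True)     # each voxel sums to 1 again
    return np.log(U)

# Toy atlas: a valid voxel, a voxel with a zero entry, and an all-NaN voxel
U_toy = np.array([[0.7, 1.0, np.nan],
                  [0.3, 0.0, np.nan]])
logp = safe_log_prior(U_toy)
print(logp)
```

With this particular choice, a voxel without any valid prior information ends up with a uniform prior over parcels, so the individual data alone decide its assignment.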

## Step 4: Load individual localizing data
For model training, the data of all subjects need to be arranged into a num_subj x N x P tensor, where N is the number of observations and P is the number of brain locations (voxels). To estimate the concentration parameter efficiently, it is useful to have multiple measures of the same conditions. The vector `cond_v` gives the condition number of each observation, and the vector `part_v` gives the number of the independent data partition (e.g. run) it belongs to. In this example, we have only two repetitions per condition.
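
As a quick illustration of this layout, the following sketch builds a toy data tensor together with matching condition and partition vectors. The sizes are arbitrary and the data are random; only the shapes and the structure of the two vectors matter.

```python
import numpy as np

# Toy sizes (not the MDTB values)
n_subj, n_cond, n_rep, P = 3, 4, 2, 10
N = n_cond * n_rep  # every condition is measured once per partition

rng = np.random.default_rng(0)
Y = rng.standard_normal((n_subj, N, P))  # num_subj x N x P

# Condition number of each observation: [1 2 3 4 1 2 3 4]
cond_v = np.tile(np.arange(1, n_cond + 1), n_rep)
# Partition (e.g. half) of each observation: [1 1 1 1 2 2 2 2]
part_v = np.repeat(np.arange(1, n_rep + 1), n_cond)

print(Y.shape, cond_v, part_v)
```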

In [None]:
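mdtb_dataset = ds.get_dataset_class(':/data/FunctionalFusion','MDTB')
subj = mdtb_dataset.get_participants().participant_id
# NOTE: data_dir is not defined in this notebook. It must be a format string
# with one placeholder for the subject ID (see data_dir.format(s) below).
data, info = [], []
for ses_id in mdtb_dataset.sessions:
    # Condition and partition information for this session
    info.append(mdtb_dataset.get_info(ses_id=ses_id, type='CondHalf'))
    # Read each subject's data into atlas space and transpose it,
    # so that stacking yields a num_subj x N x P tensor
    this_data = []
    for s in subj:
        file_name = f'/{s}_space-{atlas.name}_{ses_id}_CondHalf.dscalar.nii'
        this_data.append(atlas.read_data(data_dir.format(s) + file_name).T)
    # One data tensor per session
    data.append(np.stack(this_data))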

Now, we assemble the condition and partition vectors. `cond_v` is a list of 1-D arrays giving the condition numbers along dimension N, and `part_v` is a list of 1-D arrays specifying the partition (e.g. run or repeated measurement) of each observation in a data tensor. `sub_ind` gives the subject indices for each data tensor; the same subjects may, in principle, appear in more than one data tensor.

cond_v, part_v, sub_ind = [], [], []
for inf in info:
    # One entry per data tensor (here: one per session)
    cond_v.append(inf['cond_num_uni'].values.reshape(-1,))
    part_v.append(inf['half'].values.reshape(-1,))
    sub_ind.append(np.arange(0, len(subj)))

The four lists `data`, `cond_v`, `part_v`, and `sub_ind` should all have the same length. This length is the number of emission models in the model to be trained.
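
A small consistency check can catch mismatched inputs before training. The function below is a hypothetical helper (`check_inputs` is not part of `HierarchBayesParcel`); it assumes that `data`, `cond_v`, `part_v`, and `sub_ind` are lists with one entry per data tensor, and that each data tensor has shape num_subj x N x P as described above. It is demonstrated on toy inputs.

```python
import numpy as np

def check_inputs(data, cond_v, part_v, sub_ind):
    """Verify that the four input lists are mutually consistent.

    Hypothetical helper. Returns the number of data tensors, which is the
    number of emission models to be trained.
    """
    n = len(data)
    if not (len(cond_v) == len(part_v) == len(sub_ind) == n):
        raise ValueError('data, cond_v, part_v and sub_ind must have equal length')
    for i, (Y, c, p, s) in enumerate(zip(data, cond_v, part_v, sub_ind)):
        n_subj, N, _ = Y.shape
        if len(c) != N or len(p) != N:
            raise ValueError(f'tensor {i}: cond_v/part_v must have N={N} entries')
        if len(s) != n_subj:
            raise ValueError(f'tensor {i}: sub_ind must have {n_subj} entries')
    return n

# Toy inputs: two data tensors with different numbers of observations
data_toy = [np.zeros((3, 8, 10)), np.zeros((3, 6, 10))]
cond_toy = [np.tile(np.arange(1, 5), 2), np.tile(np.arange(1, 4), 2)]
part_toy = [np.repeat([1, 2], 4), np.repeat([1, 2], 3)]
sub_toy = [np.arange(3), np.arange(3)]

n_models = check_inputs(data_toy, cond_toy, part_toy, sub_toy)
print(n_models)
```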

