In [1]:
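cd
% local setup: add the niak and psom libraries to the search path
build_path niak psom
% replace with your own working folder
cd /home/pbellec/git/niak_tutorials/connectome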

Adding library niak to the search path.

Adding library psom to the search path.



# Subtype pipeline
This tutorial shows how to use the NIAK subtype pipeline to identify subgroups of subjects with homogeneous seed-based functional connectivity maps in the COBRE lightweight20 release, and to test whether some subtypes are associated with a diagnosis of schizophrenia. See the [documentation](http://niak.simexp-lab.org/pipe_connectome.html) of the pipeline for a more detailed description of the options. Download the tutorial as a notebook [here](https://raw.githubusercontent.com/SIMEXP/niak_tutorials/master/connectome/niak_tutorial_rmap_connectome.ipynb) and as a MATLAB script [here](https://raw.githubusercontent.com/SIMEXP/niak_tutorials/master/connectome/niak_tutorial_rmap_connectome.m). To run this tutorial, we recommend using [jupyter](http://jupyter.org/) from a NIAK docker container, as described on the [NIAK installation page](http://niak.simexp-lab.org/niak_installation.html). 

Before running this tutorial, you first need to complete the [correlation maps tutorial](http://niak.simexp-lab.org/niak_tutorial_rmap_connectome.html). Both tutorials must be run in the same folder. 

# Preparing the files

## Phenotypic data

We first load the phenotypic data of the COBRE sample. This command will fail if you have not yet downloaded the data as part of the rmap tutorial. 

In [18]:
path_cobre = [pwd filesep 'cobre_lightweight20'];
file_pheno = [path_cobre filesep 'phenotypic_data.tsv.gz'];
tab = niak_read_csv_cell(file_pheno);
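The next cell selects covariates by fixed column positions (2, 5 and 9). Judging from how it indexes `tab`, the first row holds the column names and the first column the subject IDs, so a slightly more robust variant looks columns up by name. A minimal sketch on a toy table: the column names `'age'`, `'diagnosis'` and `'fd'` are hypothetical, so check the header of `phenotypic_data.tsv.gz` for the actual ones. The `_demo` names avoid overwriting the tutorial variables.

```octave
% Toy stand-in for the cell array returned by niak_read_csv_cell
% (hypothetical column names and values; row 1 = header, column 1 = subject IDs)
tab_demo = { 'id'    , 'age' , 'diagnosis' , 'fd'   ;
             'sub01' , '25'  , 'Patient'   , '0.21' ;
             'sub02' , '31'  , 'Control'   , '0.15' ;
             'sub03' , '44'  , 'Patient'   , '0.30' };

% Column index of a given header name (empty if the header is absent)
header_demo = tab_demo(1, :);
find_col = @(name) find(strcmp(header_demo, name), 1);

col_age = find_col('age');
col_diag = find_col('diagnosis');
col_fd = find_col('fd');
if isempty(col_age) || isempty(col_diag) || isempty(col_fd)
    error('A required column is missing from the phenotypic table');
end

% Same conversions as in the tutorial, but driven by column names
age_demo = str2double(tab_demo(2:end, col_age));
patient_demo = strcmp(tab_demo(2:end, col_diag), 'Patient');
fd_demo = str2double(tab_demo(2:end, col_fd));
```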

Now we convert the values into numerical covariates, which we save in a separate .csv file:

In [20]:
list_subject = tab(2:end,1);                 % subject IDs (row 1 is the header)
patient = strcmp(tab(2:end,5),'Patient');    % 1 for patients, 0 otherwise
age = str2double(tab(2:end,2)); 
FD = str2double(tab(2:end,9));               % frame displacement (head motion)
opt_csv.labels_x = list_subject; % Labels for the rows
opt_csv.labels_y = { 'age' , 'patient' , 'fd' }; % Labels for the columns
niak_write_csv('model_patient.csv', [age patient FD] , opt_csv);
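`str2double` returns `NaN` for any entry it cannot parse, for example an empty cell for a missing age, and a `NaN` in the model would then propagate into the regressions. A quick check before writing the model file, sketched on hypothetical values (the subject IDs and entries below are made up):

```octave
% Hypothetical raw entries, as read from the phenotypic table
subjects_demo = {'sub01'; 'sub02'; 'sub03'; 'sub04'};
age_str = {'25'; ''; '44'; 'n/a'};
fd_str = {'0.21'; '0.15'; '0.30'; '0.12'};

age_chk = str2double(age_str);   % '' and 'n/a' are converted to NaN
fd_chk = str2double(fd_str);

% Flag subjects with at least one missing covariate
missing_demo = isnan(age_chk) | isnan(fd_chk);
if any(missing_demo)
    fprintf('Missing covariates for: %s\n', strjoin(subjects_demo(missing_demo)', ', '));
end
```

Flagged subjects can then be removed from `list_subject` and from the covariate vectors, or handled otherwise, before calling `niak_write_csv`.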

We tell the pipeline where to find the model:

In [31]:
files_in.model = [pwd filesep 'model_patient.csv'];

## Connectivity maps

Next, we get the list of connectivity maps of each subject for a single network, here the default-mode network (DMN). The network labels were specified when running the `connectome` pipeline, so we simply grab the outputs of that pipeline. 

In [37]:
path_connectome = [pwd filesep 'connectome'];
files_conn = niak_grab_connectome(path_connectome);
files_in.data = files_conn.rmap.DMN;

## Brain mask
We pass the mask of brain networks to the pipeline, which uses it to restrict the analysis to grey matter. 

In [39]:
files_in.mask = files_conn.network_rois;
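Conceptually, masking turns each 3D connectivity map into a vector that holds only the in-mask voxels. A minimal sketch on toy arrays, assuming for illustration (not as a description of NIAK internals) that every voxel with a non-zero label in the network parcellation counts as in-mask:

```octave
% Toy 3D map and toy network parcellation (0 = outside the brain)
rmap_demo = reshape(1:24, [2 3 4]);
rois_demo = zeros(2, 3, 4);
rois_demo(1, :, 2:3) = 1;   % network 1
rois_demo(2, 1, 1) = 3;     % network 3

mask_demo = rois_demo > 0;        % any non-zero network label is in the mask
vec_demo = rmap_demo(mask_demo);  % column vector of in-mask values

% Put the vector back into a volume (voxels outside the mask set to 0)
vol_back = zeros(size(mask_demo));
vol_back(mask_demo) = vec_demo;
```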

# Set up the options of the pipeline

First specify where to save the outputs, and how many networks to use: 

In [40]:
%% General
opt.folder_out = [pwd filesep 'subtype'];    
opt.scale = 1;

Then specify which covariates to regress out as confounds **before** the subtypes are generated. 

In [56]:
% turn on/off regression of confounds during stacking (true: apply / false: don't apply)
opt.stack.flag_conf = true;                 
% a list of variable names to be regressed out
opt.stack.regress_conf = {'fd'};     
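Regressing out `fd` at the stacking stage can be pictured as fitting, at every voxel, a linear model of the connectivity values across subjects on an intercept plus `fd`, and keeping the residuals. A minimal least-squares sketch on simulated data; the exact model NIAK fits (for instance whether it adds an intercept) is an assumption here:

```octave
% Simulated stack: one row per subject, one column per voxel (made-up data)
n_subj = 20;
n_vox = 50;
fd_conf = linspace(0.1, 0.5, n_subj)';                  % one FD value per subject
signal = sin((1:n_subj)' * (1:n_vox) / 7);              % structured signal, not built from FD
stack_conf = signal + fd_conf * linspace(1, 2, n_vox);  % add an FD effect at every voxel

% Voxel-wise least-squares fit of the stack on [intercept, fd]
X_conf = [ones(n_subj, 1) fd_conf];
beta_conf = X_conf \ stack_conf;                % 2 x n_vox coefficients
resid_conf = stack_conf - X_conf * beta_conf;   % stack with the FD effect removed
```

By construction, the residuals have zero mean and zero correlation with `fd` at every voxel, so subtypes built from them cannot be driven by a linear effect of motion.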

The options for the subtypes themselves:

In [43]:
%% Subtyping
opt.subtype.nb_subtype = 2;        % the number of subtypes to extract
opt.subtype.sub_map_type = 'mean'; % the model for the subtype maps (options are 'mean' or 'median')
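Once subjects have been split into `nb_subtype` groups, each subtype map summarizes the maps of its members (`'mean'` or `'median'`), and each subject can then be described by its similarity to every subtype map. The sketch below takes the partition as given (the clustering step is omitted) and uses the spatial Pearson correlation as the similarity measure; both are illustrative assumptions rather than a description of NIAK's implementation.

```octave
% Toy stack: 6 subjects x 5 voxels built from two opposite spatial patterns
pattern_a = [1 2 3 4 5];
pattern_b = [5 4 3 2 1];
stack_sub = [pattern_a; pattern_a + 0.1; pattern_a - 0.2; ...
             pattern_b; pattern_b + 0.3; pattern_b - 0.1];
part_demo = [1; 1; 1; 2; 2; 2];   % hypothetical partition: subtype of each subject
nb_sub_demo = 2;
map_type = 'mean';                % 'mean' or 'median', as in opt.subtype.sub_map_type

% Subtype maps: mean (or median) of the maps of the members of each subtype
sub_maps = zeros(nb_sub_demo, size(stack_sub, 2));
for ss = 1:nb_sub_demo
    members = stack_sub(part_demo == ss, :);
    if strcmp(map_type, 'median')
        sub_maps(ss, :) = median(members, 1);
    else
        sub_maps(ss, :) = mean(members, 1);
    end
end

% Weights: spatial Pearson correlation between each subject map and each subtype map
zrows = @(m) bsxfun(@rdivide, bsxfun(@minus, m, mean(m, 2)), std(m, 0, 2));
weights_demo = zrows(stack_sub) * zrows(sub_maps)' / (size(stack_sub, 2) - 1);
```

Each row of `weights_demo` holds one subject's correlation with each subtype map; here every subject correlates at +1 with its own subtype and -1 with the other.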

Now we add an association test between subtypes and the patient label:

In [51]:
% turn on/off GLM association testing (true: apply / false: don't apply)
opt.flag_assoc = true;                                
% scalar number for the level of acceptable false-discovery rate (FDR) for the t-maps
opt.association.patient.fdr = 0.05;                           
% method for how the FDR is controlled
opt.association.patient.type_fdr = 'BH';                      
% turn on/off normalization of covariates in model (true: apply / false: don't apply)
opt.association.patient.normalize_x = false;                   
% turn on/off normalization of all data (true: apply / false: don't apply)
opt.association.patient.normalize_y = false;                  
% type of correction for normalization (options: 'mean', 'mean_var')
opt.association.patient.normalize_type = 'mean';              
% turn on/off adding a constant covariate to the model
opt.association.patient.flag_intercept = true;     
% To test a main effect of a variable
opt.association.patient.contrast.patient = 1; % scalar number for the weight of the variable in the contrast
opt.association.patient.contrast.fd = 0;      % scalar number for the weight of the variable in the contrast
opt.association.patient.contrast.age = 0;     % scalar number for the weight of the variable in the contrast
% type of data for visualization (options are 'continuous' or 'categorical')
opt.association.patient.visu_type = 'categorical'; 
% turn on/off making plots for GLM testing (true: apply / false: don't apply)
opt.association.patient.flag_visu = true; 
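The association options above describe a standard recipe: for each subtype, fit a general linear model of the subtype weights on the covariates (plus an intercept when `flag_intercept` is true), compute a t-statistic for the contrast (here the `patient` coefficient alone), then control the false-discovery rate across subtypes at `fdr = 0.05` with the Benjamini-Hochberg (`'BH'`) procedure. The sketch below is a textbook version of that recipe on made-up weights, not NIAK's own code. Normalization is omitted, matching `normalize_x = false` and `normalize_y = false`.

```octave
% Simulated data for 12 subjects (made-up values)
patient_g = [zeros(6, 1); ones(6, 1)];
age_g = [30 42 35 28 51 39 33 45 29 48 37 41]';
fd_g = [0.10 0.22 0.15 0.31 0.12 0.18 0.27 0.14 0.20 0.11 0.25 0.16]';
noise_g = [0.05 -0.03 0.02 -0.04 0.01 -0.01 0.03 -0.02 0.04 -0.05 0.00 0.02]';
% One column of weights per subtype: subtype 1 has a built-in group effect, subtype 2 does not
weights_g = [0.5 * patient_g + noise_g, 0.2 + flipud(noise_g)];

% Design: intercept (flag_intercept = true) + covariates in the order of model_patient.csv
X_g = [ones(12, 1) age_g patient_g fd_g];
c_g = [0; 0; 1; 0];                 % contrast: patient = 1, age = 0, fd = 0
[n_g, p_g] = size(X_g);
df_g = n_g - p_g;                   % residual degrees of freedom
XtX_inv = inv(X_g' * X_g);

nb_g = size(weights_g, 2);
t_g = zeros(1, nb_g);
pval_g = zeros(1, nb_g);
for ss = 1:nb_g
    y_g = weights_g(:, ss);
    b_g = X_g \ y_g;                          % least-squares estimates
    res_g = y_g - X_g * b_g;
    sigma2_g = (res_g' * res_g) / df_g;       % residual variance
    t_g(ss) = (c_g' * b_g) / sqrt(sigma2_g * (c_g' * XtX_inv * c_g));
    % two-sided p-value of a Student t with df_g degrees of freedom, via betainc
    pval_g(ss) = betainc(df_g / (df_g + t_g(ss)^2), df_g / 2, 0.5);
end

% Benjamini-Hochberg at q = 0.05: keep the k smallest p-values, where k is the
% largest rank such that p_(k) <= k / m * q
q_g = 0.05;
[p_sorted, order_g] = sort(pval_g);
m_g = numel(pval_g);
k_max = find(p_sorted <= (1:m_g) / m_g * q_g, 1, 'last');
significant_g = false(1, m_g);
if ~isempty(k_max)
    significant_g(order_g(1:k_max)) = true;
end
```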

It is also possible to add a single chi-square test of the relationship between subtypes and a categorical variable (switched off here):

In [49]:
% turn on/off running Chi-square test (true: apply / false: don't apply)
opt.flag_chi2 = false;               
% string name of the column in files_in.model on which the contingency table will be based
opt.chi2.group_col_id = 'patient';    
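The chi-square test crosses subtype membership with the groups of the `patient` column in a contingency table, and compares the observed counts with those expected if the two were independent. A minimal sketch with a hypothetical partition; the p-value comes from the upper tail of the chi-square distribution written with `gammainc`, so no statistics toolbox is needed.

```octave
% Hypothetical subtype assignments and group labels (0 = control, 1 = patient)
subtype_c = [1 1 1 1 1 2 2 2 2 2 1 2]';
patient_c = [0 0 0 0 1 1 1 1 1 0 0 1]';

% Contingency table: one row per subtype, one column per group
obs_c = accumarray([subtype_c, patient_c + 1], 1);
% Counts expected if subtype and group were independent
expected_c = sum(obs_c, 2) * sum(obs_c, 1) / sum(obs_c(:));
chi2_c = sum((obs_c(:) - expected_c(:)).^2 ./ expected_c(:));
df_c = (size(obs_c, 1) - 1) * (size(obs_c, 2) - 1);
% Upper tail of the chi-square distribution with df_c degrees of freedom
p_c = gammainc(chi2_c / 2, df_c / 2, 'upper');
```

Here the table is `[5 1; 1 5]` with 3 expected subjects per cell. With expected counts that small, the chi-square approximation is rough, so such p-values should be read cautiously.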

# Run the pipeline

In [57]:
opt.flag_test = false;  % Set this flag to true to generate the pipeline without running it.
[pipeline,opt] = niak_pipeline_subtype(files_in,opt);

    psom_struct_defaults at line 112 column 5
    niak_pipeline_subtype at line 245 column 5

Logs will be stored in /home/pbellec/git/niak_tutorials/connectome/subtype/logs/
Generating dependencies ...
   Percentage completed :  20 40 60 80 100- 0.01 sec
Setting up the to-do list ...
   I found 5 job(s) to do.
Deamon started on 25-Nov-2016 23:51:20
25-Nov-2016 23:51:20 Starting the pipeline manager...
25-Nov-2016 23:51:20 Starting the garbage collector...
25-Nov-2016 23:51:20 Starting worker number 1...

Pipeline started on 25-Nov-2016 23:51:21
user: pbellec, host: acacia, system: unix
*****************************************
25-Nov-2016 23:51:22 stack_1                  submitted  (1 run / 0 fail / 0 done / 4 left)
25-Nov-2016 23:51:24 stack_1                  finished   (0 run / 0 fail / 1 done / 4 left)
25-Nov-2016 23:51:24 subtype_1                submitted  (1 run / 0 fail / 1 done / 3 left)
25-Nov-2016 23:51:27 subtype_1                finished   (0 run / 0 fail / 2 done / 3 le