# Data Input

Nipype provides many different modules to grab or select the data:

    DataFinder
    DataGrabber
    FreeSurferSource
    JSONFileGrabber
    S3DataGrabber
    SSHDataGrabber
    SelectFiles
    XNATSource
    
Most of them are located in `nipype.interfaces.io`, where `io` stands for 'input/output'.

## **DataGrabber**

DataGrabber is an interface for collecting files from the hard drive. It is very flexible and supports almost any file organization of your data.

In the simplest use case, it returns a single fixed file. By default, DataGrabber stores its outputs in a field called `outfiles`.


In [1]:
import nipype.interfaces.io as nio

datasource1 = nio.DataGrabber()

datasource1.inputs.base_directory = '/data/ds000114'
datasource1.inputs.template = 'sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz'
datasource1.inputs.sort_filelist = True

results = datasource1.run()

results.outputs


outfiles = /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz

In [2]:
# You can get all NIfTI files containing the word 'fingerfootlips' in all directories starting with the letter 's'.

import nipype.interfaces.io as nio

datasource2 = nio.DataGrabber()  # create a DataGrabber instance

datasource2.inputs.base_directory = '/data/ds000114'  # directory to search in
datasource2.inputs.template = 's*/ses-test/func/*fingerfootlips*.nii.gz'  # glob pattern to match
datasource2.inputs.sort_filelist = True  # return the matches in sorted order

results = datasource2.run()

results.outputs


outfiles = ['/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-03/ses-test/func/sub-03_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-04/ses-test/func/sub-04_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-05/ses-test/func/sub-05_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-06/ses-test/func/sub-06_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-07/ses-test/func/sub-07_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-08/ses-test/func/sub-08_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-09/ses-test/func/sub-09_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-10/ses-test/func/sub-10_ses-test_task-fingerfootlips_bold.nii.gz']
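DataGrabber wraps around Python's `glob` module, so the cell above behaves roughly like joining `base_directory` with `template` and sorting the matches. The sketch below shows that idea with the standard library only; a throwaway directory with a few made-up file names stands in for `/data/ds000114`, and it is a simplified model rather than Nipype's actual code:

```python
import glob
import os
import tempfile

# Build a tiny fake dataset (made-up files, for illustration only).
base = tempfile.mkdtemp()
for sub in ["sub-01", "sub-02"]:
    func_dir = os.path.join(base, sub, "ses-test", "func")
    os.makedirs(func_dir)
    for task in ["fingerfootlips", "linebisection"]:
        name = f"{sub}_ses-test_task-{task}_bold.nii.gz"
        open(os.path.join(func_dir, name), "w").close()

# Same pattern as datasource2; sorted() mimics sort_filelist = True.
template = "s*/ses-test/func/*fingerfootlips*.nii.gz"
matches = sorted(glob.glob(os.path.join(base, template)))

for path in matches:
    print(os.path.relpath(path, base))
```

Only the two `fingerfootlips` runs match; the `linebisection` files are filtered out by the `*fingerfootlips*` part of the pattern.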

In [10]:
# To return the functional images of subjects 1 and 7 for the task fingerfootlips.

datasource3 = nio.DataGrabber(infields = ['subject_id'])
datasource3.inputs.base_directory = '/data/ds000114'
datasource3.inputs.template = 'sub-%02d/ses-test/func/*fingerfootlips*.nii.gz'
# %02d: print the integer with two digits, zero-padded on the left (1 -> '01')
datasource3.inputs.sort_filelist = True
datasource3.inputs.subject_id = [1, 7]

results = datasource3.run()
results.outputs


outfiles = ['/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-07/ses-test/func/sub-07_ses-test_task-fingerfootlips_bold.nii.gz']
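The `%02d` in the template is ordinary Python `%`-formatting: the integer is printed with at least two digits, zero-padded on the left, which is why `subject_id = [1, 7]` becomes `sub-01` and `sub-07`. A quick check in plain Python:

```python
# '%02d' formats an integer with at least two digits, padding with zeros on the left.
template = "sub-%02d/ses-test/func/*fingerfootlips*.nii.gz"

for subject_id in [1, 7, 12]:
    print(template % subject_id)
# sub-01/ses-test/func/*fingerfootlips*.nii.gz
# sub-07/ses-test/func/*fingerfootlips*.nii.gz
# sub-12/ses-test/func/*fingerfootlips*.nii.gz
```

Numbers with more digits than the width are not truncated, so `'%02d' % 123` gives `'123'`.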

In [12]:
# To return the functional image of subject 1 for the 'fingerfootlips' task and the functional image of subject 7 for the 'linebisection' task.

datasource4 = nio.DataGrabber(infields=['subject_id', 'run'])  # two infields: subject_id and run

datasource4.inputs.base_directory = '/data/ds000114'
datasource4.inputs.template = 'sub-%02d/ses-test/func/*%s*.nii.gz'
datasource4.inputs.sort_filelist = True

datasource4.inputs.run = ['fingerfootlips', 'linebisection']
datasource4.inputs.subject_id = [1, 7]

results = datasource4.run()
results.outputs


outfiles = ['/data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz', '/data/ds000114/sub-07/ses-test/func/sub-07_ses-test_task-linebisection_bold.nii.gz']
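The output shows that list-valued infields are paired position by position (subject 1 with `fingerfootlips`, subject 7 with `linebisection`) rather than combined in every possible way. The function below is a simplified, hypothetical model of that pairing written with the standard library only (`fill_templates` is not a Nipype function). It also reuses scalar values at every position, which is what lets Exercise 1 below combine a single subject with two sessions:

```python
def fill_templates(template, *values):
    """Hypothetical model of how a %-template is filled from infields.

    List arguments are paired element-wise; scalar arguments are reused
    for every position. All list arguments must have the same length.
    """
    lengths = {len(v) for v in values if isinstance(v, list)}
    if len(lengths) > 1:
        raise ValueError("list arguments must all have the same length")
    n = lengths.pop() if lengths else 1
    filled = []
    for i in range(n):
        args = tuple(v[i] if isinstance(v, list) else v for v in values)
        filled.append(template % args)
    return filled


print(fill_templates("sub-%02d/ses-test/func/*%s*.nii.gz",
                     [1, 7], ["fingerfootlips", "linebisection"]))
# ['sub-01/ses-test/func/*fingerfootlips*.nii.gz', 'sub-07/ses-test/func/*linebisection*.nii.gz']

print(fill_templates("sub-%02d/ses-%s/anat/*_T1w.nii.gz", 1, ["test", "retest"]))
# ['sub-01/ses-test/anat/*_T1w.nii.gz', 'sub-01/ses-retest/anat/*_T1w.nii.gz']
```

Each filled pattern would then be globbed on its own; that step is left out here.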

## **A more realistic use-case**

`DataGrabber` is a generic data grabber module that wraps around `glob` to select your neuroimaging data in an intelligent way. As an example, let's assume we want to grab the anatomical and functional images of a certain subject.

First, we need to create the DataGrabber node. This node needs to have some input fields for all dynamic parameters (e.g. subject identifier, task identifier), as well as the two desired output fields `anat` and `func`.

In [33]:
from nipype import DataGrabber, Node

dg = Node(DataGrabber(infields = ['subject_id', 'ses_name', 'task_name'],
                      outfields = ['anat', 'func']),
          name = 'datagrabber')

dg.inputs.base_directory = '/data/ds000114'

dg.inputs.template = '*'
    # '*' matches anything; for 'anat' and 'func' it is overridden by field_template
    
dg.inputs.sort_filelist = True


Second, we know that the two files we want are at the following locations:

    anat = /data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz
    func = /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz

We see that the two paths differ between subjects, sessions and tasks in only three dynamic parameters:

    subject_id: in this case 1 (written as '01' in the path)
    task_name: in this case 'fingerfootlips'
    ses_name: in this case 'test'

This means that we can rewrite the paths as follows:

    anat = /data/ds000114/sub-[subject_id]/ses-[ses_name]/anat/sub-[subject_id]_ses-[ses_name]_T1w.nii.gz
    func = /data/ds000114/sub-[subject_id]/ses-[ses_name]/func/sub-[subject_id]_ses-[ses_name]_task-[task_name]_bold.nii.gz

Therefore, we need the parameters ``subject_id`` and ``ses_name`` for the anatomical image and the parameters ``subject_id``, ``ses_name`` and ``task_name`` for the functional image. In the context of DataGrabber, this is specified as follows:

In [27]:
dg.inputs.template_args = {'anat': [['subject_id', 'ses_name']],
                           'func': [['subject_id', 'ses_name', 'task_name']]}

In [29]:
dg.inputs.field_template = {'anat': 'sub-%02d/ses-%s/anat/*_T1w.nii.gz',
                            'func': 'sub-%02d/ses-%s/func/*task-%s_bold.nii.gz'}
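Conceptually, for each output field the names listed in `template_args` pick the inputs whose values fill the `%` placeholders of the matching `field_template`. The sketch below reproduces only that substitution step, with a plain dictionary standing in for `dg.inputs` (`render_fields` and `inputs` are hypothetical names for illustration, and the globbing step is left out):

```python
field_template = {"anat": "sub-%02d/ses-%s/anat/*_T1w.nii.gz",
                  "func": "sub-%02d/ses-%s/func/*task-%s_bold.nii.gz"}
template_args = {"anat": [["subject_id", "ses_name"]],
                 "func": [["subject_id", "ses_name", "task_name"]]}

# Hypothetical stand-in for the values set on dg.inputs.
inputs = {"subject_id": 1, "ses_name": "test", "task_name": "fingerfootlips"}


def render_fields(field_template, template_args, inputs):
    """Fill each output's template with the inputs named in template_args."""
    rendered = {}
    for field, arg_lists in template_args.items():
        # Each inner list of names yields one filled pattern for this field.
        rendered[field] = [field_template[field] % tuple(inputs[name] for name in names)
                           for names in arg_lists]
    return rendered


print(render_fields(field_template, template_args, inputs))
# {'anat': ['sub-01/ses-test/anat/*_T1w.nii.gz'], 'func': ['sub-01/ses-test/func/*task-fingerfootlips_bold.nii.gz']}
```

The order of names in each inner list matters: it must follow the order of the `%` placeholders in the template.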

In [30]:
# Using the IdentityInterface
from nipype import IdentityInterface
infosource = Node(IdentityInterface(fields=['subject_id', 'ses_name', 'task_name']),
                  name="infosource")
infosource.inputs.task_name = "fingerfootlips"
infosource.inputs.ses_name = "test"
subject_id_list = [1, 2]
infosource.iterables = [('subject_id', subject_id_list)]

In [31]:
# Specifying the input fields of DataGrabber directly
dg.inputs.subject_id = 1
dg.inputs.ses_name = "test"
dg.inputs.task_name = "fingerfootlips"

In [32]:
dg.run().outputs

190128-05:51:58,885 nipype.workflow INFO:
	 [Node] Setting-up "datagrabber" in "/tmp/tmppq57kh_7/datagrabber".
190128-05:51:58,895 nipype.workflow INFO:
	 [Node] Running "datagrabber" ("nipype.interfaces.io.DataGrabber")
190128-05:51:58,908 nipype.workflow INFO:
	 [Node] Finished "datagrabber".



anat = /data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz
func = /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz

## **Exercise 1**

Grab the T1w images of sub-01 from both sessions, ses-test and ses-retest.

In [34]:
DataGrabber.help()

Generic datagrabber module that wraps around glob in an
intelligent way for neuroimaging tasks to grab files


.. attention::

   Doesn't support directories currently

Examples
--------

>>> from nipype.interfaces.io import DataGrabber

Pick all files from current directory

>>> dg = DataGrabber()
>>> dg.inputs.template = '*'

Pick file foo/foo.nii from current directory

>>> dg.inputs.template = '%s/%s.dcm'
>>> dg.inputs.template_args['outfiles']=[['dicomdir','123456-1-1.dcm']]

Same thing but with dynamically created fields

>>> dg = DataGrabber(infields=['arg1','arg2'])
>>> dg.inputs.template = '%s/%s.nii'
>>> dg.inputs.arg1 = 'foo'
>>> dg.inputs.arg2 = 'foo'

however this latter form can be used with iterables and iterfield in a
pipeline.

Dynamically created, user-defined input and output fields

>>> dg = DataGrabber(infields=['sid'], outfields=['func','struct','ref'])
>>> dg.inputs.base_directory = '.'
>>> dg.inputs.template = '%s/%s.nii'
>>> dg.inputs.template_args['func'] = [[

In [35]:
from nipype import DataGrabber, Node

# Create DataGrabber Node
ex1_dg = Node(DataGrabber(infields = ['subject_id', 'ses_name'],
                          outfields = ['anat']),  # outfields: names of the output fields
              name = 'datagrabber')

# Location of the dataset folder
ex1_dg.inputs.base_directory = '/data/ds000114'

# Necessary default parameters
ex1_dg.inputs.template = '*'
ex1_dg.inputs.sort_filelist = True

# Specify the template
ex1_dg.inputs.template_args = {'anat': [['subject_id', 'ses_name']]}
ex1_dg.inputs.field_template = {'anat': 'sub-%02d/ses-%s/anat/*_T1w.nii.gz'}

# specify subject_id and ses_name you're interested in
ex1_dg.inputs.subject_id = 1
ex1_dg.inputs.ses_name = ["test", "retest"]

# and run the node
ex1_res = ex1_dg.run()

190128-05:59:14,410 nipype.workflow INFO:
	 [Node] Setting-up "datagrabber" in "/tmp/tmpytg778dr/datagrabber".
190128-05:59:14,421 nipype.workflow INFO:
	 [Node] Running "datagrabber" ("nipype.interfaces.io.DataGrabber")
190128-05:59:14,436 nipype.workflow INFO:
	 [Node] Finished "datagrabber".


In [36]:
ex1_res.outputs


anat = ['/data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz', '/data/ds000114/sub-01/ses-retest/anat/sub-01_ses-retest_T1w.nii.gz']

## **SelectFiles**

SelectFiles is a more flexible alternative to DataGrabber. It is built on Python format strings. Format strings allow you to replace named sections of template strings set off by curly braces ({}), possibly filtered through a set of functions that control how the values are rendered into the string. As a very basic example, we could write:

In [37]:
msg = "This workflow uses {package}."

print(msg.format(package = "FSL"))

This workflow uses FSL.


In [38]:
from nipype import SelectFiles, Node

templates = {'anat': 'sub-{subject_id}/ses-{ses_name}/anat/sub-{subject_id}_ses-{ses_name}_T1w.nii.gz',
             'func': 'sub-{subject_id}/ses-{ses_name}/func/sub-{subject_id}_ses-{ses_name}_task-{task_name}_bold.nii.gz'}

sf = Node(SelectFiles(templates),
          name = 'selectfiles')

sf.inputs.base_directory = '/data/ds000114'

sf.inputs.subject_id = '01'
sf.inputs.ses_name = "test"
sf.inputs.task_name = 'fingerfootlips'

In [39]:
sf.run().outputs

190128-06:25:32,670 nipype.workflow INFO:
	 [Node] Setting-up "selectfiles" in "/tmp/tmp8a791_5b/selectfiles".
190128-06:25:32,676 nipype.workflow INFO:
	 [Node] Running "selectfiles" ("nipype.interfaces.io.SelectFiles")
190128-06:25:32,689 nipype.workflow INFO:
	 [Node] Finished "selectfiles".



anat = /data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz
func = /data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz

`SelectFiles` is more flexible than `DataGrabber` because it uses {}-based format strings. With them, the same input (e.g. subject_id) can appear several times in one template without having to be fed into the template multiple times.
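In plain Python, a named placeholder can appear any number of times and is filled from a single keyword argument. `string.Formatter().parse` lists the placeholder names in a template, which is a handy way to see which inputs a template needs; treat this as an illustration of the idea rather than a description of SelectFiles' internals:

```python
import string

template = ("sub-{subject_id}/ses-{ses_name}/anat/"
            "sub-{subject_id}_ses-{ses_name}_T1w.nii.gz")

# Collect the distinct placeholder names, in order of first appearance.
fields = []
for _, field_name, _, _ in string.Formatter().parse(template):
    if field_name is not None and field_name not in fields:
        fields.append(field_name)
print(fields)  # ['subject_id', 'ses_name']

# Each name is supplied once but fills every occurrence.
print(template.format(subject_id="01", ses_name="test"))
# sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz
```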

Also, you can select multiple files with wildcards, without the need for an iterable node:

    'sub-*/anat/sub-*_T1w.nii.gz'

Let's see how this works:

In [2]:
from nipype import SelectFiles, Node

# String template with {}-based strings

templates = {'anat': 'sub-*/ses-{ses_name}/anat/sub-*_ses-{ses_name}_T1w.nii.gz'}

# Create SelectFiles Node
sf = Node(SelectFiles(templates),
          name = "selectfiles")

sf.inputs.base_directory = '/data/ds000114'

sf.inputs.ses_name = 'test'

sf.run().outputs

190129-01:39:14,804 nipype.workflow INFO:
	 [Node] Setting-up "selectfiles" in "/tmp/tmpjnd6qk45/selectfiles".
190129-01:39:14,812 nipype.workflow INFO:
	 [Node] Running "selectfiles" ("nipype.interfaces.io.SelectFiles")
190129-01:39:14,903 nipype.workflow INFO:
	 [Node] Finished "selectfiles".



anat = ['/data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz', '/data/ds000114/sub-02/ses-test/anat/sub-02_ses-test_T1w.nii.gz', '/data/ds000114/sub-03/ses-test/anat/sub-03_ses-test_T1w.nii.gz', '/data/ds000114/sub-04/ses-test/anat/sub-04_ses-test_T1w.nii.gz', '/data/ds000114/sub-05/ses-test/anat/sub-05_ses-test_T1w.nii.gz', '/data/ds000114/sub-06/ses-test/anat/sub-06_ses-test_T1w.nii.gz', '/data/ds000114/sub-07/ses-test/anat/sub-07_ses-test_T1w.nii.gz', '/data/ds000114/sub-08/ses-test/anat/sub-08_ses-test_T1w.nii.gz', '/data/ds000114/sub-09/ses-test/anat/sub-09_ses-test_T1w.nii.gz', '/data/ds000114/sub-10/ses-test/anat/sub-10_ses-test_T1w.nii.gz']

As you can see, now `anat` contains ten file paths, T1w images for all ten subjects.

As a side note, you could also use glob-style [] character classes for simple cases, e.g. to select only subjects 1 and 2:

    'sub-0[12]/ses-test/anat/sub-0[12]_ses-test_T1w.nii.gz'
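Brackets in these templates are glob character classes, not Python formatting: `[12]` matches exactly one character that is either `1` or `2`. Every character inside the brackets counts, so a class written as `[1, 2]` also accepts a comma or a space. Python's `fnmatch` module follows the same shell-style rules, which makes this easy to check:

```python
from fnmatch import fnmatch

names = ["sub-01", "sub-02", "sub-03", "sub-0,", "sub-0 "]

# [12] is a character class: exactly one character, either '1' or '2'.
print([n for n in names if fnmatch(n, "sub-0[12]")])
# ['sub-01', 'sub-02']

# Inside brackets every character counts, including ',' and ' '.
print([n for n in names if fnmatch(n, "sub-0[1, 2]")])
# ['sub-01', 'sub-02', 'sub-0,', 'sub-0 ']
```

On real data the extra characters rarely matter, since no folder is named `sub-0,`, but `[12]` states the intent exactly.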

### **force_lists parameter**

The `force_lists` parameter controls the default behavior, in which a template that matches multiple files returns a list, while a template that matches a single file returns a string. There may be situations where you want to force the outputs to always be returned as a list (for example, you are writing a workflow that expects to operate on several runs of data, but some of your subjects only have a single run). In this case, `force_lists` can be used to tune the outputs of the interface. You can either use a boolean value (which will be applied to every output the interface has), or you can provide a list of the output fields that should be coerced to a list.

Returning to our previous example, you may want to ensure that the anat files are returned as a list, even though you will only ever have a single T1 file. In this case, you would do:


In [7]:
sf2 = SelectFiles(templates, force_lists = ["anat"])

sf2.inputs.base_directory = '/data/ds000114'
sf2.inputs.ses_name = 'test'
sf2.run().outputs

# This template already matches ten files, so a list is returned even without force_lists.
# force_lists matters when a template matches a single file, which would otherwise be a string.


anat = ['/data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz', '/data/ds000114/sub-02/ses-test/anat/sub-02_ses-test_T1w.nii.gz', '/data/ds000114/sub-03/ses-test/anat/sub-03_ses-test_T1w.nii.gz', '/data/ds000114/sub-04/ses-test/anat/sub-04_ses-test_T1w.nii.gz', '/data/ds000114/sub-05/ses-test/anat/sub-05_ses-test_T1w.nii.gz', '/data/ds000114/sub-06/ses-test/anat/sub-06_ses-test_T1w.nii.gz', '/data/ds000114/sub-07/ses-test/anat/sub-07_ses-test_T1w.nii.gz', '/data/ds000114/sub-08/ses-test/anat/sub-08_ses-test_T1w.nii.gz', '/data/ds000114/sub-09/ses-test/anat/sub-09_ses-test_T1w.nii.gz', '/data/ds000114/sub-10/ses-test/anat/sub-10_ses-test_T1w.nii.gz']
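To make the string-versus-list rule concrete, here is a hypothetical, simplified model of how a list of matched paths could become an output value (`collect` is an illustrative name, not a Nipype function; the empty-match case is deliberately not modeled):

```python
def collect(matches, force_list=False):
    """Return a single match as a string unless force_list is set.

    Hypothetical model of the output rule described above; it assumes
    at least one match and does not model the empty case.
    """
    if len(matches) == 1 and not force_list:
        return matches[0]
    return list(matches)


one = ["/data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz"]
print(type(collect(one)).__name__)                   # str
print(type(collect(one, force_list=True)).__name__)  # list
```

Downstream code that loops over runs can then rely on always receiving a list.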

In [11]:
from nipype import SelectFiles, Node

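# Fails: templates = {'anat': 'sub-01/ses-{test}/anat/sub-01_ses-{test}_T1w.nii.gz'}
# (a {} field is filled with one value, so it cannot select both sessions at once)
# Instead, use a wildcard for the session: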
templates = {'anat': 'sub-01/ses-*/anat/sub-01_ses-*_T1w.nii.gz'}

sf3 = Node(SelectFiles(templates), name = "selectfiles3")

sf3.inputs.base_directory = '/data/ds000114'

# Fails as well: sf3.inputs.test = 'test', 'retest'

sf3.run().outputs

190129-01:54:47,606 nipype.workflow INFO:
	 [Node] Setting-up "selectfiles3" in "/tmp/tmphfaz2rpn/selectfiles3".
190129-01:54:47,616 nipype.workflow INFO:
	 [Node] Running "selectfiles3" ("nipype.interfaces.io.SelectFiles")
190129-01:54:47,643 nipype.workflow INFO:
	 [Node] Finished "selectfiles3".



anat = ['/data/ds000114/sub-01/ses-retest/anat/sub-01_ses-retest_T1w.nii.gz', '/data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz']

## **FreeSurferSource**

`FreeSurferSource` is a specific case of a file grabber that facilitates the data import of outputs from the FreeSurfer 'recon-all' algorithm. This, of course, requires that you've already run 'recon-all' on your subject.

For the tutorial dataset, recon-all was already run.

In [12]:
!datalad get -r -J 4 /data/ds000114/derivatives/freesurfer/sub-01

get(impossible): /data/ds000114/derivatives/freesurfer/sub-01 [path not associated with any dataset]


To specify the FreeSurfer output folder:

    from nipype.interfaces.freesurfer import FSCommand
    from os.path import abspath as opap

    fs_dir = opap('/data/ds000114/derivatives/freesurfer/')  # absolute path

    FSCommand.set_default_subjects_dir(fs_dir)

To create the FreeSurferSource node:

    from nipype import Node
    from nipype.interfaces.io import FreeSurferSource
    
    fssource = Node(FreeSurferSource(subjects_dir = fs_dir),
                    name = 'fssource')
                    

To run the node for one subject and access multiple FreeSurfer outputs:

    fssource.inputs.subject_id = 'sub-01'
    result = fssource.run()

    print('aparc_aseg: %s\n' % result.outputs.aparc_aseg)
    print('inflated: %s\n' % result.outputs.inflated)

But as you can see, the inflated output actually contains the file location for both hemispheres. With FreeSurferSource we can also restrict the file selection to a single hemisphere. To do this, we use the hemi input field:

    fssource.inputs.hemi = 'lh'
    result = fssource.run()
    
    result.outputs.inflated