## **Data input for BIDS datasets**

`DataGrabber` and `SelectFiles` are great if you are dealing with generic datasets with arbitrary organization.

However, if you have decided to use BIDS to organize your data, you can take advantage of the formal structure that BIDS imposes.

### **`pybids`** - a Python API for working with BIDS datasets
`pybids` is a lightweight Python API for querying the folder structure of a BIDS dataset for specific files and metadata.
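
To make the "formal structure" concrete: BIDS file names are chains of `key-value` pairs followed by a suffix and an extension. The sketch below splits such a name using only the standard library. It illustrates the naming scheme and is not how `pybids` is implemented; `parse_bids_name` is a hypothetical helper written for this example.

```python
from pathlib import PurePosixPath


def parse_bids_name(path):
    """Split a BIDS-style file name into key-value entities, suffix and extension."""
    name = PurePosixPath(path).name
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    stem, _, extension = name.partition(".")
    # The last underscore-separated chunk is the suffix (bold, T1w, events, ...).
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, _, value = pair.partition("-")
        entities[key] = value
    entities["suffix"] = suffix
    entities["extension"] = extension
    return entities


info = parse_bids_name(
    "/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.nii.gz"
)
print(info)
```

Queries such as "all `bold` files of subject `01`" then amount to matching on these fields.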

### **The `layout` object and simple queries**
To begin working with `pybids`, we need to initialize a layout object.

In [1]:
from bids.layout import BIDSLayout
layout = BIDSLayout("/data/ds001534/")

Some useful `layout` methods:

    get_subjects()
    get_types()
    get_types(modality='func')
    get_metadata()
    get_tasks()
    get(subject='01', modality="anat", return_type='file')

In [2]:
!tree -L 4 /data/ds001534/

/data/ds001534/
├── annex-uuid
├── CHANGES
├── dataset_description.json
├── participants.json
├── participants.tsv
├── sub-01
│   ├── anat
│   │   ├── sub-01_T1w.json
│   │   └── sub-01_T1w.nii.gz
│   └── func
│       ├── sub-01_task-calorieimage_run-03_bold.json
│       ├── sub-01_task-calorieimage_run-03_bold.nii.gz
│       ├── sub-01_task-calorieimage_run-03_events.json
│       ├── sub-01_task-calorieimage_run-03_events.tsv
│       ├── sub-01_task-calorieimage_run-04_bold.json
│       ├── sub-01_task-calorieimage_run-04_bold.nii.gz
│       ├── sub-01_task-calorieimage_run-04_events.json
│       ├── sub-01_task-calorieimage_run-04_events.tsv
│       ├── sub-01_task-foodimage_run-01_bold.json
│       ├── sub-01_task-foodimage_run-01_bold.nii.gz
│       ├── sub-01_task-foodimage_run-01_events.json
│       ├── sub-01_task-foodimage_run-01_events.tsv
│       ├── sub-01_task-foodimage_run-02_bold.json
│       ├── sub-01_task-foodimage_run-02_bold.nii.gz
│       ├── sub-01_task-foodimage_r

In [3]:
# List the subject labels in this dataset

layout.get_subjects()

['01',
 '02',
 '03',
 '04',
 '05',
 '06',
 '07',
 '08',
 '09',
 '10',
 '11',
 '12',
 '13',
 '14',
 '15',
 '16',
 '17',
 '18',
 '19',
 '20',
 '21',
 '22',
 '23',
 '24',
 '25',
 '26',
 '27',
 '28',
 '29',
 '30',
 '31',
 '32',
 '33',
 '34',
 '35',
 '36',
 '37',
 '38',
 '39',
 '40',
 '41',
 '42']

In [4]:
# List the modalities included in this dataset

layout.get_modalities()

['anat', 'func']

In [5]:
# List the data types included in this dataset (here restricted to the 'func' modality)

layout.get_types(modality='func')

['bold', 'events']

In [6]:
# List the tasks included in this dataset

layout.get_tasks()

['calorieimage', 'foodimage']

In [12]:
layout.get(subject='01', modality="anat")

# This returns not just the file paths, but objects that also carry the relevant query fields.

[File(filename='/data/ds001534/sub-01/anat/.DS_Store', subject='01', type='', modality='anat'),
 File(filename='/data/ds001534/sub-01/anat/sub-01_T1w.json', subject='01', type='T1w', modality='anat'),
 File(filename='/data/ds001534/sub-01/anat/sub-01_T1w.nii.gz', subject='01', type='T1w', modality='anat')]
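
The printed objects above look like named tuples, so their fields can presumably be read as attributes. The following sketch rebuilds them with `collections.namedtuple` to show how such results might be filtered. The `File` class defined here is a stand-in written for this example, not the `pybids` class itself.

```python
from collections import namedtuple

# Stand-in for the objects printed above (assumption: attribute access works the same way).
File = namedtuple("File", ["filename", "subject", "type", "modality"])

results = [
    File("/data/ds001534/sub-01/anat/.DS_Store", "01", "", "anat"),
    File("/data/ds001534/sub-01/anat/sub-01_T1w.json", "01", "T1w", "anat"),
    File("/data/ds001534/sub-01/anat/sub-01_T1w.nii.gz", "01", "T1w", "anat"),
]

# Keep only entries that carry a data type, which drops stray files such as .DS_Store.
typed = [f.filename for f in results if f.type]
print(typed)
```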

In [13]:
# To get just the file paths, pass return_type='file'

layout.get(subject='01', type='bold', extensions=['nii', 'nii.gz'], return_type='file')

['/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.nii.gz',
 '/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-04_bold.nii.gz',
 '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.nii.gz',
 '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-02_bold.nii.gz']

### **Exercise 1:**

List all files for the "foodimage" task for subject 10.

In [14]:
from bids.layout import BIDSLayout
layout = BIDSLayout("/data/ds001534/")

layout.get(subject='10', modality="func", task='foodimage', return_type='file')

['/data/ds001534/sub-10/func/sub-10_task-foodimage_run-01_bold.json',
 '/data/ds001534/sub-10/func/sub-10_task-foodimage_run-01_bold.nii.gz',
 '/data/ds001534/sub-10/func/sub-10_task-foodimage_run-01_events.json',
 '/data/ds001534/sub-10/func/sub-10_task-foodimage_run-01_events.tsv',
 '/data/ds001534/sub-10/func/sub-10_task-foodimage_run-02_bold.json',
 '/data/ds001534/sub-10/func/sub-10_task-foodimage_run-02_bold.nii.gz',
 '/data/ds001534/sub-10/func/sub-10_task-foodimage_run-02_events.json',
 '/data/ds001534/sub-10/func/sub-10_task-foodimage_run-02_events.tsv']

### **BIDSDataGrabber: Including `pybids` in your `nipype` workflow**


In [74]:
from nipype.interfaces.io import BIDSDataGrabber
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.utility import Function

bg = Node(BIDSDataGrabber(), name='bigs-grabber')

bg.inputs.base_dir = '/data/ds001534'
bg.inputs.subject = '01'
results = bg.run()
results.outputs

190129-07:11:35,895 nipype.workflow INFO:
	 [Node] Setting-up "bigs-grabber" in "/tmp/tmpv8ufgpdk/bigs-grabber".
190129-07:11:35,905 nipype.workflow INFO:
	 [Node] Running "bigs-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
190129-07:11:36,561 nipype.workflow INFO:
	 [Node] Finished "bigs-grabber".



anat = ['/data/ds001534/sub-01/anat/sub-01_T1w.nii.gz']
func = ['/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.nii.gz', '/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-04_bold.nii.gz', '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.nii.gz', '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-02_bold.nii.gz']

In [75]:
bg.inputs.output_query = {'bolds' : dict(type='bold')}
#bg.inputs.output_query = {'bolds' : {'type':'bold'}}

res = bg.run()
res.outputs

190129-07:11:37,109 nipype.workflow INFO:
	 [Node] Setting-up "bigs-grabber" in "/tmp/tmpv8ufgpdk/bigs-grabber".
190129-07:11:37,125 nipype.workflow INFO:
	 [Node] Running "bigs-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
190129-07:11:37,791 nipype.workflow INFO:
	 [Node] Finished "bigs-grabber".



bolds = ['/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.json', '/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.nii.gz', '/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-04_bold.json', '/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-04_bold.nii.gz', '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.json', '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.nii.gz', '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-02_bold.json', '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-02_bold.nii.gz']

In [76]:
a = dict(type='bold')
a

{'type': 'bold'}

In [77]:
layout.get(subject='01', return_type = 'file', type = 'bold')

['/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.json',
 '/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.nii.gz',
 '/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-04_bold.json',
 '/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-04_bold.nii.gz',
 '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.json',
 '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.nii.gz',
 '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-02_bold.json',
 '/data/ds001534/sub-01/func/sub-01_task-foodimage_run-02_bold.nii.gz']
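
Because the query above matches the JSON sidecars as well as the images, a downstream step that expects only NIfTI files needs a filter. One option is the `extensions` argument shown earlier; another is plain list filtering, sketched here on a shortened copy of the output. The helper name `only_images` is made up for this example.

```python
def only_images(paths, extensions=(".nii", ".nii.gz")):
    """Return the paths that end with one of the given image extensions."""
    return [p for p in paths if p.endswith(tuple(extensions))]


# Shortened copy of the query result above: two runs, each with a sidecar and an image.
paths = [
    "/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.json",
    "/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.nii.gz",
    "/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.json",
    "/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.nii.gz",
]

images = only_images(paths)
print(images)
```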

In [78]:
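def test_args_kwargs(arg1, arg2, arg3):
    print("arg1: ", arg1)
    print("arg2: ", arg2)
    print("arg3: ", arg3)

kwargs = {"arg3": 3, "arg2": "two", "arg1": 5}
# test_args_kwargs(kwargs) would fail: the whole dict would be passed as arg1 only

# ** unpacks the dict into keyword arguments, so both calls print the same values
test_args_kwargs(**kwargs)
test_args_kwargs(arg1=5, arg2="two", arg3="3")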

arg1:  5
arg2:  two
arg3:  3
arg1:  5
arg2:  two
arg3:  3
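
The cells above fit together as follows: a dictionary such as `dict(type='bold')` can be unpacked with `**` into the keyword arguments of a query function. The toy `get` below is a hypothetical stand-in for a layout query and operates on hand-written records. It only illustrates the unpacking; how `BIDSDataGrabber` handles `output_query` internally is an assumption here.

```python
# Hand-written records standing in for indexed files (illustrative values only).
records = [
    {"subject": "01", "type": "bold", "task": "foodimage"},
    {"subject": "01", "type": "events", "task": "foodimage"},
    {"subject": "02", "type": "bold", "task": "calorieimage"},
]


def get(records, **filters):
    """Return the records whose fields match every keyword filter."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


output_query = {"bolds": dict(type="bold")}
subject = "01"

# One query per output name; each filter dict is unpacked into keyword arguments.
outputs = {
    name: get(records, subject=subject, **filters)
    for name, filters in output_query.items()
}
print(outputs)
```

Each key of `output_query` becomes the name of one output, which is why the node above exposes a field called `bolds`.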


Now, let's put it in a workflow. For demonstration purposes, we will add a node that pretends to analyze its input.

In [79]:
def printMe(paths):
    print("\n\nanalyzing " + str(paths) + "\n\n")
    
analyzeBOLD = Node(Function(function=printMe, input_names=["paths"],
                            output_names=[]), name="analyzeBOLD")


In [80]:
wf = Workflow(name="bids_demo")
wf.connect(bg, "bolds", analyzeBOLD, "paths")
wf.run()

190129-07:11:44,886 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
190129-07:11:44,911 nipype.workflow INFO:
	 Running serially.
190129-07:11:44,914 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bigs-grabber" in "/tmp/tmpv8ufgpdk/bigs-grabber".
190129-07:11:44,922 nipype.workflow INFO:
	 [Node] Running "bigs-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
190129-07:11:45,463 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bigs-grabber".
190129-07:11:45,464 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeBOLD" in "/tmp/tmp13jux7b0/bids_demo/analyzeBOLD".
190129-07:11:45,483 nipype.workflow INFO:
	 [Node] Running "analyzeBOLD" ("nipype.interfaces.utility.wrappers.Function")


analyzing ['/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.json', '/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.nii.gz', '/data/ds001534/sub-01/func/sub-01_task-calorieimage_run-04_bold.json', '/data/ds0

<networkx.classes.digraph.DiGraph at 0x7f593e659470>

### **Exercise 2:**

Modify the BIDSDataGrabber and the workflow to collect T1w images for subject 10.



In [81]:
bg_ex = Node(BIDSDataGrabber(), name='bigs-grabber')

bg_ex.inputs.base_dir = '/data/ds001534/'
bg_ex.inputs.subject = '10'
bg_ex.inputs.output_query = {'anat' : dict(modality='anat')}
res1 = bg_ex.run()
res1.outputs

190129-07:11:47,465 nipype.workflow INFO:
	 [Node] Setting-up "bigs-grabber" in "/tmp/tmpxbei6o8e/bigs-grabber".
190129-07:11:47,473 nipype.workflow INFO:
	 [Node] Running "bigs-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
190129-07:11:48,133 nipype.workflow INFO:
	 [Node] Finished "bigs-grabber".



anat = ['/data/ds001534/sub-10/anat/sub-10_T1w.json', '/data/ds001534/sub-10/anat/sub-10_T1w.nii.gz']

In [82]:
def printMe(paths):
    print("\n\nanalyzing " + str(paths) + "\n\n")
    
analyzeBOLD = Node(Function(function=printMe, input_names=["paths"],
                            output_names=[]), name="analyzeANAT")

wf = Workflow(name="bids_demo")
wf.connect(bg_ex, "anat", analyzeBOLD, "paths")
wf.run()

190129-07:11:48,915 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
190129-07:11:48,941 nipype.workflow INFO:
	 Running serially.
190129-07:11:48,944 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bigs-grabber" in "/tmp/tmpxbei6o8e/bigs-grabber".
190129-07:11:48,952 nipype.workflow INFO:
	 [Node] Running "bigs-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
190129-07:11:49,640 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bigs-grabber".
190129-07:11:49,641 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeANAT" in "/tmp/tmp7b90d8yh/bids_demo/analyzeANAT".
190129-07:11:49,652 nipype.workflow INFO:
	 [Node] Running "analyzeANAT" ("nipype.interfaces.utility.wrappers.Function")


analyzing ['/data/ds001534/sub-10/anat/sub-10_T1w.json', '/data/ds001534/sub-10/anat/sub-10_T1w.nii.gz']


190129-07:11:49,659 nipype.workflow INFO:
	 [Node] Finished "bids_demo.analyzeANAT".


<networkx.classes.digraph.DiGraph at 0x7f593e4e5898>

## **Iterating over subject labels**

In the previous example, we demonstrated how to use pybids to "analyze" one subject. How can we scale it for all subjects?

    By using iterables!
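
As a rough mental model, setting `iterables` on a node asks nipype to repeat that node, and everything connected after it, once per value. The plain-Python analogy below uses made-up functions (`grab`, `analyze`, `run_with_iterable`) and does not involve nipype at all.

```python
def grab(base_dir, subject):
    """Pretend to collect the functional directory for one subject."""
    return f"{base_dir}sub-{subject}/func"


def analyze(path):
    """Pretend to analyze whatever the grabbing step returned."""
    return "analyzed " + path


def run_with_iterable(field, values, **fixed_inputs):
    """Run grab -> analyze once per value of the iterated field."""
    results = {}
    for value in values:
        # The iterated field changes on every pass; all other inputs stay fixed.
        inputs = {**fixed_inputs, field: value}
        results[value] = analyze(grab(**inputs))
    return results


results = run_with_iterable("subject", ["01", "02"], base_dir="/data/ds001534/")
for subject, result in results.items():
    print(subject, "->", result)
```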

In [83]:
bg_all = Node(BIDSDataGrabber(), name='bids-grabber')

# Mandatory input
bg_all.inputs.base_dir = '/data/ds001534/'
bg_all.inputs.output_query = {'bolds' : dict(type='bold')}
bg_all.iterables = ('subject', layout.get_subjects()[:2])  # iterable: the first two subjects (indices 0 and 1)
# Where does this 'subject' name come from, and why can we be so sure of it?

wf = Workflow(name="bids_demo")
wf.connect(bg_all, "bolds", analyzeBOLD, "paths")
wf.run()

190129-07:11:50,856 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
190129-07:11:50,882 nipype.workflow INFO:
	 Running serially.
190129-07:11:50,883 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bids-grabber" in "/tmp/tmpzztcy67x/bids_demo/_subject_02/bids-grabber".
190129-07:11:50,891 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
190129-07:11:51,562 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bids-grabber".
190129-07:11:51,563 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeANAT" in "/tmp/tmpdqetavmx/bids_demo/_subject_02/analyzeANAT".
190129-07:11:51,586 nipype.workflow INFO:
	 [Node] Running "analyzeANAT" ("nipype.interfaces.utility.wrappers.Function")


analyzing ['/data/ds001534/sub-02/func/sub-02_task-calorieimage_run-03_bold.json', '/data/ds001534/sub-02/func/sub-02_task-calorieimage_run-03_bold.nii.gz', '/data/ds001534/sub-02/func/sub-02_task-caloriei

<networkx.classes.digraph.DiGraph at 0x7f593e7c47f0>

## **Accessing additional metadata**

Querying for files is nice, but sometimes you want to access more metadata, for example `RepetitionTime`. `pybids` can help with that as well!
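
In BIDS, fields such as `RepetitionTime` live in JSON sidecar files next to the images. The sketch below reads a sidecar directly with the standard library, using a temporary file created just for the example. It ignores the BIDS inheritance rules (metadata defined in higher-level JSON files), so it is a simplification rather than a replacement for `get_metadata`. `sidecar_for` and `read_sidecar` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path


def sidecar_for(image_path):
    """Return the path of the JSON sidecar that sits next to a NIfTI image."""
    image_path = Path(image_path)
    for extension in (".nii.gz", ".nii"):
        if image_path.name.endswith(extension):
            stem = image_path.name[: -len(extension)]
            return image_path.with_name(stem + ".json")
    raise ValueError(f"not a NIfTI file name: {image_path.name}")


def read_sidecar(image_path):
    """Load the metadata stored in the sidecar of the given image."""
    with open(sidecar_for(image_path)) as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "sub-01_task-foodimage_run-01_bold.nii.gz"
    # Write a tiny sidecar; the image itself is not needed for this sketch.
    sidecar_for(image).write_text(json.dumps({"RepetitionTime": 2.5, "EchoTime": 0.035}))
    metadata = read_sidecar(image)

print(metadata["RepetitionTime"])
```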

In [84]:
layout.get_metadata('/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.nii.gz')

{'RepetitionTime': 2.5,
 'dcmmeta_slice_dim': 'TODO',
 'FlipAngle': 90,
 'ProcedureStepDescription': 'TODO',
 'dcmmeta_reorient_transform': [[0.0, -1.0, 0.0, 95.0],
  [1.0, 0.0, 0.0, 0.0],
  [0.0, 0.0, 1.0, 0.0],
  [0.0, 0.0, 0.0, 1.0]],
 'ManufacturersModelName': 'Achieva',
 'dcmmeta_shape': [80, 80, 35, 144],
 'TaskName': 'foodimage',
 'ImageType': ['ORIGINAL', 'PRIMARY', 'FMRI', 'NONE', 'ND', 'NORM', 'MOSA'],
 'EchoTime': 0.035,
 'MagneticFieldStrength': 3,
 'CogAtlasID': 'TODO',
 'PhaseEncodingDirection': 'j-',
 'dcmmeta_version': 0.6,
 'Manufacturer': 'Philips',
 'SliceTiming': [0.0,
  0.428571,
  0.857143,
  1.285714,
  1.71428,
  2.142857,
  0.071429,
  0.5,
  0.928571,
  1.357143,
  1.785714,
  2.214286,
  0.142857,
  0.571429,
  1.0,
  1.428571,
  1.857143,
  2.285714,
  0.214286,
  0.642857,
  1.071429,
  1.5,
  1.928571,
  2.357143,
  0.285714,
  0.714286,
  1.142857,
  1.571429,
  2.0,
  2.428571,
  0.357143,
  0.785714,
  1.214286,
  1.642857,
  2.071429]}

In [85]:
def printMetadata(path, data_dir):
    from bids.layout import BIDSLayout
    layout = BIDSLayout(data_dir)
    print("\n\nanalyzing  " + path + "\nTR: "+ str(layout.get_metadata(path)["RepetitionTime"]) + "\n\n")

analyzeBOLD2 = MapNode(Function(function=printMetadata, input_names=["path", "data_dir"],
                                output_names=[]), name="analyzeBOLD2", iterfield="path")
# MapNode: the function is run once for each element of the list passed to `path` (the iterfield)

analyzeBOLD2.inputs.data_dir = "/data/ds001534/"
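
A `MapNode` differs from a regular `Node` in that the input named in `iterfield` receives a list, and the wrapped function is called once per element while the other inputs stay fixed. A plain-Python analogy, with the hypothetical helper `map_over` and a made-up `describe` function, might look like this:

```python
def map_over(function, iterfield, values, **fixed_inputs):
    """Call function once per value of the iterated field and collect the results."""
    return [function(**{iterfield: value}, **fixed_inputs) for value in values]


def describe(path, data_dir):
    """Made-up stand-in for printMetadata: report a path relative to data_dir."""
    return path[len(data_dir):]


described = map_over(
    describe,
    "path",
    [
        "/data/ds001534/sub-01/func/sub-01_task-foodimage_run-01_bold.nii.gz",
        "/data/ds001534/sub-01/func/sub-01_task-foodimage_run-02_bold.nii.gz",
    ],
    data_dir="/data/ds001534/",
)
print(described)
```

Unlike `iterables`, which repeat the downstream part of the workflow, the results here are collected back into a single list.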

In [86]:
wf = Workflow(name="bids_demo")
wf.connect(bg, "bolds", analyzeBOLD2, "path")
wf.run()

190129-07:11:58,989 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
190129-07:11:59,10 nipype.workflow INFO:
	 Running serially.
190129-07:11:59,11 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bigs-grabber" in "/tmp/tmpv8ufgpdk/bigs-grabber".
190129-07:11:59,18 nipype.workflow INFO:
	 [Node] Running "bigs-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
190129-07:11:59,660 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bigs-grabber".
190129-07:11:59,661 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeBOLD2" in "/tmp/tmpq011yusy/bids_demo/analyzeBOLD2".
190129-07:11:59,682 nipype.workflow INFO:
	 [Node] Setting-up "_analyzeBOLD20" in "/tmp/tmpq011yusy/bids_demo/analyzeBOLD2/mapflow/_analyzeBOLD20".
190129-07:11:59,690 nipype.workflow INFO:
	 [Node] Running "_analyzeBOLD20" ("nipype.interfaces.utility.wrappers.Function")


analyzing  /data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.json
TR: 2.5



<networkx.classes.digraph.DiGraph at 0x7f593e35f080>

In [87]:
def printMetadata(path, data_dir):
    from bids.layout import BIDSLayout
    layout = BIDSLayout(data_dir)
    # The value looked up here is EchoTime, although the printed label still reads "TR"
    print("\n\nanalyzing  " + path + "\nTR: "+ str(layout.get_metadata(path)["EchoTime"]) + "\n\n")

analyzeBOLD2 = MapNode(Function(function=printMetadata, input_names=["path", "data_dir"],
                                output_names=[]), name="analyzeBOLD2", iterfield="path")
# MapNode: the function is run once for each element of the list passed to `path` (the iterfield)

analyzeBOLD2.inputs.data_dir = "/data/ds001534/"

wf = Workflow(name="bids_demo")
wf.connect(bg, "bolds", analyzeBOLD2, "path")
wf.run()

190129-07:13:21,325 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
190129-07:13:21,357 nipype.workflow INFO:
	 Running serially.
190129-07:13:21,360 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bigs-grabber" in "/tmp/tmpv8ufgpdk/bigs-grabber".
190129-07:13:21,368 nipype.workflow INFO:
	 [Node] Running "bigs-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
190129-07:13:22,228 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bigs-grabber".
190129-07:13:22,229 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeBOLD2" in "/tmp/tmplyugg_5z/bids_demo/analyzeBOLD2".
190129-07:13:22,258 nipype.workflow INFO:
	 [Node] Setting-up "_analyzeBOLD20" in "/tmp/tmplyugg_5z/bids_demo/analyzeBOLD2/mapflow/_analyzeBOLD20".
190129-07:13:22,268 nipype.workflow INFO:
	 [Node] Running "_analyzeBOLD20" ("nipype.interfaces.utility.wrappers.Function")


analyzing  /data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.json
TR: 0.

<networkx.classes.digraph.DiGraph at 0x7f593e38c198>