# **Data Output**

Data output is just as important as data input. A data output module lets you restructure and rename computed outputs. It also lets you separate the relevant output files from the temporary intermediate files in the working directory. Nipype provides the following modules to handle data stream output:

- `DataSink`
- `JSONFileSink`
- `MySQLSink`
- `SQLiteSink`
- `XNATSink`

This tutorial covers only `DataSink`. For the others, see the `interfaces.io` section of the official Nipype documentation.

## **DataSink**

A workflow working directory is like a **cache**. It contains the outputs of the various processing stages. It also contains extraneous information, such as execution reports and hash files that record the input state of each process. All of this is embedded in a hierarchical structure that reflects the iterables used in the workflow, which makes navigating the working directory rather unpleasant.

Typically, the user wants to preserve only a small fraction of these outputs. The `DataSink` interface can extract those components from the cache and store them in a different location.

Unlike other interfaces, `DataSink` has inputs that are defined and created by **the workflow `connect` statement**. Currently, disconnecting an input from a `DataSink` does not remove the corresponding port.
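The following toy model in plain Python shows what connection-defined ports look like. `ToySink` is a hypothetical class, not Nipype code. It assumes two things: a port comes into existence the first time something is connected to it, and each port accepts only one upstream connection (the rule Nipype enforces for node inputs).

```python
class ToySink:
    """Toy stand-in for DataSink's port bookkeeping (hypothetical, not Nipype code).

    Ports are not declared up front: connecting to a name creates the port.
    Each port accepts exactly one upstream connection.
    """

    def __init__(self):
        # port name -> (source node name, source output name)
        self.ports = {}

    def connect(self, source: str, output: str, port: str) -> None:
        if port in self.ports:
            raise ValueError(
                f"port {port!r} is already connected to {self.ports[port]}"
            )
        self.ports[port] = (source, output)


sink = ToySink()
sink.connect("realigner", "realigned_files", "motion")
sink.connect("realigner", "realignment_parameters", "motion.@par")
print(sorted(sink.ports))  # ['motion', 'motion.@par']
```

Connecting a second output to `"motion"` raises a `ValueError` in this toy model, which mirrors why DataSink needs distinct port names.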

The following code segment defines the `DataSink` node and sets the `base_directory` in which all outputs will be stored. The `container` input creates a subdirectory within the `base_directory`. If you are iterating a workflow over subjects, it can be useful to save the outputs in a folder named after the subject ID.

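```python
import nipype.interfaces.io as nio
import nipype.pipeline.engine as pe

# `workflow` and `inputnode` are assumed to be defined earlier
datasink = pe.Node(nio.DataSink(), name='sinker')
datasink.inputs.base_directory = '/path/to/output'
# The subject ID becomes a per-subject subfolder ("container")
workflow.connect(inputnode, 'subject_id', datasink, 'container')
```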
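Here is a minimal sketch of the resulting folder layout. It assumes `container` is simply appended as one extra folder below `base_directory`. The helper `sink_output_dir` is hypothetical and only reproduces this path arithmetic. It does not call Nipype.

```python
from pathlib import PurePosixPath
from typing import Optional


def sink_output_dir(base_directory: str, container: Optional[str] = None) -> str:
    """Hypothetical helper: the folder DataSink writes into before port subfolders.

    Assumption: `container`, when set, adds one subdirectory below `base_directory`.
    """
    out = PurePosixPath(base_directory)
    if container:
        out = out / container
    return str(out)


# One output folder per subject when `container` receives the subject ID
for subject_id in ["sub-01", "sub-02"]:
    print(sink_output_dir("/path/to/output", subject_id))
```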

If we wanted to save the realigned files and the realignment parameters to the same place, the most intuitive option would be:

```python
workflow.connect(realigner, 'realigned_files', datasink, 'motion')
workflow.connect(realigner, 'realignment_parameters', datasink, 'motion')
```

However, this will not work, because only one connection is allowed per input port. We therefore need a second port. For example, we can store the parameter files in a separate subfolder:

```python
workflow.connect(realigner, 'realigned_files', datasink, 'motion')
workflow.connect(realigner, 'realignment_parameters', datasink, 'motion.par')
```

The period (`.`) indicates that a subfolder called `par` should be created. To store the parameters in the same folder as the realigned files instead, we would use the `.@` syntax. The `@` tells DataSink not to create a subfolder. This lets us create differently named input ports while still storing all the files in the same folder.

```python
workflow.connect(realigner, 'realigned_files', datasink, 'motion')
workflow.connect(realigner, 'realignment_parameters', datasink, 'motion.@par')
```

The syntax for the input port of DataSink takes the following form, where parts between `[]` are optional:

```
string[[.[@]]string[[.[@]]string] ...]
```
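The hypothetical `port_to_subfolder` below sketches how such a port name could map to a folder. It splits the name on `.` and skips every component that starts with `@`. This is an assumption about DataSink's rule, based on the `.` and `.@` behavior described above. It is not Nipype's own code.

```python
from pathlib import PurePosixPath


def port_to_subfolder(port: str) -> str:
    """Hypothetical helper: translate a DataSink-style port name into a subfolder.

    Components are separated by '.'; a component starting with '@' creates
    no folder and only serves to make the port name unique.
    """
    folders = [part for part in port.split(".") if not part.startswith("@")]
    return str(PurePosixPath(*folders))


for port in ["motion", "motion.par", "motion.@par"]:
    print(port, "->", port_to_subfolder(port))
# motion -> motion
# motion.par -> motion/par
# motion.@par -> motion
```

Because `motion` and `motion.@par` resolve to the same folder but are different strings, they count as two separate ports. Both sets of files therefore land side by side.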

### **MapNode**

To use DataSink inside a MapNode, its inputs have to be defined in the constructor using the `infields` keyword argument.

### **Parameterization**

A workflow can be run over various inputs using the `iterables` attribute of its nodes. As a result, a given workflow can produce multiple sets of outputs, depending on how many iterable values there are. Iterables create working directory subfolders named like `_iterable_name_value`.

The `parameterization` input parameter controls whether the data stored using DataSink is in a folder structure that contains this iterable info or not.
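This sketch shows the effect under one assumption about how DataSink handles the source path. With `parameterization` enabled, only the iterable folders (names starting with `_`) from the file's working-directory path are kept. With it disabled, only the file name is kept. `sink_relative_path` and the example path are hypothetical.

```python
from pathlib import PurePosixPath


def sink_relative_path(src: str, parameterization: bool = True) -> str:
    """Hypothetical helper: where a file lands relative to its port folder.

    Assumption: with parameterization on, the '_<iterable>_<value>' folders of
    the source path are kept; with it off, only the file name survives.
    """
    src_path = PurePosixPath(src)
    if not parameterization:
        return src_path.name
    # Keep the iterable folders, drop node and workflow folders
    kept = [part for part in src_path.parent.parts if part.startswith("_")]
    return str(PurePosixPath(*kept, src_path.name))


src = "/work/preprocWF/_subject_id_sub-01/_fwhm_4/smooth/out.nii.gz"
print(sink_relative_path(src))                          # _subject_id_sub-01/_fwhm_4/out.nii.gz
print(sink_relative_path(src, parameterization=False))  # out.nii.gz
```

Turning parameterization off gives a flatter layout. However, files from different iterations that share a name could then collide.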


### **Substitutions**

The `substitutions` and `regexp_substitutions` inputs allow users to modify the output destination path and the name of a file. Substitutions are a list of 2-tuples and are carried out in the order in which they were entered. Assume that the output path of a file is:

    /root/container/_variable_1/file_subject_realigned.nii
    
We can use substitutions to clean up the output path:

```python
datasink.inputs.substitutions = [('_variable', 'variable'),
                                 ('file_subject_', '')]
```

This rewrites the output path as:

    /root/container/variable_1/realigned.nii
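This standalone sketch reproduces the renaming step. It assumes plain substitutions are applied first, as `str.replace` calls in list order, followed by regular-expression substitutions as `re.sub` calls. `apply_substitutions` is a hypothetical helper, not part of Nipype. The regexp pattern is an illustrative example.

```python
import re


def apply_substitutions(path, substitutions=(), regexp_substitutions=()):
    """Hypothetical helper mimicking DataSink's renaming step.

    Assumption: plain substitutions run first, then regexp ones, each in list order.
    """
    for old, new in substitutions:
        path = path.replace(old, new)
    for pattern, repl in regexp_substitutions:
        path = re.sub(pattern, repl, path)
    return path


src = "/root/container/_variable_1/file_subject_realigned.nii"
print(apply_substitutions(src, [("_variable", "variable"), ("file_subject_", "")]))
# /root/container/variable_1/realigned.nii
print(apply_substitutions(src, regexp_substitutions=[(r"_variable_(\d+)", r"run-\1")]))
# /root/container/run-1/file_subject_realigned.nii
```

Because each substitution sees the result of the previous one, order matters. A later pattern must match the already-rewritten path.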

## **Preparation**

Before we can use DataSink, we first need to run a workflow. 

Let's create a very short preprocessing workflow that realigns and smooths one functional image of one subject. First, let's create a `SelectFiles` node that grabs the functional image.

```python
from nipype import SelectFiles, Node

templates = {'func' : '{subject}/func/{subject}_task-calorieimage_run-03_bold.nii.gz'}

sf = Node(SelectFiles(templates), name="selectfiles")
sf.inputs.base_directory = '/data/ds001534/'
sf.inputs.subject = 'sub-01'
```

Second, let's create the motion correction and smoothing nodes.

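```python
from nipype.interfaces.fsl import MCFLIRT, IsotropicSmooth

# Motion correction against the mean volume;
# save_plots=True also saves the motion parameters to a .par file
mcflirt = Node(MCFLIRT(mean_vol=True,
                       save_plots=True),
               name='mcflirt')

# Spatial smoothing with a 4 mm FWHM kernel
smooth = Node(IsotropicSmooth(fwhm=4),
              name='smooth')
```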

Third, let's create the workflow that will contain those three nodes.

```python
from nipype import Workflow

wf = Workflow(name="preprocWF")
wf.base_dir = '/output/working_dir'

# SelectFiles -> MCFLIRT -> smoothing
wf.connect([(sf, mcflirt, [("func", "in_file")]),
            (mcflirt, smooth, [("out_file", "in_file")])])

wf.run()
```

```
190129-08:50:41,391 nipype.workflow INFO:
	 Workflow preprocWF settings: ['check', 'execution', 'logging', 'monitoring']
190129-08:50:41,415 nipype.workflow INFO:
	 Running serially.
190129-08:50:41,417 nipype.workflow INFO:
	 [Node] Setting-up "preprocWF.selectfiles" in "/output/working_dir/preprocWF/selectfiles".
190129-08:50:41,432 nipype.workflow INFO:
	 [Node] Running "selectfiles" ("nipype.interfaces.io.SelectFiles")
190129-08:50:41,477 nipype.workflow INFO:
	 [Node] Finished "preprocWF.selectfiles".
190129-08:50:41,480 nipype.workflow INFO:
	 [Node] Setting-up "preprocWF.mcflirt" in "/output/working_dir/preprocWF/mcflirt".
190129-08:50:41,504 nipype.workflow INFO:
	 [Node] Running "mcflirt" ("nipype.interfaces.fsl.preprocess.MCFLIRT"), a CommandLine Interface with command:
mcflirt -in /data/ds001534/sub-01/func/sub-01_task-calorieimage_run-03_bold.nii.gz -meanvol -out /output/working_dir/preprocWF/mcflirt/sub-01_task-calorieimage_run-03_bold_mcf.nii.gz -plots
190129-08:51:29,796

<networkx.classes.digraph.DiGraph at 0x7f6ab97c2dd8>
```

After the workflow has executed, all the data is hidden in the working directory `working_dir`. Let's take a closer look at the contents of this folder:

```bash
tree /output/working_dir/preprocWF
```

```
/output/working_dir/preprocWF
├── d3.js
├── graph1.json
├── graph.json
├── index.html
├── mcflirt
│   ├── _0xc369ae8fe58d7133de9c6eed6eac1ffe.json
│   ├── command.txt
│   ├── _inputs.pklz
│   ├── _node.pklz
│   ├── _report
│   │   └── report.rst
│   ├── result_mcflirt.pklz
│   └── sub-01_task-calorieimage_run-03_bold_mcf.nii.gz
├── selectfiles
│   ├── _0x76930eb00c96b2ea8c0523c89ee5bc69.json
│   ├── _inputs.pklz
│   ├── _node.pklz
│   ├── _report
│   │   └── report.rst
│   └── result_selectfiles.pklz
└── smooth
    ├── _0x0ad8446e4bc6fcfcbf492fe07dedc570.json
    ├── command.txt
    ├── _inputs.pklz
    ├── _node.pklz
    ├── _report
    │   └── report.rst
    ├── result_smooth.pklz
    └── sub-01_task-calorieimage_run-03_bold_mcf_smooth.nii.gz

6 directories, 23 files
```


### **How to use `DataSink`**

`DataSink` is Nipype's standard output module to **restructure your output files.** It allows you to relocate and rename files that you deem relevant. 

Let's keep the smoothed functional image, the mean image, and the motion correction parameters. MCFLIRT writes these parameters to a `.par` file, exposed as its `par_file` output because we set `save_plots=True`. To do this, we first need to create the `DataSink` object.

```python
from nipype.interfaces.io import DataSink

# Create DataSink object
sinker = Node(DataSink(), name='sinker')

# Name of the output folder
sinker.inputs.base_directory = '/output/working_dir/preprocWR_output'

# Connect DataSink with the relevant nodes
wf.connect([(smooth, sinker, [('out_file', 'in_file')]),
            (mcflirt, sinker, [('mean_img', 'mean_img'),
                               ('par_file', 'par_file')])
           ])
wf.run()
```

```bash
tree /output/working_dir/preprocWR_output
```

```
/output/working_dir/preprocWR_output
├── in_file
│   └── sub-01_task-calorieimage_run-03_bold_mcf_smooth.nii.gz
├── mean_img
│   └── sub-01_task-calorieimage_run-03_bold_mcf.nii.gz_mean_reg.nii.gz
└── par_file
    └── sub-01_task-calorieimage_run-03_bold_mcf.nii.gz.par

3 directories, 3 files
```
