- The BIDScoin workflow
- The BIDScoin tools
- The bidsmap files
- BIDScoin functionality / TODO
- BIDScoin tutorial
BIDScoin is a Python command-line toolkit that converts ("coins") source-level (raw) MRI data-sets to nifti / json / tsv data-sets that are organized according to the Brain Imaging Data Structure, a.k.a. BIDS. Rather than depending on complex or ambiguous logic, BIDScoin uses a simple (but powerful) key-value approach to convert the raw source data into BIDS data. The key values that can be used in BIDScoin to map the data are:
- Information in the MRI header files (DICOM, PAR/REC or .7 format; e.g. SeriesDescription)
- Information from nifti headers (e.g. image dimensionality)
- Information in the file structure (file- and/or directory names, e.g. number of files)
The key-value heuristics are stored in flexible, human-readable and broadly supported YAML files. The nifti- and json-files are generated with dcm2niix. For more information on the installation and requirements, see the installation guide.
Currently, BIDScoin is quite functional, but note that only the first of the three options above (MRI header information) has been implemented, for DICOM files. The second and third options are planned for future versions, such that the third takes precedence over the second, which in turn takes precedence over the first.
BIDScoin is a user-friendly toolkit that requires no programming knowledge, just some basic file handling and, possibly, minor (YAML) text editing skills.
The BIDScoin workflow
BIDScoin takes your raw data as well as a YAML file with the key-value mapping information as input, and returns a BIDS folder as output. Here is how to prepare the BIDScoin inputs:
A minimally organised raw data folder, following a /raw/sub-[identifier]/ses-[identifier]/[seriesfolder]/[dicomfile] structure. This data organization is how users receive their data from the (Siemens) scanners at the DCCN (NB: the ses-[identifier] sub-folder is optional and can be left out).
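To make the expected raw folder layout concrete, here is a minimal sketch that splits a raw-data path into its parts and treats the ses-[identifier] level as optional. The function and class names are hypothetical and not part of BIDScoin; the example paths are made up.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional


class RawFile(NamedTuple):
    subject: str
    session: Optional[str]   # None when the optional ses-* level is absent
    series: str
    filename: str


def parse_raw_path(path: str, rawfolder: str = "/raw") -> RawFile:
    """Split a raw-data path into subject / session / series / file parts."""
    # relative_to() raises a ValueError if the path is not inside rawfolder
    parts = PurePosixPath(path).relative_to(rawfolder).parts
    if not parts or not parts[0].startswith("sub-"):
        raise ValueError(f"Expected a sub-* folder directly under {rawfolder}: {path}")
    if len(parts) == 4 and parts[1].startswith("ses-"):
        subject, session, series, filename = parts
    elif len(parts) == 3:
        subject, series, filename = parts
        session = None
    else:
        raise ValueError(f"Unexpected folder depth: {path}")
    return RawFile(subject, session, series, filename)


print(parse_raw_path("/raw/sub-001/ses-01/007-t1_mprage/00001.dcm"))
print(parse_raw_path("/raw/sub-001/007-t1_mprage/00001.dcm"))
```

A path that does not follow the layout raises a ValueError, which is one simple way to spot folders that still need reorganizing.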
If your data is not already organized in this way, you can use the dicomsort.py command-line utility to move your unordered DICOM-files into a seriesfolder organization with the DICOM series-folders being named [SeriesNumber]-[SeriesDescription]. Series folders contain a single data type and are typically acquired in a single run.
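The idea behind such a series-folder organization can be sketched as follows. This is not the dicomsort.py implementation: `read_header` is a hypothetical callback that returns the DICOM header fields of a file as a dict (for instance implemented with a DICOM library of your choice), and the clean-up of awkward characters in the folder name is an assumption made for this sketch.

```python
import re
import shutil
from pathlib import Path
from typing import Callable, Dict


def series_folder_name(series_number: int, series_description: str) -> str:
    """Build a '[SeriesNumber]-[SeriesDescription]' folder name."""
    # Assumption of this sketch: replace characters that are awkward in folder names
    clean = re.sub(r"[^\w.-]+", "_", series_description.strip())
    return f"{series_number}-{clean}"


def sort_into_series(folder: Path, read_header: Callable[[Path], Dict[str, str]]) -> None:
    """Move every file in `folder` into its series sub-folder."""
    # List the files first, so that newly created sub-folders are not visited
    for dicomfile in sorted(p for p in folder.iterdir() if p.is_file()):
        header = read_header(dicomfile)   # hypothetical header reader
        name = series_folder_name(int(header["SeriesNumber"]), header["SeriesDescription"])
        target = folder / name
        target.mkdir(exist_ok=True)
        shutil.move(str(dicomfile), str(target / dicomfile.name))
```

Passing the header reader in as a callback keeps the sketch free of third-party dependencies and makes it easy to try out on a copy of your data with a stub reader.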
Another command-line utility that can be helpful in organizing your raw data is rawmapper.py. This utility can show you the overview (map) of all the values of DICOM-fields of interest in your data-set and, optionally, use these fields to rename your raw data sub-folders (this can be handy e.g. if you manually entered subject-identifiers as [Additional info] at the scanner console and you want to use these to rename your subject folders).
If these utilities do not satisfy your needs, then have a look at this reorganize_dicom_files tool.
A YAML file with the key-value mapping information, i.e. a bidsmap. There are two ways to create such a bidsmap.
The first is if you are a new user and are working from scratch. In this case you would start with the bidstrainer.py command-line tool (see the BIDScoin workflow diagram and the bidstrainer section).
If you have run the bidstrainer or if, e.g., you work in an institute where someone else (e.g. your MR physicist ;-)) has already performed the training procedure, you can use the training data to map all the files in your data-set with the bidsmapper.py command-line tool (see the bidsmapper section).
The output of the bidsmapper is the complete bidsmap that you can inspect to see if your raw data will be correctly mapped onto BIDS. If this is not the case you can go back to the training procedure and change or add new samples, and rerun the bidstrainer and bidsmapper until you have a suitable bidsmap. Alternatively, or in addition, you can directly edit the bidsmap yourself (this requires more expert knowledge but can also be more powerful).
BIDScoin workflow. Left: New users would start with the bidstrainer, whose output can be fed into the bidsmapper to produce the bidsmap.yaml file. This file can (and should) be inspected and, in case of incorrect mappings, the user should add raw training samples and re-run the training procedure (dashed arrow lines). Right: Institute users could start with an institute-provided bidsmap file (e.g. bidsmap_dccn.yaml) and directly use the bidsmapper. In case of incorrect mappings they could ask the institute for an updated bidsmap (dashed arrow line).
Having an organized raw data folder and a correct bidsmap, the actual data-set conversion to BIDS can now be performed fully automatically by simply running the bidscoiner.py command-line tool (see the BIDScoin workflow diagram and the bidscoiner section).
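The two routes through the workflow can be summarised in a small sketch that only assembles the command lines, using the positional arguments from the usage texts in this README. The wrapper functions themselves are hypothetical; in practice you would inspect the bidsmap.yaml file between the bidsmapper and bidscoiner steps rather than run everything in one go.

```python
import subprocess
from typing import List, Optional


def workflow_commands(rawfolder: str, bidsfolder: str,
                      institute_bidsmap: Optional[str] = None) -> List[List[str]]:
    """Return the BIDScoin command lines for a new user or an institute user."""
    commands = []
    if institute_bidsmap is None:
        # New users: train on the sample files first, then map all raw data
        commands.append(["bidstrainer.py", bidsfolder])
        commands.append(["bidsmapper.py", rawfolder, bidsfolder])
    else:
        # Institute users: start directly from an institute-provided bidsmap
        commands.append(["bidsmapper.py", rawfolder, bidsfolder, institute_bidsmap])
    # Run this last step only after the bidsmap has been inspected
    commands.append(["bidscoiner.py", rawfolder, bidsfolder])
    return commands


def run_workflow(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands and, unless dry_run is set, execute them in order."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


run_workflow(workflow_commands("/project/foo/raw", "/project/foo/bids"))
```

With the default `dry_run=True` nothing is executed, so the sketch is safe to try out without BIDScoin installed.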
The BIDScoin tools
Running the bidstrainer
    usage: bidstrainer.py [-h] bidsfolder [samplefolder] [bidsmap]

    Takes example files from the samples folder as training data and creates a
    key-value mapping, i.e. a bidsmap_sample.yaml file, by associating the file
    attributes with the file's BIDS-semantic pathname

    positional arguments:
      bidsfolder    The destination folder with the bids data structure
      samplefolder  The root folder of the directory tree containing the sample
                    files / training data. Optional argument, if left empty,
                    bidsfolder/code/samples is used or such an empty directory
                    tree is created
      bidsmap       The bidsmap YAML-file with the BIDS heuristics (optional
                    argument, default: ./heuristics/bidsmap_template.yaml)

    optional arguments:
      -h, --help    show this help message and exit

    examples:
      bidstrainer.py /project/foo/bids
      bidstrainer.py /project/foo/bids /project/foo/samples bidsmap_custom
The core idea of the bidstrainer is that you know your own scan protocol and can therefore point out which files should go where in the BIDS structure. In order to do so, you have to place raw sample files for each of the BIDS data types / runs in your scan protocol (e.g. T1, fMRI, etc.) in the appropriate folder of a semantic folder tree (named samples, see the bidstrainer example). If you run bidstrainer.py with just the name of your bidsfolder, bidstrainer will create this semantic folder tree for you in the code subfolder (if it is not already there). Generally, when placing your sample files, it will be fairly straightforward to find your way in this semantic folder tree, but when in doubt you should have a look at the BIDS specification. Note that the deepest foldername in the tree denotes the BIDS suffix (e.g. "T1w"). You do not need to place samples from your non-BIDS data types / runs (such as localizer or spectroscopy scans) in the folder tree; these data types will automatically go into the "extra_data" folder.
If all sample files have been put in the appropriate location, you can (re)run the bidstrainer to create a bidsmap file for your study. How this works is that, on the one hand, the bidstrainer will read a predefined set of (e.g. key DICOM) attributes from each sample file and, on the other hand, take the path-names of the sample files to infer the associated BIDS modality. In this way, a list of unique key-value mappings between sets of (DICOM) attributes and sets of BIDS-labels is defined, the so-called bidsmap, that can be used as input for the bidsmapper tool. If the predefined set of DICOM attributes does not uniquely identify your particular scan sequences (not likely but possible), or if you simply prefer to use more or other attributes, you can (copy and) edit the bidsmap_template.yaml file in the heuristics folder and re-run the bidstrainer with this customized template as an input argument.
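The training idea described above can be illustrated with a short sketch. It assumes that the sample files sit somewhere below a folder named after a BIDS modality and that the deepest folder is the BIDS suffix; the exact layout of the samples tree, the entry structure and all function names are assumptions for illustration, not the bidstrainer code. The header attributes are passed in as already-read dicts.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# The BIDS modalities named in the bidsmap section of this README
BIDS_MODALITIES = ("anat", "func", "dwi", "fmap", "beh")


def labels_from_sample_path(samplefile: str, samplefolder: str) -> Tuple[str, str]:
    """Infer (modality, suffix) from where a sample file sits in the samples tree."""
    folders = PurePosixPath(samplefile).relative_to(samplefolder).parent.parts
    modality = next((name for name in folders if name in BIDS_MODALITIES), None)
    if modality is None:
        raise ValueError(f"No BIDS modality folder found in: {samplefile}")
    return modality, folders[-1]          # The deepest folder denotes the suffix


def train(samples: Dict[str, Dict[str, str]], samplefolder: str) -> List[dict]:
    """Map each sample's header attributes onto the labels inferred from its path."""
    bidsmap = []
    for samplefile, attributes in samples.items():
        modality, suffix = labels_from_sample_path(samplefile, samplefolder)
        entry = {"modality": modality, "attributes": attributes, "bids": {"suffix": suffix}}
        if entry not in bidsmap:          # Keep only unique key-value mappings
            bidsmap.append(entry)
    return bidsmap


samples = {
    "/project/foo/bids/code/samples/anat/T1w/00001.dcm":
        {"SeriesDescription": "t1_mprage_sag_p2_iso_1.0"},
}
print(train(samples, "/project/foo/bids/code/samples"))
```

Two sample files with identical attributes in the same folder yield a single entry, which reflects the "unique key-value mappings" mentioned above.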
Running the bidsmapper
    usage: bidsmapper.py [-h] [-a] rawfolder bidsfolder [bidsmap]

    Creates a bidsmap.yaml YAML file that maps the information from all raw data to
    the BIDS labels (see also [bidsmap_template.yaml] and [bidstrainer.py]). You can
    check and edit the bidsmap.yaml file before passing it to [bidscoiner.py]

    positional arguments:
      rawfolder        The source folder containing the raw data in
                       sub-#/ses-#/series format
      bidsfolder       The destination folder with the bids data structure
      bidsmap          The bidsmap YAML-file with the BIDS heuristics (optional
                       argument, default: bidsfolder/code/bidsmap_sample.yaml)

    optional arguments:
      -h, --help       show this help message and exit
      -a, --automatic  If this flag is given the user will not be asked for help
                       if an unknown series is encountered

    examples:
      bidsmapper.py /project/foo/raw /project/foo/bids
      bidsmapper.py /project/foo/raw /project/foo/bids bidsmap_dccn
The bidsmapper.py tool goes over all raw data folders of your dataset and saves the known and unknown key-value mappings in a (study specific) bidsmap file. You can consider it as a dry-run for how exactly the bidscoiner will convert the raw data into BIDS folders. It gives you the opportunity to inspect the resulting bidsmap.yaml file to see if all data types / runs were recognized correctly with proper BIDS labels before doing the actual conversion to BIDS. Unexpected mappings or poor BIDS labels can be found if your bidstraining or the bidsmap file that was provided to you was incomplete. In that case you should either get an updated bidsmap file or redo the bidstraining with new sample files, rerunning the bidstrainer and bidsmapper until you have a suitable bidsmap.yaml file. You can of course also directly edit the bidsmap.yaml file yourself, for instance by changing some of the automatically generated BIDS labels to your needs (e.g. "task_label").
Running the bidscoiner
    usage: bidscoiner.py [-h] [-s [SUBJECTS [SUBJECTS ...]]] [-f] [-p] [-b BIDSMAP]
                         rawfolder bidsfolder

    Converts ("coins") datasets in the rawfolder to nifti / json / tsv datasets in
    the bidsfolder according to the BIDS standard. Check and edit the bidsmap.yaml
    file to your needs before running this function. Provenance, warnings and error
    messages are stored in the ../bidsfolder/code/bidscoiner.log file

    positional arguments:
      rawfolder             The source folder containing the raw data in
                            sub-#/ses-#/series format
      bidsfolder            The destination folder with the bids data structure

    optional arguments:
      -h, --help            show this help message and exit
      -s [SUBJECTS [SUBJECTS ...]], --subjects [SUBJECTS [SUBJECTS ...]]
                            Space separated list of selected sub-# names / folders
                            to be processed. Otherwise all subjects in the
                            rawfolder will be selected
      -f, --force           If this flag is given subjects will be processed,
                            regardless of existing folders in the bidsfolder.
                            Otherwise existing folders will be skipped
      -p, --participants    If this flag is given those subjects that are in
                            participants.tsv will not be processed (also when the
                            --force flag is given). Otherwise the participants.tsv
                            table is ignored
      -b BIDSMAP, --bidsmap BIDSMAP
                            The bidsmap YAML-file with the study heuristics. If
                            the bidsmapfile is relative (i.e. no "/" in the name)
                            then it is assumed to be located in bidsfolder/code/.
                            Default: bidsmap.yaml

    examples:
      bidscoiner.py /project/raw /project/bids
      bidscoiner.py -f /project/raw /project/bids -s sub-009 sub-030
The bidscoiner.py tool is the workhorse of the toolkit that will fully automatically convert your source-level (raw) MRI data-sets to BIDS organized data-sets. In order to do so, it needs a bidsmap file, which is typically created by running the bidsmapper tool. You can run bidscoiner.py after all data is collected, or whenever new data has been added to the raw folder (presuming the scan protocol hasn't changed).
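How the -s, -f and -p options interact when new data has been added can be sketched as below. The logic follows the option descriptions in the usage text above; the function name and its arguments are hypothetical, and the real tool may differ in details.

```python
from typing import Iterable, List, Optional, Set


def select_subjects(raw_subjects: Iterable[str],
                    existing_bids_subjects: Set[str],
                    participants: Set[str],
                    subjects: Optional[List[str]] = None,
                    force: bool = False,
                    skip_participants: bool = False) -> List[str]:
    """Return the sub-# folders that would be processed in this run."""
    selected = []
    # -s: an explicit selection replaces "all subjects in the rawfolder"
    for subject in sorted(subjects if subjects else raw_subjects):
        if skip_participants and subject in participants:
            continue    # -p: listed in participants.tsv, skipped even with -f
        if subject in existing_bids_subjects and not force:
            continue    # Existing folder in the bidsfolder, skipped unless -f
        selected.append(subject)
    return selected


raw = ["sub-001", "sub-002", "sub-003"]
print(select_subjects(raw, existing_bids_subjects={"sub-001"}, participants=set()))
print(select_subjects(raw, {"sub-001"}, {"sub-002"}, force=True, skip_participants=True))
```

Under these assumptions, a plain re-run only picks up subjects that do not yet have a folder in the bidsfolder, which is what makes incremental conversion possible.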
After a successful run of bidscoiner.py, the work to convert your data into a fully compliant BIDS dataset is unfortunately not yet over and, depending on the complexity of your data-set, additional tools may need to be run and meta-data may need to be entered manually (not everything can be automated). For instance, you should update the content of the README files in your bids folder and you may need to provide e.g. additional participants.json files (see the BIDS specification for more information). Moreover, if you have behavioural log-files you will find that BIDScoin does not (yet) support converting these into BIDS compliant *_events.tsv/json files (advanced users are encouraged to use the bidscoiner.py plug-in possibility and write their own log-file parser).
If all of the above work is done, you can (and should) run the web-based bidsvalidator to check for inconsistencies or missing files in your bids data-set (NB: the bidsvalidator also exists as a command-line tool).
NB: The provenance of the produced BIDS data-sets is stored in the bids/code/bidscoiner.log file. This file is also very useful for debugging / tracking down bidsmapping issues.
The bidsmap files
A bidsmap file contains a collection of key-value dictionaries that define unique mappings between different types of raw data files (e.g. DICOM series) and their corresponding BIDS labels. As bidsmap files are both inputs as well as outputs for the different BIDScoin tools (except for bidscoiner.py, which has BIDS data as output; see the BIDScoin workflow), they are derivatives of each other and, as such, share the same basic structure. The bidsmap_template.yaml file is relatively empty and defines only which attributes (but not their values) are mapped to which BIDS-labels. The bidsmap_[sample/site].yaml file contains actual attribute values (e.g. from training samples from a certain study or site) and their associated BIDS-values. The final bidsmap.yaml file contains the attribute and associated BIDS values for all types of data found in the entire raw data collection.
A bidsmap file consists of help-text, followed by several mapping sections, i.e. Options, DICOM, PAR, P7, Nifti, FileSystem and Plugin. Within each of these sections there are different sub-sections for the different BIDS modalities, i.e. for anat, func, dwi, fmap and beh. There are a few additional sections, i.e. participant_label, session_label and extra_data. Schematically, a bidsmap file has the following structure:
- Options (A list of general options that can be passed to the bidscoiner and its plug-ins)
- participant_label [a DICOM field]
- session_label [a DICOM field]
- [a DICOM field]
- [another DICOM field]
- [a DICOM field]
- [another DICOM field]
- extra_data (all non-BIDS data)
- PlugIn (the name of the Python plug-in function; supported, but this is an experimental and untested feature)
Inside each BIDS modality, there can be multiple key-value mappings that map (e.g. DICOM) modality [attributes] to the BIDS [labels] (e.g. "task_label"), as indicated below:
Bidsmap_sample example. As indicated by the solid arrow line, the set of DICOM values (suitable to uniquely identify the DICOM series) is used here as a key-set that maps onto the set of BIDS labels. Note that certain BIDS labels are enclosed by pointy brackets, marking their dynamic value. In this bidsmap, as indicated by the dashed arrow line, that means that <ProtocolName> will be replaced in a later stage by "t1_mprage_sag_p2_iso_1.0". Also note that in this bidsmap there was only one T1-image, but there were two different fMRI runs (here because of multi-echo, but multiple tasks could also be listed).
Tips and tricks
The attribute value can also be a list, in which case a DICOM series is positively identified if its attribute value is in this list.
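A minimal sketch of this matching rule is given below, assuming that a mapping is a dict with an "attributes" part and a "bids" part. That dict layout and the function names are hypothetical and only serve to illustrate how a list value differs from a plain string value.

```python
from typing import Dict, List, Optional, Union

AttributeValue = Union[str, List[str]]


def series_matches(series: Dict[str, str], attributes: Dict[str, AttributeValue]) -> bool:
    """Check whether the attributes of a DICOM series match one key-set."""
    for name, expected in attributes.items():
        actual = series.get(name)
        if isinstance(expected, list):
            if actual not in expected:    # List value: any listed value matches
                return False
        elif actual != expected:          # Plain value: must be equal
            return False
    return True


def find_bids_labels(series: Dict[str, str], mappings: List[dict]) -> Optional[Dict[str, str]]:
    """Return the BIDS labels of the first matching mapping, or None if unknown."""
    for mapping in mappings:
        if series_matches(series, mapping["attributes"]):
            return mapping["bids"]
    return None


mappings = [{"attributes": {"SeriesDescription": ["t1_mprage", "t1_mprage_sag"]},
             "bids": {"suffix": "T1w"}}]
print(find_bids_labels({"SeriesDescription": "t1_mprage_sag"}, mappings))
print(find_bids_labels({"SeriesDescription": "localizer"}, mappings))
```

In this sketch a series without a matching key-set returns None; in BIDScoin terms such a series would be a candidate for the extra_data section.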
The BIDS labels can be static, in which case the value is just a normal string, or dynamic, when the string is enclosed with pointy brackets like <attribute name> or <<argument1><argument2>> (see the example above). In case of single pointy brackets the value will be replaced during bidsmapper and bidscoiner runtime by the value of the attribute with that name. In case of double pointy brackets, the value will be updated for each subject/session during bidscoiner runtime (e.g. the <<runindex>> value will be increased if a file with the same runindex already exists in that directory).
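The difference between static, single-bracket and double-bracket labels can be sketched as follows. The function names, the filename template and the choice to start counting run indices at 1 are assumptions made for this illustration, not the actual BIDScoin code.

```python
import re
from typing import Dict, Set


def resolve_dynamic(label: str, attributes: Dict[str, str]) -> str:
    """Replace a single-bracket '<attribute name>' label by that attribute's value."""
    if label.startswith("<<") and label.endswith(">>"):
        return label                        # Double brackets: resolved per subject/session
    match = re.fullmatch(r"<([^<>]+)>", label)
    if match:
        return attributes[match.group(1)]   # Single brackets: look up the attribute
    return label                            # Static label: a normal string


def next_runindex(existing_files: Set[str], template: str) -> int:
    """Increase the run index until the resulting filename does not exist yet."""
    index = 1                               # Assumption: run indices start at 1
    while template.format(runindex=index) in existing_files:
        index += 1
    return index


attributes = {"ProtocolName": "t1_mprage_sag_p2_iso_1.0"}
print(resolve_dynamic("<ProtocolName>", attributes))
print(resolve_dynamic("<<runindex>>", attributes))

existing = {"sub-001_task-Reward_run-1_bold.nii.gz"}   # Hypothetical filename
print(next_runindex(existing, "sub-001_task-Reward_run-{runindex}_bold.nii.gz"))
```

Keeping the double-bracket value untouched in the first function mirrors the description above, where such values are only filled in during bidscoiner runtime.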
Field maps: IntendedFor
You can use the "IntendedFor" field to indicate for which runs (DICOM series) a fieldmap was intended. The dynamic value of the "IntendedFor" field can be a list of string patterns that is used to include those runs that have that string pattern in their nifti pathname (e.g. <<Stop*Go><Reward>> to include "Stop1Go"-, "Stop2Go"- and "Reward"-runs).
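The pattern selection can be illustrated with a small sketch. It assumes shell-style wildcards (as in Python's fnmatch module) and substring matching on the nifti pathname; both are assumptions about the matching rule, and the function names and example paths are made up.

```python
import fnmatch
import re
from typing import List


def parse_patterns(intended_for: str) -> List[str]:
    """Split a dynamic value such as '<<Stop*Go><Reward>>' into its string patterns."""
    if not (intended_for.startswith("<<") and intended_for.endswith(">>")):
        raise ValueError(f"Not a dynamic IntendedFor value: {intended_for}")
    # Strip the outer brackets and collect what is inside each inner pair
    return re.findall(r"<([^<>]+)>", intended_for[1:-1])


def select_runs(intended_for: str, nifti_paths: List[str]) -> List[str]:
    """Return the runs whose pathname contains at least one of the patterns."""
    patterns = parse_patterns(intended_for)
    return [path for path in nifti_paths
            if any(fnmatch.fnmatchcase(path, f"*{pattern}*") for pattern in patterns)]


paths = [
    "func/sub-001_task-Stop1Go_bold.nii.gz",
    "func/sub-001_task-Stop2Go_bold.nii.gz",
    "func/sub-001_task-Reward_bold.nii.gz",
    "func/sub-001_task-Rest_bold.nii.gz",
]
print(select_runs("<<Stop*Go><Reward>>", paths))
```

In this example the "Rest" run is left out because its pathname contains neither pattern.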
BIDScoin functionality / TODO
- DICOM source data
- PAR / REC source data
- P7 source data
- Nifti source data
- Multi-echo data
- Multi-coil data
- Stimulus / behavioural logfiles
BIDScoin tutorial
This tutorial is specific to researchers from the DCCN and makes use of data-sets stored on its central file-system. However, it should not be difficult to use (at least part of) this tutorial for other data-sets as well.
Activate the bidscoin environment and create a tutorial playground folder in your home directory by executing these bash commands:
    module add bidscoin/1.4
    source activate /opt/bidscoin
    cp -r /opt/bidscoin/tutorial ~
The tutorial folder contains a raw source-data folder and a bids_ref reference BIDS folder, i.e. the end product of this tutorial.
Let's begin with inspecting this new raw data collection:
- Use the rawmapper.py command to print out the DICOM values of the "EchoTime", "Sex" and "AcquisitionDate" of the fMRI series in the
Now that we have some data and have inspected its properties, we are ready to start with the actual BIDS coining process. The first step is to perform training on a few raw data samples:
- Put files (training data) in the right subfolders in this
- Create a bids/code/samples folder tree in your tutorial folder with this bash command:
    cd ~/tutorial
    bidstrainer.py bids
- Create a bids/code/bidsmap_sample.yaml bidsmap file by re-running the above
- Inspect the newly created bidsmap file. Can you recognise the key-value mappings? Which fields are going to end up in the filenames of the final BIDS datasets?
Scan all folders in the raw data collection for unknown data by running the bidsmapper bash command:
    bidsmapper.py raw bids
- Open the bids/code/bidsmap.yaml file and check the "extra_data" section for images that should go in the BIDS sections (e.g. T1, fMRI or DWI data). If so, add the missing training samples (check the messages in the command shell) to the samples folder tree and rerun the
- In the bids/code/bidsmap.yaml file, rename the "task_label" of the functional scans into something more readable, e.g. "Reward" and "Stop"
- Add a search pattern to the IntendedFor field such that it will select your fMRI runs
- Change the options such that you will get non-zipped nifti data (i.e. *.nii instead of *.nii.gz) in your BIDS data collection
Convert your raw data collection into a BIDS collection by running the bidscoiner bash command (note that the input is the same as for the bidsmapper):
    bidscoiner.py raw bids
- Check your bids/code/bidscoiner.log file for any errors or warnings
- Compare the results in your bids/sub-# subject folders with the bids_ref reference result. Are the file and folder names the same? Also check the json sidecar files of the fieldmaps. Do they have the right "EchoTime" and "IntendedFor" fields?
- What happens if you re-run the bidscoiner.py command? Are the same subjects processed again? Re-run "sub-001".
- Inspect the bids/participants.tsv file and decide if it is ok.
- Update the README files in your
- As a final step, run the bids-validator on your ~/bids_tutorial folder. Are you completely ready now to share this dataset?
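The comparison between your own result and the bids_ref reference folder in the steps above could be scripted roughly as follows. This is a sketch that only compares file and folder names, not file contents; the function names are hypothetical.

```python
from pathlib import Path
from typing import Set, Tuple


def relative_files(root: Path) -> Set[str]:
    """Collect all file paths below root, relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_trees(result: Path, reference: Path) -> Tuple[Set[str], Set[str]]:
    """Return (files missing from result, files that are not in the reference)."""
    result_files = relative_files(result)
    reference_files = relative_files(reference)
    return reference_files - result_files, result_files - reference_files


if __name__ == "__main__":
    # Run from inside the tutorial folder, after the bidscoiner has finished
    missing, unexpected = compare_trees(Path("bids"), Path("bids_ref"))
    print("Missing from bids:", sorted(missing))
    print("Not in bids_ref:  ", sorted(unexpected))
```

Two empty lists mean that the names agree; the json sidecar fields such as "EchoTime" and "IntendedFor" still need to be checked by opening the files.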