Merge pull request #28 from teonbrooks/filesplit
[MRG] split intro, commons, mr, and meg into folder from specification.md
chrisgorgo committed Oct 3, 2018
2 parents 4de6983 + eb46532 commit 45985e3
Showing 14 changed files with 1,919 additions and 0 deletions.
156 changes: 156 additions & 0 deletions src/01_introduction.md

201 changes: 201 additions & 0 deletions src/02_common_principles.md
@@ -0,0 +1,201 @@
Common principles
=================

The Inheritance Principle
-------------------------

Any metadata file (`.json`, `.bvec`, `.tsv`, etc.) may be defined at any directory level, but no more than one applicable file may be defined at a given level (Example 1). The values from the top level are inherited by all lower levels unless they are overridden by a file at the lower level. For example, `sub-*_task-rest_bold.json` may be specified at the participant level, setting TR to a specific value. If one of the runs has a different TR than the one specified in that file, another `sub-*_task-rest_bold.json` file can be placed within that specific series directory specifying the TR for that specific run.
There is no notion of "unsetting" a key/value pair. For example, if a JSON file for a particular participant/run defines a key/value pair and the JSON file at the root level of the dataset does not define this key, the key is not "unset" for any subject/run.
Files for a particular participant can exist only within that participant's directory, i.e., `/dataset/sub-*[/ses-*]/sub-*_T1w.json`. Similarly, any file that is not specific to a participant is to be declared only at the top level of the dataset; for example, `task-sist_bold.json` must be placed under `/dataset/task-sist_bold.json`.

Example 1: Two JSON files at the same level that are applicable to the same NIfTI file.

```
sub-01/
    ses-test/
        sub-01_ses-test_task-overtverbgeneration_bold.json
        sub-01_ses-test_task-overtverbgeneration_run-2_bold.json
        anat/
            sub-01_ses-test_T1w.nii.gz
        func/
            sub-01_ses-test_task-overtverbgeneration_run-1_bold.nii.gz
            sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz
```

In the above example, two JSON files are listed under `sub-01/ses-test/`, and each of them applies to `sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz`, violating the constraint that no more than one applicable file may be defined at a given level of the directory structure. Instead, `sub-01_ses-test_task-overtverbgeneration_run-2_bold.json` should have been placed under `sub-01/ses-test/func/`.

Example 2: Multiple runs and reconstructions (`rec`) with the same acquisition (`acq`) parameters, `acq-test1`

```
sub-01/
    anat/
    func/
        sub-01_task-xyz_acq-test1_run-1_bold.nii.gz
        sub-01_task-xyz_acq-test1_run-2_bold.nii.gz
        sub-01_task-xyz_acq-test1_rec-recon1_bold.nii.gz
        sub-01_task-xyz_acq-test1_rec-recon2_bold.nii.gz
        sub-01_task-xyz_acq-test1_bold.json
```

In the above example, all NIfTI files are acquired with the same scanning parameters (`acq-test1`). Hence, a single JSON file describing the acquisition parameters applies to the different runs and reconstructions. Also, if the JSON file (`task-xyz_acq-test1_bold.json`) is defined at the top-level directory of the dataset, it applies to all runs of the task with the `test1` acquisition parameters.

Example 3: Multiple JSON files at different levels for the same task and acquisition parameters

```
sub-01/
    sub-01_task-xyz_acq-test1_bold.json
    anat/
    func/
        sub-01_task-xyz_acq-test1_run-1_bold.nii.gz
        sub-01_task-xyz_acq-test1_rec-recon1_bold.nii.gz
        sub-01_task-xyz_acq-test1_rec-recon2_bold.nii.gz
```

In the above example, the fields from `sub-01/sub-01_task-xyz_acq-test1_bold.json` apply to all BOLD runs of `sub-01`. However, if there is a key with a different value in `sub-01/func/sub-01_task-xyz_acq-test1_run-1_bold.json`, the new value applies to that particular run's NIfTI file.
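The inheritance rules can be illustrated with a short Python sketch. It is not the BIDS validator's implementation, and it rests on assumptions that this section only implies through its examples: a sidecar applies to a data file when it lies in the same directory or an ancestor directory, has the same suffix (e.g. `bold`), and its key-value entities are a subset of the data file's entities. Sidecars are passed in already loaded, keyed by their path relative to the dataset root; all function names are hypothetical.

```python
from pathlib import PurePosixPath


def parse_name(filename):
    """Split 'sub-01_task-xyz_run-1_bold.nii.gz' into ({'sub': '01', ...}, 'bold').

    Assumes well-formed key-value_..._suffix.ext names.
    """
    stem = PurePosixPath(filename).name.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix


def applies_to(sidecar, data_file):
    """Assumed rule: ancestor (or same) directory, same suffix, entity subset."""
    s_dir = PurePosixPath(sidecar).parent.parts
    d_dir = PurePosixPath(data_file).parent.parts
    s_entities, s_suffix = parse_name(sidecar)
    d_entities, d_suffix = parse_name(data_file)
    return (
        d_dir[: len(s_dir)] == s_dir
        and s_suffix == d_suffix
        and all(d_entities.get(key) == value for key, value in s_entities.items())
    )


def resolve_metadata(data_file, sidecars):
    """Merge applicable sidecars from the top level down; deeper files win."""
    by_level = {}
    for path in sidecars:
        if not applies_to(path, data_file):
            continue
        level = PurePosixPath(path).parent.parts
        if level in by_level:
            # At most one applicable file is allowed per directory level
            raise ValueError(f"two applicable files at one level: {by_level[level]} and {path}")
        by_level[level] = path
    merged = {}
    for level in sorted(by_level, key=len):  # root first, deepest last
        merged.update(sidecars[by_level[level]])
    return merged
```

Because values are only ever added or overwritten during the merge, a key set at a higher level is never removed lower down, which matches the absence of "unsetting" described above.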

File Formation specification
----------------------------

### Imaging files

All imaging data MUST be stored using the NIfTI file format. We RECOMMEND using compressed NIfTI files (`.nii.gz`), either version 1.0 or 2.0. Imaging data SHOULD be converted to the NIfTI format using a tool that provides as much of the NIfTI header information (such as orientation and slice timing information) as possible. Since the NIfTI standard offers limited support for the various image acquisition parameters available in DICOM files, we RECOMMEND that users provide additional meta information extracted from DICOM files in a sidecar JSON file (with the same filename as the `.nii[.gz]` file, but with a `.json` extension). Extraction of BIDS-compatible metadata can be performed using the dcm2nii [https://www.nitrc.org/projects/dcm2nii/](https://www.nitrc.org/projects/dcm2nii/) and dicm2nii [http://www.mathworks.com/matlabcentral/fileexchange/42997-dicom-to-nifti-converter/content/dicm2nii.m](http://www.mathworks.com/matlabcentral/fileexchange/42997-dicom-to-nifti-converter/content/dicm2nii.m) DICOM to NIfTI converters. The provided validator ([https://github.com/INCF/bids-validator](https://github.com/INCF/bids-validator)) will check for conflicts between the JSON file and the data recorded in the NIfTI header.
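The sidecar naming rule (same filename, `.json` instead of `.nii` or `.nii.gz`) can be sketched in Python as below. `pathlib`'s `with_suffix` alone would only replace the final `.gz`, producing `name.nii.json`, so the sketch strips the double extension explicitly. `sidecar_path` is a hypothetical helper, not part of any BIDS tool.

```python
from pathlib import PurePosixPath


def sidecar_path(nifti_path):
    """Return the JSON sidecar path for a .nii or .nii.gz file."""
    path = PurePosixPath(nifti_path)
    name = path.name
    for ext in (".nii.gz", ".nii"):  # check the longer extension first
        if name.endswith(ext):
            return str(path.with_name(name[: -len(ext)] + ".json"))
    raise ValueError(f"not a NIfTI file: {nifti_path}")
```

For example, `sidecar_path("sub-01/anat/sub-01_T1w.nii.gz")` gives `sub-01/anat/sub-01_T1w.json`.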

### Tabular files

Tabular data MUST be saved as tab-separated values (`.tsv`) files, i.e., CSV files where commas are replaced by tabs. Tabs MUST be true tab characters and MUST NOT be a series of space characters. Each TSV file MUST start with a header line listing the names of all columns (with the exception of physiological and other continuous acquisition data - see below for details). Names MUST be separated with tabs. String values containing tabs MUST be escaped using double quotes. Missing and non-applicable values MUST be coded as `n/a`.

#### Example:
```
onset duration response_time correct stop_trial go_trial
200 200 0 n/a n/a n/a
```
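As a minimal sketch (not part of the specification), Python's standard `csv` module can read and write files that follow these rules: `delimiter="\t"` produces true tab characters, its default minimal quoting wraps any value that contains a tab in double quotes, and here `n/a` is mapped to and from `None`. The helper names `write_tsv` and `read_tsv` are hypothetical.

```python
import csv
import io

NA = "n/a"  # marker for missing and non-applicable values


def write_tsv(stream, header, rows):
    """Write a header line and rows as tab-separated values; None becomes n/a."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # QUOTE_MINIMAL (the default) double-quotes values that contain tabs
        writer.writerow(NA if value is None else value for value in row)


def read_tsv(stream):
    """Read rows as dicts keyed by the header line; n/a becomes None."""
    reader = csv.DictReader(stream, delimiter="\t")
    return [{key: (None if value == NA else value) for key, value in row.items()}
            for row in reader]
```

With real files rather than `io.StringIO`, open them with `newline=""`, as the `csv` module documentation advises.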

Tabular files MAY be accompanied by a simple data dictionary in JSON format (see below). The data dictionaries MUST have the same name as their corresponding tabular files but with `.json` extensions. Each entry in the data dictionary has a name corresponding to a column name and the following fields:

| Field name | Definition |
|:------------|:---------------------------------------------------------------|
| LongName | Long (unabbreviated) name of the column. |
| Description | Description of the column. |
| Levels | For categorical variables: a dictionary of possible values (keys) and their descriptions (values). |
| Units | Measurement units. `[<prefix symbol>] <unit symbol>` format following the SI standard is RECOMMENDED (see Appendix V). |
| TermURL | URL pointing to a formal definition of this type of data in an ontology available on the web. |

#### Example:

```JSON
{
  "test": {
    "LongName": "Education level",
    "Description": "Education level, self-rated by participant",
    "Levels": {
      "1": "Finished primary school",
      "2": "Finished secondary school",
      "3": "Student at university",
      "4": "Has degree from university"
    }
  },
  "bmi": {
    "LongName": "Body mass index",
    "Units": "kilograms per squared meters",
    "TermURL": "http://purl.bioontology.org/ontology/SNOMEDCT/60621009"
  }
}
```
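A data dictionary's `Levels` entries can also serve as a sanity check on categorical columns. The Python sketch below is illustrative only (this section does not require such a check); it assumes rows are dicts of strings as produced by a TSV reader, with `n/a` already mapped to `None`, and `check_levels` is a hypothetical name.

```python
def check_levels(rows, dictionary):
    """Return (column, value) pairs whose value is not a declared level."""
    problems = []
    for column, spec in dictionary.items():
        levels = spec.get("Levels") if isinstance(spec, dict) else None
        if not levels:
            continue  # only categorical columns declare Levels
        for row in rows:
            value = row.get(column)
            # Missing values (n/a mapped to None) are not level violations
            if value is not None and value not in levels:
                problems.append((column, value))
    return problems
```

Levels keys are JSON strings, so the comparison works directly on the string values read from a TSV file.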

Key/value files (dictionaries)
------------------------------

JavaScript Object Notation (JSON) files MUST be used for storing key/value pairs. Extensive documentation of the format can be found here: [http://json.org/](http://json.org/). Several editors have built-in support for JSON syntax highlighting that aids manual creation of such files. An online editor for JSON with built-in validation is available at: [http://jsoneditoronline.org](http://jsoneditoronline.org). JSON
files MUST be in UTF-8 encoding.

### Example:
```JSON
{
  "RepetitionTime": 3,
  "Instruction": "Lie still and keep your eyes open"
}
```
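A minimal Python sketch of writing and reading a sidecar with explicit UTF-8 encoding. `ensure_ascii=False` keeps non-ASCII characters as UTF-8 text instead of `\u` escapes (both forms are valid JSON). The helper names are hypothetical.

```python
import json
from pathlib import Path


def write_sidecar(path, metadata):
    """Write metadata as pretty-printed JSON, encoded in UTF-8."""
    text = json.dumps(metadata, indent=2, ensure_ascii=False)
    Path(path).write_text(text + "\n", encoding="utf-8")


def read_sidecar(path):
    """Read a JSON sidecar, decoding it explicitly as UTF-8."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform's default text encoding.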

Participant names and other labels
----------------------------------

BIDS uses custom user-defined labels in several situations (naming of participants, sessions, acquisition schemes, etc.). Labels are strings and MUST only consist of letters (lower or upper case) and/or numbers. If numbers are used, we RECOMMEND zero padding (e.g., `01` instead of `1` if you have more than nine subjects) to make alphabetical sorting more intuitive. Please note that the `sub-` prefix is not part of the subject label, but must be included in file names (similarly to other key names).
In contrast to other labels, run and echo labels MUST be integers. Those labels MAY include zero padding, but this is NOT RECOMMENDED to maintain their uniqueness (for example, `run-1` and `run-01` would denote the same integer).
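The label rules can be sketched in Python: a label is valid if it contains only letters and digits, and numeric subject labels are zero-padded to the width of the largest subject number so that alphabetical and numeric order agree. Both function names are hypothetical.

```python
import re

LABEL_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_label(label):
    """True if the label consists only of letters and/or digits."""
    return LABEL_PATTERN.fullmatch(label) is not None


def padded_label(index, n_subjects):
    """Zero-pad a numeric label to the width of the largest subject number."""
    return str(index).zfill(len(str(n_subjects)))
```

With 12 subjects, `padded_label(1, 12)` gives `01`, so `sub-01`, `sub-02`, `sub-10` sort in numeric order, while unpadded names would sort as `sub-1`, `sub-10`, `sub-2`. Note that a label such as `control_01` fails the check because `_` separates entities in file names.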

Units
-----

All units SHOULD be specified as per the International System of Units (abbreviated as SI, from the French Système international (d'unités)) and can be SI units or SI derived units. In case there are valid reasons to deviate from SI units or SI derived units, the units MUST be specified in the sidecar JSON file. In case data is expressed in SI units or SI derived units, the units MAY be specified in the sidecar JSON file. In case prefixes are added to SI or non-SI units (e.g., mm), the prefixed units MUST be specified in the JSON file (see Appendix V: Units). In particular:

- Elapsed time SHOULD be expressed in seconds. Please note that some DICOM parameters have been traditionally expressed in milliseconds. Those need to be converted to seconds.
- Frequency SHOULD be expressed in Hertz.

Describing dates and timestamps:

- Date time information MUST be expressed in the following format `YYYY-MM-DDThh:mm:ss` (one of the [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) date-time formats). For example: `2009-06-15T13:45:30`
- Time stamp information MUST be expressed in the following format: `13:45:30`
- Dates can be shifted by a random number of days for privacy protection reasons. To distinguish real dates from shifted dates always use year 1900 or earlier when including shifted years. For longitudinal studies please remember to shift dates within one subject by the same number of days to maintain the interval information. Example: `1867-06-15T13:45:30`
- Age SHOULD be given as the number of years since birth at the time of scanning (or first scan in case of multi-session datasets). Using higher accuracy (weeks) should in general be avoided due to privacy protection, unless appropriate given the study goals, e.g., when scanning babies.
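The date-shifting advice can be sketched in Python: every timestamp of one subject is moved back by the same random number of days, chosen so that the latest shifted date falls in 1900 or earlier, which preserves the intervals within that subject. This is an illustrative sketch; the one-year random range and the name `shift_subject_dates` are assumptions, not part of the text.

```python
import random
from datetime import datetime, timedelta

LATEST_ALLOWED = datetime(1900, 12, 31, 23, 59, 59)


def shift_subject_dates(timestamps, rng=random):
    """Shift all of one subject's datetimes by one common random offset.

    Returns strings in the YYYY-MM-DDThh:mm:ss format.
    """
    latest = max(timestamps)
    # Smallest whole number of days that moves the latest timestamp into <= 1900
    min_days = max(0, (latest - LATEST_ALLOWED).days + 1)
    # Assumption: add up to one extra year of random shift
    offset = timedelta(days=min_days + rng.randrange(365))
    return [(t - offset).isoformat(timespec="seconds") for t in timestamps]
```

The same offset must be reused for every date of a subject; drawing a new offset per scan would destroy the interval information.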

Directory structure
-------------------

### Single session example

This is an example of the folder and file structure. Because there is only one session, the session level is not required by the
format. For details on individual files see descriptions in the next
section:

```
sub-control01/
    anat/
        sub-control01_T1w.nii.gz
        sub-control01_T1w.json
        sub-control01_T2w.nii.gz
        sub-control01_T2w.json
    func/
        sub-control01_task-nback_bold.nii.gz
        sub-control01_task-nback_bold.json
        sub-control01_task-nback_events.tsv
        sub-control01_task-nback_physio.tsv.gz
        sub-control01_task-nback_physio.json
        sub-control01_task-nback_sbref.nii.gz
    dwi/
        sub-control01_dwi.nii.gz
        sub-control01_dwi.bval
        sub-control01_dwi.bvec
    fmap/
        sub-control01_phasediff.nii.gz
        sub-control01_phasediff.json
        sub-control01_magnitude1.nii.gz
    sub-control01_scans.tsv
code/
    deface.py
derivatives/
README
participants.tsv
dataset_description.json
CHANGES
```

Additional files and folders containing raw data may be added as needed for special cases. They should be named in all lowercase, with a name that reflects the nature of the scan (e.g., `calibration`). Naming of files within the directory should follow the same scheme as above (e.g., `sub-control01_calibration_Xcalibration.nii.gz`).

### Code

Template:
`code/*`

Source code of scripts that were used to prepare the dataset (for example if it was anonymized or defaced) MAY be stored here.<sup>1</sup> Extra care should be taken to avoid including original IDs or any identifiable information with the source code. There are no limitations or recommendations on the language and/or code organization of these scripts at the moment.

<sup>1</sup>Storing actual source files with the data is preferred over links to external source repositories to maximize long-term preservation (which would suffer if an external repository became unavailable).
143 changes: 143 additions & 0 deletions src/03_modality_agnostic_files.md
@@ -0,0 +1,143 @@
Modality-agnostic files
=======================

Dataset description
-------------------

Template: `dataset_description.json` `README` `CHANGES`

### `dataset_description.json`

The file `dataset_description.json` is a JSON file describing the dataset. Every dataset MUST include this file with the following fields:

| Field name | Definition |
|:-------------------|:--------------------------------------------------------|
| Name | REQUIRED. Name of the dataset. |
| BIDSVersion | REQUIRED. The version of the BIDS standard that was used. |
| License | RECOMMENDED. What license is this dataset distributed under? The use of license name abbreviations is suggested for specifying a license. A list of common licenses with suggested abbreviations can be found in Appendix II. |
| Authors | OPTIONAL. List of individuals who contributed to the creation/curation of the dataset. |
| Acknowledgements | OPTIONAL. Text acknowledging contributions of individuals or institutions beyond those listed in Authors or Funding. |
| HowToAcknowledge | OPTIONAL. Instructions on how researchers using this dataset should acknowledge the original authors. This field can also be used to define a publication that should be cited in publications that use the dataset. |
| Funding | OPTIONAL. List of sources of funding (grant numbers). |
| ReferencesAndLinks | OPTIONAL. List of references to publications that contain information on the dataset, or links. |
| DatasetDOI | OPTIONAL. The Digital Object Identifier of the dataset (not the corresponding paper). |

Example:

```JSON
{
  "Name": "The mother of all experiments",
  "BIDSVersion": "1.0.1",
  "License": "CC0",
  "Authors": [
    "Paul Broca",
    "Carl Wernicke"
  ],
  "Acknowledgements": "Special thanks to Korbinian Brodmann for help in formatting this dataset in BIDS. We thank Alan Lloyd Hodgkin and Andrew Huxley for helpful comments and discussions about the experiment and manuscript; Hermann Ludwig Helmholtz for administrative support; and Claudius Galenus for providing data for the medial-to-lateral index analysis.",
  "HowToAcknowledge": "Please cite this paper: https://www.ncbi.nlm.nih.gov/pubmed/001012092119281",
  "Funding": [
    "National Institute of Neuroscience Grant F378236MFH1",
    "National Institute of Neuroscience Grant 5RMZ0023106"
  ],
  "ReferencesAndLinks": [
    "https://www.ncbi.nlm.nih.gov/pubmed/001012092119281",
    "Alzheimer A., & Kraepelin, E. (2015). Neural correlates of presenile dementia in humans. Journal of Neuroscientific Data, 2, 234001. http://doi.org/1920.8/jndata.2015.7"
  ],
  "DatasetDOI": "10.0.2.3/dfjj.10"
}
```
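Checking for the two REQUIRED fields can be sketched in a few lines of Python. The helper name is hypothetical, and the sketch only tests for presence of `Name` and `BIDSVersion`; it does not validate value types or the RECOMMENDED and OPTIONAL fields.

```python
import json

REQUIRED_FIELDS = ("Name", "BIDSVersion")


def missing_required_fields(text):
    """Parse dataset_description.json content and list absent REQUIRED fields."""
    description = json.loads(text)
    return [field for field in REQUIRED_FIELDS if field not in description]
```

Applied to the example above, the result is an empty list.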

### `README`

In addition, a free-form text file (`README`) describing the dataset in more detail SHOULD be provided.

### `CHANGES`

Version history of the dataset (describing changes, updates and corrections) MAY be provided in the form of a `CHANGES` text file. This file MUST follow the CPAN Changelog convention: [http://search.cpan.org/~haarg/CPAN-Changes-0.400002/lib/CPAN/Changes/Spec.pod](https://metacpan.org/pod/release/HAARG/CPAN-Changes-0.400002/lib/CPAN/Changes/Spec.pod). `README` and `CHANGES` files MUST be either in ASCII or UTF-8 encoding.

Example:

```
1.0.1 2015-08-27
 - Fixed slice timing information.
1.0.0 2015-08-17
 - Initial release.
```
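A simplified Python parser for a `CHANGES` file laid out like the example above. It is a sketch only: it assumes each release line starts with a version followed by an ISO date and each change is a line starting with `-` (indented or not), and it does not implement the full CPAN Changes specification. `parse_changes` is a hypothetical name.

```python
import re

# Assumed release-line layout: "<version> <YYYY-MM-DD>" starting at column 0
RELEASE_RE = re.compile(r"^(\S+)\s+(\d{4}-\d{2}-\d{2})")


def parse_changes(text):
    """Return a list of {'version', 'date', 'changes'} dicts, newest first."""
    releases = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = RELEASE_RE.match(line)
        if match:
            releases.append({"version": match.group(1), "date": match.group(2), "changes": []})
        elif releases and line.lstrip().startswith("-"):
            # Drop the leading dash and surrounding whitespace of a change entry
            releases[-1]["changes"].append(line.lstrip()[1:].strip())
    return releases
```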

Participants file
-----------------

Template:
```
participants.tsv
participants.json
phenotype/<measurement_tool_name>.tsv
phenotype/<measurement_tool_name>.json
```

Optional: Yes

The purpose of this file is to describe properties of participants such as age, handedness, sex, etc. In case of single session studies this file has one compulsory column `participant_id` that consists of `sub-<participant_label>`, followed by a list of optional columns describing participants. Each participant needs to be described by one and only one row.

`participants.tsv` example:
```
participant_id age sex group
sub-control01 34 M control
sub-control02 12 F control
sub-patient01 33 F patient
```
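The one-row-per-participant rule and the `sub-<participant_label>` format of `participant_id` can be checked with a short Python sketch. It assumes the file content is available as a string with true tab separators, applies the letters-and-digits label rule from the labels section, and `check_participants` is a hypothetical name.

```python
import csv
import io
import re
from collections import Counter

ID_PATTERN = re.compile(r"sub-[A-Za-z0-9]+")


def check_participants(text):
    """Return a list of problems found in participants.tsv content."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames is None or "participant_id" not in reader.fieldnames:
        return ["missing participant_id column"]
    ids = [row["participant_id"] for row in reader]
    problems = [f"bad id: {pid}" for pid in ids if not ID_PATTERN.fullmatch(pid)]
    # Each participant must be described by exactly one row
    problems += [f"duplicate id: {pid}" for pid, count in Counter(ids).items() if count > 1]
    return problems
```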

If the dataset includes multiple sets of participant-level measurements (for example, responses from multiple questionnaires), they can be split into individual files separate from `participants.tsv`. Those measurements should be kept in the `phenotype/` folder and end with the `.tsv` extension. They can include an arbitrary set of columns, but one of them has to be `participant_id` with matching `sub-<participant_label>`.
As with all other tabular data, those additional phenotypic information files can be accompanied by a JSON file describing the columns in detail (see Section 4.2). In addition to the column description, a section describing the measurement tool (as a whole) can be added under the name `MeasurementToolMetadata`. This section consists of two keys: `Description`, a free-text description of the tool, and `TermURL`, a link to an entity in an ontology corresponding to this tool. For example (content of `phenotype/acds_adult.json`):
```JSON
{
  "MeasurementToolMetadata": {
    "Description": "Adult ADHD Clinical Diagnostic Scale V1.2",
    "TermURL": "http://www.cognitiveatlas.org/task/id/trm_5586ff878155d"
  },
  "adhd_b": {
    "Description": "B. CHILDHOOD ONSET OF ADHD (PRIOR TO AGE 7)",
    "Levels": {
      "1": "YES",
      "2": "NO"
    }
  },
  "adhd_c_dx": {
    "Description": "As child met A, B, C, D, E and F diagnostic criteria",
    "Levels": {
      "1": "YES",
      "2": "NO"
    }
  }
}
```

Please note that in this example `MeasurementToolMetadata` includes
information about the questionnaire and `adhd_b` and `adhd_c_dx`
correspond to individual columns.

In addition to the keys available to describe columns in all tabular files (`LongName`, `Description`, `Levels`, `Units`, and `TermURL`), the `participants.json` file as well as phenotypic files can also include column descriptions with a `Derivative` field that, when set to true, indicates that values in the corresponding column are a transformation of values from other columns (for example, a summary score based on a subset of items in a questionnaire).

Scans file
----------

Template:
```
sub-<participant_label>/[ses-<session_label>/]
    sub-<participant_label>[_ses-<session_label>]_scans.tsv
```

Optional: Yes

The purpose of this file is to describe timing and other properties of each imaging acquisition sequence (each run `.nii[.gz]` file) within one session. Each `.nii[.gz]` file should be described by at most one row. Relative paths to files should be used under a compulsory `filename` header.
If acquisition time is included, it should be under the `acq_time` header. Datetime should be expressed in the following format `2009-06-15T13:45:30` (year, month, day, hour (24h), minute, second; this corresponds to the RFC 3339 "date-time" format without a time-zone offset, as the time zone is always assumed to be local time). For anonymization purposes, all dates within one subject should be shifted by a randomly chosen (but common across all runs etc.) number of days. This way relative timing is preserved, but the chances of identifying a person based on the date and time of their scan are decreased. Dates that are shifted for anonymization purposes should be set to a year 1900 or earlier to clearly distinguish them from unmodified data. Shifting dates is recommended, but not required.

Additional fields can include external behavioural measures relevant to the scan, for example a vigilance questionnaire score administered after a resting-state scan.

Example:
```
filename acq_time
func/sub-control01_task-nback_bold.nii.gz 1877-06-15T13:45:30
func/sub-control01_task-motor_bold.nii.gz 1877-06-15T13:55:33
```
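A Python sketch of checks implied by the scans file description: a `filename` value in every row, at most one row per file, and `acq_time` values (when present) in the `YYYY-MM-DDThh:mm:ss` layout. `check_scans` is a hypothetical helper; treating `n/a` as an allowed missing `acq_time` is an assumption carried over from the general tabular-file rules.

```python
import csv
import io
from datetime import datetime

ACQ_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def check_scans(text):
    """Return a list of problems found in *_scans.tsv content."""
    problems = []
    seen = set()
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        name = row.get("filename")
        if not name:
            problems.append("row without filename")
            continue
        if name in seen:  # each file may be described by at most one row
            problems.append(f"duplicate row for {name}")
        seen.add(name)
        acq_time = row.get("acq_time")
        if acq_time and acq_time != "n/a":
            try:
                datetime.strptime(acq_time, ACQ_TIME_FORMAT)
            except ValueError:
                problems.append(f"bad acq_time for {name}: {acq_time}")
    return problems
```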
