# A quick review of a few concepts

## Talapas and OnDemand
Talapas is the high performance computing cluster at the University of Oregon. You can log in from a browser using "OnDemand". You do not need to be on a VPN to log in (unless you are out of the country). Instructions for OnDemand are here:  
https://hpcrcf.atlassian.net/wiki/spaces/TCP/pages/922746881/Open+OnDemand  
The two apps available are "Jupyter" and "Desktop". You can run interactive graphical jobs on the Desktop, and Jupyter notebooks under Jupyter. By running your analyses inside Jupyter notebooks, you can have a record of the steps of that analysis, plus the interactivity is useful when developing scripts.

All talapas users are organized into pirgs. You can see which pirgs you belong to with the Linux `groups` command. You can run a shell command in a Jupyter notebook by preceding it with an exclamation point, or by using the %%bash cell magic. If you want to run multiple lines of shell commands, use %%bash.

In [1]:
!groups

talapas lcni


You should see 'talapas' and your pirg. Your home directory does not have much space. Put your big files in your pirg directory, which is under /projects/{pirg}. You should have your own directory there, and a link to it under your home directory. For example, the directory /home/jolinda/lcni is a symbolic link to the directory /projects/lcni/jolinda. I put my analyzed data either into /projects/lcni/jolinda, or into /projects/lcni/shared.

In [2]:
! ls -la /home/jolinda/lcni

lrwxrwxrwx 1 root root 22 Apr  9 10:25 /home/jolinda/lcni -> /projects/lcni/jolinda
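If you'd rather check a link like this from Python, `os.path.islink` and `os.path.realpath` do the same job as `ls -la`. The sketch below uses a hypothetical helper, `resolve_link`, and builds a throwaway link in a temporary directory because the real paths only exist on Talapas; on the cluster you could call `resolve_link('/home/jolinda/lcni')` directly. Note that `realpath` follows every link in the path, so on Talapas it may report the underlying `/gpfs/...` location.

```python
import os
import tempfile


def resolve_link(path):
    """Hypothetical helper: return (is the path a symlink?, fully resolved target)."""
    return os.path.islink(path), os.path.realpath(path)


# Recreate a link like /home/jolinda/lcni -> /projects/lcni/jolinda in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "projects", "lcni", "jolinda")
    os.makedirs(target)
    link = os.path.join(tmp, "lcni")
    os.symlink(target, link)
    is_link, resolved = resolve_link(link)
    same = resolved == os.path.realpath(target)  # compare while the directory still exists

print(is_link, same)  # True True
```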


## LMOD modules
Most software on talapas is provided as lmod modules. Use `module load` to load the module you want. You can either load the default version (`module load fsl`) or specify a version (`module load fsl/6.0.1`).

In [3]:
%%bash
module load fsl
echo $FSLDIR

/packages/fsl/5.0.10/install


In [4]:
%%bash
module load fsl/6.0.1
echo $FSLDIR

/packages/fsl/6.0.1/fsl


## DICOMS on talapas
DICOM files are transferred in real time from our scanner to talapas. The study comment field in the dicom file is parsed to find the pirg name, the primary investigator, and the project name. DICOMs are sorted into directories under /projects/lcni/dcm by pirg/PIname/project/subject_date_time/Series_number_description. For example:

In [5]:
!tree /projects/lcni/dcm/TalapasClass -d

/projects/lcni/dcm/TalapasClass
`-- Smith
    `-- TC
        |-- TC001_20180620_110428
        |   |-- Series_10_bold_DIFTIM_1.7mm
        |   |-- Series_11_bold_DIFTIM_1.7mm
        |   |-- Series_1_AAHead_Scout_32ch-head-coil
        |   |-- Series_2_AAHead_Scout_32ch-head-coil_MPR_sag
        |   |-- Series_3_AAHead_Scout_32ch-head-coil_MPR_cor
        |   |-- Series_4_AAHead_Scout_32ch-head-coil_MPR_tra
        |   |-- Series_5_mprage_p2
        |   |-- Series_6_bold_DIFTIM_1.7mm
        |   |-- Series_7_bold_DIFTIM_1.7mm
        |   |-- Series_8_se_epi_1.7mm_ap
        |   `-- Series_9_se_epi_1.7mm_pa
        `-- TC002_20180916_090647
            |-- Series_1_localizer
            |-- Series_2_mprage_p2
            `-- Series_3_SVC_1_bold_mb3_g2_2mm_te25

18 directories


TalapasClass is a "fake" pirg, with PI Smith, project TC, and two subjects (both phantoms by the way). Normally you should have read access only to the dicoms under your own pirg. All of you should be able to access the "TalapasClass" directory too.
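Because the repository layout is so regular, the path of any series directory already tells you the pirg, PI, project, subject, scan date and time, series number, and series description. Here is a minimal sketch of pulling those apart; `parse_series_path` is a hypothetical helper, not part of any LCNI module, and it assumes exactly the pirg/PIname/project/subject_date_time/Series_number_description layout described above.

```python
import os


def parse_series_path(path):
    """Hypothetical helper: split .../pirg/PIname/project/subject_date_time/Series_N_description."""
    parts = os.path.normpath(path).split(os.sep)
    pirg, pi, project, subject_dir, series_dir = parts[-5:]
    # rsplit from the right so underscores inside the subject name survive
    subject, date, time = subject_dir.rsplit("_", 2)
    # maxsplit=2 so underscores inside the series description survive
    _, number, description = series_dir.split("_", 2)
    return {
        "pirg": pirg, "pi": pi, "project": project,
        "subject": subject, "date": date, "time": time,
        "series_number": int(number), "description": description,
    }


info = parse_series_path(
    "/projects/lcni/dcm/TalapasClass/Smith/TC/TC001_20180620_110428/Series_10_bold_DIFTIM_1.7mm")
print(info)
```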

## Python modules

Python modules are loaded with the import statement. For example, the sys module:

In [6]:
import sys

Functions, classes, and variables defined in modules are accessed by modulename.functionname (or classname, or variable). For example, the variable sys.path includes the directories where python will look for modules:

In [7]:
sys.path

['/gpfs/projects/lcni/jolinda/shared/TalapasClass',
 '',
 '/projects/lcni/shared/python3/site-packages',
 '/projects/lcni/jolinda/shared/site-packages',
 '/packages/miniconda/20190102/envs/anaconda-tensorflow-cpu/lib/python36.zip',
 '/packages/miniconda/20190102/envs/anaconda-tensorflow-cpu/lib/python3.6',
 '/packages/miniconda/20190102/envs/anaconda-tensorflow-cpu/lib/python3.6/lib-dynload',
 '/home/jolinda/.local/lib/python3.6/site-packages',
 '/home/jolinda/.local/lib/python3.6/site-packages/pyperclip-1.6.0-py3.6.egg',
 '/packages/miniconda/20190102/envs/anaconda-tensorflow-cpu/lib/python3.6/site-packages',
 '/packages/miniconda/20190102/envs/anaconda-tensorflow-cpu/lib/python3.6/site-packages/IPython/extensions',
 '/gpfs/home/jolinda/.ipython']

If '/projects/lcni/jolinda/shared/site-packages' is NOT in your path, you can add it. You'll need it for the bids-conversion scripts we'll be calling.

In [8]:
if '/projects/lcni/jolinda/shared/site-packages' not in sys.path:
    sys.path.append('/projects/lcni/jolinda/shared/site-packages')
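To confirm that Python can now find a module without actually importing it, `importlib.util.find_spec` returns `None` for top-level names it can't locate. The helper name `is_importable` is just for illustration. Keep in mind that `sys.path.append` only lasts for the current kernel; setting the `PYTHONPATH` environment variable is one common way to make it persistent, though whether OnDemand's Jupyter sessions pick it up is worth verifying yourself.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


# On Talapas you would check is_importable("dicom2bids"); here we use names that work anywhere.
print(is_importable("json"), is_importable("no_such_module_xyz"))  # True False
```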

# BIDS
BIDS stands for Brain Imaging Data Structure. You can read more about it here: https://bids-specification.readthedocs.io/en/stable/. Rather than a new file format, it's a standard for organizing data that builds on NIfTI: each nifti file gets a .json sidecar, and there are rigidly defined requirements for file naming and directory organization. Here's an example dataset from our class data:

In [9]:
!tree /projects/lcni/jolinda/shared/bids_example

/projects/lcni/jolinda/shared/bids_example
|-- dataset_description.json
|-- participants.tsv
|-- sub-TC001
|   |-- ses-post
|   |   |-- anat
|   |   |   |-- sub-TC001_ses-post_acq-mprage_run-5_T1w.json
|   |   |   `-- sub-TC001_ses-post_acq-mprage_run-5_T1w.nii.gz
|   |   |-- fmap
|   |   |   |-- sub-TC001_ses-post_dir-ap_run-8_epi.json
|   |   |   |-- sub-TC001_ses-post_dir-ap_run-8_epi.nii.gz
|   |   |   |-- sub-TC001_ses-post_dir-pa_run-9_epi.json
|   |   |   `-- sub-TC001_ses-post_dir-pa_run-9_epi.nii.gz
|   |   `-- func
|   |       |-- sub-TC001_ses-post_task-DIFTIM_run-10_bold.json
|   |       |-- sub-TC001_ses-post_task-DIFTIM_run-10_bold.nii.gz
|   |       |-- sub-TC001_ses-post_task-DIFTIM_run-11_bold.json
|   |       |-- sub-TC001_ses-post_task-DIFTIM_run-11_bold.nii.gz
|   |       |-- sub-TC001_ses-post_task-DIFTIM_run-6_bold.json
|   |       |-- sub-TC001_ses-post_task-DIFTIM_run-6_bold.nii.gz
|   |       |-- sub-TC001_ses-post_task-DIFTIM_run-7_bold.json
|   |       `-- su

So what do we have? Two subjects, each with two sessions. (This is actually the same data converted twice, just to show the structure). Let's just look at the top level first:

In [10]:
!tree /projects/lcni/jolinda/shared/bids_example/  -L 1

/projects/lcni/jolinda/shared/bids_example/
|-- dataset_description.json
|-- participants.tsv
|-- sub-TC001
`-- sub-TC002

2 directories, 2 files


Each subject's data is in a directory called sub-{name}. At the top level, there is a dataset_description.json file and a participants.tsv file. Ideally there should be a readme file too. We can read the .tsv file easily enough:

In [11]:
import os
bidsdir = '/projects/lcni/jolinda/shared/bids_example/'
with open(os.path.join(bidsdir, 'participants.tsv')) as f:
    print(f.read())

participant_id	age	sex
sub-TC002	018	O
sub-TC001	012	O
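Printing the file is fine for a quick look. To actually use the values, the standard `csv` module can read a tab-separated file into one dict per row. The sketch below embeds the contents shown above so it runs anywhere; with the real file you would pass `open(os.path.join(bidsdir, 'participants.tsv'), newline='')` instead of the `StringIO` object.

```python
import csv
import io

# The participants.tsv contents shown above.
tsv_text = "participant_id\tage\tsex\nsub-TC002\t018\tO\nsub-TC001\t012\tO\n"

rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
for row in rows:
    # Every value is read as a string, so '018' keeps its leading zero until converted.
    print(row["participant_id"], int(row["age"]), row["sex"])
```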



This information was pulled straight from the dicom files. You may notice one problem already: if this is a longitudinal dataset, then age won't be the same for our two sessions. You'd need to do something else, like have two columns, age1 and age2. For any columns beyond a few standard ones (age, sex, group), you should also include a participants.json file defining what the column names mean. What is a json file anyway? Let's look at the dataset_description.json file:

In [12]:
with open(os.path.join(bidsdir, 'dataset_description.json')) as f:
    print(f.read())

{"Name": "Talapas Class", "BIDSVersion": "1.3.0", "Authors": ["Jolinda Smith", "Fred Sabb"], "Acknowledgements": "BIDS conversion was performed using dcm2niix. Thanks to Jolinda Smith at LCNI at the University of Oregon for additional scripting of BIDS conversion.", "ReferencesAndLinks": ["Li X, Morgan PS, Ashburner J, Smith J, Rorden C (2016) The first step for neuroimaging data analysis: DICOM to NIfTI conversion. J Neurosci Methods. 264:47-56. doi: 10.1016/j.jneumeth.2016.03.001."], "Talapas Class": "TC"}


A bunch of "field name": "value" pairs, but a bit hard to read. We can use the jq LMOD module to "pretty print" it. This is a good opportunity to remember how to pass a variable to %%bash too:

In [13]:
%%bash -s {bidsdir}
module load jq
jq . $1/dataset_description.json

{
  "Name": "Talapas Class",
  "BIDSVersion": "1.3.0",
  "Authors": [
    "Jolinda Smith",
    "Fred Sabb"
  ],
  "Acknowledgements": "BIDS conversion was performed using dcm2niix. Thanks to Jolinda Smith at LCNI at the University of Oregon for additional scripting of BIDS conversion.",
  "ReferencesAndLinks": [
    "Li X, Morgan PS, Ashburner J, Smith J, Rorden C (2016) The first step for neuroimaging data analysis: DICOM to NIfTI conversion. J Neurosci Methods. 264:47-56. doi: 10.1016/j.jneumeth.2016.03.001."
  ],
  "Talapas Class": "TC"
}


The only required elements are Name and BIDSVersion. I added the rest to be helpful (you plan to acknowledge me, right?). Python has a json module that makes it easy to edit these files.

In [14]:
import json
with open(os.path.join(bidsdir, 'dataset_description.json')) as f:
    dataset_json = json.load(f)

In [15]:
dataset_json

{'Name': 'Talapas Class',
 'BIDSVersion': '1.3.0',
 'Authors': ['Jolinda Smith', 'Fred Sabb'],
 'Acknowledgements': 'BIDS conversion was performed using dcm2niix. Thanks to Jolinda Smith at LCNI at the University of Oregon for additional scripting of BIDS conversion.',
 'ReferencesAndLinks': ['Li X, Morgan PS, Ashburner J, Smith J, Rorden C (2016) The first step for neuroimaging data analysis: DICOM to NIfTI conversion. J Neurosci Methods. 264:47-56. doi: 10.1016/j.jneumeth.2016.03.001.'],
 'Talapas Class': 'TC'}

This is just a python 'dict', a set of key:value pairs. We can edit them:

In [16]:
dataset_json['Name'] = 'Talapas Class'

In [17]:
dataset_json['Authors'].append('Dina Canup')

In [18]:
dataset_json

{'Name': 'Talapas Class',
 'BIDSVersion': '1.3.0',
 'Authors': ['Jolinda Smith', 'Fred Sabb', 'Dina Canup'],
 'Acknowledgements': 'BIDS conversion was performed using dcm2niix. Thanks to Jolinda Smith at LCNI at the University of Oregon for additional scripting of BIDS conversion.',
 'ReferencesAndLinks': ['Li X, Morgan PS, Ashburner J, Smith J, Rorden C (2016) The first step for neuroimaging data analysis: DICOM to NIfTI conversion. J Neurosci Methods. 264:47-56. doi: 10.1016/j.jneumeth.2016.03.001.'],
 'Talapas Class': 'TC'}

To write this to the file use json.dump and open the file with the 'w' flag.

In [19]:
with open(os.path.join(bidsdir, 'dataset_description.json'), 'w') as f:
    json.dump(dataset_json, f)

In [20]:
%%bash -s {bidsdir}
module load jq
jq . $1/dataset_description.json

{
  "Name": "Talapas Class",
  "BIDSVersion": "1.3.0",
  "Authors": [
    "Jolinda Smith",
    "Fred Sabb",
    "Dina Canup"
  ],
  "Acknowledgements": "BIDS conversion was performed using dcm2niix. Thanks to Jolinda Smith at LCNI at the University of Oregon for additional scripting of BIDS conversion.",
  "ReferencesAndLinks": [
    "Li X, Morgan PS, Ashburner J, Smith J, Rorden C (2016) The first step for neuroimaging data analysis: DICOM to NIfTI conversion. J Neurosci Methods. 264:47-56. doi: 10.1016/j.jneumeth.2016.03.001."
  ],
  "Talapas Class": "TC"
}


Reverting the change so my notebook stays consistent:

In [23]:
dataset_json['Authors'].remove('Dina Canup')

dataset_json['Talapas Class'] = 'TC'

with open(os.path.join(bidsdir, 'dataset_description.json'), 'w') as f:
    json.dump(dataset_json, f)
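By default `json.dump` writes the whole dict on a single line, which is why we needed jq to read the file comfortably. Passing `indent=2` gives a layout much like jq's. The sketch below uses that to write the kind of participants.json sidecar mentioned earlier, for the hypothetical age1/age2 columns. "Description" and "Units" are among the keys BIDS uses to describe tabular columns, but check the specification for the full set. It writes to a temporary directory rather than to your bids directory.

```python
import json
import os
import tempfile

# Hypothetical participants.json for the extra columns suggested earlier (age1, age2).
participants_sidecar = {
    "age1": {"Description": "age at the 'pre' session", "Units": "years"},
    "age2": {"Description": "age at the 'post' session", "Units": "years"},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "participants.json")
    with open(path, "w") as f:
        # indent=2 puts one key per line, so the file is readable without jq
        json.dump(participants_sidecar, f, indent=2)
    with open(path) as f:
        text = f.read()

print(text)
round_trip = json.loads(text)  # indentation doesn't change the data
```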

Looking one more level deep into our directory structure:

In [24]:
!tree {bidsdir}  -L 2

/projects/lcni/jolinda/shared/bids_example/
|-- dataset_description.json
|-- participants.tsv
|-- sub-TC001
|   |-- ses-post
|   `-- ses-pre
`-- sub-TC002
    |-- ses-post
    `-- ses-pre

6 directories, 2 files


The next level is for sessions. If every subject only has one session, you can leave this level out. However, if even one subject has multiple sessions, you must include this level for every subject. You can name them whatever you like but the directory name must start with 'ses-'. Let's go one more level down.

In [25]:
!tree  {bidsdir}  -L 3

/projects/lcni/jolinda/shared/bids_example/
|-- dataset_description.json
|-- participants.tsv
|-- sub-TC001
|   |-- ses-post
|   |   |-- anat
|   |   |-- fmap
|   |   `-- func
|   `-- ses-pre
|       |-- anat
|       |-- fmap
|       `-- func
`-- sub-TC002
    |-- ses-post
    |   |-- anat
    |   `-- func
    `-- ses-pre
        |-- anat
        `-- func

16 directories, 2 files


The next level is the directories for different types of data. There are currently four for MR data: 'anat', 'fmap', 'func', and 'dwi'. The other datatypes are 'meg', 'eeg', 'ieeg', and 'beh'. Nothing else is allowed! Let's look at subject TC001's 'pre' session data:

In [26]:
!tree  {bidsdir}/sub-TC001/ses-pre

/projects/lcni/jolinda/shared/bids_example//sub-TC001/ses-pre
|-- anat
|   |-- sub-TC001_ses-pre_acq-mprage_run-5_T1w.json
|   `-- sub-TC001_ses-pre_acq-mprage_run-5_T1w.nii.gz
|-- fmap
|   |-- sub-TC001_ses-pre_dir-ap_run-8_epi.json
|   |-- sub-TC001_ses-pre_dir-ap_run-8_epi.nii.gz
|   |-- sub-TC001_ses-pre_dir-pa_run-9_epi.json
|   `-- sub-TC001_ses-pre_dir-pa_run-9_epi.nii.gz
`-- func
    |-- sub-TC001_ses-pre_task-DIFTIM_run-10_bold.json
    |-- sub-TC001_ses-pre_task-DIFTIM_run-10_bold.nii.gz
    |-- sub-TC001_ses-pre_task-DIFTIM_run-11_bold.json
    |-- sub-TC001_ses-pre_task-DIFTIM_run-11_bold.nii.gz
    |-- sub-TC001_ses-pre_task-DIFTIM_run-6_bold.json
    |-- sub-TC001_ses-pre_task-DIFTIM_run-6_bold.nii.gz
    |-- sub-TC001_ses-pre_task-DIFTIM_run-7_bold.json
    `-- sub-TC001_ses-pre_task-DIFTIM_run-7_bold.nii.gz

3 directories, 14 files
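The two layout rules above (if any subject has sessions, every subject must; datatype directories come from a fixed list) are easy to check yourself before running the validator. This is a rough sketch with hypothetical helper names (`subdirs`, `check_layout`), not a replacement for bids-validator. The allowed list mirrors the one in the text; newer releases of the specification add more datatypes, so adjust it for the version you target.

```python
import os
import tempfile

# Datatype directory names listed in the text; newer BIDS releases add more.
ALLOWED_DATATYPES = {"anat", "fmap", "func", "dwi", "meg", "eeg", "ieeg", "beh"}


def subdirs(path):
    """Sorted names of the directories directly inside path."""
    return sorted(d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d)))


def check_layout(bidsdir):
    """Return a list of layout problems found under bidsdir."""
    problems = []
    has_sessions = {}
    for sub in (d for d in subdirs(bidsdir) if d.startswith("sub-")):
        subdir = os.path.join(bidsdir, sub)
        sessions = [d for d in subdirs(subdir) if d.startswith("ses-")]
        has_sessions[sub] = bool(sessions)
        # With sessions, the datatype directories live one level further down.
        parents = [os.path.join(subdir, s) for s in sessions] or [subdir]
        for parent in parents:
            for d in subdirs(parent):
                if d not in ALLOWED_DATATYPES:
                    rel = os.path.relpath(os.path.join(parent, d), bidsdir)
                    problems.append(rel + ": not a recognized datatype directory")
    # If any subject has sessions, every subject must.
    if len(set(has_sessions.values())) > 1:
        problems.append("some subjects have ses- directories and some do not")
    return problems


# Toy dataset: sub-02 has no sessions and a stray 'scans' directory.
with tempfile.TemporaryDirectory() as tmp:
    for d in ["sub-01/ses-pre/anat", "sub-01/ses-pre/func", "sub-02/anat", "sub-02/scans"]:
        os.makedirs(os.path.join(tmp, d))
    problems = check_layout(tmp)

print(problems)
```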


Filenames follow a tightly prescribed format: `sub-{subject}_ses-{session}_{label}-{value}_etc.nii.gz`  
More details about each of these labels, which are required, and which are optional, are in the entity table given here: https://bids-specification.readthedocs.io/en/stable/99-appendices/04-entity-table.html
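Because every filename is a chain of `key-value` pairs joined by underscores, followed by a suffix and an extension, it can be taken apart mechanically. The sketch below (`parse_bids_filename` is a hypothetical helper) only checks that shape: it starts with `sub-`, and each value is alphanumeric. It does not enforce which entities are required or their order; that's what the entity table and the validator are for.

```python
import re


def parse_bids_filename(filename):
    """Split a BIDS filename into (entities, suffix, extension); a simplified sketch."""
    stem, dot, rest = filename.partition(".")  # the extension starts at the first dot
    extension = dot + rest
    *pairs, suffix = stem.split("_")
    if not pairs or not pairs[0].startswith("sub-"):
        raise ValueError("BIDS filenames start with sub-<label>: " + filename)
    entities = {}
    for pair in pairs:
        key, sep, value = pair.partition("-")
        # labels and indices are alphanumeric, with no '-' or '_' inside
        if not sep or not re.fullmatch(r"[A-Za-z0-9]+", value):
            raise ValueError("malformed entity %r in %s" % (pair, filename))
        entities[key] = value
    return entities, suffix, extension


entities, suffix, extension = parse_bids_filename("sub-TC001_ses-pre_dir-ap_run-8_epi.nii.gz")
print(entities, suffix, extension)
```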

We can use the bids-validator web site to check our data set. https://bids-standard.github.io/bids-validator/

# Conversion
MRIConvert/mcverter has, sadly, not kept up with BIDS. We are going to use dcm2niix instead. You should (almost) always use the latest version of your file converter, so check the available modules before loading. NB: I left this up since there's some useful information and I don't want to go back and edit it all, but most of the dicom2bids stuff is already out of date. Look at the next notebook on bids conversion for the latest methods.

In [27]:
%%bash
module load dcm2niix/1.0.20200331
dcm2niix

Chris Rorden's dcm2niiX version v1.0.20200331  (JP2:OpenJPEG) (JP-LS:CharLS) GCC5.5.0 (64-bit Linux)
usage: dcm2niix [options] <in_folder>
 Options :
  -1..-9 : gz compression level (1=fastest..9=smallest, default 6)
  -a : adjacent DICOMs (images from same series always in same folder) for faster conversion (n/y, default n)
  -b : BIDS sidecar (y/n/o [o=only: no NIfTI], default y)
   -ba : anonymize BIDS (y/n, default y)
  -c : comment stored in NIfTI aux_file (provide up to 24 characters)
  -d : directory search depth. Convert DICOMs in sub-folders of in_folder? (0..9, default 5)
  -e : export as NRRD instead of NIfTI (y/n, default n)
  -f : filename (%a=antenna (coil) name, %b=basename, %c=comments, %d=description, %e=echo number, %f=folder name, %i=ID of patient, %j=seriesInstanceUID, %k=studyInstanceUID, %m=manufacturer, %n=name of patient, %o=mediaObjectInstanceUID, %p=protocol, %r=instance number, %s=series number, %t=time, %u=acquisition number, %v=vendor, %x=study ID; %z=seque

Take a look at the default filename -- it's whatever I used the last time I called dcm2niix! We are going to use one of my helper modules to handle the details of the conversion, taking advantage of the nice structure of our dicom repository & our knowledge of dicom to fill in all the bids information. We'll wind up with one dcm2niix command for each series directory under each subject. We'll also use jq to add the task name to the .json files for the bold runs. This is all very specific to the LCNI way of doing things. If you are at another institution and would like to know more about our setup, drop me a line! The dicom2bids module is pretty new so please let me know if you run into problems or have suggestions.

In [28]:
import dicom2bids # my helper module. Make sure it's in your path. It should be if you followed directions in the review section.

In [29]:
dcmdir = '/projects/lcni/dcm/TalapasClass/Smith/TC'

In [30]:
# remember this?
!tree {dcmdir} -d

/projects/lcni/dcm/TalapasClass/Smith/TC
|-- TC001_20180620_110428
|   |-- Series_10_bold_DIFTIM_1.7mm
|   |-- Series_11_bold_DIFTIM_1.7mm
|   |-- Series_1_AAHead_Scout_32ch-head-coil
|   |-- Series_2_AAHead_Scout_32ch-head-coil_MPR_sag
|   |-- Series_3_AAHead_Scout_32ch-head-coil_MPR_cor
|   |-- Series_4_AAHead_Scout_32ch-head-coil_MPR_tra
|   |-- Series_5_mprage_p2
|   |-- Series_6_bold_DIFTIM_1.7mm
|   |-- Series_7_bold_DIFTIM_1.7mm
|   |-- Series_8_se_epi_1.7mm_ap
|   `-- Series_9_se_epi_1.7mm_pa
`-- TC002_20180916_090647
    |-- Series_1_localizer
    |-- Series_2_mprage_p2
    `-- Series_3_SVC_1_bold_mb3_g2_2mm_te25

16 directories


We are going to map series descriptions to bids information. dicom2bids has a convenience function for pulling out all the unique series names from a given project directory.

In [31]:
dicom2bids.GetSeriesNames(dcmdir)

{'AAHead_Scout_32ch-head-coil',
 'AAHead_Scout_32ch-head-coil_MPR_cor',
 'AAHead_Scout_32ch-head-coil_MPR_sag',
 'AAHead_Scout_32ch-head-coil_MPR_tra',
 'SVC_1_bold_mb3_g2_2mm_te25',
 'bold_DIFTIM_1.7mm',
 'localizer',
 'mprage_p2',
 'se_epi_1.7mm_ap',
 'se_epi_1.7mm_pa'}
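For the curious, here is one way such a list could be collected from the directory names alone, given the Series_number_description naming. This is a sketch of the idea under that assumption (`unique_series_names` is a hypothetical function), not dicom2bids' actual implementation, and it runs on a small fake tree in a temporary directory.

```python
import os
import tempfile


def unique_series_names(project_dir):
    """Collect descriptions from project_dir/<subject>/Series_<n>_<description> directories."""
    names = set()
    for subject in os.listdir(project_dir):
        subject_dir = os.path.join(project_dir, subject)
        if not os.path.isdir(subject_dir):
            continue
        for series in os.listdir(subject_dir):
            parts = series.split("_", 2)  # ['Series', '<n>', '<description>']
            if len(parts) == 3 and parts[0] == "Series" and parts[1].isdigit():
                names.add(parts[2])
    return names


with tempfile.TemporaryDirectory() as tmp:
    for d in ["TC001_20180620_110428/Series_5_mprage_p2",
              "TC001_20180620_110428/Series_6_bold_DIFTIM_1.7mm",
              "TC001_20180620_110428/Series_7_bold_DIFTIM_1.7mm",
              "TC002_20180916_090647/Series_2_mprage_p2"]:
        os.makedirs(os.path.join(tmp, d))
    names = unique_series_names(tmp)

print(sorted(names))  # ['bold_DIFTIM_1.7mm', 'mprage_p2']
```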

We see that half are localizers, which we can ignore. We have two different bold series, an mprage, and some field maps. dicom2bids needs us to define a dictionary that maps from series name to bids info. It currently has SOME information about what is and isn't valid, but right now it's mostly up to you to check the entity table at https://bids-specification.readthedocs.io/en/stable/99-appendices/04-entity-table.html for what elements are required in your filenames.

We are going to map these series names to bids entities. We can see the different parts of a bids entity (as defined in dicom2bids) using the inspect module. You don't normally need to do this; it's only useful at times like this, when documentation is not yet available, or if you are a bit lazy and don't want to look it up.

In [44]:
import inspect
inspect.signature(dicom2bids.entity)

<Signature (filetype, form, session=None, task=None, acq=None, phase_encoding=None, ce=None, rec=None)>

'form' and 'phase_encoding' are used instead of 'format' and 'dir' because format and dir are the names of Python built-in functions (they are not keywords), and using them as parameter names would shadow those built-ins inside the function. These are all laid out in the entity table in the bids documentation. As you can see, only the MR-relevant entries are included.
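The difference between a keyword and a built-in name is easy to check with the standard `keyword` and `builtins` modules. A true keyword like `class` can't be a parameter name at all; `dir` can, but then the built-in `dir()` is hidden inside that function. The function `shadow_demo` is just an illustration.

```python
import builtins
import keyword

names = ["format", "dir", "class", "lambda"]
# For each name: (is it a reserved keyword?, is it a built-in name?)
results = {name: (keyword.iskeyword(name), hasattr(builtins, name)) for name in names}
for name, (is_kw, is_builtin) in results.items():
    print(name, "keyword:", is_kw, "built-in:", is_builtin)


def shadow_demo(dir=None):
    # Legal Python, but inside this function `dir` is the argument, not the built-in dir().
    return dir


print(shadow_demo(dir="ap"))  # ap
```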

You can use dicom2bids to list all the formats for a given filetype.

In [45]:
print(dicom2bids.formats['anat'])
print(dicom2bids.formats['func'])
print(dicom2bids.formats['dwi'])
print(dicom2bids.formats['fmap'])

['T1w', 'T2w', 'FLAIR', 'T1rho', 'T1map', 'T2map', 'T2star', 'FLASH', 'PD', 'PDmap', 'PDT2', 'inplaneT1', 'inplaneT2', 'angio', 'defacemask']
['bold', 'cbv', 'phase', 'sbref', 'events', 'physio', 'stim']
['dwi', 'bvec', 'bval']
['phasediff', 'phase1', 'phase2', 'magnitude1', 'magnitude2', 'magnitude', 'fieldmap', 'epi']


dicom2bids has a bids_dict class. We will use its 'add' method to create our mapping.

In [47]:
bd = dicom2bids.bids_dict()
bd.add('mprage_p2', filetype = 'anat', acq = 'mprage', form = 'T1w', session = 'pre')
bd.add('SVC_1_bold_mb3_g2_2mm_te25', filetype = 'func', task = 'SVC', form = 'bold', session = 'pre')
bd.add('bold_DIFTIM_1.7mm', filetype = 'func', form = 'bold', task = 'DIFTIM', session = 'pre')
bd.add('se_epi_1.7mm_ap', filetype = 'fmap', form = 'epi', phase_encoding ='ap', session = 'pre')
bd.add('se_epi_1.7mm_pa', filetype = 'fmap', form = 'epi', phase_encoding ='pa', session = 'pre')

For the dedicated student of python: the bids_dict class has custom `__repr__` and `__str__` methods so it looks nice.

In [48]:
bd

mprage_p2: filetype: anat, session: pre, task: None, acq: mprage, phase_encoding: None,ce: None, rec: None, form: T1w
SVC_1_bold_mb3_g2_2mm_te25: filetype: func, session: pre, task: SVC, acq: None, phase_encoding: None,ce: None, rec: None, form: bold
bold_DIFTIM_1.7mm: filetype: func, session: pre, task: DIFTIM, acq: None, phase_encoding: None,ce: None, rec: None, form: bold
se_epi_1.7mm_ap: filetype: fmap, session: pre, task: None, acq: None, phase_encoding: ap,ce: None, rec: None, form: epi
se_epi_1.7mm_pa: filetype: fmap, session: pre, task: None, acq: None, phase_encoding: pa,ce: None, rec: None, form: epi

In [49]:
print(bd)

mprage_p2: sub-{}_ses-pre_acq-mprage_run-{}_T1w
SVC_1_bold_mb3_g2_2mm_te25: sub-{}_ses-pre_task-SVC_run-{}_bold
bold_DIFTIM_1.7mm: sub-{}_ses-pre_task-DIFTIM_run-{}_bold
se_epi_1.7mm_ap: sub-{}_ses-pre_dir-ap_run-{}_epi
se_epi_1.7mm_pa: sub-{}_ses-pre_dir-pa_run-{}_epi



The `__str__` method is what's called by print(), and it returns the format string that will be used for the filenames. You can see it's left a placeholder for the subject name and the run number. Run number is not required by bids, and is meant for cases when there are multiple runs of a given sequence. In our example, that would be the four DIFTIM runs, and you might expect them to be labelled 'run-1', 'run-2', etc. However, dicom2bids ALWAYS includes it and uses the series number as the run number. Why? Because scan sessions don't always play out perfectly. Sometimes you have a session with two mprages, or a functional run interrupted because of stimulus problems. This approach guarantees that all runs will be converted with unique names. Fortunately the bids specification doesn't dictate what the run numbers should be, other than that they must be numeric.
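Filling in those placeholders is ordinary `str.format`. The sketch below fills the mprage template with the subject name and series number, and reproduces the filename used in the dcm2niix command further down. The second half shows a hypothetical alternative, relabeling the DIFTIM series numbers as consecutive runs, in case you ever want to rename files after conversion. That is not something dicom2bids does.

```python
# Filename template as printed by print(bd) for the mprage; the {} slots are subject and run.
template = "sub-{}_ses-pre_acq-mprage_run-{}_T1w"

subject = "TC002"   # from the subject directory TC002_20180916_090647
series_number = 2   # from the series directory Series_2_mprage_p2
filename = template.format(subject, series_number)
print(filename)  # sub-TC002_ses-pre_acq-mprage_run-2_T1w

# Hypothetical alternative: map the DIFTIM series numbers to consecutive run labels.
diftim_series = [10, 11, 6, 7]
run_labels = {s: i for i, s in enumerate(sorted(diftim_series), start=1)}
print(run_labels)  # {6: 1, 7: 2, 10: 3, 11: 4}
```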

### An aside: what if the information in the dicom file is incorrect?  
I guarantee that you will have subjects entered with the wrong name, birth year, or other info. LCNI frequently gets requests to correct the dicom files in the repository, and we are generally willing to do so. Yes, it makes your scripts run more smoothly! However, there are very good reasons to leave the dicom information alone. Here are some:
1. You risk someone making a SECOND mistake
2. If only the directory name is changed, and not the dicom file itself, you have a conflict
3. If only the talapas repository is changed, but not the backup repository, now they don't match
4. If the dicom gets resent to the repository, you'll now have one dataset appearing twice in the repository under two different subject names
5. You shouldn't change data without a good reason  
  
However, I get it. It's very convenient to have your scripts run smoothly, and it's a huge pain in the neck to have to keep track of subjects' correct ids. But please consider correcting it AFTER conversion. Of course, in that case, you have to be careful not to convert the same subject again because you forgot that the name changed. Whatever you decide, this is just one of those things you have to deal with when working with real-life messy data. Keep good records.

Moving on, let's define a new output directory for our bids output. If you are following along, use something in your own pirg; you don't have write access to my shared directory.

In [50]:
bidsdir = '/projects/lcni/jolinda/shared/bids_example2' 

The method we are going to use is "convert". It has three required arguments and two optional ones:

- subjectdir: directory of the subject we want to convert
- bidsdir: top level directory for output
- bids_dict: the bids dictionary we defined above
- submit: whether or not to submit the job to slurm (default True)
- participant_file: whether to create and/or append to a participants.tsv file with information from the dicom files (default True, ignored if submit = False)

Other important information: if there is no dataset_description.json file, it will create one if submit = True. The return value is the command that is or would be submitted to slurm (it won't include anything about the dataset_description or participant file creation, because that part is not submitted to the job manager).

In [54]:
inspect.signature(dicom2bids.convert)

<Signature (subjectdir, bidsdir, bids_dict, submit=True, participant_file=True)>

"But Jolinda," you may be thinking, "I have two hundred subjects and I want to submit all of them at once! Why can't I specify the top level directory?" Well, we want to split the conversion into separate bits so they can run in parallel on slurm. So we will iterate through each subject directory, and send those to the dicom2bids.convert routine. 

In [65]:
for subjectdir in os.listdir(dcmdir):
    print(dicom2bids.convert(os.path.join(dcmdir, subjectdir), bidsdir, bd, submit = False))

module load dcm2niix/1.0.20200331
module load jq
dcm2niix -ba n -l o -o /projects/lcni/jolinda/shared/bids_example2/sub-TC002/ses-pre/anat -f sub-TC002_ses-pre_acq-mprage_run-2_T1w /projects/lcni/dcm/TalapasClass/Smith/TC/TC002_20180916_090647/Series_2_mprage_p2
dcm2niix -ba n -l o -o /projects/lcni/jolinda/shared/bids_example2/sub-TC002/ses-pre/func -f sub-TC002_ses-pre_task-SVC_run-3_bold /projects/lcni/dcm/TalapasClass/Smith/TC/TC002_20180916_090647/Series_3_SVC_1_bold_mb3_g2_2mm_te25
jq '.TaskName="SVC"' /projects/lcni/jolinda/shared/bids_example2/sub-TC002/ses-pre/func/sub-TC002_ses-pre_task-SVC_run-3_bold.json > /projects/lcni/jolinda/shared/bids_example2/sub-TC002/ses-pre/func/sub-TC002_ses-pre_task-SVC_run-3_bold.json

module load dcm2niix/1.0.20200331
module load jq
dcm2niix -ba n -l o -o /projects/lcni/jolinda/shared/bids_example2/sub-TC001/ses-pre/fmap -f sub-TC001_ses-pre_dir-pa_run-9_epi /projects/lcni/dcm/TalapasClass/Smith/TC/TC001_20180620_110428/Series_9_se_epi_1.7mm_p
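Notice that TC002 came out before TC001: `os.listdir` returns entries in arbitrary order and would also include any plain files sitting in the project directory. If you want a predictable order, a small helper can sort the names and keep only directories. `subject_dirs` is a hypothetical name; the sketch runs on a fake project directory.

```python
import os
import tempfile


def subject_dirs(project_dir):
    """Full paths of the subdirectories of project_dir, sorted by name.

    os.listdir returns entries in arbitrary order and includes plain files,
    so this sorts the names and keeps only directories.
    """
    return [os.path.join(project_dir, name)
            for name in sorted(os.listdir(project_dir))
            if os.path.isdir(os.path.join(project_dir, name))]


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "TC002_20180916_090647"))
    os.makedirs(os.path.join(tmp, "TC001_20180620_110428"))
    open(os.path.join(tmp, "notes.txt"), "w").close()  # a stray file that should be skipped
    subjects = [os.path.basename(p) for p in subject_dirs(tmp)]

print(subjects)  # ['TC001_20180620_110428', 'TC002_20180916_090647']
```

With this helper, the conversion loop could iterate over `subject_dirs(dcmdir)` and pass each path straight to `dicom2bids.convert`, with no separate `os.path.join`.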

We see three steps: loading the modules, converting every series for that subject that was found in bids_dict, and editing the .json file for any task runs to add the required task name. One important thing to note in the dcm2niix command is the parameter `-l o`. This means "keep the original datatype of the dicoms during conversion", and it IS NOT THE DEFAULT. You DO want it, especially for se-epi type fieldmaps created at LCNI using the CMRR multiband sequence. It probably won't matter for other LCNI sequences (I can't make any promises for data collected elsewhere).

This is a holdover from the old Analyze days, when "unsigned 16 bit integer" wasn't a possible data type. Most MR dicom data is stored as unsigned 16 bit integers but only uses 12 bits, so you could simply treat the unsigned data as signed without any problems (the highest bit, the sign bit, will always be zero). Nifti can handle unsigned data just fine, but various analysis software took a while to catch up, so dcm2niix defaulted to the 'old' behavior because it didn't really matter. However, dicom has loosened up the MR standard over the years, and newer sequences can use those extra four bits for higher dynamic range. If you use all 16 bits, you can't just call it signed data. In our sequences we've only seen this happen with the se-epi field map sequences, but it's best to always use `-l o`. Warnings in dcm2niix indicate that it will become the default behavior in future releases. More specifically, the behavior of dcm2niix is:  

| input DICOM data                        | output with -l o | output with -l n |
|-----------------------------------------|------------------|------------------|
|  12 bit unsigned                        | 16 bit signed    | 16 bit signed    |
|  16 bit unsigned, max intensity ≤ 32767 | 16 bit unsigned  | 16 bit signed (with warning) |
|  16 bit unsigned, max intensity > 32767 | 16 bit unsigned  | 16 bit unsigned  |

What this tells us is that any individual file will be fine no matter what you choose. However, you could find yourself in the situation where some files are defined as signed and some as unsigned, depending on the maximum intensity. For consistency, it's best to use -l o. Older versions of dcm2niix lack this option and are equivalent to -l n. The current release of MRIConvert/mcverter is equivalent to -l n as well.
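To see why data that uses all 16 bits can't simply be relabeled as signed, reinterpret the same two bytes as a signed 16 bit integer with Python's `struct` module. A 12 bit value (at most 4095) comes through unchanged, but anything above 32767 wraps around to a negative number. This only illustrates the bit-level problem; it isn't how dcm2niix itself converts data. `as_int16` is a hypothetical helper name.

```python
import struct


def as_int16(value):
    """Reinterpret the two bytes of an unsigned 16 bit value as a signed 16 bit value."""
    return struct.unpack("<h", struct.pack("<H", value))[0]


for value in [4095, 32767, 32768, 40000]:
    print(value, "->", as_int16(value))
# 4095 -> 4095, 32767 -> 32767, 32768 -> -32768, 40000 -> -25536
```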


The other option that differs from the default is `-ba n`. This means: generate the BIDS sidecar (.json file), but don't anonymize it. If you keep the default (an anonymized sidecar), the .json file will not contain the following parameters: SeriesInstanceUID, StudyInstanceUID, StudyID, PatientName, PatientID, PatientBirthDate, PatientSex, PatientWeight, AcquisitionDateTime. In that case the only information you have on the participant will be the subject name in the filename and anything you enter in the participants.tsv file (plus your own notes, of course), and you'll lose the information that links the nifti back to the original dicom. I recommend NOT anonymizing at this stage; it's easy enough to anonymize the .json files later if you need to, and we don't use real names or full birthdates anyway.
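Both sidecar edits discussed here, adding TaskName to the bold runs and stripping identifying fields later on, can also be done from Python. One caution about the shell version: in a command of the form `jq ... file.json > file.json`, the shell truncates file.json before jq reads it, so you can end up with an empty file. Writing to a separate file first and then moving it into place avoids that. `update_sidecar` is a hypothetical helper, and the sidecar below is a toy example, not real data.

```python
import json
import os
import tempfile

# Fields that dcm2niix leaves out of anonymized sidecars, per the list above.
IDENTIFYING_FIELDS = ["SeriesInstanceUID", "StudyInstanceUID", "StudyID",
                      "PatientName", "PatientID", "PatientBirthDate",
                      "PatientSex", "PatientWeight", "AcquisitionDateTime"]


def update_sidecar(path, set_fields=None, drop_fields=()):
    """Hypothetical helper: edit a .json sidecar without truncating it before it is read."""
    with open(path) as f:
        data = json.load(f)
    data.update(set_fields or {})
    for key in drop_fields:
        data.pop(key, None)          # ignore fields that are already absent
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:   # write the new version to a separate file first
        json.dump(data, f, indent=2)
    os.replace(tmp_path, path)       # then swap it into place
    return data


with tempfile.TemporaryDirectory() as tmp:
    sidecar = os.path.join(tmp, "sub-TC001_ses-pre_task-DIFTIM_run-6_bold.json")
    with open(sidecar, "w") as f:
        json.dump({"PatientName": "phantom", "PatientSex": "O", "EchoTime": 0.03}, f)
    update_sidecar(sidecar, set_fields={"TaskName": "DIFTIM"})         # like the jq step
    updated = update_sidecar(sidecar, drop_fields=IDENTIFYING_FIELDS)  # later anonymization
    with open(sidecar) as f:
        on_disk = json.load(f)

print(updated)  # {'EchoTime': 0.03, 'TaskName': 'DIFTIM'}
```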

In [66]:
for subjectdir in os.listdir(dcmdir):
    dicom2bids.convert(os.path.join(dcmdir, subjectdir), bidsdir, bd, submit = True)

Submitted batch job 11558998

Submitted batch job 11558999



We'll talk more about slurm jobs next week. We can use the sacct command in bash to see the progress of our jobs.
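If you'd like to check job states from Python, `sacct` can print machine-readable output: `-P` (`--parsable2`) separates fields with `|`, `-n` (`--noheader`) drops the header, and `-o` (`--format`) picks the columns. The sketch below parses that output; `parse_sacct` and `job_states` are hypothetical helpers, and the sample text is illustrative, shaped like the jobs shown here. `universal_newlines=True` is used instead of `text=True` because the latter only exists from Python 3.7, and the kernel here is 3.6.

```python
import subprocess


def parse_sacct(text):
    """Turn `sacct -P -n -o JobID,State` output (pipe-separated, no header) into {jobid: state}."""
    states = {}
    for line in text.strip().splitlines():
        job_id, state = line.split("|")[:2]
        states[job_id] = state
    return states


def job_states(job_id):
    """Query Slurm for one job; only works where sacct is installed (raises if sacct fails)."""
    result = subprocess.run(["sacct", "-j", str(job_id), "-P", "-n", "-o", "JobID,State"],
                            stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return parse_sacct(result.stdout)


# Illustrative sample in the shape sacct -P -n prints.
sample = "11558998|COMPLETED\n11558998.batch|COMPLETED\n11558998.extern|COMPLETED\n"
states = parse_sacct(sample)
print(all(s == "COMPLETED" for s in states.values()))  # True
```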

In [69]:
!sacct -j 11558999

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
11558999        convert      short       lcni          1    RUNNING      0:0 
11558999.ba+      batch                  lcni          1    RUNNING      0:0 
11558999.ex+     extern                  lcni          1    RUNNING      0:0 


In [73]:
!sacct -j 11558998

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
11558998        convert      short       lcni          1  COMPLETED      0:0 
11558998.ba+      batch                  lcni          1  COMPLETED      0:0 
11558998.ex+     extern                  lcni          1  COMPLETED      0:0 


We can check the output of our commands in the slurm-{jobid}.out files (here, slurm-11558998.out in the notebook's working directory).

In [72]:
!cat slurm-11558998.out

Chris Rorden's dcm2niiX version v1.0.20200331  (JP2:OpenJPEG) (JP-LS:CharLS) GCC5.5.0 (64-bit Linux)
Found 176 DICOM file(s)
Convert 176 DICOM as /projects/lcni/jolinda/shared/bids_example2/sub-TC002/ses-pre/anat/sub-TC002_ses-pre_acq-mprage_run-2_T1w (256x256x176x1)
Compress: "/bin/pigz" -b 960 -n -f -6 "/projects/lcni/jolinda/shared/bids_example2/sub-TC002/ses-pre/anat/sub-TC002_ses-pre_acq-mprage_run-2_T1w.nii"
Conversion required 2.571868 seconds (0.370000 for core code).
Chris Rorden's dcm2niiX version v1.0.20200331  (JP2:OpenJPEG) (JP-LS:CharLS) GCC5.5.0 (64-bit Linux)
Found 180 DICOM file(s)
Convert 180 DICOM as /projects/lcni/jolinda/shared/bids_example2/sub-TC002/ses-pre/func/sub-TC002_ses-pre_task-SVC_run-3_bold (104x104x72x180)
Compress: "/bin/pigz" -b 960 -n -f -6 "/projects/lcni/jolinda/shared/bids_example2/sub-TC002/ses-pre/func/sub-TC002_ses-pre_task-SVC_run-3_bold.nii"
Conversion required 22.936356 seconds (0.830000 for core code).
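With many subjects, reading every log by eye gets tedious. Each successful conversion in the log has a "Convert N DICOM as <path> (<dimensions>)" line, so a regular expression can pull out what was written. This sketch (`converted_outputs` is a hypothetical helper) assumes output paths contain no spaces. It uses the two Convert lines from the log above; with a real log you would pass it the contents of `open('slurm-11558998.out').read()`.

```python
import os
import re

# Matches dcm2niix lines such as:
# Convert 176 DICOM as /path/.../sub-TC002_ses-pre_acq-mprage_run-2_T1w (256x256x176x1)
CONVERT_LINE = re.compile(r"^Convert (\d+) DICOM as (\S+) \(([\dx]+)\)")


def converted_outputs(log_text):
    """Return (number of DICOMs, output path without extension, dimensions) per conversion."""
    found = []
    for line in log_text.splitlines():
        match = CONVERT_LINE.match(line)
        if match:
            n_dicoms, prefix, dims = match.groups()
            found.append((int(n_dicoms), prefix, tuple(int(d) for d in dims.split("x"))))
    return found


log_text = """\
Found 176 DICOM file(s)
Convert 176 DICOM as /projects/lcni/jolinda/shared/bids_example2/sub-TC002/ses-pre/anat/sub-TC002_ses-pre_acq-mprage_run-2_T1w (256x256x176x1)
Found 180 DICOM file(s)
Convert 180 DICOM as /projects/lcni/jolinda/shared/bids_example2/sub-TC002/ses-pre/func/sub-TC002_ses-pre_task-SVC_run-3_bold (104x104x72x180)
"""
results = converted_outputs(log_text)
for n, prefix, dims in results:
    print(n, os.path.basename(prefix), dims)
```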


Go forth and inspect that directory! Load it into the bids validator. Try converting your own data. Next week we'll talk about slurm, and get to know my favorite helper module I've written: slurmpy.