# Converting DICOMS to BIDS on talapas

In this tutorial we'll use mrpyconvert and dcm2niix to convert some DICOM files to BIDS format on talapas. mrpyconvert takes advantage of the way DICOMs are organized by subject and series, and generates scripts that call dcm2niix.
 

## Jupyter notebook on talapas
As of this writing, the default python on talapas is out of date. This will hopefully change, but for now I recommend using an alternate conda environment. You can do that on onDemand by choosing the server Jupyter Notebook (Python3/TensorFlow/PyTorch), and the alternate conda environment jupyterlab-tf-plus-20220927.

## Installing mrpyconvert
The source code for mrpyconvert is on github at: https://github.com/Jolinda/mrpyconvert.git. You can install it using pip. If you are in a jupyter notebook, you can run bash commands by starting the cell with %%bash. Install mrpyconvert like this:

In [12]:
%%bash
python3 -m pip install --upgrade --user mrpyconvert

Collecting mrpyconvert
  Using cached mrpyconvert-0.1.18-py3-none-any.whl (8.4 kB)
Collecting dcm2niix<2.0.0,>=1.0.2022
  Using cached dcm2niix-1.0.20220715.tar.gz (451 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting miutil[web]
  Using cached miutil-0.12.0-py3-none-any.whl (18 kB)
Building wheels for collected packages: dcm2niix
  Building wheel for dcm2niix (pyproject.toml): started
  Building wheel for dcm2niix (pyproject.toml): finished with status 'done'
  Created wheel for dcm2niix: filename=dcm2niix-1.0.20220715-cp310-cp310-linux_x86_64.whl size=583178 sha256=170a607556881b02f37ff5708c0aab6f12f2ca945e7c93318ff6e32716f3504b
  Stored in directory: /gpfs/home/jolinda/.cache/pip/wheels/88/8d

This installs it locally for your use only, and will upgrade it if it is already installed. This will also install a local version of dcm2niix if one is not found. If you are on talapas, this version will likely be more current than any available as environment modules. 

You may get some warnings that scripts have been installed in a path that is not on PATH. There should be a file named .bashrc in your home directory on talapas. Edit it (eg, nano .bashrc) and add the line:
```
export PATH=$PATH:$HOME/.local/bin
```
At one point .bashrc did not get sourced for ondemand sessions -- I don't know if this is still true, but you may also need to create a file called .bashrc-ondemand in your home directory. That file should simply source .bashrc:
```
if [ -f .bashrc ]; then
  . .bashrc
fi
```
Make sure that's in the .bashrc-ondemand file and not the .bashrc file, otherwise you'll get stuck in an infinite loop. You'll need to start a new session for those changes to take effect.

### jq
Some actions of mrpyconvert require the tool jq for editing json files on the command line. If you are on talapas, you can load it via an environment module (I'll show examples of how to include this). If you are running this somewhere else, get jq at https://jqlang.github.io/jq/.
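
Before going further, it can help to confirm that the command-line tools are visible from the environment you are working in. The following is a minimal sketch using only the standard library; the helper name `missing_tools` is made up for this illustration and is not part of mrpyconvert.

```python
import shutil

def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

# dcm2niix and jq are the two external programs this tutorial relies on
missing = missing_tools(['dcm2niix', 'jq'])
if missing:
    print('Not found on PATH:', ', '.join(missing))
else:
    print('All tools found')
```

On talapas, jq may be reported missing in a notebook session if the module has not been loaded there; the generated scripts can load it themselves.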

## Example 1: single subject

For this tutorial we'll be using some data from the LCNI repository. This data is available to all University of Oregon researchers.

In [1]:
!ls /projects/lcni/dcm/repository/

AEPET2	DEV  DIPPER  REV_examples  Round_Robin


For our first example, we'll use the DIPPER study. First we're going to create some paths to the directories we're going to use. Substitute your own pirg for "lcni" in the output path. If you aren't familiar with the pathlib module, I recommend taking a moment to familiarize yourself with it.

In [2]:
import pathlib
repo = pathlib.Path('/projects/lcni/dcm/repository/')
dipper_path = repo / 'DIPPER'
output_path = pathlib.Path.home() / 'lcni' / 'bids_tutorial'
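
If pathlib is new to you, the short sketch below shows the handful of features this tutorial relies on. It uses `PurePosixPath` so that it runs anywhere without touching the file system; the study directory name is the one that appears in the generated scripts later in this tutorial.

```python
from pathlib import PurePosixPath

repo = PurePosixPath('/projects/lcni/dcm/repository')

# The / operator joins path components
study = repo / 'DIPPER' / 'DIPPER_007_20191010_135619'

print(study.name)        # last component of the path
print(study.parent)      # everything above it
print(study.parts[-2:])  # the last two components as a tuple
```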

Next we need to create a Converter object and set the output path.

In [3]:
import mrpyconvert
converter = mrpyconvert.Converter()
converter.set_bids_path(output_path / 'dipper')

We can use the converter to inspect the contents of the input directory before we add the files to it.

In [4]:
converter.inspect(dipper_path)

1 study for 1 subject found.
Subjects: DIPPER_007
AAHead_Scout_32ch-head-coil
AAHead_Scout_32ch-head-coil_MPR_cor
AAHead_Scout_32ch-head-coil_MPR_sag
AAHead_Scout_32ch-head-coil_MPR_tra
DIPPER_1
DIPPER_2
DIPPER_3
DIPPER_4
DIPPER_5
DIPPER_6
DIPPER_7
DIPPER_8
DIPPER_9
mprage_p2_ND_defaced
mprage_p2_defaced
se_epi_mb3_g2_2mm_ap
se_epi_mb3_g2_2mm_pa


We have a single subject, with several functional runs, a structural, and an AP/PA epi pair for fieldmap generation. It's helpful to have an idea of what the final BIDS output should look like, and it should have filenames that look something like this:
```
anat/sub-007_T1w
fmap/sub-007_dir-AP_epi
fmap/sub-007_dir-PA_epi
func/sub-007_task-dipper_run-1_bold
func/sub-007_task-dipper_run-2_bold
etc
```
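
To make the naming pattern concrete, here is a small sketch that assembles a BIDS file stem from its parts. It only illustrates the naming rules and is not how mrpyconvert builds names internally; the function `bids_stem` and the subset of entities it knows about are assumptions chosen for this example.

```python
# Relative order of some common BIDS entities (a subset of the full list)
ENTITY_ORDER = ['sub', 'ses', 'task', 'acq', 'dir', 'run']

def bids_stem(datatype, suffix, **entities):
    """Build 'datatype/sub-XX_..._suffix' from keyword entities."""
    unknown = set(entities) - set(ENTITY_ORDER)
    if unknown:
        raise ValueError(f'unsupported entities: {sorted(unknown)}')
    parts = [f'{key}-{entities[key]}' for key in ENTITY_ORDER if key in entities]
    return f'{datatype}/' + '_'.join(parts + [suffix])

print(bids_stem('anat', 'T1w', sub='007'))
print(bids_stem('fmap', 'epi', sub='007', dir='AP'))
print(bids_stem('func', 'bold', sub='007', task='dipper', run=1))
```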
 
We can go ahead and add all the dicoms to the converter.

In [5]:
converter.add_dicoms(dipper_path)

mrpyconvert creates a mapping of series directories to final BIDS output. Each of these mappings is referred to as an "entry". We use the add_entry command to define them:
```
converter.add_entry(name, search, datatype, suffix, chain, json_fields)
```
name: a descriptive name (eg, 'mprage') that will also be the name of the generated script 
search: string to search for in the series description. Regular expressions are allowed.  
datatype: BIDS data type ('anat', 'func', etc)  
suffix: BIDS suffix ('T1w', 'bold', etc)  
chain: dictionary of anything else that needs to go into the file name ({'run':'1', 'dir':'AP'}, etc)  
json_fields: dictionary of additional fields that need to go into the .json sidecar file
  
We will start with the mprage.

In [6]:
converter.add_entry('mprage', search='mprage_p2_defaced', datatype='anat', suffix='T1w')

By default, mrpyconvert will use the Patient Name from the dicom file for the subject name: sub-{Patient Name}. In this case, there's a problem: the subject name has an underscore, and that's not allowed in a BIDS label. So we need to set the subject name to something else, such as "007" or "dipper007". We do that with a dictionary.

In [7]:
converter.set_names({'DIPPER_007':'007'})

For a simple conversion like this, we can just tell the converter to convert everything. For bigger jobs you'll want to submit it to the cluster. We'll come back to that later.

In [8]:
converter.convert()

Converting mprage


What does our output look like?

In [9]:
!tree {output_path}/dipper

/home/jolinda/lcni/bids_tutorial/dipper
`-- sub-007
    `-- anat
        |-- sub-007_T1w.json
        `-- sub-007_T1w.nii.gz

2 directories, 2 files


To add the field maps, we'll need to use the 'chain' and 'json_fields' keyword arguments. Field maps require an identifier that we'll refer to in the functional .json files to indicate which data goes together (you still need to do this even if there's only one field map). We'll use a wildcard in the search string this time.  "." will match any single character, ".*" matches any number of characters.

In [8]:
converter.add_entry('se_epi_ap', search='se_epi_.*_ap', datatype='fmap', suffix='epi', chain = {'dir':'AP'},
                    json_fields={'B0FieldIdentifier': 'fieldmap'})
converter.add_entry('se_epi_pa', search='se_epi_.*_pa', datatype='fmap', suffix='epi', chain = {'dir':'PA'},
                    json_fields={'B0FieldIdentifier': 'fieldmap'})
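
A quick way to try out a search string before adding an entry is to test it against the series descriptions reported by `inspect`. The sketch below uses Python's `re.search`; whether mrpyconvert applies the pattern in exactly this way is an assumption, so treat it as a sanity check rather than a guarantee. The helper `matching` is made up for this example.

```python
import re

series = ['DIPPER_1', 'DIPPER_2', 'mprage_p2_ND_defaced', 'mprage_p2_defaced',
          'se_epi_mb3_g2_2mm_ap', 'se_epi_mb3_g2_2mm_pa']

def matching(pattern, descriptions):
    """Return the descriptions in which `pattern` is found."""
    return [d for d in descriptions if re.search(pattern, d)]

print(matching('se_epi_.*_ap', series))
print(matching('mprage_p2_defaced', series))
print(matching('DIPPER_.', series))
```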

Finally we have the functional runs. Here we need to add the 'run' identifier. We *could* explicitly map each run individually, or we could let mrpyconvert number them consecutively using the 'autorun' option. That's not always a good idea (imagine you have a session where you had to stop and restart a run), but it will work fine with this dataset.

In [9]:
converter.add_entry('tasks', search='DIPPER_.', datatype='func', suffix='bold', autorun=True, chain={'task':'dipper'},
                    json_fields={'TaskName': 'dipper', 'B0FieldSource': 'fieldmap'})
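
The effect of consecutive numbering can be sketched as follows. This assumes series directories named `Series_<number>_<description>`, as in the LCNI repository, and that runs are numbered in series-number order. Both the helper `number_runs` and that ordering rule are assumptions for illustration, not documented mrpyconvert behavior.

```python
import re

def number_runs(series_dirs):
    """Map each series directory to a run number, ordered by series number."""
    def series_number(name):
        return int(re.match(r'Series_(\d+)_', name).group(1))
    ordered = sorted(series_dirs, key=series_number)
    return {name: run for run, name in enumerate(ordered, start=1)}

# A clean session: three runs in order
print(number_runs(['Series_7_DIPPER_1', 'Series_8_DIPPER_2', 'Series_9_DIPPER_3']))

# A hypothetical session where DIPPER_2 was stopped and restarted:
# the aborted attempt takes run 2 and everything after it shifts by one
print(number_runs(['Series_7_DIPPER_1', 'Series_8_DIPPER_2',
                   'Series_9_DIPPER_2', 'Series_10_DIPPER_3']))
```

The second call shows why automatic numbering is risky for messy sessions: the run labels no longer line up with the task design.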

Normally you'll want to generate scripts that can be submitted to the cluster, rather than converting with the "convert" command. We can generate plain bash scripts, or scripts already formatted for SLURM. Here's what that looks like. I'm using the "script_path" and "script_ext" arguments; without them the scripts are written to the current working directory with no extension. Each entry generates one script. I'm also including the module load jq command using the additional_commands argument. Anything in this list will be run before dcm2niix.

In [10]:
converter.generate_scripts(script_path='dipper_scripts', script_ext='.sh', additional_commands=['module load jq'])

[PosixPath('dipper_scripts/mprage.sh'),
 PosixPath('dipper_scripts/se_epi_ap.sh'),
 PosixPath('dipper_scripts/se_epi_pa.sh'),
 PosixPath('dipper_scripts/tasks.sh')]

Here's what the script looks like for the mprage:

In [11]:
!cat dipper_scripts/mprage.sh

#!/bin/bash

module load jq


dicom_path=/gpfs/projects/lcni/dcm/repository/DIPPER/DIPPER_007_20191010_135619
bids_path=/gpfs/projects/lcni/jolinda/bids_tutorial/dipper
names=(007)
input_dirs=("Series_1017_mprage_p2_defaced")


for i in "${!names[@]}"; do
  name=${names[$i]}
  input_dir=${input_dirs[$i]}
  mkdir --parents "${bids_path}/sub-${name}/anat"
  dcmoutput=$(dcm2niix -ba n -l o -o "${bids_path}/sub-${name}/anat" -f "sub-${name}_T1w"  ${dicom_path}/${input_dir})
  echo "${dcmoutput}"
done


If you plan to submit the job to slurm, you can have it output a file ready to submit using sbatch. You might need additional commands in the slurm file to define your pirg, partition, etc, if you haven't set these as environment variables. You'll also need to load the jq module (the mprage script doesn't use it, but the others do).

In [17]:
converter.generate_scripts(script_path='dipper_scripts', script_ext='.srun', slurm=True, additional_commands=['#SBATCH --account=lcni', 'module load jq'])

[PosixPath('dipper_scripts/mprage.srun'),
 PosixPath('dipper_scripts/tasks.srun'),
 PosixPath('dipper_scripts/se_epi_ap.srun'),
 PosixPath('dipper_scripts/se_epi_pa.srun')]

In [18]:
!cat dipper_scripts/tasks.srun

#!/bin/bash

#SBATCH --job-name=tasks
#SBATCH --array=0-8
#SBATCH --account=lcni
module load jq


dicom_path=/gpfs/projects/lcni/dcm/repository/DIPPER/DIPPER_007_20191010_135619
bids_path=/gpfs/projects/lcni/jolinda/bids_tutorial/dipper
names=(007 007 007 007 007 007 007 007 007)
runs=(1 2 3 4 5 6 7 8 9)
input_dirs=("Series_7_DIPPER_1" \
            "Series_8_DIPPER_2" \
            "Series_9_DIPPER_3" \
            "Series_10_DIPPER_4" \
            "Series_11_DIPPER_5" \
            "Series_12_DIPPER_6" \
            "Series_13_DIPPER_7" \
            "Series_14_DIPPER_8" \
            "Series_15_DIPPER_9")


name=${names[$SLURM_ARRAY_TASK_ID]}
input_dir=${input_dirs[$SLURM_ARRAY_TASK_ID]}
run=${runs[$SLURM_ARRAY_TASK_ID]}
mkdir --parents "${bids_path}/sub-${name}/func"
dcmoutput=$(dcm2niix -ba n -l o -o "${bids_path}/sub-${name}/func" -f "sub-${name}_task-dipper_run-${run}_bold"  ${dicom_path}/${input_dir})
echo "${dcmoutput}"

# get names of converted files
if grep -q Convert <<< $

In [29]:
%%bash
for x in dipper_scripts/*.srun; do sbatch ${x}; done

Submitted batch job 26214472
Submitted batch job 26214473
Submitted batch job 26214474
Submitted batch job 26214475


That was a bit lazy; I wound up converting the mprage twice. You'll see it below with an 'a' appended to the filename.

In [35]:
!tree {output_path}/dipper

/home/jolinda/lcni/bids_tutorial/dipper
`-- sub-007
    |-- anat
    |   |-- sub-007_T1w.json
    |   |-- sub-007_T1w.nii.gz
    |   |-- sub-007_T1wa.json
    |   `-- sub-007_T1wa.nii.gz
    |-- fmap
    |   |-- sub-007_dir-AP_epi.json
    |   |-- sub-007_dir-AP_epi.nii.gz
    |   |-- sub-007_dir-PA_epi.json
    |   `-- sub-007_dir-PA_epi.nii.gz
    `-- func
        |-- sub-007_task-dipper_run-1_bold.json
        |-- sub-007_task-dipper_run-1_bold.nii.gz
        |-- sub-007_task-dipper_run-2_bold.json
        |-- sub-007_task-dipper_run-2_bold.nii.gz
        |-- sub-007_task-dipper_run-3_bold.json
        |-- sub-007_task-dipper_run-3_bold.nii.gz
        |-- sub-007_task-dipper_run-4_bold.json
        |-- sub-007_task-dipper_run-4_bold.nii.gz
        |-- sub-007_task-dipper_run-5_bold.json
        |-- sub-007_task-dipper_run-5_bold.nii.gz
        |-- sub-007_task-dipper_run-6_bold.json
        |-- sub-007_task-dipper_run-6_bold.nii.gz
        |-- sub-007_task-dipper_run-7_bold.json
   

In [37]:
!rm {output_path}/dipper/sub-007/anat/*a.*
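
If you would rather submit from Python and keep track of the job IDs, the message printed by sbatch is easy to parse. This is a minimal sketch: `parse_job_id` and `submit` are hypothetical helpers, and `submit` only works on a machine where sbatch is available.

```python
import re
import subprocess

def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch's 'Submitted batch job N' message."""
    match = re.search(r'Submitted batch job (\d+)', sbatch_output)
    if match is None:
        raise ValueError(f'unexpected sbatch output: {sbatch_output!r}')
    return int(match.group(1))

def submit(script):
    """Submit one script with sbatch and return its job ID (needs SLURM)."""
    result = subprocess.run(['sbatch', str(script)], capture_output=True,
                            text=True, check=True)
    return parse_job_id(result.stdout)

# Parsing a message like the ones shown above
print(parse_job_id('Submitted batch job 26214472\n'))
```

Submitting scripts one at a time this way also makes it easy to leave out entries that have already been converted.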

## Example 2: multiple subjects, diffusion data

Let's create a new converter and look at the Round Robin sample data. This time we'll add the dicoms, then inspect them.

In [19]:
converter = mrpyconvert.Converter()

In [20]:
converter.add_dicoms(repo / 'Round_Robin')

In [21]:
converter.inspect()

3 studies for 3 subjects found.
Subjects: G16_S01 G17_S01 G18_S02
1EPI188
2EPI188
3EPI188
4EPI188
5EPI188
AAHScout_32ch-head-coil
AAHScout_32ch-head-coil_MPR_cor
AAHScout_32ch-head-coil_MPR_sag
AAHScout_32ch-head-coil_MPR_tra
AP_fieldmap_se_epi_mb3_g2_2mm_ap
EPI196
LR_diff_m2p2_64_2mm_lr
PA_fieldmap_se_epi_mb3_g2_2mm_pa
RL_diff_m2p2_64_2mm_rl
mprage_defaced


Each subject has six functional runs, an mprage, fieldmaps, and diffusion scans. We didn't get any warnings, so no subjects have duplicate runs. Once again, we have problematic subject names, so let's map those out first. 

In [22]:
converter.set_names({'G16_S01':'G16S01', 'G17_S01':'G17S01', 'G18_S02':'G18S02'})
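
With more subjects, typing the dictionary by hand gets tedious. BIDS labels may contain only letters and digits, so one option is to strip everything else. The helper `make_name_map` below is a hypothetical convenience, not part of mrpyconvert, and it refuses to continue if two subjects would collapse to the same label.

```python
import re

def make_name_map(patient_names):
    """Map each DICOM patient name to a label with only letters and digits."""
    name_map = {name: re.sub(r'[^A-Za-z0-9]', '', name) for name in patient_names}
    if len(set(name_map.values())) != len(name_map):
        raise ValueError('two subjects map to the same BIDS label')
    return name_map

names = make_name_map(['G16_S01', 'G17_S01', 'G18_S02'])
print(names)
```

The resulting dictionary has the same form as the one passed to `set_names`.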

We can do the mprage, field maps, and functionals similarly to the way we did the previous scans. I'm going to call EPI188 task "A" and EPI196 task "B". 

In [24]:
converter.add_entry('mprage', search='mprage_.*', datatype='anat', suffix='T1w')
converter.add_entry('se_epi_ap', search='AP_fieldmap.*', datatype='fmap', suffix='epi', chain = {'dir':'AP'},
                    json_fields={'B0FieldIdentifier': 'fieldmap'})
converter.add_entry('se_epi_pa', search='PA_fieldmap.*', datatype='fmap', suffix='epi', chain = {'dir':'PA'},
                    json_fields={'B0FieldIdentifier': 'fieldmap'})
converter.add_entry('taskA', search='.EPI188', datatype='func', suffix='bold', autorun=True, chain={'task':'A'},
                    json_fields={'TaskName': 'A', 'B0FieldSource': 'fieldmap'})
converter.add_entry('taskB', search='EPI196', datatype='func', suffix='bold', chain={'task':'B'},
                    json_fields={'TaskName': 'B', 'B0FieldSource': 'fieldmap'})

Diffusion scans are straightforward:

In [25]:
converter.add_entry('dwi_LR', search='LR_diff.*', datatype='dwi', suffix = 'dwi', chain = {'dir':'LR'})
converter.add_entry('dwi_RL', search='RL_diff.*', datatype='dwi', suffix = 'dwi', chain = {'dir':'RL'})

In [26]:
converter.set_bids_path(output_path / 'RoundRobin')

In [27]:
converter.generate_scripts(script_path='RRscripts', script_ext='.srun', slurm=True, additional_commands=['#SBATCH --account=lcni', 'module load jq'])

[PosixPath('RRscripts/mprage.srun'),
 PosixPath('RRscripts/se_epi_ap.srun'),
 PosixPath('RRscripts/se_epi_pa.srun'),
 PosixPath('RRscripts/taskA.srun'),
 PosixPath('RRscripts/taskB.srun'),
 PosixPath('RRscripts/dwi_LR.srun'),
 PosixPath('RRscripts/dwi_RL.srun')]

In [28]:
!cat RRscripts/dwi_LR.srun

#!/bin/bash

#SBATCH --job-name=dwi_LR
#SBATCH --array=0-2
#SBATCH --account=lcni
module load jq


dicom_path=/gpfs/projects/lcni/dcm/repository/Round_Robin
bids_path=/gpfs/projects/lcni/jolinda/bids_tutorial/RoundRobin
names=(G16S01 G17S01 G18S02)
input_dirs=("G16_S01_20191111_103547/Series_15_LR_diff_m2p2_64_2mm_lr" \
            "G17_S01_20191218_091736/Series_15_LR_diff_m2p2_64_2mm_lr" \
            "G18_S02_20191121_132748/Series_15_LR_diff_m2p2_64_2mm_lr")


name=${names[$SLURM_ARRAY_TASK_ID]}
input_dir=${input_dirs[$SLURM_ARRAY_TASK_ID]}
mkdir --parents "${bids_path}/sub-${name}/dwi"
dcmoutput=$(dcm2niix -ba n -l o -o "${bids_path}/sub-${name}/dwi" -f "sub-${name}_dir-LR_dwi"  ${dicom_path}/${input_dir})
echo "${dcmoutput}"


We can submit these scripts to slurm using sbatch:

In [30]:
%%bash
sbatch RRscripts/mprage.srun
sbatch RRscripts/se_epi_ap.srun
sbatch RRscripts/se_epi_pa.srun
sbatch RRscripts/taskA.srun
sbatch RRscripts/taskB.srun
sbatch RRscripts/dwi_LR.srun
sbatch RRscripts/dwi_RL.srun

Submitted batch job 26214476
Submitted batch job 26214477
Submitted batch job 26214478
Submitted batch job 26214479
Submitted batch job 26214480
Submitted batch job 26214481
Submitted batch job 26214482


We can use sacct to track whether the jobs have finished:

In [33]:
!sacct --name taskA -b

JobID             State ExitCode 
------------ ---------- -------- 
26214479_0    COMPLETED      0:0 
26214479_0.+  COMPLETED      0:0 
26214479_0.+  COMPLETED      0:0 
26214479_1    COMPLETED      0:0 
26214479_1.+  COMPLETED      0:0 
26214479_1.+  COMPLETED      0:0 
26214479_2    COMPLETED      0:0 
26214479_2.+  COMPLETED      0:0 
26214479_2.+  COMPLETED      0:0 
26214479_3    COMPLETED      0:0 
26214479_3.+  COMPLETED      0:0 
26214479_3.+  COMPLETED      0:0 
26214479_4    COMPLETED      0:0 
26214479_4.+  COMPLETED      0:0 
26214479_4.+  COMPLETED      0:0 
26214479_5    COMPLETED      0:0 
26214479_5.+  COMPLETED      0:0 
26214479_5.+  COMPLETED      0:0 
26214479_6    COMPLETED      0:0 
26214479_6.+  COMPLETED      0:0 
26214479_6.+  COMPLETED      0:0 
26214479_7    COMPLETED      0:0 
26214479_7.+  COMPLETED      0:0 
26214479_7.+  COMPLETED      0:0 
26214479_8    COMPLETED      0:0 
26214479_8.+  COMPLETED      0:0 
26214479_8.+  COMPLETED      0:0 
26214479_9    
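
For array jobs with many tasks it is easier to summarize this table than to read it. Below is a minimal sketch that parses the text printed by `sacct -b` and counts states; the function name `summarize_sacct` is hypothetical, and the sample text reuses a few rows from the output above.

```python
from collections import Counter

def summarize_sacct(text):
    """Count job states in `sacct -b` output, skipping job steps (IDs with a '.')."""
    states = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed rule, and incomplete rows
        if len(fields) < 3 or fields[0] == 'JobID' or set(fields[0]) == {'-'}:
            continue
        if '.' in fields[0]:
            continue
        states[fields[1]] += 1
    return states

sample = """JobID             State ExitCode
------------ ---------- --------
26214479_0    COMPLETED      0:0
26214479_0.+  COMPLETED      0:0
26214479_1    COMPLETED      0:0
"""
print(summarize_sacct(sample))
```

In a notebook, the text could come from `subprocess.run(['sacct', '--name', 'taskA', '-b'], capture_output=True, text=True).stdout`. Any state other than COMPLETED is worth a look in the corresponding output file.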

Anything that would have printed to the command line goes to a slurm-*.out file, one per array task. By default SLURM writes both standard output and standard error to that file.

In [34]:
!cat slurm-26214479_0.out

Chris Rorden's dcm2niiX version v1.0.20220505  GCC4.8.5 x86-64 (64-bit Linux)
Found 188 DICOM file(s)
Convert 188 DICOM as /gpfs/projects/lcni/jolinda/bids_tutorial/RoundRobin/sub-G16S01/func/sub-G16S01_task-A_run-1_bold (104x104x72x188)
Compress: "/bin/pigz" -b 960 -n -f -6 "/gpfs/projects/lcni/jolinda/bids_tutorial/RoundRobin/sub-G16S01/func/sub-G16S01_task-A_run-1_bold.nii"
Conversion required 29.277022 seconds (1.020000 for core code).


We can check whether we have a valid bids data set by uploading the files to the bids validator at https://bids-standard.github.io/bids-validator. It will fail and will tell us exactly what's wrong: a missing dataset description json file. mrpyconvert can make one, and can also make the optional participants.tsv and .json files.

In [49]:
converter.write_description_file()
converter.write_participants_file()

/home/jolinda/lcni/bids_tutorial/RoundRobin/dataset_description.json
/home/jolinda/lcni/bids_tutorial/RoundRobin/participants.tsv
/home/jolinda/lcni/bids_tutorial/RoundRobin/participants.json


In [57]:
cat /home/jolinda/lcni/bids_tutorial/RoundRobin/participants.tsv

participant_id	sex	age
sub-G18S02	F	18
sub-G17S01	F	28
sub-G16S01	F	21


In [52]:
%%bash
module load jq
jq . /home/jolinda/lcni/bids_tutorial/RoundRobin/participants.json

{
  "age": {
    "Description": "age of participant",
    "Units": "years"
  },
  "sex": {
    "Description": "sex of participant",
    "Levels": {
      "M": "male",
      "F": "female",
      "O": "other"
    }
  }
}


In [54]:
%%bash
module load jq
jq . /home/jolinda/lcni/bids_tutorial/RoundRobin/dataset_description.json

{
  "Name": "RoundRobin",
  "BIDSVersion": "1.8.0",
  "GeneratedBy": [
    {
      "Name": "dcm2niix",
      "version": "1.0.20220505"
    },
    {
      "Name": "mrpyconvert",
      "version": "0.1.4"
    }
  ]
}
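
The same file can be read from Python instead of jq. As a rough sketch, the snippet below loads a dataset_description.json and reports which of the two fields the BIDS specification requires (`Name` and `BIDSVersion`) are missing. The helper name is made up, and the demo writes to a temporary directory rather than the real dataset.

```python
import json
import pathlib
import tempfile

REQUIRED_FIELDS = ('Name', 'BIDSVersion')

def missing_description_fields(bids_dir):
    """Return required fields absent from bids_dir/dataset_description.json."""
    path = pathlib.Path(bids_dir) / 'dataset_description.json'
    if not path.exists():
        return list(REQUIRED_FIELDS)
    description = json.loads(path.read_text())
    return [field for field in REQUIRED_FIELDS if field not in description]

# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    print(missing_description_fields(tmp))   # no file yet
    (pathlib.Path(tmp) / 'dataset_description.json').write_text(
        json.dumps({'Name': 'RoundRobin', 'BIDSVersion': '1.8.0'}))
    print(missing_description_fields(tmp))   # both fields present
```

This is no substitute for the validator, but it catches the most common reason for an outright failure.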


This time we pass the bids validator test, with some warnings. One of them is that we are missing an author list in the dataset_description file. We can add that and anything else the file needs when we create the file (we can also set a different name for the project if we don't want to use the bids directory name).

In [60]:
converter.write_description_file(json_fields={'Name':'Round Robin', 'Authors':['Jolinda Smith', 'Peter Parker']})

/home/jolinda/lcni/bids_tutorial/RoundRobin/dataset_description.json


In [61]:
%%bash
module load jq
jq . /home/jolinda/lcni/bids_tutorial/RoundRobin/dataset_description.json

{
  "Name": "Round Robin",
  "Authors": [
    "Jolinda Smith",
    "Peter Parker"
  ],
  "BIDSVersion": "1.8.0",
  "GeneratedBy": [
    {
      "Name": "dcm2niix",
      "version": "1.0.20220505"
    },
    {
      "Name": "mrpyconvert",
      "version": "0.1.16"
    }
  ]
}


In [69]:
!tree {output_path}/RoundRobin

/home/jolinda/lcni/bids_tutorial/RoundRobin
|-- dataset_description.json
|-- participants.json
|-- participants.tsv
|-- sub-G16S01
|   |-- anat
|   |   |-- sub-G16S01_T1w.json
|   |   `-- sub-G16S01_T1w.nii.gz
|   |-- dwi
|   |   |-- sub-G16S01_dir-LR_dwi.bval
|   |   |-- sub-G16S01_dir-LR_dwi.bvec
|   |   |-- sub-G16S01_dir-LR_dwi.json
|   |   |-- sub-G16S01_dir-LR_dwi.nii.gz
|   |   |-- sub-G16S01_dir-RL_dwi.bval
|   |   |-- sub-G16S01_dir-RL_dwi.bvec
|   |   |-- sub-G16S01_dir-RL_dwi.json
|   |   `-- sub-G16S01_dir-RL_dwi.nii.gz
|   |-- fmap
|   |   |-- sub-G16S01_dir-AP_epi.json
|   |   |-- sub-G16S01_dir-AP_epi.nii.gz
|   |   |-- sub-G16S01_dir-PA_epi.json
|   |   `-- sub-G16S01_dir-PA_epi.nii.gz
|   `-- func
|       |-- sub-G16S01_task-A_run-1_bold.json
|       |-- sub-G16S01_task-A_run-1_bold.nii.gz
|       |-- sub-G16S01_task-A_run-2_bold.json
|       |-- sub-G16S01_task-A_run-2_bold.nii.gz
|       |-- sub-G16S01_task-A_run-3_bold.json
|       |-- sub-G16S01_task-A_run-3_bold.n

## Example 3: multiple sessions, other fieldmaps, messy data

Let's take a look at the REV example files.

In [63]:
converter.inspect(repo / 'REV_examples')

6 studies for 3 subjects found.
Subjects: REV055 REV074 REV126
AAHScout
AAHScout_MPR_cor
AAHScout_MPR_sag
AAHScout_MPR_tra
BART1_mb3_g2_2mm_te27
BART2_mb3_g2_2mm_te27
GNG1_mb3_g2_2mm_te27
GNG2_mb3_g2_2mm_te27
GNG3_mb3_g2_2mm_te27
GNG4_mb3_g2_2mm_te27
PhoenixZIPReport
React1_mb3_g2_2mm_te27
React2_mb3_g2_2mm_te27
React3_mb3_g2_2mm_te27
React4_mb3_g2_2mm_te27
SST1_mb3_g2_2mm_te27
SST2_mb3_g2_2mm_te27
SST3_mb3_g2_2mm_te27
SST4_mb3_g2_2mm_te27
fieldmap1
fieldmap2
fieldmap3
fieldmap4
mprage1_MGH_p2_defaced
mprage2_MGH_p2_defaced
More than one copy of AAHScout for at least one study
More than one copy of AAHScout_MPR_sag for at least one study
More than one copy of AAHScout_MPR_cor for at least one study
More than one copy of GNG3_mb3_g2_2mm_te27 for at least one study
More than one copy of fieldmap1 for at least one study
More than one copy of AAHScout_MPR_tra for at least one study
More than one copy of fieldmap3 for at least one study
More than one copy of fieldmap4 for at least one study

Let's take a closer look at one subject to make sense of this.

In [66]:
!ls {repo}/REV_examples/REV055* 

/projects/lcni/dcm/repository/REV_examples/REV055_20150811_135636:
Series_1010_mprage1_MGH_p2_defaced  Series_3_AAHScout_MPR_cor
Series_11_SST1_mb3_g2_2mm_te27	    Series_4_AAHScout_MPR_tra
Series_12_SST2_mb3_g2_2mm_te27	    Series_5_GNG2_mb3_g2_2mm_te27
Series_13_fieldmap2		    Series_6_GNG1_mb3_g2_2mm_te27
Series_14_fieldmap2		    Series_7_fieldmap1
Series_15_React1_mb3_g2_2mm_te27    Series_8_fieldmap1
Series_16_React2_mb3_g2_2mm_te27    Series_99_PhoenixZIPReport
Series_1_AAHScout		    Series_9_BART1_mb3_g2_2mm_te27
Series_2_AAHScout_MPR_sag

/projects/lcni/dcm/repository/REV_examples/REV055_20150905_102727:
Series_1010_mprage2_MGH_p2_defaced  Series_3_AAHScout_MPR_cor
Series_11_GNG3_mb3_g2_2mm_te27	    Series_4_AAHScout_MPR_tra
Series_12_GNG4_mb3_g2_2mm_te27	    Series_5_SST3_mb3_g2_2mm_te27
Series_13_fieldmap4		    Series_6_SST4_mb3_g2_2mm_te27
Series_14_fieldmap4		    Series_7_fieldmap3
Series_15_React4_mb3_g2_2mm_te27    Series_8_fieldmap3
Series_16_React3_mb3_g2_2mm_te27    Se
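
The warnings can be traced to specific series by counting descriptions within one study directory. Here is a small sketch, assuming the `Series_<number>_<description>` naming seen in the listing; `find_duplicates` is a name invented for this example.

```python
import re
from collections import defaultdict

def find_duplicates(series_dirs):
    """Return {description: [series numbers]} for descriptions seen more than once."""
    seen = defaultdict(list)
    for name in series_dirs:
        match = re.match(r'Series_(\d+)_(.+)', name)
        if match:
            seen[match.group(2)].append(int(match.group(1)))
    return {desc: sorted(nums) for desc, nums in seen.items() if len(nums) > 1}

# A few directories from the first REV055 session
session = ['Series_7_fieldmap1', 'Series_8_fieldmap1', 'Series_9_BART1_mb3_g2_2mm_te27',
           'Series_13_fieldmap2', 'Series_14_fieldmap2', 'Series_11_SST1_mb3_g2_2mm_te27']
print(find_duplicates(session))
```

On the real data you could pass `[p.name for p in study_dir.iterdir()]` for a study directory of your choice. Note that this simple count also flags the two series that make up each fieldmap, which are a magnitude and phase-difference pair rather than repeats.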

From here we can see that we have two sessions for each subject, and we can tell which session is which according to the name of the run. As long as we include the keyword 'ses' in add_entry, mrpyconvert will take care of organizing the output correctly. So we might do something like this:
```
converter.add_entry('GNG1', search='GNG1.*', datatype='func', suffix='bold', chain={'task': 'GNG', 'run': '1', 'ses': '1'})
converter.add_entry('GNG2', search='GNG2.*', datatype='func', suffix='bold', chain={'task': 'GNG', 'run':'2', 'ses': '1'})
converter.add_entry('GNG3', search='GNG3.*', datatype='func', suffix='bold', chain={'task': 'GNG', 'run': '1', 'ses': '2'})
converter.add_entry('GNG4', search='GNG4.*', datatype='func', suffix='bold', chain={'task': 'GNG', 'run': '2', 'ses': '2'})
```
This would work great if our data were perfect! But check out this warning message from the inspection above:
```
More than one copy of GNG3_mb3_g2_2mm_te27 for at least one study
```
This means that at least one subject has two GNG3 scans in the same session. This is a problem -- we'll wind up with files named:
```
sub-XX_ses-2_task-GNG_run-1_bold
sub-XX_ses-2_task-GNG_run-1_bolda
```
It's safer to just take the series numbers as the run numbers. We can do that by using the special formatting character '%s', which dcm2niix replaces with the series number:
```
converter.add_entry('GNG1', search='GNG1.*', datatype='func', suffix='bold', chain={'task': 'GNG', 'run': '%s', 'ses': '1'})
converter.add_entry('GNG2', search='GNG2.*', datatype='func', suffix='bold', chain={'task': 'GNG', 'run':'%s', 'ses': '1'})
converter.add_entry('GNG3', search='GNG3.*', datatype='func', suffix='bold', chain={'task': 'GNG', 'run': '%s', 'ses': '2'})
converter.add_entry('GNG4', search='GNG4.*', datatype='func', suffix='bold', chain={'task': 'GNG', 'run': '%s', 'ses': '2'})
```
You can also have mrpyconvert automatically determine the session number for each subject, by using the relative dates of each study. That's what we'll do below. No matter what, you are going to have some cleanup work afterwards to remove/rename files if you don't clean up the dicom files before conversion.  

Finally, there are the fieldmaps. These are the older phasediff/magnitude pairs. Unfortunately, the series names don't tell us which is which. For this special case, we can set the suffix to "auto" and let mrpyconvert figure it out.
```
converter.set_autosession(True)
converter.add_entry('fieldmap', search='fieldmap.*', datatype='fmap', suffix='auto', chain={'run': '%s'})
```
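
The idea behind automatic session numbering can be sketched in a few lines: group the study directories by subject and number them by date. This assumes directory names of the form `<subject>_<date>_<time>`, as in the repository listing, and it only illustrates the concept. How mrpyconvert actually derives sessions is not shown here, and `number_sessions` is a hypothetical helper.

```python
from collections import defaultdict

def number_sessions(study_dirs):
    """Assign session numbers per subject in chronological order."""
    by_subject = defaultdict(list)
    for name in study_dirs:
        # Split from the right so subject names may contain underscores
        subject, date, time = name.rsplit('_', 2)
        by_subject[subject].append((date, time, name))
    sessions = {}
    for studies in by_subject.values():
        for number, (_, _, name) in enumerate(sorted(studies), start=1):
            sessions[name] = number
    return sessions

studies = ['REV055_20150905_102727', 'REV055_20150811_135636',
           'REV074_20151006_100216', 'REV074_20151110_151323']
for name, session in sorted(number_sessions(studies).items()):
    print(name, '-> ses-' + str(session))
```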

In [3]:
converter = mrpyconvert.Converter()
converter.add_dicoms(repo / 'REV_examples')
converter.set_bids_path(output_path / 'REV')
converter.set_autosession(True)

In [4]:
converter.add_entry('React', search='React.*', datatype='func', suffix='bold', chain={'task': 'React', 'run': '%s'}, json_fields={'TaskName': 'React'})
converter.add_entry('BART', search='BART.*', datatype='func', suffix='bold', chain={'task': 'BART', 'run': '%s'}, json_fields={'TaskName': 'BART'})
converter.add_entry('SST', search='SST.*', datatype='func', suffix='bold', chain={'task': 'SST', 'run': '%s'}, json_fields={'TaskName': 'SST'})
converter.add_entry('GNG', search='GNG.*', datatype='func', suffix='bold', chain={'task': 'GNG', 'run': '%s'}, json_fields={'TaskName': 'GNG'})
converter.add_entry('mprage', search='mprage.*', datatype='anat', suffix='T1w', chain={'acq': 'mprage', 'run': '%s'})
converter.add_entry('fieldmap', search='fieldmap.*', datatype='fmap', suffix='auto', chain={'run': '%s'})

I'm leaving out the json entries for the field maps (B0FieldSource and B0FieldIdentifier). With multiple field maps per subject, and some subjects having extra fieldmaps, I've decided to fix that later on a case by case basis. I've added "acq" to the mprage, but that's completely optional.

In [5]:
converter.generate_scripts(script_path='rev_scripts', script_ext='.srun', slurm=True, 
                           additional_commands=['#SBATCH --account=lcni', 
                                                '#SBATCH --output=rev_out/%x-%A_%a.out',
                                                '#SBATCH --error=rev_out/%x-%A_%a.err',
                                                '\n',
                                                'module load jq'])

[PosixPath('rev_scripts/React.srun'),
 PosixPath('rev_scripts/BART.srun'),
 PosixPath('rev_scripts/SST.srun'),
 PosixPath('rev_scripts/GNG.srun'),
 PosixPath('rev_scripts/mprage.srun'),
 PosixPath('rev_scripts/fieldmap.srun')]

Those additional commands will write the output and error files (which would otherwise be the slurm-*.out files) to a directory called "rev_out". This will keep our working directory a little cleaner. Don't forget to create this directory before submitting the jobs. There are several other useful options you can add here, including one to email you when the job is finished. You can find them at https://slurm.schedmd.com/sbatch.html.

In [None]:
!mkdir rev_out
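
The filename patterns in the `--output` and `--error` lines use sbatch replacement symbols: `%x` is the job name, `%A` the array's master job ID, and `%a` the array task index. The sketch below expands a pattern by hand to show which files to expect. `expand_pattern` is a hypothetical helper that handles only these three symbols, and the job ID is an arbitrary example.

```python
def expand_pattern(pattern, job_name, job_id, task_id):
    """Expand the %x, %A and %a symbols of an sbatch filename pattern."""
    replacements = {'%x': job_name, '%A': str(job_id), '%a': str(task_id)}
    for symbol, value in replacements.items():
        pattern = pattern.replace(symbol, value)
    return pattern

# Expected log files for the first two tasks of a job named BART
for task in range(2):
    print(expand_pattern('rev_out/%x-%A_%a.out', 'BART', 12345, task))
```

From Python, `pathlib.Path('rev_out').mkdir(exist_ok=True)` creates the directory without failing if it already exists.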

In [7]:
!cat rev_scripts/BART.srun

#!/bin/bash

#SBATCH --job-name=BART
#SBATCH --array=0-7
#SBATCH --account=lcni
#SBATCH --output=rev_out/%x-%A_%a.out
#SBATCH --error=rev_out/%x-%A_%a.err


module load jq


dicom_path=/gpfs/projects/lcni/dcm/repository/REV_examples
bids_path=/gpfs/projects/lcni/jolinda/bids_tutorial/REV
names=(REV055 REV055 REV074 REV074 REV126 REV126 REV126 REV126)
sessions=(1 2 1 2 1 1 1 2)
input_dirs=("REV055_20150811_135636/Series_9_BART1_mb3_g2_2mm_te27" \
            "REV055_20150905_102727/Series_9_BART2_mb3_g2_2mm_te27" \
            "REV074_20151006_100216/Series_5_BART1_mb3_g2_2mm_te27" \
            "REV074_20151110_151323/Series_5_BART2_mb3_g2_2mm_te27" \
            "REV126_20160304_130506/Series_5_BART1_mb3_g2_2mm_te27" \
            "REV126_20160304_130506/Series_6_BART1_mb3_g2_2mm_te27" \
            "REV126_20160304_130506/Series_7_BART1_mb3_g2_2mm_te27" \
            "REV126_20160407_150231/Series_15_BART2_mb3_g2_2mm_te27")


name=${names[$SLURM_ARRAY_TASK_ID]}
input_dir=${input_dirs

In [8]:
%%bash
for x in rev_scripts/*.srun; do sbatch $x; done

Submitted batch job 26215831
Submitted batch job 26215832
Submitted batch job 26215833
Submitted batch job 26215834
Submitted batch job 26215835
Submitted batch job 26215836


In [15]:
converter.write_description_file()
converter.write_participants_file()

/home/jolinda/lcni/bids_tutorial/REV/dataset_description.json
/home/jolinda/lcni/bids_tutorial/REV/participants.tsv
/home/jolinda/lcni/bids_tutorial/REV/participants.json


In [16]:
!tree {output_path}/REV

/home/jolinda/lcni/bids_tutorial/REV
|-- dataset_description.json
|-- participants.json
|-- participants.tsv
|-- sub-REV055
|   |-- ses-1
|   |   |-- anat
|   |   |   |-- sub-REV055_ses-1_acq-mprage_run-1010_T1w.json
|   |   |   `-- sub-REV055_ses-1_acq-mprage_run-1010_T1w.nii.gz
|   |   |-- fmap
|   |   |   |-- sub-REV055_ses-1_run-13_magnitude1.json
|   |   |   |-- sub-REV055_ses-1_run-13_magnitude1.nii.gz
|   |   |   |-- sub-REV055_ses-1_run-13_magnitude2.json
|   |   |   |-- sub-REV055_ses-1_run-13_magnitude2.nii.gz
|   |   |   |-- sub-REV055_ses-1_run-14_phasediff.json
|   |   |   |-- sub-REV055_ses-1_run-14_phasediff.nii.gz
|   |   |   |-- sub-REV055_ses-1_run-7_magnitude1.json
|   |   |   |-- sub-REV055_ses-1_run-7_magnitude1.nii.gz
|   |   |   |-- sub-REV055_ses-1_run-7_magnitude2.json
|   |   |   |-- sub-REV055_ses-1_run-7_magnitude2.nii.gz
|   |   |   |-- sub-REV055_ses-1_run-8_phasediff.json
|   |   |   `-- sub-REV055_ses-1_run-8_phasediff.nii.gz
|   |   `-- func
|   |      

This will pass bids validation, but there's still cleanup to do, especially if you're going to use something like fmriprep. You'll want those extra fieldmap related json fields, and the fieldmaps should be renamed to something like run-1 and run-2, and you need to look for aborted runs and take them out. Hopefully this tutorial is enough to get you started with your own conversion.
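
As a starting point for that cleanup, the sketch below plans a renumbering of series-based run labels into consecutive ones for a set of files that share everything except the run. It only prints the plan and renames nothing. The helper `plan_renumbering` and the example filenames are assumptions for illustration (the series numbers are those of the React scans in the first REV055 session). It does not handle the fieldmaps, where a magnitude and a phasediff series with different numbers belong together.

```python
import re

def plan_renumbering(filenames):
    """Return {old_name: new_name} with run labels renumbered from 1."""
    run_pattern = re.compile(r'_run-(\d+)_')
    runs = sorted({int(run_pattern.search(f).group(1)) for f in filenames})
    new_label = {old: new for new, old in enumerate(runs, start=1)}
    plan = {}
    for f in filenames:
        old = int(run_pattern.search(f).group(1))
        plan[f] = run_pattern.sub(f'_run-{new_label[old]}_', f, count=1)
    return plan

files = ['sub-REV055_ses-1_task-React_run-15_bold.json',
         'sub-REV055_ses-1_task-React_run-15_bold.nii.gz',
         'sub-REV055_ses-1_task-React_run-16_bold.json',
         'sub-REV055_ses-1_task-React_run-16_bold.nii.gz']
for old, new in plan_renumbering(files).items():
    print(old, '->', new)
```

Remove aborted runs before renumbering, and review the plan by eye before turning it into actual renames.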