# DICOM-to-BIDS conversion using `heudiconv` and `singularity`

* This notebook walks through DICOM-to-BIDS conversion with `heudiconv`, run from its Singularity image, for a single participant in order to identify the correct conversion parameters.


* For a new project, the initial configuration steps must be completed to create the `heuristic.py` file. This only needs to be done once; after that, running new participants should be straightforward.

* This notebook is based on the following `heudiconv` tutorial, with some tweaks:
    - http://reproducibility.stanford.edu/bids-tutorial-series-part-2a/
  

#### HISTORY

* 9/9/21 dcosme - separated into two notebooks; this notebook identifies the parameters, the other creates conversion jobs for each participant
* 4/1/20 mbod - initial setup for MURI DICOMS

## Location of files

1. DICOMS for UPenn data are in:
    ```
    /fmriDataRaw/fmri_data_raw/{project}/
       
    ```
    
    * DICOMS (`.dcm` files) should match:
    ```
    /fmriDataRaw/fmri_data_raw/{project}/{subject}/{scan}/*.dcm
    ```
    
    
2. BIDS files should be in (you may need to create these directories):
    ```
    /data00/projects/{project}/data/bids_data
    ```
    
3. Config files for `heudiconv` will be in
    ```
    /data00/projects/{project}/scripts/BIDS/heudiconv
    ```
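
All three locations follow from the project name, so they can be derived in one place. The sketch below is only an illustration: `project_paths` and its `raw_project` argument are hypothetical names, not part of `heudiconv`, and the root directories are the ones listed above.

```python
from pathlib import PurePosixPath

def project_paths(project, raw_project=None):
    """Return the raw, BIDS, and heudiconv config paths for a project (hypothetical helper)."""
    base = PurePosixPath('/data00/projects') / project
    return {
        # The raw DICOM folder may be named differently from the project folder
        'raw': PurePosixPath('/fmriDataRaw/fmri_data_raw') / (raw_project or project),
        'bids': base / 'data' / 'bids_data',
        'heudiconv': base / 'scripts' / 'BIDS' / 'heudiconv',
    }

paths = project_paths('geoscan_v2', raw_project='geoscan')
for name, path in paths.items():
    print(name, path)
```

In this notebook the raw data live under `geoscan` while the project folder is `geoscan_v2`, which is why the raw name is passed separately.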

In [84]:
raw_directory = '/fmriDataRaw/fmri_data_raw/geoscan/T2'
!ls $raw_directory


BIDStest   GEO013_T2  GEO036_T2  GEO061_T2  GEO070_T2  GEO078_T2
GEO004_T2  GEO015_T2  GEO037_T2  GEO064_T2  GEO071_T2  GS_pilotBM
GEO006_T2  GEO021_T2  GEO047_T2  GEO067_T2  GEO073_T2  GSTEST_TX
GEO010_T2  GEO033_T2  GEO053_T2  GEO068_T2  GEO074_T2


In [85]:
bids_directory = '/data00/projects/geoscan_v2/data/bids_data'
!ls  $bids_directory

CHANGES			  README     sub-GS024	task-image_bold.json
dataset_description.json  sub-GS005  sub-GS025	task-rest_bold.json
derivatives		  sub-GS008  sub-GS028
participants.json	  sub-GS017  sub-GS031
participants.tsv	  sub-GS022  sub-GS032


### Setup

In [86]:
import pandas as pd

## Create the configuration files for a new project

* For a new project you need to run `heudiconv` in heuristic mode once on a representative participant (i.e. one who has every scan type collected in the study). It will generate:
    - a TSV file called `dicominfo.tsv` that contains the details of each of the scans in the dataset
    - a Python template file called `heuristic.py` that you edit to set up the mapping from DICOM series to NIfTI files
    
    
* To run `heudiconv` in heuristic mode with the Singularity image, edit the cell below using the correct paths for your directory (replace {Project Name}). Use:
    ```
    !singularity run --cleanenv \
        -B /data00/projects/{Project Name}/data/bids_data:/base  \
        -B /fmriDataRaw/fmri_data_raw:/raw \
        /data00/tools/singularity_images/heudiconv_0.8.0 \
        -d /raw/{Project Name}/{subject}/*/*.dcm \
        -o heudiconv/ -f convertall -s {subject} -c none --overwrite
    ```
     where
     * `-B /data00/projects/{Project Name}/data/bids_data:/base` maps the project's BIDS directory to `/base` inside the Singularity container, which is where converted data will be written
     * `-B /fmriDataRaw/fmri_data_raw:/raw` maps the DICOM directory to the Singularity location `/raw`
     * `/data00/tools/singularity_images/heudiconv_0.8.0` is the path to the `heudiconv` Singularity image you want to use. (As of 4/22/20 this is version 0.8.0, but it will change as we try to keep an updated version.)
     * `-d /raw/{Project Name}/{subject}/*/*.dcm` is the file matching template for finding DICOM files. Here `/raw` maps to `/fmriDataRaw/fmri_data_raw`, so we are looking for files that end with `.dcm` in `/fmriDataRaw/fmri_data_raw/{Project Name}/{subject}/*/*.dcm`, where the subject is specified with the `-s` parameter, e.g. `MURI155`
     * `-o heudiconv/` is the location where `heudiconv` writes its output files, here a `heudiconv/` folder relative to the directory the command is run from
     * `-f convertall` means include all DICOM files and scans (i.e. do not restrict to one type such as anatomical, and do not skip localizers etc.)
     * `-c none` runs without converting anything, which puts `heudiconv` into heuristic mode. This option normally specifies which tool performs the DICOM-to-NIfTI conversion, and it will be changed once the configuration is set up.
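
The options described above can also be assembled in Python, which makes it easier to check the paths before anything is run. This is a minimal sketch, not part of the original workflow: `heudiconv_command` is a hypothetical helper, `MURI` is assumed as the project folder name, and only flags that appear in the command above are used.

```python
import shlex

def heudiconv_command(project, subject, heuristic='convertall', converter='none',
                      image='/data00/tools/singularity_images/heudiconv_0.8.0'):
    """Build the heuristic-mode singularity call as an argument list (hypothetical helper)."""
    return [
        'singularity', 'run', '--cleanenv',
        '-B', f'/data00/projects/{project}/data/bids_data:/base',
        '-B', '/fmriDataRaw/fmri_data_raw:/raw',
        image,
        # '{subject}' stays a literal placeholder; heudiconv fills it in from -s
        '-d', f'/raw/{project}/{{subject}}/*/*.dcm',
        '-o', 'heudiconv/',
        '-f', heuristic,
        '-s', subject,
        '-c', converter,
        '--overwrite',
    ]

cmd = heudiconv_command('MURI', 'MURI155')
print(shlex.join(cmd))
```

On a machine where Singularity is installed, the list could be passed to `subprocess.run(cmd, check=True)` instead of being printed.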


In [89]:
sub = "GEO053"

!singularity run --cleanenv \
    -B /data00/projects/geoscan_v2/data/bids_data:/base  \
    -B /fmriDataRaw/fmri_data_raw:/raw \
    /data00/tools/singularity_images/heudiconv_0.8.0 \
    -d /raw/geoscan/T2/{subject}_T2/*.dcm \
    -o heudiconv/ -ss t2 -f convertall -s 'GEO053' -c none --overwrite

INFO: Running heudiconv version 0.8.0 latest 0.11.3
INFO: Need to process 1 study sessions
INFO: PROCESSING STARTS: {'subject': 'GEO053', 'outdir': '/fmriNASTest/data00/projects/geoscan_v2/scripts/BIDS/heudiconv/', 'session': 't2'}
INFO: Processing 2603 dicoms
INFO: Analyzing 2603 dicoms
INFO: Generated sequence info for 11 studies with 2603 entries total
INFO: PROCESSING DONE: {'subject': 'GEO053', 'outdir': '/fmriNASTest/data00/projects/geoscan_v2/scripts/BIDS/heudiconv/', 'session': 't2'}


* The output from this run is in a hidden folder called `.heudiconv` inside the output directory

In [90]:
!ls -a heudiconv/.heudiconv/GEO053/info
#more heudiconv/code/heuristic.py

.   dicominfo_ses-t2.tsv   GEO053_ses-t2.auto.txt  heuristic.py
..  filegroup_ses-t2.json  GEO053_ses-t2.edit.txt


* We can inspect the `dicominfo_ses-t2.tsv` file with pandas

In [91]:
scan_df=pd.read_csv('heudiconv/.heudiconv/{sub}/info/dicominfo_ses-t2.tsv'.format(sub=sub), sep='\t')
scan_df

Unnamed: 0,total_files_till_now,example_dcm_file,series_id,dcm_dir_name,series_files,unspecified,dim1,dim2,dim3,dim4,...,study_description,referring_physician_name,series_description,sequence_name,image_type,accession_number,patient_age,patient_sex,date,series_uid
0,9,001_000001_000001.dcm,1-localizer_multislice,GEO053_T2,9,,512,512,9,1,...,CAMRIS^Falk,,localizer_multislice,*fl2d1,"('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM')",,021Y,M,20170718,1.3.12.2.1107.5.2.43.66044.2017071815455218508...
1,404,001_000002_000001.dcm,2-BOLD_IMAGE_run01,GEO053_T2,395,,84,84,56,395,...,CAMRIS^Falk,,BOLD_IMAGE_run01,epfid2d1_84,"('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'NORM...",,021Y,M,20170718,1.3.12.2.1107.5.2.43.66044.2017071815490857006...
2,799,001_000003_000001.dcm,3-BOLD_IMAGE_run02,GEO053_T2,395,,84,84,56,395,...,CAMRIS^Falk,,BOLD_IMAGE_run02,epfid2d1_84,"('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'NORM...",,021Y,M,20170718,1.3.12.2.1107.5.2.43.66044.2017071815564076680...
3,1194,001_000004_000001.dcm,4-BOLD_IMAGE_run03,GEO053_T2,395,,84,84,56,395,...,CAMRIS^Falk,,BOLD_IMAGE_run03,epfid2d1_84,"('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'NORM...",,021Y,M,20170718,1.3.12.2.1107.5.2.43.66044.2017071816033238423...
4,1589,001_000005_000001.dcm,5-BOLD_IMAGE_run04,GEO053_T2,395,,84,84,56,395,...,CAMRIS^Falk,,BOLD_IMAGE_run04,epfid2d1_84,"('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'NORM...",,021Y,M,20170718,1.3.12.2.1107.5.2.43.66044.2017071816110432425...
5,1984,001_000006_000001.dcm,6-BOLD_IMAGE_run05,GEO053_T2,395,,84,84,56,395,...,CAMRIS^Falk,,BOLD_IMAGE_run05,epfid2d1_84,"('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'NORM...",,021Y,M,20170718,1.3.12.2.1107.5.2.43.66044.2017071816180216283...
6,2160,001_000007_000001.dcm,7-T2_1mm_SPACE,GEO053_T2,176,,256,256,176,1,...,CAMRIS^Falk,,T2_1mm_SPACE,*spc_282ns,"('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM')",,021Y,M,20170718,1.3.12.2.1107.5.2.43.66044.2017071816292997914...
7,2320,001_000008_000001.dcm,8-MPRAGE_TI1100_ipat2,GEO053_T2,160,,256,192,160,1,...,CAMRIS^Falk,,MPRAGE_TI1100_ipat2,*tfl3d1_16,"('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM')",,021Y,M,20170718,1.3.12.2.1107.5.2.43.66044.2017071816293695891...
8,2423,001_000009_000001.dcm,9-DSI_2mm_102dir_b4000_mb3,GEO053_T2,103,,106,106,78,103,...,CAMRIS^Falk,,DSI_2mm_102dir_b4000_mb3,ep_b5#1,"('ORIGINAL', 'PRIMARY', 'DIFFUSION', 'NONE', '...",,021Y,M,20170718,1.3.12.2.1107.5.2.43.66044.2017071816330844848...
9,2543,001_000010_000001.dcm,10-B0map,GEO053_T2,120,,80,80,120,1,...,CAMRIS^Falk,,B0map,*fm2d2r,"('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM')",,021Y,M,20170718,1.3.12.2.1107.5.2.43.66044.2017071816390779984...


* The goal is to set up a mapping between the values of some of these fields and a BIDS-compliant file name


* The most useful fields for this are `series_id` or `series_description` and the four dimension fields.
  * `dim4` is the number of volumes. It helps when a partial scan (i.e. one that was started and then stopped quickly) precedes a complete scan with the same series name, because only the complete scan has the full number of volumes.
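
The `dim4` check can be sketched in plain Python as below. The rows are illustrative only: they are modelled on the table above, and the 12-volume entry is an invented example of an aborted run. `complete_series` is a hypothetical helper; with the notebook's DataFrame, the rows could be obtained with `scan_df.to_dict('records')`.

```python
def complete_series(rows, name, expected_volumes):
    """Return the series_id of every row whose description contains `name`
    and whose dim4 equals `expected_volumes`."""
    return [
        row['series_id']
        for row in rows
        if name in row['series_description'] and row['dim4'] == expected_volumes
    ]

# Illustrative rows; the first one is an invented aborted run with 12 volumes
rows = [
    {'series_id': '2-BOLD_IMAGE_run01', 'series_description': 'BOLD_IMAGE_run01', 'dim4': 12},
    {'series_id': '3-BOLD_IMAGE_run01', 'series_description': 'BOLD_IMAGE_run01', 'dim4': 395},
    {'series_id': '4-BOLD_IMAGE_run02', 'series_description': 'BOLD_IMAGE_run02', 'dim4': 395},
    {'series_id': '5-T2_1mm_SPACE', 'series_description': 'T2_1mm_SPACE', 'dim4': 1},
]

complete = complete_series(rows, 'BOLD_IMAGE', expected_volumes=395)
print(complete)
```

Matching on the volume count rather than on the name alone is what keeps the aborted run out of the result.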

In [92]:
scan_df[['series_id', 'series_description', 'dim1','dim2','dim3','dim4']]

| | series_id | series_description | dim1 | dim2 | dim3 | dim4 |
|---|---|---|---|---|---|---|
| 0 | 1-localizer_multislice | localizer_multislice | 512 | 512 | 9 | 1 |
| 1 | 2-BOLD_IMAGE_run01 | BOLD_IMAGE_run01 | 84 | 84 | 56 | 395 |
| 2 | 3-BOLD_IMAGE_run02 | BOLD_IMAGE_run02 | 84 | 84 | 56 | 395 |
| 3 | 4-BOLD_IMAGE_run03 | BOLD_IMAGE_run03 | 84 | 84 | 56 | 395 |
| 4 | 5-BOLD_IMAGE_run04 | BOLD_IMAGE_run04 | 84 | 84 | 56 | 395 |
| 5 | 6-BOLD_IMAGE_run05 | BOLD_IMAGE_run05 | 84 | 84 | 56 | 395 |
| 6 | 7-T2_1mm_SPACE | T2_1mm_SPACE | 256 | 256 | 176 | 1 |
| 7 | 8-MPRAGE_TI1100_ipat2 | MPRAGE_TI1100_ipat2 | 256 | 192 | 160 | 1 |
| 8 | 9-DSI_2mm_102dir_b4000_mb3 | DSI_2mm_102dir_b4000_mb3 | 106 | 106 | 78 | 103 |
| 9 | 10-B0map | B0map | 80 | 80 | 120 | 1 |


* The Python file `heuristic.py` needs to be edited to map the DICOM series, using the scan fields above, to locations within the BIDS structure in the output folder.


* Two things need to be added to this file:
    1. __KEYS__, which provide a file name template and determine where in the BIDS output a given file type is written
    2. __MATCHES__, which are conditional statements that match a scan using the fields in the dataframe above and assign it to a specific key



* __KEYS__
Use the `create_key` function to create keys inside the `infotodict` function. The example below creates keys for the T1 scan, the different functional runs, and the field maps. The keys are collected in a dictionary whose values are filled in by the conditional matching.

```python
    # Inside infotodict(): one key per type of output file
    t1w = create_key('sub-{subject}/anat/sub-{subject}_T1w')

    func_read = create_key('sub-{subject}/func/sub-{subject}_task-read_run-{item:01d}_bold')
    func_share = create_key('sub-{subject}/func/sub-{subject}_task-share_run-{item:01d}_bold')

    fmap_AP = create_key('sub-{subject}/fmap/sub-{subject}_acq-{item:01d}_dir-AP_epi')

    # Each key starts with an empty list; the conditional matches fill them in
    info = {
        t1w: [],
        func_read: [], func_share: [],
        fmap_AP: [],
    }
```

![](img/heudiconv1.png)
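
To see what a key template turns into, the placeholders can be filled in with ordinary string formatting. This is only an illustration of the template syntax, not the code `heudiconv` runs internally; it assumes `{item}` counts the series attached to a key starting from 1, and it uses the example subject ID `MURI155` from earlier.

```python
template = 'sub-{subject}/func/sub-{subject}_task-read_run-{item:01d}_bold'

# Fill the template for the first two series attached to this key
paths = [template.format(subject='MURI155', item=item) for item in (1, 2)]
for path in paths:
    print(path)
```

The part before the last `/` is the folder inside the BIDS directory, and the last part is the file name without its extension.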


* __CONDITIONAL MATCHES__
Now, using the information from `dicominfo.tsv`, we go through the series and assign a scan to a key only when it has the full number of volumes.
```python
    # Inside infotodict(): check each series and attach complete scans to a key
    for s in seqinfo:
        if (s.dim1 == 256) and (s.dim2 == 192) and ('MPRAGE_TI1100_ipat2' in s.series_id):
            info[t1w].append(s.series_id)

        if (s.dim4 == 824 or s.dim4 == 813) and ('read' in s.series_id):
            info[func_read].append(s.series_id)

        if (s.dim4 == 865 or s.dim4 == 854 or s.dim4 == 568) and ('share' in s.series_id):
            info[func_share].append(s.series_id)

        if (s.dim4 == 3) and ('epi' in s.series_id):
            info[fmap_AP].append(s.series_id)
```
![](img/heudiconv2.png)
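
Putting keys and matches together, a heuristic can be tried out on a few stand-in rows before running it on real data. The sketch below is a hedged example rather than the project's actual `heuristic.py`: the `Series` namedtuple only mimics the fields used here, the rows are copied from the GEO053 table, and mapping `BOLD_IMAGE` series to `task-image` is an assumption.

```python
from collections import namedtuple

def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    """Bundle a file name template with its output type, as in the heudiconv template."""
    if not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes

def infotodict(seqinfo):
    """Map each complete series to a key (illustrative matches only)."""
    t1w = create_key('sub-{subject}/anat/sub-{subject}_T1w')
    func_image = create_key('sub-{subject}/func/sub-{subject}_task-image_run-{item:01d}_bold')
    info = {t1w: [], func_image: []}
    for s in seqinfo:
        if (s.dim1 == 256) and (s.dim2 == 192) and ('MPRAGE' in s.series_id):
            info[t1w].append(s.series_id)
        # Assumed: a complete BOLD_IMAGE run has 395 volumes, as in the table
        if (s.dim4 == 395) and ('BOLD_IMAGE' in s.series_id):
            info[func_image].append(s.series_id)
    return info

# Stand-in for the rows heudiconv passes in, with only the fields used above
Series = namedtuple('Series', 'series_id dim1 dim2 dim3 dim4')
seqinfo = [
    Series('1-localizer_multislice', 512, 512, 9, 1),
    Series('2-BOLD_IMAGE_run01', 84, 84, 56, 395),
    Series('3-BOLD_IMAGE_run02', 84, 84, 56, 395),
    Series('8-MPRAGE_TI1100_ipat2', 256, 192, 160, 1),
]

info = infotodict(seqinfo)
for (template, _, _), series in info.items():
    print(template, series)
```

Series that match no condition, such as the localizer, are simply left out of `info`.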





* Once these are added, save `heuristic.py` in:
    ```
    {Your Project}/scripts/BIDS/heudiconv/code
    ```
    
    
* Then delete the hidden `.heudiconv` folder created by the heuristic run, so that its cached output is not reused. We are now ready to convert the DICOM files to NIfTI and arrange them in a BIDS-compliant structure!

In [104]:
rm -fr heudiconv/.heudiconv

In [105]:
!singularity run --cleanenv \
    -B /data00/projects/geoscan_v2:/base  \
    -B /fmriDataRaw/fmri_data_raw:/raw \
    /data00/tools/singularity_images/heudiconv_0.8.0 \
    -d /raw/geoscan/T3/{subject}_T3/*.dcm \
    -o /base/data/bids_data/ \
    -f /base/scripts/BIDS/heudiconv/code/heuristic.py -ss t3 -s 'GEO053'  -c dcm2niix -b --overwrite

INFO: Running heudiconv version 0.8.0 latest 0.11.3
INFO: Need to process 1 study sessions
INFO: PROCESSING STARTS: {'subject': 'GEO053', 'outdir': '/base/data/bids_data/', 'session': 't3'}
INFO: Processing 3010 dicoms
INFO: Analyzing 3010 dicoms
INFO: Generated sequence info for 13 studies with 3010 entries total
INFO: Doing conversion using dcm2niix
INFO: Converting /base/data/bids_data/sub-GEO053/ses-t3/anat/sub-GEO053_ses-t3_T1w (160 DICOMs) -> /base/data/bids_data/sub-GEO053/ses-t3/anat . Converter: dcm2niix . Output types: ('nii.gz',)
220613-16:03:07,677 nipype.utils INFO:
	 Running nipype version 1.4.2 (latest: 1.8.1)
INFO: Running nipype version 1.4.2 (latest: 1.8.1)
220613-16:03:07,683 nipype.workflow INFO:
	 [Node] Setting-up "convert" in "/tmp/dcm2niixgnxm0lkd/convert".
INFO: [Node] Setting-up "convert" in "/tmp/dcm2niixgnxm0lkd/convert".
220613-16:03:07,747 nipype.workflow INFO:
	 [Node] Running "convert" ("nipype.interfaces.dcm2nii.Dcm2niix"), a CommandLine Interface with 

220613-16:03:28,63 nipype.interface INFO:
	 stdout 2022-06-13T16:03:28.063626:Compress: "/usr/bin/pigz" -b 960 -n -f -6 "./base/data/bids_data/sub-GEO053/ses-t3/func/sub-GEO053_ses-t3_task-image_run-1_bold.nii"
INFO: stdout 2022-06-13T16:03:28.063626:Compress: "/usr/bin/pigz" -b 960 -n -f -6 "./base/data/bids_data/sub-GEO053/ses-t3/func/sub-GEO053_ses-t3_task-image_run-1_bold.nii"
220613-16:03:28,63 nipype.interface INFO:
	 stdout 2022-06-13T16:03:28.063626:Conversion required 5.488265 seconds (1.815476 for core code).
INFO: stdout 2022-06-13T16:03:28.063626:Conversion required 5.488265 seconds (1.815476 for core code).
220613-16:03:28,117 nipype.workflow INFO:
	 [Node] Finished "convert".
INFO: [Node] Finished "convert".
220613-16:03:30,132 nipype.workflow INFO:
	 [Node] Setting-up "embedder" in "/tmp/embedmetag5viyjl6/embedder".
INFO: [Node] Setting-up "embedder" in "/tmp/embedmetag5viyjl6/embedder".
220613-16:03:30,183 nipype.workflow INFO:
	 [Node] Running "embedder" ("nipype.inter

	 Storing result file without outputs
	 [Node] Error on "embedder" (/tmp/embedmetaamvbbipx/embedder)
ERROR: Embedding failed: 'NoneType' object is not subscriptable
INFO: Post-treating /base/data/bids_data/sub-GEO053/ses-t3/func/sub-GEO053_ses-t3_task-image_run-3_bold.json file
INFO: Converting /base/data/bids_data/sub-GEO053/ses-t3/func/sub-GEO053_ses-t3_task-image_run-4_bold (395 DICOMs) -> /base/data/bids_data/sub-GEO053/ses-t3/func . Converter: dcm2niix . Output types: ('nii.gz',)
220613-16:05:06,75 nipype.workflow INFO:
	 [Node] Setting-up "convert" in "/tmp/dcm2niixin2yd2__/convert".
INFO: [Node] Setting-up "convert" in "/tmp/dcm2niixin2yd2__/convert".
220613-16:05:06,219 nipype.workflow INFO:
	 [Node] Running "convert" ("nipype.interfaces.dcm2nii.Dcm2niix"), a CommandLine Interface with command:
dcm2niix -b y -z y -x n -t n -m n -f /base/data/bids_data/sub-GEO053/ses-t3/func/sub-GEO053_ses-t3_task-image_run-4_bold -o . -s n -v n /tmp/dcm2niixin2yd2__/convert
INFO: [Node] Running

220613-16:06:14,453 nipype.interface INFO:
	 stdout 2022-06-13T16:06:14.453031:Chris Rorden's dcm2niiX version v1.0.20190410  GCC6.3.0 (64-bit Linux)
INFO: stdout 2022-06-13T16:06:14.453031:Chris Rorden's dcm2niiX version v1.0.20190410  GCC6.3.0 (64-bit Linux)
220613-16:06:14,453 nipype.interface INFO:
	 stdout 2022-06-13T16:06:14.453031:Found 24 DICOM file(s)
INFO: stdout 2022-06-13T16:06:14.453031:Found 24 DICOM file(s)
220613-16:06:14,453 nipype.interface INFO:
	 stdout 2022-06-13T16:06:14.453031:slices stacked despite varying acquisition numbers (if this is not desired recompile with 'mySegmentByAcq')
INFO: stdout 2022-06-13T16:06:14.453031:slices stacked despite varying acquisition numbers (if this is not desired recompile with 'mySegmentByAcq')
220613-16:06:14,453 nipype.interface INFO:
	 stdout 2022-06-13T16:06:14.453031:Convert 24 DICOM as ./base/data/bids_data/sub-GEO053/ses-t3/func/sub-GEO053_ses-t3_task-retrans_run-1_bold (84x84x56x24)
INFO: stdout 2022-06-13T16:06:14.453031

220613-16:06:29,561 nipype.interface INFO:
	 stdout 2022-06-13T16:06:29.561631:Compress: "/usr/bin/pigz" -b 960 -n -f -6 "./base/data/bids_data/sub-GEO053/ses-t3/func/sub-GEO053_ses-t3_task-retrans_run-3_bold.nii"
INFO: stdout 2022-06-13T16:06:29.561631:Compress: "/usr/bin/pigz" -b 960 -n -f -6 "./base/data/bids_data/sub-GEO053/ses-t3/func/sub-GEO053_ses-t3_task-retrans_run-3_bold.nii"
220613-16:06:29,561 nipype.interface INFO:
	 stdout 2022-06-13T16:06:29.561631:Conversion required 6.414422 seconds (2.134033 for core code).
INFO: stdout 2022-06-13T16:06:29.561631:Conversion required 6.414422 seconds (2.134033 for core code).
220613-16:06:29,616 nipype.workflow INFO:
	 [Node] Finished "convert".
INFO: [Node] Finished "convert".
220613-16:06:31,69 nipype.workflow INFO:
	 [Node] Setting-up "embedder" in "/tmp/embedmeta09t4_t46/embedder".
INFO: [Node] Setting-up "embedder" in "/tmp/embedmeta09t4_t46/embedder".
220613-16:06:31,117 nipype.workflow INFO:
	 [Node] Running "embedder" ("nipype.

	 Storing result file without outputs
	 [Node] Error on "embedder" (/tmp/embedmetals156k3t/embedder)
ERROR: Embedding failed: 'NoneType' object is not subscriptable
INFO: Post-treating /base/data/bids_data/sub-GEO053/ses-t3/fmaps/sub-GEO053_ses-t3_acq-2_epi.json file
INFO: Lock 140638704426176 acquired on /base/data/bids_data/heudiconv.lock
INFO: Populating template files under /base/data/bids_data/
INFO: Lock 140638704426176 released on /base/data/bids_data/heudiconv.lock
INFO: PROCESSING DONE: {'subject': 'GEO053', 'outdir': '/base/data/bids_data/', 'session': 't3'}
