# DICOM-to-BIDS conversion using `heudiconv` and `singularity`

* This notebook walks through DICOM-to-BIDS conversion for a single participant, using `heudiconv` with the Singularity image, in order to identify the correct conversion parameters


* For a new project, the initial configuration steps must be completed to create the `heuristic.py` file. This only needs to be done once; after that it should be straightforward to run new participants.

* This notebook is based on the following `heudiconv` tutorial, with some tweaks:
    - http://reproducibility.stanford.edu/bids-tutorial-series-part-2a/
  

#### HISTORY

* 9/9/21 dcosme - separated into two notebooks; this notebook identifies the parameters, the other creates the jobs that convert each participant
* 4/1/20 mbod - initial setup for MURI DICOMS

## Location of files

1. DICOMS for UPenn data are in:
    ```
    /fmriDataRaw/fmri_data_raw/{project}/
       
    ```
    
    * DICOMS (`.dcm` files) should match:
    ```
    /fmriDataRaw/fmri_data_raw/{project}/{subject}/{scan}/*.dcm
    ```
    
    
2. BIDS files should be in (you may need to create these directories):
    ```
    /data00/projects/{project}/data/bids_data
    ```
    
3. Config files for `heudiconv` will be in
    ```
    /data00/projects/{project}/scripts/BIDS/heudiconv
    ```
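
The three locations above can be derived from the project names. The sketch below is only illustrative: the helper `project_paths` and its argument names are hypothetical, and it assumes the raw DICOM folder may be named differently from the project folder (as with `geoscan_R01` and `geoscan_v2` in this notebook).

```python
from pathlib import PurePosixPath


def project_paths(raw_project, project):
    """Return the raw DICOM, BIDS output and heudiconv config directories."""
    project_root = PurePosixPath("/data00/projects") / project
    return {
        "raw": PurePosixPath("/fmriDataRaw/fmri_data_raw") / raw_project,
        "bids": project_root / "data" / "bids_data",
        "heudiconv": project_root / "scripts" / "BIDS" / "heudiconv",
    }


# Hypothetical usage with the project names used in this notebook
paths = project_paths("geoscan_R01", "geoscan_v2")
for name, path in paths.items():
    print(f"{name}: {path}")
```

The returned paths are plain path objects; creating the BIDS directory, if needed, would still have to be done separately.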

In [1]:
raw_directory = '/fmriDataRaw/fmri_data_raw/geoscan_R01'
!ls $raw_directory


GS004-T2  GS008  GS017_t2  GS020     GS022_t3  GS024_t3  GS025_t3  GS028  GS032
GS005	  GS016  GS017_t3  GS022_t2  GS024_t2  GS025_t2  GS026	   GS031


In [2]:
bids_directory = '/data00/projects/geoscan_v2/data/bids_data'
!ls  $bids_directory

CHANGES			  README     sub-GS022		   task-rest_bold.json
dataset_description.json  sub-GS005  sub-GS024
participants.json	  sub-GS008  sub-GS025
participants.tsv	  sub-GS017  task-image_bold.json


### Setup

In [3]:
import pandas as pd

## Create the configuration files for a new project

* For a new project you need to run `heudiconv` in heuristic mode once on a representative participant (i.e. one who has all the possible scans for a subject in the study). It will generate:
    - a TSV file called `dicominfo.tsv` that contains the details of each of the scans in the dataset
    - a Python template file called `heuristic.py` that you edit to set up the mapping from DICOM volumes to NIfTI files
    
    
* To run `heudiconv` in heuristic mode with the Singularity image, edit the cell below using the correct paths for your project (replace `{Project Name}`). Use:
    ```
    !singularity run --cleanenv \
        -B /data00/projects/{Project Name}/data/bids_data:/base  \
        -B /fmriDataRaw/fmri_data_raw:/raw \
        /data00/tools/singularity_images/heudiconv_0.8.0 \
        -d /raw/{Project Name}/{subject}/*/*.dcm \
        -o heudiconv/ -f convertall -s {subject} -c none --overwrite
    ```
     where
     * `-B /data00/projects/{Project Name}/data/bids_data:/base` binds the directory where the BIDS data should be written to `/base` inside the Singularity container
     * `-B /fmriDataRaw/fmri_data_raw:/raw` binds the DICOM directory to `/raw` inside the container
     * `/data00/tools/singularity_images/heudiconv_0.8.0` is the path to the `heudiconv` Singularity image you want to use. (As of 4/22/20 this is version 0.8.0, but this will change as we try to keep an updated version.)
     * `-d /raw/{Project Name}/{subject}/*/*.dcm` is the file matching template for finding DICOM files. Here `/raw` maps to `/fmriDataRaw/fmri_data_raw`, so we are looking for files that end with `.dcm` in `/fmriDataRaw/fmri_data_raw/{Project Name}/{subject}/*/*.dcm`, where the subject is specified with the `-s` parameter, e.g. `MURI155`
     * `-o heudiconv/` is the location where the output files from `heudiconv` are written, relative to the directory the command is run from
     * `-f convertall` means include all DICOM files and scans (i.e. do not restrict to one type such as anatomical, and do not ignore localizers etc.)
     * `-c none` puts it into heuristic mode. This option specifies which tool to use for the DICOM-to-NIfTI conversion and will be changed once the configuration is set up.
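
The flags described above can also be assembled in Python, which makes it easier to reuse the same command for other subjects. This is a minimal sketch, not part of `heudiconv`: the function `build_heudiconv_cmd` and its parameter names are hypothetical, and it only uses the flags that appear in this notebook. The `{subject}` placeholder is deliberately left literal so that `heudiconv` fills it in from `-s`.

```python
import shlex


def build_heudiconv_cmd(project, raw_project, subject,
                        image="/data00/tools/singularity_images/heudiconv_0.8.0",
                        heuristic="convertall", converter="none",
                        outdir="heudiconv/"):
    """Build the heuristic-mode command as a list of arguments."""
    # {subject} is NOT an f-string field here: heudiconv expands it itself
    dicom_template = f"/raw/{raw_project}/" + "{subject}/*/*.dcm"
    return [
        "singularity", "run", "--cleanenv",
        "-B", f"/data00/projects/{project}/data/bids_data:/base",
        "-B", "/fmriDataRaw/fmri_data_raw:/raw",
        image,
        "-d", dicom_template,
        "-o", outdir,
        "-f", heuristic,
        "-s", subject,
        "-c", converter,
        "--overwrite",
    ]


cmd = build_heudiconv_cmd("geoscan_v2", "geoscan_R01", "GS025")
print(shlex.join(cmd))  # shell-quoted form, useful for checking before running
```

The list could then be passed to `subprocess.run(cmd, check=True)`, which avoids shell globbing of the `*` characters in the DICOM template.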


In [4]:
sub = "GS025"

!singularity run --cleanenv \
    -B /data00/projects/geoscan_v2/data/bids_data:/base  \
    -B /fmriDataRaw/fmri_data_raw:/raw \
    /data00/tools/singularity_images/heudiconv_0.8.0 \
    -d /raw/geoscan_R01/{subject}_t2/*/*.dcm \
    -o heudiconv/ -ss t2 -f convertall -s 'GS025' -c none --overwrite

INFO: Running heudiconv version 0.8.0 latest 0.11.3
INFO: Need to process 1 study sessions
INFO: PROCESSING STARTS: {'subject': 'GS025', 'outdir': '/fmriNASTest/data00/projects/geoscan_v2/scripts/BIDS/heudiconv/', 'session': 't2'}
INFO: Processing 1179 dicoms
INFO: Analyzing 1179 dicoms
INFO: Generated sequence info for 17 studies with 1164 entries total
INFO: PROCESSING DONE: {'subject': 'GS025', 'outdir': '/fmriNASTest/data00/projects/geoscan_v2/scripts/BIDS/heudiconv/', 'session': 't2'}


* The output from this run is in a hidden folder called `.heudiconv`

In [5]:
!ls -a heudiconv/.heudiconv/GS025/info
#more heudiconv/code/heuristic.py

.		      filegroup_ses-t2.json  GS025_ses-t3.auto.txt
..		      filegroup_ses-t3.json  GS025_ses-t3.edit.txt
dicominfo_ses-t2.tsv  GS025_ses-t2.auto.txt  heuristic.py
dicominfo_ses-t3.tsv  GS025_ses-t2.edit.txt


* We can look at the `dicominfo.tsv` in Pandas

In [6]:
scan_df=pd.read_csv('heudiconv/.heudiconv/{sub}/info/dicominfo_ses-t2.tsv'.format(sub=sub), sep='\t')
scan_df

| | total_files_till_now | example_dcm_file | series_id | dcm_dir_name | series_files | unspecified | dim1 | dim2 | dim3 | dim4 | ... | study_description | referring_physician_name | series_description | sequence_name | image_type | accession_number | patient_age | patient_sex | date | series_uid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9 | 1.3.12.2.1107.5.2.43.66044.2021120118233350460... | 1-localizer_multislice | 1.3.12.2.1107.5.2.43.66044.2021120118232768356... | 9 | | 512 | 512 | 9 | 1 | ... | CAMRIS^Falk | | localizer_multislice | *fl2d1 | ('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM') | | 028Y | M | 20211201 | 1.3.12.2.1107.5.2.43.66044.2021120118232768356... |
| 1 | 137 | 1.3.12.2.1107.5.2.43.66044.2021120118242911163... | 2-AAHead_Scout | 1.3.12.2.1107.5.2.43.66044.2021120118242921346... | 128 | | 160 | 160 | 128 | 1 | ... | CAMRIS^Falk | | AAHead_Scout | *fl3d1_ns | ('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM') | | 028Y | M | 20211201 | 1.3.12.2.1107.5.2.43.66044.2021120118242921346... |
| 2 | 142 | 1.3.12.2.1107.5.2.43.66044.2021120118243264539... | 3-AAHead_Scout | 1.3.12.2.1107.5.2.43.66044.2021120118243262381... | 5 | | 162 | 162 | 5 | 1 | ... | CAMRIS^Falk | | AAHead_Scout_MPR_sag | *fl3d1_ns | ('DERIVED', 'PRIMARY', 'MPR', 'ND', 'NORM') | | 028Y | M | 20211201 | 1.3.12.2.1107.5.2.43.66044.2021120118243262381... |
| 3 | 145 | 1.3.12.2.1107.5.2.43.66044.2021120118243264659... | 4-AAHead_Scout | 1.3.12.2.1107.5.2.43.66044.2021120118243262429... | 3 | | 162 | 162 | 3 | 1 | ... | CAMRIS^Falk | | AAHead_Scout_MPR_cor | *fl3d1_ns | ('DERIVED', 'PRIMARY', 'MPR', 'ND', 'NORM') | | 028Y | M | 20211201 | 1.3.12.2.1107.5.2.43.66044.2021120118243262429... |
| 4 | 148 | 1.3.12.2.1107.5.2.43.66044.2021120118243264729... | 5-AAHead_Scout | 1.3.12.2.1107.5.2.43.66044.2021120118243262452... | 3 | | 162 | 162 | 3 | 1 | ... | CAMRIS^Falk | | AAHead_Scout_MPR_tra | *fl3d1_ns | ('DERIVED', 'PRIMARY', 'MPR', 'ND', 'NORM') | | 028Y | M | 20211201 | 1.3.12.2.1107.5.2.43.66044.2021120118243262452... |
| 5 | 280 | 1.3.12.2.1107.5.2.43.66044.2021120118274430061... | 6-BOLD_IMAGE_run01 | 1.3.12.2.1107.5.2.43.66044.2021120118260579495... | 132 | | 84 | 84 | 46 | 132 | ... | CAMRIS^Falk | | BOLD_IMAGE_run01 | epfid2d1_84 | ('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MO... | | 028Y | M | 20211201 | 1.3.12.2.1107.5.2.43.66044.2021120118260579495... |
| 6 | 284 | 1.3.12.2.1107.5.2.43.66044.2021120118342991498... | 7-FieldMap_PA | 1.3.12.2.1107.5.2.43.66044.2021120118341985854... | 4 | | 84 | 84 | 46 | 4 | ... | CAMRIS^Falk | | FieldMap_PA | epfid2d1_84 | ('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MO... | | 028Y | M | 20211201 | 1.3.12.2.1107.5.2.43.66044.2021120118341985854... |
| 7 | 416 | 1.3.12.2.1107.5.2.43.66044.2021120118354376532... | 8-BOLD_IMAGE_run02 | 1.3.12.2.1107.5.2.43.66044.2021120118344063819... | 132 | | 84 | 84 | 46 | 132 | ... | CAMRIS^Falk | | BOLD_IMAGE_run02 | epfid2d1_84 | ('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MO... | | 028Y | M | 20211201 | 1.3.12.2.1107.5.2.43.66044.2021120118344063819... |
| 8 | 420 | 1.3.12.2.1107.5.2.43.66044.2021120118422846619... | 9-FieldMap_PA | 1.3.12.2.1107.5.2.43.66044.2021120118421931246... | 4 | | 84 | 84 | 46 | 4 | ... | CAMRIS^Falk | | FieldMap_PA | epfid2d1_84 | ('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MO... | | 028Y | M | 20211201 | 1.3.12.2.1107.5.2.43.66044.2021120118421931246... |
| 9 | 552 | 1.3.12.2.1107.5.2.43.66044.2021120118434340833... | 10-BOLD_IMAGE_run03 | 1.3.12.2.1107.5.2.43.66044.2021120118424059730... | 132 | | 84 | 84 | 46 | 132 | ... | CAMRIS^Falk | | BOLD_IMAGE_run03 | epfid2d1_84 | ('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM', 'MO... | | 028Y | M | 20211201 | 1.3.12.2.1107.5.2.43.66044.2021120118424059730... |


* The goal is to set up a mapping between the values of some of these fields and BIDS-compliant file names


* The most useful fields for this are `series_id` or `series_description` and the four dimension fields (`dim1` to `dim4`).
  * `dim4` is the number of volumes. It helps to tell a partial scan (i.e. one that was started and stopped quickly) apart from the complete scan that follows it, since both carry the same series name in `series_id`.
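
One way to apply this is to keep only the series whose `dim4` equals the expected number of volumes. The sketch below is a hedged illustration: `EXPECTED_VOLUMES` and `complete_series` are hypothetical names, the expected counts (132 volumes per BOLD run, 4 per field map) are read off the `dicominfo` table above, and the demo rows are a small subset of that table.

```python
import pandas as pd

# Expected dim4 per series-name fragment (values read off the table above)
EXPECTED_VOLUMES = {"BOLD_IMAGE": 132, "FieldMap": 4}


def complete_series(df, expected=EXPECTED_VOLUMES):
    """Return the series_id values whose dim4 matches the expected count."""
    keep = []
    for row in df.itertuples(index=False):
        for fragment, n_volumes in expected.items():
            if fragment in row.series_id and row.dim4 == n_volumes:
                keep.append(row.series_id)
    return keep


# Demo rows copied from the dicominfo table (only the columns used here)
demo = pd.DataFrame({
    "series_id": ["5-AAHead_Scout", "6-BOLD_IMAGE_run01", "7-FieldMap_PA"],
    "dim4": [1, 132, 4],
})
print(complete_series(demo))
```

A partial scan would have a `dim4` below the expected value and would therefore be left out of the returned list.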

In [7]:
scan_df[['series_id', 'series_description', 'dim1','dim2','dim3','dim4']]

| | series_id | series_description | dim1 | dim2 | dim3 | dim4 |
|---|---|---|---|---|---|---|
| 0 | 1-localizer_multislice | localizer_multislice | 512 | 512 | 9 | 1 |
| 1 | 2-AAHead_Scout | AAHead_Scout | 160 | 160 | 128 | 1 |
| 2 | 3-AAHead_Scout | AAHead_Scout_MPR_sag | 162 | 162 | 5 | 1 |
| 3 | 4-AAHead_Scout | AAHead_Scout_MPR_cor | 162 | 162 | 3 | 1 |
| 4 | 5-AAHead_Scout | AAHead_Scout_MPR_tra | 162 | 162 | 3 | 1 |
| 5 | 6-BOLD_IMAGE_run01 | BOLD_IMAGE_run01 | 84 | 84 | 46 | 132 |
| 6 | 7-FieldMap_PA | FieldMap_PA | 84 | 84 | 46 | 4 |
| 7 | 8-BOLD_IMAGE_run02 | BOLD_IMAGE_run02 | 84 | 84 | 46 | 132 |
| 8 | 9-FieldMap_PA | FieldMap_PA | 84 | 84 | 46 | 4 |
| 9 | 10-BOLD_IMAGE_run03 | BOLD_IMAGE_run03 | 84 | 84 | 46 | 132 |


* The Python file `heuristic.py` needs to be edited to map the DICOM series described by these scan data to locations within the BIDS structure in the output folder.


* The two things that need to be added to this file are:
    1. __KEYS__ that provide a file template and mapping to where in the BIDS output a certain filetype should be written
    2. __MATCHES__ are condition statements that match a scan using the fields in the dataframe above and map them to a specific key



* __KEYS__
Use the `create_key` function to create keys inside the `infotodict` function. The example shows how keys are created for the T1 scan, the different functional runs, and the field maps. The keys are put into a dictionary, and the values are added later by the conditional matching.

```python
    # Inside infotodict(): one key (output file template) per scan type
    t1w = create_key('sub-{subject}/anat/sub-{subject}_T1w')

    func_read = create_key('sub-{subject}/func/sub-{subject}_task-read_run-{item:01d}_bold')
    func_share = create_key('sub-{subject}/func/sub-{subject}_task-share_run-{item:01d}_bold')

    fmap_AP = create_key('sub-{subject}/fmap/sub-{subject}_acq-{item:01d}_dir-AP_epi')

    # Each key starts with an empty list; matching series are appended later
    info = {
        t1w: [],
        func_read: [], func_share: [],
        fmap_AP: [],
    }
    
```

![](img/heudiconv1.png)
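
To see what a key template turns into, it can be expanded with Python's `str.format`. This sketch assumes that `heudiconv` fills `{subject}` with the subject ID and `{item}` with the 1-based position of each series appended to that key; `expand_key` is a hypothetical helper written for illustration, not a `heudiconv` function.

```python
def expand_key(template, subject, n_series):
    """Expand a key template once per appended series (item counts from 1)."""
    return [template.format(subject=subject, item=item)
            for item in range(1, n_series + 1)]


template = 'sub-{subject}/func/sub-{subject}_task-read_run-{item:01d}_bold'

# Two series appended to this key would give two output file prefixes
for path in expand_key(template, subject='MURI155', n_series=2):
    print(path)
```

The format spec `{item:01d}` pads to a minimum width of one digit, so run numbers are written without leading zeros.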


* __CONDITIONAL MATCHES__
Now, using the information from `dicominfo.tsv`, we go through the series and append a series to its key only when it has the full number of volumes.
```python
        if (s.dim1 == 256) and (s.dim2 == 192) and ('MPRAGE_TI1100_ipat2' in s.series_id):
            info[t1w].append(s.series_id)
        
        if (s.dim4 == 824 or s.dim4 == 813) and ('read' in s.series_id):
            info[func_read].append(s.series_id)

        if (s.dim4 == 865 or s.dim4 == 854 or s.dim4 == 568) and ('share' in s.series_id):
            info[func_share].append(s.series_id)

        if (s.dim4 == 3) and ('epi' in s.series_id):
            info[fmap_AP].append(s.series_id)
```
![](img/heudiconv2.png)
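
Putting keys and matches together for the scans listed in this notebook's `dicominfo` table might look like the sketch below. It is an assumption-laden illustration, not the project's actual `heuristic.py`: the key names, the templates, the use of the `{session}` placeholder, and the task name `image` (suggested by `task-image_bold.json` in the BIDS directory) are all guesses. `create_key` is written along the lines of the one in the generated template, and the `Series` tuple is a stand-in for the records that `heudiconv` passes in `seqinfo`.

```python
from collections import namedtuple


def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Map each series in seqinfo to a key; only complete scans are kept."""
    # Hypothetical key templates for this dataset
    func_image = create_key(
        'sub-{subject}/{session}/func/sub-{subject}_{session}_task-image_run-{item:01d}_bold')
    fmap_PA = create_key(
        'sub-{subject}/{session}/fmap/sub-{subject}_{session}_acq-{item:01d}_dir-PA_epi')
    info = {func_image: [], fmap_PA: []}

    for s in seqinfo:
        # 132 volumes per BOLD run and 4 per field map, as in the table
        if s.dim4 == 132 and 'BOLD_IMAGE' in s.series_id:
            info[func_image].append(s.series_id)
        if s.dim4 == 4 and 'FieldMap_PA' in s.series_id:
            info[fmap_PA].append(s.series_id)
    return info


# Stand-in for heudiconv's seqinfo records, with only the fields used here
Series = namedtuple('Series', ['series_id', 'dim4'])
seqinfo = [Series('1-localizer_multislice', 1),
           Series('6-BOLD_IMAGE_run01', 132),
           Series('7-FieldMap_PA', 4),
           Series('8-BOLD_IMAGE_run02', 132)]

for key, series in infotodict(seqinfo).items():
    print(key[0], '->', series)
```

Series that match no condition, such as the localizer, are simply not converted.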





* Once these are added, `heuristic.py` is kept in:
    ```
    {Your Project}/scripts/BIDS/heudiconv/code
    ```
    
    
* Then we delete the hidden `.heudiconv` folder created by the heuristic-mode run, so that `heudiconv` does not reuse its cached information. We are now ready to convert the DICOM files to NIfTI and set them up in a BIDS-compliant structure!

In [10]:
rm -fr heudiconv/.heudiconv

In [11]:
!singularity run --cleanenv \
    -B /data00/projects/geoscan_v2:/base  \
    -B /fmriDataRaw/fmri_data_raw:/raw \
    /data00/tools/singularity_images/heudiconv_0.8.0 \
    -d /raw/geoscan_R01/{subject}_t2/*/*.dcm \
    -o /base/data/bids_data/ \
    -f /base/scripts/BIDS/heudiconv/code/heuristic.py -ss t2 -s 'GS025'  -c dcm2niix -b --overwrite

INFO: Running heudiconv version 0.8.0 latest 0.11.3
INFO: Need to process 1 study sessions
INFO: PROCESSING STARTS: {'subject': 'GS025', 'outdir': '/base/data/bids_data/', 'session': 't2'}
INFO: Processing 1179 dicoms
INFO: Reloading existing filegroup.json because /base/data/bids_data/.heudiconv/GS025/ses-t2/info/GS025_ses-t2.edit.txt exists
INFO: Doing conversion using dcm2niix
INFO: Converting /base/data/bids_data/sub-GS025/ses-t2/anat/sub-GS025_ses-t2_T1 (160 DICOMs) -> /base/data/bids_data/sub-GS025/ses-t2/anat . Converter: dcm2niix . Output types: ('nii.gz',)
220612-21:25:42,96 nipype.utils INFO:
	 Running nipype version 1.4.2 (latest: 1.8.1)
INFO: Running nipype version 1.4.2 (latest: 1.8.1)
220612-21:25:42,106 nipype.workflow INFO:
	 [Node] Setting-up "convert" in "/tmp/dcm2niix2mb2gegf/convert".
INFO: [Node] Setting-up "convert" in "/tmp/dcm2niix2mb2gegf/convert".
220612-21:25:42,174 nipype.workflow INFO:
	 [Node] Running "convert" ("nipype.interfaces.dcm2nii.Dcm2niix"), a Com

220612-21:25:45,599 nipype.interface INFO:
	 stdout 2022-06-12T21:25:45.599357:Compress: "/usr/bin/pigz" -b 960 -n -f -6 "./base/data/bids_data/sub-GS025/ses-t2/fmaps/sub-GS025_ses-t2_acq-1_epi.nii"
INFO: stdout 2022-06-12T21:25:45.599357:Compress: "/usr/bin/pigz" -b 960 -n -f -6 "./base/data/bids_data/sub-GS025/ses-t2/fmaps/sub-GS025_ses-t2_acq-1_epi.nii"
220612-21:25:45,599 nipype.interface INFO:
	 stdout 2022-06-12T21:25:45.599357:Conversion required 0.131952 seconds (0.034707 for core code).
INFO: stdout 2022-06-12T21:25:45.599357:Conversion required 0.131952 seconds (0.034707 for core code).
220612-21:25:45,627 nipype.workflow INFO:
	 [Node] Finished "convert".
INFO: [Node] Finished "convert".
220612-21:25:45,800 nipype.workflow INFO:
	 [Node] Setting-up "embedder" in "/tmp/embedmetapnp7q21m/embedder".
INFO: [Node] Setting-up "embedder" in "/tmp/embedmetapnp7q21m/embedder".
220612-21:25:45,824 nipype.workflow INFO:
	 [Node] Running "embedder" ("nipype.interfaces.utility.wrappers.F

220612-21:25:46,709 nipype.workflow INFO:
	 [Node] Running "embedder" ("nipype.interfaces.utility.wrappers.Function")
INFO: [Node] Running "embedder" ("nipype.interfaces.utility.wrappers.Function")
	 Storing result file without outputs
	 [Node] Error on "embedder" (/tmp/embedmetavu2lut5o/embedder)
ERROR: Embedding failed: 'NoneType' object is not iterable
INFO: Post-treating /base/data/bids_data/sub-GS025/ses-t2/fmaps/sub-GS025_ses-t2_acq-3_epi.json file
INFO: Converting /base/data/bids_data/sub-GS025/ses-t2/fmaps/sub-GS025_ses-t2_acq-4_epi (4 DICOMs) -> /base/data/bids_data/sub-GS025/ses-t2/fmaps . Converter: dcm2niix . Output types: ('nii.gz',)
220612-21:25:46,777 nipype.workflow INFO:
	 [Node] Setting-up "convert" in "/tmp/dcm2niix6m5ofsyp/convert".
INFO: [Node] Setting-up "convert" in "/tmp/dcm2niix6m5ofsyp/convert".
220612-21:25:46,782 nipype.workflow INFO:
	 [Node] Running "convert" ("nipype.interfaces.dcm2nii.Dcm2niix"), a CommandLine Interface with command:
dcm2niix -b y -z y -