# DICOM-to-BIDS conversion using `heudiconv` and `singularity`

* This notebook walks through DICOM-to-BIDS conversion with `heudiconv`, run from its Singularity image, for a single participant in order to identify the correct conversion parameters.


* For a new project, the initial configuration steps must be completed to create the `heuristic.py` file. This only needs to be done once; after that, running new participants should be straightforward.

* This notebook is based on the following `heudiconv` tutorial, with some tweaks:
    - http://reproducibility.stanford.edu/bids-tutorial-series-part-2a/
  

#### HISTORY

* 9/9/21 dcosme - separated into two notebooks; this notebook identifies the parameters, the other creates conversion jobs for each participant
* 4/1/20 mbod - initial setup for MURI DICOMS

## Location of files

1. DICOMS for UPenn MURI data are in:
    ```
    /fmriDataRaw/fmri_data_raw/bbprime/
       
    ```
    
    * DICOMS (`.dcm` files) should match:
    ```
    /fmriDataRaw/fmri_data_raw/bbprime/{subject}/{scan}/*.dcm
    ```
    
    
2. BIDS files for bbprime should be in 
    ```
    /data00/projects/bbprime/data/bids_data
    ```
    
3. Config files for `heudiconv` will be in
    ```
    /data00/projects/bbprime/scripts/BIDS/heudiconv
    ```
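
Before running `heudiconv`, it can help to confirm that a participant's DICOMs actually follow the `{subject}/{scan}/*.dcm` layout described above. The following is a minimal sketch using only the standard library; the helper name `count_dicoms` is hypothetical and not part of `heudiconv`, and the path in the usage comment is the one listed in this section.

```python
from collections import Counter
from pathlib import Path


def count_dicoms(raw_root, subject):
    """Count .dcm files per scan folder under {raw_root}/{subject}/{scan}/."""
    counts = Counter()
    for dcm in Path(raw_root, subject).glob("*/*.dcm"):
        counts[dcm.parent.name] += 1
    return dict(counts)


# Example call (hypothetical; run on the server where the raw data live):
# count_dicoms("/fmriDataRaw/fmri_data_raw/bbprime", "BPP06")
```

An empty result suggests that the subject ID or the folder layout does not match the expected pattern.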

In [1]:
ls /fmriDataRaw/fmri_data_raw/bbprime/

[0m[01;34mBPP00[0m/  [01;34mBPP01[0m/  [01;34mBPP05[0m/  [01;34mBPP06[0m/  [01;34mBPP07[0m/  [01;34mBPP11[0m/  [01;34mBPP12[0m/  [01;34mBPP13[0m/  [01;34mBPP19[0m/


In [2]:
ls  /data00/projects/bbprime/data/bids_data

[0m[01;32mCHANGES[0m*                   [01;32mREADME[0m*     [34;42msub-BPP06[0m/
[01;32mdataset_description.json[0m*  [01;32mREADME.md[0m*  task-read_bold.json
[34;42mderivatives[0m/               [34;42msub-BPP00[0m/  task-share_bold.json
[01;32mparticipants.json[0m*         [34;42msub-BPP01[0m/
[01;32mparticipants.tsv[0m*          [34;42msub-BPP05[0m/


### Setup

In [1]:
import pandas as pd

## Create the configuration files for a new project

* For a new project you need to run `heudiconv` in heuristic mode once on a representative participant (i.e. one who has all the possible scans for a subject in the study). It will generate:
    - a TSV file called `dicominfo.tsv` that contains the details of each of the scans in the dataset
    - a Python template file called `heuristic.py` that you edit to set up the mapping from DICOM volumes to NIfTI files
    
    
* To run `heudiconv` in heuristic mode with the Singularity image, use:
    ```
    !singularity run --cleanenv \
        -B /data00/projects/bbprime/data/bids_data:/base  \
        -B /fmriDataRaw/fmri_data_raw:/raw \
        /data00/tools/singularity_images/heudiconv_0.8.0 \
        -d /raw/bbprime/{subject}/*/*.dcm \
        -o heudiconv/ -f convertall -s {subject} -c none --overwrite
    ```
     where
     * `-B /data00/projects/bbprime/data/bids_data:/base` binds the directory where the BIDS data should be written to `/base` inside the container
     * `-B /fmriDataRaw/fmri_data_raw:/raw` binds the DICOM directory to `/raw` inside the container
     * `/data00/tools/singularity_images/heudiconv_0.8.0` is the path to the `heudiconv` Singularity image you want to use. (As of 4/22/20 that is version 0.8.0, but this will change as we try to keep an updated version.)
     * `-d /raw/bbprime/{subject}/*/*.dcm` is the file-matching template for finding DICOM files. Here `/raw` maps to `/fmriDataRaw/fmri_data_raw`, so we are looking for files that end with `.dcm` in `/fmriDataRaw/fmri_data_raw/bbprime/{subject}/*/*.dcm`, where `{subject}` is filled in from the `-s` parameter, e.g. `BPP06`
     * `-o heudiconv/` is the location where the output files from `heudiconv` are written
     * `-f convertall` uses the built-in `convertall` heuristic, which includes all DICOM files and scans (i.e. it is not restricted to one type such as anatomical, and it does not ignore localizers etc.)
     * `-c none` puts it into heuristic mode. This option specifies which tool to use for the DICOM-to-NIfTI conversion and will be changed once the configuration is set up.
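
To reuse the same command for different participants, the arguments described above can be assembled in Python. This is only a sketch: `build_heudiconv_cmd` is a hypothetical helper, not part of `heudiconv`, and it uses only the paths and flags from the command shown in this section.

```python
import shlex

BIDS_DIR = "/data00/projects/bbprime/data/bids_data"
RAW_DIR = "/fmriDataRaw/fmri_data_raw"
IMAGE = "/data00/tools/singularity_images/heudiconv_0.8.0"


def build_heudiconv_cmd(subject, heuristic="convertall", converter="none"):
    """Return the argument list for one heudiconv run (hypothetical helper)."""
    return [
        "singularity", "run", "--cleanenv",
        "-B", f"{BIDS_DIR}:/base",
        "-B", f"{RAW_DIR}:/raw",
        IMAGE,
        # Plain (non-f) string on purpose: heudiconv itself fills in {subject}.
        "-d", "/raw/bbprime/{subject}/*/*.dcm",
        "-o", "heudiconv/",
        "-f", heuristic,
        "-s", subject,
        "-c", converter,
        "--overwrite",
    ]


cmd = build_heudiconv_cmd("BPP06")
print(shlex.join(cmd))  # shell-quoted form, for inspection or logging
# To execute instead of printing: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell expansion of the `*` wildcards, so the DICOM template reaches `heudiconv` unchanged.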


In [2]:
!singularity run --cleanenv \
    -B /data00/projects/bbprime/data/bids_data:/base  \
    -B /fmriDataRaw/fmri_data_raw:/raw \
    /data00/tools/singularity_images/heudiconv_0.8.0 \
    -d /raw/bbprime/{subject}/*/*.dcm \
    -o heudiconv/ -f convertall -s BPP06 -c none --overwrite

INFO: Running heudiconv version 0.8.0 latest 0.9.0


* The output from this run is in a hidden folder called `.heudiconv` inside the output directory (i.e. `heudiconv/.heudiconv`)

In [4]:
# !ls -a heudiconv/.heudiconv/BPP00/info
# !more heudiconv/code/heuristic.py

ls: cannot access heudiconv/.heudiconv/BPP00/info: No such file or directory
import os


def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    if template is None or not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Heuristic evaluator for determining which runs belong where

    allowed template fields - follow python string module:

    item: index within category
    subject: participant id
    seqitem: run number during scanning
    subindex: sub index within group
    """
    
    func_read=create_key('sub-{subject}/func/sub-{subject}_task-read_run-{item:01d}_bold')
    func_share=create_key('sub-{subject}/func/sub-{subject}_task-share_run-{item
--More--(44%)

* We can inspect `dicominfo.tsv` with pandas

In [9]:
scan_df=pd.read_csv('heudiconv/.heudiconv/BPP06/info/dicominfo.tsv', sep='\t')

In [10]:
scan_df

Unnamed: 0,total_files_till_now,example_dcm_file,series_id,dcm_dir_name,series_files,unspecified,dim1,dim2,dim3,dim4,...,study_description,referring_physician_name,series_description,sequence_name,image_type,accession_number,patient_age,patient_sex,date,series_uid
0,9,1.3.12.2.1107.5.2.43.66044.2021071413532424170...,1-localizer_multislice,1.3.12.2.1107.5.2.43.66044.2021071413531829780...,9,,512,512,9,1,...,CAMRIS^Falk,,localizer_multislice,*fl2d1,"('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM')",,,M,20210714,1.3.12.2.1107.5.2.43.66044.2021071413531829780...
1,137,1.3.12.2.1107.5.2.43.66044.2021071413542236051...,2-AAHead_Scout,1.3.12.2.1107.5.2.43.66044.2021071413542235964...,128,,160,160,128,1,...,CAMRIS^Falk,,AAHead_Scout,*fl3d1_ns,"('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM')",,,M,20210714,1.3.12.2.1107.5.2.43.66044.2021071413542235964...
2,142,1.3.12.2.1107.5.2.43.66044.2021071413542645971...,3-AAHead_Scout,1.3.12.2.1107.5.2.43.66044.2021071413542644312...,5,,162,162,5,1,...,CAMRIS^Falk,,AAHead_Scout_MPR_sag,*fl3d1_ns,"('DERIVED', 'PRIMARY', 'MPR', 'ND', 'NORM')",,,M,20210714,1.3.12.2.1107.5.2.43.66044.2021071413542644312...
3,145,1.3.12.2.1107.5.2.43.66044.2021071413542646066...,4-AAHead_Scout,1.3.12.2.1107.5.2.43.66044.2021071413542644354...,3,,162,162,3,1,...,CAMRIS^Falk,,AAHead_Scout_MPR_cor,*fl3d1_ns,"('DERIVED', 'PRIMARY', 'MPR', 'ND', 'NORM')",,,M,20210714,1.3.12.2.1107.5.2.43.66044.2021071413542644354...
4,148,1.3.12.2.1107.5.2.43.66044.2021071413542646119...,5-AAHead_Scout,1.3.12.2.1107.5.2.43.66044.2021071413542644376...,3,,162,162,3,1,...,CAMRIS^Falk,,AAHead_Scout_MPR_tra,*fl3d1_ns,"('DERIVED', 'PRIMARY', 'MPR', 'ND', 'NORM')",,,M,20210714,1.3.12.2.1107.5.2.43.66044.2021071413542644376...
5,372,1.3.12.2.1107.5.2.43.66044.2021071413592624397...,6-MPRAGE_TI1100_ipat2,1.3.12.2.1107.5.2.43.66044.2021071413553130434...,224,,256,192,224,1,...,CAMRIS^Falk,,MPRAGE_TI1100_ipat2,*tfl3d1_16,"('ORIGINAL', 'PRIMARY', 'M', 'ND', 'NORM')",,,M,20210714,1.3.12.2.1107.5.2.43.66044.2021071413553130434...
6,1185,1.3.12.2.1107.5.2.43.66044.2021071414020790336...,7-task-read_run-1_bold,1.3.12.2.1107.5.2.43.66044.2021071414015597889...,813,,80,80,36,813,...,CAMRIS^Falk,,task-read_run-1_bold,epfid2d1_80,"('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'NORM...",,,M,20210714,1.3.12.2.1107.5.2.43.66044.2021071414015597889...
7,1188,1.3.12.2.1107.5.2.43.66044.2021071414094325357...,8-task-read_run-1_epi,1.3.12.2.1107.5.2.43.66044.2021071414093378038...,3,,80,80,36,3,...,CAMRIS^Falk,,task-read_run-1_epi,epfid2d1_80,"('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'NORM...",,,M,20210714,1.3.12.2.1107.5.2.43.66044.2021071414093378038...
8,2042,1.3.12.2.1107.5.2.43.66044.2021071414110826105...,9-task-share_run-1_bold,1.3.12.2.1107.5.2.43.66044.2021071414094352151...,854,,80,80,36,854,...,CAMRIS^Falk,,task-share_run-1_bold,epfid2d1_80,"('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'NORM...",,,M,20210714,1.3.12.2.1107.5.2.43.66044.2021071414094352151...
9,2045,1.3.12.2.1107.5.2.43.66044.2021071414190592820...,10-task-share_run-1_epi,1.3.12.2.1107.5.2.43.66044.2021071414185584301...,3,,80,80,36,3,...,CAMRIS^Falk,,task-share_run-1_epi,epfid2d1_80,"('ORIGINAL', 'PRIMARY', 'M', 'MB', 'ND', 'NORM...",,,M,20210714,1.3.12.2.1107.5.2.43.66044.2021071414185584301...


* The goal is to set up a mapping between the values of some of these fields and a BIDS-compliant file name


* The most useful fields for this are `series_id` or `series_description` and the four dimension fields (`dim1` to `dim4`).
  * `dim4` is the number of volumes. It helps when a partial scan (i.e. one that was started and stopped quickly) precedes a complete one with the same `series_id`.

In [11]:
scan_df[['series_id', 'series_description', 'dim1','dim2','dim3','dim4']]

| | series_id | series_description | dim1 | dim2 | dim3 | dim4 |
|---|---|---|---|---|---|---|
| 0 | 1-localizer_multislice | localizer_multislice | 512 | 512 | 9 | 1 |
| 1 | 2-AAHead_Scout | AAHead_Scout | 160 | 160 | 128 | 1 |
| 2 | 3-AAHead_Scout | AAHead_Scout_MPR_sag | 162 | 162 | 5 | 1 |
| 3 | 4-AAHead_Scout | AAHead_Scout_MPR_cor | 162 | 162 | 3 | 1 |
| 4 | 5-AAHead_Scout | AAHead_Scout_MPR_tra | 162 | 162 | 3 | 1 |
| 5 | 6-MPRAGE_TI1100_ipat2 | MPRAGE_TI1100_ipat2 | 256 | 192 | 224 | 1 |
| 6 | 7-task-read_run-1_bold | task-read_run-1_bold | 80 | 80 | 36 | 813 |
| 7 | 8-task-read_run-1_epi | task-read_run-1_epi | 80 | 80 | 36 | 3 |
| 8 | 9-task-share_run-1_bold | task-share_run-1_bold | 80 | 80 | 36 | 854 |
| 9 | 10-task-share_run-1_epi | task-share_run-1_epi | 80 | 80 | 36 | 3 |
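
The role of `dim4` in separating complete runs from partial ones can be sketched in pandas as follows. The first row is a hypothetical aborted scan added for illustration (it is not in the BPP06 data); the remaining values are copied from the table above, and the `MIN_VOLUMES` cutoff is an assumption, not a project setting.

```python
import pandas as pd

# Illustrative rows: the first is a HYPOTHETICAL aborted run (not in the data
# above); the others copy values from the BPP06 table.
scans = pd.DataFrame(
    {
        "series_description": [
            "task-share_run-1_bold",  # hypothetical partial scan
            "task-share_run-1_bold",
            "task-share_run-1_epi",
            "task-read_run-1_bold",
        ],
        "dim4": [12, 854, 3, 813],
    }
)

MIN_VOLUMES = 100  # assumed cutoff; choose one that suits your protocol

# Keep only BOLD series that have enough volumes to be a complete run
is_bold = scans["series_description"].str.endswith("_bold")
complete = scans[is_bold & (scans["dim4"] >= MIN_VOLUMES)]
print(complete)
```

The same kind of condition can then be written into the matching rules in `heuristic.py`.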


* The Python file `heuristic.py` needs to be edited to set up the mapping between the DICOM scans described by these fields and their locations within the BIDS structure in the output folder.


* Two things need to be added to this file:
    1. __KEYS__, which provide a file template that determines where in the BIDS output a certain file type should be written
    2. __MATCHES__, which are conditional statements that match a scan using the fields in the dataframe above and map it to a specific key



* __KEYS__
![](img/heudiconv1.png)

* __CONDITIONAL MATCHES__
![](img/heudiconv2.png)
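
As a rough illustration of keys and conditional matches, here is a sketch of what `infotodict` might look like for the BPP06 scans. It is not the project's actual `heuristic.py`: only `create_key` and the `func_read` template come from the file listing earlier in this notebook, while the `t1w` key, the completed `func_share` template, the `MIN_VOLUMES` cutoff, and the `Scan` stand-in for `heudiconv`'s sequence records are assumptions made for this example.

```python
from collections import namedtuple


def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    if template is None or not template:
        raise ValueError("Template must be a valid format string")
    return template, outtype, annotation_classes


MIN_VOLUMES = 100  # assumed cutoff used to skip partial runs


def infotodict(seqinfo):
    # KEYS: one output template per scan type
    t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")  # assumed key
    func_read = create_key(
        "sub-{subject}/func/sub-{subject}_task-read_run-{item:01d}_bold"
    )
    func_share = create_key(  # assumed by analogy with func_read
        "sub-{subject}/func/sub-{subject}_task-share_run-{item:01d}_bold"
    )
    info = {t1w: [], func_read: [], func_share: []}

    # MATCHES: conditions on the dicominfo fields select a key for each scan
    for s in seqinfo:
        desc = s.series_description
        if "MPRAGE" in desc:
            info[t1w].append(s.series_id)
        elif desc.endswith("_bold") and s.dim4 >= MIN_VOLUMES:
            if desc.startswith("task-read"):
                info[func_read].append(s.series_id)
            elif desc.startswith("task-share"):
                info[func_share].append(s.series_id)
    return info


# Stand-in records with only the fields used above (values from dicominfo.tsv)
Scan = namedtuple("Scan", "series_id series_description dim4")
demo = [
    Scan("6-MPRAGE_TI1100_ipat2", "MPRAGE_TI1100_ipat2", 1),
    Scan("7-task-read_run-1_bold", "task-read_run-1_bold", 813),
    Scan("8-task-read_run-1_epi", "task-read_run-1_epi", 3),
    Scan("9-task-share_run-1_bold", "task-share_run-1_bold", 854),
]
result = {key[0]: ids for key, ids in infotodict(demo).items()}
```

In this sketch the `_epi` series and the localizers are deliberately left unmapped; whether and where they should be written depends on the project.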





* Once these are added to `heuristic.py`, which we keep in:
    ```
    /data00/projects/bbprime/scripts/BIDS/heudiconv/code
    ```
    
    
* and the `.heudiconv` folder from the heuristic-mode run has been deleted, we are ready to convert DICOM files to NIfTI and set them up in a BIDS-compliant structure!

In [14]:
rm -fr heudiconv/.heudiconv