# DICOM-to-BIDS conversion using `heudiconv` and `singularity`

* This notebook walks through the steps for doing DICOM to BIDS conversion using `heudiconv` with the Singularity image for a single participant to identify the correct parameters


* For a new project, the initial configuration steps need to be completed once to create the `heuristic.py` file. After that, it should be straightforward to run new participants.

* This notebook is based on the following `heudiconv` tutorial, with some tweaks:
    - http://reproducibility.stanford.edu/bids-tutorial-series-part-2a/


#### HISTORY

* 6/14/22 aresni - reformatted to take in a config.json, streamline use
* 9/9/21 dcosme - separated into two notebooks; this notebook is to identify the parameters, the other creates jobs for each participant to convert
* 4/1/20 mbod - initial setup for MURI DICOMS

## Initialize Paths

* Run the cell below to initialize paths from the Config.json file. If Config is set up for this project, you shouldn't have to make any changes to the cell below. Check the output to make sure the directories match what you expect.

In [None]:
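import json  # needed for json.load below
import os
import sys   # used for error reporting in the conversion cell

import pandas as pd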

config_path = "../Config.json"

with open(config_path, 'r') as f:
    config = json.load(f)
path_object = "Environment"

project_name = config.get("Description").get("project_name")

project_directory = config.get(path_object).get( # Path to base project directory
    'project_path','/data00/projects/{project_name}').format(project_name = project_name) 

raw_directory = config.get(path_object).get(   # Path to base raw fmri directory
    'raw_path','/fmriDataRaw/fmri_data_raw') 

heudiconv_image_path = config.get(path_object).get(    # path to heudiconv image
    "heudiconv_image_path", 'Config["Environment"]["heudiconv_image_path"] Not Found')

bids_data_directory = os.path.join(
    config.get(path_object).get(    # path to where your bids_data will be stored within base
        "bids_data_directory", "data/bids_data")
)

bids_scripts_directory = config.get(path_object).get(       # path to where this script lives (scripts/BIDS)
        "bids_script_directory", "scripts/BIDS")

job_directory = os.path.join(
    project_directory,
    bids_scripts_directory,
    'jobs')

print('Project:', project_name)
print('\nDirectories:')
print('Project Directory:', project_directory)
print('Raw Dicom Project Directory:', os.path.join(raw_directory, project_name))
print('BIDS Data Directory:', os.path.join(project_directory, bids_data_directory))
print('BIDS Scripts Directory:', os.path.join(project_directory, bids_scripts_directory))
print('Jobs directory:', os.path.join(project_directory, bids_scripts_directory, 'jobs'))
print('Heudiconv Singularity Image:', heudiconv_image_path)
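
For reference, the cell above reads a small set of keys from `Config.json`. The sketch below writes and re-reads a minimal file with that layout. The key names come from the cell above; the values (`MYPROJECT`, the temporary file location) are placeholders for illustration, not a real project setup.

```python
import json
import os
import tempfile

# Placeholder values; only the key names come from the cell above.
example_config = {
    "Description": {"project_name": "MYPROJECT"},
    "Environment": {
        "project_path": "/data00/projects/{project_name}",
        "raw_path": "/fmriDataRaw/fmri_data_raw",
        "heudiconv_image_path": "/data00/tools/singularity_images/heudiconv_0.8.0",
        "bids_data_directory": "data/bids_data",
        "bids_script_directory": "scripts/BIDS",
    },
}

# Write the example to a temporary file so no real Config.json is touched.
with tempfile.TemporaryDirectory() as tmp:
    example_path = os.path.join(tmp, "Config.json")
    with open(example_path, "w") as f:
        json.dump(example_config, f, indent=2)
    with open(example_path) as f:
        loaded = json.load(f)

# Same lookup and formatting steps as the notebook cell.
example_project = loaded["Description"]["project_name"]
example_project_dir = loaded["Environment"]["project_path"].format(
    project_name=example_project)
print(example_project_dir)
```

Any key missing from the `Environment` block falls back to the default given in the cell above.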

## Location of files

1. DICOMS for UPenn data are in:
    ```
    /fmriDataRaw/fmri_data_raw/{project}/
       
    ```
    
    * DICOMS (`.dcm` files) should match:
    ```
    /fmriDataRaw/fmri_data_raw/{project}/{subject}/{scan}/*.dcm
    ```
    
    
2. BIDS files should be in (you may need to create these directories):
    ```
    /data00/projects/{project}/data/bids_data
    ```
    
3. Config files for `heudiconv` will be in
    ```
    /data00/projects/{project}/scripts/BIDS/heudiconv
    ```

In [None]:
os.listdir(os.path.join(raw_directory, project_name)) 

In [None]:
os.listdir(os.path.join(project_directory, bids_data_directory))

### Setup

## Create the heuristic configuration files for a new project

* For a new project, you need to run `heudiconv` in heuristic mode once on a representative participant (i.e. one who has all the possible scans for a subject in the study). It will generate:
    - a TSV file called `dicominfo.tsv` that contains the details of each of the scans in the dataset
    - a Python template file called `heuristic.py` that you edit to set up the mapping from DICOM volumes to NIFTI files
    
    
* To run `heudiconv` in heuristic mode with Singularity image, edit the cell below using the correct paths for your directory (replace {Project Name}). Use:
    ```
    !singularity run --cleanenv \
        -B /data00/projects/{Project Name}/data/bids_data:/base  \
        -B /fmriDataRaw/fmri_data_raw:/raw \
        /data00/tools/singularity_images/heudiconv_0.8.0 \
        -d /raw/{Project Name}/{subject}/*/*.dcm \
        -o heudiconv/ -f convertall -s {subject} -c none --overwrite
    ```
     where
     * `-B /data00/projects/{Project Name}/data/bids_data:/base` makes a file mapping for singularity to where the data should be written (i.e. the project's `bids_data` directory is available inside the container as `/base`)
     * `-B /fmriDataRaw/fmri_data_raw:/raw` maps the DICOM directory to singularity location `/raw`
     * `/data00/tools/singularity_images/heudiconv_0.8.0` is the path to the `heudiconv` Singularity image you want to use. (On 4/22/20 that is version 0.8.0 - but this will change and we try to keep an updated version)
     * `-d /raw/{Project Name}/{subject}/*/*.dcm` is the file matching template for finding DICOM files. Here `/raw` maps to `/fmriDataRaw/fmri_data_raw` so we are looking for files that end with `.dcm` in `/fmriDataRaw/fmri_data_raw/{Project Name}/{subject}/*/*.dcm` - where subject will be specified with the -s param, e.g. `MURI155`
     * `-o heudiconv/` is the location for output files from `heudiconv` to be written, relative to the directory the command is run from
     * `-f convertall` means include all dicom files and scans (i.e. not just one type like anatomical or ignore localizers etc)
     * `-c none` puts it into heuristic mode. This option specifies which tool to use to do dicom2nifti conversion and will be changed once the configuration is set up.

## Enter subject and session below

* Edit the subject template to match how your raw dicoms are saved, but don't remove the '{subject}'
* Then, run the two cells below to get your .tsv file you can use to edit your heuristic.py

In [None]:
sub = ''     # Subject ID as a single string (e.g. 'MURI155'); it is inserted directly into paths and the -s flag
session = '' # Run one session at a time. If only one session, leave as empty string

raw_subject_template = os.path.join(
    '{subject}', # Keep the '{subject}' string, but feel free to add any suffixes e.g. '{subject}_session01'
    '*',
    '*.dcm'      # Pattern matching to get all of the raw dicoms in the subject's folder  
)
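
Before launching `heudiconv`, it can help to confirm that the template actually matches some DICOM files. The sketch below uses a hypothetical helper, `find_dicoms`, which is not part of `heudiconv`. It fills in `{subject}` with Python's `str.format`, which only approximates what `heudiconv` does with the `-d` template. The demo runs on a throwaway directory tree with placeholder names.

```python
import glob
import os
import tempfile

def find_dicoms(raw_root, project, subject_template, subject):
    """Hypothetical helper: list files matched once '{subject}' is filled in."""
    pattern = os.path.join(raw_root, project,
                           subject_template.format(subject=subject))
    return sorted(glob.glob(pattern))

# Demo on a temporary directory; project and subject names are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    scan_dir = os.path.join(tmp, "MYPROJECT", "SUBJ01", "scan_001")
    os.makedirs(scan_dir)
    for name in ("0001.dcm", "0002.dcm", "notes.txt"):
        open(os.path.join(scan_dir, name), "w").close()

    demo_template = os.path.join("{subject}", "*", "*.dcm")
    demo_matches = find_dicoms(tmp, "MYPROJECT", demo_template, "SUBJ01")
    demo_names = [os.path.basename(p) for p in demo_matches]

print(demo_names)
```

In this notebook the equivalent call would be `find_dicoms(raw_directory, project_name, raw_subject_template, sub)`. An empty list suggests the template or subject ID does not match how the raw DICOMs are stored.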

In [None]:
getHeuristic_template = r'''#!/bin/bash
singularity run --cleanenv \
    -B {project_directory}:/base  \
    -B {raw_directory}:/raw \
    {heudiconv_image_path} \
    -d /raw/{project_name}/{raw_subject_template} \
    -o heudiconv/ -f convertall {session_flag}-s {pID} -c none --overwrite
'''


session_flag = ''
if session:
    session_flag = '-ss ' + session + ' '

getHeuristic = getHeuristic_template.format( 
            project_directory = project_directory,
            raw_directory = raw_directory,
            heudiconv_image_path = heudiconv_image_path,
            project_name = project_name,
            raw_subject_template = raw_subject_template,
            bids_data_directory = bids_data_directory,
            bids_scripts_directory = bids_scripts_directory,
            session_flag = session_flag,
            pID = sub)
print("Running:\n", getHeuristic)
!{getHeuristic}

* The output from this run is in a hidden folder called .heudiconv

In [None]:
import glob
glob.glob('heudiconv/.heudiconv/{}/info/*'.format(sub))

* We can look at the `dicominfo.tsv` in Pandas
* The goal is to set up a mapping between the values of some of these fields and a BIDS-compliant file


* The most useful fields for this are the `series_id` or `series_description` and the 4 dimension fields.
  * `dim4` is the number of volumes and using this can help when there is a partial (i.e. scan started and stopped quickly) scan before a complete one where the `series_id` will be the same.

In [None]:
scan_df=pd.read_csv('heudiconv/.heudiconv/{}/info/dicominfo.tsv'.format(sub), sep='\t')
scan_df[['series_id', 'series_description', 'dim1','dim2','dim3','dim4']]
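
One way to spot partial scans is to compare each series' `dim4` with the largest `dim4` among series that share the same description. The sketch below does this on a small made-up table; the values are illustrative only, and grouping by `series_description` is an assumption about how repeated scans are labelled in your data.

```python
import pandas as pd

# Made-up rows with the same columns as dicominfo.tsv (values are illustrative).
demo_df = pd.DataFrame({
    "series_id": ["5-read", "6-read", "7-share"],
    "series_description": ["read", "read", "share"],
    "dim4": [12, 824, 865],
})

# Flag a series as partial when it has fewer volumes than the longest
# series with the same description.
max_vols = demo_df.groupby("series_description")["dim4"].transform("max")
demo_df["partial"] = demo_df["dim4"] < max_vols

print(demo_df)
```

This is only a screening aid: runs of the same task that legitimately differ in length would also be flagged, so check flagged rows by hand before writing the conditions in `heuristic.py`.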

* There is a Python file called `heuristic.py` that needs to be edited to set up the mappings between dicom files and these scan data and the location within the BIDS structure in the output folder.


* The two things that need to be added to this file are:
    1. __KEYS__ that provide a file template and mapping to where in the BIDS output a certain filetype should be written
    2. __MATCHES__ are condition statements that match a scan using the fields in the dataframe above and map them to a specific key



* __KEYS__
Use the `create_key` function to create keys inside the `infotodict` function. The example below creates keys for the T1 scan, the different functional runs, and the field maps. The keys are put into a dictionary, and the values are added using the conditional matching.

```python
    # Inside infotodict(): one key per output file type
    t1w = create_key('sub-{subject}/anat/sub-{subject}_T1w')

    func_read = create_key('sub-{subject}/func/sub-{subject}_task-read_run-{item:01d}_bold')
    func_share = create_key('sub-{subject}/func/sub-{subject}_task-share_run-{item:01d}_bold')

    fmap_AP = create_key('sub-{subject}/fmap/sub-{subject}_acq-{item:01d}_dir-AP_epi')

    # Each key starts with an empty list; matching series are appended below
    info = {
        t1w: [],
        func_read: [], func_share: [],
        fmap_AP: []
    }
```

![](img/heudiconv1.png)


* __CONDITIONAL MATCHES__
Now, using the information from `dicominfo.tsv`, we go through the series and assign each one to a key only when it has the full number of volumes.
```python
        # Inside the loop over series in infotodict(); `s` is one row of dicominfo.tsv
        if (s.dim1 == 256) and (s.dim2 == 192) and ('MPRAGE_TI1100_ipat2' in s.series_id):
            info[t1w].append(s.series_id)

        if (s.dim4 == 824 or s.dim4 == 813) and ('read' in s.series_id):
            info[func_read].append(s.series_id)

        if (s.dim4 == 865 or s.dim4 == 854 or s.dim4 == 568) and ('share' in s.series_id):
            info[func_share].append(s.series_id)

        if (s.dim4 == 3) and ('epi' in s.series_id):
            info[fmap_AP].append(s.series_id)
```
![](img/heudiconv2.png)
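
To show how the keys and matches fit together, here is a minimal, self-contained sketch of a heuristic with the same shape as the fragments above. The `create_key` body follows the usual `heudiconv` template, but check it against the `heuristic.py` generated for your project. The `Series` namedtuple and the `demo_seqinfo` rows are stand-ins invented for this demo; they are not `heudiconv` objects.

```python
from collections import namedtuple

def create_key(template, outtype=('nii.gz',), annotation_classes=None):
    # A key is a tuple: (output path template, output types, annotations)
    if not template:
        raise ValueError('Template must be a valid format string')
    return template, outtype, annotation_classes

def infotodict(seqinfo):
    """Map each series to a BIDS key using fields from dicominfo.tsv."""
    t1w = create_key('sub-{subject}/anat/sub-{subject}_T1w')
    func_read = create_key(
        'sub-{subject}/func/sub-{subject}_task-read_run-{item:01d}_bold')

    info = {t1w: [], func_read: []}

    for s in seqinfo:
        if (s.dim1 == 256) and (s.dim2 == 192) and ('MPRAGE_TI1100_ipat2' in s.series_id):
            info[t1w].append(s.series_id)
        # Only complete runs (full number of volumes) are kept
        if (s.dim4 == 824 or s.dim4 == 813) and ('read' in s.series_id):
            info[func_read].append(s.series_id)
    return info

# Stand-in records for the demo (made-up values, not real scan data).
Series = namedtuple('Series', ['series_id', 'dim1', 'dim2', 'dim3', 'dim4'])
demo_seqinfo = [
    Series('2-MPRAGE_TI1100_ipat2', 256, 192, 160, 1),
    Series('5-read', 64, 64, 40, 12),    # aborted run, too few volumes
    Series('6-read', 64, 64, 40, 824),   # complete run
]

demo_info = infotodict(demo_seqinfo)
print({key[0]: series for key, series in demo_info.items()})
```

In the demo, the aborted `5-read` series is skipped because its `dim4` does not match, which is the behaviour the `dim4` conditions are meant to give.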





* Once these are added to `heuristic.py`, save it in:
    ```
    {Your Project}/scripts/BIDS/heudiconv/code
    ```
    
    
* Then delete the hidden `.heudiconv` folder created by the heuristic run. After that, we are ready to convert DICOM files to NIFTIs and set them up in a BIDS-compliant structure!

In [None]:
rm -fr heudiconv/.heudiconv

## Try it out

Run the cell below to convert the DICOMs of the subject you just identified to BIDS, then check the output. If `heuristic.py` will work for the rest of your participants, run them using the `dicom2bids_heudiconv_loop` notebook. If there are errors, or your data isn't correctly converted, edit `heuristic.py` and try again.

In [None]:
heudiConvert_template = r'''#!/bin/bash
singularity run --cleanenv \
    -B {project_directory}:/base  \
    -B {raw_directory}:/raw \
    {heudiconv_image_path} \
    -d /raw/{project_name}/{raw_subject_template} \
    -o /base/{bids_data_directory} \
    -f /base/{bids_scripts_directory}/heudiconv/code/heuristic.py {session_flag}-s {pID} -c dcm2niix -b --overwrite
'''

session_flag = ''
if session:
    session_flag = '-ss ' + session + ' '

# Create the jobs directory (and any missing parent directories) if needed
os.makedirs(job_directory, exist_ok=True)

# Name the job file after the subject (and session, if one was given)
if session:
    job_name = 'heudiconv_{}_ses-{}.job'.format(sub, session)
else:
    job_name = 'heudiconv_{}.job'.format(sub)
file_path = os.path.join(job_directory, job_name)

heudiConvert = heudiConvert_template.format(
       project_directory = project_directory,
       raw_directory = raw_directory,
       heudiconv_image_path = heudiconv_image_path,
       project_name = project_name,
       raw_subject_template = raw_subject_template,
       bids_data_directory = bids_data_directory,
       bids_scripts_directory = bids_scripts_directory,
       session_flag = session_flag,
       pID = sub)
try:
    with open(file_path, 'w') as job:
        job.write(heudiConvert)
    print("-------------- Job Created For: {} -------------\n".format(sub), heudiConvert)
except IOError as e:
    print("I/O error({0}): {1}".format(e.errno, e.strerror))
except Exception as e:  # handle other exceptions such as attribute errors
    print("Unexpected error:", e)

print('-------------- Running: {} -------------'.format(sub))
print(file_path)

!bash $file_path
print('-------------- Subject {} Done -------------'.format(sub))