The finishing touches on the GitHub first release and READMEs.
ericearl committed Apr 28, 2019
1 parent c61bc42 commit 36f9bb2
Showing 7 changed files with 244 additions and 64 deletions.
5 changes: 3 additions & 2 deletions DAL_ABCD_merged_pcqcinfo_importer.m
@@ -1,15 +1,15 @@
function data = DAL_ABCD_merged_pcqcinfo_importer(filename)
%% Import data from text file.
% Script for importing data from the following text file:
%
% /mnt/max/shared/projects/ABCD/daic_spreadsheet/DAL_ABCD_merged_pcqcinfo.csv
% spreadsheets/DAL_ABCD_merged_pcqcinfo.csv
%
% To extend the code to different selected data or a different text file,
% generate a function instead of a script.

% Auto-generated by MATLAB on 2018/08/14 13:31:43

%% Initialize variables.
filename = './spreadsheets/20181127_DAL_ABCD_QC_merged_pcqcinfo.csv';
delimiter = ',';
startRow = 2;

@@ -119,3 +119,4 @@
data = DALABCDmergedpcqcinfo;
QC_flag = data.QC == 1;
cleandata_idx = QC_flag;

115 changes: 81 additions & 34 deletions README.md
@@ -1,58 +1,105 @@
# ABCD QC-Passed Image Downloader
# ABCD DICOM to BIDS

## Required spreadsheets
Written by the OHSU ABCD site to selectively download ABCD Study imaging DICOM data that the ABCD DAIC site QC'ed as good, convert it to BIDS standard input data, select the best pair of spin echo field maps, and correct the sidecar JSON files to meet the BIDS Validator specification.

To download images for ABCD you must have two spreadsheets:
## Installation

1. `DAL_ABCD_merged_pcqcinfo.csv`
Clone this repository and save it somewhere on the Linux system where you want to do ABCD DICOM downloads and conversions to BIDS.

## Dependencies

1. [MathWorks MATLAB (R2016b and newer)](https://www.mathworks.com/products/matlab.html)
1. [Python 2.7](https://www.python.org/download/releases/2.7/)
1. [NIMH Data Archive (NDA) `nda_aws_token_generator`](https://github.com/NDAR/nda_aws_token_generator)
1. [cbedetti Dcm2Bids](https://github.com/cbedetti/Dcm2Bids) (`export` into your BASH `PATH` variable)
1. [Rorden Lab dcm2niix](https://github.com/rordenlab/dcm2niix) (`export` into your BASH `PATH` variable)
1. [zlib's pigz-2.4](https://zlib.net/pigz) (`export` into your BASH `PATH` variable)
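
A quick way to confirm the last three tools are actually visible on your `PATH` is a short check like this (a minimal sketch; the executable names `dcm2bids`, `dcm2niix`, and `pigz` are assumptions about your install):

```
#!/usr/bin/env python3
# Sketch: confirm the PATH-dependent tools are visible before starting.
import shutil

for tool in ('dcm2bids', 'dcm2niix', 'pigz'):  # executable names are assumptions
    found = shutil.which(tool)
    print('{:<9} {}'.format(tool, found if found else 'NOT FOUND on PATH'))
```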

## Spreadsheets (not included)

- This is provided to us by the DAIC
- Contains operator QC information for each scan. If the image fails operator QC (0) the image is not downloaded
To download images for ABCD you must have two spreadsheets downloaded to this repository's `spreadsheets` folder:

1. `DAL_ABCD_merged_pcqcinfo.csv`
1. `image03.txt`

- This is downloaded from the NIMH Data Archive.
- Contains paths to the tgz on s3 where the image is downloaded from
- Login to the NIMH Data Archive (https://ndar.nih.gov/)
- Go to "Data Dictionary" under Quick Navigation
- Select all ABCD Releases under Source
- Click 'Filter'
- Select just Image/image03
- Click 'Download'
- In the upper right hand corner under 'Selected Filters' click 'Download/Add to Study'
- Under Collections by Permission Group click 'Deselect All'
- At the bottom re-select 'Adolescent Brain Cognitive Development (ABCD)'
- Click 'Create Package'
- Name the package Image03
- Select only 'Include documentation'
- Click 'Create Package'
- Download the Package Manager and download
`DAL_ABCD_merged_pcqcinfo.csv` was provided to OHSU by the ABCD DAIC. A future version of this code will utilize the [NIMH Data Archive (NDA)](https://ndar.nih.gov/) version of this QC information. The spreadsheet contains operator QC information for each MRI series. If the image fails operator QC (a score of 0), the image is not downloaded.
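
To make the QC gate concrete, here is a minimal pandas sketch (the `QC` column name comes from the MATLAB importer in this repo; reading the file with pandas is our own illustration, not part of the pipeline):

```
import pandas as pd

# The DAIC QC spreadsheet, as placed in this repo's spreadsheets folder.
qc = pd.read_csv('spreadsheets/DAL_ABCD_merged_pcqcinfo.csv')

# A score of 0 means the series failed operator QC and is never downloaded.
passed = qc[qc['QC'] == 1]
print('{} of {} series passed operator QC'.format(len(passed), len(qc)))
```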

`image03.txt` can be downloaded from [the NDA](https://ndar.nih.gov/) with an ABCD Study Data Use Certification in place. It contains the paths to the per-series TGZ files on the NDA's Amazon S3 buckets from which the images are downloaded. The following are explicit steps to download just this file:

1. Login to the [NIMH Data Archive](https://ndar.nih.gov/)
1. Go to **Data Dictionary** under **Quick Navigation**
1. Select **All ABCD Releases** under **Source**
1. Click **Filter**
1. Select just **Image/image03**
1. Click **Download**
1. In the upper right hand corner under **Selected Filters** click **Download/Add to Study**
- Under **Collections** by **Permission Group** click **Deselect All**
- At the bottom re-select **Adolescent Brain Cognitive Development (ABCD)**
1. Click **Create Package**
- Name the package something like **Image03**
- Select only **Include documentation**
- Click **Create Package**
1. Download and use the **Package Manager** to download your package

## Setup

You will also need to edit `nda_aws_token_maker.py` in this repository and update it with your NDA USERNAME and PASSWORD. Make sure the file is locked down to only your own read and write privileges so no one else can read your username and password there:

```
chmod 600 nda_aws_token_maker.py
```

We don't have a better solution for securing your credentials while automating downloads right now.
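
One small safeguard you could add (a hypothetical helper, not part of this repo) is to refuse to run when the credentials file is readable by anyone else:

```
import os
import stat
import sys

# Abort if group/other permission bits are set, i.e. the file is not chmod 600-style.
mode = os.stat('nda_aws_token_maker.py').st_mode
if mode & (stat.S_IRWXG | stat.S_IRWXO):
    sys.exit('nda_aws_token_maker.py is readable by others; run: chmod 600 nda_aws_token_maker.py')
```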

These two files are used in the `data_gatherer.m` to create the `ABCD_good_bad_series_table.csv` that is used to actually download the images.
## Usage

`data_gatherer.m` also depends on a mapping file (`mapping.mat`), which maps among other things the SeriesDescriptions from each txt to known OHSU descriptors that classify each tgz into T1, T2, rfMRI, tfMRI_nBack, etc.
This repo's usage is broken out into four distinct scripting stages. Run them in order, waiting for each one to complete before starting the next.

## Required software dependencies
1. (MATLAB) `data_gatherer.m`
2. (Python) `good_bad_series_parser.py`
3. (BASH) `unpack_and_setup.sh`
4. (Python) `correct_jsons.py`

From there, other necessary scripts are the `nda_aws_token_maker.py` to be called before each attempted DICOM series TGZ download. Requires:
The MATLAB portion produces the download list; the Python and BASH portions then download, convert, select, and prepare the data (a wrapper running all four stages in order is sketched below).
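
For illustration only, a hypothetical wrapper that runs the four stages in order and stops on the first failure might look like this (the MATLAB batch-mode invocation and the example subject are assumptions; in practice stage 3 is looped per subject/session):

```
import subprocess

stages = [
    # Assumed MATLAB batch-mode invocation; adjust to your installation.
    ['matlab', '-nodisplay', '-nosplash', '-r', 'data_gatherer; exit'],
    ['./good_bad_series_parser.py'],
    # Stage 3 is normally looped over subjects/sessions; one example call here.
    ['./unpack_and_setup.sh', 'sub-NDARINVABCD1234', 'ses-baselineYear1Arm1',
     './new_download/sub-NDARINVABCD1234/ses-baseline_year_1_arm_1'],
    ['./correct_jsons.py', './ABCD-HCP'],
]

for cmd in stages:
    print('Running:', ' '.join(cmd))
    subprocess.run(cmd, check=True)  # check=True raises, halting on the first failure
```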

- https://github.com/NDAR/nda_aws_token_generator
## 1. (MATLAB) `data_gatherer.m`

You will also need to reach into the `nda_aws_token_maker.py` and update it with your NDA USERNAME and PASSWORD. Make sure the file is locked down with `chmod 600 nda_aws_token_maker.py` so no one else can read your username and password in there. We don't have a better solution right now.
The two spreadsheets referenced above are used by `data_gatherer.m` to create `ABCD_good_and_bad_series_table.csv`, which is then used to actually download the images.

# Usage
`data_gatherer.m` depends on a mapping file (`mapping.mat`), which maps the SeriesDescriptions to known OHSU descriptors that classify each TGZ file into T1, T2, task-rest, task-nback, etc.
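
The contents of `mapping.mat` are not reproduced here, but conceptually it is a lookup table; a hypothetical Python analogue (these SeriesDescription keys are made up purely for illustration) would be:

```
# Hypothetical analogue of mapping.mat: SeriesDescription -> scan type.
SERIES_TYPE = {
    'EXAMPLE-T1-DESCRIPTION': 'T1',
    'EXAMPLE-T2-DESCRIPTION': 'T2',
    'EXAMPLE-REST-DESCRIPTION': 'task-rest',
    'EXAMPLE-NBACK-DESCRIPTION': 'task-nback',
}

def classify(series_description):
    # Returns None for unknown descriptions so they can be flagged for review.
    return SERIES_TYPE.get(series_description)
```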

The actual download work is done by `good_bad_series_parser.py` which only requires the `ABCD_good_bad_series_table.csv` spreadsheet present under `./spreadsheets/`.
## 2. (Python) `good_bad_series_parser.py`

The download is done like this:

# ABCD TGZ to BIDS Input Setup
```
./good_bad_series_parser.py
```

This only requires the `ABCD_good_and_bad_series_table.csv` spreadsheet present under a `spreadsheets` folder inside this repository's cloned folder.

**Note:** `nda_aws_token_maker.py` is called before each attempted DICOM series TGZ download, as sketched below.
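
As a sketch of what that per-series flow looks like (the use of the AWS CLI and the `python` invocation are our assumptions; the real logic lives in `good_bad_series_parser.py`):

```
import subprocess

def download_series(s3_url, dest_dir):
    # Refresh the NDA AWS token first: tokens expire, and every series
    # is fetched as its own TGZ, so the token maker runs before each download.
    subprocess.run(['python', 'nda_aws_token_maker.py'], check=True)
    # Assumed: the AWS CLI picks up the credentials the token maker wrote.
    subprocess.run(['aws', 's3', 'cp', s3_url, dest_dir], check=True)
```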

`unpack_and_setup.sh` does the work. It takes three arguments:
## 3. (BASH) `unpack_and_setup.sh`

`unpack_and_setup.sh` should be called in a loop to do the DICOM to BIDS conversion and spin echo field map selection. It takes three arguments:

```
SUB=$1 # Full BIDS formatted subject ID (sub-SUBJECTID)
VISIT=$2 # Full BIDS formatted session ID (ses-SESSIONID)
TGZDIR=$3 # Path to directory containing all .tgz for subject
TGZDIR=$3 # Path to directory containing all TGZ files for SUB/VISIT
```

**IMPORTANT**: update paths inside the script everywhere a `...` appears.
Here is an example:

```
./unpack_and_setup.sh sub-NDARINVABCD1234 ses-baselineYear1Arm1 ./new_download/sub-NDARINVABCD1234/ses-baseline_year_1_arm_1
```
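
If you want to loop over everything that was downloaded, a sketch like this works (the directory layout is assumed to match the example above):

```
import glob
import os
import subprocess

# Assumed layout: ./new_download/sub-<ID>/ses-<SESSION>/ containing that visit's TGZs.
for tgz_dir in sorted(glob.glob('./new_download/sub-*/ses-*')):
    if not os.path.isdir(tgz_dir):
        continue
    visit = os.path.basename(tgz_dir)
    subject = os.path.basename(os.path.dirname(tgz_dir))
    # NOTE: the BIDS session label can differ from the folder name
    # (ses-baseline_year_1_arm_1 vs. ses-baselineYear1Arm1 in the example above),
    # so a real loop would reformat `visit` here.
    subprocess.run(['./unpack_and_setup.sh', subject, visit, tgz_dir], check=True)
```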

## 4. (Python) `correct_jsons.py`

Finally, `correct_jsons.py` is run over the whole BIDS input directory to correct and prepare all BIDS sidecar JSON files to comply with the BIDS specification, version 1.2.0.

```
./correct_jsons.py ./ABCD-HCP
```
115 changes: 115 additions & 0 deletions correct_jsons.py
@@ -0,0 +1,115 @@
#! /usr/bin/env python3

import argparse
import json
import os
import re
import sys

__doc__ = \
"""
This script corrects ABCD BIDS input data to
conform to the official BIDS Validator.
"""
__version__ = "1.0.0"

def read_json_field(json_path, json_field):
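    # Return the value of json_field from the JSON file at json_path, or None if it is absent.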

    with open(json_path, 'r') as f:
        data = json.load(f)

    if json_field in data:
        return data[json_field]
    else:
        return None

def remove_json_field(json_path, json_field):
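    # Delete json_field from the JSON file in place; returns True if the field was present.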

    with open(json_path, 'r+') as f:
        data = json.load(f)

        if json_field in data:
            del data[json_field]
            f.seek(0)
            json.dump(data, f, indent=4)
            f.truncate()
            flag = True
        else:
            flag = False

    return flag

def update_json_field(json_path, json_field, value):
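    # Set json_field to value in the JSON file in place; returns True if the field already existed.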

    with open(json_path, 'r+') as f:
        data = json.load(f)

        if json_field in data:
            flag = True
        else:
            flag = False

        data[json_field] = value
        f.seek(0)
        json.dump(data, f, indent=4)
        f.truncate()

    return flag

def main(argv=sys.argv):
    parser = argparse.ArgumentParser(
        prog='correct_jsons.py',
        description=__doc__,
        usage='%(prog)s BIDS_DIR'
    )
    parser.add_argument(
        'BIDS_DIR',
        help='Path to the input BIDS dataset root directory. Read more '
             'about the BIDS standard in the link in the description. It is '
             'recommended to use Dcm2Bids to convert from participant dicoms '
             'into BIDS format.'
    )
    parser.add_argument(
        '--version', '-v', action='version', version='%(prog)s ' + __version__
    )

    args = parser.parse_args()

    for root, dirs, files in os.walk(args.BIDS_DIR):
        for filename in files:
            fn, ext = os.path.splitext(filename)

            if ext == '.json':
                json_path = os.path.join(root, filename)

                with open(json_path, 'r') as f:
                    data = json.load(f)

                # If TotalReadoutTime is missing from an fmap or func JSON
                if ('fmap' in root or 'func' in root) and 'TotalReadoutTime' not in data:
                    # Then check for EffectiveEchoSpacing and ReconMatrixPE
                    if 'EffectiveEchoSpacing' in data and 'ReconMatrixPE' in data:
                        # If both are present then update the JSON with a calculated TotalReadoutTime
                        EffectiveEchoSpacing = data['EffectiveEchoSpacing']
                        ReconMatrixPE = data['ReconMatrixPE']
                        # Calculated TotalReadoutTime = EffectiveEchoSpacing * (ReconMatrixPE - 1)
                        TotalReadoutTime = EffectiveEchoSpacing * (ReconMatrixPE - 1)
                        update_json_field(json_path, 'TotalReadoutTime', TotalReadoutTime)

                    # If EffectiveEchoSpacing is missing print error
                    if 'EffectiveEchoSpacing' not in data:
                        print(json_path + ': No EffectiveEchoSpacing')

                    # If ReconMatrixPE is missing print error
                    if 'ReconMatrixPE' not in data:
                        print(json_path + ': No ReconMatrixPE')

                # Find the IntendedFor field that is a non-empty list
                if 'fmap' in root and 'IntendedFor' in data and len(data['IntendedFor']) > 0:
                    # Regular expression replace all paths in that list with a relative path to ses-SESSION
                    intended_list = data['IntendedFor']
                    corrected_intended_list = [re.sub(r'.*(ses-.*_ses-.+)', r'\g<1>', entry) for entry in intended_list]
                    update_json_field(json_path, 'IntendedFor', corrected_intended_list)

                # Remove SliceTiming field from func JSONs
                if 'func' in root and 'SliceTiming' in data:
                    remove_json_field(json_path, 'SliceTiming')

if __name__ == "__main__":
    sys.exit(main())
27 changes: 20 additions & 7 deletions data_gatherer.m
@@ -1,8 +1,15 @@
%%
%% variable initialization

clear variables
load mapping.mat

DAL_ABCD_merged_pcqcinfo_importer;
QC_file = 'spreadsheets/DAL_ABCD_QC_merged_pcqcinfo.csv';
image03_file = 'spreadsheets/image03.txt';
output_csv = 'spreadsheets/ABCD_good_and_bad_series_table.csv';

%% QC parsing

data = DAL_ABCD_merged_pcqcinfo_importer(QC_file);

for i = 1:height(data)
if data.SeriesTime(i) < 100000
@@ -13,8 +20,10 @@
end
data.CleanFlag = cleandata_idx;

%%
image03_importer;
%% image03 parsing

image03 = image03_importer(image03_file);

for i = 1:height(image03)
    image03.timestamp{i} = image03.image_file{i}(end-10:end-5);
end
@@ -26,19 +35,23 @@
image03_2.Properties.VariableNames{1} = 'pGUID';
image03_2.Properties.VariableNames{9} = 'EventName';

%%
%% table joins

data_1 = innerjoin(data,map_qc_descriptor);
data_1 = sortrows(data_1,'pGUID','ascend');

% Hack to deal with quotations around strings in table
foo = image03_2.SeriesType;
[l,w] = size(foo);
for i=1:l
    foo(i) = strjoin(['"' string(foo(i)) '"'],'');
end
image03_2.SeriesType = foo;

data_2 = innerjoin(data_1,image03_2);


%%
writetable(data_2,'./spreadsheets/ABCD_good_and_bad_series_table.csv');
%% final output table (path hardcoded)

writetable(data_2,output_csv);

2 changes: 1 addition & 1 deletion image03_importer.m
@@ -1,3 +1,4 @@
function image03 = image03_importer(filename)
%% Import data from text file.
% Script for importing data from the following text file:
%
@@ -9,7 +10,6 @@
% Auto-generated by MATLAB on 2018/08/14 10:58:42

%% Initialize variables.
filename = './spreadsheets/20181127_image03.txt';
delimiter = '\t';
startRow = 3;

7 changes: 7 additions & 0 deletions spreadsheets/README.md
@@ -0,0 +1,7 @@
# `spreadsheets` folder

This is where the spreadsheets belong:

1. `DAL_ABCD_merged_pcqcinfo.csv` (the DAIC QC info)
1. `image03.txt` (the NDA DICOM imaging data info)
1. `ABCD_good_and_bad_series_table.csv` (generated after `data_gatherer.m` is run)
