The finishing touches on the GitHub first release and READMEs.
ericearl committed Apr 28, 2019
1 parent c61bc42 commit 36f9bb2
Showing 7 changed files with 244 additions and 64 deletions.
5 changes: 3 additions & 2 deletions DAL_ABCD_merged_pcqcinfo_importer.m
@@ -1,15 +1,15 @@
function data = DAL_ABCD_merged_pcqcinfo_importer(filename)
%% Import data from text file.
% Script for importing data from the following text file:
%
% /mnt/max/shared/projects/ABCD/daic_spreadsheet/DAL_ABCD_merged_pcqcinfo.csv
% spreadsheets/DAL_ABCD_merged_pcqcinfo.csv
%
% To extend the code to different selected data or a different text file,
% generate a function instead of a script.

% Auto-generated by MATLAB on 2018/08/14 13:31:43

%% Initialize variables.
filename = './spreadsheets/20181127_DAL_ABCD_QC_merged_pcqcinfo.csv';
delimiter = ',';
startRow = 2;

@@ -119,3 +119,4 @@
data = DALABCDmergedpcqcinfo;
QC_flag = data.QC == 1;
cleandata_idx = QC_flag;

115 changes: 81 additions & 34 deletions README.md
@@ -1,58 +1,105 @@
# ABCD QC-Passed Image Downloader
# ABCD DICOM to BIDS

## Required spreadsheets
Written by the OHSU ABCD site to selectively download ABCD Study imaging DICOM data that the ABCD DAIC site QC'ed as good, convert it to BIDS standard input data, select the best pair of spin echo field maps, and correct the sidecar JSON files to meet the BIDS Validator specification.

To download images for ABCD you must have two spreadsheets:
## Installation

1. `DAL_ABCD_merged_pcqcinfo.csv`
Clone this repository and save it somewhere on the Linux system where you want to do ABCD DICOM downloads and conversions to BIDS.

## Dependencies

1. [MathWorks MATLAB (R2016b and newer)](https://www.mathworks.com/products/matlab.html)
1. [Python 2.7](https://www.python.org/download/releases/2.7/)
1. [NIMH Data Archive (NDA) `nda_aws_token_generator`](https://github.com/NDAR/nda_aws_token_generator)
1. [cbedetti Dcm2Bids](https://github.com/cbedetti/Dcm2Bids) (`export` into your BASH `PATH` variable)
1. [Rorden Lab dcm2niix](https://github.com/rordenlab/dcm2niix) (`export` into your BASH `PATH` variable)
1. [zlib's pigz-2.4](https://zlib.net/pigz) (`export` into your BASH `PATH` variable)
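
A quick way to confirm the last three tools are actually visible on your `PATH` is a short check like this (a minimal sketch; the executable names `dcm2bids`, `dcm2niix`, and `pigz` are assumptions about your install):

```
#!/usr/bin/env python3
# Sketch: confirm the PATH-dependent tools are visible before starting.
import shutil

for tool in ('dcm2bids', 'dcm2niix', 'pigz'):  # executable names are assumptions
    found = shutil.which(tool)
    print('{:<9} {}'.format(tool, found if found else 'NOT FOUND on PATH'))
```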

## Spreadsheets (not included)

- This is provided to us by the DAIC
- Contains operator QC information for each scan. If the image fails operator QC (0) the image is not downloaded
To download images for ABCD you must have two spreadsheets downloaded to this repository's `spreadsheets` folder:

1. `DAL_ABCD_merged_pcqcinfo.csv`
1. `image03.txt`

- This is downloaded from the NIMH Data Archive.
- Contains paths to the tgz on s3 where the image is downloaded from
- Login to the NIMH Data Archive (https://ndar.nih.gov/)
- Go to "Data Dictionary" under Quick Navigation
- Select all ABCD Releases under Source
- Click 'Filter'
- Select just Image/image03
- Click 'Download'
- In the upper right hand corner under 'Selected Filters' click 'Download/Add to Study'
- Under Collections by Permission Group click 'Deselect All'
- At the bottom re-select 'Adolescent Brain Cognitive Development (ABCD)'
- Click 'Create Package'
- Name the package Image03
- Select only 'Include documentation'
- Click 'Create Package'
- Download the Package Manager and download
`DAL_ABCD_merged_pcqcinfo.csv` was provided to OHSU by the ABCD DAIC. A future version of this code will utilize the [NIMH Data Archive (NDA)](https://ndar.nih.gov/) version of this QC information. The spreadsheet contains operator QC information for each MRI series. If the image fails operator QC (a score of 0), the image is not downloaded.
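
To make the QC gate concrete, here is a minimal pandas sketch (the `QC` column name comes from the MATLAB importer in this repo; reading the file with pandas is our own illustration, not part of the pipeline):

```
import pandas as pd

# The DAIC QC spreadsheet, as placed in this repo's spreadsheets folder.
qc = pd.read_csv('spreadsheets/DAL_ABCD_merged_pcqcinfo.csv')

# A score of 0 means the series failed operator QC and is never downloaded.
passed = qc[qc['QC'] == 1]
print('{} of {} series passed operator QC'.format(len(passed), len(qc)))
```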

`image03.txt` can be downloaded from [the NDA](https://ndar.nih.gov/) with an ABCD Study Data Use Certification in place. It contains the paths to the per-series TGZ files on the NDA's Amazon S3 buckets from which the images are downloaded. The following are explicit steps to download just this file:

1. Login to the [NIMH Data Archive](https://ndar.nih.gov/)
1. Go to **Data Dictionary** under **Quick Navigation**
1. Select **All ABCD Releases** under **Source**
1. Click **Filter**
1. Select just **Image/image03**
1. Click **Download**
1. In the upper right hand corner under **Selected Filters** click **Download/Add to Study**
- Under **Collections** by **Permission Group** click **Deselect All**
- At the bottom re-select **Adolescent Brain Cognitive Development (ABCD)**
1. Click **Create Package**
- Name the package something like **Image03**
- Select only **Include documentation**
- Click **Create Package**
1. Download and use the **Package Manager** to download your package

## Setup

You will also need to edit `nda_aws_token_maker.py` in this repository and update it with your NDA USERNAME and PASSWORD. Make sure the file is locked down to only your own read and write privileges so no one else can read your username and password there:

```
chmod 600 nda_aws_token_maker.py
```

We don't have a better solution for securing your credentials while automating downloads right now.
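
One small safeguard you could add (a hypothetical helper, not part of this repo) is to refuse to run when the credentials file is readable by anyone else:

```
import os
import stat
import sys

# Abort if group/other permission bits are set, i.e. the file is not chmod 600-style.
mode = os.stat('nda_aws_token_maker.py').st_mode
if mode & (stat.S_IRWXG | stat.S_IRWXO):
    sys.exit('nda_aws_token_maker.py is readable by others; run: chmod 600 nda_aws_token_maker.py')
```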

These two files are used in the `data_gatherer.m` to create the `ABCD_good_bad_series_table.csv` that is used to actually download the images.
## Usage

`data_gatherer.m` also depends on a mapping file (`mapping.mat`), which maps among other things the SeriesDescriptions from each txt to known OHSU descriptors that classify each tgz into T1, T2, rfMRI, tfMRI_nBack, etc.
This repo's usage is broken out into four distinct scripting stages. Run them in order, waiting for each one to complete before starting the next.

## Required software dependencies
1. (MATLAB) `data_gatherer.m`
2. (Python) `good_bad_series_parser.py`
3. (BASH) `unpack_and_setup.sh`
4. (Python) `correct_jsons.py`

From there, other necessary scripts are the `nda_aws_token_maker.py` to be called before each attempted DICOM series TGZ download. Requires:
The MATLAB portion produces the download list; the Python and BASH portions then download, convert, select, and prepare the data (a wrapper running all four stages in order is sketched below).
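
For illustration only, a hypothetical wrapper that runs the four stages in order and stops on the first failure might look like this (the MATLAB batch-mode invocation and the example subject are assumptions; in practice stage 3 is looped per subject/session):

```
import subprocess

stages = [
    # Assumed MATLAB batch-mode invocation; adjust to your installation.
    ['matlab', '-nodisplay', '-nosplash', '-r', 'data_gatherer; exit'],
    ['./good_bad_series_parser.py'],
    # Stage 3 is normally looped over subjects/sessions; one example call here.
    ['./unpack_and_setup.sh', 'sub-NDARINVABCD1234', 'ses-baselineYear1Arm1',
     './new_download/sub-NDARINVABCD1234/ses-baseline_year_1_arm_1'],
    ['./correct_jsons.py', './ABCD-HCP'],
]

for cmd in stages:
    print('Running:', ' '.join(cmd))
    subprocess.run(cmd, check=True)  # check=True raises, halting on the first failure
```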

- https://github.com/NDAR/nda_aws_token_generator
## 1. (MATLAB) `data_gatherer.m`

You will also need to reach into the `nda_aws_token_maker.py` and update it with your NDA USERNAME and PASSWORD. Make sure the file is locked down with `chmod 600 nda_aws_token_maker.py` so no one else can read your username and password in there. We don't have a better solution right now.
The two spreadsheets referenced above are used by `data_gatherer.m` to create `ABCD_good_and_bad_series_table.csv`, which is then used to actually download the images.

# Usage
`data_gatherer.m` depends on a mapping file (`mapping.mat`), which maps the SeriesDescriptions to known OHSU descriptors that classify each TGZ file into T1, T2, task-rest, task-nback, etc.
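
The contents of `mapping.mat` are not reproduced here, but conceptually it is a lookup table; a hypothetical Python analogue (these SeriesDescription keys are made up purely for illustration) would be:

```
# Hypothetical analogue of mapping.mat: SeriesDescription -> scan type.
SERIES_TYPE = {
    'EXAMPLE-T1-DESCRIPTION': 'T1',
    'EXAMPLE-T2-DESCRIPTION': 'T2',
    'EXAMPLE-REST-DESCRIPTION': 'task-rest',
    'EXAMPLE-NBACK-DESCRIPTION': 'task-nback',
}

def classify(series_description):
    # Returns None for unknown descriptions so they can be flagged for review.
    return SERIES_TYPE.get(series_description)
```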

The actual download work is done by `good_bad_series_parser.py` which only requires the `ABCD_good_bad_series_table.csv` spreadsheet present under `./spreadsheets/`.
## 2. (Python) `good_bad_series_parser.py`

The download is done like this:

# ABCD TGZ to BIDS Input Setup
```
./good_bad_series_parser.py
```

This only requires the `ABCD_good_and_bad_series_table.csv` spreadsheet present under a `spreadsheets` folder inside this repository's cloned folder.

**Note:** `nda_aws_token_maker.py` is called before each attempted DICOM series TGZ download, as sketched below.
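
As a sketch of what that per-series flow looks like (the use of the AWS CLI and the `python` invocation are our assumptions; the real logic lives in `good_bad_series_parser.py`):

```
import subprocess

def download_series(s3_url, dest_dir):
    # Refresh the NDA AWS token first: tokens expire, and every series
    # is fetched as its own TGZ, so the token maker runs before each download.
    subprocess.run(['python', 'nda_aws_token_maker.py'], check=True)
    # Assumed: the AWS CLI picks up the credentials the token maker wrote.
    subprocess.run(['aws', 's3', 'cp', s3_url, dest_dir], check=True)
```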

`unpack_and_setup.sh` does the work. It takes three arguments:
## 3. (BASH) `unpack_and_setup.sh`

`unpack_and_setup.sh` should be called in a loop to do the DICOM to BIDS conversion and spin echo field map selection. It takes three arguments:

```
SUB=$1 # Full BIDS formatted subject ID (sub-SUBJECTID)
VISIT=$2 # Full BIDS formatted session ID (ses-SESSIONID)
TGZDIR=$3 # Path to directory containing all .tgz for subject
TGZDIR=$3 # Path to directory containing all TGZ files for SUB/VISIT
```

**IMPORTANT**: update paths inside the script everywhere a `...` appears.
Here is an example:

```
./unpack_and_setup.sh sub-NDARINVABCD1234 ses-baselineYear1Arm1 ./new_download/sub-NDARINVABCD1234/ses-baseline_year_1_arm_1
```
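
If you want to loop over everything that was downloaded, a sketch like this works (the directory layout is assumed to match the example above):

```
import glob
import os
import subprocess

# Assumed layout: ./new_download/sub-<ID>/ses-<SESSION>/ containing that visit's TGZs.
for tgz_dir in sorted(glob.glob('./new_download/sub-*/ses-*')):
    if not os.path.isdir(tgz_dir):
        continue
    visit = os.path.basename(tgz_dir)
    subject = os.path.basename(os.path.dirname(tgz_dir))
    # NOTE: the BIDS session label can differ from the folder name
    # (ses-baseline_year_1_arm_1 vs. ses-baselineYear1Arm1 in the example above),
    # so a real loop would reformat `visit` here.
    subprocess.run(['./unpack_and_setup.sh', subject, visit, tgz_dir], check=True)
```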

## 4. (Python) `correct_jsons.py`

Finally, `correct_jsons.py` is run over the whole BIDS input directory to correct and prepare all BIDS sidecar JSON files to comply with the BIDS specification, version 1.2.0.

```
./correct_jsons.py ./ABCD-HCP
```
115 changes: 115 additions & 0 deletions correct_jsons.py
@@ -0,0 +1,115 @@
#! /usr/bin/env python3

import argparse
import json
import os
import re
import sys

__doc__ = \
"""
This script corrects ABCD BIDS input data to
conform to the official BIDS Validator.
"""
__version__ = "1.0.0"

def read_json_field(json_path, json_field):
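    # Return the value of json_field from the JSON file at json_path, or None if it is absent.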

    with open(json_path, 'r') as f:
        data = json.load(f)

    if json_field in data:
        return data[json_field]
    else:
        return None

def remove_json_field(json_path, json_field):
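    # Delete json_field from the JSON file in place; returns True if the field was present.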

    with open(json_path, 'r+') as f:
        data = json.load(f)

        if json_field in data:
            del data[json_field]
            f.seek(0)
            json.dump(data, f, indent=4)
            f.truncate()
            flag = True
        else:
            flag = False

    return flag

def update_json_field(json_path, json_field, value):
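    # Set json_field to value in the JSON file in place; returns True if the field already existed.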

    with open(json_path, 'r+') as f:
        data = json.load(f)

        if json_field in data:
            flag = True
        else:
            flag = False

        data[json_field] = value
        f.seek(0)
        json.dump(data, f, indent=4)
        f.truncate()

    return flag

def main(argv=sys.argv):
    parser = argparse.ArgumentParser(
        prog='correct_jsons.py',
        description=__doc__,
        usage='%(prog)s BIDS_DIR'
    )
    parser.add_argument(
        'BIDS_DIR',
        help='Path to the input BIDS dataset root directory. Read more '
             'about the BIDS standard in the link in the description. It is '
             'recommended to use Dcm2Bids to convert from participant dicoms '
             'into BIDS format.'
    )
    parser.add_argument(
        '--version', '-v', action='version', version='%(prog)s ' + __version__
    )

    args = parser.parse_args()

    for root, dirs, files in os.walk(args.BIDS_DIR):
        for filename in files:
            fn, ext = os.path.splitext(filename)

            if ext == '.json':
                json_path = os.path.join(root, filename)

                with open(json_path, 'r') as f:
                    data = json.load(f)

                # If TotalReadoutTime is missing from an fmap or func JSON
                if ('fmap' in root or 'func' in root) and 'TotalReadoutTime' not in data:
                    # Then check for EffectiveEchoSpacing and ReconMatrixPE
                    if 'EffectiveEchoSpacing' in data and 'ReconMatrixPE' in data:
                        # If both are present then update the JSON with a calculated TotalReadoutTime
                        EffectiveEchoSpacing = data['EffectiveEchoSpacing']
                        ReconMatrixPE = data['ReconMatrixPE']
                        # Calculated TotalReadoutTime = EffectiveEchoSpacing * (ReconMatrixPE - 1)
                        TotalReadoutTime = EffectiveEchoSpacing * (ReconMatrixPE - 1)
                        update_json_field(json_path, 'TotalReadoutTime', TotalReadoutTime)

                    # If EffectiveEchoSpacing is missing print error
                    if 'EffectiveEchoSpacing' not in data:
                        print(json_path + ': No EffectiveEchoSpacing')

                    # If ReconMatrixPE is missing print error
                    if 'ReconMatrixPE' not in data:
                        print(json_path + ': No ReconMatrixPE')

                # Find the IntendedFor field that is a non-empty list
                if 'fmap' in root and 'IntendedFor' in data and len(data['IntendedFor']) > 0:
                    # Regular expression replace all paths in that list with a relative path to ses-SESSION
                    intended_list = data['IntendedFor']
                    corrected_intended_list = [re.sub(r'.*(ses-.*_ses-.+)', r'\g<1>', entry) for entry in intended_list]
                    update_json_field(json_path, 'IntendedFor', corrected_intended_list)

                # Remove SliceTiming field from func JSONs
                if 'func' in root and 'SliceTiming' in data:
                    remove_json_field(json_path, 'SliceTiming')

if __name__ == "__main__":
    sys.exit(main())
27 changes: 20 additions & 7 deletions data_gatherer.m
@@ -1,8 +1,15 @@
%%
%% variable initialization

clear variables
load mapping.mat

DAL_ABCD_merged_pcqcinfo_importer;
QC_file = 'spreadsheets/DAL_ABCD_QC_merged_pcqcinfo.csv';
image03_file = 'spreadsheets/image03.txt';
output_csv = 'spreadsheets/ABCD_good_and_bad_series_table.csv';

%% QC parsing

data = DAL_ABCD_merged_pcqcinfo_importer(QC_file);

for i = 1:height(data)
if data.SeriesTime(i) < 100000
@@ -13,8 +20,10 @@
end
data.CleanFlag = cleandata_idx;

%%
image03_importer;
%% image03 parsing

image03 = image03_importer(image03_file);

for i = 1:height(image03)
    image03.timestamp{i} = image03.image_file{i}(end-10:end-5);
end
@@ -26,19 +35,23 @@
image03_2.Properties.VariableNames{1} = 'pGUID';
image03_2.Properties.VariableNames{9} = 'EventName';

%%
%% table joins

data_1 = innerjoin(data,map_qc_descriptor);
data_1 = sortrows(data_1,'pGUID','ascend');

% Hack to deal with quotations around strings in table
foo = image03_2.SeriesType;
[l,w] = size(foo);
for i=1:l
    foo(i) = strjoin(['"' string(foo(i)) '"'],'');
end
image03_2.SeriesType = foo;

data_2 = innerjoin(data_1,image03_2);


%%
writetable(data_2,'./spreadsheets/ABCD_good_and_bad_series_table.csv');
%% final output table (path hardcoded)

writetable(data_2,output_csv);

2 changes: 1 addition & 1 deletion image03_importer.m
@@ -1,3 +1,4 @@
function image03 = image03_importer(filename)
%% Import data from text file.
% Script for importing data from the following text file:
%
@@ -9,7 +10,6 @@
% Auto-generated by MATLAB on 2018/08/14 10:58:42

%% Initialize variables.
filename = './spreadsheets/20181127_image03.txt';
delimiter = '\t';
startRow = 3;

7 changes: 7 additions & 0 deletions spreadsheets/README.md
@@ -0,0 +1,7 @@
# `spreadsheets` folder

This is where the spreadsheets belong:

1. `DAL_ABCD_merged_pcqcinfo.csv` (the DAIC QC info)
1. `image03.txt` (the NDA DICOM imaging data info)
1. `ABCD_good_and_bad_series_table.csv` (generated after `data_gatherer.m` is run)
