#  Downloading data from XNAT
(n.bourke@imperial.ac.uk)
  
#### Version 4.0 ~ Niall Bourke    

Updates: 
~ 2018 Maria Yanez-Lopez  
~ 24/09/2019: Checks for data that is already pulled from xnat to allow rolling updating of data on the HPC  
~ 12/10/2021: Adapting for new broken XNAT. **Requires project access**  
~ Nov 2021: incorporation of BIDS converter scripts  
~ Dec 2021: **Change to python 3.7**
 
  
### Documentation: 


** Need to check and update BIDS scripts!! **


This scripts downloads DICOM data from XNAT according to users specifications.

Works with python version py2.7 (upadte envionment libraries for py3 for continued support)

https://github.com/pyxnat/pyxnat/blob/master/pyxnat/core/downloadutils.py

https://groups.google.com/forum/#!topic/xnat_discussion/K8h4VP4CBMg

https://gist.github.com/mattsouth/db8f2d09acf3c57ba605fa93c4e8d03e

https://ubuntuforums.org/showthread.php?t=786879

https://wiki.imperial.ac.uk/pages/viewpage.action?spaceKey=HPC&title=Jupyter


### Import python libraries

In [1]:
import sys, os, getpass                           
from pyxnat import Interface
import pandas as pd

ModuleNotFoundError: No module named 'pyxnat'

### Introduce your XNAT login details (same as college credentials) and project folder

In [3]:
userName = raw_input('Type XNAT User Name: ')
passWord = getpass.getpass('Type XNAT Password: ')
projectID = raw_input('Type XNAT Project ID: ')
server = 'http://cif-xnat.hh.med.ic.ac.uk'

Type XNAT User Name:  nbourke
Type XNAT Password:  ·········
Type XNAT Project ID:  Study__PCNORAD





In [4]:
print 'INPUT'
print 'Server: ', server
print 'Username: ', userName
print 'Password: ', ''.join(['*']*len(passWord))
print 'ProjectID: ', projectID 

INPUT
Server:  http://cif-xnat.hh.med.ic.ac.uk
Username:  nbourke
Password:  *********
ProjectID:  Study__PCNORAD


### Create PYXNAT interface

In [5]:
central = Interface(server=server, user=userName, password=passWord)

# Full project download:
subjects = central.select.project(projectID).subjects().get()

# individual subject:
#subID= "CIF3_S04363" 
#subjects = central.select.project(projectID).subjects(subID).get()

print(subjects)
#head(subjects)
allSessions = []
number_subjects = 0

['CIF3_S00234', 'CIF3_S01085', 'CIF3_S01692', 'CIF3_S02562', 'CIF3_S04372', 'CIF3_S04376', 'CIF3_S04382', 'CIF3_S04383', 'CIF4_S00002', 'CIF4_S00003', 'CIF4_S00016']


### Browse through project, collect subjects/sessions/scans and print subject labels

In [6]:
for i, subject in enumerate(subjects):
    label = central.select.project(projectID).subject(subject).label()
    print label, ('%i/%i' % (i+1, len(subjects)))
    sessions = central.select.project(projectID).subjects(subject).experiments().get()
    allSessions.append(sessions)

CIF3271 1/11
CIF0381 2/11
CIF3267 3/11
CIF3287 4/11
CIF3427 5/11
CIF3409 6/11
CIF3404 7/11
CIF3426 8/11
CIF3442 9/11
CIF3440 10/11
CIF3449 11/11


## Modify the output directory, where the datasets will be saved from XNAT

* The path is currently set to c3nl_djs_working_dir/ephemeral directory and will download to a folder with the name of project being downloaded
* For curation purposes a defined location should be set to host the raw data


In [7]:
dirName = os.path.join('/rds/general/project/c3nl_djs_working_dir/ephemeral/', projectID)

# Create target Directory if don't exist
if not os.path.exists(dirName):
    os.mkdir(dirName)
    print("Directory " , dirName ,  " Created ")
else:    
    print("Directory " , dirName ,  " already exists")
    
dirName = os.path.join('/rds/general/project/c3nl_djs_working_dir/ephemeral/', projectID, 'raw')
if not os.path.exists(dirName):
    os.mkdir(dirName)
    print("Directory " , dirName ,  " Created ")
else:    
    print("Directory " , dirName ,  " already exists")
    
Results_Dir = dirName # needs to exist or next cell will throw error
base_dir = ('/rds/general/project/c3nl_djs_working_dir/ephemeral/' + projectID)

('Directory ', '/rds/general/project/c3nl_djs_working_dir/ephemeral/Study__PCNORAD', ' already exists')
('Directory ', '/rds/general/project/c3nl_djs_working_dir/ephemeral/Study__PCNORAD/raw', ' already exists')


'/raw/'

### Download datasets
This script will look into the predefined project. Check the printed output to look for duplicates and incomplete datasets.

In [8]:
import glob 

subjectCounter = 0
for s, subjectID in enumerate(subjects):
    subjectLabel = central.select.project(projectID).subject(subjectID).label()
    
    for experimentID in allSessions[s]:
            scans = central.select.project(projectID).subject(subjectID).experiments(experimentID).scans()
            scanIDs = scans.get()
            
            coll = central.select.project(projectID).subject(subjectID).experiments(experimentID)
            for ese in coll:
                explab = ese.attrs.get('label')
            
            # Check if data has already been pulled
            dataCheck = glob.glob(Results_Dir + "/" + subjectLabel + "/*" + explab )
            #print("sub label is: " + subjectLabel)
            #print("exp label is: " + explab)
            dataCheck = ''.join(dataCheck) # covert list to string
            #print("data path is: " + dataCheck)
            if not os.path.exists(dataCheck):
                print("Downloading:", explab)        
                number_subjects+=1
            
                if len(scanIDs) == 0:
                    print("There are no scans to download for", explab)
                else:
                    filenames = central.select.project(projectID).subject(subjectID).experiment(experimentID).scans()
                    filenames.download(Results_Dir, type='ALL', extract=False, removeZip=True)   
            else:
                print(explab + " already pulled")
print "The total number of scanning sessions downloaded is = " + str(number_subjects)


PCNORAD05 already pulled
('Downloading:', 'PC_NorAD_004')
('Downloading:', 'PC_Norad_02')
('Downloading:', 'PC_NorAD_003')
('Downloading:', 'PCNORAD13')
('Downloading:', 'PCNORAD11')
('Downloading:', 'PCNORAD11_v2')
('Downloading:', 'PCNORAD10')
('Downloading:', 'PCNORAD12')
('Downloading:', 'PCNORAD015')
('Downloading:', 'PCNORAD014')
('Downloading:', 'PCNORAD016')
The total number of scanning sessions downloaded is = 11


## Sweet now we're rolling! 
To make life easy all our labs notebooks are going assume a BIDS format.
The following curates data in a standardised format, which will be the starting point of analysis pipelines

## Dependencies

#### A CIF_config.json has been created to match MRI acquisitions and label them in the correct format. 
This may need to be updated if new seqences are being collected. 
Requires the labels from the scan card for each acquisition being formated (NOTE: How these are displayed on the XNAT website unhelpfuly does not necessarily match with the actual data labels!)  

#### Index files
* I have used XDC (xnat data cliant) to pull metaData about scan labels from xnat.

* The bids scripts are hardcoded to look for this metaData in a indexFiles directory within the working dir. This should contain two files for the project PROJECT_experiments.csv and PROJECT_subject.csv

* The following XDC function can be used to pull project and subject information from xnat


### XDC setup
Install via the following instructions:
https://wiki.xnat.org/xnat-tools/xnatdataclient

This folder has been saved in the dependencies folder on the cluster (17/11/2021). 


**UPDATE**
- Pass python variables and save output to working directory
- Check paths in bash scripts called (CIF_unzip & bids_beta)



In [10]:
%%bash -s "$userName" "$passWord" "$projectID" "$base_dir"

username=${1} 
password=${2}
ID=${3}
path=${4} 
dep=/rds/general/project/c3nl_shared/live/dependencies

input1="http://wmec-transtec1.hh.med.ic.ac.uk:/data/archive/projects/"${ID}"/experiments?format=csv"
#echo ${input}
input2="http://wmec-transtec1.hh.med.ic.ac.uk:/data/archive/projects/"${ID}"/subjects?format=csv"

# Run xnat data client
## This updates the indexing information from xnat which the bids script relies on
java -jar /rds/general/project/c3nl_shared/live/dependencies/data-client-shadow-1.7.6/lib/XnatDataClient-1.7.6-all.jar -u ${username} -p ${password} -r ${input1} -o ${path}/${ID}_experiments  
java -jar /rds/general/project/c3nl_shared/live/dependencies/data-client-shadow-1.7.6/lib/XnatDataClient-1.7.6-all.jar -u ${username} -p ${password} -r ${input2} -o ${path}/${ID}_subject

# Extract data
#${dep}/CIF_unzip_Study__ -i ${ID}

echo "${dep}/xnat2bids/bids_beta_Study__ -i ${ID} -c CIF_config.json" > ${path}/bidJob.txt



    # Run job
    ${dep}/hpcSubmit ${path}/bidJob.txt 04:00:00 3 6Gb
    echo ""; echo "***"; echo ""; echo "Submitted commands:"
    head ${job}
 

input is = /rds/general/project/c3nl_djs_working_dir/ephemeral/Study__PCNORAD/bidJob.txt
Walltime = 04:00:00
Number of CPUs = 3
Memory = 6Gb
Check this correct
Job submitted: Tue 30 Nov 12:51:28 GMT 2021
4873860.pbs

***

Submitted commands:
 


bash: module: line 1: syntax error: unexpected end of file
bash: error importing function definition for `module'
bash: scl: line 1: syntax error: unexpected end of file
bash: error importing function definition for `scl'
bash: ml: line 1: syntax error: unexpected end of file
bash: error importing function definition for `ml'
/bin/sh: ml: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `ml'
/bin/sh: scl: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `scl'
/bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `module'
/bin/sh: ml: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `ml'
/bin/sh: scl: line 1: syntax error: unexpected end of file
/bin/sh: error importing function definition for `scl'
/bin/sh: module: line 1: syntax error: unexpected end of file
/bin/sh: error importing function defi

## Extracting and indexing data from xnat

The following functions are in the dependencies folder on the Imperial HPC along with the CIF_config.json file
* New aquisitions need to be added to the CIF_config.json (this is a sort of dictionary for standardised naming)

#### 1: CIF_unzip -i project
    Unzips & indexes files downloaded from XNAT with more meaningfull labels such as participant ID and scan session.  
    This sets up the initial file structure to run the conversion to BIDS.
    
#### 2: bids_proc -i project -c config.json
    Loops over all subjects->sessions->modalities->scans and converts DICOMS to NIFTI.   
    The labels for each of the scans on the scan card are then converted to match the BIDS format and file structure  
    
#### Sources of error
* Conversion to nii at this point should be robust and all data will be in raw under the project name
* Missing data in source directory is likely due to a **new exception** in how something was named on the scanner - this should be added to the config.json file. Be careful not to clash with similar names. 
* This works well for data comming off the CIF scanner (Imperial). Data from new sites have to be checked/validated as something in the structure may cause unexpected outcomes. 


##### Known bugs
XDC as an alias set in .bash_profile wont be sourced in Jupyter, not sure why. 

Point to it by adding the following lines to your .bash_profile
#XDC
alias XDC='java -jar /rds/general/user/**nbourke**/home/data-client-shadow-1.7.6/lib/XnatDataClient-1.7.6-all.jar'
