# Downloading a full project's data from XNAT

#### Maria Yanez Lopez 2018 (maria.yanez-lopez@imperial.ac.uk)
#### ~ adapted for full project download Niall Bourke Feb 2019 (n.bourke@imperial.ac.uk)
### Documentation: 

https://github.com/pyxnat/pyxnat/blob/master/pyxnat/core/downloadutils.py

https://groups.google.com/forum/#!topic/xnat_discussion/K8h4VP4CBMg

https://gist.github.com/mattsouth/db8f2d09acf3c57ba605fa93c4e8d03e

https://ubuntuforums.org/showthread.php?t=786879

https://wiki.imperial.ac.uk/pages/viewpage.action?spaceKey=HPC&title=Jupyter

Version 2.0 ~ Niall Bourke  
Updated 24/09/2019  
~ Checks for data that has already been pulled from XNAT, to allow rolling updates of the data on the HPC
 
This script downloads DICOM data from XNAT according to the user's specifications.

### Import python libraries

In [None]:
import sys, os, getpass                           
from pyxnat import Interface

### Introduce your XNAT login details (same as college credentials) and project folder

In [None]:
userName = raw_input('Type XNAT User Name: ')
passWord = getpass.getpass('Type XNAT Password: ')
projectID = raw_input('Type XNAT Project ID: ')
server = 'http://cif-xnat.hh.med.ic.ac.uk'

In [None]:
print('INPUT')
print('Server: ' + server)
print('Username: ' + userName)
print('Password: ' + '*' * len(passWord))
print('ProjectID: ' + projectID)


### Create PYXNAT interface

In [None]:
central = Interface(server=server, user=userName, password=passWord)

# Full project download:
subjects = central.select.project(projectID).subjects().get()

# individual subject:
#subID="CIF02_S00677"
#subjects = central.select.project(projectID).subjects(subID).get()

print(subjects)
allSessions = []
number_subjects = 0

### Browse through project, collect subjects/sessions/scans and print subject labels

In [None]:
for i, subject in enumerate(subjects):
    label = central.select.project(projectID).subject(subject).label()
    print label, ('%i/%i' % (i+1, len(subjects)))
    sessions = central.select.project(projectID).subjects(subject).experiments().get()
    allSessions.append(sessions)

## Modify the output directory, where the datasets from XNAT will be saved
The path is always the TBI group raw directory, and the data will download to a folder named after the project being downloaded.


In [None]:

dirName = os.path.join('/rds/general/project/c3nl_djs_imaging_data/live/data/raw/', projectID)

# Create the target directory if it doesn't exist
if not os.path.exists(dirName):
    os.mkdir(dirName)
    print("Directory " + dirName + " created")
else:
    print("Directory " + dirName + " already exists")

Results_Dir = dirName # must exist, or the next cell will raise an error


### Download datasets
This script will look into the predefined project. Check the printed output to look for duplicates and incomplete datasets.

In [None]:
import glob

for s, subjectID in enumerate(subjects):
    subject = central.select.project(projectID).subject(subjectID)
    subjectLabel = subject.label()

    for experimentID in allSessions[s]:
        experiment = subject.experiment(experimentID)
        explab = experiment.attrs.get('label')
        scans = experiment.scans()
        scanIDs = scans.get()

        # Check whether this session has already been pulled:
        # any folder under the subject that ends with the experiment label counts
        dataCheck = glob.glob(os.path.join(Results_Dir, subjectLabel, "*" + explab))

        if dataCheck:
            print(explab + " already pulled")
        elif len(scanIDs) == 0:
            print("There are no scans to download for " + explab)
        else:
            print("Downloading: " + explab)
            scans.download(Results_Dir, type='ALL', extract=False, removeZip=True)
            # Count only sessions that were actually downloaded
            number_subjects += 1

print("The total number of scanning sessions downloaded is = " + str(number_subjects))
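
The "already pulled" check in the cell above can be tried out on its own, without connecting to XNAT. The sketch below is only an illustration: the helper name `session_already_pulled`, the temporary directory and the labels (`sub01`, `EXP001`, ...) are made up for the example, and it assumes that a pulled session ends up in a folder whose name ends with the experiment label.

```python
import glob
import os
import tempfile


def session_already_pulled(results_dir, subject_label, experiment_label):
    """Return True if the subject has a folder whose name ends with the experiment label."""
    pattern = os.path.join(results_dir, subject_label, "*" + experiment_label)
    return len(glob.glob(pattern)) > 0


# Demonstration on a throwaway directory with made-up labels (not real XNAT data).
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "sub01", "ses-1_EXP001"))

print(session_already_pulled(demo_root, "sub01", "EXP001"))  # session folder exists
print(session_already_pulled(demo_root, "sub01", "EXP002"))  # session not pulled yet
```

Testing the glob result for emptiness also covers the case where more than one folder matches the label, which would otherwise be treated as missing data.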

## Sweet now we're rolling! 
To make life easy, all our lab's notebooks are going to assume a BIDS format.
As data curation can be a pain in the derrière, let's run a nice little function to sort that for us ;)

## Dependencies

#### A CIF_config.json has been created to match MRI acquisitions and label them in the correct format. 
This may need to be updated if new sequences are being collected. 
It requires the labels from the scan card for each acquisition being formatted (NOTE: how these are displayed on the XNAT website unhelpfully does not necessarily match the actual data labels!)  

#### Index files
* I have used XDC (XNAT Data Client) to pull metadata about scan labels from XNAT.

* The BIDS scripts are hardcoded to look for this metadata in an indexFiles directory within the working directory. This should contain two files for the project: PROJECT_experiments.csv and PROJECT_subject.csv

* This requires local setup. I have a function that runs through a list of TBI projects on XNAT and downloads the metadata. If a project is not in the imaging directory, try the XDC setup below. 

* The following XDC function can be used to pull project and subject information from xnat


### XDC setup

Install via the following instructions:
https://wiki.xnat.org/xnat-tools/xnatdataclient


Use commands like the following to pull the CSV files used for indexing and renaming the files

In [None]:
# XDC is an alias, set in .bash_profile, for the client that can be downloaded from the link above

XDC -u USERNAME -p PASSWORD -r "http://wmec-transtec1.hh.med.ic.ac.uk:/data/archive/projects/PROJECT/experiments?format=csv" -o PROJECT_experiments
    
XDC -u USERNAME -p PASSWORD -r "http://wmec-transtec1.hh.med.ic.ac.uk:/data/archive/projects/PROJECT/subjects?format=csv" -o PROJECT_subject
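
Once the index files have been pulled, it can be worth checking that they map XNAT accession IDs to readable labels before running the BIDS scripts. The following is a minimal sketch, not part of the lab's scripts: the helper `load_label_index`, the sample rows and the column names (`ID`, `label`, `date`) are assumptions, so check them against the header row of your own CSV files.

```python
import csv
import io

# Made-up sample standing in for PROJECT_experiments.csv.
# The column names are an assumption; compare them with the real file's header.
sample_csv = u"""ID,label,date
CIF02_E00001,sub01_baseline,2019-01-10
CIF02_E00002,sub01_followup,2019-07-15
"""


def load_label_index(handle, id_column="ID", label_column="label"):
    """Map each XNAT accession ID to its human-readable label."""
    reader = csv.DictReader(handle)
    return {row[id_column]: row[label_column] for row in reader}


index = load_label_index(io.StringIO(sample_csv))
for accession_id, label in sorted(index.items()):
    print(accession_id + " -> " + label)
```

For a real file, pass an open file handle for the CSV in the indexFiles directory in place of the `io.StringIO` sample.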

## Extracting and indexing data from xnat

#### 1: bids_1_preproc -i project
    Unzips & indexes files downloaded from XNAT with more meaningful labels such as participant ID and scan session.  
    This sets up the initial file structure to run the conversion to BIDS.
    
    
#### 2: bids_2_proc -i project -c config.json
    Loops over all subjects->sessions->modalities->scans and converts DICOMS to NIFTI.   
    The labels for each of the scans on the scan card are then converted to match the BIDS format and file structure  
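
To give a feel for the target layout, here is a small sketch of how a BIDS-style path is put together from a subject, session, data type and suffix. It is not the code inside `bids_2_proc`; the helper names `bids_label` and `bids_path` and the example labels are hypothetical, and only the general `sub-<label>/ses-<label>/<datatype>/` pattern is taken from the BIDS convention.

```python
import os
import re


def bids_label(raw):
    """Keep only letters and digits, since BIDS labels must be alphanumeric."""
    return re.sub(r"[^A-Za-z0-9]", "", raw)


def bids_path(subject, session, datatype, suffix, extension=".nii.gz"):
    """Build a relative BIDS-style path for one converted scan."""
    sub = "sub-" + bids_label(subject)
    ses = "ses-" + bids_label(session)
    filename = "{}_{}_{}{}".format(sub, ses, suffix, extension)
    return os.path.join(sub, ses, datatype, filename)


# Made-up subject and session labels, for illustration only.
print(bids_path("CIF_001", "visit 1", "anat", "T1w"))
```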
    
#### Sources of error
* Conversion to nii at this point should be robust, and all data will be in raw under the project name
* Missing data in the source directory is likely due to a new exception in how something was named on the scanner. This should be added to the config.json file. Be careful not to clash with similar names. 
* This works well for data coming off the CIF scanner (Imperial). Data from new sites has to be checked, as something in the structure may cause unexpected outcomes. 
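
One way to catch both problems (scan names with no config entry, and names that clash with an existing entry) is to test the scan card labels against the naming patterns before converting. The sketch below only illustrates the idea: the `patterns` mapping is hypothetical and the real CIF_config.json may be structured quite differently, the `classify` helper is not part of the lab's scripts, and the series descriptions are made up.

```python
import fnmatch

# Hypothetical pattern -> BIDS name mapping; the real CIF_config.json may differ.
patterns = {
    "*t1_mprage*": "T1w",
    "*flair*": "FLAIR",
    "*t1*": "T1w_other",
}


def classify(series_description, pattern_map):
    """Return the sorted BIDS names whose pattern matches the description."""
    name = series_description.lower()
    return sorted(label for pattern, label in pattern_map.items()
                  if fnmatch.fnmatchcase(name, pattern))


# Made-up scan card labels, for illustration only.
for description in ["T1_MPRAGE_sag", "t2_FLAIR_tra", "ep2d_diff_64dir"]:
    matches = classify(description, patterns)
    if not matches:
        status = "no match, add a config entry"
    elif len(matches) > 1:
        status = "clash between " + ", ".join(matches)
    else:
        status = matches[0]
    print(description + ": " + status)
```

Here the deliberately loose `*t1*` pattern clashes with `*t1_mprage*`, which is the kind of overlap to avoid when adding a new entry.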