# Post-Curation BIDS Clean-Up
### The following functions assist in removing extraneous `.tsv` files that were added to Flywheel during the curation process for 22Q.

Project: 22Q_812481

Author: Katja Zoner

Updated: 03/12/2021



## Step 0: Setting up logging, initializing the Flywheel client, and gathering the sessions to process.

```python
import flywheel
import logging
import sys
import re

# Set up logging
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format='%(levelname)s: %(message)s')
logger = logging.getLogger('fw_remove_files')
```

```python
# Get client. With no API key argument, flywheel.Client() uses the
# credentials stored by `fw login` (the Flywheel CLI profile), so no
# API key needs to be hard-coded in the notebook.
fw = flywheel.Client()
assert fw, "Your Flywheel CLI credentials aren't set!"

# Specify project
project_label = "22Q_812481"

# Get project object
project = fw.projects.find_first('label="{}"'.format(project_label))
assert project, "Project not found!"

# Specify sessions (optional)
session_labels = None
# session_labels = ["010288", "008120"]

# Get all sessions in project
sessions = project.sessions()

# Filter by optionally-specified session label list
if session_labels:
    sessions = [s for s in sessions if s.label in session_labels]
```



# Post-curation BIDS clean-up

## Step 1: Removing all mis-named `aslcontext.tsv` files.

In an earlier round of curation, `aslcontext.tsv` files were added to Flywheel with the wrong naming scheme:

- `aslcontext.tsv` files named `sub-<#>_ses-22Q<#>_aslcontext.tsv` should be deleted.
- `aslcontext.tsv` files named `sub-<#>_ses-22Q<#>_acq-se_aslcontext.tsv` should be retained. Note the `acq-se` entity in the name.
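As a quick offline check of the naming rule above, the removal pattern can be tested against example file names without touching Flywheel. This is a minimal sketch: the subject and session IDs below are made up, and the pattern is the Step 1 regex written as a raw string with the `.` escaped and the match anchored at the end of the name. The negative lookbehind `(?<!acq-se)` rejects any name where `_aslcontext.tsv` is immediately preceded by `acq-se`.

```python
import re

# Step 1 removal pattern: names ending in _aslcontext.tsv NOT preceded by acq-se.
RM_REGEX = r".*(?<!acq-se)_aslcontext\.tsv$"

# Hypothetical file names following the two naming schemes described above.
names = [
    "sub-12345_ses-22Q1_aslcontext.tsv",         # mis-named: should be removed
    "sub-12345_ses-22Q1_acq-se_aslcontext.tsv",  # correctly named: keep
    "sub-12345_ses-22Q1_task-idemo_events.tsv",  # unrelated file: keep
]

to_remove = [n for n in names if re.match(RM_REGEX, n)]
print(to_remove)  # → ['sub-12345_ses-22Q1_aslcontext.tsv']
```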




```python
def removeFile(rmRegex):
    '''
    Remove all session-level files whose names match rmRegex
    from every session in `sessions`.
    '''
    for sess in sessions:
        files = [f for f in sess.files if re.match(rmRegex, f.name)]
        for f in files:
            sess.delete_file(f.name)
            logger.info(f"Removing {f.name} from session {sess.label}")
```
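Because `removeFile` deletes files as soon as they match, it may help to preview what a pattern catches before running it for real. The sketch below is a hypothetical variant (`remove_file_preview` is not part of the original notebook or of the Flywheel SDK): it takes the session list as an argument instead of reading the global `sessions`, and it only deletes when `dry_run=False`. It relies only on the `label`, `files`, `name`, and `delete_file` attributes already used above.

```python
import logging
import re

logger = logging.getLogger("fw_remove_files")


def remove_file_preview(sessions, rm_regex, dry_run=True):
    """Hypothetical helper: find (and optionally delete) session-level files
    whose names match rm_regex.

    Returns a list of (session label, file name) pairs that matched.
    With dry_run=True nothing is deleted; matches are only logged.
    """
    matched = []
    for sess in sessions:
        # Iterate over a copy so deletions cannot disturb the loop.
        for f in list(sess.files):
            if not re.match(rm_regex, f.name):
                continue
            matched.append((sess.label, f.name))
            if dry_run:
                logger.info("Would remove %s from session %s", f.name, sess.label)
            else:
                sess.delete_file(f.name)
                logger.info("Removed %s from session %s", f.name, sess.label)
    return matched
```

A typical use would be to call `remove_file_preview(sessions, rmRegex)` first, review the returned pairs, and only then rerun it with `dry_run=False`.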

```python
# Remove mis-named sub-<#>_ses-22Q<#>_aslcontext.tsv files,
# but retain correctly named sub-<#>_ses-22Q<#>_acq-se_aslcontext.tsv files.
# Target names ending in _aslcontext.tsv that are not immediately preceded by acq-se.
rmRegex = r".*(?<!acq-se)_aslcontext\.tsv$"
removeFile(rmRegex)
```


## Step 2: Conditionally removing `.tsv` files in sessions without accompanying scan files.

The `heuristic.py` curation script attaches `*_aslcontext.tsv` and `*_task-idemo_events.tsv` files to _all_ sessions, regardless of whether the session contains an ASL scan or an idemo task fMRI scan.

The following code:

1. Removes the `*_aslcontext.tsv` file from each session that does not contain an ASL scan, and
2. Removes the `*_task-idemo_events.tsv` file from each session that does not contain the idemo task fMRI scan.
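Both steps apply the same per-session decision: keep everything if the required scan is present, otherwise drop the matching session-level file. As a Flywheel-free sketch, assume each session is summarized by the BIDS filenames found in its acquisitions and the names of its session-level files (both hypothetical inputs here, as is the helper name `files_to_remove`); the function returns what would be removed.

```python
import re


def files_to_remove(acq_bids_names, session_files, req_regex, rm_regex):
    """Hypothetical helper mirroring the conditional-removal rule.

    acq_bids_names: BIDS filenames of all files in the session's acquisitions.
    session_files:  names of the files attached at the session level.
    Returns the session files matching rm_regex, but only if no acquisition
    file matches req_regex; otherwise returns an empty list.
    """
    if any(re.match(req_regex, name) for name in acq_bids_names):
        return []  # Required scan present: keep everything.
    return [name for name in session_files if re.match(rm_regex, name)]


# Example: a session with an idemo scan keeps its events file; one without loses it.
req = r".*task-idemo_bold\.nii\.gz$"
rm = r".*task-idemo_events\.tsv$"
with_scan = files_to_remove(
    ["sub-1_ses-22Q1_task-idemo_bold.nii.gz"],
    ["sub-1_ses-22Q1_task-idemo_events.tsv"], req, rm)
without_scan = files_to_remove(
    ["sub-2_ses-22Q1_T1w.nii.gz"],
    ["sub-2_ses-22Q1_task-idemo_events.tsv", "sub-2_ses-22Q1_acq-se_aslcontext.tsv"],
    req, rm)
print(with_scan)     # → []
print(without_scan)  # → ['sub-2_ses-22Q1_task-idemo_events.tsv']
```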


```python
def conditionalRemoveFile(reqRegex, rmRegex):
    '''
    Remove session-level file(s) matching rmRegex from each session
    in which NO acquisition contains a file whose BIDS filename
    matches reqRegex.
    '''

    # Loop through each session
    for sess in sessions:
        hasRequiredFile = False
        # Loop through each acquisition
        for acq in sess.acquisitions():
            # Check current acquisition for the required BIDS file.
            for f in acq.files:
                try:
                    # If current file's BIDS filename matches reqRegex --> break.
                    if re.match(reqRegex, f.info['BIDS']['Filename']):
                        hasRequiredFile = True
                        break
                except (KeyError, TypeError, AttributeError):
                    # File has no usable BIDS metadata; skip it.
                    continue

            # If the required file was found in any acquisition,
            # no file removal is necessary for this session.
            if hasRequiredFile:
                break

        # If the required file was not found after checking all acquisitions/files
        # in the session, delete the file(s) matching rmRegex.
        if not hasRequiredFile:
            removalFiles = [f for f in sess.files if re.match(rmRegex, f.name)]
            for f in removalFiles:
                sess.delete_file(f.name)
                logger.info(f"Removing {f.name} from session {sess.label}")
```
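The `try`/`except` in `conditionalRemoveFile` is there because not every file carries a `BIDS` block with a `Filename` in its `info` metadata. The same guard can be pulled out into a small helper; this is a hypothetical sketch (`get_bids_filename` is not part of the Flywheel SDK) that works on plain dictionaries shaped like `f.info` above.

```python
def get_bids_filename(info):
    """Hypothetical helper: return info['BIDS']['Filename'], or None when the
    file has no usable BIDS metadata (missing keys, None, or a non-dict value)."""
    try:
        name = info["BIDS"]["Filename"]
    except (KeyError, TypeError):
        # KeyError: no BIDS block or no Filename key.
        # TypeError: info or info["BIDS"] is None or a non-dict value.
        return None
    return name if isinstance(name, str) else None
```

Inside the acquisition loop, the check could then read `name = get_bids_filename(f.info)` followed by `if name and re.match(reqRegex, name):`, which avoids a broad `except` around the regex call.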
        


```python
# Remove the *_aslcontext.tsv file from all sessions missing ASL scans.
reqRegex = r".*acq-se_asl\.nii\.gz$"
rmRegex = r".*_aslcontext\.tsv$"
conditionalRemoveFile(reqRegex, rmRegex)
```

```
INFO: Removing sub-14528_ses-22Q1_acq-se_aslcontext.tsv from session 010287
INFO: Removing sub-16261_ses-22Q1_acq-se_aslcontext.tsv from session 005634
INFO: Removing sub-17722_ses-22Q1_acq-se_aslcontext.tsv from session 010291
```


```python
# Remove *_task-idemo_events.tsv from all sessions missing the idemo fMRI task scan.
reqRegex = r".*task-idemo_bold\.nii\.gz$"
rmRegex = r".*task-idemo_events\.tsv$"
conditionalRemoveFile(reqRegex, rmRegex)
```

```
INFO: Removing sub-17667_ses-22Q1_task-idemo_events.tsv from session 007699
INFO: Removing sub-17053_ses-22Q1_task-idemo_events.tsv from session 006493
INFO: Removing sub-16415_ses-22Q1_task-idemo_events.tsv from session 005121
INFO: Removing sub-15972_ses-22Q1_task-idemo_events.tsv from session 004192
INFO: Removing sub-14549_ses-22Q1_task-idemo_events.tsv from session 004910
INFO: Removing sub-15062_ses-22Q1_task-idemo_events.tsv from session 004796
INFO: Removing sub-16277_ses-22Q1_task-idemo_events.tsv from session 005665
INFO: Removing sub-16527_ses-22Q1_task-idemo_events.tsv from session 005535
INFO: Removing sub-16017_ses-22Q1_task-idemo_events.tsv from session 004291
INFO: Removing sub-19489_ses-22Q1_task-idemo_events.tsv from session 009917
INFO: Removing sub-16812_ses-22Q1_task-idemo_events.tsv from session 006298
INFO: Removing sub-13473_ses-22Q1_task-idemo_events.tsv from session 005962
INFO: Removing sub-16779_ses-22Q1_task-idemo_events.tsv from session 005889
INFO: Removi
```

### Now the 22Q BIDS data on Flywheel should contain no extraneous `.tsv` files!