# UPenn Flywheel Data Transfer to ASC FMRISrv 

This notebook was shared by Dr Nicole Cooper from CNLab, who referenced it as an example for Flywheel MURI scan downloads. The same approach should work for CNLab & AHALab Flywheel projects.

+ 02/21/2020 - [José Carreras-Tartak](mailto:jcarreras@falklab.org) original author
+ 04/28/2021 - [Etienne Jacquot](mailto:etienne.jacquot@asc.upenn.edu) revisited


## *Getting Started w/ [UPenn Flywheel](https://upenn.flywheel.io/) Python-SDK*:

The AHA Lab does not yet have its own project on Flywheel, so some steps may not match exactly; this should change once a project is in place. For now, we work from a specific session ID.

- Please navigate here for access via Pennkey: https://upenn.flywheel.io/
- You need an **API key**; treat it as a secret and do not share or commit it


In [1]:
import flywheel
import tarfile
import os
import time
import zipfile
from zipfile import ZipFile

import configparser

### Create your Flywheel API secret config file


- You can manually navigate and create your [configs/config.ini](./configs/config.ini) (_we add *.ini to the .gitignore_)



- _**ALTERNATIVELY** you can run the following in a Python notebook cell; afterwards, delete the line that contains your key:_

_____

```python
!touch configs/config.ini                           # <-- create your blank config

!echo '[UPENN-FLYWHEEL]' >> configs/config.ini      # <-- add the header to config file

your_api_key = "upenn.flywheel.io:LLJb..........." # <-- MANUALLY ENTER YOUR API KEY

!echo 'apikey='$your_api_key >> configs/config.ini  # <-- add your key to config 

```
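
As a hedged alternative to the shell commands above, the same file can be written from Python with `configparser`. This is a minimal sketch: the helper name `write_flywheel_config` is hypothetical, and restricting the file to owner read/write (`0o600`) is an assumption about how you may want to protect the key, not something Flywheel requires.

```python
import configparser
import os


def write_flywheel_config(config_path, api_key):
    """Write the [UPENN-FLYWHEEL] section with the given key (hypothetical helper)."""
    config = configparser.ConfigParser()
    config['UPENN-FLYWHEEL'] = {'apikey': api_key}

    # Create the parent directory (e.g. configs/) if it is missing
    os.makedirs(os.path.dirname(os.path.abspath(config_path)), exist_ok=True)

    # Opening with 'w' overwrites an existing file, unlike `>>`, which appends
    with open(config_path, 'w') as handle:
        config.write(handle)

    # Assumption: keep the secret readable and writable by the owner only
    os.chmod(config_path, 0o600)


# Example call, commented out so the cell does not write a file by accident:
# write_flywheel_config('configs/config.ini', 'upenn.flywheel.io:...')
```

Because the file is overwritten rather than appended to, re-running the cell does not produce duplicate `[UPENN-FLYWHEEL]` headers.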

_______

### Read Flywheel API secret into Python w/ ConfigParser

- You must login and navigate to https://upenn.flywheel.io/#/profile, this has your API key 


In [2]:
# specify your home directory where the config file should be saved
home_dir = "/home/dcosme@asc.upenn.edu"

In [3]:
# # commented out because the config file was generated successfully

# home_dir = "/home/dcosme@asc.upenn.edu"
# your_api_key = "upenn.flywheel.io:..."

# !touch $home_dir/configs/config.ini
# !echo '[UPENN-FLYWHEEL]' >> $home_dir/configs/config.ini
# !echo 'apikey='$your_api_key >> $home_dir/configs/config.ini
# !more $home_dir/configs/config.ini

In [4]:
# add UPenn Flywheel api key to your config.ini
fw_cred = {}
config = configparser.ConfigParser()

config.read(home_dir + '/configs/config.ini') # <--- this file must contain your Flywheel API key
for item,value in config['UPENN-FLYWHEEL'].items():
    fw_cred[item]=value

In [5]:
# read your API key
api = fw_cred['apikey']

### Confirm your access to Flywheel via python SDK

- The `fw.get_current_user()` command is a quick way to ensure you have established a secure connection to UPenn Flywheel

In [6]:
# Create client using your API key
fw = flywheel.Client(api)

In [7]:
# print your flywheel information & confirm it works as expected
user = fw.get_current_user()  # named `user` rather than `self` to avoid confusion with method arguments
print('UPenn Flywheel User: %s %s (%s)' % 
      (user.firstname, user.lastname, user.email))

UPenn Flywheel User: Dani Cosme (dcosme@upenn.edu)


_______

## Proceed by Navigating to Flywheel, you'll notice the URL always has respective identifiers

In this example, our notebook tests for a known session ID associated w/ Dr Lydon-Staley AHA Lab:

- https://upenn.flywheel.io/#/projects/5ba2913fe849c300150d02ed/sessions/6088730ee6de2e3066bd7249
    - where the session ID is in the URL --> `6088730ee6de2e3066bd7249`
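
To illustrate how the identifiers sit in the URL, here is a minimal sketch that pulls them out with a regular expression. The helper name `parse_flywheel_url` is hypothetical, and the pattern assumes the `#/projects/<id>/sessions/<id>` layout and 24-character hexadecimal IDs seen in the example above; other Flywheel URLs may differ.

```python
import re


def parse_flywheel_url(url):
    """Return the container IDs found in a Flywheel web URL as a dict.

    Assumes the '#/projects/<id>/sessions/<id>' layout of the example URL.
    """
    # Each match is a (container type, 24-character hex ID) pair
    pairs = re.findall(r'/(projects|sessions)/([0-9a-f]{24})', url)
    return {kind: container_id for kind, container_id in pairs}


ids = parse_flywheel_url(
    'https://upenn.flywheel.io/#/projects/5ba2913fe849c300150d02ed/sessions/6088730ee6de2e3066bd7249'
)
print(ids['sessions'])  # 6088730ee6de2e3066bd7249
```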



### Set your Flywheel Project Container & Corresponding Local Out Project


In [13]:
# replace with name of Flywheel project container (i.e. "geoscan")
#in_project = "geoscan"
in_project = 'bbprime'

# replace with output project folder name in fMRI server (i.e. "geoscanR01")
#out_project = "GS"
out_project = "bbprime"

### Set your session specific ID & corresponding out ID

- `opID` names the downloaded tarball and the output directory on the FMRI server, so it only needs to be unique there

In [14]:
## MODIFY BELOW
# replace with ppt ID as listed on Flywheel (e.g. for geoscan, typically "gsXXX")
#ipID = "gs004"
ipID = "bpp01"

# replace with ppt ID as it will be stored in the server (i.e. "GSXXX")
#opID = "GS004"
opID = 'BPP01' # <--- I think this could be whatever, so long as this is unique on the FMRI host

### Verify that output directory in the server is accurate

- You may need to create this directory ahead of time...
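
Before listing the directory, it can help to confirm that it exists and that your user can write to it. The following is a small sketch using only the standard library; `check_output_dir` is a hypothetical helper name, and the three status strings are illustrative.

```python
import os


def check_output_dir(path):
    """Report whether `path` exists and is writable by the current user (hypothetical helper)."""
    if not os.path.isdir(path):
        return 'missing'
    # os.access checks the permissions of the user running the notebook
    if not os.access(path, os.W_OK | os.X_OK):
        return 'not writable'
    return 'ok'


# Example call with the path used in this notebook (commented out):
# check_output_dir('/fmriDataRaw/fmri_data_raw/bbprime')
```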

In [15]:
outpath = '/fmriDataRaw/fmri_data_raw/{PROJECT}'.format(PROJECT=out_project)

os.listdir(outpath)

['BPP00']

________

## Proceed with looking up your subject data & downloading Dicom tarball

NOTE!!

* Location for DICOMS on the server IS:

    - `/fmriDataRaw/fmri_data_raw/{PROJECT}/`

That is, untar the appropriate folder to `/fmriDataRaw/fmri_data_raw/{PROJECT}/`


### Flywheel uses `Group / Project / Subject / Session` to identify scan ... 

- the **group** is `falklab`

- the **project** is `bbprime` *(fw://unknown/Unsorted)*

- the **subject** is `bpp00` *(probably a default for the unsorted group)*

- the **session** is `CAMRIS^Falk`

#### Thus our lookup string is --> `'falklab/bbprime/bpp00/CAMRIS^Falk'` 

In [16]:
#group_label = 'falklab'
group_label = 'falklab'

#project_label = 'bbprime'
project_label = in_project # <-- values are set early on in the notebook... maybe that isn't helpful though?

#subject_label = 'bpp00'
subject_label = ipID # <-- values are set early on in the notebook... maybe that isn't helpful though?

session_label = 'CAMRIS^Falk'

######################################################

lookup_string = '{}/{}/{}/{}'.format(group_label,project_label,subject_label,session_label)
lookup_string

'falklab/bbprime/bpp01/CAMRIS^Falk'
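
The depth of the lookup path decides which level of the `Group / Project / Subject / Session` hierarchy it names. As a hedged illustration, the hypothetical helper `build_lookup_path` below joins labels in that order and stops at the first level that is not supplied; it only builds strings and does not contact Flywheel.

```python
LEVELS = ('group', 'project', 'subject', 'session')


def build_lookup_path(**labels):
    """Join labels in Group / Project / Subject / Session order, stopping at the first missing level."""
    parts = []
    for level in LEVELS:
        label = labels.get(level)
        if not label:
            break
        parts.append(label)
    return '/'.join(parts)


# Stops at the subject level
print(build_lookup_path(group='falklab', project='bbprime', subject='bpp01'))
# Goes down to one session
print(build_lookup_path(group='falklab', project='bbprime', subject='bpp01',
                        session='CAMRIS^Falk'))
```

The first call yields `falklab/bbprime/bpp01` and the second `falklab/bbprime/bpp01/CAMRIS^Falk`.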

### Proceed with looking up the known session in the *Unsorted* project

Create a `session` object by looking up the container of interest, then confirm that its metadata is accurate!

- For a helpful video overview on finding your data on Flywheel w/ Python-SDK, I strongly encourage you to visit here:
https://docs.flywheel.io/hc/en-us/articles/360048440933-Webinar-Series-Finding-your-stuff-in-Flywheel-with-the-Python-SDK

*TODO --> CONTACT UPENN FLYWHEEL ADMIN TEAM TO FIGURE OUT LAB PROJECTS!*

In [17]:
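#session = fw.lookup('{}'.format(lookup_string))
# NOTE: this group/project/subject path stops at the subject level, so the object
# returned (despite the variable name) appears to be the subject container
session = fw.lookup('falklab/{proj}/{pid}'.format(proj=in_project,pid=ipID))
session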

{'age': None,
 'analyses': [],
 'code': 'bpp01',
 'cohort': None,
 'created': datetime.datetime(2021, 7, 10, 13, 26, 39, 95000, tzinfo=tzutc()),
 'ethnicity': None,
 'files': [],
 'firstname': '',
 'id': '60edbc44852dced2db56b0d6',
 'info': {},
 'info_exists': None,
 'label': 'bpp01',
 'lastname': '',
 'master_code': None,
 'modified': datetime.datetime(2021, 7, 13, 16, 19, 41, 755000, tzinfo=tzutc()),
 'notes': [],
 'parents': {'acquisition': None,
             'analysis': None,
             'group': 'falklab',
             'project': '60bf921979936cf97a3d09fa',
             'session': None,
             'subject': None},
 'permissions': [{'access': None,
                  'id': 'holder@upenn.edu',
                  'role_ids': ['5ef07972374bc20010a37aa3']},
                 {'access': None,
                  'id': 'ebfalk@upenn.edu',
                  'role_ids': ['5ef07972374bc20010a37aa3']},
                 {'access': None,
                  'id': 'alpaul@upenn.edu',
             

### Download the Flywheel Session tarball to FMRISrv

- Once we have the tarball, we can extract our dicoms to the network storage


- *On running for Dr Lydon-Staley test subject, this tarball file is nearly 1GB*
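
Since a single tarball can approach 1GB, a quick free-space check on the download location may save a failed transfer. This is a sketch only: `has_free_space` is a hypothetical helper, and the safety factor of 2 is an arbitrary margin, not a measured requirement.

```python
import shutil


def has_free_space(directory, required_bytes, safety_factor=2.0):
    """Return True if `directory` has at least required_bytes * safety_factor free."""
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= required_bytes * safety_factor


# Assumption: roughly 1 GiB per tarball, checked against the current directory
print(has_free_space('.', 1 * 1024**3))
```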

#### What about the `./working_data` directory? 

*TODO --> Where does working data directory go? Is that just in the jupyterhub environment? does the tarball get deleted after or saved to the network in raw data?*

In [18]:
# -p avoids an error when the directory already exists
!mkdir -p working_data


In [19]:
fw.download_tar(session,'./working_data/{opID}.tar'.format(opID=opID))


{'file_cnt': 17,
 'size': 1306307861,
 'ticket': '75892b43-45d8-453d-8a89-d0da2dd4277c'}
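
The `size` field in the result above is a byte count. A small, hypothetical `format_bytes` helper (decimal units, where 1 GB = 10**9 bytes) makes it easier to read:

```python
def format_bytes(size):
    """Format a byte count using decimal units (1 GB = 10**9 bytes)."""
    for unit, factor in (('GB', 10**9), ('MB', 10**6), ('kB', 10**3)):
        if size >= factor:
            return '{:.2f} {}'.format(size / factor, unit)
    return '{} B'.format(size)


print(format_bytes(1306307861))  # 1.31 GB
```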

## Extract contents of Flywheel tar download:

In the following cells, you will:

1. Open the tarball from the JupyterHub notebook (members are read on demand rather than loaded whole)

2. Set your dicom out directory and confirm permissions

3. Loop through tarball `.getmembers()` and then extract zipped dicoms
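
The three steps can be tried end to end on synthetic data before touching a real download. The sketch below builds a tiny stand-in tarball containing one zip whose name ends in `dicom.zip`, then extracts it the same way. The function name `extract_dicom_zips`, the member names, and the file contents are all made up for illustration.

```python
import io
import os
import tarfile
import tempfile
import zipfile


def extract_dicom_zips(tar_path, out_dir):
    """Extract every member whose name contains 'dicom.zip' from an uncompressed tar."""
    extracted = []
    with tarfile.open(tar_path, mode='r:') as tar_data:
        for member in tar_data.getmembers():
            if 'dicom.zip' in member.name:
                # extractfile returns a seekable file object for uncompressed tars
                with zipfile.ZipFile(tar_data.extractfile(member), mode='r') as dicom_zip:
                    dicom_zip.extractall(out_dir)
                extracted.append(member.name)
    return extracted


# Build a tiny stand-in tarball (synthetic data, not a real Flywheel download)
work = tempfile.mkdtemp()
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as zf:
    zf.writestr('scan.dicom/0001.dcm', b'fake dicom bytes')

tar_path = os.path.join(work, 'demo.tar')
with tarfile.open(tar_path, 'w') as tar:
    info = tarfile.TarInfo('scitran/demo/scan.dicom.zip')
    info.size = len(zip_buffer.getvalue())
    tar.addfile(info, io.BytesIO(zip_buffer.getvalue()))
    # A second member that should be skipped by the 'dicom.zip' filter
    other = tarfile.TarInfo('scitran/demo/notes.txt')
    other.size = 2
    tar.addfile(other, io.BytesIO(b'hi'))

out_dir = os.path.join(work, 'out')
print(extract_dicom_zips(tar_path, out_dir))
print(os.listdir(out_dir))
```

Only the zip is extracted; the `notes.txt` member is left in the tarball.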

### Open the Tarball:

In [20]:
f = open("working_data/{opID}.tar".format(opID=opID), 'rb') # <--- open the Flywheel download in binary read mode
print ('Opening tar in memory as:',f,'\n')
tar_data = tarfile.open(fileobj=f, mode='r:') # <--- open as an uncompressed tar; members are read on demand

Opening tar in memory as: <_io.BufferedReader name='working_data/BPP01.tar'> 



### Set and Create your Out Directory:

- JupyterHub does not respect secondary group permissions, so a directory I create defaults to the group FMRISrvUser1@asc.upenn.edu instead of FMRISrvAHAUsers@asc.upenn.edu; this has to be corrected manually

In [21]:
output_dicom_dir = '{outpath}/{opID}/'.format(outpath=outpath,opID=opID)
print(output_dicom_dir)

/fmriDataRaw/fmri_data_raw/bbprime/BPP01/


In [22]:
# Create the directory if it does not exist
if not os.path.exists(os.path.dirname(output_dicom_dir)):
    try:
        print('makedirs --> {}'.format(output_dicom_dir))
        os.makedirs(os.path.dirname(output_dicom_dir))
    except OSError as err:  # catch only filesystem errors and report the reason
        print('oops! failed to create --> {} ({})'.format(output_dicom_dir, err))

makedirs --> /fmriDataRaw/fmri_data_raw/bbprime/BPP01/


## Confirm permissions for out directory

### Had to make the outdir permission 777 -R

- Secondary group permission is not respected in jhub so I had to manually change for my user created folder ... 

```bash
sudo chgrp fmrisrvahausers@asc.upenn.edu -R /AHAData/fmri_data_raw/
```
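
To check ownership from Python rather than the shell, the standard library can report the same mode, owner, and group that `ls -l` shows. This is a sketch for Unix-like hosts only; `describe_permissions` is a hypothetical helper name.

```python
import grp
import os
import pwd
import stat


def describe_permissions(path):
    """Return (mode string, owner, group) for path, like one row of `ls -l`."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)   # uid without a resolvable account name
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)   # gid without a resolvable group name
    return stat.filemode(info.st_mode), owner, group


print(describe_permissions('.'))
```

If the group reported here is not the lab's group, the directory likely still needs the manual `chgrp` correction described above.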

In [24]:
ls -la $output_dicom_dir

total 0
drwxrwxr-x. 1 dcosme@asc.upenn.edu          fmrisrv1users@asc.upenn.edu 0 Jul 13 20:12 ./
drwxrwxr-x. 1 jcarrerastartak@asc.upenn.edu fmrisrv1users@asc.upenn.edu 0 Jul 13 20:12 ../


## EXTRACT YOUR TARBALL DICOM TO FMRISRV NETWORK STORAGE

In [25]:
for member in tar_data.getmembers():
    
    if 'dicom.zip' in member.name:       # <--- Only extract files with 'dicom.zip' 
        
        print('Extracting: {}\n'.format(member.name))
        
        tfile = tar_data.extractfile(member.name)
        dicom_zip = zipfile.ZipFile(tfile, mode='r')
        dicom_zip.extractall(output_dicom_dir)

Extracting: scitran/falklab/bbprime/bpp01/CAMRIS^Falk/localizer_multislice/1.3.12.2.1107.5.2.43.66044.2021071213544651472282909.0.0.0.dicom.zip

Extracting: scitran/falklab/bbprime/bpp01/CAMRIS^Falk/PhoenixZIPReport/1.3.12.2.1107.5.2.43.66044.30000021071001403492400000326.dicom.zip

Extracting: scitran/falklab/bbprime/bpp01/CAMRIS^Falk/AAHead_Scout/1.3.12.2.1107.5.2.43.66044.2021071213554977830483660.0.0.0.dicom.zip

Extracting: scitran/falklab/bbprime/bpp01/CAMRIS^Falk/AAHead_Scout_MPR_sag/1.3.12.2.1107.5.2.43.66044.2021071213555412448584760.0.0.0.dicom.zip

Extracting: scitran/falklab/bbprime/bpp01/CAMRIS^Falk/AAHead_Scout_MPR_cor/1.3.12.2.1107.5.2.43.66044.2021071213555412493984766.0.0.0.dicom.zip

Extracting: scitran/falklab/bbprime/bpp01/CAMRIS^Falk/AAHead_Scout_MPR_tra/1.3.12.2.1107.5.2.43.66044.2021071213555412520484770.0.0.0.dicom.zip

Extracting: scitran/falklab/bbprime/bpp01/CAMRIS^Falk/MPRAGE_TI1100_ipat2/1.3.12.2.1107.5.2.43.66044.2021071213565473756285616.0.0.0.dicom.zip



### You have now successfully downloaded the dicom data from Flywheel to ASC servers

- this goes to `/fmriDataRaw/fmri_data_raw/bbprime/BPP01/`

In [26]:
os.listdir(output_dicom_dir)

['1.3.12.2.1107.5.2.43.66044.2021071213544651472282909.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071213554977830483660.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071213555412448584760.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071213555412493984766.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071213555412520484770.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071213565473756285616.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071214031950833287040.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071214110627851477758.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071214111678937578166.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071214202079579573394.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071214203115195073802.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071214294771759469030.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071214295824791169438.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.2021071214383773482864666.0.0.0.dicom',
 '1.3.12.2.1107.5.2.43.66044.30000

## Loop through participants

The code below loops through multiple participants, skipping any whose output directory already exists.

### Define function

In [3]:
def transfer_data(sub, input_prefix, output_prefix):
    # Relies on the notebook globals `fw`, `in_project`, and `out_project` set in earlier cells
    ipID = input_prefix + sub
    opID = output_prefix + sub
    outpath = '/fmriDataRaw/fmri_data_raw/{PROJECT}'.format(PROJECT=out_project)
    output_dicom_dir = '{outpath}/{opID}/'.format(outpath=outpath, opID=opID)

    if os.path.exists(output_dicom_dir):
        print('-------------- Skipping existing participant: {} -------------'.format(ipID))
        return

    print('-------------- Transferring: {} -------------'.format(ipID))

    # download tar file
    tar_path = './working_data/{opID}.tar'.format(opID=opID)
    session = fw.lookup('falklab/{proj}/{pid}'.format(proj=in_project, pid=ipID))
    print('Downloading tar')
    fw.download_tar(session, tar_path)

    # create the output directory; stop here if that fails
    try:
        print('makedirs --> {}'.format(output_dicom_dir))
        os.makedirs(output_dicom_dir, exist_ok=True)
    except OSError as err:
        print('oops! failed to create --> {} ({})'.format(output_dicom_dir, err))
        return

    # open the tar file and extract only the zipped dicoms
    with tarfile.open(tar_path, mode='r:') as tar_data:
        for member in tar_data.getmembers():
            if 'dicom.zip' in member.name:       # <--- Only extract files with 'dicom.zip'
                print('Extracting: {}\n'.format(member.name))
                with zipfile.ZipFile(tar_data.extractfile(member), mode='r') as dicom_zip:
                    dicom_zip.extractall(output_dicom_dir)


### Define variables
* subs = participant numbers without the prefix (e.g. use 01 for bpp01)
* input_prefix = project prefix as listed on Flywheel
* output_prefix = desired project prefix on the server

In [4]:
subs=['00', '01']
input_prefix = 'bpp'
output_prefix = 'BPP'
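
For longer participant lists, the zero-padded numbers can be generated instead of typed by hand. A minimal sketch follows; `make_subs` is a hypothetical helper, and the two-digit width is an assumption based on IDs such as `bpp01`.

```python
def make_subs(first, last, width=2):
    """Return zero-padded participant numbers from first to last, inclusive."""
    return [str(n).zfill(width) for n in range(first, last + 1)]


print(make_subs(0, 1))  # ['00', '01']
```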

### Loop through specified participants

In [57]:
for sub in subs:
    transfer_data(sub, input_prefix, output_prefix)
    

-------------- Skipping existing participant: bpp00 -------------
-------------- Skipping existing participant: bpp01 -------------
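
Before running the loop, it can be useful to preview which participants would actually be transferred. The sketch below applies the same "output directory already exists" test without downloading anything; `pending_subjects` is a hypothetical helper name.

```python
import os


def pending_subjects(subs, output_prefix, outpath):
    """Return the participant numbers whose output directory does not exist yet."""
    return [sub for sub in subs
            if not os.path.exists(os.path.join(outpath, output_prefix + sub))]


# Example call with the values used in this notebook (commented out):
# pending_subjects(['00', '01'], 'BPP', '/fmriDataRaw/fmri_data_raw/bbprime')
```

An empty list means every listed participant would be skipped, as in the output above.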


In [8]:
!ls /fmriDataRaw/fmri_data_raw/bbprime/BPP00

1.3.12.2.1107.5.2.43.66044.2021070809401896847630066.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.2021070809412281865230828.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.2021070809412685334831917.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.2021070809412685377431923.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.2021070809412685402031927.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.202107080944116265832773.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.2021070809513278109134197.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.2021070810062780459723705.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.2021070810063830959124113.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.202107081016361044818131.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.2021070810164641187318539.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.2021070810264198449112557.0.0.0.dicom
1.3.12.2.1107.5.2.43.66044.30000021062420405850600001483.dicom
