# Who is Ready to Upload?

This notebook tracks which subjects are ready to upload to Flywheel: those whose data currently pass BIDS validation on CUBIC.

In [42]:
import pandas as pd
import flywheel


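def get_target_from_code(dataset, code):
    """Return the validator rows of `dataset` that carry the error code `code`."""
    df = pd.read_csv('/cbica/projects/RBC/flywheel_curation/RBC/PennLINC/Validation/CUBIC_Curation/{}_validation.csv'.format(dataset))

    # Path components 1-4 of `files` are taken as subject, session, folder, filename
    df[['subject', 'session', 'folder', 'filename']] = df['files'].str.split('/', expand=True).loc[:,1:4]

    # Keep only the rows that match the requested validator code
    res = df.loc[(df['code'] == code), ['files', 'subject', 'session', 'filename']]
    return res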
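
To make the path parsing concrete, here is a minimal sketch on a toy table. The file paths and error codes are made up for illustration; the real CSV is assumed to hold paths of the form `/sub-.../ses-.../<folder>/<filename>` in its `files` column, which is what the column indexing in `get_target_from_code` implies.

```python
import pandas as pd

# Toy stand-in for a *_validation.csv file; paths and codes are hypothetical
toy = pd.DataFrame({
    "files": [
        "/sub-01/ses-1/func/sub-01_ses-1_task-rest_bold.nii.gz",
        "/sub-02/ses-1/anat/sub-02_ses-1_T1w.nii.gz",
        None,  # a validator row that is not tied to any file
    ],
    "code": [1, 2, 2],
})

# Splitting on "/" puts an empty string in column 0 (the leading slash),
# so columns 1-4 hold subject, session, folder and filename
parts = toy["files"].str.split("/", expand=True)
toy[["subject", "session", "folder", "filename"]] = parts.loc[:, 1:4]

# Same row filter as get_target_from_code, here for the made-up code 2
res = toy.loc[toy["code"] == 2, ["files", "subject", "session", "filename"]]
print(res)
```

The row without a file path ends up with a missing subject, so any list of subjects built from this column can contain a missing value.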

---

# NKI

The NKI data were valid before downloading. We still need to move the functional data to perf/ASL format before uploading them to Flywheel. No updates yet.

---

# HBN

Here we tally all of the subjects whose files did *not* trigger any BIDS validator errors.

In [31]:
with open('/cbica/projects/RBC/flywheel_curation/RBC/PennLINC/Validation/HBN/hbn_subjects2.txt', 'r') as read_file:
    hbn_subjects = read_file.read()
    hbn_subjects = hbn_subjects.split('\n')

Here are the first ten HBN subjects:

In [38]:
hbn_subjects[:10]

['sub-NDARAA112DMH',
 'sub-NDARHR753ZKU',
 'sub-NDARAA117NEJ',
 'sub-NDARAC904DMU',
 'sub-NDARAE012DGA',
 'sub-NDARAN814UPR',
 'sub-NDARAP176AD1',
 'sub-NDARPV595RWB',
 'sub-NDARAV031PPJ',
 'sub-NDARWJ498CZY']

In [39]:
len(hbn_subjects)

2619

In [25]:
bids_validator_output = pd.read_csv('/cbica/projects/RBC/flywheel_curation/RBC/PennLINC/Validation/CUBIC_Curation/HBN_validation.csv')
bids_validator_output[['subject', 'session', 'folder', 'filename']] = bids_validator_output['files'].str.split('/', expand=True).loc[:,1:4]

invalid = bids_validator_output[['subject']].drop_duplicates().values.tolist()

invalid = list(set([y for x in invalid for y in x]))

Here are the first ten subjects that had errors in the BIDS validator. The `nan` entry comes from rows where no subject could be parsed from the `files` column:

In [40]:
invalid[:10]

[nan,
 'sub-NDARVY859ENR',
 'sub-NDARUT233WU9',
 'sub-NDARBK106KRH',
 'sub-NDARBM839WR5',
 'sub-NDARAR238RZ8',
 'sub-NDARRZ927VC3',
 'sub-NDARGK943RL3',
 'sub-NDARNB390JL3',
 'sub-NDARMX328VWC']

In [26]:
len(invalid)

2024
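
Because `nan` is not a subject label, it inflates this count by one. As a hedged sketch of one way to keep missing values out of the list (toy data with hypothetical labels, not the method used in this notebook), `dropna()` followed by `unique()` gives the distinct subject labels directly:

```python
import pandas as pd

# Toy stand-in for the parsed validator output; labels are hypothetical
validator = pd.DataFrame({"subject": ["sub-A", "sub-A", None, "sub-B"]})

# dropna() removes rows with no parsed subject; unique() de-duplicates
# while keeping the order of first appearance
invalid_clean = validator["subject"].dropna().unique().tolist()
print(invalid_clean)
```

On the real table this would presumably return 2023 labels rather than 2024, assuming `nan` is the only non-subject entry.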

So the subjects ready to upload are those in the full list that do not appear in the invalid list:

In [35]:
to_upload = [x for x in hbn_subjects if x not in invalid]

In [41]:
to_upload[:5]

['sub-NDARPV595RWB',
 'sub-NDARPY458LTR',
 'sub-NDARRA383KVQ',
 'sub-NDARRA981BCM',
 'sub-NDARRE063LG2']

In [37]:
len(to_upload)

596
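
The counts are worth a quick check: 2619 - 2024 = 595, not 596. One explanation consistent with the outputs above is that `nan` is among the 2024 entries of `invalid` but matches no subject, leaving 2023 real labels and 2619 - 2023 = 596. This assumes every invalid subject appears exactly once in `hbn_subjects`, which the notebook does not verify. The sketch below reproduces the off-by-one on toy labels (all hypothetical) and uses a set so that each membership test is a constant-time lookup:

```python
# Hypothetical labels standing in for hbn_subjects and invalid
all_subjects = ["sub-A", "sub-B", "sub-C", "sub-D"]
invalid = [float("nan"), "sub-B", "sub-D"]

# A set gives fast lookups; the list comprehension keeps the original order
invalid_set = set(invalid)
to_upload = [s for s in all_subjects if s not in invalid_set]

# The naive difference of lengths undercounts by one because of the nan entry
naive_count = len(all_subjects) - len(invalid)
print(naive_count, len(to_upload))
```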

We also need the subjects currently on Flywheel so that we don't overwrite existing data:

In [52]:
client = flywheel.Client()

proj = client.projects.find_first('label=RBC_HBN_cubic')

subjects = proj.subjects() # at this time it should be empty

existing_subject_labels = [x.label for x in subjects]



In [53]:
print(proj.label, '(', proj.id, ') num of subjects:')
print(len(existing_subject_labels))

RBC_HBN_cubic ( 5f75026f58e86f0fcabb7d23 ) num of subjects:
0


In [55]:
to_upload = [x for x in to_upload if x not in existing_subject_labels]
len(to_upload)

596

We write the `to_upload` list to a file and submit one upload job per subject through SGE:

In [51]:
with open("/cbica/projects/RBC/flywheel_curation/RBC/PennLINC/Validation/CUBIC_Curation/upload_ready_subject_lists/hbn_upload.txt", "w") as outfile:
    outfile.write("\n".join(to_upload))
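
A quick round-trip check can confirm what was written. The sketch below uses a temporary directory and hypothetical labels rather than the real path. It also shows that `"\n".join(...)` leaves no newline after the last entry, which matters for line-based shell tools: a plain `while read` loop does not run its body for a final line that lacks a newline.

```python
import os
import tempfile

to_upload = ["sub-A", "sub-B", "sub-C"]  # hypothetical labels

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "upload.txt")  # hypothetical file name
    with open(path, "w") as outfile:
        outfile.write("\n".join(to_upload))
    with open(path) as infile:
        text = infile.read()

# splitlines() recovers the list whether or not a trailing newline exists
round_trip = text.splitlines()
ends_with_newline = text.endswith("\n")
print(round_trip == to_upload, ends_with_newline)
```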

In a bash shell:
```
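# The "|| [ -n "$line" ]" test keeps the last subject: the file has no trailing newline
while IFS= read -r line || [ -n "$line" ];
do
  qsub ./upload_hbn_qsub.sh "$line";
done < /cbica/projects/RBC/flywheel_curation/RBC/PennLINC/Validation/CUBIC_Curation/upload_ready_subject_lists/hbn_upload.txt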
```
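
Before submitting hundreds of jobs, it can help to preview the commands. The following is a minimal dry-run sketch in Python; `build_qsub_commands` is a hypothetical helper, not part of this project, and the labels are made up. It only builds and prints the commands, and it skips blank entries so that no job is submitted without a subject.

```python
import shlex


def build_qsub_commands(subjects, script="./upload_hbn_qsub.sh"):
    """Return one qsub argument list per non-empty subject label."""
    return [["qsub", script, s] for s in subjects if s.strip()]


# Hypothetical labels, including a blank entry that should be skipped
commands = build_qsub_commands(["sub-A", "", "sub-B"])
for cmd in commands:
    print(shlex.join(cmd))
```

Passing each argument list to `subprocess.run` would submit the jobs for real.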