# Grouping Edits for File Classifications

In case you want to edit a field uniformly across many files, we can write a script that groups the acquisition files by certain fields. In this example, we group the files by their unique combinations of series description and repetition time (TR).

First, we query Flywheel for the full project:

In [1]:
import os
import sys

import flywheel
import pandas as pd
from tqdm import tqdm

# add the bids-on-flywheel scripts to the path
sys.path.append(os.path.abspath("/home/ttapera/bids-on-flywheel/flywheel_bids_tools"))
import query_bids
import upload_bids


In [2]:
fw = flywheel.Client()
result = query_bids.query_fw("Reward2018", fw)

Next, we read the acquisition view for each subject and collect the results as dataframes:

In [3]:
view = fw.View(columns='subject')
subject_df = fw.read_view_dataframe(view, result.id)

# read the acquisition view for each subject, keeping only non-empty results
sessions = []
view = fw.View(columns='acquisition')
for ind, row in tqdm(subject_df.iterrows(), total=subject_df.shape[0]):
    session = fw.read_view_dataframe(view, row["subject.id"])
    if session.shape[0] > 0:
        sessions.append(session)



In [4]:
acquisitions = pd.concat(sessions)

Next, we extract each acquisition's BIDS data.

We make a slight modification to the BIDS extractor function: it also records the file classification, series description, TR, sequence name, and modality, which we assume will be useful grouping criteria.

In [5]:
# inspect one example acquisition
ac = fw.get('5c744134ba25800039399544')

# list its files to see where the BIDS information is stored
ac.files

[{'classification': {},
  'created': datetime.datetime(2019, 2, 25, 19, 29, 3, 179000, tzinfo=tzutc()),
  'hash': '',
  'id': '4039d74a-28e5-4b9d-b4c3-443e7e552329',
  'info': {},
  'info_exists': None,
  'mimetype': 'text/xml',
  'modality': None,
  'modified': datetime.datetime(2019, 2, 25, 19, 29, 2, 875000, tzinfo=tzutc()),
  'name': 'STIM_catalog.xml',
  'origin': {'id': 'harshakethineni@flywheel.io',
             'method': None,
             'name': None,
             'type': 'user',
             'via': None},
  'replaced': None,
  'size': 564,
  'tags': [],
  'type': 'markup',
  'zip_member_count': None}, {'classification': {},
  'created': datetime.datetime(2019, 2, 25, 19, 29, 3, 499000, tzinfo=tzutc()),
  'hash': '',
  'id': '689da673-ce47-4787-ab88-09611c941497',
  'info': {},
  'info_exists': None,
  'mimetype': 'text/xml',
  'modality': None,
  'modified': datetime.datetime(2019, 2, 25, 19, 29, 3, 199000, tzinfo=tzutc()),
  'name': 'PRESENTATION_catalog.xml',
  'origin': {

In [6]:
def extract_bids_data(acquisitionID, client):
    """Extract the BIDS data of an acquisition.

    A helper function to dig into each NIfTI file's info container
    (a dictionary of dictionaries) and extract the BIDS validity fields,
    plus the file classification and a few scanner fields for grouping.

    Parameters
    ----------
    acquisitionID
        The MongoDB ID that identifies the acquisition.
    client
        The flywheel Client class object.

    Returns
    -------
    list of dict or None
        One dict of BIDS fields and values per NIfTI file, or None if the
        acquisition cannot be read or has no NIfTI files.
    """
    # fetch the acquisition object; skip it if it cannot be read
    try:
        acq = client.get(acquisitionID)
    except Exception:
        return None

    # keep only the NIfTI files; if there are none, there is nothing to extract
    niftis = [x for x in acq.files if x['type'] == 'nifti']
    if not niftis:
        return None

    rows = []
    for nii in niftis:
        info = nii['info'] or {}
        # add the acquisition id to the dict for joining purposes
        bids = {'acquisition.id': str(acquisitionID)}

        # get the BIDS classifier info if it exists
        if isinstance(info.get('BIDS'), dict):
            bids.update(info['BIDS'])

        # include the file classification (e.g. Intent, Measurement)
        if 'classification' in nii.keys() and nii['classification']:
            bids.update(nii['classification'])

        # get the scanner fields used for grouping (None if missing)
        for field in ['SeriesDescription', 'RepetitionTime',
                      'SequenceName', 'Modality']:
            bids[field] = info.get(field)

        rows.append(bids)

    return rows

In [7]:
dd = extract_bids_data("5c1a8bbc9011bd0011369990", fw)

In [8]:
dd

[{'acquisition.id': '5c1a8bbc9011bd0011369990',
  'Run': '',
  'error_message': '',
  'Ce': '',
  'Filename': 'sub-120217_ses-nodra_T1w.nii.gz',
  'ignore': False,
  'Acq': '',
  'valid': True,
  'template': 'anat_file',
  'Rec': '',
  'Path': 'sub-120217/ses-nodra/anat',
  'Folder': 'anat',
  'Modality': 'MR',
  'Mod': '',
  'Intent': ['Structural'],
  'Measurement': ['T1'],
  'SeriesDescription': 'MPRAGE_NAVprotocol',
  'RepetitionTime': 0.0112,
  'SequenceName': 'Moco3d1_32ns'}]

In [9]:
# loop through the first 200 acquisitions to extract the BIDS validity data
bids_classifications = []
subset = acquisitions.iloc[:200]
for ind, row in tqdm(subset.iterrows(), total=subset.shape[0]):
    temp_info = extract_bids_data(row["acquisition.id"], fw)
    if temp_info is not None:
        bids_classifications.extend(temp_info)
bids_classifications = pd.DataFrame(bids_classifications)



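Intent and Measurement (from the file classification) have been added, though as lists. We can still group by series description and TR; as a first approximation, we drop rows that repeat an earlier row's values in those two columns: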

In [10]:
bids_classifications.drop_duplicates(['SeriesDescription', 'RepetitionTime'])

| | Acq | Ce | Custom | Echo | Filename | Folder | Intent | Measurement | Mod | Modality | ... | RepetitionTime | Run | SequenceName | SeriesDescription | Task | acquisition.id | error_message | ignore | template | valid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 0.5 | | epfid2d1_64 | bbl1_restbold_mb6_742 | | 5c1a796d9011bd0014368993 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 1 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_itc1_168 | | 5c1a796d9011bd00153688c1 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 2 | | | | | sub-90683_ses-nodra_T1w.nii.gz | anat | [Structural] | [T1] | | MR | ... | 1.85 | | Moco3d1 | MPRAGE_TI1110_ipat2_moco3 | | 5c1a796e9011bd00133688e9 | | False | anat_file | True |
| 3 | | | | | sub-90683_ses-nodra_T1w.nii.gz | anat | [Structural] | [T1] | | MR | ... | 0.0112 | | Moco3d1_32ns | MPRAGE_NAVprotocol | | 5c1a796e9011bd00133688ea | | False | anat_file | True |
| 4 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_itc2_168 | | 5c1a796e9011bd0014368996 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 5 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | bbl1_restbold1_124 | | 5c1a796e9011bd00113688d6 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 7 | | | | | | | [Localizer] | [T2] | | MR | ... | 0.0086 | | _fl2d1 | localizer | | 5c1a796e9011bd00133688eb | | | | |
| 10 | | | | | | | [Shim] | | | MR | ... | 2.0 | | _epfid2d1_64 | epi_singlerep_advshim | | 5c1a796e9011bd00153688c5 | | | | |
| 11 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | bbl1_restbold1_204 | | 5c1a7ee19011bd0011368aac | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 12 | | | | | | | [Localizer] | [T2] | | MR | ... | 0.0086 | | _fl2d1 | localizer_8channel | | 5c1a848d9011bd0014369702 | | | | |


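Note that this isn't *strictly* a grouped dataframe: we have emulated grouping by dropping every row that repeats an earlier row's SeriesDescription and RepetitionTime. Each remaining row therefore stands in for all files that share those two values, and its other columns simply show the values of the first such file. For example, MPRAGE_NAVprotocol appears on a single row here, but 16 files in this sample share that series description and TR (see the counts below). An edit to a row of this view is meant to apply to all of those files, so these fields should not be changed en masse unless you are certain the change holds for every file in the group.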
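To make the relationship between the two views concrete, here is a minimal sketch on a made-up three-file table (the filenames are placeholders, not project data). It keeps the first row of each (SeriesDescription, RepetitionTime) combination, as `drop_duplicates` does above, and attaches the number of files each representative row stands for using `groupby(...).size()`.

```python
import pandas as pd

# Made-up stand-in for bids_classifications: two files share one group
files = pd.DataFrame({
    "SeriesDescription": ["MPRAGE_NAVprotocol", "MPRAGE_NAVprotocol", "ep2d_effort1_168"],
    "RepetitionTime": [0.0112, 0.0112, 3.0],
    "Filename": ["file_a.nii.gz", "file_b.nii.gz", "file_c.nii.gz"],
})
keys = ["SeriesDescription", "RepetitionTime"]

# One representative row per group: drop_duplicates keeps the first occurrence
representatives = files.drop_duplicates(keys)

# Number of files in each group
group_sizes = files.groupby(keys).size().rename("n_files").reset_index()

# A left merge keeps the representatives' order and adds the group size
grouped_view = representatives.merge(group_sizes, on=keys, how="left")
print(grouped_view)
```

Here the MPRAGE_NAVprotocol row reports an `n_files` of 2 but shows only `file_a.nii.gz`; `file_b.nii.gz` is hidden in the grouped view even though an edit to that row is meant to cover it.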

In [11]:
bids_classifications.groupby(['SeriesDescription', 'RepetitionTime']).count()

| SeriesDescription | RepetitionTime | Acq | Ce | Custom | Echo | Filename | Folder | Intent | Measurement | Mod | Modality | Path | Rec | Run | SequenceName | Task | acquisition.id | error_message | ignore | template | valid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MPRAGE_NAVprotocol | 0.0112 | 16 | 16 | 0 | 0 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 0 | 16 | 16 | 16 | 16 | 16 |
| MPRAGE_TI1100_ipat2 | 1.81 | 2 | 2 | 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 2 | 2 | 2 | 2 | 2 |
| MPRAGE_TI1110_ipat2_moco3 | 1.85 | 8 | 8 | 0 | 0 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 0 | 8 | 8 | 8 | 8 | 8 |
| bbl1_restbold1_124 | 3.0 | 6 | 0 | 1 | 6 | 6 | 6 | 6 | 6 | 0 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| bbl1_restbold1_204 | 3.0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| bbl1_restbold_mb6_742 | 0.5 | 7 | 0 | 0 | 7 | 7 | 7 | 7 | 7 | 0 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| ep2d_effort1_168 | 3.0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ep2d_effort1_236 | 3.0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ep2d_effort2_168 | 3.0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ep2d_effort2_236 | 3.0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |


As an aside, we also need a way to handle columns whose cells contain lists, such as Intent:

In [12]:
bids_classifications.Intent.values

array([list(['Functional']), list(['Functional']), list(['Structural']),
       list(['Structural']), list(['Functional']), list(['Functional']),
       list(['Structural']), list(['Localizer']), list(['Localizer']),
       list(['Localizer']), list(['Shim']), list(['Functional']),
       list(['Localizer']), list(['Localizer']), list(['Localizer']),
       list(['Functional']), list(['Functional']), list(['Functional']),
       list(['Structural']), list(['Structural']), list(['Functional']),
       list(['Structural']), list(['Shim']), list(['Localizer']),
       list(['Localizer']), list(['Localizer']), list(['Functional']),
       list(['Structural']), list(['Functional']), list(['Structural']),
       list(['Functional']), list(['Structural']), list(['Shim']),
       list(['Localizer']), list(['Localizer']), list(['Localizer']),
       list(['Structural']), list(['Functional']), list(['Functional']),
       list(['Structural']), list(['Functional']), list(['Functional']),
       l

In [13]:
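def unlist_item(ls):
    """Join a list into a sorted, comma-separated string; return NaN otherwise."""
    if isinstance(ls, list):
        # sorted() returns a new list, so the list stored in the dataframe is not modified
        return ', '.join(sorted(ls))
    else:
        return float('nan')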

In [14]:
l = ['Functional']
unlist_item(l)

'Functional'

In [15]:
bids_classifications.Measurement.apply(unlist_item)

0      T2*
1      T2*
2       T1
3       T1
4      T2*
5      T2*
6       T1
7       T2
8       T2
9       T2
10     NaN
11     T2*
12      T2
13      T2
14      T2
15     T2*
16     T2*
17     T2*
18      T1
19      T1
20     T2*
21      T1
22     NaN
23      T2
24      T2
25      T2
26     T2*
27      T1
28     T2*
29      T1
      ... 
79     T2*
80     T2*
81     T2*
82      T1
83      T1
84     T2*
85      T1
86     NaN
87      T2
88      T2
89      T2
90     T2*
91     T2*
92     T2*
93     T2*
94      T1
95      T2
96      T2
97      T2
98     T2*
99     T2*
100    T2*
101    T2*
102    T2*
103     T1
104     T2
105     T2
106     T2
107    T2*
108    T2*
Name: Measurement, Length: 109, dtype: object

In [16]:
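def relist_item(string):
    """Split a string produced by unlist_item back into a list; return NaN otherwise."""
    if isinstance(string, str):
        # strip the whitespace left behind by the ', ' separator used in unlist_item
        return [x.strip() for x in string.split(',')]
    else:
        return float('nan')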

In [17]:
bids_classifications.Measurement.apply(unlist_item).apply(relist_item)

0      [T2*]
1      [T2*]
2       [T1]
3       [T1]
4      [T2*]
5      [T2*]
6       [T1]
7       [T2]
8       [T2]
9       [T2]
10       NaN
11     [T2*]
12      [T2]
13      [T2]
14      [T2]
15     [T2*]
16     [T2*]
17     [T2*]
18      [T1]
19      [T1]
20     [T2*]
21      [T1]
22       NaN
23      [T2]
24      [T2]
25      [T2]
26     [T2*]
27      [T1]
28     [T2*]
29      [T1]
       ...  
79     [T2*]
80     [T2*]
81     [T2*]
82      [T1]
83      [T1]
84     [T2*]
85      [T1]
86       NaN
87      [T2]
88      [T2]
89      [T2]
90     [T2*]
91     [T2*]
92     [T2*]
93     [T2*]
94      [T1]
95      [T2]
96      [T2]
97      [T2]
98     [T2*]
99     [T2*]
100    [T2*]
101    [T2*]
102    [T2*]
103     [T1]
104     [T2]
105     [T2]
106     [T2]
107    [T2*]
108    [T2*]
Name: Measurement, Length: 109, dtype: object
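List columns need special handling because Python lists are unhashable, so pandas cannot use them as keys in `drop_duplicates` or `groupby`. Besides round-tripping through strings as above, another option is to convert each list to a sorted tuple, which is hashable. A minimal sketch on made-up data; `to_tuple` is a hypothetical helper, not part of the `flywheel_bids_tools` scripts:

```python
import pandas as pd

# Made-up frame with a list-valued column, like Intent and Measurement
df = pd.DataFrame({
    "SeriesDescription": ["a", "a", "b"],
    "Measurement": [["T2*"], ["T2*"], float("nan")],
})
key_cols = ["SeriesDescription", "Measurement"]

# Lists are unhashable, so deduplicating on a list column raises TypeError
try:
    df.drop_duplicates(key_cols)
except TypeError as err:
    print("drop_duplicates failed:", err)

def to_tuple(value):
    """Sorted tuple for lists (so element order does not matter); other values unchanged."""
    return tuple(sorted(value)) if isinstance(value, list) else value

# Tuples are hashable, so pandas can compare them; NaN is left as-is
hashable = df.assign(Measurement=df["Measurement"].map(to_tuple))
deduped = hashable.drop_duplicates(key_cols)
print(deduped)
```

Sorting inside `to_tuple` mirrors the sort in `unlist_item`, so `['T1', 'T2']` and `['T2', 'T1']` are treated as the same value.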

Now that the list columns are taken care of, we can edit fields in a grouped view and map those edits back to the full dataframe.

In [18]:
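# .copy() makes an independent dataframe, so later edits do not trigger SettingWithCopyWarning
grouped_view = bids_classifications.drop_duplicates(['SeriesDescription', 'RepetitionTime']).copy()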

In [19]:
grouped_view

| | Acq | Ce | Custom | Echo | Filename | Folder | Intent | Measurement | Mod | Modality | ... | RepetitionTime | Run | SequenceName | SeriesDescription | Task | acquisition.id | error_message | ignore | template | valid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 0.5 | | epfid2d1_64 | bbl1_restbold_mb6_742 | | 5c1a796d9011bd0014368993 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 1 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_itc1_168 | | 5c1a796d9011bd00153688c1 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 2 | | | | | sub-90683_ses-nodra_T1w.nii.gz | anat | [Structural] | [T1] | | MR | ... | 1.85 | | Moco3d1 | MPRAGE_TI1110_ipat2_moco3 | | 5c1a796e9011bd00133688e9 | | False | anat_file | True |
| 3 | | | | | sub-90683_ses-nodra_T1w.nii.gz | anat | [Structural] | [T1] | | MR | ... | 0.0112 | | Moco3d1_32ns | MPRAGE_NAVprotocol | | 5c1a796e9011bd00133688ea | | False | anat_file | True |
| 4 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_itc2_168 | | 5c1a796e9011bd0014368996 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 5 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | bbl1_restbold1_124 | | 5c1a796e9011bd00113688d6 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 7 | | | | | | | [Localizer] | [T2] | | MR | ... | 0.0086 | | _fl2d1 | localizer | | 5c1a796e9011bd00133688eb | | | | |
| 10 | | | | | | | [Shim] | | | MR | ... | 2.0 | | _epfid2d1_64 | epi_singlerep_advshim | | 5c1a796e9011bd00153688c5 | | | | |
| 11 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | bbl1_restbold1_204 | | 5c1a7ee19011bd0011368aac | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 12 | | | | | | | [Localizer] | [T2] | | MR | ... | 0.0086 | | _fl2d1 | localizer_8channel | | 5c1a848d9011bd0014369702 | | | | |


Using the effort scans as an example, we can set the task of every series whose description contains "effort" to "Effort".

In [20]:
mask = grouped_view['SeriesDescription'] == "ep2d_effort1_168"

In [21]:
# na=False treats missing series descriptions as non-matches instead of NaN
mask = grouped_view['SeriesDescription'].str.contains("effort", na=False)
grouped_view.loc[mask]

| | Acq | Ce | Custom | Echo | Filename | Folder | Intent | Measurement | Mod | Modality | ... | RepetitionTime | Run | SequenceName | SeriesDescription | Task | acquisition.id | error_message | ignore | template | valid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37 | | | | | sub-93274_ses-neff2_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort2_236 | | 5c1a80139011bd0014368e9c | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 38 | | | | | sub-93274_ses-neff2_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 0.5 | | epfid2d1_64 | ep2d_effort3_1416 | | 5c1a80139011bd0015368c2d | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 40 | | | | | sub-93274_ses-neff2_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort1_236 | | 5c1a80139011bd0014368e9f | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 90 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort3_168 | | 5c1a7e169011bd0011368985 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 91 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort4_168 | | 5c1a7e169011bd00133689d2 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 92 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort2_168 | | 5c1a7e179011bd0015368995 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 93 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort1_168 | | 5c1a7e179011bd0015368996 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |


In [22]:
grouped_view.loc[mask, 'Task'] = "Effort"



In [23]:
grouped_view.loc[mask,]

| | Acq | Ce | Custom | Echo | Filename | Folder | Intent | Measurement | Mod | Modality | ... | RepetitionTime | Run | SequenceName | SeriesDescription | Task | acquisition.id | error_message | ignore | template | valid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37 | | | | | sub-93274_ses-neff2_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort2_236 | Effort | 5c1a80139011bd0014368e9c | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 38 | | | | | sub-93274_ses-neff2_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 0.5 | | epfid2d1_64 | ep2d_effort3_1416 | Effort | 5c1a80139011bd0015368c2d | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 40 | | | | | sub-93274_ses-neff2_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort1_236 | Effort | 5c1a80139011bd0014368e9f | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 90 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort3_168 | Effort | 5c1a7e169011bd0011368985 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 91 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort4_168 | Effort | 5c1a7e169011bd00133689d2 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 92 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort2_168 | Effort | 5c1a7e179011bd0015368995 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 93 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_effort1_168 | Effort | 5c1a7e179011bd0015368996 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |


Let's also set the run number to match the digit after "effort" in the series description:

In [24]:
# the digit after "effort" in the series description is the run number
for run in range(1, 5):
    mask = grouped_view['SeriesDescription'].str.contains("effort{}".format(run), na=False)
    grouped_view.loc[mask, 'Run'] = run

grouped_view

| | Acq | Ce | Custom | Echo | Filename | Folder | Intent | Measurement | Mod | Modality | ... | RepetitionTime | Run | SequenceName | SeriesDescription | Task | acquisition.id | error_message | ignore | template | valid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 0.5 | | epfid2d1_64 | bbl1_restbold_mb6_742 | | 5c1a796d9011bd0014368993 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 1 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_itc1_168 | | 5c1a796d9011bd00153688c1 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 2 | | | | | sub-90683_ses-nodra_T1w.nii.gz | anat | [Structural] | [T1] | | MR | ... | 1.85 | | Moco3d1 | MPRAGE_TI1110_ipat2_moco3 | | 5c1a796e9011bd00133688e9 | | False | anat_file | True |
| 3 | | | | | sub-90683_ses-nodra_T1w.nii.gz | anat | [Structural] | [T1] | | MR | ... | 0.0112 | | Moco3d1_32ns | MPRAGE_NAVprotocol | | 5c1a796e9011bd00133688ea | | False | anat_file | True |
| 4 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | ep2d_itc2_168 | | 5c1a796e9011bd0014368996 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 5 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | bbl1_restbold1_124 | | 5c1a796e9011bd00113688d6 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 7 | | | | | | | [Localizer] | [T2] | | MR | ... | 0.0086 | | _fl2d1 | localizer | | 5c1a796e9011bd00133688eb | | | | |
| 10 | | | | | | | [Shim] | | | MR | ... | 2.0 | | _epfid2d1_64 | epi_singlerep_advshim | | 5c1a796e9011bd00153688c5 | | | | |
| 11 | | | | | sub-90683_ses-nodra_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | | _epfid2d1_64 | bbl1_restbold1_204 | | 5c1a7ee19011bd0011368aac | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 12 | | | | | | | [Localizer] | [T2] | | MR | ... | 0.0086 | | _fl2d1 | localizer_8channel | | 5c1a848d9011bd0014369702 | | | | |
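The hard-coded masks above only cover effort1 through effort4. A more general sketch reads the run number directly from the series description with `Series.str.extract`, assuming the run is always the digits immediately following "effort". It stores the run as a string, since the validation step at the end of this notebook rejects integer runs. The rows below are made up:

```python
import pandas as pd

# Made-up grouped view; Run starts out empty, as in the notebook
grouped_view = pd.DataFrame({
    "SeriesDescription": ["ep2d_effort1_168", "ep2d_effort3_1416", "localizer", None],
    "Run": ["", "", "", ""],
})

# Capture the digits right after "effort"; rows without a match become NaN
run_numbers = grouped_view["SeriesDescription"].str.extract(r"effort(\d+)", expand=False)

# Only overwrite rows that matched; the extracted run numbers are already strings
matched = run_numbers.notna()
grouped_view.loc[matched, "Run"] = run_numbers[matched]
print(grouped_view)
```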


We have now edited a grouped view of the Reward effort scans. The next step is to map these group-level changes back to the full dataframe.

In [25]:
modified_grouped = grouped_view.copy()
modified_flat = bids_classifications.copy()

In [26]:
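# DataFrame.update aligns on the index: only the representative rows kept by
# drop_duplicates are overwritten, not every file that shares their group
modified_flat.update(modified_grouped)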

In [27]:
modified_flat.loc[modified_flat['SeriesDescription'].str.contains("effort", na=False)]

| | Acq | Ce | Custom | Echo | Filename | Folder | Intent | Measurement | Mod | Modality | ... | RepetitionTime | Run | SequenceName | SeriesDescription | Task | acquisition.id | error_message | ignore | template | valid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37 | | | | | sub-93274_ses-neff2_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | 2 | _epfid2d1_64 | ep2d_effort2_236 | Effort | 5c1a80139011bd0014368e9c | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 38 | | | | | sub-93274_ses-neff2_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 0.5 | 3 | epfid2d1_64 | ep2d_effort3_1416 | Effort | 5c1a80139011bd0015368c2d | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 40 | | | | | sub-93274_ses-neff2_task-{file.info.BIDS.Task}... | func | [Functional] | [T2*] | | MR | ... | 3.0 | 1 | _epfid2d1_64 | ep2d_effort1_236 | Effort | 5c1a80139011bd0014368e9f | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 90 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | 3 | _epfid2d1_64 | ep2d_effort3_168 | Effort | 5c1a7e169011bd0011368985 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 91 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | 4 | _epfid2d1_64 | ep2d_effort4_168 | Effort | 5c1a7e169011bd00133689d2 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 92 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | 2 | _epfid2d1_64 | ep2d_effort2_168 | Effort | 5c1a7e179011bd0015368995 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
| 93 | | | | | sub-83835_ses-neff_task-{file.info.BIDS.Task}_... | func | [Functional] | [T2*] | | MR | ... | 3.0 | 1 | _epfid2d1_64 | ep2d_effort1_168 | Effort | 5c1a7e179011bd0015368996 | Task u'' does not match '^[a-zA-Z0-9]+$' | False | func_file | False |
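In this 200-acquisition sample, each effort series belongs to a single file, so `update` happens to reach every affected row. In general it will not: `DataFrame.update` aligns on index labels, and the grouped view only carries the label of each group's first file. The minimal sketch below uses made-up data to show the difference. It also shows one way to propagate an edit to every member of the group, by merging on the grouping keys:

```python
import pandas as pd

# Made-up full table: rows 0 and 1 belong to the same group
flat = pd.DataFrame({
    "SeriesDescription": ["ep2d_effort1_168", "ep2d_effort1_168", "localizer"],
    "RepetitionTime": [3.0, 3.0, 0.0086],
    "Task": ["", "", ""],
})
keys = ["SeriesDescription", "RepetitionTime"]

# The grouped view keeps index labels 0 and 2 only; edit the effort group there
grouped = flat.drop_duplicates(keys).copy()
grouped.loc[grouped["SeriesDescription"].str.contains("effort", na=False), "Task"] = "Effort"

# DataFrame.update aligns on index labels, so row 1 is never touched
via_update = flat.copy()
via_update.update(grouped)

# Joining on the grouping keys copies the edited columns to every group member
edited_cols = ["Task"]
via_merge = flat.drop(columns=edited_cols).merge(
    grouped[keys + edited_cols], on=keys, how="left"
)
via_merge.index = flat.index  # a left merge keeps row order; restore the original labels

print(via_update["Task"].tolist())
print(via_merge["Task"].tolist())
```

The first list is `['Effort', '', '']` and the second is `['Effort', 'Effort', '']`. Only the columns you actually edited belong in `edited_cols`. Any other column copied this way would overwrite per-file values, such as `Filename`, with the representative's value.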


As a final step, we check whether these changes are valid using the `upload_bids` tools:

In [28]:
diff = upload_bids.get_unequal_cells(modified_flat, bids_classifications)

In [35]:
(modified_flat.applymap(type) == list).all()

Acq                  False
Ce                   False
Custom               False
Echo                 False
Filename             False
Folder               False
Intent                True
Measurement          False
Mod                  False
Modality             False
Path                 False
Rec                  False
RepetitionTime       False
Run                  False
SequenceName         False
SeriesDescription    False
Task                 False
acquisition.id       False
error_message        False
ignore               False
template             False
valid                False
dtype: bool
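Note that `.all()` only flags columns in which *every* cell is a list. Measurement is reported as False above because some of its cells are NaN (rows 10, 22 and 86 in the earlier output), even though its other cells hold lists. The sketch below uses `.any()` instead, on made-up data. It calls `Series.map` per column rather than `applymap`, which pandas 2.1 deprecated in favour of `DataFrame.map`:

```python
import pandas as pd

# Made-up frame: Measurement mixes lists and NaN, as in the notebook
df = pd.DataFrame({
    "Intent": [["Functional"], ["Shim"]],
    "Measurement": [["T2*"], float("nan")],
    "Run": ["", "1"],
})

# Boolean frame: True where a cell holds a list
is_list = df.apply(lambda col: col.map(lambda v: isinstance(v, list)))

all_lists = is_list.all()   # misses Measurement because of its NaN cell
any_lists = is_list.any()   # flags every column containing at least one list
list_cols = any_lists[any_lists].index.tolist()
print(list_cols)
```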

In [38]:
list_cols = (modified_flat.applymap(type) == list).all()
modified_flat.loc[:,list_cols].applymap(unlist_item)

| | Intent |
|---|---|
| 0 | Functional |
| 1 | Functional |
| 2 | Structural |
| 3 | Structural |
| 4 | Functional |
| 5 | Functional |
| 6 | Structural |
| 7 | Localizer |
| 8 | Localizer |
| 9 | Localizer |


In [29]:
upload_bids.validate_on_unequal_cells(diff, modified_flat)

The following changes don't seem to be valid for this data:

Row 38, Column 14, "2"
This field only accepts strings!

Row 39, Column 14, "3"
This field only accepts strings!

Row 41, Column 14, "1"
This field only accepts strings!

Row 91, Column 14, "3"
This field only accepts strings!

Row 92, Column 14, "4"
This field only accepts strings!

Row 93, Column 14, "2"
This field only accepts strings!

Row 94, Column 14, "1"
This field only accepts strings!


False
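The validator rejects the integer run numbers because the Run field only accepts strings. One possible fix before validating again is to convert integer cells to strings. In this sketch, `non_string_cells` and `int_to_str` are hypothetical helpers written for illustration, not part of `upload_bids`, and the rows are made up:

```python
import numbers

import pandas as pd

# Made-up rows: integer runs as set in the grouped view, plus an untouched empty string
flat = pd.DataFrame({
    "SeriesDescription": ["ep2d_effort2_236", "ep2d_effort1_236", "localizer"],
    "Run": [2, 1, ""],
})

def non_string_cells(df, columns):
    """Hypothetical helper: list (index, column, value) for cells that are not strings."""
    return [(idx, col, df.at[idx, col])
            for col in columns
            for idx in df.index
            if not isinstance(df.at[idx, col], str)]

def int_to_str(value):
    """Turn integers (including NumPy integers, excluding bools) into strings."""
    if isinstance(value, numbers.Integral) and not isinstance(value, bool):
        return str(value)
    return value

print(non_string_cells(flat, ["Run"]))
flat["Run"] = flat["Run"].map(int_to_str)
print(non_string_cells(flat, ["Run"]))
```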

We could wrap up the processes above in two new modules, `grouped_query` and `ungroup_query`, which would handle the grouping and ungrouping steps respectively.