# Understanding the Subjects, Sessions, & Modalities in the Project

This notebook takes a quick inventory of the subjects, sessions, and modalities (BIDS datatypes) in the project.

In [4]:
from bids import BIDSLayout

layout = BIDSLayout('/cbica/projects/RBC/HRC/working/BIDS', validate=False)
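BIDSLayout indexes every file by the entities encoded in its name (subject, session, run, and so on), together with the folder the file sits in. The stdlib sketch below illustrates that idea. It is a simplification, not the parser pybids uses: pybids matches configurable per-entity patterns and reports long entity names such as `subject` and `reconstruction`. The regex, the `DATATYPES` set, and the example filename are assumptions made for illustration.

```python
import re
from pathlib import PurePosixPath

# Simplified "key-value" entity token, e.g. "sub-10001" or "rec-refaced".
# Illustrative sketch only; NOT the parser pybids actually uses.
ENTITY_RE = re.compile(r"([a-z]+)-([a-zA-Z0-9]+)")
DATATYPES = {"anat", "func", "dwi", "fmap", "perf"}  # assumed subset


def parse_bids_name(path):
    """Split a BIDS path into entities, suffix, extension, and datatype."""
    p = PurePosixPath(path)
    # Everything after the first "." is the extension ("nii.gz", "json", ...)
    stem, _, extension = p.name.partition(".")
    *tokens, suffix = stem.split("_")
    entities = {}
    for token in tokens:
        match = ENTITY_RE.fullmatch(token)
        if match:  # tokens that are not "key-value" pairs are skipped
            key, value = match.groups()
            entities[key] = value
    # The datatype is the parent folder's name, when it is a known datatype
    datatype = p.parent.name if p.parent.name in DATATYPES else None
    return {**entities, "suffix": suffix, "extension": extension, "datatype": datatype}
```

For a hypothetical path `/data/BIDS/sub-10001/ses-1/anat/sub-10001_ses-1_rec-refaced_run-1_T1w.nii.gz`, the function returns `{'sub': '10001', 'ses': '1', 'rec': 'refaced', 'run': '1', 'suffix': 'T1w', 'extension': 'nii.gz', 'datatype': 'anat'}`. A top-level file that sits outside any datatype folder gets `datatype` set to `None`.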

In [8]:
layout

BIDS Layout: .../projects/RBC/HRC/working/BIDS | Subjects: 608 | Sessions: 905 | Runs: 751

What are the session values?

In [9]:
layout.get_sessions()

['1', '2']
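The layout summary reports 905 sessions, but `get_sessions()` returns only two labels. So 905 cannot be a count of distinct labels. It appears to count (subject, session) combinations, summed over subjects. The stdlib sketch below computes both quantities from (subject, session) pairs so the difference is easy to see. The pairs in the usage note are made up.

```python
from collections import defaultdict


def session_summary(pairs):
    """Return (distinct session labels, number of subject/session combinations)."""
    sessions_by_subject = defaultdict(set)
    for subject, session in pairs:
        # Sets ignore repeats, e.g. several files from the same session
        sessions_by_subject[subject].add(session)
    labels = sorted(set().union(*sessions_by_subject.values()))
    n_combinations = sum(len(s) for s in sessions_by_subject.values())
    return labels, n_combinations
```

With the toy input `[("10001", "1"), ("10001", "2"), ("10002", "1"), ("10001", "1")]`, the result is `(['1', '2'], 3)`: there are two distinct labels but three subject/session combinations.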

What folders (modalities) do we have?

In [18]:
df = layout.to_df()

In [23]:
set(df.datatype.values)

{'anat', 'dwi', 'func', nan}

In [27]:
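# NaN never compares equal to itself, so `df.datatype == x` is all False for
# the nan group and it always prints set(); those rows are files outside any
# datatype folder, such as the dataset-level JSON in row 0 of the table below
for x in set(df.datatype.values):
    print("Acquisitions in", x, ":")
    print(set(df[df.datatype == x].suffix))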


Acquisitions in nan :
set()
Acquisitions in anat :
{'T1w'}
Acquisitions in dwi :
{'dwi'}
Acquisitions in func :
{'bold'}
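The empty `set()` for `nan` is an artifact of comparing against NaN, not evidence that those rows have no suffix. In pandas, `df.groupby("datatype", dropna=False)` (pandas 1.1+) keeps the missing group. The stdlib sketch below shows the same grouping without pandas. Missing datatypes (`None` or NaN) are mapped to an explicit `"<none>"` label. The sample rows are copied from the DataFrame display in this notebook, and the label is an arbitrary choice.

```python
import math
from collections import defaultdict

# A few rows as they appear in the DataFrame display below (extra columns dropped)
SAMPLE_ROWS = [
    {"datatype": None, "suffix": "description", "extension": "json"},
    {"datatype": "anat", "suffix": "T1w", "extension": "json"},
    {"datatype": "anat", "suffix": "T1w", "extension": "nii.gz"},
    {"datatype": "dwi", "suffix": "dwi", "extension": "bval"},
    {"datatype": "func", "suffix": "bold", "extension": "nii.gz"},
]


def _datatype_key(value):
    # None or NaN (how pandas marks a missing value) both mean "no datatype"
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "<none>"
    return value


def values_by_datatype(records, column):
    """Group the values of `column` by datatype, keeping rows with no datatype."""
    groups = defaultdict(set)
    for row in records:
        groups[_datatype_key(row.get("datatype"))].add(row[column])
    return {key: sorted(values) for key, values in groups.items()}
```

On the sample rows, `values_by_datatype(SAMPLE_ROWS, "suffix")` groups `description` under `"<none>"` instead of losing it.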


In [28]:
df

| entity | path | datatype | extension | reconstruction | run | session | subject | suffix | task |
|---|---|---|---|---|---|---|---|---|---|
| 0 | /cbica/projects/RBC/HRC/working/BIDS/dataset_d... | | json | | | | | description | |
| 1 | /cbica/projects/RBC/HRC/working/BIDS/sub-10001... | anat | json | refaced | 1 | 1 | 10001 | T1w | |
| 2 | /cbica/projects/RBC/HRC/working/BIDS/sub-10001... | anat | nii.gz | refaced | 1 | 1 | 10001 | T1w | |
| 3 | /cbica/projects/RBC/HRC/working/BIDS/sub-10001... | dwi | bval | | 1 | 1 | 10001 | dwi | |
| 4 | /cbica/projects/RBC/HRC/working/BIDS/sub-10001... | dwi | bvec | | 1 | 1 | 10001 | dwi | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7071 | /cbica/projects/RBC/HRC/working/BIDS/sub-21927... | func | json | | 1 | 2 | 21927 | bold | rest |
| 7072 | /cbica/projects/RBC/HRC/working/BIDS/sub-21927... | func | nii.gz | | 1 | 2 | 21927 | bold | rest |
| 7073 | /cbica/projects/RBC/HRC/working/BIDS/sub-21927... | func | json | | 2 | 2 | 21927 | bold | rest |
| 7074 | /cbica/projects/RBC/HRC/working/BIDS/sub-21927... | func | nii.gz | | 2 | 2 | 21927 | bold | rest |


In [29]:
for x in set(df.datatype.values):
    print("Filetypes in", x, ":")
    print(set(df[df.datatype == x].extension))


Filetypes in nan :
set()
Filetypes in anat :
{'nii.gz', 'json'}
Filetypes in dwi :
{'bval', 'nii.gz', 'bvec', 'json'}
Filetypes in func :
{'nii.gz', 'json'}
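The sets printed above describe which extensions occur somewhere in each datatype. They do not show whether every individual acquisition has all of them. The stdlib sketch below checks completeness per acquisition. It groups rows by (datatype, subject, session, run, task) and reports missing extensions. The `EXPECTED` mapping is an assumption taken from the sets printed above, and the test input is a deliberately incomplete toy example.

```python
from collections import defaultdict

# Extensions each acquisition is assumed to need, per datatype
# (taken from the sets printed above; adjust if the dataset differs)
EXPECTED = {
    "anat": {"nii.gz", "json"},
    "dwi": {"nii.gz", "json", "bval", "bvec"},
    "func": {"nii.gz", "json"},
}


def incomplete_acquisitions(records):
    """Map each acquisition key to the sorted extensions it is missing."""
    found = defaultdict(set)
    for row in records:
        datatype = row.get("datatype")
        if datatype not in EXPECTED:
            continue  # skips top-level files with no datatype
        key = (datatype, row["subject"], row["session"], row.get("run"), row.get("task"))
        found[key].add(row["extension"])
    missing = {}
    for key, extensions in found.items():
        gap = EXPECTED[key[0]] - extensions
        if gap:
            missing[key] = sorted(gap)
    return missing
```

An empty result means every acquisition seen has the full set of files for its datatype.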


Which tasks appear in the functional data?

In [31]:
set(df[df.datatype == "func"].task)

{'rest'}
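Resting state is the only task, so a natural follow-up is to count rest runs per subject and session. The stdlib sketch below does this. It counts only the `nii.gz` files so that JSON sidecars are not double-counted. The row format mirrors the DataFrame columns shown in this notebook, and the test rows are the four func rows from that display.

```python
from collections import Counter


def bold_runs_per_session(records):
    """Count BOLD runs per (subject, session, task), using the NIfTI files only."""
    counts = Counter()
    for row in records:
        if row.get("datatype") == "func" and row["extension"] == "nii.gz":
            counts[(row["subject"], row["session"], row["task"])] += 1
    return counts
```

For rows 7071 to 7074 of the table, the result is `Counter({('21927', '2', 'rest'): 2})`.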

Are there metadata fields that should be removed?

In [36]:
# IPython's `!` syntax runs a shell command and stores its output lines in a variable
fields = !bond-print-metadata-fields ../BIDS/
print(fields)

['Acknowledgements', 'AcquisitionMatrixPE', 'AcquisitionNumber', 'AcquisitionTime', 'Authors', 'BIDSVersion', 'CoilString', 'ConversionSoftware', 'ConversionSoftwareVersion', 'DatasetDOI', 'DeviceSerialNumber', 'Dim1Size', 'Dim2Size', 'Dim3Size', 'EchoTime', 'EffectiveEchoSpacing', 'FlipAngle', 'Funding', 'HowToAcknowledge', 'ImageOrientationPatientDICOM', 'ImageType', 'ImagingFrequency', 'InPlanePhaseEncodingDirectionDICOM', 'InstitutionName', 'InternalPulseSequenceName', 'InversionTime', 'License', 'MRAcquisitionType', 'MagneticFieldStrength', 'Manufacturer', 'ManufacturersModelName', 'Modality', 'Name', 'NumVolumes', 'NumberOfAverages', 'Obliquity', 'ParallelReductionFactorInPlane', 'PatientPosition', 'PercentPhaseFOV', 'PercentSampling', 'PhaseEncodingDirection', 'PhaseEncodingPolarityGE', 'PixelBandwidth', 'ProcedureStepDescription', 'ProtocolName', 'PulseSequenceName', 'ReconMatrixPE', 'ReferencesAndLinks', 'RepetitionTime', 'SAR', 'ScanOptions', 'ScanningSequence', 'SequenceVari
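The `fields = !...` capture only works inside IPython. For a plain Python script, the sketch below gives a rough equivalent built on `subprocess.run`. It returns stdout as a list of lines. One design difference: `check=True` raises `CalledProcessError` if the command fails, whereas the `!` capture does not raise. The helper name `capture_lines` is hypothetical.

```python
import subprocess
import sys


def capture_lines(cmd):
    """Run a command (given as a list) and return its stdout split into lines."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # In this notebook's setting: capture_lines(["bond-print-metadata-fields", "../BIDS/"])
    # Portable self-check using the current Python interpreter instead:
    print(capture_lines([sys.executable, "-c", "print('EchoTime'); print('FlipAngle')"]))
```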

In [37]:
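# strip() each name: splitting on "," alone leaves a leading space on every
# name after the first, so e.g. " DeviceSerialNumber" would never match
sensitive = [name.strip() for name in "AccessionNumber, PatientBirthDate, PatientID, PatientName, PatientSex, AcquisitionDateTime, SeriesInstanceUID, DeviceSerialNumber, InstitutionAddress, AcquisitionTime, StationName, ReferringPhysicianName, InstitutionName, InstitutionalDepartmentName".split(",")]

In [39]:
any(x in sensitive for x in fields)

True

Sensitive metadata is present. At least AcquisitionTime, DeviceSerialNumber, and InstitutionName appear in both lists, and these fields need to be removed. The field list printed above is cut off, so there may be others.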
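An `any(...)` check only says whether something matched. For auditing, it helps to list which fields matched. The stdlib sketch below parses a comma-separated name list robustly and intersects it with the metadata fields. Stripping matters because `"a, b".split(",")` yields `['a', ' b']`, and `' DeviceSerialNumber'` never equals `'DeviceSerialNumber'`. The helper names are hypothetical, and the test uses a short subset of the field names printed in this notebook.

```python
def parse_field_list(text):
    """Split a comma-separated list of names, dropping whitespace and empty entries."""
    return {name.strip() for name in text.split(",") if name.strip()}


def flagged_fields(fields, sensitive_text):
    """Return, sorted, the metadata fields that also appear in the sensitive list."""
    return sorted(set(fields) & parse_field_list(sensitive_text))
```

Listing the matches makes it easy to pass exactly those field names on to whatever step removes them.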