# RBC Dataset

The Reproducible Brain Charts (RBC) dataset contains data from many studies that have been pre-processed and harmonized. This notebook demonstrates how to access these data with `RBCPath`, a path type from the `rbclib` package that extends the [`CloudPathLib`](https://cloudpathlib.drivendata.org/stable/) library.

## Getting Started

In [1]:
# We will need the RBCPath type from the rbclib package to load data from the RBC.
from rbclib import RBCPath

# We'll want to load some of the data using pandas.
import pandas as pd

In [2]:
# An RBC path is formatted as follows:
# rbc://GITHUB-REPO/GITHUB-PATH
# For example, the directory "freesurfer/sub-A00008326_ses-BAS1" of the RBC
# GitHub repo github.com:ReproBrainChart/NKI_FreeSurfer is represented as follows:
path = RBCPath('rbc://NKI_FreeSurfer/freesurfer/sub-A00008326_ses-BAS1/')

In [3]:
# List the directory contents:
contents = list(path.iterdir())
contents

[RBCPath('rbc://NKI_FreeSurfer/freesurfer/sub-A00008326_ses-BAS1/sub-A00008326_ses-BAS1_brainmeasures.json'),
 RBCPath('rbc://NKI_FreeSurfer/freesurfer/sub-A00008326_ses-BAS1/sub-A00008326_ses-BAS1_brainmeasures.tsv'),
 RBCPath('rbc://NKI_FreeSurfer/freesurfer/sub-A00008326_ses-BAS1/sub-A00008326_ses-BAS1_freesurfer.tar.xz'),
 RBCPath('rbc://NKI_FreeSurfer/freesurfer/sub-A00008326_ses-BAS1/sub-A00008326_ses-BAS1_fsLR_den-164k.tar.xz'),
 RBCPath('rbc://NKI_FreeSurfer/freesurfer/sub-A00008326_ses-BAS1/sub-A00008326_ses-BAS1_fsaverage.tar.xz'),
 RBCPath('rbc://NKI_FreeSurfer/freesurfer/sub-A00008326_ses-BAS1/sub-A00008326_ses-BAS1_regionsurfacestats.tsv')]
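
Because CloudPathLib mirrors the standard `pathlib` interface, a listing like the one above can presumably be filtered by file suffix. The sketch below uses `pathlib.PurePosixPath` objects as stand-ins for the `RBCPath` entries so that it runs without network access. That `RBCPath` exposes the same `.name` and `.suffix` attributes is an assumption, and the names `listing`, `tsv_files`, and `tsv_names` are illustrative only.

```python
from pathlib import PurePosixPath

# Stand-ins for the RBCPath objects returned by path.iterdir() (assumption:
# RBCPath exposes the pathlib-style .name and .suffix attributes).
base = PurePosixPath('freesurfer/sub-A00008326_ses-BAS1')
prefix = 'sub-A00008326_ses-BAS1'
listing = [
    base / f'{prefix}_brainmeasures.json',
    base / f'{prefix}_brainmeasures.tsv',
    base / f'{prefix}_freesurfer.tar.xz',
    base / f'{prefix}_regionsurfacestats.tsv',
]

# Keep only the tab-separated tables.
tsv_files = [p for p in listing if p.suffix == '.tsv']
tsv_names = [p.name for p in tsv_files]
print(tsv_names)
```

With real data, the same list comprehension could be applied to `contents` directly, provided the assumption about `RBCPath` holds.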

## Loading Atlas data from a Participant

In [4]:
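# Use pandas to read in the final TSV file in the list from the above code cell.
# This TSV file contains per-region surface statistics for this session, with
# one row per atlas, hemisphere, and region (see the output below).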
rbcfile = contents[-1]
# We can alternatively create the file like so:
#    sub = 'A00008326'
#    ses = 'BAS1'
#    dataname = 'regionsurfacestats'
#    rbcroot = RBCPath('rbc://NKI_FreeSurfer/')
#    rbc_subroot = rbcroot / f'freesurfer/sub-{sub}_ses-{ses}'
#    rbcfile = rbc_subroot / f'sub-{sub}_ses-{ses}_{dataname}.tsv'

print(f"Loading {rbcfile} ...")
with rbcfile.open() as f:
    data = pd.read_csv(f, sep='\t')

data

Loading rbc://NKI_FreeSurfer/freesurfer/sub-A00008326_ses-BAS1/sub-A00008326_ses-BAS1_regionsurfacestats.tsv ...


Unnamed: 0,subject_id,session_id,atlas,hemisphere,StructName,NumVert,SurfArea,GrayVol,ThickAvg,ThickStd,...,StdDev_wgpct,Min_wgpct,Max_wgpct,Range_wgpct,SNR_wgpct,Mean_piallgi,StdDev_piallgi,Min_piallgi,Max_piallgi,Range_piallgi
0,sub-A00008326,ses-BAS1,aparc.DKTatlas,lh,caudalanteriorcingulate,1187,793,2194,2.699,0.512,...,4.9555,9.6039,44.0379,34.4340,5.0315,1.8048,0.0551,1.6733,1.9281,0.2548
1,sub-A00008326,ses-BAS1,aparc.DKTatlas,lh,caudalmiddlefrontal,2460,1673,4597,2.618,0.402,...,4.8932,7.7624,38.8681,31.1056,4.5725,2.8706,0.3948,2.1635,3.5799,1.4165
2,sub-A00008326,ses-BAS1,aparc.DKTatlas,lh,cuneus,2295,1475,3120,1.933,0.534,...,5.4967,-10.5589,38.9176,49.4765,3.3391,2.9339,0.0668,2.6635,3.0382,0.3746
3,sub-A00008326,ses-BAS1,aparc.DKTatlas,lh,entorhinal,542,340,1674,3.351,0.955,...,6.3479,-26.2322,31.2708,57.5030,2.8947,2.5333,0.1327,2.2914,2.8383,0.5469
4,sub-A00008326,ses-BAS1,aparc.DKTatlas,lh,fusiform,3069,2113,6719,2.797,0.590,...,4.9584,0.5954,35.8093,35.2140,4.5196,2.5661,0.1583,2.2976,2.9022,0.6047
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13731,sub-A00008326,ses-BAS1,Yeo2011_7Networks_N1000,rh,7Networks_3,10347,6852,18847,2.654,0.515,...,4.2728,-12.8948,39.2204,52.1152,5.1387,2.8263,0.2410,2.3550,4.0107,1.6556
13732,sub-A00008326,ses-BAS1,Yeo2011_7Networks_N1000,rh,7Networks_4,9932,6663,19705,2.664,0.626,...,5.5003,-25.6253,39.0625,64.6878,3.8252,3.0737,0.8213,1.6573,4.4777,2.8204
13733,sub-A00008326,ses-BAS1,Yeo2011_7Networks_N1000,rh,7Networks_5,8379,5614,20297,2.891,0.832,...,6.9942,-13.0905,58.8713,71.9618,3.0931,2.2784,0.2671,1.8429,3.8657,2.0228
13734,sub-A00008326,ses-BAS1,Yeo2011_7Networks_N1000,rh,7Networks_6,14278,9648,28479,2.583,0.619,...,5.5950,-3.6107,41.7263,45.3370,3.8605,2.7200,0.4850,1.6160,4.3316,2.7156
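
The table stacks several atlases and both hemispheres, so a typical next step is to select the rows for one atlas and hemisphere. The following is a minimal sketch of that selection using a boolean mask. It builds a small `sample` frame from a few rows and columns copied from the output above rather than using the full `data` frame, so it runs on its own. The names `sample`, `mask`, `dkt_lh`, and `thickness` are illustrative.

```python
import pandas as pd

# A few rows and columns copied from the table above, standing in for `data`.
sample = pd.DataFrame({
    'atlas': ['aparc.DKTatlas', 'aparc.DKTatlas', 'Yeo2011_7Networks_N1000'],
    'hemisphere': ['lh', 'lh', 'rh'],
    'StructName': ['caudalanteriorcingulate', 'cuneus', '7Networks_3'],
    'ThickAvg': [2.699, 1.933, 2.654],
})

# Boolean mask selecting left-hemisphere regions of one atlas.
mask = (sample['atlas'] == 'aparc.DKTatlas') & (sample['hemisphere'] == 'lh')
dkt_lh = sample.loc[mask, ['StructName', 'ThickAvg']]

# Index by region name to look up the ThickAvg value of a single region.
thickness = dkt_lh.set_index('StructName')['ThickAvg']
print(thickness['cuneus'])
```

Replacing `sample` with `data` should apply the same selection to the full table, since the column names used here appear in its header.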


## Load in meta-data about the participants

In [5]:
# Participant meta-data is generally located in the BIDS repository for each study:
participants_file = RBCPath('rbc://NKI_BIDS/study-NKI_desc-participants.tsv')

print("Loading NKI participants TSV file...")
with participants_file.open() as f:
    participants = pd.read_csv(f, sep='\t')

participants

Loading NKI participants TSV file...


Unnamed: 0,participant_id,wave,study,study_site,session_id,sex,age,ethnicity,race,bmi,handedness,participant_education,parent_1_education,parent_2_education,p_factor_mcelroy_harmonized_all_samples,internalizing_mcelroy_harmonized_all_samples,externalizing_mcelroy_harmonized_all_samples,attention_mcelroy_harmonized_all_samples,cubids_acquisition_group
0,A00008326,BAS1,NKI,NKI,BAS1,Female,59.333333,not Hispanic or Latino,White,26.040364,Right,High School Diploma,Complete secondary,Complete primary,,,,,43
1,A00008326,BAS2,NKI,NKI,BAS2,Female,64.500000,not Hispanic or Latino,White,28.720636,Right,High School Diploma,Complete primary,Complete primary,,,,,1
2,A00008326,FLU1,NKI,NKI,FLU1,Female,65.500000,not Hispanic or Latino,White,28.321664,Right,High School Diploma,Complete primary,Complete primary,,,,,1
3,A00008399,BAS1,NKI,NKI,BAS1,Male,23.333333,not Hispanic or Latino,White,30.715135,Right,Some College,Complete tertiary,Complete secondary,,,,,232
4,A00010893,BAS1,NKI,NKI,BAS1,Male,28.916667,not Hispanic or Latino,Black,22.865049,Right,High School Diploma,Complete tertiary,Complete secondary,,,,,141
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2301,A00086474,BAS1,NKI,NKI,BAS1,Female,58.833333,not Hispanic or Latino,White,30.711185,Right,Some College,No/incomplete primary,No/incomplete primary,,,,,10
2302,A00086474,FLU1,NKI,NKI,FLU1,Female,59.750000,not Hispanic or Latino,White,30.853210,Right,Some College,No/incomplete primary,No/incomplete primary,,,,,25
2303,A00086551,BAS1,NKI,NKI,BAS1,Female,40.833333,not Hispanic or Latino,Black,24.558121,Right,Some College,Complete tertiary,,,,,,10
2304,A00086551,FLU1,NKI,NKI,FLU1,Female,41.666667,not Hispanic or Latino,Black,24.426548,Right,Some College,Complete tertiary,,,,,,12
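
Note that the two tables shown in this notebook label participants differently: the surface statistics use `subject_id` and `session_id` values with `sub-` and `ses-` prefixes, while the participants table uses bare identifiers. The following is a minimal sketch of one way to join them, assuming the prefixes are the only difference between the keys. It uses small frames built from rows copied from the outputs above, and the names `stats`, `meta`, and `merged` are illustrative.

```python
import pandas as pd

# Rows copied from the tables above; only a few columns are kept.
stats = pd.DataFrame({
    'subject_id': ['sub-A00008326', 'sub-A00008326'],
    'session_id': ['ses-BAS1', 'ses-BAS1'],
    'StructName': ['cuneus', 'fusiform'],
    'ThickAvg': [1.933, 2.797],
})
meta = pd.DataFrame({
    'participant_id': ['A00008326', 'A00008326', 'A00008399'],
    'session_id': ['BAS1', 'BAS2', 'BAS1'],
    'sex': ['Female', 'Female', 'Male'],
    'age': [59.333333, 64.5, 23.333333],
})

# Strip the leading "sub-" and "ses-" so the keys match the participants table.
stats = stats.assign(
    participant_id=stats['subject_id'].str.replace(r'^sub-', '', regex=True),
    session_id=stats['session_id'].str.replace(r'^ses-', '', regex=True),
)

# Left join on participant and session: each region row gains that
# session's sex and age values.
merged = stats.merge(meta, on=['participant_id', 'session_id'], how='left')
print(merged[['StructName', 'ThickAvg', 'sex', 'age']])
```

Joining on both keys matters because the participants table has one row per participant and session, so joining on `participant_id` alone would duplicate each region row once per session.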
