# Get Files and Put Them in a CSV
- To prepare a master list, create a dictionary in which each key is a dataset name and each value is a dictionary with the following keys:
    - 'nifti_path': str, a wildcarded path to NIFTI files.
    - 'csv_path': str, the absolute path to a CSV file containing subject data.
    - 'subj_col': str, the column name in the CSV file that contains subject IDs.
    - 'indep_col': str, the column name in the CSV file that contains the independent variable.
    - 'covariate_col': dict, a dictionary mapping covariate names shared across datasets to the corresponding column names in this dataset's CSV file.

- Example:
```python
data_dict = {
    'Dataset1': {
        'nifti_path': '/path/to/niftis/*.nii.gz',
        'csv_path': '/path/to/csv1.csv',
        'subj_col': 'sub',
        'indep_col': 'Indep. Var.',
        'covariate_col': {'age': 'Age', 'sex': 'Sex', 'baseline': 'ADAS-Cog11'}
    },
    'Dataset2': {
        'nifti_path': '/path/to/niftis/*.nii.gz',
        'csv_path': '/path/to/csv2.csv',
        'subj_col': 'sub',
        'indep_col': 'Indep. Var.',
        'covariate_col': {'age': 'pt_Age', 'sex': 'Sex', 'baseline': 'MDRS'}
    }
}
```
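
Before handing the dictionary to any tooling, it can help to confirm that every dataset entry has the five keys listed above with the expected types. The following is a minimal sketch, not part of `calvin_utils`; `REQUIRED_KEYS` and `validate_data_dict` are hypothetical names introduced here for illustration.

```python
# Hypothetical helper (not part of calvin_utils): checks the layout described above.
REQUIRED_KEYS = {
    'nifti_path': str,
    'csv_path': str,
    'subj_col': str,
    'indep_col': str,
    'covariate_col': dict,
}


def validate_data_dict(data_dict):
    """Return a list of readable problems; an empty list means the layout looks right."""
    problems = []
    for name, spec in data_dict.items():
        for key, expected_type in REQUIRED_KEYS.items():
            if key not in spec:
                problems.append(f"{name}: missing key '{key}'")
            elif not isinstance(spec[key], expected_type):
                problems.append(f"{name}: '{key}' should be {expected_type.__name__}")
    return problems


example = {
    'Dataset1': {
        'nifti_path': '/path/to/niftis/*.nii.gz',
        'csv_path': '/path/to/csv1.csv',
        'subj_col': 'sub',
        'indep_col': 'Indep. Var.',
        'covariate_col': {'age': 'Age', 'sex': 'Sex', 'baseline': 'ADAS-Cog11'},
    },
    # Deliberately incomplete entry to show what the report looks like.
    'Broken': {'nifti_path': '/path/to/niftis/*.nii.gz', 'covariate_col': 'Age'},
}

for problem in validate_data_dict(example):
    print(problem)
```

The check only inspects structure; it does not touch the file system, so it can run before any external drive is mounted.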

In [3]:
data_dict = {
    'adni_memory': {
        'nifti_path': '/Volumes/OneTouch/datasets/adni/neuroimaging/all_patients_atrophy_seeds/sub-*/ses*/unthresholded_tissue_segment_z_scores/*_grey_matter+cerebrospinal_fluid.nii',
        'csv_path': '/Volumes/OneTouch/datasets/adni/metadata/updated_master_list/master_dx_updated_fix_composite.csv',
        'subj_col': 'subid',
        'indep_col': 'Q4',
        'covariate_col': {'age': 'Age', 'sex': 'Sex', 'diagnosis': 'DIAGNOSIS_CURRENT_Str', 'overal_cognition': 'TOTAL11', 'lesion_size': ''}
    },
    'corbetta_memory': {
        'nifti_path': '/Volumes/OneTouch/datasets/CORBETTA_STROKE_MULTIFOCAL/BIDS_Dataset/sub-*/connectivity/sub-*-yeo1k_stat-t_conn.nii.gz',
        'csv_path': '/Volumes/OneTouch/datasets/CORBETTA_STROKE_MULTIFOCAL/Study_Metadata/3month_arm_1_with_basic_info.csv',
        'subj_col': 'subid',
        'indep_col': 'hvlt_perc',
        'covariate_col': {'age': 'age', 'sex': 'gender', 'diagnosis': 'stroke', 'overal_cognition': 'nihss_hospital', 'lesion_size': ''}
    },
    'grafmann_memory': {
        'nifti_path': '/Volumes/OneTouch/datasets/GRAFMAN_TBI_MULTIFOCAL/grafman_fc/sub-*/connectivity/sub-*-yeo1000udil_space-2mm_stat-t_conn.nii.gz',
        'csv_path': '/Volumes/OneTouch/datasets/GRAFMAN_TBI_MULTIFOCAL/metadata/master_list.csv',
        'subj_col': 'vhis_id',
        'indep_col': 'mmse6',
        'covariate_col': {'age': '', 'sex': '', 'diagnosis': 'diagnosis', 'overal_cognition': 'mmse12', 'lesion_size': 'lesion_size'}
    },
    'manitoba_memory': {
        'nifti_path': '/Volumes/OneTouch/datasets/Manitoba_Epilepsy_PET/PET_Conn/sub-*-2mm_lesionMask_T.nii.gz',
        'csv_path': '/Volumes/OneTouch/datasets/Manitoba_Epilepsy_PET/metadata/master_list.csv',
        'subj_col': 'subject',
        'indep_col': 'Overall_Memory',
        'covariate_col': {'age': 'Age_At_Testing', 'sex': 'Sex', 'diagnosis': 'epilepsy', 'overal_cognition': 'Total (max=100)', 'lesion_size': ''}
    },
    'kahana_memory': {
        'nifti_path': '/Volumes/OneTouch/datasets/Kahana_Epilepsy_iEEG/derivatives/connectivity/sub*_T.nii.gz',
        'csv_path': '/Volumes/OneTouch/datasets/Kahana_Epilepsy_iEEG/master_list.csv',
        'subj_col': 'subject',
        'indep_col': 'deltarec',
        'covariate_col': {'age': '', 'sex': '', 'diagnosis': 'epilepsy', 'overal_cognition': '', 'lesion_size': ''}
    },
    'queensland_memory': {
        'nifti_path': '/Volumes/OneTouch/datasets/Queensland_PD_DBS_STN/BIDSdata/derivatives/leaddbs/sub-*/stimulations/MNI152NLin2009bAsym/gs_20200826142724/GSP 1000 (Yeo 2011)_Full Set (Yeo 2011)/vat_seed_compound_fMRI_efield_func_seed_T.nii',
        'csv_path': '/Volumes/OneTouch/datasets/Queensland_PD_DBS_STN/Clinical/queensland_cognition.csv',
        'subj_col': 'Subject',
        'indep_col': 'MOCA_Recall_change_PreToFU4_percent',
        'covariate_col': {'age': 'age_at_surgery', 'sex': '', 'diagnosis': 'parkinson', 'overal_cognition': 'FU4_MOCA_Total', 'lesion_size': ''}
    },
    'sante_memory': {
        'nifti_path': '/Volumes/OneTouch/datasets/SANTE_Epilepsy_DBS_ANT/derivatives/conn/sub-*/connectivity/sub-*-yeo1000udil_space-2mm_stat-t_conn.nii.gz',
        'csv_path': '/Volumes/OneTouch/datasets/SANTE_Epilepsy_DBS_ANT/metadata/sante_cognitive_scores_with_percent_change_WIDE.csv',
        'subj_col': 'subject',
        'indep_col': 'Trouble remembering-number Percent Change Corrected V2_Month 6',
        'covariate_col': {'age': '', 'sex': '', 'diagnosis': 'epilepsy', 'overal_cognition': 'Concentrating on reading-number Percent Change Corrected V2_Month 6', 'lesion_size': ''}
    },
    'fornix_memory': {
        'nifti_path': '/Volumes/OneTouch/datasets/AD_dataset/*/stimulations/MNI_ICBM_2009b_NLIN_ASYM/gs_20180403170745/GSP 1000 (Yeo 2011)_Full Set (Yeo 2011)/vat_seed_compound_fMRI_efield_func_seed_T.nii',
        'csv_path': '/Volumes/OneTouch/datasets/AD_dataset/ad_data.csv',
        'subj_col': 'Patient # CDR, ADAS',
        'indep_col': '% Change from baseline (ADAS-Cog11)',
        'covariate_col': {'age': 'Age at DOS', 'sex': '', 'diagnosis': 'alzheimer', 'overal_cognition': 'Baseline CDR (sum of squares)', 'lesion_size': ''}
    }
}
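
Each `nifti_path` above is a wildcard, so a typo or an unmounted volume can silently yield zero files. A quick sketch for counting matches per dataset with the standard-library `glob` module follows. `count_nifti_matches` is a hypothetical helper, and whether `CSVComposer` expands the wildcard the same way is an assumption. The demo uses a throwaway directory so it runs anywhere; in the notebook, pass the real `data_dict` instead.

```python
import glob
import os
import tempfile


def count_nifti_matches(data_dict):
    """Map each dataset name to the number of files its 'nifti_path' wildcard matches."""
    return {name: len(glob.glob(spec['nifti_path'])) for name, spec in data_dict.items()}


# Demo on a temporary directory with two fake subjects (empty placeholder files).
with tempfile.TemporaryDirectory() as tmp:
    for sub in ('sub-01', 'sub-02'):
        os.makedirs(os.path.join(tmp, sub))
        open(os.path.join(tmp, sub, f'{sub}_T.nii.gz'), 'w').close()

    demo = {
        'found': {'nifti_path': os.path.join(tmp, 'sub-*', '*_T.nii.gz')},
        'empty': {'nifti_path': os.path.join(tmp, 'sub-*', '*_missing.nii')},
    }
    counts = count_nifti_matches(demo)

print(counts)
```

A count of zero is worth investigating before composing the DataFrame. Note that `glob` treats `*`, `?`, and `[...]` as pattern characters, while the spaces and parentheses that appear in some of the paths above are matched literally.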

Generate the DataFrame

In [4]:
from calvin_utils.file_utils.csv_prep import CSVComposer
composer = CSVComposer(data_dict)
composer.compose_df()
display(composer.composed_df)

Processing dataset: adni_memory
processing subjects: 100%|██████████| 1388/1388 [00:00<00:00, 1402.72it/s]

Processing dataset: corbetta_memory
processing subjects: 100%|██████████| 178/178 [00:00<00:00, 1989.79it/s]

Processing dataset: grafmann_memory
processing subjects: 100%|██████████| 647/647 [00:00<00:00, 5518.69it/s]

Processing dataset: manitoba_memory
processing subjects: 100%|██████████| 59/59 [00:00<00:00, 2342.94it/s]

Processing dataset: kahana_memory
processing subjects: 100%|██████████| 108/108 [00:00<00:00, 1850.73it/s]

Processing dataset: queensland_memory
processing subjects: 100%|██████████| 60/60 [00:00<00:00, 42309.72it/s]

Processing dataset: sante_memory
processing subjects: 100%|██████████| 74/74 [00:00<00:00, 1290.03it/s]

Processing dataset: fornix_memory
KeyError: 'subid'
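
The run above stops with `KeyError: 'subid'` while processing `fornix_memory`. The traceback is not shown, so the cause is not established here. One way to narrow it down is to confirm, before calling `compose_df`, that each dataset's configured `subj_col` and `indep_col` actually appear in its CSV header. The sketch below does that with the standard-library `csv` module. `missing_id_columns` is a hypothetical helper, not part of `calvin_utils`, and it deliberately skips `covariate_col` because some of those values (for example 'epilepsy') may not be column names.

```python
import csv
import os
import tempfile


def missing_id_columns(data_dict):
    """Map dataset name -> configured 'subj_col'/'indep_col' names absent from its CSV header."""
    report = {}
    for name, spec in data_dict.items():
        # utf-8-sig drops a byte-order mark that would otherwise stick to the first column name.
        with open(spec['csv_path'], newline='', encoding='utf-8-sig') as handle:
            header = next(csv.reader(handle), [])
        missing = [col for col in (spec['subj_col'], spec['indep_col']) if col not in header]
        if missing:
            report[name] = missing
    return report


# Demo with a small temporary CSV; in the notebook, pass the real data_dict instead.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, 'master_list.csv')
    with open(csv_path, 'w', newline='', encoding='utf-8') as handle:
        csv.writer(handle).writerows([['subject', 'score'], ['01', '0.5']])

    demo = {
        'ok': {'csv_path': csv_path, 'subj_col': 'subject', 'indep_col': 'score'},
        'bad': {'csv_path': csv_path, 'subj_col': 'subid', 'indep_col': 'score'},
    }
    report = missing_id_columns(demo)

print(report)
```

An empty report rules out a mismatch between the dictionary and the CSV headers for those two columns; it does not rule out problems inside the composer itself.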

Save the CSV

In [None]:
output_csv_path = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/ccm_memory/metadata/master_list_v2.csv'

In [None]:
composer.save_csv(output_csv_path)

Save the Data Dict 

In [None]:
output_json_path = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/ccm_memory/metadata/data_dict.json'

In [None]:
composer.save_dict_as_json(output_json_path)
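
To confirm that the saved dictionary can be recovered later, it can be read back and compared with the in-memory copy. This sketch assumes `save_dict_as_json` writes standard JSON readable by Python's `json` module, which is not verified here. `load_data_dict` is a hypothetical helper, and the demo writes its own small file with `json.dump` rather than calling the composer.

```python
import json
import os
import tempfile


def load_data_dict(json_path):
    """Read a data dictionary back from a JSON file."""
    with open(json_path, encoding='utf-8') as handle:
        return json.load(handle)


# Demo round trip on a temporary file (stand-in for the composer's saved output).
original = {'Dataset1': {'subj_col': 'sub', 'covariate_col': {'age': 'Age', 'sex': ''}}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data_dict.json')
    with open(path, 'w', encoding='utf-8') as handle:
        json.dump(original, handle, indent=2)
    reloaded = load_data_dict(path)

print(reloaded == original)
```

Because the dictionary holds only strings and nested dictionaries, a JSON round trip preserves it exactly. In the notebook, `load_data_dict(output_json_path) == data_dict` would be expected to evaluate to `True` if the assumption about the file format holds.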

Enjoy 
- Calvin