## User Instructions
This program creates R-maps, which correlate the value at each voxel with a continuous outcome measure across subjects.
By default the software uses Pearson correlation coefficients, which measure linear association, so a continuous outcome (for example, one on a percent scale) works best. If that does not suit your data, a Spearman correlation is also available (`method='spearman'`).
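
To make the difference between the two correlation options concrete, here is a minimal sketch on synthetic data (the values are made up and are not from this dataset). It relies on the fact that a Spearman correlation is a Pearson correlation computed on ranks.

```python
import numpy as np
import pandas as pd

# Synthetic data: a monotonic but strongly non-linear relationship.
voxel_values = np.arange(1, 11, dtype=float)
outcome = np.exp(voxel_values)

# Pearson: linear association between the raw values.
pearson_r = np.corrcoef(voxel_values, outcome)[0, 1]

# Spearman: Pearson correlation of the ranks of the values.
ranks_x = pd.Series(voxel_values).rank().to_numpy()
ranks_y = pd.Series(outcome).rank().to_numpy()
spearman_r = np.corrcoef(ranks_x, ranks_y)[0, 1]

print(f"Pearson r  = {pearson_r:.3f}")
print(f"Spearman r = {spearman_r:.3f}")
```

Because the relationship is perfectly monotonic, the rank-based coefficient is 1, while the Pearson coefficient is lower because the relationship is not linear.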

The software will walk you through everything. 

_____
# NIfTI Configuration

**Files are expected to follow a BIDS naming convention.**

**Each file name is expected to contain a subject ID identical to the subject ID in the CSV.**

**Files are expected to be at 2x2x2 mm resolution.**
_____
# CSV Configuration
**Subject IDs are expected to appear in the NIfTI file names.**

**Subject IDs are expected to be in a column of your target CSV labelled "subject".**
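
Since the file names and the CSV must agree on subject IDs, it can help to compare the two sets before running anything. The following is a minimal sketch with hypothetical file names and a hypothetical clinical table; it is not part of `calvin_utils`. IDs are compared as strings, so "04" and "4" would not match.

```python
import pandas as pd

# Hypothetical file names and clinical table, for illustration only.
nifti_names = ["sub-01_T.nii.gz", "sub-02_T.nii.gz", "sub-04_T.nii.gz"]
clinical = pd.DataFrame({"subject": ["01", "02", "03"],
                         "outcome": [81.2, 56.5, 47.6]})

# IDs embedded in the file names: the text between "sub-" and "_T".
file_ids = {name.split("sub-")[1].split("_T")[0] for name in nifti_names}
csv_ids = set(clinical["subject"].astype(str))

matched = sorted(file_ids & csv_ids)             # usable subjects
missing_from_csv = sorted(file_ids - csv_ids)    # images without clinical data
missing_from_files = sorted(csv_ids - file_ids)  # clinical rows without an image

print("matched:", matched)
print("missing from CSV:", missing_from_csv)
print("missing from files:", missing_from_files)
```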

Imports

In [1]:
import pandas as pd
import numpy as np

Save Information

- Enter the directory you would like to save to.

In [2]:
out_dir = '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_PD_DBS_STN_WURZBURG/derivatives/fourth_level/network_maps/uprds'

---

**Instructions**: Please fill out the `path_1` and `file_pattern` variables.

`path_1` is the shared base directory holding all files, e.g. blah/blah/blah/BIDS

`file_pattern` is the naming pattern shared by all files, e.g. `*/*/*subT1*.nii`

---
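
The search message printed by the import cell shows the pattern appended to the base path. As a rough illustration of how such a wildcard pattern selects files, here is a sketch using the standard-library `glob` module on a temporary folder with made-up file names. Whether `import_matrices_from_folder` uses `glob` internally is an assumption.

```python
import glob
import os
import tempfile

# Made-up file names; only the first two should match the pattern.
example_files = [
    "MDST01_seed_compound_fMRI_T.nii.gz",
    "MDST02_seed_compound_fMRI_T.nii",
    "MDST01_seed_AvgR.nii.gz",
    "notes.txt",
]

with tempfile.TemporaryDirectory() as base_dir:
    for name in example_files:
        open(os.path.join(base_dir, name), "w").close()

    # Base path and file pattern are joined into one search string.
    search = os.path.join(base_dir, "*_T*.nii*")
    matches = sorted(os.path.basename(p) for p in glob.glob(search))

print(matches)
```

The trailing `*` after `.nii` is what lets the pattern pick up both `.nii` and `.nii.gz` files.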

In [3]:
# Path to the folder containing the NIfTI files (or to a CSV listing their file paths)
path_1 = '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_PD_DBS_STN_WURZBURG/derivatives/third_level/vta_connectivity'

# Naming pattern shared by your neuroimaging files, relative to the base path
file_pattern = '*_T*.nii*'

In [4]:
#-----------------DO NOT TOUCH--------------------------------------------------------
import os
from calvin_utils.file_utils.import_matrices import import_matrices_from_folder
df_1 = import_matrices_from_folder(path_1, file_pattern=file_pattern)
df_1

I will search:  /Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_PD_DBS_STN_WURZBURG/derivatives/third_level/vta_connectivity/*_T*.nii*


Unnamed: 0,MDST43_seed_compound_fMRI_T.nii.gz,MDST04_seed_compound_fMRI_T.nii.gz,MDST12_seed_compound_fMRI_T.nii.gz,MDST01_seed_compound_fMRI_T.nii.gz,MDST28_seed_compound_fMRI_T.nii.gz,MDST17_seed_compound_fMRI_T.nii.gz,MDST18_seed_compound_fMRI_T.nii.gz,MDST31_seed_compound_fMRI_T.nii.gz,MDST27_seed_compound_fMRI_T.nii.gz,MDST34_seed_compound_fMRI_T.nii.gz,...,MDST03_seed_compound_fMRI_T.nii.gz,MDST15_seed_compound_fMRI_T.nii.gz,MDST37_seed_compound_fMRI_T.nii.gz,MDST21_seed_compound_fMRI_T.nii.gz,MDST24_seed_compound_fMRI_T.nii.gz,MDST02_seed_compound_fMRI_T.nii.gz,MDST14_seed_compound_fMRI_T.nii.gz,MDST07_seed_compound_fMRI_T.nii.gz,MDST38_seed_compound_fMRI_T.nii.gz,MDST11_seed_compound_fMRI_T.nii.gz
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
902624,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902626,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
902627,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


**Extract Subject ID From File Names**
Using the example file names printed above, define two strings:
1) The string preceding the subject ID. For example, in 04-mwp1glanat_resampled.nii this is "" (an empty string).
2) The string following the subject ID (`string_proceeding_id`). For example, in 04-mwp1glanat_resampled.nii this is "-mwp1".
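
The two strings can be thought of as delimiters around the ID. The helper below, `extract_subject_id`, is a hypothetical stand-in written for illustration; it is not the `calvin_utils` implementation, whose internals are not shown here. It keeps what follows the preceding string and then what comes before the following string.

```python
def extract_subject_id(file_name, string_preceding_id, string_proceeding_id):
    """Return the text between the two delimiter strings (hypothetical helper)."""
    subject_id = file_name
    # An empty delimiter means "nothing to strip on this side".
    if string_preceding_id != "":
        subject_id = subject_id.split(string_preceding_id)[1]
    if string_proceeding_id != "":
        subject_id = subject_id.split(string_proceeding_id)[0]
    return subject_id


id_a = extract_subject_id("MDST43_seed_compound_fMRI_T.nii.gz", "MDST", "_seed")
id_b = extract_subject_id("04-mwp1glanat_resampled.nii", "", "-mwp1")
id_c = extract_subject_id("STN_WU_BoSc_01", "STN_WU_BoSc_", "")
print(id_a, id_b, id_c)
```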

In [5]:
from calvin_utils.file_utils.dataframe_utilities import extract_and_rename_subject_id

def preprocess_names(df, string_preceding_id, string_proceeding_id, cols=True):
    """
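    Preprocess the given dataframe by extracting and renaming the subject ID, 
    then (optionally) transposing the dataframe.

    Parameters:
    - df: The dataframe to preprocess.
    - string_preceding_id: String preceding the subject ID.
    - string_proceeding_id: String following the subject ID.
    - cols: If True (default), transpose the dataframe after renaming so that
      each row is one subject; if False, leave the orientation unchanged.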

    Returns:
    - The preprocessed dataframe.
    """
    split_command_dict = {string_preceding_id: 1, string_proceeding_id: 0}
    if cols:
        df = extract_and_rename_subject_id(dataframe=df, split_command_dict=split_command_dict).transpose()
    else:
        df = extract_and_rename_subject_id(dataframe=df, split_command_dict=split_command_dict)
    df.index.name = 'subject'
    return df
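
The effect of the transpose step can be seen on a toy dataframe. This sketch assumes the columns have already been renamed to bare subject IDs, and the values are invented.

```python
import pandas as pd

# Toy input: one column per subject, one row per voxel.
voxels_by_subject = pd.DataFrame({"43": [0.0, 0.5, 1.2],
                                  "04": [0.0, 0.1, 0.9]})

# After transposing, each row is a subject and each column a voxel index.
subjects_by_voxel = voxels_by_subject.transpose()
subjects_by_voxel.index.name = "subject"
print(subjects_by_voxel)
```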


In [6]:
string_preceding_id = 'MDST'
string_proceeding_id = '_seed'

In [7]:
df_1 = preprocess_names(df_1, string_preceding_id, string_proceeding_id)
df_1

subject,0,1,2,3,4,5,6,7,8,9,...,902619,902620,902621,902622,902623,902624,902625,902626,902627,902628
43,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
28,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
17,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
18,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
31,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
27,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
34,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Define the path to the CSV which has your clinical information

In [8]:
path_2 = '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_PD_DBS_STN_WURZBURG/metadata/STN_WU_BoSc_master.xlsx'
excel_sheet_name = 'first batch_anonymised' #Optional

In [9]:
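# Import a CSV or Excel file with the clinical data of interest
import os
import pandas as pd
# splitext reads the final extension, so file names containing extra dots still work
if os.path.splitext(path_2)[1].lower() == '.csv':
    df_2 = pd.read_csv(path_2)
else:
    df_2 = pd.read_excel(path_2, sheet_name=excel_sheet_name)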
df_2

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,preop-UPDRS III (Med off),preop-UPDRS III (Med On),Dopa response before surgery,...,Stimulationsparameter STN links,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Stimulationsparameter STN rechts,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24
0,ID_sharedCohort_BosScSTN,Sex,disease duration at surgery (years),Age [years] at Implantation,Diagnosis,Hoehn-Yahr at Implantation,Followup after x months,total,total,,...,+,-,mA,us,Hz,+,-,mA,us,Hz
1,STN_WU_BoSc_01,m,12,57,PD,4,12,48,9,81.25,...,G,2,3.5,40,130,G,11,5.2,40,130
2,STN_WU_BoSc_02,f,9,50,PD,4,12,62,27,56.451613,...,G,3,3.4,40,149,13,12(70%) 14(30%),2,60,130
3,STN_WU_BoSc_03,f,12,62,PD,2,12,42,22,47.619048,...,G,2,3.4,40,130,G,11(50%) 10(50%),4,40,174
4,STN_WU_BoSc_04,m,18,50,PD,3,10,52,34,34.615385,...,G,4(50%) 5(50%),3.7,60,174,G,10,3.9,60,174
5,STN_WU_BoSc_05,m,7,60,PD,4,16,56,9,83.928571,...,G,2,4,30,130,G,11,3,20,174
6,STN_WU_BoSc_06,m,12,60,PD,3,12,49,20,59.183673,...,G,6,2.9,60,185,G,10,3.5,60,185
7,STN_WU_BoSc_07,m,10,73,PD,4,22,40,20,50.0,...,G,3(30%) 2(70%),4.1,60,174,G,10,4.9,60,174
8,STN_WU_BoSc_08,f,11,58,PD,3,12,44,10,77.272727,...,G,2,4,60,174,G,11,4.5,60,174
9,STN_WU_BoSc_09,m,5,64,PD,3,12,45,12,73.333333,...,G,3,3.1,60,130,G,12,4.8,60,130
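
The loading cell chooses between `pd.read_csv` and `pd.read_excel` based on the file extension. The sketch below shows one way to read that extension with `os.path.splitext`, and why splitting the base name on the first dot can go wrong when a name contains more than one dot. The helper name `file_extension` and the example paths are made up.

```python
import os


def file_extension(path):
    """Return the final extension of a path in lower case, e.g. '.csv'."""
    return os.path.splitext(path)[1].lower()


ext_excel = file_extension("/data/metadata/STN_WU_BoSc_master.xlsx")
ext_dotted = file_extension("/data/metadata/study.v2.CSV")

# Splitting on '.' and taking element 1 returns the middle part instead.
naive = os.path.basename("/data/metadata/study.v2.CSV").split(".")[1]

print(ext_excel, ext_dotted, naive)
```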


Choose Specific Columns to Keep in the Second Dataframe (`df_2`)
- Example: #df_2 = df_2.loc[:, ['Unnamed: 0', 'DBS response ratio']]

In [10]:
# df_2 = df_2.loc[:, ['Unnamed: 0', 'DBS response ratio']]

In [None]:
subject_column = 'Unnamed: 0'

In [12]:
string_preceding_id = 'STN_WU_BoSc_'
string_proceeding_id = ''

It is expected that `df_2` has a column containing subject information. The IDs in this column must correspond to the subject IDs in the dataframe above. If no such column exists, add it to your CSV before proceeding. 

Define the column using:

subject_column = 

In [13]:
import os
from calvin_utils.statistical_utils.voxelwise_statistical_testing import generate_r_map
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti

def set_subject_column_to_subject(df, subject_column):
    popped_column = df.pop(subject_column)
    df['subject'] = popped_column
    return df

def reset_index_if_subject_is_index(df, subject_column):
    if subject_column in df.index.names:
        df.reset_index(inplace=True)
        # Reorder columns to make 'subject' the first column
        cols = [subject_column] + [col for col in df.columns if col != subject_column]
        df = df[cols]
    return df

def process_and_generate_maps(df_1, df_2, subject_column='subject', out_dir='', string_preceding_id='', string_proceeding_id='', mask_path=None, method='pearson'):
    """
    Process the given dataframes, and generate maps based on the columns.

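    Parameters:
    - df_1: First dataframe (imaging data, one row per subject).
    - df_2: Second dataframe (clinical data).
    - subject_column: The column name referring to the subject.
    - out_dir: The output directory to save the generated maps.
    - string_preceding_id: String preceding the subject ID in df_2's subject column.
    - string_proceeding_id: String following the subject ID in df_2's subject column.
    - mask_path: Passed through to generate_r_map.
    - method: Correlation method passed to generate_r_map ('pearson' or 'spearman').

    Returns:
    - The merged dataframe for the last column processed.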
    """
    # Set DF_2 subject column to 'subject'
    df_2 = set_subject_column_to_subject(df_2, subject_column)
    
    # Set subject_column back to subject
    subject_column = 'subject'
    
    # Check if 'subject' is in the index or columns for df_1
    if subject_column in df_1.index.names:
        df_1 = reset_index_if_subject_is_index(df_1, subject_column)
    if subject_column in df_2.index.names:
        df_2 = reset_index_if_subject_is_index(df_2, subject_column)
        
    df_1[subject_column] = df_1[subject_column].astype(str)
    if all(df_2[subject_column].apply(lambda x: isinstance(x, str))):
        if string_proceeding_id != '':
            df_2[subject_column] = [name.split(string_proceeding_id)[0] for name in df_2[subject_column]]
        if string_preceding_id != '':
            df_2[subject_column] = [name.split(string_preceding_id)[1] for name in df_2[subject_column]]
        print('extracting subject ID')
    else:
        df_2[subject_column] = df_2[subject_column].astype(str)

    # Iterate over column, avoiding the one with subject id in it
    for colname in [col for col in df_2.columns if col != subject_column]:
        print(f'Working on {colname}')
        merged_df = df_2[[colname, subject_column]].merge(df_1, on=subject_column, how='inner').set_index(subject_column)
        
        # Remove any rows with NaN values
        copy_df = merged_df.copy()
        try:
            merged_df.dropna(inplace=True)
            r_df, p_df, r_squared_df = generate_r_map(merged_df, mask_path=mask_path, method=method)

            view_and_save_nifti(p_df, os.path.join(out_dir, 'p_map', colname))
            view_and_save_nifti(r_df, os.path.join(out_dir, 'r_map', colname))
            view_and_save_nifti(r_squared_df, os.path.join(out_dir, 'r_squared_map', colname))
        except Exception as e:
            if "x and y must have length at least 2" in str(e):
                print('Caught exception: NaNs or Infs suspected in input data. Trying workaround.')
                copy_df.replace([np.inf, -np.inf], np.nan, inplace=True)
                copy_df.fillna(0, inplace=True)
                # Pass method through so the fallback uses the same correlation type
                r_df, p_df, r_squared_df = generate_r_map(copy_df, mask_path=mask_path, method=method)

                view_and_save_nifti(p_df, os.path.join(out_dir, 'p_map', colname))
                view_and_save_nifti(r_df, os.path.join(out_dir, 'r_map', colname))
                view_and_save_nifti(r_squared_df, os.path.join(out_dir, 'r_squared_map', colname))
            else:
                print(f'Error {e}')
    return merged_df
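
The core data-handling step inside the loop is an inner merge on the subject ID followed by dropping incomplete rows. Here is a minimal sketch of that step on toy data (all values and column names are invented). Only subjects present in both tables survive the merge, and subjects with a missing outcome are then removed.

```python
import numpy as np
import pandas as pd

# Toy imaging table: one row per subject, one column per voxel.
imaging = pd.DataFrame({"subject": ["01", "02", "03"],
                        "v0": [0.1, 0.4, 0.7],
                        "v1": [1.0, 0.8, 0.2]})

# Toy clinical table: subject "02" has no outcome, "04" has no image.
clinical = pd.DataFrame({"subject": ["01", "02", "04"],
                         "outcome": [81.25, np.nan, 34.6]})

# Inner join keeps only subjects found in both tables.
merged = (clinical[["outcome", "subject"]]
          .merge(imaging, on="subject", how="inner")
          .set_index("subject"))

# Rows with any NaN cannot be correlated and are dropped.
complete = merged.dropna()
print(complete)
```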


What is the name of the column that contains your subject labels?

In [16]:
import os
mask_path = None
merged_df = process_and_generate_maps(df_1.copy(), df_2.copy(), subject_column=subject_column, out_dir=out_dir, string_preceding_id=string_preceding_id, string_proceeding_id=string_proceeding_id, mask_path=mask_path, method='spearman')

extracting subject ID
Working on DBS response ratio
Dataframes have been masked such that their shapes are:  (26, 225222)


100%|██████████| 225222/225222 [01:31<00:00, 2465.21it/s]


Data has been unmasked to shape:  (1, 902629)
Data has been unmasked to shape:  (1, 902629)
Image saved to: 
 /Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_PD_DBS_STN_WURZBURG/derivatives/fourth_level/network_maps/uprds/p_map/DBS response ratio
Image saved to: 
 /Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_PD_DBS_STN_WURZBURG/derivatives/fourth_level/network_maps/uprds/r_map/DBS response ratio
Image saved to: 
 /Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_PD_DBS_STN_WURZBURG/derivatives/fourth_level/network_maps/uprds/r_squared_map/DBS response ratio
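
The log above shows the data being masked to a subset of voxels, correlated, and then unmasked back to the full voxel count. The sketch below illustrates that sequence with NumPy on random data. It is not the `generate_r_map` implementation, and the mask rule used here (keep voxels that are non-zero in at least one subject) is an assumption made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels = 26, 50

# Random "images": subjects in rows, voxels in columns.
data = rng.normal(size=(n_subjects, n_voxels))
data[:, ::2] = 0.0                      # every other voxel is empty everywhere
outcome = rng.normal(size=n_subjects)
data[:, 1] = 2.0 * outcome + 1.0        # one voxel perfectly tracks the outcome

# Mask: assumed rule, keep voxels that are non-zero in at least one subject.
mask = np.any(data != 0, axis=0)
masked = data[:, mask]

# Vectorised Pearson r between every masked voxel and the outcome.
x = masked - masked.mean(axis=0)
y = outcome - outcome.mean()
r_masked = (x * y[:, None]).sum(axis=0) / np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())

# Unmask: write the results back into a full-length map, zeros elsewhere.
r_map = np.zeros(n_voxels)
r_map[mask] = r_masked
print(masked.shape, r_map.shape)
```

Masking first avoids correlating constant voxels, for which the correlation is undefined.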


Your R-Maps have all been generated. Consider adding Calvin as a collaborator if this was useful!

-- Calvin

## Optional - Perform Delta R-Map and Permute it for Significance

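**Calculate the Observed Delta-R Map Between 2 Populations**

Note: `column_of_interest` must be a column of `merged_df`. `merged_df` holds only the outcome column processed last plus the voxel columns, so a column such as 'Age at DOS' has to be merged in first; otherwise the call fails with a KeyError.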

In [17]:
from calvin_utils.statistical_utils.voxelwise_statistical_testing import generate_delta_r_map
delta_matrix = merged_df.copy()
observed_delta_r_map = generate_delta_r_map(delta_matrix, threshold_of_interest=65, column_of_interest='Age at DOS')

KeyError: 'Age at DOS'

In [None]:
import os
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti
view_and_save_nifti(observed_delta_r_map, os.path.join(out_dir, 'over_vs_under_65_delta_r_map'))
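
As a rough illustration of what a delta-R map is, the sketch below splits subjects at a threshold on one column, computes a voxelwise Pearson r against the outcome in each group, and subtracts the two maps. The function `delta_r_map`, its signature, the direction of the subtraction, and the toy data are all assumptions; the internals of `generate_delta_r_map` are not shown in this notebook.

```python
import numpy as np
import pandas as pd


def pearson_r_per_voxel(voxels, outcome):
    """Pearson r between each voxel column and the outcome vector."""
    x = voxels - voxels.mean(axis=0)
    y = outcome - outcome.mean()
    return (x * y[:, None]).sum(axis=0) / np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())


def delta_r_map(df, outcome_column, column_of_interest, threshold_of_interest):
    """Hypothetical sketch: r-map above the threshold minus r-map at or below it."""
    voxel_columns = [c for c in df.columns if c not in (outcome_column, column_of_interest)]
    is_above = df[column_of_interest] > threshold_of_interest
    r_maps = []
    for group in (df[is_above], df[~is_above]):
        voxels = group[voxel_columns].to_numpy(dtype=float)
        outcome = group[outcome_column].to_numpy(dtype=float)
        r_maps.append(pearson_r_per_voxel(voxels, outcome))
    return r_maps[0] - r_maps[1]


# Toy data: voxel v0 follows the outcome in the older group and opposes it in the younger.
outcome = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
age = np.array([50, 55, 60, 62, 70, 72, 75, 80])
toy = pd.DataFrame({
    "outcome": outcome,
    "age": age,
    "v0": np.where(age > 65, outcome, -outcome),
    "v1": [0.3, 1.1, 0.2, 0.9, 0.5, 0.1, 0.8, 0.4],
})
observed = delta_r_map(toy, "outcome", "age", 65)
print(observed)
```

A delta of 2 at `v0` is the largest possible value: r is +1 in one group and -1 in the other.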

## Calculate the Empirical Delta-R Map Distribution 
Note: this permutes the population labels without permuting the neuroimaging data. It therefore tests whether the separation between the two r-maps is significantly attributable to the variable of interest.

In [None]:
from calvin_utils.statistical_utils.voxelwise_statistical_testing import permuted_patient_label_delta_r_map
from calvin_utils.file_utils.print_suppression import HiddenPrints
n_permutations = 2  # Quick test only; use many more (e.g. 1000 or more) to build a usable null distribution
column_of_interest = 'Age at DOS'
threshold_of_interest = 65
with HiddenPrints():
    p_count_df = permuted_patient_label_delta_r_map(dataframe_to_permute=merged_df, 
                                                observed_delta_r_map=observed_delta_r_map, 
                                                column_of_interest=column_of_interest, 
                                                threshold_of_interest=threshold_of_interest, 
                                                n_permutations=n_permutations)
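
The label-permutation idea can be sketched in a few lines of NumPy: keep the imaging data and outcome fixed, shuffle which subjects belong to which group, recompute the delta-R map each time, and count how often the permuted difference is at least as extreme as the observed one. This is an illustration on random data, not the `permuted_patient_label_delta_r_map` implementation. The two-sided comparison and the +1 correction in the p-value are choices made for this example.

```python
import numpy as np


def pearson_r_per_voxel(voxels, outcome):
    """Pearson r between each voxel column and the outcome vector."""
    x = voxels - voxels.mean(axis=0)
    y = outcome - outcome.mean()
    return (x * y[:, None]).sum(axis=0) / np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())


def delta_r(voxels, outcome, is_group_a):
    """r-map of group A minus r-map of everyone else."""
    return (pearson_r_per_voxel(voxels[is_group_a], outcome[is_group_a])
            - pearson_r_per_voxel(voxels[~is_group_a], outcome[~is_group_a]))


rng = np.random.default_rng(0)
n_subjects, n_voxels, n_permutations = 40, 5, 200
voxels = rng.normal(size=(n_subjects, n_voxels))
outcome = rng.normal(size=n_subjects)
is_group_a = np.arange(n_subjects) < n_subjects // 2   # e.g. "over threshold"

observed = delta_r(voxels, outcome, is_group_a)

# Count, per voxel, permutations at least as extreme as the observed delta.
exceed_count = np.zeros(n_voxels)
for _ in range(n_permutations):
    shuffled_labels = rng.permutation(is_group_a)       # only the labels move
    permuted = delta_r(voxels, outcome, shuffled_labels)
    exceed_count += np.abs(permuted) >= np.abs(observed)

# +1 in numerator and denominator avoids reporting a p-value of exactly 0.
p_values = (exceed_count + 1) / (n_permutations + 1)
print(p_values)
```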

In [None]:
import os
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti
# p_count_df is the output of the permutation cell above
view_and_save_nifti(p_count_df, os.path.join(out_dir, 'over_vs_under_65_delta_r_map_p_values_df'))

Enjoy

--Calvin