## User Instructions
This notebook creates R-Maps, which correlate the value at each voxel with a continuous outcome measure.
As it stands, the software uses Pearson correlation coefficients, so it works best with a continuous outcome, ideally on a percent scale. If your outcome does not suit this, a Spearman correlation is also possible.
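
To make the two options concrete, here is a minimal sketch of a voxelwise correlation on toy data. It is not the `calvin_utils` implementation: the function names, the array layout (subjects in rows, voxels in columns), and the simplified ranking (ties are not averaged) are all assumptions made for illustration.

```python
import numpy as np

def voxelwise_pearson(voxels, outcome):
    """Pearson r between each voxel column and the outcome.

    voxels: (n_subjects, n_voxels) array; outcome: (n_subjects,) array.
    Hypothetical helper for illustration only.
    """
    voxels = np.asarray(voxels, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    vx = voxels - voxels.mean(axis=0)   # center each voxel across subjects
    oy = outcome - outcome.mean()       # center the outcome
    denom = np.sqrt((vx ** 2).sum(axis=0) * (oy ** 2).sum())
    # Constant voxels have zero variance and yield NaN rather than an error
    with np.errstate(divide="ignore", invalid="ignore"):
        return (vx * oy[:, None]).sum(axis=0) / denom

def rank(values):
    """Rank along axis 0 (1 = smallest). Ties are NOT averaged in this sketch."""
    return np.argsort(np.argsort(values, axis=0), axis=0) + 1.0

def voxelwise_spearman(voxels, outcome):
    """Spearman rho computed as the Pearson r of the ranks."""
    return voxelwise_pearson(rank(np.asarray(voxels)), rank(np.asarray(outcome)))

# Toy data: 5 subjects, 3 voxels
outcome = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
voxels = np.column_stack([
    2 * outcome + 1,   # perfectly linear in the outcome
    -outcome,          # perfectly anti-correlated
    outcome ** 3,      # monotonic but not linear
])
pearson_r = voxelwise_pearson(voxels, outcome)
spearman_rho = voxelwise_spearman(voxels, outcome)
```

The third voxel shows the practical difference: its relationship with the outcome is monotonic but nonlinear, so the Spearman value is 1 while the Pearson value falls below 1.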

The software will walk you through everything. 

Files are expected to follow a BIDS naming convention. 
Subject IDs are expected in the input CSV and are expected to share the same naming convention as the nifti files themselves.

**Function Definitions**
Functions defined within notebook to avoid 

Imports

In [None]:
import pandas as pd
import numpy as np

Save Information

-Enter the directory you would like to save to

In [None]:
out_dir = '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_AD_DBS_FORNIX/test'

---

**Instructions**: Please fill out the `path_1` and `file_pattern` variables. 

`path_1` is the shared base directory holding all files, e.g. blah/blah/blah/BIDS

`file_pattern` is the naming pattern shared by all files below that directory, e.g. `*/*/*subT1*.nii`
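
As a rough illustration of how a base directory and a wildcard pattern combine, the sketch below builds a throwaway BIDS-like folder and searches it with the standard-library `glob` module. Whether `import_matrices_from_folder` resolves patterns in exactly this way is an assumption; `find_nifti_files` and the file names are hypothetical.

```python
import glob
import os
import tempfile

def find_nifti_files(base_path, file_pattern):
    """Return the sorted paths under base_path that match the glob pattern."""
    return sorted(glob.glob(os.path.join(base_path, file_pattern)))

# Build a temporary folder tree: <base>/sub-XX/anat/<files>
with tempfile.TemporaryDirectory() as base:
    for sub in ("sub-01", "sub-02"):
        anat = os.path.join(base, sub, "anat")
        os.makedirs(anat)
        open(os.path.join(anat, f"{sub}_subT1_map.nii"), "w").close()
        open(os.path.join(anat, f"{sub}_notes.txt"), "w").close()

    # Each '*' before a '/' matches exactly one folder level
    matches = find_nifti_files(base, "*/*/*subT1*.nii")
    matched_names = [os.path.basename(p) for p in matches]

    # A flat pattern only matches files directly inside the base folder
    flat_matches = find_nifti_files(base, "*.nii")
```

Here the nested pattern finds both `.nii` files, while `*.nii` finds none because no NIfTI file sits directly in the base folder. Use the flat pattern only when the files are not in subfolders.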

---

In [None]:
# What is the shared path to the folder (or CSV) containing the NIfTI files (or their file paths)?
path_1 = '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_AD_DBS_FORNIX/connectivity_data/vta_published_t_connectivity'

#What is the shared file architecture of your neuroimaging files after the base path?
file_pattern = '*.nii'

In [None]:
#-----------------DO NOT TOUCH--------------------------------------------------------
import os
from calvin_utils.file_utils.import_matrices import import_matrices_from_folder
df_1 = import_matrices_from_folder(path_1, file_pattern=file_pattern)
df_1

**Extract Subject ID From File Names**
Using the example filenames printed above, please define the general strings:
1) Preceding the subject ID. For example, in 04-mwp1glanat_resampled.nii nothing precedes the ID, so this is " "
2) Following the subject ID. For example, in 04-mwp1glanat_resampled.nii, this is "-mwp1"
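
The idea behind the two strings can be sketched in plain Python as below. This is an illustrative stand-in, not the logic of `extract_and_rename_subject_id`: the helper name is hypothetical, and treating a delimiter that is absent from the filename as "leave the name unchanged" is an assumption. The second filename is made up for the example.

```python
def extract_subject_id(filename, string_preceding_id, string_proceeding_id):
    """Keep the text after string_preceding_id and before string_proceeding_id.

    Hypothetical helper: a delimiter that does not occur in the filename
    is skipped, so the name is left unchanged on that side.
    """
    subject = filename
    if string_preceding_id and string_preceding_id in subject:
        # Drop everything up to and including the first occurrence
        subject = subject.split(string_preceding_id, 1)[1]
    if string_proceeding_id and string_proceeding_id in subject:
        # Drop everything from the first occurrence onward
        subject = subject.split(string_proceeding_id, 1)[0]
    return subject

# Filename from the instructions: nothing precedes the ID
id_a = extract_subject_id("04-mwp1glanat_resampled.nii", " ", "-mwp1")

# Made-up filename with a prefix before the ID
id_b = extract_subject_id("sub-12_vat.nii", "sub-", "_vat")
```

With these inputs, `id_a` is `"04"` and `id_b` is `"12"`. Whatever strings you choose, the result must match the subject labels in your CSV exactly.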

In [None]:
from calvin_utils.file_utils.dataframe_utilities import extract_and_rename_subject_id

def preprocess_df1(df_1, string_preceding_id, string_proceeding_id):
    """
    Preprocess the given dataframe by extracting and renaming the subject ID, 
    then transposing the dataframe.

    Parameters:
    - df_1: The dataframe to preprocess.
    - string_preceding_id: String preceding the subject ID.
    - string_proceeding_id: String following the subject ID.

    Returns:
    - The preprocessed dataframe.
    """
    split_command_dict = {string_preceding_id: 1, string_proceeding_id: 0}
    df_1 = extract_and_rename_subject_id(dataframe=df_1, split_command_dict=split_command_dict).transpose()
    df_1.index.name = 'subject'
    display(df_1)
    return df_1


In [None]:
string_preceding_id = ' '
string_proceeding_id = '_vat'

In [None]:
df_1 = preprocess_df1(df_1, string_preceding_id, string_proceeding_id)

Define the path to the CSV which has your clinical information

In [None]:
path_2 = '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_AD_DBS_FORNIX/clinical_analyses/ses-01/sub-all/all_data/all_metadata_spreadsheet.csv'

In [None]:
# Import a CSV with the clinical data of interest
df_2 = pd.read_csv(path_2)
df_2

A column containing subject information is expected in this CSV. The values in this column must correspond to the subject index of the dataframe above. If the column does not exist, add it to your CSV before proceeding. 

Define the column below using:

subject_column = 
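
Because the merge further down silently drops subjects whose labels do not match, it can help to check the overlap first. The sketch below uses toy dataframes and pandas' `merge` with `indicator=True`. The column names, the values, and the zero-padding step are assumptions for illustration; adapt them to your own data.

```python
import pandas as pd

# Toy imaging dataframe, indexed by subject as in the preprocessing step
imaging = pd.DataFrame(
    {"voxel_0": [0.1, 0.4, 0.3]},
    index=pd.Index(["01", "02", "03"], name="subject"),
)

# Toy clinical dataframe: IDs stored as integers, a common source of mismatch
clinical = pd.DataFrame({"subject_id": [1, 2, 4], "score": [10.0, 20.0, 40.0]})

# Convert to strings in the same format as the imaging labels (assumed zero-padded)
clinical["subject"] = clinical["subject_id"].astype(str).str.zfill(2)

# An outer merge with indicator=True labels where each subject was found
check = clinical[["subject"]].merge(
    imaging.reset_index()[["subject"]],
    on="subject",
    how="outer",
    indicator=True,
)
only_clinical = sorted(check.loc[check["_merge"] == "left_only", "subject"])
only_imaging = sorted(check.loc[check["_merge"] == "right_only", "subject"])
matched = sorted(check.loc[check["_merge"] == "both", "subject"])
```

Subjects listed in `only_clinical` or `only_imaging` would be excluded by an inner merge, so resolve them before generating maps.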

In [None]:
import os
from calvin_utils.statistical_utils.voxelwise_statistical_testing import generate_r_map
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti

def process_and_generate_maps(df_1, df_2, subject_column, out_dir):
    """
    Process the given dataframes, and generate maps based on the columns.

    Parameters:
    - df_1: First dataframe.
    - df_2: Second dataframe.
    - subject_column: The column name referring to the subject.
    - out_dir: The output directory to save the generated maps.
    """
    # Work on a copy so re-running the cell does not fail on an already-modified df_2
    df_2 = df_2.copy()

    # Extract and set subject IDs from paths
    try:
        df_2['subject'] = df_2[subject_column].apply(lambda path: os.path.basename(path).split('_')[1])
    except Exception:
        try:
            df_2['subject'] = df_2[subject_column].astype(str)
        except Exception:
            print(f'There is something funky going on with the {subject_column} in your dataframe. \n I suggest evaluating and resolving it as it must match the subject column in df_1 subject-for-subject.')
            df_2['subject'] = df_2[subject_column]
    # Drop the original column unless it is already named 'subject'
    if subject_column != 'subject':
        df_2 = df_2.drop(columns=subject_column)

    # Iterate over every outcome column, skipping the subject ID column
    merged_df = None
    for colname in [col for col in df_2.columns if col != 'subject']:
        print(f'Working on {colname}')
        merged_df = df_2[[colname, 'subject']].merge(df_1, on='subject', how='inner').set_index('subject')
        
        # Remove any rows with NaN values
        merged_df.dropna(inplace=True)

        r_df, p_df, r_squared_df = generate_r_map(merged_df, mask_path=None)

        view_and_save_nifti(p_df, os.path.join(out_dir, 'p_map', colname))
        view_and_save_nifti(r_df, os.path.join(out_dir, 'r_map', colname))
        view_and_save_nifti(r_squared_df, os.path.join(out_dir, 'r_squared_map', colname))
    return merged_df


In [None]:
import os
# What is the name of the column that contains your subject labels?
subject_column = 'subject_id'

In [None]:
merged_df = process_and_generate_maps(df_1, df_2, subject_column, out_dir)

Your R-Maps have all been generated. Consider adding Calvin as a collaborator if this was useful!

-- Calvin

## Optional - Perform Delta R-Map and Permute it for Significance

**Calculate the Observed Delta-R Map Between 2 Populations**

In [None]:
from calvin_utils.statistical_utils.voxelwise_statistical_testing import generate_delta_r_map
delta_matrix = merged_df.copy()
observed_delta_r_map = generate_delta_r_map(delta_matrix, threshold_of_interest=65, column_of_interest='Age at DOS')

In [None]:
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti
view_and_save_nifti(observed_delta_r_map, (out_dir+'/over_vs_under_65_delta_r_map'))

## Calculate the Empirical Delta-R Map Distribution 
Note: this permutes the population labels without permuting the neuroimaging data.
Therefore, we are testing whether the separation of the R-Maps is significantly attributable to the variable of interest. 
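
The label-permutation idea can be sketched on synthetic data as follows. This is not the `calvin_utils` implementation: the function names, the two-sided comparison, and the `(count + 1) / (n + 1)` p-value convention are assumptions chosen for illustration, and the data are random numbers rather than real maps.

```python
import numpy as np

def pearson_per_voxel(voxels, outcome):
    """Pearson r between each voxel column and the outcome."""
    vx = voxels - voxels.mean(axis=0)
    oy = outcome - outcome.mean()
    return (vx * oy[:, None]).sum(axis=0) / np.sqrt(
        (vx ** 2).sum(axis=0) * (oy ** 2).sum()
    )

def delta_r(voxels, outcome, in_group):
    """R-map of the group minus R-map of everyone else."""
    return (pearson_per_voxel(voxels[in_group], outcome[in_group])
            - pearson_per_voxel(voxels[~in_group], outcome[~in_group]))

def permutation_p_values(voxels, outcome, in_group, n_permutations, seed=0):
    """Shuffle only the group labels; voxels and outcome stay paired."""
    rng = np.random.default_rng(seed)
    observed = delta_r(voxels, outcome, in_group)
    exceed = np.zeros(voxels.shape[1])
    for _ in range(n_permutations):
        shuffled = rng.permutation(in_group)  # group sizes are preserved
        exceed += np.abs(delta_r(voxels, outcome, shuffled)) >= np.abs(observed)
    # Add-one convention keeps p-values strictly above zero
    return observed, (exceed + 1) / (n_permutations + 1)

# Synthetic example: 20 subjects, 4 voxels, groups split at age 65
rng = np.random.default_rng(42)
age = np.linspace(50, 80, 20)
in_group = age > 65
voxels = rng.normal(size=(20, 4))
outcome = rng.normal(size=20)
observed, p_values = permutation_p_values(voxels, outcome, in_group, n_permutations=200)
```

Each voxel's p-value is the fraction of shuffled labelings whose delta-r is at least as extreme as the observed one, so the smallest attainable p-value is limited by the number of permutations.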

In [None]:
from calvin_utils.statistical_utils.voxelwise_statistical_testing import permuted_patient_label_delta_r_map
from calvin_utils.file_utils.print_suppression import HiddenPrints
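n_permutations = 2  # quick test value; p-value resolution is limited by the number of permutations, so increase it for a real analysis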
column_of_interest = 'Age at DOS'
threshold_of_interest = 65
with HiddenPrints():
    p_count_df = permuted_patient_label_delta_r_map(dataframe_to_permute=merged_df, 
                                                observed_delta_r_map=observed_delta_r_map, 
                                                column_of_interest=column_of_interest, 
                                                threshold_of_interest=threshold_of_interest, 
                                                n_permutations=n_permutations)

In [None]:
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti
# The permutation cell above stores its result in p_count_df (p_values_df was never defined)
view_and_save_nifti(p_count_df, (out_dir+'/over_vs_under_65_delta_r_map_p_values_df'))

Enjoy

--Calvin