## User Instructions
This is a program which will create R-Maps. These are used to correlate a voxel to a continuous outcome measure.
As it stands, this software employs Pearson Correlation Coefficients, which imply it will be best to have a continuous outcome on a percent scale. If you would like to do otherwise, a Spearman Correlation is possible. 

The software will walk you through everything. 

_____
# Nifti Configuration

**Files are expected to follow a BIDS naming convention.**

**Files are expected to have subject ID in them which is identical to subject ID in the CSV**

**Files are expected to be in 2x2x2 resolution**
_____
# CSV configuration:
**Subject IDs expected to be in the nifti names**

**Subject IDs expected to be in a column of your target CSV labelled "subject"**

Imports

In [None]:
import pandas as pd
import numpy as np

Save Information

-Enter the directory you would like to save to

In [None]:
out_dir = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/resources/published_networks/Alzheimer Cognition Maps'

---

**Instructions**: Please fill out the `path` and `file_pattern` variables. 

The file_path is the shared base directory holding all files. ie) blah/blah/blah/BIDS

The file_pattern is the shared naming architectur in all files ie)  * / * / * subT1 * .nii

---

In [None]:
# What is the shared path to the folder/csv containing the nifti files/files paths for the neuroimaging files?
path_1 = '/Volumes/Expansion/datasets/adni/neuroimaging/all_patients_atrophy_csfgm_connectivity/sub-*+cerebrospinalufluid/connectivity'

#What is the shared file architecture of your neuroimaging files after the base path?
file_pattern = 'sub-*+cerebrospinalufluid_tome-GSP1000uMF_space-2mm_stat-t_conn.nii.gz'

In [None]:
#-----------------DO NOT TOUCH--------------------------------------------------------
import os
from calvin_utils.file_utils.import_matrices import import_matrices_from_folder
df_1 = import_matrices_from_folder(path_1, file_pattern=file_pattern, subject_id_index=5)
df_1

**Extract Subject ID From File Names**
Using the example filenames that have been printed above, please define a general string:
1) Preceding the subject ID. For example in 04-mwp1glanat_resampled.nii, this is " "
2) Proceeding the subject ID. For example in 04-mwp1glanat_resampled.nii, this is "-mwp1"

In [None]:
from calvin_utils.file_utils.dataframe_utilities import extract_and_rename_subject_id

def preprocess_names(df, string_preceding_id, string_proceeding_id, cols=True):
    """
    Preprocess the given dataframe by extracting and renaming the subject ID, 
    then transposing the dataframe.

    Parameters:
    - df: The dataframe to preprocess.
    - string_preceding_id: String preceding the subject ID.
    - string_proceeding_id: String proceeding the subject ID.

    Returns:
    - The preprocessed dataframe.
    """
    split_command_dict = {string_preceding_id: 1, string_proceeding_id: 0}
    if cols:
        df = extract_and_rename_subject_id(dataframe=df, split_command_dict=split_command_dict).transpose()
    else:
        df = extract_and_rename_subject_id(dataframe=df, split_command_dict=split_command_dict)
    df.index.name = 'subject'
    return df


In [None]:
string_preceding_id = 'neuroimaging_sub-subh'
string_proceeding_id = 'ugreyumatter'

In [None]:
df_1 = preprocess_names(df_1, string_preceding_id, string_proceeding_id)
df_1

handle NaNs if you'd like

In [None]:
import numpy as np
df_1[:] = np.nan_to_num(df_1.values, nan=0, posinf=30, neginf=-30)


Define the path to the CSV which has your clinical information

In [None]:
path_2 = '/Volumes/Expansion/datasets/adni/metadata/updated_master_list/adas_cog_1year.csv'
excel_sheet_name = None #Optional

In [None]:
# Import a CSV with the clinical data of interest
import pandas as pd
if os.path.basename(path_2).split('.')[1] == 'csv':
    df_2 = pd.read_csv(path_2)
else:
    df_2 = pd.read_excel(path_2, sheet_name=excel_sheet_name)
df_2

Drop Nans

In [None]:
df_2.dropna(inplace=True)

In [None]:
df_2.columns

Choose Specific Columns to Keep in the Second List
- Example: #df_2 = df_2.loc[:, ['Unnamed: 0', 'DBS response ratio']]

In [None]:
list_of_cols_to_keep = ['subject', 'Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6',
       'Q7', 'Q8', 'Q9', 'Q10', 'Q11', 'Q12', 'Q14', 'Total', 'TotalMOD']

In [None]:
df_2 = df_2.loc[:, list_of_cols_to_keep]

Fix Subject Names

In [None]:
def set_subject_column_to_subject(df, subject_column, string_preceding_id='', string_proceeding_id='',):
    popped_column = df.pop(subject_column)
    df['subject'] = popped_column
    
    if all(df[subject_column].apply(lambda x: type(x) is str)):
        if string_proceeding_id != '':
            df[subject_column] = [name.split(string_proceeding_id)[0] for name in df[subject_column]]
        if string_preceding_id != '':
            df[subject_column] = [name.split(string_preceding_id)[1] for name in df[subject_column]]
        print('extracting subject ID')
    else:
        df[subject_column] = df[subject_column].astype(str)
    return df

In [None]:
subject_column = 'subject'

In [None]:
string_preceding_id = ''
string_proceeding_id = ''

In [None]:
df_2

It is expected there is a columnc called which has subject information in it. The information in this column must correspond in the dataframe above. If it does not exist, add it to your CSV before proceeding. 

Define the column below using:

subject_colum = 

In [59]:
import os
from calvin_utils.statistical_utils.voxelwise_statistical_testing import generate_r_map
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti

def reset_index_if_subject_is_index(df, subject_column):
    if subject_column in df.index.names:
        df.reset_index(inplace=True)
        # Reorder columns to make 'subject' the first column
        cols = [subject_column] + [col for col in df.columns if col != subject_column]
        df = df[cols]
    return df

def process_and_generate_maps(df_1, df_2, subject_column='subject', out_dir='', mask_path=None, method='pearson'):
    """
    Process the given dataframes, and generate maps based on the columns.

    Parameters:
    - df_1: First dataframe.
    - df_2: Second dataframe.
    - subject_column: The column name referring to the subject.
    - out_dir: The output directory to save the generated maps.
    """
    # Check if 'subject' is in the index or columns for df_1
    if subject_column in df_1.index.names:
        df_1 = reset_index_if_subject_is_index(df_1, subject_column)
    if subject_column in df_2.index.names:
        df_2 = reset_index_if_subject_is_index(df_2, subject_column)
        
    # Process subject column to string    
    df_1[subject_column] = df_1[subject_column].astype(int)
    df_2[subject_column] = df_2[subject_column].astype(int)
    df_1.fillna(0)
    
    # Iterate over column, avoiding the one with subject id in it
    for colname in [col for col in df_2.columns if col != subject_column]:
        print(f'Working on {colname}')
        merged_df = df_2[[colname, subject_column]].merge(df_1, on=subject_column, how='inner').set_index(subject_column)
        
        # Remove any rows with NaN values
        copy_df = merged_df.copy()
        try:
            merged_df.dropna(inplace=True)
            r_df, p_df, r_squared_df = generate_r_map(merged_df, mask_path=mask_path, method=method)

            view_and_save_nifti(p_df, os.path.join(out_dir, 'p_map', colname))
            view_and_save_nifti(r_df, os.path.join(out_dir, 'r_map', colname))
            view_and_save_nifti(r_squared_df, os.path.join(out_dir, 'r_squared_map', colname))
        except Exception as e:
            if "x and y must have length at least 2" in str(e):
                print('Caught exception: NaNs or Infs suspected in input data. Trying workaround.')
                copy_df.replace([np.inf, -np.inf], np.nan, inplace=True)
                copy_df.fillna(0, inplace=True)
                r_df, p_df, r_squared_df = generate_r_map(copy_df, mask_path=mask_path)

                view_and_save_nifti(p_df, os.path.join(out_dir, 'p_map', colname))
                view_and_save_nifti(r_df, os.path.join(out_dir, 'r_map', colname))
                view_and_save_nifti(r_squared_df, os.path.join(out_dir, 'r_squared_map', colname))
            else:
                print(f'Error: {e}')
    return merged_df


Run it

In [60]:
df_2 = set_subject_column_to_subject(df_2, subject_column=subject_column, string_preceding_id=string_preceding_id, string_proceeding_id=string_proceeding_id)
merged_df = process_and_generate_maps(df_1.copy(), df_2.copy(), 
                                      subject_column=subject_column, 
                                      out_dir=out_dir, 
                                      mask_path='/Users/cu135/hires_backdrops/MNI/MNI152_T1_2mm_brain_mask.nii', 
                                      method='spearman')

extracting subject ID


ValueError: invalid literal for int() with base 10: '002uSu0295'

Your R-Maps have all been generated. Consider adding Calvin as a collaborator if this was useful!

-- Calvin

## Optional - Perform Delta R-Map and Permute it for Significance

**Calculate the Observed Delta-R Map Between 2 Populations**

In [None]:
from calvin_utils.statistical_utils.voxelwise_statistical_testing import generate_delta_r_map
delta_matrix = merged_df.copy()
observed_delta_r_map = generate_delta_r_map(delta_matrix, threshold_of_interest=65, column_of_interest='Age at DOS')

In [None]:
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti
view_and_save_nifti(observed_delta_r_map, (out_dir+'/over_vs_under_65_delta_r_map'))

## Calculate the Empiric Delta-R Map Distribution 
### Note, this permutes the label of the population without permuting the neuroimaging data.
### Therefore, we are testing if the separation of the r-maps is significantly due to the variable of interest. 

In [None]:
from calvin_utils.statistical_utils.voxelwise_statistical_testing import permuted_patient_label_delta_r_map
from calvin_utils.file_utils.print_suppression import HiddenPrints
n_permutations = 2
column_of_interest = 'Age at DOS'
threshold_of_interest = 65
with HiddenPrints():
    p_count_df = permuted_patient_label_delta_r_map(dataframe_to_permute=merged_df, 
                                                observed_delta_r_map=observed_delta_r_map, 
                                                column_of_interest=column_of_interest, 
                                                threshold_of_interest=threshold_of_interest, 
                                                n_permutations=n_permutations)

In [None]:
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti
view_and_save_nifti(p_values_df, (out_dir+'/over_vs_under_65_delta_r_map_p_values_df'))

# 03 - Spatial Correlation of R-Maps

**Prepare a Second Set of Data**

Set Neuroimaging Path Information

In [None]:
# What is the shared path to the folder/csv containing the nifti files/files paths for the neuroimaging files?
path_1 = '/Users/cu135/Dropbox (Partners HealthCare)/resources/datasets/BIDS_PD_DBS_STN_WURZBURG/derivatives/third_level/vta_connectivity'

#What is the shared file architecture of your neuroimaging files after the base path?
file_pattern = '*fMRI_T.nii*'

In [None]:
#-----------------DO NOT TOUCH--------------------------------------------------------
import os
from calvin_utils.file_utils.import_matrices import import_matrices_from_folder
df_1B = import_matrices_from_folder(path_1, file_pattern=file_pattern, subject_id_index=5)
df_1B

Clean subject names

In [None]:
string_preceding_id = 'datasets_MDST'
string_proceeding_id = '_seed'

In [None]:
df_1B = preprocess_names(df_1B, string_preceding_id, string_proceeding_id)
df_1B

Set Clinical CSV Information

In [None]:
path_2 = '/Users/cu135/Dropbox (Partners HealthCare)/studies/cognition_2023/metadata/master_list_proper_subjects.xlsx'
excel_sheet_name = 'master_list_proper_subjects' #Optional

In [None]:
# Import a CSV with the clinical data of interest
import pandas as pd
if os.path.basename(path_2).split('.')[1] == 'csv':
    df_2B = pd.read_csv(path_2)
else:
    df_2B = pd.read_excel(path_2, sheet_name=excel_sheet_name)
df_2B = df_2B[df_2B['City']=='Wurzburg']

In [None]:
df_2B

In [None]:
df_2B = df_2B.loc[:, ['subject', 'Z-Scored Percent Cognitive Improvement']]

In [None]:
subject_column = 'subject'
string_preceding_id = ''
string_proceeding_id = ''

Process df_2 subject names

In [None]:
df_2B = set_subject_column_to_subject(df_2B, subject_column=subject_column, string_preceding_id=string_preceding_id, string_proceeding_id=string_proceeding_id)
df_2B

**Run the Spatial Correlatio Function**

In [None]:
import numpy as np
import scipy.stats
from tqdm import tqdm
import sys
from contextlib import contextmanager
from calvin_utils.statistical_utils.voxelwise_statistical_testing import generate_r_map

# # Define a dummy tqdm function
# def dummy_tqdm(*args, **kwargs):
#     if 'iterable' in kwargs:
#         return kwargs['iterable']
#     return args[0] if args else range(0)

# @contextmanager
# def suppress_print():
#     # Save the original tqdm and stdout
#     original_tqdm = tqdm
#     original_stdout = sys.stdout

#     # Replace tqdm with the dummy function and stdout with a null device
#     tqdm = dummy_tqdm
#     sys.stdout = open(os.devnull, 'w')

#     try:
#         yield
#     finally:
#         # Restore the original tqdm and stdout
#         tqdm = original_tqdm
#         sys.stdout = original_stdout
        
@contextmanager
def suppress_print():
    # Open a null device
    with open(os.devnull, 'w') as devnull:
        # Save the current stdout
        old_stdout = sys.stdout
        # Redirect the current stdout to the null device
        sys.stdout = devnull
        try:
            # Yield back to the calling function
            yield
        finally:
            # Restore the original stdout
            sys.stdout = old_stdout

class SpatialCorrelRMaps:
    def __init__(self, df_1, df_2, df_1B, df_2B, subject_column='subject', out_dir=None, mask_path=None, method='pearson'):
        self.df_1 = df_1
        self.df_2 = df_2
        self.df_1B = df_1B
        self.df_2B = df_2B
        self.subject_column = subject_column
        self.out_dir = out_dir
        self.mask_path = mask_path
        self.method = method


    def spatial_correlation(self, r_map1, r_map2):
        r_map1 = np.nan_to_num(r_map1.to_numpy().flatten(), neginf=1, posinf=0, nan=0)
        r_map2 = np.nan_to_num(r_map2.to_numpy().flatten(), neginf=1, posinf=0, nan=0)
        correlation_coefficient, _ = scipy.stats.pearsonr(r_map1, r_map2)
        return correlation_coefficient

    def permute_subjects(self, df):
        df_permuted = df.copy()
        df_permuted[self.subject_column] = np.random.permutation(df[self.subject_column])
        return df_permuted
    
    def r_map(self, df_1, df_2):
        """
        Process the given dataframes, and generate maps based on the columns.

        Parameters:
        - df_1: First dataframe.
        - df_2: Second dataframe.
        - subject_column: The column name referring to the subject.
        - out_dir: The output directory to save the generated maps.
        """
        # Check if 'subject' is in the index or columns for df_1
        if subject_column in df_1.index.names:
            df_1 = reset_index_if_subject_is_index(df_1, subject_column)
        if subject_column in df_2.index.names:
            df_2 = reset_index_if_subject_is_index(df_2, subject_column)
            
        # Process subject column to string    
        df_1[subject_column] = df_1[subject_column].astype(str)
        df_2[subject_column] = df_2[subject_column].astype(str)
        
        # Iterate over column, avoiding the one with subject id in it
        for colname in [col for col in df_2.columns if col != subject_column]:
            merged_df = df_2[[colname, subject_column]].merge(df_1, on=subject_column, how='inner').set_index(subject_column)
            
            # Remove any rows with NaN values
            copy_df = merged_df.copy()
            try:
                merged_df.dropna(inplace=True)
                with suppress_print():
                    r_df, _, _ = generate_r_map(merged_df, mask_path=self.mask_path, method=self.method, tqdm_on=False)
            except Exception as e:
                if "x and y must have length at least 2" in str(e):
                    print('Caught exception: NaNs or Infs suspected in input data. Trying workaround.')
                    copy_df.replace([np.inf, -np.inf], np.nan, inplace=True)
                    copy_df.fillna(0, inplace=True)
                    r_df, _, _ = generate_r_map(copy_df, mask_path=self.mask_path, tqdm_on=False)
                else:
                    print(f'Error {e}')
        return r_df

    def observed_distribution(self):
        r_map1 = self.r_map(self.df_1, self.df_2)
        r_map2 = self.r_map(self.df_1B, self.df_2B)
        observed_corr = self.spatial_correlation(r_map1, r_map2)
        return observed_corr

    def empiric_distribution(self, n_permutations):
        empiric_corrs = []
        for _ in tqdm(range(n_permutations)):
            permuted_df_2 = self.permute_subjects(self.df_2)
            permuted_df_2B = self.permute_subjects(self.df_2B)
            r_map1 = self.r_map(self.df_1, permuted_df_2)
            r_map2 = self.r_map(self.df_1B, permuted_df_2B)
            corr = self.spatial_correlation(r_map1, r_map2)
            empiric_corrs.append(corr)
        return empiric_corrs
    
    def p_value(self, observed_corr, empiric_corrs):
        """
        Calculate the p-value for the observed spatial correlation.

        Parameters:
            observed_corr (float): The observed spatial correlation coefficient.
            empiric_corrs (list): A list of spatial correlation coefficients from permuted data.

        Returns:
            float: The p-value representing the statistical significance of the observed correlation.
        """
        # Count how many empiric correlations are greater than or equal to the observed correlation
        count_greater = sum(emp_corr >= observed_corr for emp_corr in empiric_corrs)

        # Calculate the p-value (proportion of empiric correlations greater than or equal to the observed)
        p_val = count_greater / len(empiric_corrs)
        return p_val

    def run(self, n_permutations=100):
        """
        Execute the entire process of calculating spatial correlations and p-value.

        Parameters:
            n_permutations (int): Number of permutations for the empirical distribution calculation.

        Returns:
            tuple: A tuple containing the observed correlation, the empirical correlations, and the p-value.
        """
        # Calculate the observed spatial correlation
        observed_corr = self.observed_distribution()

        # Calculate the empirical distribution of spatial correlations from permuted data
        empiric_corrs = self.empiric_distribution(n_permutations)

        # Calculate the p-value for the observed spatial correlation
        p_val = self.p_value(observed_corr, empiric_corrs)
        print(f"Observed Correlation: {observed_corr}")
        print(f"P-Value: {p_val}")
        return observed_corr, empiric_corrs, p_val


    # Placeholder for any additional methods, such as p-value calculation or utility functions


Run It/

In [None]:
# Assuming df_1, df_2, df_1B, df_2B are predefined DataFrames
# Initialize the SpatialCorrelRMaps instance
spatial_correl = SpatialCorrelRMaps(df_1, df_2, df_1B, df_2B, 
                                    subject_column='subject', 
                                    out_dir=out_dir, 
                                    mask_path=None, 
                                    method='pearson')
obsv, empir, pval = spatial_correl.run()

In [None]:
spatial_correl.p_value

# 05 - FWE Corrected R Map 

Import Covariates

In [None]:
input_csv_path = '/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/metadata/experiment_metadata/q4_regression.csv'
out_dir = '/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/Figures/r_maps_to_praxis'

In [None]:
data_df = pd.read_csv(input_csv_path, index_col=0)
data_df = data_df.dropna(axis=1)
data_df

Pick the columns to keep

Import Niftis

In [None]:
import_path = '/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/smoothed_atrophy_seeds'
file_target = '*/*/unthresholded_tissue_segment_z_scores/*cerebrospinal_fluid_generated_nifti_no*'

In [None]:
from calvin_utils.file_utils.import_functions import GiiNiiFileImport
giinii = GiiNiiFileImport(import_path=import_path, file_column=None, file_pattern=file_target)
nimg_df = giinii.run()
nimg_df

Fix names

In [None]:
pre = 'sub-'
post = '_cerebro'

In [None]:
nimg_df = giinii.splice_colnames(nimg_df, pre, post)
nimg_df

In [None]:
import numpy as np
import pandas as pd
from tqdm import tqdm
import nibabel as nib
from typing import Tuple
from sklearn.linear_model import LinearRegression
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti

class CalvinFWEMap():
    """
    This is a class to orchestrate a simple association between some Y variable of interest and voxelwise data (X variable)
    It will run FWE correction via the Maximum Statistic Correction method. 
    """
    def __init__(self, neuroimaging_dataframe: pd.DataFrame, variable_dataframe: pd.DataFrame, method: str='spearman', mask_path=None, mask_threshold: int=0.0, out_dir=''):
        """
        Need to provide the dataframe dictionaries and dataframes of importance. 
        
        Args:
        - neuroimaging_dataframe (df): DF with neuroimaging data (voxelwise dataframe) column represents represents a subject,
                                        and each row represents a voxel.
        - variable_dataframe (pd.DataFrame): DataFrame where each column represents represents a subject,
                                        and each row represents the variable to regress upon. 
        - method (str): the association method to relate the voxelwise data to. Defaults to spearman correlation
                                        options: spearman | pearson | regression
        - mask_path (str): the path to the mask you want to use. 
                                        If None, will threshold the voxelwise image itself by mask_threshold.
        - mask_threshold (int): The threshold to mask the neuroimaging data at.
        """
        self.method = method
        self.mask_path = mask_path
        self.mask_threshold = mask_threshold
        neuroimaging_dataframe, self.variable_dataframe = self.sort_dataframes(covariate_df=variable_dataframe, voxel_df=neuroimaging_dataframe)
        self.original_mask, self.nonzero_mask, self.neuroimaging_dataframe = self.mask_dataframe(neuroimaging_dataframe)
        self.out_dir = out_dir

        
    def sort_dataframes(self, voxel_df: pd.DataFrame, covariate_df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
        """
        Will sort the rows of the voxelwise DF and the covariate DF to make sure they are identically organized.
        Then will check that the columns are equivalent. 
        """
        # Force Columns to Match
        voxel_cols = set(voxel_df.columns.astype(str).sort_values().values)
        covariate_cols = set(covariate_df.columns.astype(str).sort_values().values)
        shared_columns = list(voxel_cols.intersection(covariate_cols))
        
        # This will occur when columns have strange naming, such as subject 1 being 0001 verus 1. 
        if len(shared_columns) == 0:
            voxel_cols = voxel_df.columns.astype(int).astype(str).sort_values().values
            covariate_cols = covariate_df.columns.astype(int).astype(str).sort_values().values
            
            voxel_df.columns = voxel_cols
            covariate_df.columns = covariate_cols
            
            shared_columns = list(set(voxel_cols).intersection(set(covariate_cols)))
            
        return voxel_df.loc[:, shared_columns], covariate_df.loc[:, shared_columns]
    
    def threshold_probabilities(self, df: pd.DataFrame) -> pd.Series:
        """
        Apply a threshold to mask raw voxelwise data. 
        Finds all voxels which are nonzero across all rows and create a mask from them. 
        
        Parameters:
        df (pd.DataFrame): DataFrame with voxelwise data.
        
        Returns:
        pd.Series: Mask of nonzero voxels.
        """
        if self.mask_path is not None: 
            mask_data = nib.load(self.mask_path).get_fdata()
            mask_data = pd.DataFrame(mask_data, index=df.index, columns=df.columns)
            mask_data = mask_data.where(df > self.mask_threshold, 0)
        else:
            mask_data = df.where(df > self.mask_threshold, 0)

        mask = mask_data.sum(axis=1) > 0
        return mask
    
    def mask_dataframe(self, neuroimaging_df: pd.DataFrame):
        """
        Apply a mask to the neuroimaging DataFrame based on nonzero voxels.
        
        Parameters:
        neuroimaging_df (pd.DataFrame): DataFrame with neuroimaging data.
        
        Returns:
        pd.Index: Index of the whole DataFrame.
        pd.Series: Mask of nonzero voxels.
        pd.DataFrame: Masked neuroimaging DataFrame.
        """
        # Now you can use the function to apply a threshold to patient_df and control_df
        mask = self.threshold_probabilities(neuroimaging_df)
        
        original_mask = neuroimaging_df.index
        masked_neuroimaging_df = neuroimaging_df.loc[mask, :]
        return original_mask, mask, masked_neuroimaging_df
    
    def unmask_dataframe(self, df:pd.DataFrame):
        """
        Simple unmasking function.
        """
        # Initialize a new DF
        empty_mask = pd.DataFrame(index=self.original_mask, columns=['voxels'], data=0)

        # Insert data into the DF 
        empty_mask.loc[self.nonzero_mask, :] = df.values.reshape(-1, 1)
        return empty_mask
    
    def mask_by_p_values(self, results_df:pd.DataFrame, p_values_df:pd.DataFrame):
        """Simple function to perform the thresholding by FWE corrected p-values"""
        unmasked_df = results_df.copy()
        
        mask = p_values_df.where(p_values_df < 0.05, 0)
        mask = mask.sum(axis=1) == 0
        
        unmasked_df.loc[mask, :] = 0
        return unmasked_df
    
    def permute_covariates(self):
        """Permute the patient data by randomly assigning patient data (columnar data) to new patients (columns)"""
        return self.variable_dataframe.sample(frac=1, axis=1, random_state=None)
    
    def linear_regression(self, permuted_variable_df: pd.DataFrame=None, use_intercept: bool=True, debug: bool=False) -> pd.DataFrame:
        """
        Calculate voxelwise relationship to Y variable with linear regression.
        It is STRONGLY advised to set mask=True when running this.

        This function performs a linear regression using sklearn's LinearRegression
        The regression is done once across all voxels simultaneously, 
        treating each voxel's values across subjects as independent responses. 
        This vectorized approach efficiently handles the calculations by leveraging matrix operations, 
        which are computationally optimized in libraries like numpy and sklearn.

        Args:
            use_intercept (bool): if true, will add intercept to the regression
            debug (bool): if true, prints out summary metrics

        Returns:
            pd.DataFrame:
        """
        # Design matrix X for control group, outcomes Y for control group
        if permuted_variable_df is not None:
            X = permuted_variable_df.T
        else:
            X = self.variable_dataframe.T 
        Y = self.neuroimaging_dataframe.T.values
        
        # Fit model on control data across all voxels
        model = LinearRegression(fit_intercept=use_intercept)
        model.fit(X, Y)

        # Predict on experimental group and calculate R-squared
        Y_HAT = model.predict(X)
        Y_BAR = np.mean(Y, axis=0, keepdims=True)
        SSE = np.sum( (Y_HAT - Y_BAR)**2, axis=0)
        SST = np.sum( (Y     - Y_BAR)**2, axis=0)
        R2 = SSE/SST
 
        if debug:
            print(X.shape, Y.shape, Y_HAT.shape, Y_BAR.shape, SSE.shape, SST.shape, R2.shape)
            print('Observed R2 max: ', np.max(R2))
            
        # Reshape R2 to DataFrame format
        R2_df = pd.DataFrame(R2.T, index=self.neuroimaging_dataframe.index, columns=['R2'])
        return R2_df
    
    def maximum_stat_fwe(self, n_permutations=100, debug=False):
        """
        Perform maximum statistic Family-Wise Error (FWE) correction using permutation testing.

        This method calculates the maximum voxelwise R-squared values across multiple permutations
        of the covariates. It then uses these maximum statistics to correct for multiple comparisons,
        ensuring robust and conservative statistical inference.

        Args:
            n_permutations (int): Number of permutations to perform. Defaults to 100.

        Returns:
            list: A list of maximum R-squared values from each permutation.
        """
        max_stats = []
        for i in tqdm(range(0, n_permutations), desc='Permuting'):
            permuted_covariates = self.permute_covariates()
            permuted_R2_df = self.linear_regression(permuted_covariates, debug=False)
            max_stat = np.max(permuted_R2_df)
            max_stats.append(max_stat)
            if debug:
                print('Permutation max stat: ', max_stat)
        return max_stats
            
    def p_value_calculation(self, uncorrected_df, max_stat_dist, debug=False):
        """
        Calculate p-values for the uncorrected statistic values using the distribution of maximum statistics.

        Args:
            uncorrected_df (pd.DataFrame): DataFrame of uncorrected statistic values.
            max_stat_dist (list): Distribution of maximum statistic values from each permutation.

        Returns:
            np.ndarray: Array of p-values corresponding to the uncorrected statistic values.
        """
        max_stat_dist = np.array(max_stat_dist)
        max_stat_dist = max_stat_dist[:, np.newaxis]
        p_values = np.mean(max_stat_dist >= uncorrected_df.values, axis=0)
        p_values_df = uncorrected_df.copy()
        p_values_df.loc[:,:] =p_values
        if debug:
            print(p_values_df.shape)
        return p_values_df

    def save_single_nifti(self, nifti_df, out_dir, name='generated_nifti', silent=True):
        """Saves NIFTI images to directory."""
        preview = view_and_save_nifti(matrix=nifti_df,
                            out_dir=out_dir,
                            output_name=name,
                            silent=silent)
        return preview
        
    def save_results(self, voxelwise_results, unmasked_p_values, voxelwise_results_fwe):
        """
        Saves the generated files. 
        """
        self.uncorrected_img = self.save_single_nifti(nifti_df=voxelwise_results, out_dir=self.out_dir, name='uncorrected_results', silent=False)
        self.p_img = self.save_single_nifti(nifti_df=unmasked_p_values, out_dir=self.out_dir, name='p_values', silent=False)
        self.corrected_img = self.save_single_nifti(nifti_df=voxelwise_results_fwe, out_dir=self.out_dir, name='fwe_corrected_results', silent=False)

    def run(self, n_permutations=100, debug=False):
        """
        Orchestration method. 
        """
        #Can be abstracted to run the analysis of choice and return it and the p-values
        voxelwise_results = self.linear_regression(debug=debug)
        max_stat_dist = self.maximum_stat_fwe(n_permutations=n_permutations, debug=debug)
        p_values = self.p_value_calculation(voxelwise_results, max_stat_dist, debug=debug)
        # 
        voxelwise_results = self.unmask_dataframe(voxelwise_results)
        unmasked_p_values = self.unmask_dataframe(p_values)
        voxelwise_results_fwe = self.mask_by_p_values(results_df=voxelwise_results, p_values_df=unmasked_p_values)
        self.save_results(voxelwise_results, unmasked_p_values, voxelwise_results_fwe)
        if debug:
            print(np.max(voxelwise_results), np.max(unmasked_p_values), np.max(voxelwise_results_fwe))
            print(voxelwise_results.shape, unmasked_p_values.shape, voxelwise_results_fwe.shape)

In [None]:
calvin_fwe = CalvinFWEMap(neuroimaging_dataframe=nimg_df, variable_dataframe=data_df, mask_threshold=0, out_dir=out_dir)
calvin_fwe.run(n_permutations=1000, debug=False)

In [None]:
calvin_fwe.corrected_img

In [None]:
# calvin_fwe.save_results(out_dir=out_dir)
calvin_fwe.uncorrected_img

In [None]:
np.max(calvin_fwe.voxelwise_results)

Enjoy

--Calvin