# Run Any Kind of Voxelwise Permutation Test (Regression, Correlation, etc.)

### Authors: Calvin Howard.

#### Last updated: June 1, 2024

- Only run this if you are: 
    - on a server and have the files available on the server
    - on a computer strong enough to run the assessment of interest

# 00 - Import CSV with All Data
**The CSV is expected to be in this format**
- ID and absolute paths to niftis are critical
```
+-----+----------------------------+--------------+--------------+--------------+
| ID  | Nifti_File_Path            | Covariate_1  | Covariate_2  | Covariate_3  |
+-----+----------------------------+--------------+--------------+--------------+
| 1   | /path/to/file1.nii.gz      | 0.5          | 1.2          | 3.4          |
| 2   | /path/to/file2.nii.gz      | 0.7          | 1.4          | 3.1          |
| 3   | /path/to/file3.nii.gz      | 0.6          | 1.5          | 3.5          |
| 4   | /path/to/file4.nii.gz      | 0.9          | 1.1          | 3.2          |
| ... | ...                        | ...          | ...          | ...          |
+-----+----------------------------+--------------+--------------+--------------+
```
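
Before loading the CSV, it can help to confirm that it follows the layout above. The block below is a minimal sketch, not part of `calvin_utils`: the helper name `check_master_list`, the in-memory CSV, and the column names `ID` and `Nifti_File_Path` (taken from the example table) are assumptions for illustration. It only checks that the paths are absolute, not that the files exist.

```python
import io
import os
import pandas as pd

# Hypothetical in-memory CSV standing in for the master list on disk.
csv_text = """ID,Nifti_File_Path,Covariate_1
1,/path/to/file1.nii.gz,0.5
2,relative/file2.nii.gz,0.7
"""

def check_master_list(df, id_col="ID", path_col="Nifti_File_Path"):
    """Return a list of human-readable problems found in the master list."""
    problems = [f"missing column: {c}" for c in (id_col, path_col) if c not in df.columns]
    if problems:
        return problems
    if df[id_col].duplicated().any():
        problems.append("duplicate IDs")
    # Absolute paths are required; existence on disk is not checked here.
    is_abs = df[path_col].astype(str).map(os.path.isabs)
    for subject_id in df.loc[~is_abs, id_col]:
        problems.append(f"ID {subject_id}: path is not absolute")
    return problems

df = pd.read_csv(io.StringIO(csv_text))
problems = check_master_list(df)
print(problems)
```

An empty list means the two critical columns are present and every path is absolute.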

Prep Output Directory

In [None]:
# Specify where you want to save your results to
out_dir = '/Users/cu135/Dropbox (Partners HealthCare)/studies/voxelwise_lin_reg'

Import Data

In [None]:
# Specify the path to your CSV file containing NIFTI paths
input_csv_path = '/Users/cu135/Dropbox (Partners HealthCare)/studies/voxelwise_lin_reg/experimental_group_master_list.csv'
sheet = None

In [None]:
from calvin_utils.permutation_analysis_utils.statsmodels_palm import CalvinStatsmodelsPalm
# Instantiate the CalvinStatsmodelsPalm class
cal_palm = CalvinStatsmodelsPalm(input_csv_path=input_csv_path, output_dir=out_dir, sheet=sheet)
# Read the CSV and display the data
data_df = cal_palm.read_and_display_data()


# 01 - Preprocess Your Data

**Handle NANs**
- Set drop_nans=True if you would like to remove NaNs from the data
- Provide a column name or a list of column names to remove NaNs from
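
The internals of `drop_nans_from_columns` are not shown in this notebook. As a rough illustration of the idea, the sketch below uses plain pandas on made-up data: a row is removed if any of the listed columns is NaN, while NaNs in other columns are left alone. The column names mirror the ones used in the cells that follow.

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'Age' and 'Q4' mirror the column names used in this notebook.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Age": [61.0, np.nan, 70.0, 58.0],
    "Q4": [0.2, 0.5, np.nan, 0.9],
    "Sex": ["F", "M", "F", None],
})

drop_list = ["Age", "Q4"]
# Drop a row if ANY of the listed columns is NaN; NaNs in other columns are ignored.
clean_df = df.dropna(subset=drop_list).reset_index(drop=True)
print(clean_df)
```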

In [None]:
data_df.columns

In [None]:
drop_list = ['Age', 'Q4']

In [None]:
data_df = cal_palm.drop_nans_from_columns(columns_to_drop_from=drop_list)

**Drop Row Based on Value of Column**

Define the column, condition, and value for dropping rows:
- column = 'your_column_name'
- condition = 'above'  # Options: 'equal', 'above', 'below', 'not'
- value = the value each row is compared against
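
To make the condition names concrete, here is a minimal pandas sketch. It is a hypothetical stand-in, not the `calvin_utils` implementation: the function name `split_rows` is invented, and it assumes that a row is dropped when the comparison is true (so 'not' drops rows whose value differs from `value`). Check the behavior of `drop_rows_based_on_value` on your own data before relying on this reading.

```python
import operator
import pandas as pd

# Map each condition name to a comparison; rows where it is True are dropped (assumed semantics).
CONDITIONS = {
    "equal": operator.eq,
    "above": operator.gt,
    "below": operator.lt,
    "not": operator.ne,
}

def split_rows(df, column, condition, value):
    """Return (kept_df, dropped_df) for the given column, condition, and value."""
    to_drop = CONDITIONS[condition](df[column], value)
    return df.loc[~to_drop], df.loc[to_drop]

df = pd.DataFrame({"City": ["Toronto", "Boston", "Toronto"], "Age": [60, 72, 55]})
kept, dropped = split_rows(df, "City", "not", "Toronto")
kept_young, dropped_old = split_rows(df, "Age", "above", 65)
print(kept)
```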

In [None]:
data_df.columns

Set the parameters for dropping rows

In [None]:
# column = 'City'  # The column you'd like to evaluate
# condition = 'not'  # The condition to check ('equal', 'above', 'below', 'not')
# value = 'Toronto' # The value to drop if found

In [None]:
# data_df, other_df = cal_palm.drop_rows_based_on_value(column, condition, value)
# display(data_df)

**Standardize Data**
- List the columns you do not want to standardize
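
Standardizing here means z-scoring: subtract the column mean and divide by the column standard deviation. The sketch below shows the idea in plain pandas. The helper `standardize` is hypothetical, and the choice of population standard deviation (`ddof=0`) is an assumption; `standardize_columns` may use a different convention. Note that any numeric column not in the skip list is z-scored, including numeric ID columns.

```python
import pandas as pd

def standardize(df, cols_not_to_standardize=None):
    """Z-score every numeric column except the ones listed; returns a copy."""
    skip = set(cols_not_to_standardize or [])
    out = df.copy()
    numeric_cols = [c for c in out.select_dtypes(include="number").columns if c not in skip]
    # ddof=0 (population standard deviation) is an assumption of this sketch.
    out[numeric_cols] = (out[numeric_cols] - out[numeric_cols].mean()) / out[numeric_cols].std(ddof=0)
    return out

df = pd.DataFrame({"Age": [50.0, 60.0, 70.0], "Q4": [1.0, 2.0, 3.0], "Sex": ["F", "M", "F"]})
z_df = standardize(df, cols_not_to_standardize=["Age"])
print(z_df)
```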

In [None]:
# # Remove anything you don't want to standardize
# cols_not_to_standardize = None # ['Z_Scored_Percent_Cognitive_Improvement_By_Origin_Group', 'Z_Scored_Subiculum_T_By_Origin_Group_'] #['Age']

In [None]:
# data_df = cal_palm.standardize_columns(cols_not_to_standardize)
# data_df

In [None]:
# data_df.columns

# 02 - Define Your Formula

This is the formula relating the outcome to the predictors. It takes the form:
- y = B0 + B1*X1 + B2*X2 + B3*X3 + ... + BN*XN

It is written using the column names of your dataframe in place of y and the X variables above:
- 'Apples_Picked ~ hours_worked + owns_apple_picking_machine'
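
Fitting such a formula means finding the coefficients (B0, B1, B2) that best map the predictor columns onto the outcome. The sketch below illustrates this with ordinary least squares in NumPy on synthetic, noise-free data, so the recovered coefficients can be compared with the known ones. The data and coefficient values are made up, and this is not how the notebook's scripts fit their models.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
hours_worked = rng.uniform(0, 8, size=n)
owns_machine = rng.integers(0, 2, size=n).astype(float)

# Synthetic outcome built from known coefficients (no noise) so the fit is checkable.
true_betas = np.array([2.0, 3.0, 10.0])  # B0 (intercept), B1, B2
X = np.column_stack([np.ones(n), hours_worked, owns_machine])
apples_picked = X @ true_betas

# Ordinary least squares: solve for the betas that minimize the squared error.
betas, *_ = np.linalg.lstsq(X, apples_picked, rcond=None)
print(betas)
```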

____
**ANOVA**
- Tests differences in means for one categorical variable.
- formula = 'Outcome ~ C(Group1)'

**2-Way ANOVA**
- Tests differences in means for two categorical variables without interaction.
- formula = 'Outcome ~ C(Group1) + C(Group2)'

**2-Way ANOVA with Interaction**
- Tests for interaction effects between two categorical variables.
- formula = 'Outcome ~ C(Group1) * C(Group2)'

**ANCOVA**
- Similar to ANOVA, but includes a covariate to control for its effect.
- formula = 'Outcome ~ C(Group1) + Covariate'

**2-Way ANCOVA**
- Extends ANCOVA with two categorical variables and their interaction, controlling for a covariate.
- formula = 'Outcome ~ C(Group1) * C(Group2) + Covariate'

**Multiple Regression**
- Assesses the impact of multiple predictors on an outcome.
- formula = 'Outcome ~ Predictor1 + Predictor2'

**Simple Linear Regression**
- Assesses the impact of a single predictor on an outcome.
- formula = 'Outcome ~ Predictor'

**MANOVA**
- Assesses multiple dependent variables across groups.
- Note: Not typically set up with a formula in statsmodels. Requires specialized functions.

____
Use the printout below to design your formula. 
- Left of the "~" symbol is the thing to be predicted. 
- Right of the "~" symbol are the predictors. 
- ":" indicates an interaction between two things. 
- "*" indicates an interaction AND also includes the simple effects of each variable. 
- "+" indicates that you want to add another predictor. 

In [None]:
data_df.columns

In [None]:
formula = "Q4 ~ CSF_Z6_PATH + Age + Sex"

# 02 - Visualize Your Design Matrix

This is the explanatory variable half of your regression formula
_______________________________________________________
Create Design Matrix: Use the define_design_matrix method. Pass it your formula and your dataframe; the variables in the formula must correspond to column names in your dataframe.

- voxelwise_variable = name of the variable in your formula which contains nifti paths.
- By default, an intercept will be added unless you set intercept=False
- **don't explicitly add the 'intercept' column. I'll do it for you.**
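
For orientation, the sketch below builds a comparable design matrix by hand in pandas for the formula 'Q4 ~ CSF_Z6_PATH + Age + Sex'. The data are made up, and the column names `Intercept` and `Sex_M` are assumptions of this sketch; the names and categorical encoding that `define_design_matrix` actually produces may differ.

```python
import pandas as pd

# Hypothetical rows; the voxelwise column holds paths to NIfTI files.
df = pd.DataFrame({
    "Q4": [0.2, 0.5, 0.9],
    "CSF_Z6_PATH": ["/path/to/a.nii.gz", "/path/to/b.nii.gz", "/path/to/c.nii.gz"],
    "Age": [61, 70, 58],
    "Sex": ["F", "M", "F"],
})

# Left of "~": the outcome.
outcome = df[["Q4"]]

# Right of "~": the predictors.
design = df[["CSF_Z6_PATH", "Age"]].copy()
# Dummy-code the categorical predictor, keeping one level as the reference
# so the column is not collinear with the intercept.
design["Sex_M"] = (df["Sex"] == "M").astype(float)
# Add the intercept exactly once, as the first column.
design.insert(0, "Intercept", 1.0)
print(design)
```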

In [None]:
voxelwise_variable='CSF_Z6_PATH'

In [None]:
# Define the design matrix
outcome_df, design_matrix_df = cal_palm.define_design_matrix(formula, data_df, voxelwise_variable=voxelwise_variable)
design_matrix_df

# 03 - Visualize Your Dependent Variable

I have generated this for you based on the formula you provided

In [None]:
outcome_df

# 04 - Generate Dataframes

In [None]:
from calvin_utils.file_utils.dataframe_utilities import save_design_matrix_to_csv
design_matrix_path = save_design_matrix_to_csv(design_matrix_df, out_dir = (out_dir+"/server_prep"))
print(design_matrix_path)

In [None]:
from calvin_utils.file_utils.import_functions import GiiNiiFileImport
GiiNii = GiiNiiFileImport(import_path=design_matrix_path, file_column=voxelwise_variable, file_pattern=None)
voxelwise_df = GiiNii.run()
voxelwise_df

Mask
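
Masking restricts the analysis to voxels of interest, and unmasking places per-voxel results back into their original positions. The sketch below illustrates both steps with NumPy and pandas on a toy example. The layout (rows are voxels, columns are subjects) and the masking criterion (nonzero in at least one subject) are assumptions; `GiiNii.mask_dataframe` may use a different mask.

```python
import numpy as np
import pandas as pd

# Rows are voxels, columns are subjects (hypothetical 6-voxel images).
voxelwise = pd.DataFrame({
    "sub1": [0.0, 1.2, 0.0, 3.1, 0.0, 0.4],
    "sub2": [0.0, 0.9, 0.0, 2.7, 0.0, 0.0],
})

# Assumed criterion: keep voxels that are nonzero in at least one subject.
mask = (voxelwise != 0).any(axis=1).to_numpy()
mask_ind = np.flatnonzero(mask)
masked = voxelwise.loc[mask]

# Unmasking: scatter a per-voxel result back into a full-length vector.
result = masked.mean(axis=1).to_numpy()
full = np.zeros(len(voxelwise))
full[mask_ind] = result
print(full)
```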

In [None]:
mask, mask_ind, voxelwise_df = GiiNii.mask_dataframe(voxelwise_df)
voxelwise_df

# 05 - Save Data For Access by Script

In [None]:
from calvin_utils.file_utils.dataframe_utilities import save_dataframes_to_csv

where_to_save = f"{out_dir}/server_prep"
#----------------------------------------------------------------
df_paths_dict = save_dataframes_to_csv(outcome_dfs = [outcome_df], 
                                       covariate_dfs = [design_matrix_df.drop(voxelwise_variable, axis=1)],
                                       voxelwise_dfs = [voxelwise_df], 
                                       path_to_dataframes = where_to_save)
print("CSVs saved to: ", df_paths_dict)

# 06 - Choose the Python Script you Want to Execute

In [None]:
from calvin_utils.file_utils.script_printer import ScriptInfo
from calvin_utils.permutation_analysis_utils.scripts_for_submission.script_descriptions import script_dict
info = ScriptInfo(script_dict)
info.print_all_info()

To select a script, copy the value of the 'Method' field that you want to use. 

In [None]:
method_choice = 'Voxelwise_Fit_Test'

# 07 - Transfer Files to a New Directory
- Make sure directory_to_save_to exists on the machine you are using

In [None]:
directory_to_save_to = '/Users/cu135/Dropbox (Partners HealthCare)/studies/atrophy_seeds_2023/Figures/correlation_to_memory/analysis'

In [None]:
import importlib
from calvin_utils.server_utils.file_transfer_helper import LocalTransfer
df_paths_dict['python_script'] = [importlib.import_module(info.get_module_by_method(method_choice)).__file__]
file_transfer = LocalTransfer.transfer_files_in_dict(local_files=df_paths_dict, dest_path=directory_to_save_to)

# 08 - Prepare Script Inputs
- The cell below prints the arguments required by the method you chose

In [None]:
inputs = info.get_inputs_by_method(method_choice)
inputs

**Enter Arguments for a Script**

Copy the dictionary printed above into the cell below and fill it out. 
- You do not need to edit the keys with _paths/_path in them.

Example:
```
script_inputs_dict = {
    'n_cores': 16,
    'out_dir': "/PHShome/cu135/permutation_tests/f_test/age_by_stim_ad_dbs_redone/results/tmp",
    'job_name': 'ftest_bm',
    'memory_per_job': 8,
    'outcome_data_path': remote_df_paths_dict['outcomes'],
    'clinical_covariate_paths': remote_df_paths_dict['covariates'],
    'voxelwise_data_paths': remote_df_paths_dict['voxelwise']
}
```

In [None]:
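# NOTE: remote_path_to_save_to and remote_df_paths_dict are not defined earlier in this notebook.
# Define them before running this cell so they point to the transferred files.
script_inputs_dict = {
    'n_cores': 5,
    'out_dir': f"{remote_path_to_save_to}/results/raw_results",
    'job_name': 't_test_coef',
    'memory_per_job': 8,
    'outcome_data_path': remote_df_paths_dict['outcomes'],
    'clinical_covariate_paths': remote_df_paths_dict['covariates'],
    'neuroimaging_df_paths': remote_df_paths_dict['voxelwise']
}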

# 09 - Submit Jobs to Server
- user_email = the email address associated with your server account
- num_permutations = the number of permutations to run
- queue_name = the LSF queue to use
- server_env_activation_string = the command that activates your environment on the server

Example:
```
user_email = "choward12@bwh.harvard.edu"
num_permutations = 10000
queue_name = "big-multi"
server_env_activation_string = "conda activate nimlab"
```
Want more information on server submission?
- https://rc.partners.org/kb/article/1462
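
The submission cell below derives the number of jobs from num_permutations and n_cores. The sketch that follows works through that arithmetic and assembles an illustrative `bsub` command from standard LSF options (`-J`, `-q`, `-n`, `-R`, `-u`, `-N`). It is not the command that `JobSubmitter` generates: the helper names, the placeholder email and script, and the assumption that each job runs one permutation per core are all hypothetical. It uses `math.ceil` so the total never falls below the requested number of permutations.

```python
import math
import shlex

def plan_jobs(num_permutations, n_cores):
    """Jobs needed if each job runs one permutation per core (an assumption of this sketch)."""
    return math.ceil(num_permutations / n_cores)

def build_bsub_command(job_name, queue, n_cores, mem_gb, email, script_cmd):
    """Assemble a bsub command string from standard LSF options."""
    options = [
        "bsub",
        "-J", job_name,                         # job name
        "-q", queue,                            # queue to submit to
        "-n", str(n_cores),                     # number of cores
        "-R", f"rusage[mem={mem_gb * 1000}]",   # memory; units depend on the cluster's LSF configuration
        "-u", email,                            # address for job notifications
        "-N",                                   # send the job report by email when the job ends
    ]
    # Quote each option for the shell; the command to run is appended as-is.
    return " ".join(shlex.quote(o) for o in options) + " " + script_cmd

n_jobs = plan_jobs(num_permutations=10000, n_cores=5)
command = build_bsub_command("t_test_coef", "short", 5, 8, "user@example.org", "python script.py")
print(n_jobs)
print(command)
```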

In [None]:
user_email = "choward12@bwh.harvard.edu"
num_permutations = 10000
queue_name = "short"
server_env_activation_string = "conda activate nimlab"

In [None]:
import numpy as np
from calvin_utils.server_utils.job_submission_helper import LSFServer, LSFJob, JobSubmitter
lsf_job = LSFJob(job_name=script_inputs_dict['job_name'],
                 user_email=user_email,
                 output_dir="~/terminal_outputs",
                 error_dir="~/terminal_outputs",
                 queue=queue_name,
                 n_jobs=int(np.round(num_permutations/script_inputs_dict['n_cores'])),
                 cpus=script_inputs_dict['n_cores'],
                 gb_requested=script_inputs_dict['memory_per_job'],
                 wait_time=None,
                 script_path=remote_df_paths_dict['python_script'][0],
                 environment_activation_string=server_env_activation_string,
                 options=script_inputs_dict
                 )

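# NOTE: `server` and `username` are not defined earlier in this notebook; set them before running this cell
lsf_server = LSFServer(server_name=server, 
                       username=username)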

job_submitter = JobSubmitter(lsf_server, lsf_job)
job_command = job_submitter.submit_jobs(print_job=True)

# 10 - Get the Observed Data
- Call the function of interest. Its docstring will be printed below.
- The function name is the final part of the import statement.

In [None]:
import_statement = info.get_script_import(method_choice)
docstring_statement = info.get_docstring(method_choice)
print(import_statement)
print(docstring_statement)
exec(import_statement)
exec(docstring_statement)

Use the paths to the local CSVs to enter your arguments

Example
```
results = voxelwise_r_squared(outcome_df, [voxelwise_df], [design_matrix_df.drop(voxelwise_variable, axis=1)])
```
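
As a rough picture of what a voxelwise fit computes, the sketch below regresses the outcome on an intercept, one voxel's values, and the covariates, separately for every voxel, and records R-squared. It is an illustration in NumPy on synthetic data, not the implementation of `voxelwise_r_squared`; the function name `voxelwise_r2` and the array layout (voxels by subjects) are assumptions.

```python
import numpy as np

def voxelwise_r2(outcome, voxel_data, covariates):
    """
    outcome:    (n_subjects,)
    voxel_data: (n_voxels, n_subjects)
    covariates: (n_subjects, n_covariates)
    Returns the (n_voxels,) R-squared of outcome ~ 1 + voxel + covariates.
    """
    n = outcome.shape[0]
    ss_tot = np.sum((outcome - outcome.mean()) ** 2)
    r2 = np.empty(voxel_data.shape[0])
    for i, voxel in enumerate(voxel_data):
        X = np.column_stack([np.ones(n), voxel, covariates])
        betas, *_ = np.linalg.lstsq(X, outcome, rcond=None)
        ss_res = np.sum((outcome - X @ betas) ** 2)
        r2[i] = 1.0 - ss_res / ss_tot
    return r2

rng = np.random.default_rng(0)
n_subjects, n_voxels = 40, 5
covariates = rng.normal(size=(n_subjects, 2))
voxel_data = rng.normal(size=(n_voxels, n_subjects))
# The outcome is an exact linear function of voxel 0 and the covariates,
# so voxel 0 should reach an R-squared of 1.
outcome = 1.0 + 2.0 * voxel_data[0] + covariates @ np.array([0.5, -1.0])
r2 = voxelwise_r2(outcome, voxel_data, covariates)
print(r2)
```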

In [None]:
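import os
results = voxelwise_r_squared(outcome_df, [voxelwise_df], [design_matrix_df.drop(voxelwise_variable, axis=1)], get_coefficients=True)
# Make sure the results directory exists before writing to it
results_dir = os.path.join(out_dir, 'results')
os.makedirs(results_dir, exist_ok=True)
results.to_csv(os.path.join(results_dir, 'results.csv'))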

**Unmask, Save, and Visualize Results**

In [None]:
from calvin_utils.nifti_utils.generate_nifti import view_and_save_nifti
view_and_save_nifti(GiiNii.unmask_dataframe(results.loc[:, ['R_squared']]), os.path.join(out_dir, 'results'))

# 11 - Get the permutation data
- This code uses a file-staging approach to large-scale computation. The resulting files have been saved to your output directory, and you must now recombine them. 
- From this point forward, upload this notebook to the server and run it from there. Alternatively, download the remote files to your local machine via SCP.

In [None]:

import os
import pandas as pd

def combine_csvs(directory, output_filename):
    """
    Combine all CSV files in a directory into a single CSV file.

    Parameters:
    - directory (str): The path to the directory containing the CSV files.
    - output_filename (str): The path to the output CSV file.

    Returns:
    - str: The path to the combined CSV file.
    """
    # Collect the CSV files in a stable (sorted) order
    csv_files = sorted(f for f in os.listdir(directory) if f.endswith(".csv"))

    # Read every file, then concatenate once (avoids repeated copying inside a loop)
    dfs = [pd.read_csv(os.path.join(directory, f)) for f in csv_files]
    combined_df = pd.concat(dfs, ignore_index=True) if dfs else pd.DataFrame()

    # Save the combined DataFrame as a new CSV file
    combined_df.to_csv(output_filename, index=False)
    return output_filename


In [None]:
recomposed_csv_path = combine_csvs("/path/to/your/directory", "combined.csv")
recomposed_csv_df = pd.read_csv(recomposed_csv_path)
recomposed_csv_df

# 12 -  Calculate FWE-Corrected P-Values
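
A common way to obtain family-wise error (FWE) corrected p-values from permutation results is the maximum-statistic approach: take the largest statistic across all voxels in each permutation, then ask how often that maximum meets or exceeds each observed voxel value. The sketch below shows this in NumPy on toy numbers. It is not the `PermutationPValueCalculator` implementation; the function name and the "+1" correction in the numerator and denominator are assumptions of this sketch.

```python
import numpy as np

def fwe_p_values(observed, permuted):
    """
    observed: (n_voxels,) observed statistic per voxel
    permuted: (n_permutations, n_voxels) statistic per voxel for each permutation
    Returns the (n_voxels,) FWE-corrected p-values via the max-statistic method.
    """
    # One maximum per permutation, taken across all voxels.
    max_null = permuted.max(axis=1)
    # For each voxel, count the permutation maxima that are >= its observed value.
    exceed = (max_null[:, None] >= observed[None, :]).sum(axis=0)
    return (exceed + 1) / (permuted.shape[0] + 1)

observed = np.array([5.0, 1.0, 2.5])
permuted = np.array([
    [1.0, 2.0, 0.5],
    [3.0, 0.2, 1.1],
    [0.4, 0.9, 2.6],
    [1.5, 1.0, 0.3],
])
p_fwe = fwe_p_values(observed, permuted)
print(p_fwe)
```

With only four permutations the smallest attainable p-value is 0.2, which is why thousands of permutations are typically run.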

In [None]:
from calvin_utils.permutation_analysis_utils.statistical_utils.p_value_statistics import PermutationPValueCalculator

first_stage_dir = 'permutation_tests/f_test/age_by_stim_pd_dbs_redone/inputs/results/raw_results'
job_name = 'f_test_pd'
observed_nifti_path = '/PHShome/cu135/permutation_tests/f_test/age_by_stim_pd_dbs_redone/inputs/observed/f_statistic_generated_nifti.nii'

In [None]:
calculator = PermutationPValueCalculator(None, None)
fwe_p_values = calculator.fwe_calculate(directory=first_stage_dir, basename=job_name, nifti_path=None, use_nifti=True, multiprocess=False)