# Notebook to Compare 2 R-Maps using Permutation
- Notes on controlling a regression:
    - Adding covariates to a regression will 'control' for them, but in ordinary least squares it can never decrease the R-squared and will almost always increase it. 
    - To 'remove' a covariate from the regression, you will want to regress a nuisance covariate OUT of the covariate of interest. 
        - This means your regressor will become the residuals from the regression of cov_1 ~ nuisance_cov1
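
The residualization described in the notes above can be sketched as follows. This is a minimal NumPy illustration, not the `RegressOutCovariates` implementation: `residualize`, `cov_1`, and `nuisance_cov1` are hypothetical names, and the data are simulated.

```python
import numpy as np

def residualize(target, nuisance):
    """Return the residuals of an OLS fit: target ~ intercept + nuisance."""
    target = np.asarray(target, dtype=float)
    nuisance = np.asarray(nuisance, dtype=float)
    # Design matrix: a column of ones (intercept) plus the nuisance covariate.
    design = np.column_stack([np.ones_like(nuisance), nuisance])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ beta

# Simulated data (assumption: 50 subjects, one nuisance covariate).
rng = np.random.default_rng(0)
nuisance_cov1 = rng.normal(size=50)
cov_1 = 0.8 * nuisance_cov1 + rng.normal(size=50)

# The residuals become the new regressor of interest.
cov_1_resid = residualize(cov_1, nuisance_cov1)
residual_corr = np.corrcoef(cov_1_resid, nuisance_cov1)[0, 1]
```

By construction the residuals are uncorrelated with the nuisance covariate, so `residual_corr` is zero up to floating-point error.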

# Get Dataset One

Import Niftis
- These are EXPECTED to have subject IDs which are IDENTICAL to the subject IDs that go in the covariate DF column names below
- Column labels are subject IDs. 
- This is expected to ultimately have the form:

|        |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 |  10 | ... |  40 |  41 |  42 |  43 |  45 |  46 |  47 |  48 |  49 |  50 |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|------------|-----|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
| Voxel 1     | 3          | 4         | 7         | 2         | 2         | 2         | 9         | 4         | 7         | 5          | ... | 5           | 2           | 7           | 7           | 3           | 8           | 8           | 1           | 1           | 3           |
| . . .      | ...         | ...        | ...         | ...         | ...         | ...         | ...         | ...         | ...         | ...          | ... | ...           | ...           | ...           | ...           | ...           | ...           | ...           | ...           | ...           | ...           |
| Voxel N     | 2          | 1         | 0         | 1         | 3         | 4         | 9         | 5         | 8         | 6          | ... | 6           | 3           | 8           | 8           | 4           | 9           | 9           | 2           | 2           | 4           |

In [1]:
import_path = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/unsmoothed_atrophy_seeds_v2'
file_target = '*/*/unthresholded_tissue_segment_z_scores/*_cerebrospinal_fluid.nii'

In [None]:
from calvin_utils.file_utils.import_functions import GiiNiiFileImport
giinii = GiiNiiFileImport(import_path=import_path, file_column=None, file_pattern=file_target)
nimg_df = giinii.run()
nimg_df

Fix names

In [3]:
pre = 'sub-'
post = '_cerebros'

In [None]:
nimg_df = GiiNiiFileImport.splice_colnames(nimg_df, pre, post)
nimg_df

Import Covariates

**The CSV is expected to be in this format**
- sub column contents MUST match the names of the neuroimaging files above. 
    - ID column 
```
+-----+----------------------------+--------------+--------------+--------------+
| sub | Nifti_File_Path            | Covariate_1  | Covariate_2  | Covariate_3  |
+-----+----------------------------+--------------+--------------+--------------+
| 1   | /path/to/file1.nii.gz      | 0.5          | 1.2          | 3.4          |
| 2   | /path/to/file2.nii.gz      | 0.7          | 1.4          | 3.1          |
| 3   | /path/to/file3.nii.gz      | 0.6          | 1.5          | 3.5          |
| 4   | /path/to/file4.nii.gz      | 0.9          | 1.1          | 3.2          |
| ... | ...                        | ...          | ...          | ...          |
+-----+----------------------------+--------------+--------------+--------------+
```
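
Since the `sub` column must match the neuroimaging column labels, a quick ID check before going further can save a failed run. The sketch below uses a small inline CSV and a hypothetical list `imaging_ids` standing in for the neuroimaging DataFrame's column labels. Reading IDs as strings is an assumption that avoids `1` vs `'1'` mismatches.

```python
import io
import pandas as pd

# Toy CSV in the expected layout (values are illustrative only).
csv_text = """sub,Nifti_File_Path,Covariate_1
1,/path/to/file1.nii.gz,0.5
2,/path/to/file2.nii.gz,0.7
3,/path/to/file3.nii.gz,0.6
"""
covariates = pd.read_csv(io.StringIO(csv_text), dtype={"sub": str})

# Hypothetical stand-in for the neuroimaging DataFrame's column labels.
imaging_ids = ["1", "2", "4"]

csv_ids = set(covariates["sub"])
missing_from_csv = sorted(set(imaging_ids) - csv_ids)      # imaged, no covariates
missing_from_imaging = sorted(csv_ids - set(imaging_ids))  # covariates, no image
print(missing_from_csv, missing_from_imaging)
```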

In [5]:
# Specify the path to your CSV file containing NIFTI paths
input_csv_path = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/atrophy_seeds_2023/metadata/atrophy_roi_scores/master_list_w_only_unthresholded.csv'
sheet = None  # e.g. 'FCS_Demographics_and_Behavior.c' or 'Memory'

In [6]:
# Specify where you want to save your results to
out_dir = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/atrophy_seeds_2023/Figures/correlation_to_memory/comparison_of_hpc_peaks'

In [None]:
from calvin_utils.permutation_analysis_utils.statsmodels_palm import CalvinStatsmodelsPalm
# Instantiate the CalvinStatsmodelsPalm class
cal_palm = CalvinStatsmodelsPalm(input_csv_path=input_csv_path, output_dir=out_dir, sheet=sheet)
# Read the CSV into a DataFrame and display it
data_df = cal_palm.read_and_display_data()
data_df

In [8]:
variable_of_interest = 'Q4'

In [None]:
# data_df['subject'] = data_df['subject'].str[4:]
data_df

**Preprocess Your Data**

**Handle NANs**
- Set drop_nans=True if you would like to remove NaNs from data
- Provide a column name or a list of column names to remove NaNs from
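
For reference, dropping NaNs from selected columns can be sketched in plain pandas as below. This assumes `drop_nans_from_columns` behaves like `DataFrame.dropna(subset=...)`, which is not confirmed here. `toy_df` is illustrative data.

```python
import numpy as np
import pandas as pd

toy_df = pd.DataFrame({
    "subject": ["1", "2", "3", "4"],
    "Q4": [0.5, np.nan, 0.7, 0.9],
    "Age": [70, 65, np.nan, 80],
})

# Only rows with a NaN in the listed columns are removed;
# NaNs in other columns (here 'Age') are left in place.
cleaned = toy_df.dropna(subset=["Q4"])
print(cleaned)
```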

In [10]:
# data_df.columns

In [11]:
drop_list = [variable_of_interest]

In [None]:
data_df = cal_palm.drop_nans_from_columns(columns_to_drop_from=drop_list)
display(data_df)

**Drop Row Based on Value of Column**

Define the column, condition, and value for dropping rows
- column = 'your_column_name'
- condition = 'above'  # Options: 'equal', 'above', 'below', 'not'
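
A rough sketch of condition-based row dropping is shown below. `split_rows` is a hypothetical helper, not the library method. Whether `drop_rows_based_on_value` drops the matching rows (as here) or keeps them is an assumption.

```python
import pandas as pd

def split_rows(df, column, condition, value):
    """Return (kept, dropped); rows that satisfy the condition are dropped."""
    if condition == "equal":
        to_drop = df[column] == value
    elif condition == "not":
        to_drop = df[column] != value
    elif condition == "above":
        to_drop = df[column] > value
    elif condition == "below":
        to_drop = df[column] < value
    else:
        raise ValueError(f"Unknown condition: {condition!r}")
    return df[~to_drop], df[to_drop]

toy_df = pd.DataFrame({"subject": ["1", "2", "3"], "Age": [60, 70, 80]})
kept, dropped = split_rows(toy_df, column="Age", condition="above", value=65)
print(kept, dropped, sep="\n")
```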

Set the parameters for dropping rows

In [13]:
# column = 'redcap_event_name'  # The column you'd like to evaluate
# condition = 'not'  # The condition to check ('equal', 'above', 'below', 'not')
# value = '1year_arm_1' # The value to compare against

In [14]:
# data_df, other_df = cal_palm.drop_rows_based_on_value(column, condition, value)
# data_df

Regress out a Covariate

In [15]:
# lis = []
# for col in data_df.columns:
#     if 'surface' in col.lower():
#         lis.append(col)
# print(lis)

In [16]:
# from calvin_utils.statistical_utils.regression_utils import RegressOutCovariates
## use this code block to regress out covariates. Generally better to just include as covariates in a model..
# dependent_variable_list = lis
# regressors = ['Age', 'Sex']

# data_df, adjusted_dep_vars_list = RegressOutCovariates.run(df=data_df, dependent_variable_list=dependent_variable_list, covariates_list=regressors)
# print(adjusted_dep_vars_list)

**Standardize Data**
- List the columns you don't want to standardize
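
Standardizing all numeric columns except an exclusion list can be sketched as follows. `standardize` is a hypothetical helper. Using the population standard deviation (`ddof=0`) is an assumption, and `standardize_columns` may differ.

```python
import pandas as pd

def standardize(df, cols_not_to_standardize):
    """Z-score every numeric column that is not in the exclusion list."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    cols = [c for c in numeric_cols if c not in cols_not_to_standardize]
    out[cols] = (out[cols] - out[cols].mean()) / out[cols].std(ddof=0)
    return out

toy_df = pd.DataFrame({"Age": [60.0, 70.0, 80.0], "Q4": [1.0, 2.0, 3.0]})
z_df = standardize(toy_df, cols_not_to_standardize=["Age"])
print(z_df)
```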

In [17]:
## Remove anything you don't want to standardize
# cols_not_to_standardize = ['Age',  'Subiculum_Connectivity_T']

In [18]:
# data_df = cal_palm.standardize_columns(cols_not_to_standardize)
# data_df

Choose Columns to Keep
- Keep your subject column and your variable of interest (they become rows after the transpose)

In [19]:
col_to_keep_list = [variable_of_interest, 'subject']

- The final DF is EXPECTED to have subject IDs which are IDENTICAL to the subject IDs that go in the neuroimaging DF column names above
- There should be only 1 variable row

|        |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 |  10 | ... |  40 |  41 |  42 |  43 |  45 |  46 |  47 |  48 |  49 |  50 |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|------------|-----|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
| Indep. Var.    | 3          | 4         | 7         | 2         | 2         | 2         | 9         | 4         | 7         | 5          | ... | 5           | 2           | 7           | 7           | 3           | 8           | 8           | 1           | 1           | 3           |

In [None]:
data_df=data_df.loc[:, col_to_keep_list]
data_df = data_df.T
data_df.columns = data_df.loc['subject', :]
data_df = data_df.drop('subject')
data_df.dropna(inplace=True, axis=1)
data_df

# Get Dataset Two

Import Niftis
- These are EXPECTED to have subject IDs which are IDENTICAL to the subject IDs that go in the covariate DF column names below
- Column labels are subject IDs. 
- This is expected to ultimately have the form:

|        |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 |  10 | ... |  40 |  41 |  42 |  43 |  45 |  46 |  47 |  48 |  49 |  50 |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|------------|-----|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
| Voxel 1     | 3          | 4         | 7         | 2         | 2         | 2         | 9         | 4         | 7         | 5          | ... | 5           | 2           | 7           | 7           | 3           | 8           | 8           | 1           | 1           | 3           |
| . . .      | ...         | ...        | ...         | ...         | ...         | ...         | ...         | ...         | ...         | ...          | ... | ...           | ...           | ...           | ...           | ...           | ...           | ...           | ...           | ...           | ...           |
| Voxel N     | 2          | 1         | 0         | 1         | 3         | 4         | 9         | 5         | 8         | 6          | ... | 6           | 3           | 8           | 8           | 4           | 9           | 9           | 2           | 2           | 4           |

In [21]:
import_path2 = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/atrophy_seeds_2023/shared_analysis/niftis_for_elmira/unsmoothed_atrophy_seeds_v2'
file_target2 = '*/*/unthresholded_tissue_segment_z_scores/*_grey_matter.nii'

In [None]:
from calvin_utils.file_utils.import_functions import GiiNiiFileImport
giinii2 = GiiNiiFileImport(import_path=import_path2, file_column=None, file_pattern=file_target2)
nimg_df2 = giinii2.run()
nimg_df2

Fix names

In [23]:
pre2 = 'sub-'
post2 = '_grey'

In [None]:
nimg_df2 = GiiNiiFileImport.splice_colnames(nimg_df2, pre2, post2)
nimg_df2

Import Covariates

**The CSV is expected to be in this format**
- sub column contents MUST match the names of the neuroimaging files above. 
    - ID column 
```
+-----+----------------------------+--------------+--------------+--------------+
| sub | Nifti_File_Path            | Covariate_1  | Covariate_2  | Covariate_3  |
+-----+----------------------------+--------------+--------------+--------------+
| 1   | /path/to/file1.nii.gz      | 0.5          | 1.2          | 3.4          |
| 2   | /path/to/file2.nii.gz      | 0.7          | 1.4          | 3.1          |
| 3   | /path/to/file3.nii.gz      | 0.6          | 1.5          | 3.5          |
| 4   | /path/to/file4.nii.gz      | 0.9          | 1.1          | 3.2          |
| ... | ...                        | ...          | ...          | ...          |
+-----+----------------------------+--------------+--------------+--------------+
```

In [25]:
# Specify the path to your CSV file containing NIFTI paths
input_csv_path2 = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/atrophy_seeds_2023/metadata/atrophy_roi_scores/master_list_w_only_unthresholded.csv'
sheet2 = None

In [26]:
from calvin_utils.permutation_analysis_utils.statsmodels_palm import CalvinStatsmodelsPalm
# Instantiate the CalvinStatsmodelsPalm class
cal_palm2 = CalvinStatsmodelsPalm(input_csv_path=input_csv_path2, output_dir=out_dir, sheet=sheet2)
# Read the CSV into a DataFrame and display it
data_df2 = cal_palm2.read_and_display_data()

**Preprocess Your Data**

**Handle NANs**
- Set drop_nans=True if you would like to remove NaNs from data
- Provide a column name or a list of column names to remove NaNs from

In [27]:
# data_df2.columns

In [28]:
drop_list2 = ['Q4']

In [None]:
data_df2 = cal_palm2.drop_nans_from_columns(columns_to_drop_from=drop_list2)
display(data_df2)

**Drop Row Based on Value of Column**

Define the column, condition, and value for dropping rows
- column = 'your_column_name'
- condition = 'above'  # Options: 'equal', 'above', 'below', 'not'

Set the parameters for dropping rows

In [30]:
# column2 = 'City'  # The column you'd like to evaluate
# condition2 = 'not'  # The condition to check ('equal', 'above', 'below', 'not')
# value2 = 'Toronto' # The value to compare against

In [31]:
# data_df2, other_df2 = cal_palm2.drop_rows_based_on_value(column2, condition2, value2)
# data_df2

Regress out a Covariate

In [32]:
# lis = []
# for col in data_df2.columns:
#     if 'surface' in col.lower():
#         lis.append(col)
# print(lis)

In [33]:
from calvin_utils.statistical_utils.regression_utils import RegressOutCovariates
## use this code block to regress out covariates. Generally better to just include as covariates in a model..
# dependent_variable_list2 = lis
# regressors2 = ['Age', 'Sex']

# data_df2, adjusted_dep_vars_list2 = RegressOutCovariates.run(df=data_df2, dependent_variable_list=dependent_variable_list2, covariates_list=regressors2)
# print(adjusted_dep_vars_list2)

**Standardize Data**
- List the columns you don't want to standardize

In [34]:
## Remove anything you don't want to standardize
# cols_not_to_standardize2 = ['Age',  'Subiculum_Connectivity_T']

In [35]:
# data_df2 = cal_palm2.standardize_columns(cols_not_to_standardize2)
# data_df2

Choose Columns to Keep
- Keep your subject column and your variable of interest (they become rows after the transpose)

In [36]:
col_to_keep_list2 = ['Q4', 'subject']

- The final DF is EXPECTED to have subject IDs which are IDENTICAL to the subject IDs that go in the neuroimaging DF column names above
- There should be only 1 variable row

|        |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 |  10 | ... |  40 |  41 |  42 |  43 |  45 |  46 |  47 |  48 |  49 |  50 |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|------------|-----|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
| Indep. Var.    | 3          | 4         | 7         | 2         | 2         | 2         | 9         | 4         | 7         | 5          | ... | 5           | 2           | 7           | 7           | 3           | 8           | 8           | 1           | 1           | 3           |

In [None]:
data_df2=data_df2.loc[:, col_to_keep_list2]
data_df2 = data_df2.T
data_df2.columns = data_df2.loc['subject']
data_df2 = data_df2.drop('subject')
data_df2.dropna(inplace=True, axis=1)
data_df2

# Define an Already Existing Map to Compare Similarity To
- if not using, set to None

In [38]:
map_path = None

# Test 2 Maps

# Prepare the Arguments for Permutation Testing

Is there a particular mask you want to use?
- MUST match the resolution of voxelwise data being analyzed. 
- If you set None, the voxelwise data will be used for thresholding. 
    - Values below mask_threshold (float) will be set to 0. 
- Warning: bad masking may result in failed experiments. Erroneous voxels outside the brain will influence the correction. 
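
To make the thresholding rule concrete, here is a small NumPy sketch on flattened toy data. `apply_mask` is a hypothetical helper. Whether the wrapper uses `>` or `>=` against `mask_threshold` is an assumption.

```python
import numpy as np

def apply_mask(voxel_data, mask_values, mask_threshold=0.0):
    """Zero out voxels whose mask value does not exceed the threshold."""
    keep = np.asarray(mask_values, dtype=float) > mask_threshold
    masked = np.where(keep[:, None], voxel_data, 0.0)
    return masked, keep

# Toy data: 4 voxels (rows) x 3 subjects (columns).
voxel_data = np.arange(12, dtype=float).reshape(4, 3) + 1
mask_values = np.array([0.0, 1.0, 0.5, 0.0])  # flattened toy mask

masked_data, keep = apply_mask(voxel_data, mask_values, mask_threshold=0)
print(masked_data)
```

Voxels outside the mask contribute nothing to the maps, which is why a mask that leaks outside the brain can distort the correction.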

In [39]:
mask_path = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/resources/atlases/memory/hippocampus_2mm.nii'
mask_threshold = 0

Correlation method
- spearman or pearson
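
An R-map is one correlation per voxel, computed across subjects. The sketch below shows both methods on toy data. `r_map` is a hypothetical helper. Spearman is computed as Pearson on ranks, which is the standard definition, but the wrapper's internals are not confirmed here.

```python
import numpy as np
import pandas as pd

def r_map(voxels, variable, method="spearman"):
    """Correlate each voxel (row) with the variable across subjects (columns)."""
    voxels = pd.DataFrame(voxels)
    variable = pd.Series(variable)
    if method == "spearman":
        # Spearman correlation is Pearson correlation on ranks.
        voxels = voxels.rank(axis=1)
        variable = variable.rank()
    elif method != "pearson":
        raise ValueError("method must be 'spearman' or 'pearson'")
    x = voxels.to_numpy(dtype=float)
    y = variable.to_numpy(dtype=float)
    x = x - x.mean(axis=1, keepdims=True)
    y = y - y.mean()
    return (x @ y) / (np.sqrt((x ** 2).sum(axis=1)) * np.sqrt((y ** 2).sum()))

variable = [1, 2, 3, 4, 5]
voxels = [[1, 4, 9, 16, 25],   # monotonic but nonlinear in the variable
          [5, 4, 3, 2, 1]]     # perfectly anti-correlated
spearman_map = r_map(voxels, variable, method="spearman")
pearson_map = r_map(voxels, variable, method="pearson")
```

The first toy voxel shows the difference: its Spearman value is exactly 1 because the relationship is monotonic, while its Pearson value is below 1 because it is not linear.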

In [40]:
method = 'spearman'

Choose Max Stat Correction Method
- None | pseudo_var_smooth | var_smooth
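
For orientation, a plain maximum-statistic correction can be sketched as below. It does not reproduce the `pseudo_var_smooth` or `var_smooth` options, whose details are not described here. `corr_map` and `max_stat_fwe` are hypothetical helpers, and the data are simulated.

```python
import numpy as np

def corr_map(voxels, variable):
    """Pearson r between each voxel (row) and the variable across subjects."""
    x = voxels - voxels.mean(axis=1, keepdims=True)
    y = variable - variable.mean()
    return (x @ y) / (np.sqrt((x ** 2).sum(axis=1)) * np.sqrt((y ** 2).sum()))

def max_stat_fwe(voxels, variable, n_permutations=200, seed=0):
    """FWE-corrected p-values from the permutation distribution of max |r|."""
    rng = np.random.default_rng(seed)
    observed = corr_map(voxels, variable)
    max_null = np.empty(n_permutations)
    for i in range(n_permutations):
        shuffled = rng.permutation(variable)  # break the voxel-variable link
        max_null[i] = np.abs(corr_map(voxels, shuffled)).max()
    # Count the permutation maxima that reach each voxel's observed |r|.
    exceed = (max_null[None, :] >= np.abs(observed)[:, None]).sum(axis=1)
    p_fwe = (1 + exceed) / (1 + n_permutations)
    return observed, p_fwe

# Simulated data: 20 voxels x 30 subjects, with a true effect in voxel 0 only.
rng = np.random.default_rng(1)
variable = rng.normal(size=30)
voxels = rng.normal(size=(20, 30))
voxels[0] = variable + 0.1 * rng.normal(size=30)

observed, p_fwe = max_stat_fwe(voxels, variable)
```

Comparing each voxel against the distribution of the map-wide maximum is what controls the family-wise error rate across voxels.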

In [41]:
max_stat_method = 'pseudo_var_smooth'

ROI to analyze within

In [42]:
roi_path = None
roi_threshold = 0

Initialize the Permutation testing Class

In [43]:
from calvin_utils.permutation_analysis_utils.correlation_fwe_comparison import CalvinFWEWrapper
wrapper = CalvinFWEWrapper(neuroimaging_dataframe1=nimg_df, 
                           variable_dataframe1=data_df, 
                           neuroimaging_dataframe2=nimg_df2, 
                           variable_dataframe2=data_df2, 
                           mask_threshold=mask_threshold, 
                           mask_path=mask_path, 
                           out_dir=out_dir, 
                           method=method, 
                           max_stat_method=max_stat_method,
                           roi_path=roi_path, roi_threshold=roi_threshold,
                           map_path=map_path, use_spearman=True,
                           two_tail=True)

Analyze the Similarity of the 2 maps

In [None]:
# Run the map-similarity analysis; n_permutations=10 is only suitable for a quick test run
observed_correlation, permuted_correlations = wrapper.run_pearson_analysis(n_permutations=10)
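
One way to think about this step: build an R-map per dataset, correlate the two maps spatially, then rebuild both maps under shuffled variables to get a null distribution. The sketch below follows that idea on simulated data. The permutation scheme and all helper names are assumptions, not the `CalvinFWEWrapper` implementation.

```python
import numpy as np

def corr_map(voxels, variable):
    """Pearson r between each voxel (row) and the variable across subjects."""
    x = voxels - voxels.mean(axis=1, keepdims=True)
    y = variable - variable.mean()
    return (x @ y) / (np.sqrt((x ** 2).sum(axis=1)) * np.sqrt((y ** 2).sum()))

def map_similarity(vox1, y1, vox2, y2):
    """Spatial correlation between the two R-maps."""
    return np.corrcoef(corr_map(vox1, y1), corr_map(vox2, y2))[0, 1]

def permute_similarity(vox1, y1, vox2, y2, n_permutations=100, seed=0):
    rng = np.random.default_rng(seed)
    observed = map_similarity(vox1, y1, vox2, y2)
    # Null: shuffle each variable independently and rebuild both maps.
    null = np.array([
        map_similarity(vox1, rng.permutation(y1), vox2, rng.permutation(y2))
        for _ in range(n_permutations)
    ])
    return observed, null

# Simulated data: both datasets share one spatial pattern (50 voxels, 40 subjects).
rng = np.random.default_rng(2)
pattern = rng.normal(size=50)
y1, y2 = rng.normal(size=40), rng.normal(size=40)
vox1 = np.outer(pattern, y1) + rng.normal(size=(50, 40))
vox2 = np.outer(pattern, y2) + rng.normal(size=(50, 40))

observed_similarity, null_similarity = permute_similarity(vox1, y1, vox2, y2)
```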

Analyze the Distance between the Peaks of the Two Maps

In [45]:
# Running peak voxel finding analysis with ROI mask
# observed_peak_distance, permuted_peak_distances = wrapper.run_peak_voxel_analysis(n_permutations=1000)
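
The peak-distance statistic can be illustrated on toy volumes. The sketch assumes the peak is the voxel with the largest absolute value and that voxels are 2 mm isotropic (matching the 2 mm mask above). Both are assumptions about `run_peak_voxel_analysis`, and the helper names are hypothetical.

```python
import numpy as np

def peak_coordinate(volume):
    """Voxel index (i, j, k) of the largest absolute value in the volume."""
    return np.array(np.unravel_index(np.argmax(np.abs(volume)), volume.shape))

def peak_distance(vol1, vol2, voxel_size_mm=2.0):
    """Euclidean distance in mm between the peaks of two volumes."""
    delta = (peak_coordinate(vol1) - peak_coordinate(vol2)) * voxel_size_mm
    return float(np.linalg.norm(delta))

# Toy volumes with a single peak each.
vol1 = np.zeros((6, 6, 6))
vol2 = np.zeros((6, 6, 6))
vol1[1, 1, 1] = 0.9
vol2[4, 5, 1] = 0.8

distance_mm = peak_distance(vol1, vol2)
print(distance_mm)
```

The peaks are 3 and 4 voxels apart along two axes, so the distance is 5 voxels, or 10 mm at 2 mm resolution.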

Analyze the Magnitude between the Peaks of the Two Maps

In [None]:
# Running peak correlation analysis: compares the magnitudes of the two maps' peaks
observed_peak_delta, permuted_peak_deltas = wrapper.run_peak_corr_analysis(n_permutations=1000)

Bootstrap the Magnitude Between the Peaks of the Two maps

In [None]:
# observed_peak_difference, bootstrapped_dist = wrapper.bootstrap_peak_corr(n_permutations=1000)

In [58]:
import pandas as pd
csf_wm = pd.DataFrame(permuted_peak_deltas)

In [None]:
import seaborn as sns
sns.violinplot(csf_wm + 0.5)

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Data
dvals = {'CSF': [0.57], 
         'GM': [0.29],
         'WM': [0.24],
         'CTh': [0.24]}
df = pd.DataFrame(dvals).T.reset_index()
df.columns = ['Region', 'Value']

# Plot
plt.figure(figsize=(10, 5))  # Adjust width and height for the desired length
sns.barplot(data=df, x='Value', y='Region', palette='tab10')
plt.xlim(0, 0.6)  # Set x-axis range
plt.xlabel('Correlation of Connectivity to Cognitive Outcomes')
plt.ylabel('Brain Region')
plt.title('Correlation of Brain Regions to Cognitive Improvement')
plt.grid(False)
plt.savefig('/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/atrophy_seeds_2023/Figures/correlation_to_memory/barplot.svg')

plt.show()


CSF vs GM: Observed: [[0.00518039]], p-value [0.054], using 2-tail: True.

CSF vs WM: Observed: [[0.23151403]], p-value [0.994], using 2-tail: True.

CSF vs CTh: Observed: [[0.2300000]], p-value [0.992], using 2-tail: True.
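
For reading the p-values above, a permutation p-value can be computed from an observed statistic and its null distribution as sketched below. `permutation_p_value` is a hypothetical helper. The `+1` correction in numerator and denominator is a common convention and an assumption here, and the wrapper may compute it differently.

```python
import numpy as np

def permutation_p_value(observed, null, two_tail=True):
    """Fraction of null statistics at least as extreme as the observed one."""
    null = np.asarray(null, dtype=float).ravel()
    if two_tail:
        exceed = np.abs(null) >= abs(observed)
    else:
        exceed = null >= observed
    # The +1 counts the observed statistic as one of the permutations.
    return (1 + exceed.sum()) / (1 + null.size)

# Toy null distribution (illustrative values only).
toy_null = [-0.3, -0.1, 0.0, 0.2, 0.4]
p_two_tail = permutation_p_value(0.25, toy_null, two_tail=True)
p_one_tail = permutation_p_value(0.25, toy_null, two_tail=False)
```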