# Run Any Kind of OLS Regression (ANOVA, GLM, etc.)

### Authors: Calvin Howard.

#### Last updated: February 1, 2025

Use this to run/test a statistical model (e.g., regression or T-tests) on a spreadsheet.

Notes:
- To best use this notebook, you should be familiar with GLM design and contrast matrix design. See this webpage to get started:
[FSL's GLM page](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/GLM)

Prepare output directory

In [2]:
# Specify where you want to save your results to
out_dir = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/ccm_memory/results/notebook_05'

# 00 - Import CSV with All Data
**The CSV is expected to be in this format**
- The ID column and absolute paths to the NIfTI files are critical
```
+-----+----------------------------+--------------+--------------+--------------+
| ID  | Nifti_File_Path            | Covariate_1  | Covariate_2  | Covariate_3  |
+-----+----------------------------+--------------+--------------+--------------+
| 1   | /path/to/file1.nii.gz      | 0.5          | 1.2          | 3.4          |
| 2   | /path/to/file2.nii.gz      | 0.7          | 1.4          | 3.1          |
| 3   | /path/to/file3.nii.gz      | 0.6          | 1.5          | 3.5          |
| 4   | /path/to/file4.nii.gz      | 0.9          | 1.1          | 3.2          |
| ... | ...                        | ...          | ...          | ...          |
+-----+----------------------------+--------------+--------------+--------------+
```
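Before loading, it can help to check the table against the format above. The helper below is hypothetical (it is not part of `calvin_utils`) and is only a minimal sketch. It checks that the ID and path columns exist, that IDs are unique, and that the paths are absolute. The POSIX-style absolute-path check is an assumption. If your table names the ID column differently (for example `Subject`), pass `id_col` accordingly.

```python
import os
import pandas as pd


def check_input_table(df, id_col="ID", path_col="Nifti_File_Path"):
    """Return a list of problems found in the input table (empty if none).

    Hypothetical helper for illustration; not part of calvin_utils.
    """
    missing = [col for col in (id_col, path_col) if col not in df.columns]
    if missing:
        # Without the key columns there is nothing further to check.
        return [f"missing required column: {col}" for col in missing]
    problems = []
    if df[id_col].duplicated().any():
        problems.append(f"duplicate values in {id_col}")
    # POSIX-style check: absolute paths start at the filesystem root.
    relative = [p for p in df[path_col].dropna() if not os.path.isabs(str(p))]
    if relative:
        problems.append(f"{len(relative)} non-absolute path(s) in {path_col}")
    return problems


example = pd.DataFrame({
    "ID": [1, 2, 3],
    "Nifti_File_Path": ["/path/to/file1.nii.gz", "file2.nii.gz", "/path/to/file3.nii.gz"],
    "Covariate_1": [0.5, 0.7, 0.6],
})
print(check_input_table(example))
```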

Import Data

In [3]:
# Specify the path to your CSV file containing NIfTI paths
input_csv_path = '/Users/cu135/Partners HealthCare Dropbox/Calvin Howard/studies/ccm_memory/results/notebook_00/master_list_working_v3.csv'
sheet = None

In [4]:
from calvin_utils.permutation_analysis_utils.statsmodels_palm import CalvinStatsmodelsPalm
# Instantiate the CalvinStatsmodelsPalm class
cal_palm = CalvinStatsmodelsPalm(input_csv_path=input_csv_path, output_dir=out_dir, sheet=sheet)
# Read the input file into a DataFrame
data_df = cal_palm.read_data()
data_df

Unnamed: 0,Dataset,Subject,Roi_File_Path,Nifti_File_Path,age,sex,diagnosis,overall_cognition,lesion_size,Unnamed__9,memory_higher_is_better_for_stim_but_high_is_worse_for_lesion,memory_higher_is_better,var_1yr_memory,Memory_Measure,Cause_of_Change,Group,Reference,Localization_Approach
0,adni_Normal,002uSu0295,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,84.898630,,Normal,3.00,,,-90.000000,90.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
1,adni_Normal,002uSu0413,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,76.397260,,Normal,3.33,,,-90.000000,90.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
2,adni_Normal,002uSu0559,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,79.410959,,Normal,6.00,,,-80.000000,80.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
3,adni_Alzheimer,002uSu0619,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,77.512329,,Alzheimer,19.33,,,0.000000,0.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
4,adni_Normal,002uSu0685,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,89.698630,,Normal,3.67,,,-70.000000,70.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3467,wang_tms,CNDR014,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,,,Normal,,,,-0.142857,-0.142857,,,,,,
3468,wang_tms,CNDR019,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,,,Normal,,,,0.200000,0.200000,,,,,,
3469,wang_tms,CNDR020,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,,,Normal,,,,0.250000,0.250000,,,,,,
3470,wang_tms,CNDR021,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,,,Normal,,,,0.000000,0.000000,,,,,,


# 01 - Preprocess Your Data

**Handle NANs**
- To remove rows containing NaNs, list the columns to check in `drop_list` and pass it to `drop_nans_from_columns`
- You can provide a single column name or a list of column names; rows with NaNs in those columns are removed
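For reference, a plain-pandas equivalent is `DataFrame.dropna(subset=...)`. The sketch below assumes that `drop_nans_from_columns` drops a row when any listed column is NaN. That is an assumption about the helper, not confirmed by this notebook. The wrapper name `drop_rows_with_nans` is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_rows_with_nans(df, columns):
    """Drop rows with a NaN in any of `columns` (hypothetical wrapper)."""
    if isinstance(columns, str):
        columns = [columns]  # accept a single column name too
    return df.dropna(subset=columns)


df = pd.DataFrame({
    "diagnosis": ["Normal", None, "Alzheimer"],
    "memory_higher_is_better": [90.0, 80.0, np.nan],
    "lesion_size": [np.nan, np.nan, np.nan],  # not listed, so it is ignored
})
cleaned = drop_rows_with_nans(df, ["diagnosis", "memory_higher_is_better"])
print(cleaned)
```

Columns that are not listed (here `lesion_size`) can stay entirely NaN without removing any rows.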

In [5]:
data_df.columns

Index(['Dataset', 'Subject', 'Roi_File_Path', 'Nifti_File_Path', 'age', 'sex',
       'diagnosis', 'overall_cognition', 'lesion_size', 'Unnamed__9',
       'memory_higher_is_better_for_stim_but_high_is_worse_for_lesion',
       'memory_higher_is_better', 'var_1yr_memory', 'Memory_Measure',
       'Cause_of_Change', 'Group', 'Reference', 'Localization_Approach'],
      dtype='object')

In [6]:
drop_list = ['Roi_File_Path', 'diagnosis', 'memory_higher_is_better']

In [7]:
data_df = cal_palm.drop_nans_from_columns(columns_to_drop_from=drop_list)
display(data_df)

Unnamed: 0,Dataset,Subject,Roi_File_Path,Nifti_File_Path,age,sex,diagnosis,overall_cognition,lesion_size,Unnamed__9,memory_higher_is_better_for_stim_but_high_is_worse_for_lesion,memory_higher_is_better,var_1yr_memory,Memory_Measure,Cause_of_Change,Group,Reference,Localization_Approach
0,adni_Normal,002uSu0295,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,84.898630,,Normal,3.00,,,-90.000000,90.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
1,adni_Normal,002uSu0413,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,76.397260,,Normal,3.33,,,-90.000000,90.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
2,adni_Normal,002uSu0559,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,79.410959,,Normal,6.00,,,-80.000000,80.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
3,adni_Alzheimer,002uSu0619,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,77.512329,,Alzheimer,19.33,,,0.000000,0.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
4,adni_Normal,002uSu0685,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,89.698630,,Normal,3.67,,,-70.000000,70.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3467,wang_tms,CNDR014,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,,,Normal,,,,-0.142857,-0.142857,,,,,,
3468,wang_tms,CNDR019,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,,,Normal,,,,0.200000,0.200000,,,,,,
3469,wang_tms,CNDR020,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,,,Normal,,,,0.250000,0.250000,,,,,,
3470,wang_tms,CNDR021,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,/Volumes/Expansion/datasets/VOSS_STUDIES_TMS/W...,,,Normal,,,,0.000000,0.000000,,,,,,


**Drop Row Based on Value of Column**

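Define the column, condition, and value for dropping rows
- column = 'your_column_name'
- condition = 'above'  # Options: 'equal', 'above', 'below', 'not'
- value = the value to compare against (with 'not', rows whose value differs from it are dropped)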
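The four conditions can be read as "drop the rows where `column <condition> value` holds". The function below is a hypothetical re-implementation for illustration only. The real `cal_palm.drop_rows_based_on_value` may differ in details, and treating `other_df` as the dropped rows is an assumption. The `'not'` case matches this notebook's own run: `condition='not'` with `value='Alzheimer'` keeps only the Alzheimer rows.

```python
import pandas as pd


def drop_rows_by_condition(df, column, condition, value):
    """Return (kept, dropped) DataFrames. Hypothetical sketch."""
    col = df[column]
    if condition == "equal":
        to_drop = col == value
    elif condition == "above":
        to_drop = col > value
    elif condition == "below":
        to_drop = col < value
    elif condition == "not":
        to_drop = col != value  # drop everything that does NOT match value
    else:
        raise ValueError(f"unknown condition: {condition!r}")
    return df[~to_drop], df[to_drop]


df = pd.DataFrame({"diagnosis": ["Normal", "Alzheimer", "Normal", "Alzheimer"],
                   "age": [84.9, 77.5, 79.4, 65.2]})
kept, dropped = drop_rows_by_condition(df, "diagnosis", "not", "Alzheimer")
print(kept)
```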

In [8]:
data_df.columns

Index(['Dataset', 'Subject', 'Roi_File_Path', 'Nifti_File_Path', 'age', 'sex',
       'diagnosis', 'overall_cognition', 'lesion_size', 'Unnamed__9',
       'memory_higher_is_better_for_stim_but_high_is_worse_for_lesion',
       'memory_higher_is_better', 'var_1yr_memory', 'Memory_Measure',
       'Cause_of_Change', 'Group', 'Reference', 'Localization_Approach'],
      dtype='object')

Set the parameters for dropping rows

In [9]:
column = 'diagnosis'  # The column you'd like to evaluate
condition = 'not'  # The condition to check ('equal', 'above', 'below', 'not')
value = 'Alzheimer' # The value to compare against (with 'not', rows that are NOT 'Alzheimer' are dropped)

In [10]:
data_df, other_df = cal_palm.drop_rows_based_on_value(column, condition, value)
display(data_df)

Unnamed: 0,Dataset,Subject,Roi_File_Path,Nifti_File_Path,age,sex,diagnosis,overall_cognition,lesion_size,Unnamed__9,memory_higher_is_better_for_stim_but_high_is_worse_for_lesion,memory_higher_is_better,var_1yr_memory,Memory_Measure,Cause_of_Change,Group,Reference,Localization_Approach
3,adni_Alzheimer,002uSu0619,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,77.512329,,Alzheimer,19.33,,,0.000000,0.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
5,adni_Alzheimer,002uSu0729,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,65.224658,,Alzheimer,6.67,,,-20.000000,20.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
7,adni_Alzheimer,002uSu0816,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,70.838356,F,Alzheimer,16.00,,,-20.000000,20.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
8,adni_Alzheimer,002uSu0938,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,82.276712,F,Alzheimer,21.67,,,0.000000,0.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
9,adni_Alzheimer,002uSu0954,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,69.452055,,Alzheimer,10.67,,,-10.000000,10.000000,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3385,fornix_memory,146,/Volumes/Expansion/datasets/AD_dataset/derivat...,/Volumes/Expansion/datasets/AD_dataset/derivat...,76.000000,,Alzheimer,,,-54.545455,-54.545455,-54.545455,,,,,,
3386,fornix_memory,147,/Volumes/Expansion/datasets/AD_dataset/derivat...,/Volumes/Expansion/datasets/AD_dataset/derivat...,59.000000,,Alzheimer,,,-4.761905,-4.761905,-4.761905,,,,,,
3387,fornix_memory,148,/Volumes/Expansion/datasets/AD_dataset/derivat...,/Volumes/Expansion/datasets/AD_dataset/derivat...,51.000000,,Alzheimer,,,-207.692308,-207.692308,-207.692308,,,,,,
3388,fornix_memory,149,/Volumes/Expansion/datasets/AD_dataset/derivat...,/Volumes/Expansion/datasets/AD_dataset/derivat...,77.000000,,Alzheimer,,,-90.000000,-90.000000,-90.000000,,,,,,


**Standardize Data**
- Enter the columns you don't want to standardize into a list
- group_col is the column containing a category for each dataset; standardization is performed separately within each group.
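Within-group standardization is a z-score computed per group: subtract the group mean and divide by the group standard deviation. A minimal pandas sketch follows. The function name is hypothetical, and `cal_palm.standardize_columns` may differ in details such as the `ddof` it uses. Non-numeric columns are skipped, which is why messages like "Unable to standardize column ..." appear for text columns. A group with a single row has an undefined standard deviation and yields NaN.

```python
import pandas as pd


def standardize_within_groups(df, group_col, exclude=()):
    """Z-score every numeric column within each level of group_col.

    Hypothetical sketch; cal_palm.standardize_columns may differ in details.
    """
    out = df.copy()
    for col in df.columns:
        if col == group_col or col in exclude:
            continue
        if not pd.api.types.is_numeric_dtype(df[col]):
            print(f"Skipping non-numeric column {col}.")
            continue
        grouped = df.groupby(group_col)[col]
        # (value - group mean) / group standard deviation (pandas default ddof=1)
        out[col] = (df[col] - grouped.transform("mean")) / grouped.transform("std")
    return out


example = pd.DataFrame({
    "Dataset": ["A", "A", "A", "B", "B", "B"],
    "Subject": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "diagnosis": ["AD"] * 6,
    "score": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})
z = standardize_within_groups(example, group_col="Dataset", exclude=["Subject"])
print(z)
```

Both datasets end up on the same scale even though their raw scores differ by a factor of ten.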

In [11]:
# List the columns you don't want to standardize
cols_not_to_standardize = ['Roi_File_Path', 'Subject']
group_col = 'Dataset'

In [12]:
data_df = cal_palm.standardize_columns(cols_not_to_standardize, group_col=group_col)
data_df

Unable to standardize column Nifti_File_Path.
Unable to standardize column sex.
Unable to standardize column diagnosis.
Unable to standardize column Memory_Measure.
Unable to standardize column Cause_of_Change.
Unable to standardize column Group.
Unable to standardize column Reference.
Unable to standardize column Localization_Approach.
Unable to standardize column Nifti_File_Path.
Unable to standardize column diagnosis.


Unnamed: 0,Dataset,Subject,Roi_File_Path,Nifti_File_Path,age,sex,diagnosis,overall_cognition,lesion_size,Unnamed__9,memory_higher_is_better_for_stim_but_high_is_worse_for_lesion,memory_higher_is_better,var_1yr_memory,Memory_Measure,Cause_of_Change,Group,Reference,Localization_Approach
3,adni_Alzheimer,002uSu0619,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,0.250576,,Alzheimer,0.705993,,,1.115801,-1.115801,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
5,adni_Alzheimer,002uSu0729,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,-1.245588,,Alzheimer,-1.282537,,,0.250839,-0.250839,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
7,adni_Alzheimer,002uSu0816,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,-0.562056,F,Alzheimer,0.182944,,,0.250839,-0.250839,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
8,adni_Alzheimer,002uSu0938,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,0.830694,F,Alzheimer,1.073542,,,1.115801,-1.115801,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
9,adni_Alzheimer,002uSu0954,/Volumes/Expansion/datasets/adni/neuroimaging/...,/Volumes/Expansion/datasets/adni/neuroimaging/...,-0.730854,,Alzheimer,-0.654249,,,0.683320,-0.683320,,ADAS Cog 11 Question 4,Atrophy,Lesion,Petersen et al. 2010,MRI
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3385,fornix_memory,146,/Volumes/Expansion/datasets/AD_dataset/derivat...,/Volumes/Expansion/datasets/AD_dataset/derivat...,1.151077,,Alzheimer,,,-0.355183,-0.355183,-0.355183,,,,,,
3386,fornix_memory,147,/Volumes/Expansion/datasets/AD_dataset/derivat...,/Volumes/Expansion/datasets/AD_dataset/derivat...,-0.997234,,Alzheimer,,,0.656095,0.656095,0.656095,,,,,,
3387,fornix_memory,148,/Volumes/Expansion/datasets/AD_dataset/derivat...,/Volumes/Expansion/datasets/AD_dataset/derivat...,-2.008204,,Alzheimer,,,-3.466132,-3.466132,-3.466132,,,,,,
3388,fornix_memory,149,/Volumes/Expansion/datasets/AD_dataset/derivat...,/Volumes/Expansion/datasets/AD_dataset/derivat...,1.277448,,Alzheimer,,,-1.075389,-1.075389,-1.075389,,,,,,


# 02 - Define Your Formula

**Critical: the dependent (y) variable should always be the column with the neuroimaging files in it**

This is the formula relating the outcome to the predictors, and takes the form:
- $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_N x_N + \varepsilon$

It is defined using the columns of your dataframe instead of the variables above:
- 'Apples_Picked ~ hours_worked + owns_apple_picking_machine'
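In matrix form the model is $y = X\beta + \varepsilon$. Here $X$ holds an intercept column plus one column per predictor, and ordinary least squares picks the $\beta$ that minimizes $\lVert y - X\beta \rVert^2$. The NumPy sketch below fits the apple-picking example on simulated data. The data and true coefficients are made up for illustration. The same call fits many outcome columns at once, one per voxel in a voxelwise analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
hours_worked = rng.uniform(1, 10, n)
owns_machine = rng.integers(0, 2, n).astype(float)

# Design matrix: intercept column plus one column per predictor.
X = np.column_stack([np.ones(n), hours_worked, owns_machine])
true_beta = np.array([2.0, 3.0, 5.0])            # simulated, for illustration
apples_picked = X @ true_beta + rng.normal(0, 0.5, n)

# Ordinary least squares: beta_hat minimizes ||y - X beta||^2
beta_hat, *_ = np.linalg.lstsq(X, apples_picked, rcond=None)
print(beta_hat.round(2))

# The same call fits several outcomes at once (e.g., one column per voxel).
Y = np.column_stack([apples_picked, 2 * apples_picked])
B, *_ = np.linalg.lstsq(X, Y, rcond=None)        # shape (3, 2): one beta vector per column
```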

____
**ANOVA**
- Tests differences in means for one categorical variable.
- formula = 'Outcome ~ C(Group1)'

**2-Way ANOVA**
- Tests differences in means for two categorical variables without interaction.
- formula = 'Outcome ~ C(Group1) + C(Group2)'

**2-Way ANOVA with Interaction**
- Tests for interaction effects between two categorical variables.
- formula = 'Outcome ~ C(Group1) * C(Group2)'

**ANCOVA**
- Similar to ANOVA, but includes a covariate to control for its effect.
- formula = 'Outcome ~ C(Group1) + Covariate'

**2-Way ANCOVA**
- Extends ANCOVA with two categorical variables and their interaction, controlling for a covariate.
- formula = 'Outcome ~ C(Group1) * C(Group2) + Covariate'

**Multiple Regression**
- Assesses the impact of multiple predictors on an outcome.
- formula = 'Outcome ~ Predictor1 + Predictor2'

**Simple Linear Regression**
- Assesses the impact of a single predictor on an outcome.
- formula = 'Outcome ~ Predictor'

**MANOVA**
- Assesses multiple dependent variables across groups.
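- Note: this is not an OLS formula of the kind above. statsmodels handles it separately through `statsmodels.multivariate.manova.MANOVA` (e.g., `MANOVA.from_formula('Y1 + Y2 ~ C(Group1)', data=df)`).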

____
Use the printout below to design your formula. 
- Left of the "~" symbol is the thing to be predicted. 
- Right of the "~" symbol are the predictors. 
- ":" indicates an interaction between two things. 
- "*" indicates an interaction AND includes the main effects too: `a*b` is shorthand for `a + b + a:b`. 
- "+" indicates that you want to add another predictor. 
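To see what these operators produce, the sketch below builds the columns of `Outcome ~ C(Group1) * C(Group2) + Covariate` by hand with treatment (dummy) coding. In this coding the first level of each factor is the reference and gets no column. The `:` interaction column is the element-wise product of the two dummy columns. The column names mimic the `Name[T.level]` style seen in the design-matrix printout below. The toy data are made up, and this is a sketch of the coding idea, not of how `define_design_matrix` is implemented.

```python
import pandas as pd

df = pd.DataFrame({
    "Group1": ["a", "a", "b", "b", "a", "b"],
    "Group2": ["x", "y", "x", "y", "y", "x"],
    "Covariate": [0.1, 0.4, 0.3, 0.9, 0.5, 0.2],
})

# Treatment coding: the reference levels ("a" and "x") get no column.
g1_b = (df["Group1"] == "b").astype(float)
g2_y = (df["Group2"] == "y").astype(float)

design = pd.DataFrame({
    "Intercept": 1.0,
    "C(Group1)[T.b]": g1_b,                         # main effect of Group1
    "C(Group2)[T.y]": g2_y,                         # main effect of Group2
    "C(Group1)[T.b]:C(Group2)[T.y]": g1_b * g2_y,   # ':' = product of columns
    "Covariate": df["Covariate"],
})
print(design)
```

Writing `C(Group1):C(Group2)` alone would keep only the product column. The `*` operator adds the two main-effect columns as well.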

In [13]:
data_df.columns

Index(['Dataset', 'Subject', 'Roi_File_Path', 'Nifti_File_Path', 'age', 'sex',
       'diagnosis', 'overall_cognition', 'lesion_size', 'Unnamed__9',
       'memory_higher_is_better_for_stim_but_high_is_worse_for_lesion',
       'memory_higher_is_better', 'var_1yr_memory', 'Memory_Measure',
       'Cause_of_Change', 'Group', 'Reference', 'Localization_Approach'],
      dtype='object')

**The left side of the equation is expected to be called 'Nifti_File_Path'. This must be a column in your CSV, spelled exactly the same way.**

In [14]:
formula = "Nifti_File_Path ~  diagnosis*memory_higher_is_better + Dataset"

# 03 - Visualize Your Design Matrix

This is the explanatory variable half of your regression formula
_______________________________________________________
Create Design Matrix: the cell below calls the `define_design_matrix` method with the formula defined above. Every variable in the formula must correspond to a column name in your dataframe.

- To include interaction terms, use * between variables, like "var1*var2".
- By default, an intercept will be added.
- **Don't explicitly add the 'Intercept' column. I'll do it for you.**
- If you want to compare specific datasets within a column, leave `coerce_str=False`.
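Before running the regression, it is worth checking that the design matrix has full column rank. If it does not, some coefficients cannot be estimated uniquely. A common cause is a column that is constant and therefore duplicates the intercept, as a categorical variable would if filtering left only one of its levels. The helper below is a hypothetical NumPy sketch and assumes `design_matrix` is a numeric DataFrame or array.

```python
import numpy as np
import pandas as pd


def check_full_rank(design_matrix):
    """Return True if every column is linearly independent (hypothetical helper)."""
    X = np.asarray(design_matrix, dtype=float)
    rank = np.linalg.matrix_rank(X)
    n_cols = X.shape[1]
    if rank < n_cols:
        print(f"Design matrix is rank deficient: rank {rank} < {n_cols} columns.")
    return rank == n_cols


good = pd.DataFrame({"Intercept": [1, 1, 1, 1], "x": [0.1, 0.5, 0.3, 0.9]})
bad = good.assign(constant=[2, 2, 2, 2])  # constant column duplicates the intercept
print(check_full_rank(good), check_full_rank(bad))
```

In the notebook this would be called as `check_full_rank(design_matrix)` after the cell below.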

In [15]:
# Define the design matrix
outcome_matrix, design_matrix = cal_palm.define_design_matrix(formula, data_df=data_df, voxelwise_variable_list=['Nifti_File_Path'], coerce_str=False)
design_matrix

Unnamed: 0,Intercept,Dataset[T.fornix_memory],memory_higher_is_better
3,1.0,0.0,-1.115801
5,1.0,0.0,-0.250839
7,1.0,0.0,-0.250839
8,1.0,0.0,-1.115801
9,1.0,0.0,-0.683320
...,...,...,...
3385,1.0,1.0,-0.355183
3386,1.0,1.0,0.656095
3387,1.0,1.0,-3.466132
3388,1.0,1.0,-1.075389


# 04 - Visualize Your Dependent Variable

I have generated this for you based on the formula you provided

In [22]:
outcome_matrix.iloc[0,0]

'/Volumes/Expansion/datasets/adni/neuroimaging/all_patients_atrophy_csfgm_connectivity/sub-subh002uSu0619ugreyumatter+cerebrospinalufluid/connectivity/sub-subh002uSu0619ugreyumatter+cerebrospinalufluid_tome-GSP1000uMF_space-2mm_stat-t_conn.nii.gz'
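Because every row of the outcome matrix is a file path, a missing or unmounted drive only surfaces later as a load error. The hypothetical helper below lists the paths that do not point to an existing file. It assumes, as the `iloc[0,0]` call above suggests, that `outcome_matrix` is a DataFrame with the paths in its first column. The demo uses temporary files instead of real NIfTIs.

```python
import os
import tempfile
import pandas as pd


def missing_files(outcome_matrix):
    """Return first-column entries that are not paths to existing files."""
    paths = outcome_matrix.iloc[:, 0]
    return [p for p in paths if not (isinstance(p, str) and os.path.isfile(p))]


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "sub-01.nii.gz")
    open(real, "wb").close()                      # create an empty stand-in file
    demo = pd.DataFrame({"Nifti_File_Path": [real, os.path.join(tmp, "sub-02.nii.gz")]})
    missing = missing_files(demo)
    print(missing)
```

In the notebook, `missing_files(outcome_matrix)` should return an empty list before you continue.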

# 05 - Generate Contrasts

Generate a Contrast Matrix
- This is different from the contrast matrices used in cell-means regressions such as in PALM, but it is much more powerful. 



For more information on contrast matrices, please refer to this: https://cran.r-project.org/web/packages/codingMatrices/vignettes/codingMatrices.pdf

The choice of coding can drastically affect the results of an ANOVA, whereas for a plain regression it is merely a nuisance.
In essence, a contrast assesses whether coefficients (or weighted combinations of them) differ significantly from zero or from one another.

________________________________________________________________
A coding matrix (called a contrast matrix when each row sums to zero) is simply a way of defining which coefficients to evaluate and how to evaluate them.
- If one coefficient is set to 1 and everything else is set to 0, we test whether that coefficient deviates significantly from zero, i.e., whether that predictor significantly helps predict the dependent variable.
- If one coefficient is set to 1, another to -1, and the rest to 0, we test whether the two coefficients differ from each other.
- If several coefficients are set to 1 and several others to -1, we test whether the combined coefficients of the two sets differ from each other (scale the weights by 1/set size to compare averages instead of sums).
- Coefficients set to 0 are left out entirely, so only the +1 and -1 sets are compared.

1: This value indicates that the corresponding variable's coefficient in the model is included in the contrast. It means you are interested in estimating the effect of that variable.

0: This value indicates that the corresponding variable's coefficient in the model is not included in the contrast. It means you are not interested in estimating the effect of that variable.

-1: This value indicates that the corresponding variable's coefficient in the model is included in the contrast, but with the opposite sign. It is typically paired with a +1 to subtract one coefficient from another (e.g., [0, 1, -1] tests β1 - β2); on its own, it tests the effect in the opposite direction.
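Numerically, a contrast row $c$ is applied to the fitted coefficients as $c\hat\beta$. In ordinary (parametric) OLS it is tested with $t = c\hat\beta / \sqrt{\hat\sigma^2\, c (X^\top X)^{-1} c^\top}$, where $\hat\sigma^2$ is the residual variance. The sketch below implements this textbook statistic on simulated data, and the variable names are illustrative. The pipeline in this notebook may compute its statistics differently (for example, with permutation testing), so treat this only as an illustration of what the 1 / 0 / -1 weights do.

```python
import numpy as np


def contrast_t(X, y, c):
    """OLS t-statistic for the linear combination c @ beta."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.asarray(c, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / (n - p)          # residual variance estimate
    XtX_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(sigma2 * (c @ XtX_inv @ c))    # standard error of c @ beta
    return (c @ beta) / se


rng = np.random.default_rng(0)
n = 40
age = rng.normal(size=n)
connectivity = rng.normal(size=n)
X = np.column_stack([np.ones(n), age, connectivity])
y = 0.5 + 1.0 * age + rng.normal(size=n)        # connectivity has no true effect

t_age = contrast_t(X, y, [0, 1, 0])     # is the Age coefficient different from 0?
t_diff = contrast_t(X, y, [0, 1, -1])   # does Age differ from connectivity?
print(round(t_age, 2), round(t_diff, 2))
```

Flipping every sign in a row flips the sign of its t-statistic but not its magnitude.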

----------------------------------------------------------------
The contrast matrix is typically a matrix with dimensions (number of contrasts) x (number of regression coefficients). Each row of the contrast matrix represents a contrast or comparison you want to test.

For example, let's say you have the following regression coefficients in your model:

Intercept, Age, connectivity, Age_interaction_connectivity
A contrast matrix has dimensions [n_contrasts, n_predictors], where each row is one contrast (one experiment)

If you want to test the hypothesis that the effect of Age is significant, you can set up a contrast matrix with a row that specifies this contrast (actually an averaging vector):
```
[0,1,0,0]. This is an averaging vector because it sums to 1
```
This contrast will test the coefficient corresponding to the Age variable against zero.


If you want to test the hypothesis that the effect of Age is different from the effect of connectivity, you can set up a contrast matrix with a single row:
```
[0,1,-1,0]. This is a contrast because it sums to 0
```

Thus, if you want to see whether any given effect differs significantly from the intercept (which equals the average outcome when the predictors are mean-centered, e.g. after standardization), you can use the following coding matrix. Its first row sums to 1 (an averaging vector); the remaining rows sum to 0 (contrasts):
```
[1,0,0,0]
[-1,1,0,0]
[-1,0,1,0]
[-1,0,0,1]
```

The first row tests the intercept against zero. The second row tests whether the Age coefficient differs from the intercept, and the third and fourth rows do the same for connectivity and for the Age × connectivity interaction.
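The row sums distinguish the two kinds of rows, and the matrix width must equal the number of regression coefficients. The sketch below encodes the example matrix with the coefficient names used above and labels each row. `classify_row` is a hypothetical helper, not part of `calvin_utils`.

```python
import numpy as np


def classify_row(row):
    """Label a coding-matrix row by what its weights sum to."""
    total = float(np.sum(row))
    if np.isclose(total, 0.0):
        return "contrast (sums to 0)"
    if np.isclose(total, 1.0):
        return "averaging vector (sums to 1)"
    return "other weighting"


design_columns = ["Intercept", "Age", "connectivity", "Age_interaction_connectivity"]
C = np.array([
    [1, 0, 0, 0],
    [-1, 1, 0, 0],
    [-1, 0, 1, 0],
    [-1, 0, 0, 1],
])
# One row per contrast, one column per regression coefficient.
assert C.shape[1] == len(design_columns), "contrast width must match design matrix"

for row in C:
    print(row.tolist(), "->", classify_row(row))
```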
_____
You can define any number of contrasts in the contrast matrix to test different hypotheses or comparisons of interest in your regression analysis.

It's important to note that the specific contrasts you choose depend on your research questions and hypotheses. You should carefully consider the comparisons you want to make and design the contrast matrix accordingly.

- Examples:
    - [Two Sample T-Test](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/GLM#Two-Group_Difference_.28Two-Sample_Unpaired_T-Test.29)
    - [One Sample with Covariate](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/GLM#Single-Group_Average_with_Additional_Covariate)

In [17]:
contrast_matrix = cal_palm.generate_basic_contrast_matrix(design_matrix)

Here is a basic contrast matrix set up to evaluate the significance of each variable.
Here is an example of what your contrast matrix looks like as a dataframe: 


Unnamed: 0,Intercept,Dataset[T.fornix_memory],memory_higher_is_better
0,1,0,0
1,0,1,0
2,0,0,1


Below is the same contrast matrix, but as an array.
Copy it into a cell below and edit it for more control over your analysis.
[
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]


Edit Contrast Matrix Here
- The generic contrast matrix will simply check whether each of your betas is significantly different from zero

In [18]:
# contrast_matrix = [
#     # One row per contrast; one column per design-matrix column:
#     # Intercept, Dataset[T.fornix_memory], memory_higher_is_better
#     [1, 0, 0],
#     [0, 1, 0],
#     [0, 0, 1],
# ]

Finalize Contrast Matrix

In [19]:
contrast_matrix_df = cal_palm.finalize_contrast_matrix(design_matrix=design_matrix, 
                                                    contrast_matrix=contrast_matrix) 
contrast_matrix_df

Unnamed: 0,Intercept,Dataset[T.fornix_memory],memory_higher_is_better
0,1,0,0
1,0,1,0
2,0,0,1


# 06 - Save the Files

Standardization during regression is critical. 
- data_transform_method='standardize' will ensure the voxelwise values are standardized
    - If your design matrix has a column called 'Dataset', values are standardized within each dataset individually, which is normally what you want.
    - If you call data_transform_method='standardize' without having a 'Dataset' column in your design matrix, the entire collection of images will be standardized. This is potentially dangerous and misleading. Be careful, and consider not standardizing at all, or going back and adding a 'Dataset' column. 
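The warning about pooled standardization can be made concrete. If two datasets have different baselines, z-scoring all images together leaves each dataset offset from zero. That offset can masquerade as an effect of any variable correlated with dataset membership. The sketch below assumes the images have already been loaded into a subjects × voxels array. The helper name is hypothetical, and this is not the `RegressionNPYPreparer` implementation.

```python
import numpy as np


def standardize_voxels_within_datasets(data, datasets):
    """Z-score each voxel within each dataset.

    data: (n_subjects, n_voxels) array; datasets: one label per subject.
    Hypothetical sketch; each dataset needs at least two subjects.
    """
    data = np.asarray(data, dtype=float)
    datasets = np.asarray(datasets)
    out = np.empty_like(data)
    for label in np.unique(datasets):
        rows = datasets == label
        block = data[rows]
        std = block.std(axis=0, ddof=1)
        std[std == 0] = np.nan  # constant voxels become NaN instead of dividing by zero
        out[rows] = (block - block.mean(axis=0)) / std
    return out


rng = np.random.default_rng(0)
a = rng.normal(loc=5.0, size=(4, 3))    # dataset "A": higher baseline
b = rng.normal(loc=-5.0, size=(6, 3))   # dataset "B": lower baseline
data = np.vstack([a, b])
labels = ["A"] * 4 + ["B"] * 6

z = standardize_voxels_within_datasets(data, labels)
pooled = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
print(z[:4].mean(axis=0).round(6))      # within-dataset: A is centered at 0
print(pooled[:4].mean(axis=0).round(2)) # pooled: A remains offset
```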

Mask Path
- set mask_path to the path of your local brain mask which matches the resolution of the files you have collected. Typically this is an MNI 152 brain mask. 
    - download one here: https://nilearn.github.io/dev/modules/generated/nilearn.datasets.load_mni152_brain_mask.html
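A mask "matches the resolution" of your images when both share the same voxel grid: the same first three dimensions and the same voxel-to-world affine. The comparison below is a hypothetical NumPy sketch. The 2 mm and 1 mm affines are illustrative values in the style of FSL's MNI152 templates, not read from your files.

```python
import numpy as np


def same_grid(shape_a, affine_a, shape_b, affine_b, atol=1e-4):
    """True if two images share voxel dimensions and voxel-to-world mapping."""
    return (tuple(shape_a[:3]) == tuple(shape_b[:3])
            and np.allclose(affine_a, affine_b, atol=atol))


# Illustrative 2 mm and 1 mm grids (assumed values, for demonstration only).
affine_2mm = np.array([[-2.0, 0, 0, 90], [0, 2, 0, -126], [0, 0, 2, -72], [0, 0, 0, 1]])
affine_1mm = np.array([[-1.0, 0, 0, 90], [0, 1, 0, -126], [0, 0, 1, -72], [0, 0, 0, 1]])

print(same_grid((91, 109, 91), affine_2mm, (91, 109, 91, 1), affine_2mm))
print(same_grid((91, 109, 91), affine_2mm, (182, 218, 182), affine_1mm))
```

With nibabel installed, the real inputs would come from `nib.load(mask_path)` and `nib.load(outcome_matrix.iloc[0, 0])`, passing each image's `.shape` and `.affine`.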

In [20]:
mask_path = '/Users/cu135/hires_backdrops/MNI152_T1_2mm_brain.nii'
data_transform_method='standardize'

In [21]:
from calvin_utils.ccm_utils.npy_utils import RegressionNPYPreparer
preparer = RegressionNPYPreparer(
    design_matrix=design_matrix,
    contrast_matrix=contrast_matrix_df,
    outcome_matrix=outcome_matrix,
    out_dir=out_dir,
    mask_path=mask_path,
    exchangeability_blocks=None,   # or your DataFrame
    data_transform_method=data_transform_method
)
# dataset_dict, json_path = preparer.run()

Done

-Calvin