# Run A Mixed Effects Model

### Authors: Calvin Howard.

#### Last updated: July 6, 2023

Use this notebook to assess whether a predictor's relationship to the outcome differs between two groups.

Notes:
- To get the most out of this notebook, you should be familiar with mixed effects models.

# 00 - Import CSV with All Data
**The CSV is expected to be in this format**
- ID and absolute paths to niftis are critical
```
+-----+----------------------------+--------------+--------------+--------------+
| ID  | Nifti_File_Path            | Covariate_1  | Covariate_2  | Covariate_3  |
+-----+----------------------------+--------------+--------------+--------------+
| 1   | /path/to/file1.nii.gz      | 0.5          | 1.2          | 3.4          |
| 2   | /path/to/file2.nii.gz      | 0.7          | 1.4          | 3.1          |
| 3   | /path/to/file3.nii.gz      | 0.6          | 1.5          | 3.5          |
| 4   | /path/to/file4.nii.gz      | 0.9          | 1.1          | 3.2          |
| ... | ...                        | ...          | ...          | ...          |
+-----+----------------------------+--------------+--------------+--------------+
```
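
Before pointing the notebook at a real file, it can help to confirm that the table has the two critical columns and that every NIfTI path is absolute. The sketch below is one possible check written with plain pandas. `check_input_table` and `REQUIRED_COLUMNS` are hypothetical names used for illustration and are not part of `calvin_utils`; the column names are taken from the example table above.

```python
import os

import pandas as pd

# Hypothetical: column names copied from the example table above.
REQUIRED_COLUMNS = ["ID", "Nifti_File_Path"]


def check_input_table(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems found in the table."""
    problems = []
    for col in REQUIRED_COLUMNS:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if "Nifti_File_Path" in df.columns:
        for path in df["Nifti_File_Path"].dropna():
            if not os.path.isabs(str(path)):
                problems.append(f"path is not absolute: {path}")
    return problems


# Toy table with one relative path, to show what gets flagged.
example = pd.DataFrame({
    "ID": [1, 2],
    "Nifti_File_Path": ["/path/to/file1.nii.gz", "relative/file2.nii.gz"],
    "Covariate_1": [0.5, 0.7],
})
problems = check_input_table(example)
print(problems)
```

An empty list means the table passed both checks.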

In [None]:
# Specify the path to your CSV or Excel file containing NIFTI paths
input_csv_path = '/Users/cu135/Dropbox (Partners HealthCare)/studies/cognition_2023/metadata/master_list_proper_subjects.xlsx'

In [None]:
# Specify where you want to save your results to
out_dir = '/Users/cu135/Library/CloudStorage/OneDrive-Personal/OneDrive_Documents/Research/2023/subiculum_cognition_and_age/figures/Figures/retrospective_cohorts_figure/analyses'

In [None]:
from calvin_utils.permutation_analysis_utils.statsmodels_palm import CalvinStatsmodelsPalm
# Instantiate the CalvinStatsmodelsPalm class
cal_palm = CalvinStatsmodelsPalm(input_csv_path=input_csv_path, output_dir=out_dir, sheet='master_list_proper_subjects')
# Read the data and display it
data_df = cal_palm.read_and_display_data()


# 01 - Preprocess Your Data

**Handle NANs**
- Set drop_nans=True if you would like to remove NaNs from the data
- Provide a column name or a list of column names to remove NaNs from
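
For readers who want to see what this step amounts to, here is a minimal pandas sketch. It assumes that `drop_nans_from_columns` behaves like `DataFrame.dropna(subset=...)`, which has not been checked against `calvin_utils`; the toy data and column names are made up.

```python
import numpy as np
import pandas as pd

# Made-up data: NaNs in 'Age' and 'Score' matter, NaNs in 'Notes' do not.
toy_df = pd.DataFrame({
    "Age": [61, np.nan, 70, 58],
    "Score": [0.2, 0.5, np.nan, -0.1],
    "Notes": [None, "ok", "ok", "ok"],
})

# Only rows with a NaN in one of these columns are removed.
cols = ["Age", "Score"]
cleaned_df = toy_df.dropna(subset=cols).reset_index(drop=True)
print(cleaned_df)
```

Restricting the check to a subset of columns keeps rows that are only missing values in columns the model never uses.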

In [None]:
data_df.columns

In [None]:
drop_list = ['Age', 'Z_Scored_Percent_Cognitive_Improvement_By_Origin_Group', 'Z_Scored_Subiculum_T_By_Origin_Group_']

In [None]:
data_df = cal_palm.drop_nans_from_columns(columns_to_drop_from=drop_list)
display(data_df)

**Drop Row Based on Value of Column**

Define the column, condition, and value for dropping rows
- column = 'your_column_name'
- condition = 'above'  # Options: 'equal', 'above', 'below'
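
The logic of this step can be sketched in plain pandas as follows. `drop_rows` is a hypothetical stand-in, not the `calvin_utils` implementation, and the meaning given to `'not'` (drop rows whose value differs from `value`) is an assumption based on how the cell below uses it.

```python
import pandas as pd


def drop_rows(df, column, condition, value):
    """Split df into (kept, dropped) rows. Hypothetical helper for illustration."""
    if condition == "equal":
        mask = df[column] == value
    elif condition == "above":
        mask = df[column] > value
    elif condition == "below":
        mask = df[column] < value
    elif condition == "not":  # assumed meaning: drop rows that do NOT match
        mask = df[column] != value
    else:
        raise ValueError(f"unknown condition: {condition}")
    return df[~mask], df[mask]


# Made-up data
toy_df = pd.DataFrame({"City": ["Wurzburg", "Boston", "Wurzburg"],
                       "Age": [60, 72, 68]})
kept, dropped = drop_rows(toy_df, "City", "not", "Wurzburg")
print(kept)
```

Returning the dropped rows as a second dataframe makes it easy to inspect what was excluded.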

In [None]:
data_df.columns

Set the parameters for dropping rows

In [None]:
column = 'City'  # The column you'd like to evaluate
condition = 'not'  # The condition to check ('equal', 'above', 'below', or 'not' as used here)
value = 'Wurzburg'  # The value to compare against

In [None]:
data_df, other_df = cal_palm.drop_rows_based_on_value(column, condition, value)
data_df

**Standardize Data**
- Enter the columns you don't want to standardize into a list
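
As a rough illustration of what standardizing with an exclusion list can look like, the sketch below z-scores every numeric column except the ones listed. It is written with plain pandas, uses the population standard deviation (`ddof=0`) as an arbitrary choice, and is not taken from `calvin_utils`; `standardize` and the toy data are hypothetical.

```python
import pandas as pd


def standardize(df, cols_not_to_standardize):
    """Z-score all numeric columns except those listed. Illustrative only."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    cols = [c for c in numeric_cols if c not in cols_not_to_standardize]
    out[cols] = (out[cols] - out[cols].mean()) / out[cols].std(ddof=0)
    return out


# Made-up data
toy_df = pd.DataFrame({"Age": [60, 70, 80], "Score": [1.0, 2.0, 3.0]})
standardized = standardize(toy_df, cols_not_to_standardize=["Age"])
print(standardized)
```

Leaving a variable such as age on its original scale keeps split points like "65 years" interpretable later on.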

In [None]:
# Remove anything you don't want to standardize
cols_not_to_standardize = ['Age']

In [None]:
# data_df = cal_palm.standardize_columns(cols_not_to_standardize)
data_df

Descriptive Stats

In [None]:
data_df.describe()

# 01 - Model Data

In [None]:
formula = 

In [None]:
results = 
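
The two cells above are left for the user to fill in. As a hedged sketch of what a fit with an interaction term might look like, the example below uses `statsmodels` directly on synthetic data. This is an assumption: the notebook may rely on a `calvin_utils` wrapper instead, and the variable names, grouping column, and simulated coefficients are all invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: 6 groups, random intercept per group, true interaction = 0.8
rng = np.random.default_rng(0)
n_groups, n_per_group = 6, 30
group = np.repeat(np.arange(n_groups), n_per_group)
x = rng.normal(size=group.size)
z = rng.normal(size=group.size)
group_offset = rng.normal(scale=0.5, size=n_groups)[group]
noise = rng.normal(scale=0.3, size=group.size)
y = 0.5 * x - 0.3 * z + 0.8 * x * z + group_offset + noise
toy_df = pd.DataFrame({"y": y, "x": x, "z": z, "group": group})

# 'x * z' expands to x + z + x:z (both main effects plus the interaction)
toy_formula = "y ~ x * z"
toy_results = smf.mixedlm(toy_formula, data=toy_df, groups=toy_df["group"]).fit()
print(toy_results.params)
```

In a plain `statsmodels` result the fitted coefficients live in `.params`; the `x:z` entry plays the role of $ B_3 $ in the derivation that follows.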

# 02 - Identification of a Saddle Point


# Partial Derivative Explanation for the Equation $ y = B_1x + B_2z + B_3xz $

When taking the partial derivative of the equation $ y = B_1x + B_2z + B_3xz $ with respect to $ x $, the logic is as follows:

- Treat $ z $ as a constant since we are differentiating with respect to $ x $. 
- The derivative of a constant is zero. The derivative of $ x $ with respect to $ x $ is one. 
- Any factor that involves only $ z $ is treated as a constant.
    - This means both $ B_2z $ and $ B_3z $ are considered constants.
    - When differentiated with respect to $ x $:
        - $ B_2z $ does not contain $ x $, so its derivative is zero.
        - $ B_3z $ multiplies $ x $ in the term $ B_3xz $, so the derivative of that term is the constant $ B_3z $. 
            - This follows from the constant multiple rule, a special case of the product rule: the derivative of a constant times a differentiable function is the constant times the derivative of that function.

Hence, the partial derivative of $ y $ with respect to $ x $ is taken term by term:

$$ \frac{\partial y}{\partial x} = \frac{\partial}{\partial x}(B_1x) + \frac{\partial}{\partial x}(B_2z) + \frac{\partial}{\partial x}(B_3xz) $$

Applying the product rule to the interaction term, with $ B_3z $ and $ x $ as the two factors, gives:

$$ \frac{\partial y}{\partial x} = \frac{\partial}{\partial x}(B_1x) + \frac{\partial}{\partial x}(B_2z) + x \cdot \frac{\partial}{\partial x}(B_3z) + B_3z \cdot \frac{\partial}{\partial x}(x) $$

Because $ B_3z $ is constant with respect to $ x $, its derivative is zero, and the derivative of $ x $ is one. This is equivalent to:

$$ \frac{\partial y}{\partial x} = \frac{\partial}{\partial x}(B_1x) + \frac{\partial}{\partial x}(B_2z) + B_3z \cdot 1 $$

The derivative of the constant term $ B_2z $ is zero, and the derivative of $ B_1x $ is $ B_1 $. Thus, simplifying this, we get:

$$ \frac{\partial y}{\partial x} = B_1 + 0 + B_3z $$

Therefore, the resulting equation for the partial derivative is:

$$ \frac{\partial y}{\partial x} = B_1 + B_3z $$

This equation represents the rate of change of $ y $ with respect to $ x $, while holding $ z $ constant.
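
One way to connect this derivative to the saddle point named in the section heading: by the same argument, $ \partial y / \partial z = B_2 + B_3x $. Setting both partial derivatives to zero gives the stationary point $ x^* = -B_2/B_3 $, $ z^* = -B_1/B_3 $. Whenever $ B_3 \neq 0 $ this point is a saddle, because the Hessian determinant is $ -B_3^2 < 0 $. The sketch below computes it in plain Python; the function names and the coefficient values are hypothetical and are not results from this analysis.

```python
def partial_x(b1, b3, z):
    """dy/dx = B1 + B3*z for y = B1*x + B2*z + B3*x*z."""
    return b1 + b3 * z


def partial_z(b2, b3, x):
    """dy/dz = B2 + B3*x for the same model."""
    return b2 + b3 * x


def saddle_point(b1, b2, b3):
    """Return (x*, z*) where both partial derivatives are zero."""
    if b3 == 0:
        raise ValueError("no interaction term, so there is no saddle point")
    return -b2 / b3, -b1 / b3


# Hypothetical coefficients, chosen only to illustrate the arithmetic
b1, b2, b3 = 0.5, -0.3, 0.8
x_star, z_star = saddle_point(b1, b2, b3)
print(x_star, z_star)
```

The sign of the slope of $ y $ in $ x $ flips as $ z $ crosses $ z^* $, and likewise for $ z $ across $ x^* $, which is what makes these values natural split points.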


Get coefficients

In [None]:
results.coefficients

# Split & Visualize Data By Saddle Point

This code is designed to create an interaction plot to visualize the effects of two factors and their interaction on the outcome variable.

The interaction_plot function takes as input a dataframe, two factors (x_one and x_two), two corresponding labels for the conditions when the values of these factors are under the mean (x_one_under_mean and x_two_under_mean) and over the mean (x_one_over_mean and x_two_over_mean), and the response variable (outcome). If binarize is set to True, it converts the two factors into binary variables based on whether their values are above or below the mean. The function then creates a mapping for the x_two variable to numerical values for the purpose of plotting.

It uses the `interaction_plot` function from the statsmodels package (`statsmodels.graphics.factorplots`) to create the plot. In the plot, x_two is represented on the x-axis, x_one is used to color the lines, and the outcome variable is plotted on the y-axis. The function also sets the labels for the x and y axes and the tick labels on the x-axis according to the inputs provided.

The function also allows for saving the plot to an output directory specified by the user. If save is set to True, it saves the plot in both PNG and SVG formats.
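
The binarization step described above can be sketched with plain pandas. This is not the `saddle_binarization` implementation: `binarize_at` is a hypothetical helper, the data are made up, and sending values exactly at the split point to the "low" label is an arbitrary assumption.

```python
import pandas as pd


def binarize_at(series, split_point, low_label, high_label):
    """Label values above split_point as high, all others as low."""
    return series.gt(split_point).map({False: low_label, True: high_label})


# Made-up data
toy_df = pd.DataFrame({
    "Age": [55, 60, 70, 80],
    "Connectivity": [-0.5, 0.4, -0.3, 0.6],
    "Improvement": [0.1, 0.9, 0.2, -0.7],
})
toy_df["Age_Group"] = binarize_at(toy_df["Age"], 65, "Young", "Old")
toy_df["Conn_Group"] = binarize_at(
    toy_df["Connectivity"], -0.1, "Low Connectivity", "High Connectivity"
)

# Mean outcome in each of the four cells; these are the points an
# interaction plot would connect with lines.
cell_means = toy_df.groupby(["Age_Group", "Conn_Group"])["Improvement"].mean()
print(cell_means)
```

Lines that cross or diverge across the four cell means are the visual signature of an interaction.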

In [None]:
from calvin_utils.statistical_utils.calculus_utils import saddle_binarization
# Running the function to display the interaction plot
save = False
#----------------------------------------------------------------
interaction_figure = saddle_binarization(data_df.copy(), 
                 x_one='Age', x_one_under_mean='Young', x_one_over_mean='Old', x_one_split_point=65,
                 x_two='Z_Scored_Subiculum_Connectivity', x_two_under_mean='Low Connectivity', x_two_over_mean='High Connectivity', x_two_split_point=-0.1,
                 response='Z_Scored_Percent_Cognitive_Improvement', 
                 x_label='Subiculum Connectivity', 
                 y_label='Average Cognitive Improvement (Standardized)',
                 plot_error_bars=False,
                 save=save, out_dir=out_dir)

interaction_figure