# Run A Mixed Effects Model

### Authors: Calvin Howard.

#### Last updated: July 6, 2023

Use this notebook to assess whether a predictor's relationship to the outcome (the predicted variable) differs between two groups.

Notes:
- To get the most from this notebook, you should be familiar with mixed effects models.
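As a rough illustration of what "differs between two groups" means, the question comes down to a predictor-by-group interaction term. The sketch below is a deliberate simplification: it uses ordinary least squares on simulated data and has no random effects, so it is not a mixed effects model. All names (`x`, `group`, `y`, `slope_difference`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
group = np.repeat([0, 1], n // 2)  # hypothetical two-group indicator
x = rng.normal(size=n)             # hypothetical predictor

# Simulated outcome: the slope on x is 1.0 in group 0 and 3.0 in group 1
y = 0.5 + 1.0 * x + 0.2 * group + 2.0 * x * group + rng.normal(scale=0.1, size=n)

# Design matrix: intercept, predictor, group, predictor-by-group interaction
X = np.column_stack([np.ones(n), x, group, x * group])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope, group_shift, slope_difference = coef

# slope_difference estimates how much the predictor's slope changes between groups
print(f"slope in group 0: {slope:.2f}, difference in group 1: {slope_difference:.2f}")
```

A mixed effects model would add random effects (for example, a per-subject intercept) on top of this fixed-effects structure. The interaction coefficient keeps the same interpretation.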

# 00 - Import CSV with All Data
**The CSV is expected to be in this format**
- ID and absolute paths to niftis are critical
```
+-----+----------------------------+--------------+--------------+--------------+
| ID  | Nifti_File_Path            | Covariate_1  | Covariate_2  | Covariate_3  |
+-----+----------------------------+--------------+--------------+--------------+
| 1   | /path/to/file1.nii.gz      | 0.5          | 1.2          | 3.4          |
| 2   | /path/to/file2.nii.gz      | 0.7          | 1.4          | 3.1          |
| 3   | /path/to/file3.nii.gz      | 0.6          | 1.5          | 3.5          |
| 4   | /path/to/file4.nii.gz      | 0.9          | 1.1          | 3.2          |
| ... | ...                        | ...          | ...          | ...          |
+-----+----------------------------+--------------+--------------+--------------+
```

In [110]:
# Specify the path to your CSV file containing NIFTI paths
input_csv_path = '/Users/cu135/Dropbox (Partners HealthCare)/studies/review_pyper/metadata/metrics.csv'

In [111]:
# Specify where you want to save your results to
out_dir = '/Users/cu135/Dropbox (Partners HealthCare)/studies/review_pyper/figures/mansucript_figures/data_extraction'

In [112]:
from calvin_utils.permutation_analysis_utils.statsmodels_palm import CalvinStatsmodelsPalm
# Instantiate the CalvinStatsmodelsPalm class
cal_palm = CalvinStatsmodelsPalm(input_csv_path=input_csv_path, output_dir=out_dir, sheet=None)
# Read the CSV into a DataFrame and display it
data_df = cal_palm.read_and_display_data()


Unnamed: 0,Unnamed__0,Variable,Category,Sensitivity,Specificity,Precision,PPV,NPV,Accuracy,F1_Score
0,0,Case,Data in Document,0.857143,1.0,1.0,1.0,0.125,0.86,0.923077
1,1,Amnesia Type,Data in Document,0.905263,0.4,0.966292,0.966292,0.181818,0.88,0.934783
2,2,Confounding Disease,Diagnostic Inference,0.833333,0.136364,0.116279,0.116279,0.857143,0.22,0.204082
3,3,Amnesia Measured,Data in Document,0.965116,0.642857,0.943182,0.943182,0.75,0.92,0.954023
4,4,Isolated Amnesia,Diagnostic Inference,1.0,0.024691,0.193878,0.193878,1.0,0.21,0.324786
5,5,Neuroimaging,Data in Document,0.989796,1.0,1.0,1.0,0.666667,0.99,0.994872
6,6,Neurodegeneration,Diagnostic Inference,1.0,0.032258,0.072165,0.072165,1.0,0.1,0.134615
7,7,Atypical Amnesia,Diagnostic Inference,0.714286,0.182796,0.061728,0.061728,0.894737,0.22,0.113636
8,8,English,Data in Document,1.0,1.0,1.0,1.0,1.0,1.0,1.0
9,9,Bias,Case Quality,0.090909,0.641026,0.066667,0.066667,0.714286,0.52,0.076923


# 01 - Preprocess Your Data

**Handle NANs**
- Set drop_nans=True if you would like to remove NaNs from the data
- Provide a column name or a list of column names to remove NaNs from
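For intuition, dropping NaNs from selected columns can be sketched in plain pandas. This is an assumed equivalent, not the actual implementation of `drop_nans_from_columns`, which is not shown in this notebook. `toy_df` and its columns are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values in several columns
toy_df = pd.DataFrame({
    'Age': [61.0, np.nan, 70.0, 58.0],
    'Score': [0.2, 0.4, np.nan, 0.9],
    'Notes': ['a', 'b', 'c', None],
})

# Drop a row only if it has a NaN in one of the listed columns;
# missing values in other columns (here 'Notes') are ignored
cleaned_df = toy_df.dropna(subset=['Age', 'Score'])
print(cleaned_df)
```

The last row survives even though `Notes` is missing there, because only the listed columns are checked.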

In [None]:
data_df.columns

In [None]:
drop_list = ['Age', 'Z_Scored_Percent_Cognitive_Improvement']

In [None]:
data_df = cal_palm.drop_nans_from_columns(columns_to_drop_from=drop_list)
display(data_df)

**Drop Row Based on Value of Column**

Define the column, condition, and value for dropping rows
- column = 'your_column_name'
- condition = 'above'  # Options: 'equal', 'above', 'below'
- value = your_value  # The value to compare against
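A minimal sketch of this kind of filter in plain pandas is shown here. `split_rows_by_value` is a hypothetical helper, not the `drop_rows_based_on_value` method itself. It assumes that 'above' and 'below' are strict comparisons and that the dropped rows are returned as a second DataFrame.

```python
import pandas as pd

def split_rows_by_value(df, column, condition, value):
    """Hypothetical helper: return (kept_df, dropped_df) for the given condition."""
    if condition == 'equal':
        mask = df[column] == value
    elif condition == 'above':
        mask = df[column] > value   # assumed strict comparison
    elif condition == 'below':
        mask = df[column] < value   # assumed strict comparison
    else:
        raise ValueError("condition must be 'equal', 'above', or 'below'")
    # Rows matching the condition are dropped; all other rows are kept
    return df[~mask].copy(), df[mask].copy()

toy_df = pd.DataFrame({'ID': [1, 2, 3, 4], 'Cohort': [1, 3, 2, 3]})
kept_df, dropped_df = split_rows_by_value(toy_df, 'Cohort', 'equal', 3)
print(kept_df)
```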

In [None]:
data_df.columns

Set the parameters for dropping rows

In [None]:
column = 'Cohort'  # The column you'd like to evaluate
condition = 'equal'  # The condition to check ('equal', 'above', 'below')
value = 3  # The value to compare against

In [None]:
data_df, other_df = cal_palm.drop_rows_based_on_value(column, condition, value)
display(data_df)

**Standardize Data**
- Enter the columns you don't want to standardize into a list
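Standardizing here most likely means z-scoring: subtract each column's mean and divide by its standard deviation. The sketch below assumes that interpretation. `zscore_columns` is a hypothetical stand-in for `standardize_columns`, and the choice of population standard deviation (`ddof=0`) is also an assumption.

```python
import pandas as pd

def zscore_columns(df, exclude):
    """Hypothetical helper: z-score every numeric column not listed in `exclude`."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include='number').columns
    cols = [c for c in numeric_cols if c not in exclude]
    # (value - mean) / standard deviation, computed column by column
    out[cols] = (out[cols] - out[cols].mean()) / out[cols].std(ddof=0)
    return out

toy_df = pd.DataFrame({'Age': [60, 70, 80], 'Score': [1.0, 2.0, 3.0]})
standardized_df = zscore_columns(toy_df, exclude=['Age'])
print(standardized_df)
```

After this step the standardized columns have mean 0 and unit standard deviation, while excluded columns such as `Age` keep their original units.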

In [None]:
# Remove anything you don't want to standardize
cols_not_to_standardize = ['Age']

In [None]:
data_df = cal_palm.standardize_columns(cols_not_to_standardize)
data_df

**Descriptive Stats**

In [None]:
data_df.describe()

# 02 - Plot

**Grouped Barplot**
- Expects a DataFrame with a category column, the grouping variable that sets the bar colour.
- variable names each thing to be plotted, such as 'Neuroimaging', 'Bias', etc.
- metric is the column holding the value to plot for each variable.
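To make the expected layout concrete, the sketch below builds a small DataFrame in that shape and computes where each bar would sit around the circle. The three rows reuse values from the table displayed earlier. The loop is an illustration of the spacing logic (categories in order of first appearance, variables sorted by the metric within each category), and `toy_df`, `angles`, and `position` are hypothetical names.

```python
import pandas as pd

# One row per variable: a grouping category plus the metric to plot
toy_df = pd.DataFrame({
    'Variable': ['Neuroimaging', 'English', 'Bias'],
    'Category': ['Data in Document', 'Data in Document', 'Case Quality'],
    'Accuracy': [0.99, 1.0, 0.52],
})

n_bars = toy_df['Variable'].nunique()
angles = {}
position = 0  # running bar index across all categories
for category in toy_df['Category'].unique():
    # Within a category, bars are ordered by the metric
    group = toy_df[toy_df['Category'] == category].sort_values(by='Accuracy')
    for variable in group['Variable']:
        # Evenly space the bars over the full 360 degrees
        angles[variable] = position * 360 / n_bars
        position += 1

print(angles)
```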

In [113]:
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
import pandas as pd

def plotly_grouped_radial_bar(dataframe, metric='Accuracy', category_col='Category', variable_col='Variable', title='Title Here'):
    """
    Creates a grouped radial bar chart with Plotly, where each 'category' is a group with a unique color,
    and each 'variable' within that category is an individual bar, evenly spaced around the plot.
    
    Args:
        dataframe (pandas.DataFrame): DataFrame containing the data to plot.
        metric (str): The metric to plot. Defaults to 'Accuracy'.
        category_col (str): The column name for the grouping category.
        variable_col (str): The column name for the individual variables.
        title (str): The figure title. Defaults to 'Title Here'.

    Returns:
        plotly.graph_objs._figure.Figure: Plotly Figure object for the radial bar chart.
    """
    # Assign colors to each category for grouping
    color_palette = px.colors.qualitative.T10
    category_list = dataframe[category_col].unique().tolist()
    color_dict = {category: color_palette[i % len(color_palette)] for i, category in enumerate(category_list)}

    # Determine the total number of unique variables across all categories
    total_variables = len(dataframe[variable_col].unique())

    # Initialize the figure
    fig = go.Figure()
    
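    i = 0  # Running bar index across all categories, used to place bars around the circle
    for category in category_list:
        # Sort into a new DataFrame rather than in place, which avoids pandas' SettingWithCopyWarning
        category_data = dataframe[dataframe[category_col] == category].sort_values(by=[metric])
        # Add an empty placeholder trace so each category gets a single legend entry
        fig.add_trace(go.Barpolar(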
            r=[0],
            theta=[0],
            name=category,
            marker_color=color_dict[category],
            showlegend=True
            ))
        for _, row in category_data.iterrows():
            # Calculate the angle for the current variable
            angle = (i * 360) / total_variables

            # Add a bar for the current variable
            fig.add_trace(go.Barpolar(
                r=[row[metric]],
                theta=[angle],
                width=[360 / total_variables],  # Each bar spans an equal share of the circle
                name=f"{row[variable_col]}",
                marker_color=color_dict[row[category_col]],
                marker_line_color='black',
                marker_line_width=1,
                opacity=0.9,
                showlegend=False
            ))
            i = i+1

    # Handle Legend
    fig.update_layout(
        legend=dict(
            title=dict(text=variable_col),
            itemsizing='constant',
            orientation='h',
            traceorder='normal',
            font=dict(
                size=12,
            ),
            x=0.5,  # Center the legend horizontally
            xanchor='center',  # Anchor the legend at its horizontal center
            yanchor='bottom'  # Anchor the legend by its bottom edge
            )
    )
    # Handle Title
    fig.update_layout(
        title=dict(
            text=title,  # Title passed in by the caller
            x=0.5,  # Center the title
            xanchor='center',  # Ensure the title is centered
            yanchor='top'  # Anchor the title by its top edge
        )
    )
    # Handle Colour
    fig.update_layout(
        polar=dict(
            bgcolor="white",  # Set the polar background color to white
            radialaxis=dict(showgrid=True, gridcolor='gray'),  # Show radial axis grid lines
            angularaxis=dict(showgrid=True, gridcolor='gray')  # Show angular axis grid lines
        )
    )

    return fig

In [114]:
data_df.columns

Index(['Unnamed__0', 'Variable', 'Category', 'Sensitivity', 'Specificity',
       'Precision', 'PPV', 'NPV', 'Accuracy', 'F1_Score'],
      dtype='object')

In [115]:
# Usage with your dataframe
fig = plotly_grouped_radial_bar(data_df, metric='Accuracy', category_col='Category', variable_col='Variable', title='Accuracy of GPT Manuscript Read')
fig.show()






**Example of Polar Barplot**

In [None]:
import plotly.graph_objects as go

fig = go.Figure()

fig.add_trace(go.Barpolar(
    r=[77.5, 72.5, 70.0, 45.0, 22.5, 42.5, 40.0, 62.5],
    name='11-14 m/s',
    marker_color='rgb(106,81,163)'
))
fig.add_trace(go.Barpolar(
    r=[57.5, 50.0, 45.0, 35.0, 20.0, 22.5, 37.5, 55.0],
    name='8-11 m/s',
    marker_color='rgb(158,154,200)'
))
fig.add_trace(go.Barpolar(
    r=[40.0, 30.0, 30.0, 35.0, 7.5, 7.5, 32.5, 40.0],
    name='5-8 m/s',
    marker_color='rgb(203,201,226)'
))
fig.add_trace(go.Barpolar(
    r=[20.0, 7.5, 15.0, 22.5, 2.5, 2.5, 12.5, 22.5],
    name='< 5 m/s',
    marker_color='rgb(242,240,247)'
))

fig.update_traces(text=['North', 'N-E', 'East', 'S-E', 'South', 'S-W', 'West', 'N-W'])
fig.update_layout(
    title='Wind Speed Distribution in Laurel, NE',
    font_size=16,
    legend_font_size=16,
    polar_radialaxis_ticksuffix='%',
    polar_angularaxis_rotation=90,

)
fig.show()