# Imaginary Coherence Analysis of Sleep EEG Data

This Jupyter notebook performs an imaginary coherence analysis of sleep EEG data in GRIN2B-related disorders. The code performs the following steps:

1. **Data Loading**: Importing the required Python libraries and reading the EEG data from three Excel files, one per brain state (Awake, Non-REM, and REM). Each file contains four sheets: ShortWT, LongWT, ShortKO, and LongKO.

2. **Data Cleaning and Merging**: Sheets with the same name are concatenated across the three brain states, and the columns 'Electrode_Distance', 'Genotype', and 'Brain_State' are added so that every row is labeled.

3. **Data Consolidation**: All sheets from all brain states are combined into a single DataFrame, the unnamed index column read from Excel is dropped, and the result is saved as a .csv file.

4. **Data Visualization**: For each brain state (Awake, Non-REM, and REM) and electrode distance, box plots and bar plots summarize the imaginary coherence values within frequency bins (1-10, 10-20, 20-30, 30-40, and 40-60), and line plots show them across individual frequencies, separately for each genotype (Wild Type and Knockout). Plots are saved as .png files.

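5. **Statistical Analysis**: Within each electrode distance, the Shapiro-Wilk test checks the normality of each genotype-by-brain-state group, Levene's test checks the equality of variances across these groups, and the Mann-Whitney U test compares Wild Type and Knockout within each brain state. The results are saved in separate .csv files.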
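The Excel files appear to hold precomputed imaginary coherence values (their names end in `_imag.xlsx`), and the notebook does not show how those values were computed. For orientation, the sketch below computes the standard quantity, the imaginary part of coherency, $\mathrm{Im}(S_{xy}) / \sqrt{S_{xx} S_{yy}}$, from Welch spectral estimates in SciPy. The helper name `imaginary_coherence`, the sampling rate, the segment length, and the synthetic signals are illustrative assumptions, not the settings used to produce the data.

```python
import numpy as np
from scipy.signal import csd, welch


def imaginary_coherence(x, y, fs, nperseg=256):
    """Hypothetical helper: imaginary part of coherency, Im(Sxy) / sqrt(Sxx * Syy)."""
    freqs, sxy = csd(x, y, fs=fs, nperseg=nperseg)  # complex cross-spectral density
    _, sxx = welch(x, fs=fs, nperseg=nperseg)       # auto-spectrum of x
    _, syy = welch(y, fs=fs, nperseg=nperseg)       # auto-spectrum of y
    return freqs, np.imag(sxy) / np.sqrt(sxx * syy)


# Synthetic example (assumed parameters): y carries the 10 Hz rhythm of x
# with a quarter-cycle phase lag, and both channels have independent noise
fs = 256.0
rng = np.random.default_rng(0)
t = np.arange(0, 60, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + rng.normal(scale=1.0, size=t.size)
y = np.sin(2 * np.pi * 10 * t - np.pi / 2) + rng.normal(scale=1.0, size=t.size)

freqs, imcoh = imaginary_coherence(x, y, fs)
peak = freqs[np.argmax(np.abs(imcoh))]
print(f"Largest |imaginary coherence| at {peak:.1f} Hz")
```

Volume conduction spreads a single source to both electrodes with zero phase lag, so it only contributes to the real part of coherency; the imaginary part responds only to lagged interactions, which is why it is used for connectivity estimates like the ones analyzed here.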

Below is the complete Python code:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols
import itertools
from statsmodels.stats.multicomp import MultiComparison
from statsmodels.stats.multicomp import pairwise_tukeyhsd


# Paths to the Excel files (one file per brain state)
awake_file = '/Users/valentinreateguirangel/Python/Imaginary coherence analysis /Awake/z_Individual_Coh_Wake_3.0_2023_03_15_11_12_47_imag.xlsx'
nonrem_file = '/Users/valentinreateguirangel/Python/Imaginary coherence analysis /NonRem/z_Individual_Coh_NonREM_3.0_2023_03_14_16_50_31_imag.xlsx'
rem_file = '/Users/valentinreateguirangel/Python/Imaginary coherence analysis /Rem/z_Individual_Coh_REM_3.0_2023_03_22_11_59_07_imag.xlsx'

# Each file has one sheet per electrode distance and genotype combination
sheet_names = ['ShortWT', 'LongWT', 'ShortKO', 'LongKO']

# Read the Excel files; each call returns a dict mapping sheet name -> DataFrame
awake_data = pd.read_excel(awake_file, sheet_name=sheet_names)
nonrem_data = pd.read_excel(nonrem_file, sheet_name=sheet_names)
rem_data = pd.read_excel(rem_file, sheet_name=sheet_names)

# Combine the data from the same sheet names across the files. Each block is
# tagged with its brain state before concatenation, so the labels stay correct
# even if the files contain different numbers of rows per sheet.
combined_data = {}
for sheet_name in sheet_names:
    combined_data[sheet_name] = pd.concat(
        [awake_data[sheet_name].assign(Brain_State='wake'),
         nonrem_data[sheet_name].assign(Brain_State='non-REM'),
         rem_data[sheet_name].assign(Brain_State='REM')],
        ignore_index=True,
    )

# Add columns for electrode distance (Short or Long) and genotype (WT or KO),
# parsed from sheet names such as 'ShortKO'
for sheet_name, data in combined_data.items():
    data['Electrode_Distance'] = sheet_name[:-2]
    data['Genotype'] = sheet_name[-2:]

# Combine all sheets into a single DataFrame and drop the unnamed first column
# of the Excel sheets (pandas labels it 'Unnamed: 0'); errors='ignore' keeps
# this from failing if the column is absent
all_data = pd.concat(combined_data.values(), ignore_index=True)
all_data = all_data.drop(columns=['Unnamed: 0'], errors='ignore')

# Save the combined data to a CSV file
all_data.to_csv('/Users/valentinreateguirangel/Python/Imaginary coherence analysis /combined_data.csv', index=False)
```
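To make the reshaping concrete, here is a toy version of the same labeling and reshaping steps. It assumes, as the `melt` call in the plotting cell implies, that each sheet holds a `Freqs` column followed by one coherence column per rat; the helper `label_sheet`, the rat names, and the values are hypothetical.

```python
import pandas as pd


def label_sheet(df, sheet_name, brain_state):
    """Hypothetical helper: attach distance/genotype parsed from a name like 'ShortKO'."""
    return df.assign(
        Electrode_Distance=sheet_name[:-2],  # 'Short' or 'Long'
        Genotype=sheet_name[-2:],            # 'WT' or 'KO'
        Brain_State=brain_state,
    )


# Made-up sheets: rows are frequencies, columns are individual rats
short_wt_wake = pd.DataFrame({'Freqs': [1, 2], 'rat_A': [0.10, 0.12], 'rat_B': [0.08, 0.09]})
short_wt_rem = pd.DataFrame({'Freqs': [1, 2], 'rat_A': [0.20, 0.22], 'rat_B': [0.18, 0.19]})

combined = pd.concat(
    [label_sheet(short_wt_wake, 'ShortWT', 'wake'),
     label_sheet(short_wt_rem, 'ShortWT', 'REM')],
    ignore_index=True,
)

# Long format: one row per frequency, rat, distance, genotype and brain state
long_format = combined.melt(
    id_vars=['Freqs', 'Electrode_Distance', 'Genotype', 'Brain_State'],
    var_name='Rat', value_name='Coherence',
)
print(long_format)
```

Two sheets with two frequencies and two rats each give eight long-format rows, which is the shape the plotting and statistics cells work with.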

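```python
########### Plotting ###########

sns.set_theme(style="whitegrid")

def save_and_close_figure(filename):
    plt.tight_layout()
    plt.savefig(filename)
    plt.close()

def add_frequency_bins(data_mean):
    # Left-closed frequency bins: [1, 10), [10, 20), [20, 30), [30, 40), [40, 60)
    bins = [1, 10, 20, 30, 40, 60]
    labels = ['1-10', '10-20', '20-30', '30-40', '40-60']
    data_mean['Freq_Bin'] = pd.cut(data_mean['Freqs'], bins=bins, labels=labels, right=False)
    return data_mean

def create_box_plot(data_mean, electrode_distance, brain_state):
    plt.figure()
    sns.boxplot(data=data_mean, x='Freq_Bin', y='Coherence', hue='Genotype', hue_order=['WT', 'KO'])
    plt.title(f'Box Plot of Imaginary Coherence Values across Frequency Bins - {electrode_distance} - {brain_state}')
    plt.xlabel('Frequency Bin')
    plt.ylabel('Coherence')
    save_and_close_figure(f"Boxplot_{electrode_distance}_{brain_state}.png")

def create_bar_plot(data_mean, electrode_distance, brain_state):
    # Assumed layout mirroring the box plot: mean coherence per frequency bin
    # and genotype, with seaborn's default error bars (95% confidence interval)
    plt.figure()
    sns.barplot(data=data_mean, x='Freq_Bin', y='Coherence', hue='Genotype', hue_order=['WT', 'KO'])
    plt.title(f'Mean Imaginary Coherence Values across Frequency Bins - {electrode_distance} - {brain_state}')
    plt.xlabel('Frequency Bin')
    plt.ylabel('Coherence')
    save_and_close_figure(f"Barplot_{electrode_distance}_{brain_state}.png")

def create_plots(data):
    # Work on a copy so the upper-casing below does not modify the caller's DataFrame
    data = data.copy()
    data['Brain_State'] = data['Brain_State'].str.upper()
    id_cols = ['Freqs', 'Electrode_Distance', 'Genotype', 'Brain_State']

    for brain_state in data['Brain_State'].unique():
        for electrode_distance in data['Electrode_Distance'].unique():
            data_subset = data[(data['Electrode_Distance'] == electrode_distance) & (data['Brain_State'] == brain_state)]

            # Long format: one row per frequency and rat. Rat columns absent
            # from a given sheet are NaN after concatenation, so drop them.
            data_melted = data_subset.melt(id_vars=id_cols, var_name='Rat', value_name='Coherence').dropna(subset=['Coherence'])
            data_mean = data_melted.groupby(['Freqs', 'Genotype', 'Rat'])['Coherence'].mean().reset_index()
            data_mean = add_frequency_bins(data_mean)

            # Box plot
            create_box_plot(data_mean, electrode_distance, brain_state)

            # Line plot of the mean across rats (errorbar=None, seaborn >= 0.12, draws no confidence band)
            plt.figure()
            sns.lineplot(data=data_mean, x='Freqs', y='Coherence', hue='Genotype', errorbar=None, palette={'WT': 'blue', 'KO': 'orange'})
            plt.title(f'Imaginary Coherence Values across Frequencies - {electrode_distance} - {brain_state}')
            plt.xlabel('Frequency')
            plt.ylabel('Coherence')
            save_and_close_figure(f"Lineplot_{electrode_distance}_{brain_state}.png")

            # Bar plot
            create_bar_plot(data_mean, electrode_distance, brain_state)

# all_data is the combined DataFrame built in the first cell
create_plots(all_data)
```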
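The box and bar plots group frequencies with `pd.cut(..., right=False)`, so each bin includes its left edge and excludes its right edge: 10 falls in '10-20', not '1-10', and frequencies below 1 or at 60 and above get no bin (NaN) and are left out of the binned plots. A quick check with a few illustrative frequencies:

```python
import pandas as pd

bins = [1, 10, 20, 30, 40, 60]
labels = ['1-10', '10-20', '20-30', '30-40', '40-60']
freqs = pd.Series([0.5, 1, 9.9, 10, 59.9, 60])  # illustrative values only

# right=False makes each interval closed on the left: [1, 10), [10, 20), ...
binned = pd.cut(freqs, bins=bins, labels=labels, right=False)
print(pd.DataFrame({'Freqs': freqs, 'Freq_Bin': binned}))
```

If the sheets contain frequencies at exactly 60 or below 1, passing `include_lowest`/adjusted edges to `pd.cut` would be needed to keep them.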

```python
########### Statistical analysis by electrode distance and genotype ###########
def perform_shapiro_test(data, electrode_distances):
    # Shapiro-Wilk normality test for each Genotype_BrainState group
    shapiro_results = []
    for electrode_distance in electrode_distances:
        data_subset = data[data['Electrode_Distance'] == electrode_distance]
        groups = data_subset['Genotype'] + "_" + data_subset['Brain_State']

        for group in groups.unique():
            group_data = data_subset.loc[groups == group, 'Coherence']
            shapiro_test = stats.shapiro(group_data)
            shapiro_results.append([electrode_distance, group, shapiro_test.statistic, shapiro_test.pvalue])

    shapiro_df = pd.DataFrame(shapiro_results, columns=['Electrode_Distance', 'Group', 'Statistic', 'p-value'])
    shapiro_df.to_csv('shapiro_test_results.csv', index=False)
    return shapiro_df

def perform_mann_whitney_u_test(data, electrode_distances):
    # Two-sided Mann-Whitney U test between groups that share a brain state
    mw_results = []
    for electrode_distance in electrode_distances:
        data_subset = data[data['Electrode_Distance'] == electrode_distance]
        groups = data_subset['Genotype'] + "_" + data_subset['Brain_State']
        group_combinations = list(itertools.combinations(groups.unique(), 2))

        for group1, group2 in group_combinations:
            if group1.split('_')[1] == group2.split('_')[1]:  # Only compare the same brain states
                group1_data = data_subset.loc[groups == group1, 'Coherence']
                group2_data = data_subset.loc[groups == group2, 'Coherence']
                mw_test = stats.mannwhitneyu(group1_data, group2_data, alternative='two-sided')
                mw_results.append([electrode_distance, group1, group2, mw_test.statistic, mw_test.pvalue])

    mw_df = pd.DataFrame(mw_results, columns=['Electrode_Distance', 'Group1', 'Group2', 'Statistic', 'p-value'])
    mw_df.to_csv('mann_whitney_u_test_results.csv', index=False)
    return mw_df

def perform_levenes_test_for_groups(data, electrode_distances):
    # Levene's test for equal variances across all Genotype_BrainState groups
    levenes_results = []
    for electrode_distance in electrode_distances:
        data_subset = data[data['Electrode_Distance'] == electrode_distance]
        groups = data_subset['Genotype'] + "_" + data_subset['Brain_State']
        levene_test = stats.levene(*[data_subset.loc[groups == group, 'Coherence'] for group in groups.unique()])
        levenes_results.append([electrode_distance, levene_test.statistic, levene_test.pvalue])

    levenes_df = pd.DataFrame(levenes_results, columns=['Electrode_Distance', 'Statistic', 'p-value'])
    levenes_df.to_csv('levenes_test_results.csv', index=False)
    return levenes_df

# Assumed structure of mean_data: long format with one coherence value per
# frequency, rat, electrode distance, genotype and brain state, built from
# all_data the same way data_mean is built in the plotting cell
id_cols = ['Freqs', 'Electrode_Distance', 'Genotype', 'Brain_State']
mean_data = (
    all_data.melt(id_vars=id_cols, var_name='Rat', value_name='Coherence')
    .dropna(subset=['Coherence'])
    .groupby(id_cols + ['Rat'], as_index=False)['Coherence']
    .mean()
)

# Perform Shapiro-Wilk test for each group and electrode distance and save the results in a table
shapiro_df = perform_shapiro_test(mean_data, mean_data['Electrode_Distance'].unique())

# Perform Levene's test for specified groups and save the results in a table
levenes_df = perform_levenes_test_for_groups(mean_data, mean_data['Electrode_Distance'].unique())

# Perform Mann-Whitney U test for each electrode distance and save the results in a table
mw_df = perform_mann_whitney_u_test(mean_data, mean_data['Electrode_Distance'].unique())
```
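The Mann-Whitney function builds group labels of the form `Genotype_BrainState` and keeps only pairs that share a brain state, so with two genotypes and three brain states each electrode distance yields three WT-versus-KO comparisons (six in total). The pair filter in isolation, with the label lists written out by hand for illustration:

```python
import itertools

genotypes = ['WT', 'KO']
brain_states = ['wake', 'non-REM', 'REM']  # labels must not contain '_'
groups = [f"{g}_{s}" for s in brain_states for g in genotypes]

# Keep only pairs whose brain-state part (after the underscore) matches
same_state_pairs = [
    (a, b) for a, b in itertools.combinations(groups, 2)
    if a.split('_')[1] == b.split('_')[1]
]
print(same_state_pairs)
```

The split on `'_'` works because 'non-REM' uses a hyphen; a brain-state label containing an underscore would be cut in two and silently skipped.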

# Conclusion

In this study, we investigated the influence of genotype and brain state on imaginary coherence values, aiming to better understand the underlying neural mechanisms in GRIN2B-related disorders. The data were divided into wake, non-REM, and REM brain states and further categorized by electrode distance and genotype into Short distance Wild Type (ShortWT), Long distance Wild Type (LongWT), Short distance Knockout (ShortKO), and Long distance Knockout (LongKO).

Using box plots, line plots, and bar plots, we looked for differences in imaginary coherence between Knockout (KO) and Wild Type (WT) rats across brain states and electrode distances.

The results (summarized in Table 1) indicate that imaginary coherence values differ with both genotype and electrode distance, so both factors should be considered in future studies of neural connectivity and communication.

The Shapiro-Wilk test indicated that most groups were not normally distributed (most p-values < 0.05), so we used non-parametric tests. Levene's test indicated unequal variances across the genotype-by-brain-state groups at both the Long and the Short electrode distance (p < 0.05 in each case). We therefore used the Mann-Whitney U test, which does not assume normality, to compare the coherence distributions of WT and KO within each brain state and electrode distance.
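The reasoning in this paragraph (check normality, check variances, then choose a test) can be written as a small decision helper. This is a sketch on synthetic data, not the notebook's code: the function name `compare_groups`, the 0.05 threshold, and the switch to a Student or Welch t-test when both groups look normal are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def compare_groups(a, b, alpha=0.05):
    """Hypothetical helper: pick a two-sample test from Shapiro-Wilk and Levene checks."""
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    equal_var = stats.levene(a, b).pvalue > alpha
    if normal:
        # Student's t-test if variances look equal, Welch's t-test otherwise
        result = stats.ttest_ind(a, b, equal_var=equal_var)
        name = "Student t" if equal_var else "Welch t"
    else:
        # Rank-based test that does not assume normality
        result = stats.mannwhitneyu(a, b, alternative='two-sided')
        name = "Mann-Whitney U"
    return name, result.statistic, result.pvalue


# Synthetic, right-skewed samples standing in for WT and KO coherence values
rng = np.random.default_rng(1)
wt = rng.lognormal(mean=-2.0, sigma=0.6, size=200)
ko = rng.lognormal(mean=-1.7, sigma=0.9, size=200)
print(compare_groups(wt, ko))
```

With strongly skewed samples of this size, Shapiro-Wilk rejects normality and the helper falls through to the Mann-Whitney U test, mirroring the path described above.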

Every WT-versus-KO comparison was statistically significant (Mann-Whitney U test: p-values ranging from 1.77e-69 to 5.32e-05). This indicates that imaginary coherence differs between KO and WT rats in every brain state and at both the long and the short electrode distance.
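The six Mann-Whitney comparisons (three brain states at each of two electrode distances) are reported without a multiple-comparison correction. As an arithmetic check using only the two extreme p-values quoted above, a Bonferroni adjustment (multiply each p-value by the number of tests, capped at 1) keeps even the largest one far below 0.05; the Holm correction is never more conservative than Bonferroni, so it would agree. The helper `bonferroni` is hypothetical; on the full `mw_df['p-value']` column, statsmodels' `multipletests` (for example with `method='holm'`) does the same job.

```python
def bonferroni(pvalues, n_tests):
    """Hypothetical helper: Bonferroni-adjusted p-values, min(p * n_tests, 1)."""
    return [min(p * n_tests, 1.0) for p in pvalues]


# Largest and smallest p-values quoted in the text; the other four are not listed
reported = [5.32e-05, 1.77e-69]
adjusted = bonferroni(reported, n_tests=6)
print(adjusted)
print(all(p < 0.05 for p in adjusted))
```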

In conclusion, this study highlights the impact of genotype on imaginary coherence values across brain states, with KO and WT rats differing in every brain state and at both electrode distances. These findings open avenues for more detailed investigations in the future.
