# Analysis of Aiforia Phosphorylated $\alpha$-Synuclein Raw Data

## Part 0 - Outline
This code handles the automatic processing of raw data from mouse brains analyzed using the Aiforia model “Lewy (PSYN) Pathology Detector”. The data are stored in a local folder whose path is specified in Part 1.2. We typically start with CSV files. To automatically change the format of multiple files, please refer to the notebook Change_Name_Format_Input_Data.ipynb.
This notebook is organized into three sections:

**1) Define Functions for Part 2 & 3**

**2) Automatic Analysis of X * N Slides Across N Brains**
In this section, we automate the analysis of all X*N slides corresponding to the N brains in the raw data folder (X slides per brain, set by the parameter `amount_of_slides` in Part 1.2). The approach is as follows:

1) All X*N filenames are collected from the folder and stored in a list.

2) From this list, we extract the N filenames containing '_S1', corresponding to the first slide of each brain.

3) We loop over these N '_S1' slides and perform the following steps for each brain:

    a) We retrieve the second slide ('_S2' in the filename) belonging to this brain, and likewise the third ('_S3'), fourth ('_S4'), ..., X-th ('_SX') slides of this brain. \
    b) We perform the analysis steps on each slide individually (S1, S2, …, SX) and on the combined dataset (S1+S2+…+SX). \
    c) We export the results to an Excel file for this specific brain.
    
After each iteration, the individual brain’s results are added to a summary table containing data for all brains. Once all brains are processed, this overview table is also exported to Excel.
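Step 3a amounts to deriving the S2 ... SX paths from each S1 path. A minimal sketch, assuming the file name (without extension) ends in '_S1', e.g. the made-up name `Brain07_S1.csv`, and that slides of the same brain differ only in that suffix. `sibling_slide_paths` is a hypothetical helper, not necessarily how the Part 2 loop is written:

```python
import os


def sibling_slide_paths(s1_path: str, amount_of_slides: int) -> list[str]:
    """Hypothetical helper: expected paths of slides S1..SX for the brain of `s1_path`.

    Assumes the file name (without extension) ends in '_S1' and that the other
    slides of the same brain differ only in that suffix.
    """
    folder, filename = os.path.split(s1_path)
    stem, extension = os.path.splitext(filename)
    if not stem.endswith('_S1'):
        raise ValueError(f"Expected a file name ending in '_S1', got {filename!r}")
    brain_id = stem[:-len('_S1')]  # e.g. 'Brain07'
    return [
        os.path.join(folder, f"{brain_id}_S{i}{extension}")
        for i in range(1, amount_of_slides + 1)
    ]


slide_paths = sibling_slide_paths(os.path.join("Raw_data_PSYN", "Brain07_S1.csv"), amount_of_slides=3)
for path in slide_paths:
    print(path)
```

The suffixes generated here are the same ones as in `appendices_list` (Part 1.2). A returned path for a slide that does not exist is not fatal in this workflow: `dataframe_cleaning` creates an empty data file for a missing slide so the loop can run unchanged.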

**3) Automatic Analysis of X * N Slides Across N Brains After Determining the (Un)Injected Areas**

In this section, we extend the analysis by identifying which brain regions are on the injected versus non-injected side. The comparison between both sides follows the same analysis workflow as in Section 2.

## Part 1 - Define the necessary functions

### Part 1.1 - Load all necessary Python packages

In [None]:
import functools
import glob
import math
import os

import pandas as pd
from IPython.display import display

# Pandas display options
pd.options.display.float_format = '{:.2f}'.format

### Part 1.2 - Data Locations

**TO DO:**

Specify the following settings and paths before running the analysis:

1) **Raw data format:** choose the file format of the raw data ('excel', 'csv' or 'feather').
2) **Some experimental parameters:** spacing between sections and section thickness.
3) **Raw data folder:** the folder containing the original data files exported from Aiforia.
4) **Results folders:** the folders where the Excel files with processed results will be saved.
5) **Region mapping files:** the location of the Excel files that specify which brain regions must be replaced and which brain regions lie on the injected side.

Use the following format to define each path: <font color='darkred'>r'file_location'</font>
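The `r` prefix matters because Windows paths contain backslashes, and in an ordinary Python string literal sequences such as `\t` and `\n` are turned into a tab and a newline. A small illustration with a made-up folder name:

```python
# Made-up Windows path written twice: once as a normal literal, once as a raw literal
plain = 'C:\temp\new_data'   # '\t' becomes a TAB character, '\n' becomes a newline
raw = r'C:\temp\new_data'    # backslashes are kept literally

print(repr(plain))
print(repr(raw))
```

Forward slashes (e.g. `'C:/temp/new_data'`) are also accepted for file paths by Python on Windows.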

In [None]:
# Specify what data format you want to use for your raw data: excel, csv or feather. Do this by uncommenting the data_format that you want.
data_format = 'csv'
# data_format = 'excel'
# data_format = 'feather'

# Specify the maximum number of slides per brain. For example, if set to 4, filenames should include '_S1', '_S2', '_S3', and '_S4'. 
# If some brains have fewer slides, the code will generate empty data files for the missing ones to ensure proper execution.
amount_of_slides = 2

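# Specify the experimental parameters:
# - spacing: spacing between the analysed sections (inclusion counts are multiplied by this value to extrapolate)
# - section_thickness: section thickness in micrometers (µm)
spacing = 12
section_thickness = 40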

# Specify folder locations
folder_raw_data = r'C:\Users\Dieter\...\Raw_data_PSYN'
folder_output_results = r'C:\Users\Dieter\...\Output_Results_PSYN'
folder_output_results_injected = r'C:\Users\Dieter\...\Output_Results_Injected_PSYN'
file_brainregions_to_replace =  r'C:\Users\Dieter\...\Brainregions_To_Replace_PSYN.xlsx'
file_brainregions_injected =  r'C:\Users\Dieter\...\Brainregions_Hemisphere_PSYN.xlsx'

In [None]:
# Create the output folders if they do not exist yet (missing parent folders are created too)
os.makedirs(folder_output_results, exist_ok=True)
os.makedirs(folder_output_results_injected, exist_ok=True)

# Create a list of expected slide suffixes. For example, if amount_of_slides = 4, then appendices_list = ['_S1', '_S2', '_S3', '_S4'].
appendices_list = [f"_S{i}" for i in range(1, amount_of_slides + 1)]

### Part 1.3 - Function to Load All Image Files for Analysis

In [None]:
def load_all_file_locations_S1(folder_raw_data: str, data_format: str) -> list[str]:
    """
    Retrieve and list all file locations for S1 images in the specified raw data folder.

    This function searches for all raw data files (based on the specified data format) 
    within the given folder and filters those containing '_S1' in their filename. 
    These represent the first slides (S1) of each brain. 
    If '_S1'/'_S2' naming is not used, '_S1' should be appended manually to filenames.

    Parameters
    ----------
    folder_raw_data : str
        Path to the folder containing raw data files.
    data_format : str
        Format of the raw data files ('excel', 'csv', or 'feather').

    Returns
    -------
    all_raw_data_file_locations_S1: list[str]
        Sorted list of full file paths for S1 images.
    """

    # Determine the file pattern based on the data format
    if data_format == 'excel':
        pattern = "*.xlsx"
    elif data_format == 'csv':
        pattern = "*.csv"
    elif data_format == 'feather':
        pattern = "*.feather"
    else:
        raise ValueError(
            "Invalid data format specified. Please set 'data_format' to 'excel', 'csv', or 'feather'."
        )

    # Retrieve and sort all matching raw data files
    all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, pattern))
    all_raw_data_file_locations.sort()

    print("All raw data file locations:")
    for file_location in all_raw_data_file_locations:
        print(f" - {file_location}")

    # Keep files whose file name (not folder path) contains '_S1' (first slide of each brain)
    all_raw_data_file_locations_S1 = [path for path in all_raw_data_file_locations
                                      if '_S1' in os.path.basename(path)]

    print("\nRaw data file locations for S1 images:")
    for file_location_S1 in all_raw_data_file_locations_S1:
        print(f" - {file_location_S1}")

    return all_raw_data_file_locations_S1

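Filtering with a plain substring test such as `'_S1' in path` has two pitfalls: applied to the full path, it also matches every file in a folder whose name contains `_S1`, and it matches `_S10`, `_S11`, ... once ten or more slides per brain are used. A sketch of a stricter test on the file name only, assuming the suffix is never followed by another digit; `is_first_slide` is a hypothetical helper and the paths are made up:

```python
import os
import re

# '_S1' not followed by another digit, so '_S10', '_S11', ... are rejected
FIRST_SLIDE_SUFFIX = re.compile(r"_S1(?!\d)")


def is_first_slide(path: str) -> bool:
    """Hypothetical stricter test: look for '_S1' in the file name only, not in the folder."""
    return FIRST_SLIDE_SUFFIX.search(os.path.basename(path)) is not None


# Made-up paths; note that the folder name itself contains '_S1'
paths = [
    os.path.join("Raw_S1_batch", "Brain01_S1.csv"),
    os.path.join("Raw_S1_batch", "Brain01_S2.csv"),
    os.path.join("Raw_S1_batch", "Brain01_S10.csv"),
]

substring_hits = [p for p in paths if '_S1' in p]      # plain substring test on the full path
strict_hits = [p for p in paths if is_first_slide(p)]  # stricter test on the file name
print(substring_hits)
print(strict_hits)
```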
### Part 1.4 - Function to Load the Brain Region Correction File

In [None]:
def load_data_brainregions_to_replace(file_brainregions_to_replace: str) -> pd.DataFrame:
    """
    Load and clean the file containing corrections for brain regions that need to be replaced for each image.

    Parameters
    ----------
    file_brainregions_to_replace : str
        Path to the Excel file with columns: 'Image', 'Brainregion_Wrong', 'Brainregion_Correct'.

    Returns
    -------
    pd.DataFrame
        Cleaned DataFrame with brain regions to replace for each image. 
        All brain region names are stripped of spaces and converted to uppercase.
    """

    if not os.path.exists(file_brainregions_to_replace):
        raise FileNotFoundError(f"The specified file does not exist: {file_brainregions_to_replace}")

    # Load the relevant columns from the Excel file
    df = pd.read_excel(
        file_brainregions_to_replace,
        usecols=['Image', 'Brainregion_Wrong', 'Brainregion_Correct'],
        dtype={'Image': 'str', 'Brainregion_Wrong': 'str', 'Brainregion_Correct': 'str'}
    )

    # Clean the data in place
    df['Image'] = df['Image'].str.strip()
    df['Brainregion_Wrong'] = df['Brainregion_Wrong'].str.upper().str.strip()
    df['Brainregion_Correct'] = df['Brainregion_Correct'].str.upper().str.strip()

    print("The modified table of brain regions to replace for each image:")
    display(df)
    
    return df

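Downstream, in `dataframe_cleaning` (Part 1.6), the correction table is filtered to the current image, turned into a `{Brainregion_Wrong: Brainregion_Correct}` dictionary and applied with exact-match `replace`; trailing object numbers are then stripped to obtain merged region names. A toy example with made-up image and region names (`dict(zip(...))` builds the same dictionary as the `pd.Series(...).to_dict()` call in that function):

```python
import pandas as pd

# Made-up correction table in the cleaned format returned by load_data_brainregions_to_replace
df_corrections = pd.DataFrame({
    'Image': ['Brain01_S1', 'Brain01_S1', 'Brain02_S1'],
    'Brainregion_Wrong': ['AMYGDALA 1', 'STRIATUM 9', 'CORTEX 2'],
    'Brainregion_Correct': ['STRIATUM 4', 'EMPTY', 'CORTEX 1'],
})

# Keep only the corrections for the current image
image_name = 'Brain01_S1'
rows = df_corrections[df_corrections['Image'] == image_name]
dict_replace = dict(zip(rows['Brainregion_Wrong'], rows['Brainregion_Correct']))
print(dict_replace)

# Exact, whole-value matching: 'AMYGDALA 10' is not touched by the 'AMYGDALA 1' rule
names = pd.Series(['AMYGDALA 1', 'AMYGDALA 10', 'STRIATUM 9', 'CORTEX 2'])
replaced = names.replace(dict_replace, regex=False)
print(replaced.tolist())

# Strip the trailing object number to obtain the merged region name
merged = replaced.str.replace(r'\d+$', '', regex=True).str.strip()
print(merged.tolist())
```

Mapping a region to 'EMPTY' marks it (and its descendants) for removal later in the cleaning step, and the rule for 'CORTEX 2' is ignored here because it belongs to another image.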

### Part 1.5 - Function to Load the File Indicating Which Brain Regions Were Injected

In [None]:
def load_data_brainregions_injected(file_brainregions_injected: str) -> pd.DataFrame:
    """
    Load and clean the file specifying which brain regions were on the injected side for each specific image.

    Parameters
    ----------
    file_brainregions_injected : str
        Path to the Excel file containing injected brain region information.

    Returns
    -------
    pd.DataFrame
        Cleaned DataFrame with columns:
        - 'Image': image identifier
        - 'Brainregion': uppercase, stripped brain region name
        - 'Daughter1_Injected': uppercase, stripped value of the original 'Hemisphere' column
    """
    if not os.path.exists(file_brainregions_injected):
        raise FileNotFoundError(f"The specified file does not exist: {file_brainregions_injected}")

    # Load relevant columns from the Excel file
    df = pd.read_excel(
        file_brainregions_injected,
        usecols=['Image', 'Brainregion', 'Hemisphere'],
        dtype={'Image': 'str', 'Brainregion': 'str', 'Hemisphere': 'str'}
    )

    # Clean the data in place
    df['Image'] = df['Image'].str.strip()
    df['Brainregion'] = df['Brainregion'].str.upper().str.strip()
    df['Daughter1_Injected'] = df['Hemisphere'].str.upper().str.strip()

    # Drop the original 'Hemisphere' column
    df.drop(columns=['Hemisphere'], inplace=True)

    print("The modified table of injected brain regions for each image:")
    display(df)
    
    return df

### Part 1.6 - Function to Load a DataFrame and Clean It
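The two shape metrics added at the end of `dataframe_cleaning` are computed from the object area $A$ (column 'Area (μm²)') and its perimeter $P$ (column 'Circumference (µm)'). For a circle of radius $r$, $A = \pi r^{2}$ and $P = 2\pi r$, which shows how both metrics are scaled:

$$
\text{Circularity} = \frac{4\pi A}{P^{2}} = \frac{4\pi \cdot \pi r^{2}}{(2\pi r)^{2}} = \frac{4\pi^{2} r^{2}}{4\pi^{2} r^{2}} = 1,
\qquad
\frac{A}{P} = \frac{\pi r^{2}}{2\pi r} = \frac{r}{2}\ \mu\text{m}.
$$

By the isoperimetric inequality $P^{2} \ge 4\pi A$, any other shape of the same area has a longer perimeter, so its circularity lies below 1; long, thin objects score close to 0.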

In [None]:
def dataframe_cleaning(file_location: str, df_brainregions_to_replace: pd.DataFrame, data_format: str) -> pd.DataFrame:
    """
    Load a data file from the given location and clean it using a brain regions replacement table.
    If the file does not exist, an empty file is created with the correct columns.
    Replaces incorrect brain regions, merges names by removing numbers, filters based on area thresholds,
    and calculates derived metrics like Area/Perimeter and Circularity.
    Returns a cleaned dataframe.

    Parameters
    ----------
    file_location : str
        Path to the raw data file for a single image.
    df_brainregions_to_replace : pd.DataFrame
        DataFrame specifying which brain regions need to be replaced for each image.
    data_format : str
        Format of the raw data file: 'excel', 'csv', or 'feather'.

    Returns
    -------
    pd.DataFrame
        Cleaned dataframe with replaced brain regions, merged name columns, and derived metrics.
    """

    # ------------------------
    # Define expected columns and dtypes
    # ------------------------
    columns = ['Image', 'Parent area name', 'Area/object name', 'Class label', 
               'Class confidence (%)', 'Area (μm²)', 'Circumference (µm)']
    dtypes = {
        'Image': 'str',
        'Parent area name': 'str',
        'Area/object name': 'str',
        'Class label': 'str',
        'Class confidence (%)': 'float64',
        'Area (μm²)': 'float64',
        'Circumference (µm)': 'float64'
    }

    # ------------------------
    # Load or create the data file
    # ------------------------
    try:
        if data_format == 'csv':
            df = pd.read_csv(file_location, sep='\t', usecols=columns, dtype=dtypes, keep_default_na=True)
        elif data_format == 'excel':
            df = pd.read_excel(file_location, usecols=columns, dtype=dtypes, keep_default_na=True)
        elif data_format == 'feather':
            df = pd.read_feather(file_location, columns=columns).astype(dtypes)
        else:
            raise ValueError("Invalid data format. Choose 'excel', 'csv', or 'feather'.")
    except FileNotFoundError:
        # Create empty file if missing
        df_empty = pd.DataFrame(columns=columns)
        df_empty.reset_index(inplace=True)
        if data_format == 'csv':
            df_empty.to_csv(file_location, sep='\t', index=False)
            df = pd.read_csv(file_location, sep='\t', usecols=columns, dtype=dtypes)
        elif data_format == 'excel':
            df_empty.to_excel(file_location, index=False)
            df = pd.read_excel(file_location, usecols=columns, dtype=dtypes)
        elif data_format == 'feather':
            df_empty.to_feather(file_location)
            df = pd.read_feather(file_location, columns=columns).astype(dtypes)
        print(f"\nA dataframe at location {file_location} did not exist, so an empty dataframe was created.")

    # ------------------------
    # Remove rows with missing essential values
    # ------------------------
    df.dropna(subset=['Area (μm²)', 'Area/object name'], inplace=True)

    # ------------------------
    # Standardize text columns
    # ------------------------
    df['Parent area name'] = df['Parent area name'].str.upper()
    df['Area/object name'] = df['Area/object name'].str.upper()
    df['Class label'] = df['Class label'].str.upper()

    # ------------------------
    # Extract image name from file path
    # ------------------------
    image_name = os.path.splitext(os.path.basename(file_location))[0]
    print('The present image =', image_name)

    # Overwrite the 'Image' column so that every row carries the image name derived from the file name
    df['Image'] = image_name

    print('The full raw data =')
    display(df)

    # ------------------------
    # Replace incorrect brain regions
    # ------------------------
    # Create the dictionary of brain regions that should be replaced for this specific image
    df_replacements = df_brainregions_to_replace[df_brainregions_to_replace['Image'] == image_name]
    dict_replace = pd.Series(df_replacements.Brainregion_Correct.values, index=df_replacements.Brainregion_Wrong).to_dict()
    print(f"The dictionary of brain regions to replace for {image_name} is:", dict_replace)

    # Apply replacements in both parent and Area/object name columns
    df['Parent area name'] = df['Parent area name'].replace(dict_replace, regex=False)
    df['Area/object name'] = df['Area/object name'].replace(dict_replace, regex=False)

    # ------------------------
    # Create 'Parent area name merged' and 'Area/object name merged' by stripping the trailing object number (e.g. 'STRIATUM 4' -> 'STRIATUM'):
    # ------------------------
    df['Parent area name merged'] = df['Parent area name'].str.replace(r'\d+$', '', regex=True).str.strip()
    df['Area/object name merged'] = df['Area/object name'].str.replace(r'\d+$', '', regex=True).str.strip()

    # ------------------------
    # Propagate EMPTY from parent to its descendants
    # ------------------------
    # Rows whose parent was replaced with 'EMPTY' are removed. Their own children (grandchildren of the
    # EMPTY object) are re-parented to 'EMPTY' so that they are removed as well. This single extra pass
    # reaches two levels below an EMPTY object; deeper descendants are not re-parented.
    df_empty_parent = df[df['Parent area name'] == 'EMPTY']
    list_of_area_objects_that_should_be_empty = df_empty_parent['Area/object name'].to_list()
    print('list_of_area_objects_that_should_be_empty =', list_of_area_objects_that_should_be_empty)

    df.loc[df['Parent area name'].isin(list_of_area_objects_that_should_be_empty), "Parent area name"] = "EMPTY"
    df = df[(df['Parent area name'] != 'EMPTY') & (df['Area/object name'] != 'EMPTY')]

    # ------------------------
    # Filter based on area thresholds for inclusions
    # ------------------------
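    # regex=False: the '|' in labels such as 'A | NEURITIC SEEDED INCLUSION' must be matched literally,
    # not read as regex alternation ('A ' OR ' NEURITIC SEEDED INCLUSION').
    name_merged = df['Area/object name merged']
    mask_exclude = (
        ((df['Area (μm²)'] > 600) & name_merged.str.contains('CELLULAR DIFFUSE INCLUSION', regex=False)) |
        ((df['Area (μm²)'] < 25) & name_merged.str.contains('CELLULAR DIFFUSE INCLUSION', regex=False)) |
        ((df['Area (μm²)'] < 4) & name_merged.str.contains('A | NEURITIC SEEDED INCLUSION', regex=False)) |
        ((df['Area (μm²)'] < 15) & name_merged.str.contains('CELLULAR SEEDED INCLUSION', regex=False))
    )
    # .copy() so the derived-metric columns below are added to an independent frame (no SettingWithCopyWarning)
    df = df[~mask_exclude].copy()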

    # ------------------------
    # Derived metrics
    # ------------------------
    df['Area/Perimeter (μm)'] = df['Area (μm²)'] / df['Circumference (µm)']
    df['Circularity'] = (4 * math.pi * df['Area (μm²)']) / (df['Circumference (µm)'] ** 2)

    print('The fully cleaned table with "Area/Perimeter", "Circularity" and so on:')
    display(df)

    return df

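The EMPTY propagation in `dataframe_cleaning` makes one extra pass: rows whose parent is EMPTY are dropped, and their own children are re-parented to EMPTY and dropped too. That reaches two levels below an EMPTY object, which covers everything under a replaced brain region, but not, for example, the O-type inclusions (three levels down) when a whole BRAIN TISSUE object is replaced by EMPTY. Such leftover rows are discarded by the inner merges in `make_hierarchy`, although they remain in the cleaned table itself. A sketch of an iterative version that repeats the pass until nothing changes, assuming 'Area/object name' values are unique within an image; `propagate_empty` is a hypothetical helper, `one_pass_empty` mirrors the notebook's step for comparison, and the rows are made up:

```python
import pandas as pd


def one_pass_empty(df: pd.DataFrame) -> pd.DataFrame:
    """Mirror of the propagation step in dataframe_cleaning (one extra level)."""
    df = df.copy()
    children = df.loc[df['Parent area name'] == 'EMPTY', 'Area/object name'].to_list()
    df.loc[df['Parent area name'].isin(children), 'Parent area name'] = 'EMPTY'
    return df[(df['Parent area name'] != 'EMPTY') & (df['Area/object name'] != 'EMPTY')]


def propagate_empty(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop EMPTY rows and all of their descendants, at any depth."""
    remove = (df['Area/object name'] == 'EMPTY') | (df['Parent area name'] == 'EMPTY')
    while True:
        removed_names = set(df.loc[remove, 'Area/object name'])
        # Also remove every row whose parent has already been removed
        extended = remove | df['Parent area name'].isin(removed_names)
        if extended.equals(remove):
            return df[~remove].copy()
        remove = extended


# Made-up rows: 'BRAIN TISSUE 2' was replaced by 'EMPTY' in both name columns
df = pd.DataFrame({
    'Parent area name': [None, None, 'BRAIN TISSUE 1', 'EMPTY', 'STRIATUM 2',
                         'A | CELLULAR SEEDED INCLUSION 1', 'STRIATUM 1'],
    'Area/object name': ['BRAIN TISSUE 1', 'EMPTY', 'STRIATUM 1', 'STRIATUM 2',
                         'A | CELLULAR SEEDED INCLUSION 1', 'O | CELLULAR DIFFUSE INCLUSION 1',
                         'A | CELLULAR SEEDED INCLUSION 2'],
})

print(one_pass_empty(df)['Area/object name'].to_list())
print(propagate_empty(df)['Area/object name'].to_list())
```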

### Part 1.7 - Function to Create Hierarchical Dataframes

The hierarchy: 

One type of **Parent:** BRAIN TISSUE 1, 2, 3, … \
Many types of **Daughter 1:** AMYGDALA 1, 2, 3, …, STRIATUM 1, 2, 3, …, ...  \
Three types of **Daughter 2:** A | CELLULAR DIFFUSE INCLUSION X, A | CELLULAR SEEDED INCLUSION X, A | NEURITIC SEEDED INCLUSION X \
Two types of **Daughter 3:** O | CELLULAR DIFFUSE INCLUSION X, O | CELLULAR SEEDED INCLUSION X 

Note: “X” denotes a number identifying the specific object.
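The hierarchy is stored as a flat parent/child list (each row names an object and its parent), and `make_hierarchy` walks it by merging the table with itself once per level. A stripped-down sketch with made-up rows that keeps only the name columns (the real function also carries areas and shape metrics and aggregates duplicated Daughter1 rows):

```python
import pandas as pd

# Made-up edge list: every row links an object to its parent
edges = pd.DataFrame({
    'Parent area name': [None, 'BRAIN TISSUE 1', 'STRIATUM 1', 'A | CELLULAR DIFFUSE INCLUSION 1'],
    'Area/object name': ['BRAIN TISSUE 1', 'STRIATUM 1', 'A | CELLULAR DIFFUSE INCLUSION 1',
                         'O | CELLULAR DIFFUSE INCLUSION 1'],
})

# Level 0: objects without a parent
hierarchy = (edges.loc[edges['Parent area name'].isna(), ['Area/object name']]
                  .rename(columns={'Area/object name': 'Parent'}))

# Each inner merge attaches the children of the deepest column found so far
deepest = 'Parent'
for depth in (1, 2, 3):
    child = f'Daughter{depth}'
    level_edges = edges.rename(columns={'Parent area name': deepest, 'Area/object name': child})
    hierarchy = hierarchy.merge(level_edges, on=deepest, how='inner')
    deepest = child

print(hierarchy.to_string(index=False))
```

Because every level is an inner merge, an A-type inclusion without O-type children appears at the Daughter2 level but has no row at the Daughter3 level, and the A- and O-type statistics (Parts 1.8 and 1.9) are read from these two different levels.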

In [None]:
def make_hierarchy(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """
    Constructs hierarchical relationships in a dataframe.
    The column 'Parent area name' is always the parent of the 'Area/object name' in the same row. 
    The 'Area (μm²)' value in each row always belongs to the 'Area/object name' of that row.

    The function builds four dataframes representing progressively deeper hierarchy levels:
      1. Top-level parents (rows without their own parent)
      2. Parent + first daughter
      3. Parent + first + second daughter
      4. Parent + first + second + third daughter

    Parameters:
        df (pd.DataFrame): Input dataframe with columns 
            ['Image', 'Parent area name', 'Area/object name', 'Area/object name merged',
             'Area (μm²)', 'Area/Perimeter (μm)', 'Circularity', 'Class label',
             'Parent area name merged']

    Returns:
        tuple: (df_parent, df_parent_daughter1, df_parent_daughter2, df_parent_daughter3)
            - df_parent: top-level parent areas
            - df_parent_daughter1: parent + first daughter
            - df_parent_daughter2: parent + first + second daughter
            - df_parent_daughter3: parent + first + second + third daughter
    """

    # Select top-level parents (rows without their own parent)
    df_parent = df[df['Parent area name'].isna()].copy()
    df_parent.rename(
        columns={
            'Area/object name': 'Parent name',
            'Area/object name merged': 'Parent name merged',
            'Area (μm²)': 'Area Parent (μm²)',
            'Area/Perimeter (μm)': 'Area/Perimeter Parent (μm)',
            'Circularity': 'Circularity Parent'
        },
        inplace=True
    )
    df_parent.drop(columns=['Parent area name', 'Class label', 'Parent area name merged',
                            'Area/Perimeter Parent (μm)', 'Circularity Parent'], inplace=True)

    # Merge to add first daughter (child of top-level parents)
    df_parent_daughter1 = df_parent.merge(
        df[['Parent area name', 'Area/object name', 'Area/object name merged', 'Area (μm²)']],
        left_on='Parent name', right_on='Parent area name', how='inner'
    )
    df_parent_daughter1.rename(
        columns={
            'Parent area name': 'Parent name copy',
            'Area/object name': 'Daughter1',
            'Area/object name merged': 'Daughter1 merged',
            'Area (μm²)': 'Area Daughter1 (μm²)'
        },
        inplace=True
    )

    # After the brain-region corrections of Part 1.6 there can be, for instance, two 'STRIATUM 4' rows in
    # df_parent_daughter1: the original one and one that came from renaming e.g. 'AMYGDALA 1' to 'STRIATUM 4'.
    # Aggregate them into a single row (summing their areas); otherwise every child of 'STRIATUM 4'
    # would be duplicated by the next join that builds df_parent_daughter2.
    # 'Daughter1' is the group key; with as_index=False it is re-inserted as a column automatically.
    df_parent_daughter1 = df_parent_daughter1.groupby('Daughter1', as_index=False).agg({
        'Image': 'first',
        'Parent name': 'first',
        'Area Parent (μm²)': 'first',
        'Parent name merged': 'first',
        'Parent name copy': 'first',
        'Daughter1 merged': 'first',
        'Area Daughter1 (μm²)': 'sum'
    })

    # Merge to add second daughter (child of Daughter1)
    df_parent_daughter2 = df_parent_daughter1.merge(
        df[['Parent area name', 'Area/object name', 'Area/object name merged', 'Area (μm²)',
            'Area/Perimeter (μm)', 'Circularity']],
        left_on='Daughter1', right_on='Parent area name', how='inner'
    )
    df_parent_daughter2.rename(
        columns={
            'Parent area name': 'Daughter1 copy',
            'Area/object name': 'Daughter2',
            'Area/object name merged': 'Daughter2 merged',
            'Area (μm²)': 'Area Daughter2 (μm²)',
            'Area/Perimeter (μm)': 'Area/Perimeter Daughter2 (μm)',
            'Circularity': 'Circularity Daughter2'
        },
        inplace=True
    )

    # Merge to add third daughter (child of Daughter2)
    df_parent_daughter3 = df_parent_daughter2.merge(
        df[['Parent area name', 'Area/object name', 'Area/object name merged', 'Area (μm²)',
            'Area/Perimeter (μm)', 'Circularity']],
        left_on='Daughter2', right_on='Parent area name', how='inner'
    )
    df_parent_daughter3.rename(
        columns={
            'Parent area name': 'Daughter2 copy',
            'Area/object name': 'Daughter3',
            'Area/object name merged': 'Daughter3 merged',
            'Area (μm²)': 'Area Daughter3 (μm²)',
            'Area/Perimeter (μm)': 'Area/Perimeter Daughter3 (μm)',
            'Circularity': 'Circularity Daughter3'
        },
        inplace=True
    )

    return df_parent, df_parent_daughter1, df_parent_daughter2, df_parent_daughter3

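The comment before the `groupby('Daughter1')` step in `make_hierarchy` notes that a duplicated Daughter1 name would multiply rows in the next join. A toy demonstration with made-up areas, assuming a correction renamed 'AMYGDALA 1' to 'STRIATUM 4' so that two 'STRIATUM 4' rows exist:

```python
import pandas as pd

# Two rows for STRIATUM 4 after a correction (the original one + the renamed AMYGDALA 1)
daughter1 = pd.DataFrame({
    'Daughter1': ['STRIATUM 4', 'STRIATUM 4'],
    'Area Daughter1 (μm²)': [1000.0, 250.0],
})
# Two inclusions whose parent is STRIATUM 4
inclusions = pd.DataFrame({
    'Parent area name': ['STRIATUM 4', 'STRIATUM 4'],
    'Area (μm²)': [30.0, 45.0],
})

# Without aggregation, every inclusion appears once per STRIATUM 4 row
doubled = daughter1.merge(inclusions, left_on='Daughter1', right_on='Parent area name')

# Aggregating first leaves one STRIATUM 4 row, so each inclusion appears once
deduplicated = daughter1.groupby('Daughter1', as_index=False).agg({'Area Daughter1 (μm²)': 'sum'})
fixed = deduplicated.merge(inclusions, left_on='Daughter1', right_on='Parent area name')

print(len(doubled), doubled['Area (μm²)'].sum())
print(len(fixed), fixed['Area (μm²)'].sum())
```

In the doubled version, the summed inclusion area of the region is inflated by the number of duplicate region rows.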

### Part 1.8 - Function to Calculate All Statistics Subdivided into the Three A Types

In [None]:
def all_calculations_per_a_type(
    df1: pd.DataFrame,
    df2: pd.DataFrame,
    groupby_column1: str = 'Area/object name merged',
    groupby_column2: str = 'Daughter1 merged'
) -> dict[str, pd.DataFrame]:
    """
    Compute summary statistics for all A-type inclusions across two hierarchical dataframes.

    A-type inclusions include:
        - 'A | CELLULAR DIFFUSE INCLUSION'
        - 'A | CELLULAR SEEDED INCLUSION'
        - 'A | NEURITIC SEEDED INCLUSION'

    Parameters:
        df1 : pd.DataFrame
            DataFrame containing at least 'Area/object name merged' and 'Area (μm²)' for region-level area aggregation.
        df2 : pd.DataFrame
            DataFrame containing at least 'Daughter1 merged', 'Daughter2 merged', 'Area Daughter2 (μm²)',
            'Area/Perimeter Daughter2 (μm)' and 'Circularity Daughter2'.
        groupby_column1 : str, optional
            Column name used to group region areas (default 'Area/object name merged').
        groupby_column2 : str, optional
            Column name used to group inclusion-level features (default 'Daughter1 merged').

    Returns:
        dict: {A-type string: dataframe with calculated statistics per region}

    Notes:
        Uses the global parameters `spacing` and `section_thickness` defined in Part 1.2.
    """

    # Total area of each merged region: sum the areas of all numbered objects in the cleaned dataframe
    # that share the same 'Area/object name merged',
    # e.g. total area of AMYGDALA = area AMYGDALA 1 + area AMYGDALA 2 + ... + area AMYGDALA 7.
    df_region_areas_merged = (
        df1.groupby(groupby_column1, dropna=False, as_index=False)['Area (μm²)']
        .sum()
        .rename(columns={groupby_column1: 'Merged area name', 'Area (μm²)': 'Total Region Area (μm²)'})
    )
    
    # List of A-types to calculate
    A_type_list = [
        'A | CELLULAR DIFFUSE INCLUSION',
        'A | CELLULAR SEEDED INCLUSION',
        'A | NEURITIC SEEDED INCLUSION'
    ]
    
    A_type_dictionary = {}

    for A_type in A_type_list:
        # Filter df2 for the current A-type
        df_A_type = df2[df2['Daughter2 merged'] == A_type]

        # Calculate the total Area (μm²), average Area (μm²), average Area/Perimeter (μm), average Circularity  
        # of the Inclusions belonging to each Daughter1 merged
        agg_dict = {
            'Area Daughter2 (μm²)': ['sum', 'mean'],
            'Area/Perimeter Daughter2 (μm)': 'mean',
            'Circularity Daughter2': 'mean'
        }
        df_stats = df_A_type.groupby(groupby_column2, as_index=False).agg(agg_dict)
        df_stats.columns = ['Merged area name',
                            A_type + ' Total Inclusion Area (μm²)',
                            A_type + ' Average Inclusion Area (μm²)',
                            A_type + ' Average Area/Perimeter (μm)',
                            A_type + ' Average Circularity']

        # Count the number of inclusions per region
        df_counts = df_A_type[groupby_column2].value_counts(sort=True) \
                            .rename_axis('Merged area name').reset_index(name=A_type + ' Counts')

        # Merge with region areas and counts
        df_merged_A_type = functools.reduce(
            lambda left, right: pd.merge(left, right, on='Merged area name', how='outer'),
            [df_region_areas_merged, df_stats, df_counts]
        )

        # Derived calculations
        df_merged_A_type[A_type + ' Percentage PSYN Positive Area'] = \
            100 * df_merged_A_type[A_type + ' Total Inclusion Area (μm²)'] / df_merged_A_type['Total Region Area (μm²)']
        df_merged_A_type[A_type + ' Extrapolated Inclusion Count'] = \
            df_merged_A_type[A_type + ' Counts'] * spacing
        df_merged_A_type[A_type + ' Inclusions/Region Area (per μm²)'] = \
            df_merged_A_type[A_type + ' Counts'] / df_merged_A_type['Total Region Area (μm²)']
        df_merged_A_type[A_type + ' Inclusions/Region Volume (per μm³)'] = \
            df_merged_A_type[A_type + ' Inclusions/Region Area (per μm²)'] / section_thickness
        df_merged_A_type[A_type + ' Inclusions/Region Area (mm²)'] = \
            df_merged_A_type[A_type + ' Inclusions/Region Area (per μm²)'] * 1e6
        df_merged_A_type[A_type + ' Inclusions/Region Volume (mm³)'] = \
            df_merged_A_type[A_type + ' Inclusions/Region Volume (per μm³)'] * 1e9

        # Drop intermediate per μm²/μm³ columns
        df_merged_A_type.drop(columns=[A_type + ' Inclusions/Region Area (per μm²)',
                                        A_type + ' Inclusions/Region Volume (per μm³)'], inplace=True)

        # Define the column order dynamically for the current A_type
        desired_order = [
            "Merged area name",
            "Total Region Area (μm²)",
            f"{A_type} Total Inclusion Area (μm²)",
            f"{A_type} Average Inclusion Area (μm²)",
            f"{A_type} Percentage PSYN Positive Area",
            f"{A_type} Counts",
            f"{A_type} Extrapolated Inclusion Count",
            f"{A_type} Inclusions/Region Area (mm²)",
            f"{A_type} Inclusions/Region Volume (mm³)",
            f"{A_type} Average Area/Perimeter (μm)",
            f"{A_type} Average Circularity",
        ]

        # Reorder columns 
        df_merged_A_type = df_merged_A_type[desired_order]

        # Display results
        print(f'The total {A_type} calculations of each {groupby_column2}')
        display(df_merged_A_type)

        # Save in dictionary
        A_type_dictionary[A_type] = df_merged_A_type

    return A_type_dictionary

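The derived columns of `all_calculations_per_a_type` (and of the O-type function) are plain arithmetic on counts and areas. A worked example with made-up numbers and the notebook's default `spacing = 12` and `section_thickness = 40` µm. Note that the code treats the sampled volume as region area × section thickness, and that multiplying by `spacing` scales the count up, presumably on the assumption that one in every `spacing` sections was analysed:

```python
# Parameters as set in Part 1.2
spacing = 12               # spacing between analysed sections
section_thickness = 40     # µm

# Made-up values for one inclusion type in one merged region
counts = 30                        # inclusions counted in the analysed sections
region_area_um2 = 2.0e6            # 'Total Region Area (μm²)' (= 2 mm²)
total_inclusion_area_um2 = 5.0e3   # summed inclusion area in μm²

percentage_positive = 100 * total_inclusion_area_um2 / region_area_um2
extrapolated_count = counts * spacing
per_mm2 = counts / region_area_um2 * 1e6                       # 1 mm² = 1e6 μm²
per_mm3 = counts / region_area_um2 / section_thickness * 1e9   # volume = area × thickness; 1 mm³ = 1e9 μm³

print(f"Percentage PSYN positive area: {percentage_positive:.2f} %")
print(f"Extrapolated inclusion count: {extrapolated_count}")
print(f"Inclusions per mm²: {per_mm2:.1f}")
print(f"Inclusions per mm³: {per_mm3:.1f}")
```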
### Part 1.9 - Function to Calculate All Statistics Subdivided into the Two O Types

In [None]:
def all_calculations_per_o_type(
    df1: pd.DataFrame,
    df2: pd.DataFrame,
    groupby_column1: str = 'Area/object name merged',
    groupby_column2: str = 'Daughter1 merged'
) -> dict[str, pd.DataFrame]:
    """
    Compute summary statistics for all O-type inclusions across two hierarchical dataframes.

    O-type inclusions include:
        - 'O | CELLULAR DIFFUSE INCLUSION'
        - 'O | CELLULAR SEEDED INCLUSIONS'

    Parameters:
        df1 : pd.DataFrame
            DataFrame containing at least 'Area/object name merged' and 'Area (μm²)' for region-level area aggregation.
        df2 : pd.DataFrame
            DataFrame containing at least 'Daughter1 merged', 'Daughter3 merged', 'Area Daughter3 (μm²)',
            'Area/Perimeter Daughter3 (μm)' and 'Circularity Daughter3'.
        groupby_column1 : str, optional
            Column name used to group region areas (default 'Area/object name merged').
        groupby_column2 : str, optional
            Column name used to group inclusion-level features (default 'Daughter1 merged').

    Returns:
        dict: {O-type string: dataframe with calculated statistics per region}

    Notes:
        Uses the global parameters `spacing` and `section_thickness` defined in Part 1.2.
    """

    # Total area of each merged region: sum the areas of all numbered objects in the cleaned dataframe
    # that share the same 'Area/object name merged',
    # e.g. total area of AMYGDALA = area AMYGDALA 1 + area AMYGDALA 2 + ... + area AMYGDALA 7.
    df_region_areas_merged = (
        df1.groupby(groupby_column1, dropna=False, as_index=False)['Area (μm²)']
        .sum()
        .rename(columns={groupby_column1: 'Merged area name', 'Area (μm²)': 'Total Region Area (μm²)'})
    )

    # List of O-types to calculate
    O_type_list = [
        'O | CELLULAR DIFFUSE INCLUSION',
        'O | CELLULAR SEEDED INCLUSIONS'
    ]

    O_type_dictionary = {}

    for O_type in O_type_list:
        # Filter df2 for the current O-type
        df_O_type = df2[df2['Daughter3 merged'] == O_type]

        # Calculate the total Area (μm²), average Area (μm²), average Area/Perimeter (μm), average Circularity
        # of the Inclusions belonging to each Daughter1 merged
        agg_dict = {
            'Area Daughter3 (μm²)': ['sum', 'mean'],
            'Area/Perimeter Daughter3 (μm)': 'mean',
            'Circularity Daughter3': 'mean'
        }
        df_stats = df_O_type.groupby(groupby_column2, as_index=False).agg(agg_dict)
        df_stats.columns = ['Merged area name',
                            O_type + ' Total Inclusion Area (μm²)',
                            O_type + ' Average Inclusion Area (μm²)',
                            O_type + ' Average Area/Perimeter (μm)',
                            O_type + ' Average Circularity']

        # Count the number of inclusions per region
        df_counts = df_O_type[groupby_column2].value_counts(sort=True) \
                        .rename_axis('Merged area name').reset_index(name=O_type + ' Counts')

        # Merge with region areas and counts
        df_merged_O_type = functools.reduce(
            lambda left, right: pd.merge(left, right, on='Merged area name', how='outer'),
            [df_region_areas_merged, df_stats, df_counts]
        )

        # Derived calculations
        df_merged_O_type[O_type + ' Percentage PSYN Positive Area'] = \
            100 * df_merged_O_type[O_type + ' Total Inclusion Area (μm²)'] / df_merged_O_type['Total Region Area (μm²)']
        df_merged_O_type[O_type + ' Extrapolated Inclusion Count'] = \
            df_merged_O_type[O_type + ' Counts'] * spacing
        df_merged_O_type[O_type + ' Inclusions/Region Area (per μm²)'] = \
            df_merged_O_type[O_type + ' Counts'] / df_merged_O_type['Total Region Area (μm²)']
        df_merged_O_type[O_type + ' Inclusions/Region Volume (per μm³)'] = \
            df_merged_O_type[O_type + ' Inclusions/Region Area (per μm²)'] / section_thickness
        df_merged_O_type[O_type + ' Inclusions/Region Area (mm²)'] = \
            df_merged_O_type[O_type + ' Inclusions/Region Area (per μm²)'] * 1e6
        df_merged_O_type[O_type + ' Inclusions/Region Volume (mm³)'] = \
            df_merged_O_type[O_type + ' Inclusions/Region Volume (per μm³)'] * 1e9

        # Drop intermediate per μm²/μm³ columns
        df_merged_O_type.drop(columns=[O_type + ' Inclusions/Region Area (per μm²)',
                                       O_type + ' Inclusions/Region Volume (per μm³)'], inplace=True)
        
        # Define the column order dynamically for the current O_type
        desired_order = [
            "Merged area name",
            "Total Region Area (μm²)",
            f"{O_type} Total Inclusion Area (μm²)",
            f"{O_type} Average Inclusion Area (μm²)",
            f"{O_type} Percentage PSYN Positive Area",
            f"{O_type} Counts",
            f"{O_type} Extrapolated Inclusion Count",
            f"{O_type} Inclusions/Region Area (mm²)",
            f"{O_type} Inclusions/Region Volume (mm³)",
            f"{O_type} Average Area/Perimeter (μm)",
            f"{O_type} Average Circularity",
        ]

        # Reorder columns 
        df_merged_O_type = df_merged_O_type[desired_order]

        # Display results
        print(f'The total {O_type} calculations of each {groupby_column2}')
        display(df_merged_O_type)

        # Save in dictionary
        O_type_dictionary[O_type] = df_merged_O_type

    return O_type_dictionary
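
The derived columns above form a chain of unit conversions. The sketch below recomputes them for a single made-up region as a worked check. The values of `spacing` and `section_thickness` (every 6th section, 40 μm thick) and all measurements are illustrative assumptions, not values from this study, and the `O_type` prefix of the real column names is omitted.

```python
import pandas as pd

# Illustrative parameters (assumed values, not taken from this study)
spacing = 6               # assumed: every 6th section was analysed
section_thickness = 40    # assumed section thickness in μm

# One made-up region; the O_type prefix of the real column names is omitted here
df = pd.DataFrame({
    'Merged area name': ['Region A'],
    'Total Region Area (μm²)': [2_000_000.0],   # 2 mm²
    'Total Inclusion Area (μm²)': [5_000.0],
    'Counts': [12],
})

# Share of the region area covered by PSYN-positive inclusions, in %
df['Percentage PSYN Positive Area'] = (
    100 * df['Total Inclusion Area (μm²)'] / df['Total Region Area (μm²)']
)
# Scale the sampled count by the section interval to estimate the total count
df['Extrapolated Inclusion Count'] = df['Counts'] * spacing

# Densities: per μm², then per μm³ by dividing by the section thickness
per_um2 = df['Counts'] / df['Total Region Area (μm²)']
per_um3 = per_um2 / section_thickness

# 1 mm² = 1e6 μm² and 1 mm³ = 1e9 μm³
df['Inclusions/Region Area (mm²)'] = per_um2 * 1e6
df['Inclusions/Region Volume (mm³)'] = per_um3 * 1e9

print(df.T)
```

With these inputs the region is 0.25 % PSYN-positive, the extrapolated count is 72, and the densities are 6 inclusions per mm² and 150 per mm³ (6 per mm² divided by a 0.04 mm thickness), up to floating-point rounding.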


### Part 1.10 - Function to Merge the A and O Dataframes and Dictionaries

In [None]:
def merge_all_a_and_o_calculations(
    dict_A: dict[str, pd.DataFrame],
    dict_O: dict[str, pd.DataFrame]
) -> tuple[dict[str, pd.DataFrame], pd.DataFrame]:
    """
    Merge dictionaries containing all A-type and O-type calculations.

    This function performs the following steps:
        1. Combines the two input dictionaries into a single dictionary.
        2. Sorts the merged dictionary in a preferred order for Excel output.
        3. Merges all dataframes in the dictionary into a single dataframe 
           using an outer join, which might be useful later.

    Parameters:
        dict1 (dict): Dictionary of A-type calculations {A-type string: dataframe}.
        dict2 (dict): Dictionary of O-type calculations {O-type string: dataframe}.

    Returns:
        tuple:
            - dict: Sorted dictionary of all A- and O-type calculations.
            - pd.DataFrame: Outer-joined dataframe containing all merged calculations.
    """

    # Step 1: Combine the dictionaries
    A_and_O_type_dictionary = dict_A.copy()
    A_and_O_type_dictionary.update(dict_O)

    # Step 2: Sort the dictionary in a specific order for convenience
    preferred_order = [
        'A | NEURITIC SEEDED INCLUSION',
        'A | CELLULAR SEEDED INCLUSION',
        'O | CELLULAR SEEDED INCLUSIONS',
        'A | CELLULAR DIFFUSE INCLUSION',
        'O | CELLULAR DIFFUSE INCLUSION'
    ]
    A_and_O_type_dictionary_sorted = {key: A_and_O_type_dictionary[key] for key in preferred_order}

    # Step 3: Merge all dataframes in the sorted dictionary using an outer join
    dfs_to_merge = list(A_and_O_type_dictionary_sorted.values())
    df_all_calcs_merged = functools.reduce(
        lambda left, right: pd.merge(
            left, right, 
            on=['Merged area name', 'Total Region Area (μm²)'], 
            how='outer'
        ),
        dfs_to_merge
    )

    return A_and_O_type_dictionary_sorted, df_all_calcs_merged
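
The `functools.reduce` call above chains pairwise outer merges, so every region that appears in any of the per-type tables ends up in the combined table, with `NaN` where a type has no value. The toy tables below (region names and numbers are made up) show this behaviour on two inputs.

```python
import functools
import pandas as pd

keys = ['Merged area name', 'Total Region Area (μm²)']

# Toy per-type tables; region names and numbers are made up
df_type1 = pd.DataFrame({'Merged area name': ['CTX', 'STR'],
                         'Total Region Area (μm²)': [100.0, 50.0],
                         'Type1 Counts': [3, 1]})
df_type2 = pd.DataFrame({'Merged area name': ['CTX', 'SN'],
                         'Total Region Area (μm²)': [100.0, 20.0],
                         'Type2 Counts': [5, 2]})

# Same pattern as merge_all_a_and_o_calculations: fold pd.merge over the list
merged = functools.reduce(
    lambda left, right: pd.merge(left, right, on=keys, how='outer'),
    [df_type1, df_type2],
)
print(merged)
```

Because 'Total Region Area (μm²)' is part of the join key, a region whose area differed between two tables (for example, if they were computed from different area sources) would appear as two separate rows rather than one.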


## Part 2 - Automatic Analysis of all X*N Slides of all N Brains
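
Each brain's other slides are located by substituting the slide suffix in the S1 file path (`file_location_S1.replace('_S1', appendix)`). Note that `str.replace` changes every occurrence of '_S1' in the full path, so a brain ID such as 'Mouse_S12' would also be altered. The sketch below uses a hypothetical helper, `sibling_slide_path`, that swaps only the last '_S1' in the file name; it assumes `appendices_list` has the form `['_S1', '_S2', ..., '_SX']`.

```python
import os

def sibling_slide_path(file_location_S1: str, appendix: str) -> str:
    """Hypothetical helper: swap only the last '_S1' in the file name for `appendix`."""
    folder, filename = os.path.split(file_location_S1)
    stem, ext = os.path.splitext(filename)
    head, sep, tail = stem.rpartition('_S1')
    if not sep:
        raise ValueError(f"'_S1' not found in {filename}")
    return os.path.join(folder, head + appendix + tail + ext)

# Assumed naming scheme: X slides per brain -> ['_S1', '_S2', ..., '_SX']
X = 3
appendices_list = [f'_S{i}' for i in range(1, X + 1)]

path_S1 = os.path.join('raw_data', 'Mouse_S12_PSYN_S1.csv')
for appendix in appendices_list:
    # Prints .../Mouse_S12_PSYN_S1.csv, ..._S2.csv, ..._S3.csv
    print(sibling_slide_path(path_S1, appendix))

# For comparison, str.replace also rewrites the brain ID ('Mouse_S12' -> 'Mouse_S22')
print(path_S1.replace('_S1', '_S2'))
```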


In [None]:
%%time
# Measure execution time of this cell

# Load the file specifying brain regions to replace/delete for each image
df_brainregions_to_replace = load_data_brainregions_to_replace(file_brainregions_to_replace)

# Get all file names containing '_S1' (first images of all N brains)
all_raw_data_file_locations_S1 = load_all_file_locations_S1(folder_raw_data, data_format)

# Initialize dictionary to store overview dataframes
dictionary_overview_dataframes = {}

# Loop over all S1 images in the raw_data folder
for count, file_location_S1 in enumerate(all_raw_data_file_locations_S1):

    print(f'\nAnalysis of {file_location_S1}')

    # Extract image name from file path
    image_name_S1 = os.path.splitext(os.path.basename(file_location_S1))[0]

    # Dictionaries to store cleaned and hierarchical dataframes for all appendices
    dict_df_SX_final = {}              # {'_S1': df_S1_final, ..., '_SX': df_SX_final}
    dict_df_SX_parent = {}             # {'_S1': df_S1_parent, ..., '_SX': df_SX_parent}
    dict_df_SX_parent_daughter1 = {}   # {'_S1': df_S1_parent_daughter1, ..., '_SX': df_SX_parent_daughter1}
    dict_df_SX_parent_daughter2 = {}   # {'_S1': df_S1_parent_daughter2, ..., '_SX': df_SX_parent_daughter2}
    dict_df_SX_parent_daughter3 = {}   # {'_S1': df_S1_parent_daughter3, ..., '_SX': df_SX_parent_daughter3}
    dict_df_A_and_O_type_dictionary = {}  # {'_S1': {...}, '_S2': {...}, ...}

    # Loop over all appendices ('_S1', '_S2', ..., '_SX')
    for appendix in appendices_list:
        file_location = file_location_S1.replace('_S1', appendix)

        # Clean data and generate hierarchical dataframes
        dict_df_SX_final[appendix] = dataframe_cleaning(file_location, df_brainregions_to_replace, data_format)
        (
            dict_df_SX_parent[appendix],
            dict_df_SX_parent_daughter1[appendix],
            dict_df_SX_parent_daughter2[appendix],
            dict_df_SX_parent_daughter3[appendix]
        ) = make_hierarchy(dict_df_SX_final[appendix])

        # Perform A-type and O-type calculations.
        # This step fails when dataframe_cleaning returned an empty dataframe because no file was
        # available for this appendix (never the case for '_S1'); such slides are reported and skipped.
        try:
            A_type_dict_SX = all_calculations_per_a_type(dict_df_SX_final[appendix], dict_df_SX_parent_daughter2[appendix])
            O_type_dict_SX = all_calculations_per_o_type(dict_df_SX_final[appendix], dict_df_SX_parent_daughter3[appendix])
            dict_df_A_and_O_type_dictionary[appendix], dict_df_SX_all_calcs_merged = merge_all_a_and_o_calculations(A_type_dict_SX, O_type_dict_SX)
            print(f'All A-type and O-type calculations merged for {appendix} of {file_location}')
            display(dict_df_SX_all_calcs_merged)
        except Exception as error:
            print(f'Skipping {appendix} of {file_location}: {error}')

    # Concatenate all cleaned and hierarchical dataframes (S1+S2+...+SX)
    df_SX_final_concat = pd.concat(dict_df_SX_final.values(), axis=0)
    df_SX_parent_daughter1_concat = pd.concat(dict_df_SX_parent_daughter1.values(), axis=0)
    df_SX_parent_daughter2_concat = pd.concat(dict_df_SX_parent_daughter2.values(), axis=0)
    df_SX_parent_daughter3_concat = pd.concat(dict_df_SX_parent_daughter3.values(), axis=0)

    # Perform overall A-type and O-type calculations for concatenated SX data
    try:
        A_type_dict_concat = all_calculations_per_a_type(df_SX_final_concat, df_SX_parent_daughter2_concat)
        O_type_dict_concat = all_calculations_per_o_type(df_SX_final_concat, df_SX_parent_daughter3_concat)
        A_and_O_type_dict_SX, df_SX_all_calcs_concat = merge_all_a_and_o_calculations(A_type_dict_concat, O_type_dict_concat)
        print(f'All A-type and O-type calculations merged for S1+S2+...+SX of {file_location_S1}')
        display(df_SX_all_calcs_concat)
    except Exception as error:
        # Reset the results so that values from the previous brain are never written for this brain
        print(f'Combined S1+S2+...+SX calculations failed for {file_location_S1}: {error}')
        A_and_O_type_dict_SX, df_SX_all_calcs_concat = {}, None

    # Save per-image SX results to Excel
    output_file_SX = os.path.join(
        folder_output_results,
        image_name_S1.replace('_S1', '_SX') + '_Results.xlsx'
    )

    with pd.ExcelWriter(output_file_SX) as writer:
        # Individual SX results
        for appendix, results_dict in dict_df_A_and_O_type_dictionary.items():
            start_row = 0
            for df_key in results_dict:
                results_dict[df_key].to_excel(
                    writer, sheet_name=appendix[1:] + '_Results',
                    index=False, float_format="%.3f", startrow=start_row
                )
                start_row += results_dict[df_key].shape[0] + 2

        # Concatenated results
        start_row = 0
        for df_key in A_and_O_type_dict_SX:
            A_and_O_type_dict_SX[df_key].to_excel(
                writer, sheet_name='SX_Results',
                index=False, float_format="%.3f", startrow=start_row
            )
            start_row += A_and_O_type_dict_SX[df_key].shape[0] + 2

    # Add overview dataframes per a_o_type_name
    # For the overview Excel files, only the 5 combined dataframes (one per a_o_type_name) in A_and_O_type_dict_SX are needed.
    # We build 5 overview Excel files with 10 sheets each (one per metric) and store them in dictionary_overview_dataframes:
    # dictionary_overview_dataframes = {'A | NEURITIC SEEDED INCLUSION' : {metric column name: df, metric column name: df, ...},
    #                                   'A | CELLULAR SEEDED INCLUSION' : {metric column name: df, metric column name: df, ...},
    #                                    ... }
    for a_o_type_name in A_and_O_type_dict_SX:
        # setdefault (instead of checking count == 0) also works when an earlier brain was skipped
        type_overview = dictionary_overview_dataframes.setdefault(a_o_type_name, {})

        # Define columns for each metric type
        list_calculation_results = [
            'Total Region Area (μm²)',
            f'{a_o_type_name} Total Inclusion Area (μm²)',
            f'{a_o_type_name} Average Inclusion Area (μm²)',
            f'{a_o_type_name} Percentage PSYN Positive Area',
            f'{a_o_type_name} Counts',
            f'{a_o_type_name} Extrapolated Inclusion Count',
            f'{a_o_type_name} Inclusions/Region Area (mm²)',
            f'{a_o_type_name} Inclusions/Region Volume (mm³)',
            f'{a_o_type_name} Average Area/Perimeter (μm)',
            f'{a_o_type_name} Average Circularity'
        ]
        for calc in list_calculation_results:
            df_calc = A_and_O_type_dict_SX[a_o_type_name][['Merged area name', calc]].copy()
            # Use the brain name (without '_S1' and '_PSYN') as the column header
            df_calc.rename(columns={calc: image_name_S1.replace('_S1', '').replace('_PSYN', '')}, inplace=True)

            if calc not in type_overview:
                type_overview[calc] = df_calc
            else:
                type_overview[calc] = type_overview[calc].merge(
                    df_calc, how='outer', on='Merged area name'
                )

    # Clean up memory for next iteration
    del dict_df_SX_final, dict_df_SX_parent, dict_df_SX_parent_daughter1
    del dict_df_SX_parent_daughter2, dict_df_SX_parent_daughter3
    del df_SX_final_concat, df_SX_all_calcs_concat

# After all loops, export overview results
all_a_o_type_names = list(dictionary_overview_dataframes.keys()) # ['A | NEURITIC SEEDED INCLUSION','A | CELLULAR SEEDED INCLUSION', 'O | CELLULAR SEEDED INCLUSIONS', 'A | CELLULAR DIFFUSE INCLUSION', 'O | CELLULAR DIFFUSE INCLUSION']

for a_o_type_name, calc_dict in dictionary_overview_dataframes.items():
    a_o_type_name_clean = a_o_type_name.replace(' | ', '_').replace(' ', '_')
    output_overview_file = os.path.join(
        folder_output_results, f'Overview_PSYN_Results_{a_o_type_name_clean}.xlsx'
    )

    list_calculation_results = [
        'Total Region Area (μm²)',
        f'{a_o_type_name} Total Inclusion Area (μm²)',
        f'{a_o_type_name} Average Inclusion Area (μm²)',
        f'{a_o_type_name} Percentage PSYN Positive Area',
        f'{a_o_type_name} Counts',
        f'{a_o_type_name} Extrapolated Inclusion Count',
        f'{a_o_type_name} Inclusions/Region Area (mm²)',
        f'{a_o_type_name} Inclusions/Region Volume (mm³)',
        f'{a_o_type_name} Average Area/Perimeter (μm)',
        f'{a_o_type_name} Average Circularity'
    ]
    
    with pd.ExcelWriter(output_overview_file) as writer:
        for calc in list_calculation_results:
            clean_name = (
                calc.replace(f'{a_o_type_name} ', '')
                .replace('Inclusions/', 'Incl/')
                .replace('/', ' per ')
                .replace('Volume', 'Vol')
            )

            if calc != 'Total Region Area (μm²)':
                # Remove the summary rows ('TISSUE' and the rows named after an A/O type), because for these
                # entries only the 'Total Region Area (μm²)' is meaningful and has been calculated
                calc_dict[calc] = calc_dict[calc][~calc_dict[calc]['Merged area name'].isin(all_a_o_type_names + ['TISSUE'])]

            print(f'Overview dataframe with all {clean_name} for {a_o_type_name} for all brains')
            display(calc_dict[calc])
            calc_dict[calc].to_excel(writer, sheet_name=clean_name, index=False, float_format="%.3f")
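
The overview tables are built by renaming each brain's metric column to the brain name and outer-merging the results on 'Merged area name', one brain at a time. As a cross-check, the sketch below compares that approach with an alternative that is not used in this notebook: stacking all brains in long format and pivoting once. Brain names, regions and values are made up.

```python
import functools
import pandas as pd

# Made-up per-brain results for one metric; brain and region names are illustrative
per_brain = {
    'Brain1': pd.DataFrame({'Merged area name': ['CTX', 'STR'], 'value': [1.0, 2.0]}),
    'Brain2': pd.DataFrame({'Merged area name': ['CTX', 'SN'], 'value': [3.0, 4.0]}),
}

# Approach used in the loop: one column per brain, then repeated outer merges
renamed = [df.rename(columns={'value': brain}) for brain, df in per_brain.items()]
wide_merge = functools.reduce(
    lambda left, right: left.merge(right, how='outer', on='Merged area name'), renamed
)

# Alternative (not the notebook's method): long format, then a single pivot
long = pd.concat(
    [df.assign(Brain=brain) for brain, df in per_brain.items()], ignore_index=True
)
wide_pivot = long.pivot(index='Merged area name', columns='Brain', values='value').reset_index()
wide_pivot.columns.name = None  # drop the leftover 'Brain' axis label

print(wide_merge)
print(wide_pivot)
```

Both produce one row per region and one column per brain, with `NaN` where a brain lacks a region. `pivot` raises an error if a brain lists the same region twice, which can serve as a built-in duplicate check.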


## Part 3 - Automatic Analysis of all X*N Slides of all N Brains (Including Injected and Uninjected Hemispheres)
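
The hemisphere labels are attached with an inner join on ('Image', region), so any region that is not listed in the hemisphere file silently disappears from the analysis. The toy example below illustrates this and adds an audit step that is not part of the notebook: a left join with `indicator=True` that lists the dropped rows. Region names and counts are made up, and the 'Daughter1_Injected' column is assumed to come from the hemisphere file.

```python
import pandas as pd

# Toy slide data after make_hierarchy: one row per region (names and counts are made up)
df_slides = pd.DataFrame({
    'Image': ['M1_S1', 'M1_S1', 'M1_S1'],
    'Daughter1': ['CTX_left', 'CTX_right', 'OB'],
    'Counts': [4, 1, 7],
})

# Assumed layout of the hemisphere file: 'Image' and 'Brainregion' are the merge keys
# used in the notebook; 'Daughter1_Injected' is assumed to hold the label grouped on later
df_injected = pd.DataFrame({
    'Image': ['M1_S1', 'M1_S1'],
    'Brainregion': ['CTX_left', 'CTX_right'],
    'Daughter1_Injected': ['CTX injected', 'CTX uninjected'],
})

keys = dict(left_on=['Image', 'Daughter1'], right_on=['Image', 'Brainregion'])

# Inner join, as in the notebook: 'OB' has no hemisphere entry and is dropped
df_merged = df_slides.merge(df_injected, how='inner', **keys)

# Audit (not part of the notebook): a left join with indicator=True lists the dropped rows
audit = df_slides.merge(df_injected, how='left', indicator=True, **keys)
dropped = audit.loc[audit['_merge'] == 'left_only', ['Image', 'Daughter1']]

print(df_merged.groupby('Daughter1_Injected')['Counts'].sum())
print(dropped)
```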

In [None]:
%%time
# Measure execution time of this cell

# Load the file specifying brain regions to replace/delete for each image
df_brainregions_to_replace = load_data_brainregions_to_replace(file_brainregions_to_replace)

# Load the file specifying hemisphere (injected/uninjected) classification for each image
df_brainregions_injected = load_data_brainregions_injected(file_brainregions_injected)

# Get all file names containing '_S1' (first images of all N brains)
all_raw_data_file_locations_S1 = load_all_file_locations_S1(folder_raw_data, data_format)

# Initialize dictionary to store overview dataframes
dictionary_overview_dataframes_injected = {}

# Loop over all S1 images in the raw_data folder
for count, file_location_S1 in enumerate(all_raw_data_file_locations_S1):

    print(f'\nAnalysis of {file_location_S1}')

    # Extract image name from file path
    image_name_S1 = os.path.splitext(os.path.basename(file_location_S1))[0]

    # Dictionaries to store cleaned and hierarchical dataframes for all appendices
    dict_df_SX_final = {}
    dict_df_SX_parent = {}
    dict_df_SX_parent_daughter1 = {}
    dict_df_SX_parent_daughter2 = {}
    dict_df_SX_parent_daughter3 = {}

    # Loop over all appendices ('_S1', '_S2', ..., '_SX')
    for appendix in appendices_list:
        file_location = file_location_S1.replace('_S1', appendix)
        print(f'\nAnalysis of {file_location}')

        # Clean data and generate hierarchical dataframes
        dict_df_SX_final[appendix] = dataframe_cleaning(file_location, df_brainregions_to_replace, data_format)
        (
            dict_df_SX_parent[appendix],
            dict_df_SX_parent_daughter1[appendix],
            dict_df_SX_parent_daughter2[appendix],
            dict_df_SX_parent_daughter3[appendix]
        ) = make_hierarchy(dict_df_SX_final[appendix])

    # Concatenate all cleaned and hierarchical dataframes (S1+S2+...+SX)
    print(f'\nAnalysis of concatenated SX files of {image_name_S1}')
    df_SX_final_concat = pd.concat(dict_df_SX_final.values(), axis=0)
    df_SX_parent_daughter1_concat = pd.concat(dict_df_SX_parent_daughter1.values(), axis=0)
    df_SX_parent_daughter2_concat = pd.concat(dict_df_SX_parent_daughter2.values(), axis=0)
    df_SX_parent_daughter3_concat = pd.concat(dict_df_SX_parent_daughter3.values(), axis=0)

    # Merge with hemisphere injection information
    # This attaches the injected/uninjected label to each region, matched on 'Area/object name' for the
    # cleaned data and on 'Daughter1' for the hierarchical dataframes.
    # Brain regions that are not listed in the hemisphere file are removed by the inner join.
    df_SX_final_concat_injected = df_SX_final_concat.merge(
        df_brainregions_injected, left_on=['Image', 'Area/object name'], right_on=['Image', 'Brainregion'], how='inner'
    )
    df_SX_parent_daughter1_injected = df_SX_parent_daughter1_concat.merge(
        df_brainregions_injected, left_on=['Image', 'Daughter1'], right_on=['Image', 'Brainregion'], how='inner'
    )
    df_SX_parent_daughter2_injected = df_SX_parent_daughter2_concat.merge(
        df_brainregions_injected, left_on=['Image', 'Daughter1'], right_on=['Image', 'Brainregion'], how='inner'
    )
    df_SX_parent_daughter3_injected = df_SX_parent_daughter3_concat.merge(
        df_brainregions_injected, left_on=['Image', 'Daughter1'], right_on=['Image', 'Brainregion'], how='inner'
    )

    # Perform A-type and O-type calculations for concatenated SX injected data
    A_type_dict_SX_injected = all_calculations_per_a_type(
        df_SX_final_concat_injected, df_SX_parent_daughter2_injected, 'Daughter1_Injected', 'Daughter1_Injected'
    )
    O_type_dict_SX_injected = all_calculations_per_o_type(
        df_SX_final_concat_injected, df_SX_parent_daughter3_injected, 'Daughter1_Injected', 'Daughter1_Injected'
    )

    A_and_O_type_dict_SX_injected, df_SX_all_calcs_injected = merge_all_a_and_o_calculations(
        A_type_dict_SX_injected, O_type_dict_SX_injected
    )

    print('All A-type and O-type calculations merged for concatenated SX injected')
    display(df_SX_all_calcs_injected)

    # Save the combined (S1+S2+...+SX) hemisphere results to Excel
    output_file_SX = os.path.join(
        folder_output_results_injected,
        image_name_S1.replace('_S1', '_SX') + '_Hemisphere_Results.xlsx'
    )

    with pd.ExcelWriter(output_file_SX) as writer:
        start_row = 0
        for df_key in A_and_O_type_dict_SX_injected:
            A_and_O_type_dict_SX_injected[df_key].sort_values('Merged area name', ascending=False, inplace=True)
            A_and_O_type_dict_SX_injected[df_key].to_excel(
                writer, sheet_name='SX_Hemisphere_Results',
                index=False, float_format="%.3f", startrow=start_row
            )
            start_row += A_and_O_type_dict_SX_injected[df_key].shape[0] + 2

    # Build overview dataframes per A/O type
    for a_o_type_name in A_and_O_type_dict_SX_injected:
        if count == 0:
            dictionary_overview_dataframes_injected[a_o_type_name] = {}

        list_calculation_results = [
            'Total Region Area (μm²)',
            f'{a_o_type_name} Total Inclusion Area (μm²)',
            f'{a_o_type_name} Average Inclusion Area (μm²)',
            f'{a_o_type_name} Percentage PSYN Positive Area',
            f'{a_o_type_name} Counts',
            f'{a_o_type_name} Extrapolated Inclusion Count',
            f'{a_o_type_name} Inclusions/Region Area (mm²)',
            f'{a_o_type_name} Inclusions/Region Volume (mm³)',
            f'{a_o_type_name} Average Area/Perimeter (μm)',
            f'{a_o_type_name} Average Circularity'
        ]

        for calc in list_calculation_results:
            df_calc = A_and_O_type_dict_SX_injected[a_o_type_name][['Merged area name', calc]].copy()
            df_calc.rename(columns={calc: image_name_S1.replace('_S1', '').replace('_PSYN', '')}, inplace=True)

            if count == 0:
                dictionary_overview_dataframes_injected[a_o_type_name][calc] = df_calc.copy()
            else:
                dictionary_overview_dataframes_injected[a_o_type_name][calc] = dictionary_overview_dataframes_injected[
                    a_o_type_name
                ][calc].merge(df_calc, how='outer', on='Merged area name')

    # Clean up memory for next iteration
    del dict_df_SX_final, dict_df_SX_parent, dict_df_SX_parent_daughter1
    del dict_df_SX_parent_daughter2, dict_df_SX_parent_daughter3
    del df_SX_final_concat, df_SX_all_calcs_injected

# After all loops, export overview results
for a_o_type_name, calc_dict in dictionary_overview_dataframes_injected.items():
    a_o_type_name_clean = a_o_type_name.replace(' | ', '_').replace(' ', '_')
    output_file_overview = os.path.join(
        folder_output_results_injected,
        f'Overview_PSYN_Hemisphere_Results_{a_o_type_name_clean}.xlsx'
    )

    list_calculation_results = [
        'Total Region Area (μm²)',
        f'{a_o_type_name} Total Inclusion Area (μm²)',
        f'{a_o_type_name} Average Inclusion Area (μm²)',
        f'{a_o_type_name} Percentage PSYN Positive Area',
        f'{a_o_type_name} Counts',
        f'{a_o_type_name} Extrapolated Inclusion Count',
        f'{a_o_type_name} Inclusions/Region Area (mm²)',
        f'{a_o_type_name} Inclusions/Region Volume (mm³)',
        f'{a_o_type_name} Average Area/Perimeter (μm)',
        f'{a_o_type_name} Average Circularity'
    ]

    with pd.ExcelWriter(output_file_overview) as writer:
        for calc in list_calculation_results:
            calc_clean = (
                calc.replace(f'{a_o_type_name} ', '')
                .replace('Inclusions/', 'Incl/')
                .replace('/', ' per ')
                .replace('Volume', 'Vol')
            )
            calc_dict[calc].sort_values('Merged area name', ascending=False, inplace=True)

            print(f'Overview dataframe with all {calc_clean} for {a_o_type_name} for all brains')
            display(calc_dict[calc])

            calc_dict[calc].to_excel(
                writer, sheet_name=calc_clean, index=False, float_format="%.3f"
            )
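
Several tables are written to the same sheet by advancing `startrow` with `shape[0] + 2` after each table. Each table takes one header row plus its data rows, so this leaves exactly one blank row between tables. The hypothetical helper below computes the same offsets without writing a file; its results could be passed to `DataFrame.to_excel(..., startrow=...)`.

```python
import pandas as pd

def stacked_start_rows(frames, gap=1):
    """Hypothetical helper: 0-based startrow for each table stacked in one sheet.

    Each table occupies 1 header row plus len(df) data rows; `gap` blank rows
    separate consecutive tables (gap=1 reproduces the notebook's `shape[0] + 2`).
    """
    rows, start = [], 0
    for df in frames:
        rows.append(start)
        start += 1 + len(df) + gap
    return rows

# Three made-up tables with 3, 0 and 5 data rows
tables = [pd.DataFrame({'x': range(n)}) for n in (3, 0, 5)]
print(stacked_start_rows(tables))
```

An empty table still occupies its header row, which is why the third table starts two rows after the second.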
