# Analysis of Aiforia Microglia Cell Detector Output (IBA1)

## Part 0 - Outline
This notebook automatically processes raw data from mouse brains analyzed with the Aiforia model “Microglial (IBA1) Cell Detector”. The data are read from a local folder on the computer, specified within the code. The supported file formats are CSV, TSV, Excel (xlsx) and Feather. To automatically change the format of multiple files, please refer to the notebook Change_Name_Format_Input_Data.ipynb.

This notebook is organized into three sections:

**1) Define Functions for Part 2 & 3**

**2) Automatic Analysis of X * N Slides Across N Brains**
In this section, we automate the analysis of all X*N slides belonging to the N brains in the raw data folder. X, the maximum number of slides per brain, is set by the user in the code (`amount_of_slides`). The approach is as follows:

1) All X*N filenames are collected from the folder and stored in a list.

2) From this list, we extract the N filenames containing '_S1', corresponding to the first slide of each brain.

3) We loop over these N '_S1' slides and perform the following steps for each brain:

    a) We retrieve the second slide (with '_S2' in the filename) belonging to this brain, and likewise the third ('_S3'), fourth ('_S4'), ..., X-th ('_SX') slides. \
    b) We perform the analysis steps on each slide individually (S1, S2, …, SX) and on the combined dataset (S1+S2+…+SX). \
    c) We export the results to an Excel file for this specific brain.
    
After each iteration, the individual brain’s results are added to a summary table containing data for all brains. Once all brains are processed, this overview table is also exported to Excel.
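
The filename logic in steps 1 to 3 can be sketched in isolation. This is a minimal illustration, not the notebook's implementation: the helper name `sibling_slide_paths` and the example folder and filenames are hypothetical, and the substitution is applied to the filename only so that an `_S1` in a folder name is left untouched.

```python
import os


def sibling_slide_paths(path_s1: str, amount_of_slides: int) -> dict[str, str]:
    """Map each suffix ('_S1', ..., '_SX') to the expected path of that slide.

    Hypothetical helper: only the filename is rewritten, never the folder part.
    """
    folder, filename = os.path.split(path_s1)
    suffixes = [f"_S{i}" for i in range(1, amount_of_slides + 1)]
    return {
        suffix: os.path.join(folder, filename.replace("_S1", suffix, 1))
        for suffix in suffixes
    }


# Hypothetical example: one brain with three slides, stored in a folder
# whose name happens to contain '_S1' as well
paths = sibling_slide_paths(os.path.join("Raw_data_S1", "Mouse01_IBA1_S1.csv"), 3)
for suffix, path in paths.items():
    print(suffix, "->", path)
```

Each brain is thus identified by its `_S1` file, and the remaining slides are found purely by naming convention.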

**3) Automatic Analysis of X * N Slides Across N Brains After Determining the (Un)Injected Areas**

In this section, we extend the analysis by identifying which brain regions are on the injected versus non-injected side. The comparison between both sides follows the same analysis workflow as in Section 2.
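
As a rough sketch of the idea, each row can be tagged with a hemisphere through an inner join against a mapping table. The image name, region names, hemisphere labels and areas below are invented for illustration; only the column names `Image`, `Brainregion` and `Hemisphere` follow the mapping file loaded in Part 1.5.

```python
import pandas as pd

# Hypothetical measurements: one row per detected cell, with its parent region
cells = pd.DataFrame({
    "Image": ["Mouse01_S1"] * 4,
    "Parent area name": ["STRIATUM 1", "STRIATUM 2", "STRIATUM 2", "CORTEX 1"],
    "Area (μm²)": [60.0, 75.0, 90.0, 50.0],
})

# Hypothetical mapping: which numbered region lies on which side
mapping = pd.DataFrame({
    "Image": ["Mouse01_S1", "Mouse01_S1"],
    "Brainregion": ["STRIATUM 1", "STRIATUM 2"],
    "Hemisphere": ["INJECTED", "UNINJECTED"],
})

# Inner join: rows without a mapping entry (CORTEX 1 here) are dropped
tagged = cells.merge(
    mapping,
    left_on=["Image", "Parent area name"],
    right_on=["Image", "Brainregion"],
    how="inner",
)

# Cell counts per hemisphere
counts = tagged["Hemisphere"].value_counts().to_dict()
print(counts)
```

Grouping by the hemisphere label instead of the region name is what allows the same statistics to be reused for the injected versus uninjected comparison.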

## Part 1 - Define the necessary functions

### Part 1.1 - Load all necessary Python packages

In [None]:
import functools
import glob
import math
import os

import pandas as pd
from IPython.display import display

# Pandas display options
pd.options.display.float_format = '{:.2f}'.format

### Part 1.2 - Data Locations

**TO DO:**

Specify the following settings before running the analysis:

1) **Raw data format:** choose the file format of the raw data (csv, tsv, xlsx or feather).
2) **Experimental parameters:** maximum number of slides per brain, spacing between sections and section thickness.
3) **Raw data folder:** the folder containing the original data files exported from Aiforia.
4) **Results folders:** the folders where the Excel files with processed results will be saved.
5) **Region mapping files:** the location of the Excel files that specify which brain regions have to be replaced and which brain regions lie on the injected or uninjected side.

Use the following format to define each path: <font color='darkred'>r'file_location'</font>
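
The `r` prefix matters on Windows because backslashes in ordinary string literals can start escape sequences. A small illustration with a made-up path:

```python
# Hypothetical Windows path in which "\t" and "\n" appear after the backslashes
raw_path = r'C:\temp\new_data'      # raw string: backslashes are kept literally
plain_path = 'C:\temp\new_data'     # ordinary string: \t becomes a tab, \n a newline

print(len(raw_path))    # 16 characters, exactly as typed
print(len(plain_path))  # 14 characters, two escape sequences were collapsed
print('\t' in plain_path, '\n' in plain_path)
```

With the raw-string form, the folder locations can be copied from the file explorer without editing the backslashes.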

In [None]:
# Specify what data format you want to use for your raw data: csv, tsv, xlsx or feather. Do this by uncommenting the data_format that you want.
data_format = 'csv'
# data_format = 'tsv'
# data_format = 'xlsx'
# data_format = 'feather'

# Specify the maximum number of slides per brain. For example, if set to 4, filenames should include '_S1', '_S2', '_S3', and '_S4'. 
# If some brains have fewer slides, the code will generate empty data files for the missing ones to ensure proper execution.
amount_of_slides = 2

# Specify the experimental parameters
spacing = 12            # Spacing between sections; used as the multiplication factor for the extrapolated cell count
section_thickness = 40  # Section thickness in micrometers

# Specify folder locations
folder_raw_data = r'C:\Users\...\Raw_data_IBA1'
folder_output_results = r'C:\Users\...\Output_Results_IBA1'
folder_output_results_injected = r'C:\Users\...\Output_Results_Injected_IBA1'
file_brainregions_to_replace =  r'C:\Users\...\Brainregions_To_Replace_IBA1.xlsx'
file_brainregions_injected =  r'C:\Users\...\Brainregions_Hemisphere_IBA1.xlsx'

In [None]:
# Create the output folders if they do not exist yet (including missing parent folders)
os.makedirs(folder_output_results, exist_ok=True)
os.makedirs(folder_output_results_injected, exist_ok=True)

# Create a list of expected slide suffixes. For example, if amount_of_slides = 4, then appendices_list = ['_S1', '_S2', '_S3', '_S4'].
appendices_list = [f"_S{i}" for i in range(1, amount_of_slides + 1)]

### Part 1.3 – Function to Load All Image Files for Analysis

In [None]:
def load_all_file_locations_S1(folder_raw_data: str, data_format: str) -> list[str]:
    """
    Retrieve and list all file locations for S1 images in the specified raw data folder.

    This function searches for all raw data files (based on the specified data format) 
    within the given folder and filters those containing '_S1' in their filename. 
    These represent the first slides (S1) of each brain. 
    If '_S1'/'_S2' naming is not used, '_S1' should be appended manually to filenames.

    Parameters
    ----------
    folder_raw_data : str
        Path to the folder containing raw data files.
    data_format : str
        Format of the raw data files ('csv', 'tsv', 'xlsx' or 'feather').

    Returns
    -------
    all_raw_data_file_locations_S1: list[str]
        Sorted list of full file paths for S1 images.
    """

    # Determine the file pattern based on the data format
    supported_formats = ('csv', 'tsv', 'xlsx', 'feather')
    if data_format not in supported_formats:
        raise ValueError(
            "Invalid data format specified. Please set 'data_format' to 'csv', 'tsv', 'xlsx' or 'feather'."
        )
    pattern = f"*.{data_format}"

    # Retrieve and sort all matching raw data files
    all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, pattern))
    all_raw_data_file_locations.sort()

    print("All raw data file locations:")
    for file_location in all_raw_data_file_locations:
        print(f" - {file_location}")

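    # Filter filenames that contain '_S1' (first slides of each brain).
    # Only the filename is checked, so an '_S1' in a folder name cannot match every file.
    all_raw_data_file_locations_S1 = [
        path for path in all_raw_data_file_locations if '_S1' in os.path.basename(path)
    ]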

    print("\nRaw data file locations for S1 images:")
    for file_location_S1 in all_raw_data_file_locations_S1:
        print(f" - {file_location_S1}")

    return all_raw_data_file_locations_S1
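
Note that a plain substring test also accepts names such as `..._S10`. If that matters for a dataset with ten or more slides per brain, a stricter check on the filename alone could look like the sketch below. The helper `is_first_slide` and the example names are hypothetical and not part of the notebook.

```python
import os
import re

# '_S1' must not be followed by another digit, so '_S10' or '_S12' do not match
_S1_PATTERN = re.compile(r"_S1(?!\d)")


def is_first_slide(path: str) -> bool:
    """Return True if the filename (not the folder) marks a first slide."""
    return _S1_PATTERN.search(os.path.basename(path)) is not None


examples = [
    "Mouse01_IBA1_S1.csv",
    "Mouse01_IBA1_S10.csv",
    "Mouse01_IBA1_S2.csv",
]
first_slides = [name for name in examples if is_first_slide(name)]
print(first_slides)
```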

### Part 1.4 – Function to Load the Brain Region Correction File

In [None]:
def load_data_brainregions_to_replace(file_brainregions_to_replace: str) -> pd.DataFrame:
    """
    Load and clean the file containing corrections for brain regions that need to be replaced for each image.

    Parameters
    ----------
    file_brainregions_to_replace : str
        Path to the Excel file with columns: 'Image', 'Brainregion_Wrong', 'Brainregion_Correct'.

    Returns
    -------
    pd.DataFrame
        Cleaned DataFrame with brain regions to replace for each image. 
        All brain region names are stripped of spaces and converted to uppercase.
    """

    if not os.path.exists(file_brainregions_to_replace):
        raise FileNotFoundError(f"The specified file does not exist: {file_brainregions_to_replace}")

    # Load the relevant columns from the Excel file
    df = pd.read_excel(
        file_brainregions_to_replace,
        usecols=['Image', 'Brainregion_Wrong', 'Brainregion_Correct'],
        dtype={'Image': 'str', 'Brainregion_Wrong': 'str', 'Brainregion_Correct': 'str'}
    )

    # Clean the data in place
    df['Image'] = df['Image'].str.strip()
    df['Brainregion_Wrong'] = df['Brainregion_Wrong'].str.upper().str.strip()
    df['Brainregion_Correct'] = df['Brainregion_Correct'].str.upper().str.strip()

    print("The modified table of brain regions to replace for each image:")
    display(df)
    
    return df
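
To make the expected layout of the correction file concrete, here is a small sketch with invented image and region names. It mirrors how the cleaned table is later turned into a per-image replacement dictionary; the `EMPTY` entry illustrates how a region can be marked for removal during cleaning.

```python
import pandas as pd

# Hypothetical contents of the correction file
corrections = pd.DataFrame({
    "Image": ["Mouse01_S1", "Mouse01_S1", "Mouse02_S1"],
    "Brainregion_Wrong": [" striatum 3", "Cortex 2 ", "amygdala 1"],
    "Brainregion_Correct": ["Cortex 3", "empty", "Striatum 4"],
})

# Same cleaning as in the loader: uppercase and strip surrounding spaces
for column in ["Brainregion_Wrong", "Brainregion_Correct"]:
    corrections[column] = corrections[column].str.upper().str.strip()

# Replacement dictionary for a single image
for_image = corrections[corrections["Image"] == "Mouse01_S1"]
dict_replace = dict(zip(for_image["Brainregion_Wrong"], for_image["Brainregion_Correct"]))
print(dict_replace)

# Apply to a column of region names; names not listed for this image stay unchanged
regions = pd.Series(["STRIATUM 3", "CORTEX 2", "AMYGDALA 1"])
replaced = regions.replace(dict_replace).tolist()
print(replaced)
```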


### Part 1.5 – Function to Load the File Indicating Which Brain Regions Were Injected

In [None]:
def load_data_brainregions_injected(file_brainregions_injected: str) -> pd.DataFrame:
    """
    Load and clean the file specifying which brain regions were on the injected side for each specific image.

    Parameters
    ----------
    file_brainregions_injected : str
        Path to the Excel file containing injected brain region information.

    Returns
    -------
    pd.DataFrame
        Cleaned DataFrame with columns:
        - 'Image': image identifier
        - 'Brainregion': uppercase, stripped brain region name
        - 'Parent_Injected': uppercase, stripped hemisphere 
        - 'Daughter1_Injected': uppercase, stripped hemisphere 

    """
    if not os.path.exists(file_brainregions_injected):
        raise FileNotFoundError(f"The specified file does not exist: {file_brainregions_injected}")

    # Load relevant columns from the Excel file
    df = pd.read_excel(
        file_brainregions_injected,
        usecols=['Image', 'Brainregion', 'Hemisphere'],
        dtype={'Image': 'str', 'Brainregion': 'str', 'Hemisphere': 'str'}
    )

    # Clean the data in place
    df['Image'] = df['Image'].str.strip()
    df['Brainregion'] = df['Brainregion'].str.upper().str.strip()
    df['Parent_Injected'] = df['Hemisphere'].str.upper().str.strip()
    df['Daughter1_Injected'] = df['Hemisphere'].str.upper().str.strip()

    # Drop the original 'Hemisphere' column
    df.drop(columns=['Hemisphere'], inplace=True)

    print("The modified table of injected brain regions for each image:")
    display(df)
    
    return df

### Part 1.6 – Function to Load a DataFrame and Clean It

In [None]:
def dataframe_cleaning(file_location: str, df_brainregions_to_replace: pd.DataFrame, data_format: str) -> pd.DataFrame:
    """
    Load a data file from the given location and clean it using a brain regions replacement table.
    If the file does not exist, an empty file is created with the correct columns.
    Replaces incorrect brain regions, merges names by removing numbers, filters based on area thresholds,
    and calculates derived metrics like Area/Perimeter and Circularity.
    Returns a cleaned dataframe.

    Parameters
    ----------
    file_location : str
        Path to the raw data file for a single image.
    df_brainregions_to_replace : pd.DataFrame
        DataFrame specifying which brain regions need to be replaced for each image.
    data_format : str
        Format of the raw data file: 'csv', 'tsv', 'xlsx' or 'feather'.

    Returns
    -------
    pd.DataFrame
        Cleaned dataframe with additional calculated columns.
    """

    # ------------------------
    # Define expected columns and dtypes
    # ------------------------
    columns = [
        'Image', 'Parent area name', 'Area/object name',
        'Class label', 'Area (μm²)', 'Circumference (µm)'
    ]
    dtypes = {
        'Image': 'str',
        'Parent area name': 'str',
        'Area/object name': 'str',
        'Class label': 'str',
        'Area (μm²)': 'float64',
        'Circumference (µm)': 'float64'
    }

    # ------------------------
    # Load or create the data file
    # ------------------------
    try:
        if data_format in ('csv', 'tsv'):
            # Note: 'csv' and 'tsv' files are both read as tab-separated text
            df = pd.read_csv(file_location, sep='\t', usecols=columns, dtype=dtypes, index_col=False, keep_default_na=True)
        elif data_format == 'xlsx':
            df = pd.read_excel(file_location, usecols=columns, dtype=dtypes, index_col=False, keep_default_na=True)
        elif data_format == 'feather':
            df = pd.read_feather(file_location, columns=columns).astype(dtypes)
        else:
            raise ValueError("Invalid data format. Choose 'csv', 'tsv', 'xlsx' or 'feather'.")
    except FileNotFoundError:
        # If the file does not exist, write an empty file with the expected columns and load that instead
        df_empty = pd.DataFrame(columns=columns)
        # Use a default integer index (required for Feather) without adding an extra 'index' column
        df_empty.reset_index(drop=True, inplace=True)
        if data_format in ('csv', 'tsv'):
            df_empty.to_csv(file_location, sep='\t', index=False)
            df = pd.read_csv(file_location, sep='\t', usecols=columns, dtype=dtypes)
        elif data_format == 'xlsx':
            df_empty.to_excel(file_location, index=False)
            df = pd.read_excel(file_location, usecols=columns, dtype=dtypes)
        elif data_format == 'feather':
            df_empty.to_feather(file_location)
            df = pd.read_feather(file_location, columns=columns).astype(dtypes)
        print(f"\nNo data file was found at {file_location}, so an empty one was created.")

    # ------------------------
    # Clean up and standardize column values
    # ------------------------
    # Remove rows with missing essential data
    df.dropna(subset=['Parent area name', 'Area (μm²)', 'Area/object name', 'Class label'], inplace=True)

    # Standardize text capitalization
    df['Parent area name'] = df['Parent area name'].str.upper()
    df['Area/object name'] = df['Area/object name'].str.upper()
    df['Class label'] = df['Class label'].str.upper()

    # ------------------------
    # Extract image name from file name
    # ------------------------
    image_name = os.path.splitext(os.path.basename(file_location))[0]
    print('The present image =', image_name)
    
    # Make sure the image name across the whole first Image column is correct
    df['Image'] = image_name

    print('The full raw data =')
    display(df)

    # ------------------------
    # Replace incorrect brain regions
    # ------------------------
    # Create the dictionary of brain regions that should be replaced for this specific image
    df_replacements = df_brainregions_to_replace[df_brainregions_to_replace['Image'] == image_name]
    dict_replace = pd.Series(
        df_replacements.Brainregion_Correct.values,
        index=df_replacements.Brainregion_Wrong
    ).to_dict()

    print(f"The dictionary of brain regions to replace for {image_name} is:", dict_replace)

    # Apply replacements in both parent and Area/object name columns
    df['Parent area name'] = df['Parent area name'].replace(dict_replace, regex=False)
    df['Area/object name'] = df['Area/object name'].replace(dict_replace, regex=False)

    # ------------------------
    # Create a column 'Parent area name merged' and 'Area/object name merged' where the ending numbers are deleted from the original columns:
    # ------------------------
    df['Parent area name merged'] = df['Parent area name'].str.replace(r'\d+$', '', regex=True).str.strip()
    df['Area/object name merged'] = df['Area/object name'].str.replace(r'\d+$', '', regex=True).str.strip()

    # ------------------------
    # Filter unwanted rows
    # ------------------------
    # Remove rows with area < 45 for class label "IBA1 POSITIVE CELL"
    mask_exclude = (df['Area (μm²)'] < 45) & (df['Class label'] == 'IBA1 POSITIVE CELL')
    df = df[~mask_exclude]

    # Remove rows with placeholder EMPTY labels (if created during replacements).
    # .copy() avoids pandas' SettingWithCopyWarning when the derived columns are added below.
    df = df[(df['Parent area name'] != 'EMPTY') & (df['Area/object name'] != 'EMPTY')].copy()

    # ------------------------
    # Derived metrics
    # ------------------------
    df['Area/Perimeter (μm)'] = df['Area (μm²)'] / df['Circumference (µm)']
    df['Circularity'] = (4 * math.pi * df['Area (μm²)']) / (df['Circumference (µm)'] ** 2)

    print('The fully cleaned table with "Area/Perimeter (μm)" and "Circularity" =')
    display(df)

    return df
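
The circularity computed above, 4πA / P², equals 1 for a perfect circle and becomes smaller for more irregular or elongated outlines. A quick numerical check with idealized shapes (not Aiforia data):

```python
import math


def circularity(area: float, perimeter: float) -> float:
    """Isoperimetric quotient 4*pi*A / P**2 (1.0 for a perfect circle)."""
    return 4 * math.pi * area / perimeter ** 2


# Circle with radius 5: A = pi * r**2, P = 2 * pi * r
radius = 5.0
circle = circularity(math.pi * radius ** 2, 2 * math.pi * radius)

# Square with side 10: A = s**2, P = 4 * s
side = 10.0
square = circularity(side ** 2, 4 * side)

print(round(circle, 3))   # 1.0
print(round(square, 3))   # 0.785, i.e. pi / 4
```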


### Part 1.7 - Function to Create Hierarchical Dataframes


Note: This function is not used for the IBA-1 code, as the hierarchy does not extend to Daughter 3.

Hierarchy:

One type of **Parent:** TISSUE 1, 2, 3 ... \
Many types of **Daughter 1:** AMYGDALA 1, 2, 3, …, STRIATUM 1, 2, 3, …, ...  \
One type of **Daughter 2:** IBA1 POSITIVE CELL 60451, IBA1 POSITIVE CELL 354269, …
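
Because every annotated region carries a running number, regions of the same type are pooled by stripping that trailing number. The sketch below applies the same regular expression as the cleaning step in Part 1.6 to names taken from the hierarchy above; `merged_name` is a hypothetical helper used only for illustration.

```python
import re


def merged_name(name: str) -> str:
    """Drop a trailing run of digits and any surrounding whitespace."""
    return re.sub(r"\d+$", "", name).strip()


names = ["TISSUE 1", "AMYGDALA 7", "STRIATUM 12", "IBA1 POSITIVE CELL 60451"]
merged = [merged_name(n) for n in names]
print(merged)
```

Only digits at the very end of the name are removed, so the `1` in `IBA1` is preserved.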

In [None]:
def make_hierarchy() -> None:
    """Note: This function is not used for the IBA-1 code, as the hierarchy does not extend to Daughter 3."""
    pass

### Part 1.8 - Function to Calculate all Statistics

In [None]:
def all_calculations(df1: pd.DataFrame, df2: pd.DataFrame, groupby_column1: str = 'Parent area name merged', groupby_column2: str = 'Area/object name merged') -> pd.DataFrame:
    """
    Compute key summary statistics (counts, areas, averages) per merged region.

    Counts and cell statistics are computed from df1, total region areas from df2.
    Uses the global variables `spacing` and `section_thickness` defined in Part 1.2.

    Parameters
    ----------
    df1 : pd.DataFrame
        Dataframe grouped by groupby_column1 to obtain counts and cell statistics.
    df2 : pd.DataFrame
        Dataframe grouped by groupby_column2 to obtain total region areas
        (can be the same as df1 when injected/uninjected is disregarded).
    groupby_column1 : str
        Column in df1 to group by.
    groupby_column2 : str
        Column in df2 to group by.

    Returns
    -------
    pd.DataFrame
        Merged dataframe containing:
        - Counts
        - Total region area
        - Total cell area
        - Average cell area
        - Average Area/Perimeter
        - Average circularity
        - Extrapolated cell count
        - Percentage of IBA1 positive area
        - Cells per mm² of region area and per mm³ of region volume
    """

    # Count the number of rows for each parent area name merged 
    df_counts_merged = df1[groupby_column1].value_counts().rename_axis('Merged area name') \
                       .reset_index(name='Counts')

    # Sum the total area of each Area/object name merged (e.g. Amygdala 1 + Amygdala 7 + ... area)
    df_total_region_area_merged = df2.groupby(groupby_column2, as_index=False)['Area (μm²)'] \
                                     .sum().rename(columns={'Area (μm²)': 'Total Region Area (μm²)',
                                                            groupby_column2: 'Merged area name'})

    # Calculate the total Area (μm²), average Area (μm²), average Area/Perimeter (μm), average Circularity  
    # of the cells belonging to each Parent area name merged
    agg_dict = {
        'Area (μm²)': ['sum', 'mean'],
        'Area/Perimeter (μm)': 'mean',
        'Circularity': 'mean'
    }
    df_stats = df1.groupby(groupby_column1, as_index=False).agg(agg_dict)
    df_stats.columns = ['Merged area name', 'Total Cell Area (μm²)', 'Average Cell Area (μm²)',
                        'Average Area/Perimeter (μm)', 'Average Circularity']

    # Merge all intermediate results
    dfs_to_merge = [df_counts_merged, df_total_region_area_merged, df_stats]
    df_all_calcs_merged = functools.reduce(
        lambda left, right: pd.merge(left, right, on='Merged area name', how='outer'),
        dfs_to_merge
    )

    # Additional derived calculations
    df_all_calcs_merged['Extrapolated Cell Count'] = df_all_calcs_merged['Counts'] * spacing
    df_all_calcs_merged['Percentage IBA1 Positive Area'] = 100 * df_all_calcs_merged['Total Cell Area (μm²)'] / \
                                                          df_all_calcs_merged['Total Region Area (μm²)']
    df_all_calcs_merged['Cells/Region Area (per μm²)'] = df_all_calcs_merged['Counts'] / \
                                                          df_all_calcs_merged['Total Region Area (μm²)']
    df_all_calcs_merged['Cells/Region Volume (per μm³)'] = df_all_calcs_merged['Cells/Region Area (per μm²)'] / section_thickness
    df_all_calcs_merged['Cells/Region Area (mm²)'] = df_all_calcs_merged['Cells/Region Area (per μm²)'] * 1e6
    df_all_calcs_merged['Cells/Region Volume (mm³)'] = df_all_calcs_merged['Cells/Region Volume (per μm³)'] * 1e9

    # Drop intermediate columns
    df_all_calcs_merged.drop(columns=['Cells/Region Area (per μm²)', 'Cells/Region Volume (per μm³)'], inplace=True)

    # Sort for readability
    df_all_calcs_merged.sort_values('Merged area name', inplace=True)

    print('The total Calculations of each group:')
    display(df_all_calcs_merged)

    return df_all_calcs_merged
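
The derived quantities at the end of the function are easy to verify by hand. The measurements below are invented for illustration; `spacing` and `section_thickness` take the values set in Part 1.2.

```python
# Hypothetical measurements for one merged region
counts = 250                      # detected cells
region_area_um2 = 2_000_000.0     # 2 mm² expressed in μm²
total_cell_area_um2 = 20_000.0    # summed cell area in μm²

spacing = 12                      # as configured in Part 1.2
section_thickness = 40            # μm, as configured in Part 1.2

extrapolated = counts * spacing
percent_positive = 100 * total_cell_area_um2 / region_area_um2

cells_per_um2 = counts / region_area_um2
cells_per_mm2 = cells_per_um2 * 1e6                      # 1 mm² = 1e6 μm²
cells_per_mm3 = cells_per_um2 / section_thickness * 1e9  # 1 mm³ = 1e9 μm³

print(extrapolated)                # 3000
print(round(percent_positive, 3))  # 1.0
print(round(cells_per_mm2, 3))     # 125.0
print(round(cells_per_mm3, 3))     # 3125.0
```

Dividing the areal density by the section thickness turns cells per unit area into cells per unit volume of the section.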


## Part 2 - Automatic Analysis of all X*N Slides of all N Brains


In [None]:
%%time
# Measure execution time of this cell

# Load the file specifying brain regions to replace/delete for each image
df_brainregions_to_replace = load_data_brainregions_to_replace(file_brainregions_to_replace)

# Get all file names containing '_S1' (first images of all N brains)
all_raw_data_file_locations_S1 = load_all_file_locations_S1(folder_raw_data, data_format)

# Initialize dictionary to store all overview dataframes
dictionary_overview_dataframes = {}


# Loop over all S1 images in the raw_data folder
for count, file_location_S1 in enumerate(all_raw_data_file_locations_S1):

    print(f'\nAnalysis of {file_location_S1}')

    # Extract image name from file path
    image_name_S1 = os.path.splitext(os.path.basename(file_location_S1))[0]

    # Dictionaries to store cleaned dataframes and calculated results for all appendices
    dict_df_SX_final = {}              # e.g., {'_S1' : df_S1_final, '_S2' : df_S2_final, ..., '_SX' : df_SX_final}
    dict_df_SX_all_calcs_merged = {}   # e.g., {'_S1' : df_S1_all_calcs_merged, '_S2' : df_S2_all_calcs_merged, ..., '_SX' : df_SX_all_calcs_merged}

    # Loop over all appendices ('_S1', '_S2', ..., '_SX')
    for appendix in appendices_list:
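        # Replace '_S1' in the filename (not in the folder path) with the current appendix
        folder_S1, filename_S1 = os.path.split(file_location_S1)
        file_location = os.path.join(folder_S1, filename_S1.replace('_S1', appendix))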

        # Clean the data using pre-defined function
        dict_df_SX_final[appendix] = dataframe_cleaning(file_location, df_brainregions_to_replace, data_format)

        # Perform all calculations on the cleaned dataframe.
        # The except branch covers slides for which an empty dataframe was created because no file
        # was available (this never happens for appendix '_S1').
        try:
            dict_df_SX_all_calcs_merged[appendix] = all_calculations(dict_df_SX_final[appendix],
                                                                     dict_df_SX_final[appendix])
            print(f"All calculations completed for {appendix} of {file_location}")
            display(dict_df_SX_all_calcs_merged[appendix])
        except Exception as error:
            print(f"Calculations skipped for {appendix} of {file_location}: {error}")

    # Concatenate all cleaned dataframes (S1+S2+...+SX)
    df_SX_final_concat = pd.concat(dict_df_SX_final.values(), axis=0)

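    # Perform calculations on the concatenated S1 + S2 + ... + SX dataframe.
    # The result is required below, so an error is reported and re-raised instead of being silenced.
    try:
        df_SX_all_calcs_concat = all_calculations(df_SX_final_concat, df_SX_final_concat)
        print(f'All calculations completed for concatenated S1+S2+...+SX of {file_location_S1}')
        display(df_SX_all_calcs_concat)
    except Exception as error:
        print(f'Calculations failed for concatenated S1+S2+...+SX of {file_location_S1}: {error}')
        raise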

    # Save individual and concatenated results to Excel
    output_file_location_SX = os.path.join(
        folder_output_results,
        image_name_S1.replace('_S1', '_SX') + '_Results.xlsx'
    )

    with pd.ExcelWriter(output_file_location_SX) as writer:
        for appendix in appendices_list:
            # Skip slides without results (e.g. when no raw data file was available for this slide)
            if appendix not in dict_df_SX_all_calcs_merged:
                continue
            dict_df_SX_all_calcs_merged[appendix].to_excel(
                writer, sheet_name=appendix[1:] + '_Results', index=False, float_format="%.3f"
            )

        df_SX_all_calcs_concat.to_excel(writer, sheet_name='SX_Combined_Results', index=False, float_format="%.3f")


    # Prepare overview Excel file, for which only the df_SX_all_calcs_concat dataframe is needed. 
    # Define calculation columns for overview
    list_calculation_results = [
        'Total Region Area (μm²)', 'Counts', 'Extrapolated Cell Count',
        'Total Cell Area (μm²)', 'Average Cell Area (μm²)', 'Percentage IBA1 Positive Area',
        'Average Area/Perimeter (μm)', 'Average Circularity',
        'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)',
    ]

    # Exclude unnecessary brain regions
    brainregions_not_needed = ['IBA1 POSITIVE CELL', 'TISSUE']

    # Prepare overview dataframes
    # We create one overview Excel file with one sheet per calculation result, stored in dictionary_overview_dataframes:
    # dictionary_overview_dataframes = {'Total Region Area (μm²)': df, 'Extrapolated Cell Count': df, 'Cells/Region Area (mm²)': df, ...}
    for calculation_result in list_calculation_results:
        df_calc = df_SX_all_calcs_concat[['Merged area name', calculation_result]].copy()
        df_calc = df_calc[~df_calc['Merged area name'].isin(brainregions_not_needed)]
        df_calc.rename(columns={calculation_result: image_name_S1.replace('_S1', '').replace('_IBA1', '')}, inplace=True)

        # Merge data into overview dictionary
        if count == 0:
            dictionary_overview_dataframes[calculation_result] = df_calc.copy()
        else:
            dictionary_overview_dataframes[calculation_result] = dictionary_overview_dataframes[calculation_result].merge(
                df_calc, how='outer', on='Merged area name'
            )

    # Clean up memory for next iteration
    del dict_df_SX_final, df_SX_final_concat, df_SX_all_calcs_concat

# Save overview Excel file
output_file_name_overview = os.path.join(folder_output_results, 'Overview_IBA1_Results.xlsx')

with pd.ExcelWriter(output_file_name_overview) as writer:
    for calculation_result in list_calculation_results:
        sheet_name_clean = calculation_result.replace('/', ' per ').replace('Volume', 'Vol')

        print(f'Overview dataframe with all {sheet_name_clean} for all brains')
        display(dictionary_overview_dataframes[calculation_result])
        
        dictionary_overview_dataframes[calculation_result].to_excel(
            writer, sheet_name=sheet_name_clean, index=False, float_format="%.3f"
        )
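
Excel does not allow characters such as `/` in worksheet names and limits them to 31 characters, which is presumably why the column names are rewritten before export. A small helper that makes both constraints explicit could look like this; `safe_sheet_name` is a hypothetical name, not a pandas or notebook function.

```python
import re

MAX_SHEET_NAME_LENGTH = 31               # Excel limit for worksheet names
FORBIDDEN = re.compile(r"[\\/*?:\[\]]")  # characters Excel rejects in sheet names


def safe_sheet_name(column_name: str) -> str:
    """Apply the notebook's replacements, then enforce Excel's naming rules."""
    name = column_name.replace("/", " per ").replace("Volume", "Vol")
    name = FORBIDDEN.sub(" ", name)
    return name[:MAX_SHEET_NAME_LENGTH]


for column in ["Cells/Region Volume (mm³)", "Average Area/Perimeter (μm)"]:
    sheet = safe_sheet_name(column)
    print(f"{sheet!r} ({len(sheet)} characters)")
```

With the replacements used in this notebook, the longest sheet name, derived from `Average Area/Perimeter (μm)`, is exactly 31 characters long.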

## Part 3 – Automatic Analysis of all X*N Slides of all N Brains (Including Injected and Uninjected Hemispheres)

In [None]:
%%time
# Measure the execution time of this cell

# Load the file specifying brain regions to replace/delete for each image
df_brainregions_to_replace = load_data_brainregions_to_replace(file_brainregions_to_replace)

# Load the file specifying which brain regions belong to which hemisphere for each image
df_brainregions_injected = load_data_brainregions_injected(file_brainregions_injected)

# Get all file names containing '_S1'
all_raw_data_file_locations_S1 = load_all_file_locations_S1(folder_raw_data, data_format)

# Initialize dictionary to store all overview dataframes for injected analysis
dictionary_overview_dataframes_injected = {}

# Loop over all S1 images in the raw_data folder
for count, file_location_S1 in enumerate(all_raw_data_file_locations_S1):

    print(f'\nAnalysis of {file_location_S1}')

    # Extract image name from file path
    image_name_S1 = os.path.splitext(os.path.basename(file_location_S1))[0]

    # Dictionaries to store cleaned dataframes for all appendices
    dict_df_SX_final = {}  # e.g., {'_S1': df_S1_final, '_S2': df_S2_final, ..., '_SX': df_SX_final}

    # Loop over all appendices ('_S1', '_S2', ..., '_SX')
    for appendix in appendices_list:
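        # Replace '_S1' in the filename (not in the folder path) with the current appendix
        folder_S1, filename_S1 = os.path.split(file_location_S1)
        file_location = os.path.join(folder_S1, filename_S1.replace('_S1', appendix))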
        # Clean the data using pre-defined function
        dict_df_SX_final[appendix] = dataframe_cleaning(file_location, df_brainregions_to_replace, data_format)

    # Concatenate all cleaned dataframes (S1+S2+...+SX)
    df_SX_final_concat = pd.concat(dict_df_SX_final.values(), axis=0)

    # Merge with hemisphere injected information. 
    # For each row, determine whether the Parent area name and Area/object name were injected or uninjected
    # Brain regions not relevant to the injected analysis are removed through the inner join
    df_SX_injected_parent = df_SX_final_concat.merge(
        df_brainregions_injected, left_on=['Image', 'Parent area name'], right_on=['Image', 'Brainregion'], how='inner'
    )
    df_SX_injected_object = df_SX_final_concat.merge(
        df_brainregions_injected, left_on=['Image', 'Area/object name'], right_on=['Image', 'Brainregion'], how='inner'
    )

    # Perform all calculations for the injected data.
    # The result is required below, so an error is reported and re-raised instead of being silenced.
    try:
        df_SX_all_calcs_injected = all_calculations(
            df_SX_injected_parent, df_SX_injected_object,
            groupby_column1='Parent_Injected', groupby_column2='Daughter1_Injected'
        )
        df_SX_all_calcs_injected.sort_values('Merged area name', ascending=False, inplace=True)
        print(f'All calculations completed for concatenated SX INJECTED of {file_location_S1}')
        display(df_SX_all_calcs_injected)
    except Exception as error:
        print(f'Calculations failed for concatenated SX INJECTED of {file_location_S1}: {error}')
        raise

    # Save individual results to Excel
    output_file_location_SX = os.path.join(
        folder_output_results_injected,
        image_name_S1.replace('_S1', '_SX') + '_Hemisphere_Results.xlsx'
    )

    with pd.ExcelWriter(output_file_location_SX) as writer:
        df_SX_all_calcs_injected.to_excel(writer, sheet_name='SX_Hemisphere_Results', index=False, float_format="%.3f")

    # Define calculation columns for overview
    list_calculation_results = [
        'Total Region Area (μm²)', 'Counts', 'Extrapolated Cell Count',
        'Total Cell Area (μm²)', 'Average Cell Area (μm²)', 'Percentage IBA1 Positive Area',
        'Average Area/Perimeter (μm)', 'Average Circularity',
        'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)',
    ]

    # Exclude unnecessary brain regions
    brainregions_not_needed = ['IBA1 POSITIVE CELL', 'TISSUE']

    # Prepare overview dataframes
    # We create one overview Excel file with one sheet per calculation result, stored in dictionary_overview_dataframes_injected:
    # dictionary_overview_dataframes_injected = {'Total Region Area (μm²)': df, 'Extrapolated Cell Count': df, 'Cells/Region Area (mm²)': df, ...}
    for calculation_result in list_calculation_results:
        df_calc = df_SX_all_calcs_injected[['Merged area name', calculation_result]].copy()
        df_calc = df_calc[~df_calc['Merged area name'].isin(brainregions_not_needed)]
        df_calc.rename(columns={calculation_result: image_name_S1.replace('_S1', '').replace('_IBA1', '')}, inplace=True)

        # Merge data into overview dictionary
        if count == 0:
            dictionary_overview_dataframes_injected[calculation_result] = df_calc.copy()
        else:
            dictionary_overview_dataframes_injected[calculation_result] = dictionary_overview_dataframes_injected[calculation_result].merge(
                df_calc, how='outer', on='Merged area name'
            )

    # Clean up memory for next iteration
    del dict_df_SX_final, df_SX_final_concat, df_SX_all_calcs_injected

# Save overview Excel file
output_file_name_overview = os.path.join(folder_output_results_injected, 'Overview_IBA1_Hemisphere_Results.xlsx')

with pd.ExcelWriter(output_file_name_overview) as writer:
    for calculation_result in list_calculation_results:
        sheet_name_clean = calculation_result.replace('/', ' per ').replace('Volume', 'Vol')
        dictionary_overview_dataframes_injected[calculation_result].sort_values('Merged area name', ascending=False, inplace=True)
        
        print(f'Overview dataframe with all {sheet_name_clean} for all brains')
        display(dictionary_overview_dataframes_injected[calculation_result])
        
        dictionary_overview_dataframes_injected[calculation_result].to_excel(writer, sheet_name=sheet_name_clean, index=False, float_format="%.3f")
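
The overview tables grow by one column per brain through repeated outer merges on `Merged area name`. A toy example with invented brains and values shows how a region that is missing in one brain ends up as NaN rather than being dropped:

```python
import functools

import pandas as pd

# Hypothetical per-brain results for a single metric (e.g. 'Counts')
per_brain = [
    pd.DataFrame({"Merged area name": ["AMYGDALA", "STRIATUM"], "Mouse01": [120, 340]}),
    pd.DataFrame({"Merged area name": ["STRIATUM", "CORTEX"], "Mouse02": [310, 500]}),
]

# Outer merge keeps every region seen in any brain
overview = functools.reduce(
    lambda left, right: left.merge(right, how="outer", on="Merged area name"),
    per_brain,
).sort_values("Merged area name", ignore_index=True)

print(overview)
```

Empty cells in the exported overview therefore indicate that a region was not present, or was filtered out, for that brain.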
