# Analysis of Aiforia Dopaminergic TH Cell Detector 

## Part 0 - Outline
This code handles the automatic processing of raw data from mouse brains analyzed using the Aiforia model “Dopaminergic (TH) Cell Detector”. The data are stored in a local folder on the computer, specified within the code. We typically start with Excel files. To automatically change the format of multiple files, please refer to the notebook Change_Name_Format_Input_Data.ipynb.
This notebook is organized into three sections:

**1) Define Functions for Part 2 & 3**

**2) Automatic Analysis of N Slides (with Names Containing _S2) Across N Brains**

In this section, we automate the analysis of all N S2 slides corresponding to N brains (one slide per brain) contained in the folder with raw data. The approach is as follows:

1) All N filenames containing _S2 are collected from the folder and stored in a list.

2) For each slide (one per brain), the following steps are performed:
    a) Data analysis steps are executed.
    b) The results are exported to an Excel file specific to that brain.
    
After each loop iteration, the results for the current brain are added to an overview table containing the combined results for all brains. After the final loop, this overview table is also exported to an Excel file.

Note: In this section, we do not generate detailed “transposed” tables with information for each Substantia Nigra part.

**3) Automatic Analysis of N Slides (with Names Containing _S2) After Determining the (Un)Injected Areas**

In this section, we extend the analysis by identifying which brain regions are on the injected versus non-injected side. The comparison between both sides follows the same analysis workflow as in Section 2. Additionally, in this section we generate more detailed **transposed tables** containing information for each Substantia Nigra part for easier analysis.

## Part 1 - Define the necessary functions

### Part 1.1 - Load all necessary Python packages

In [None]:
import functools
import glob
import math
import os
import re

import numpy as np
import pandas as pd
from IPython.display import display

# Pandas display options
pd.options.display.float_format = '{:.2f}'.format

### Part 1.2 - Data Locations

**TO DO:**

Specify the following settings before running the analysis:

1) **Raw data format:** choose the file format of the raw data (excel, csv, or feather).
2) **Experimental parameters:** the spacing between sections and the section thickness (in micrometers).
3) **Raw data folder:** the folder containing the original data files exported from Aiforia.
4) **Results folders:** the folders where the Excel files with processed results will be saved.
5) **Region mapping files:** the location of the Excel files that specify which brain regions have to be replaced and were (un)injected.

Use the following format to define each path: <font color='darkred'>r'file_location'</font>
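
The `r'...'` prefix matters on Windows because backslashes in a normal string can be read as escape sequences. The short sketch below illustrates this with a made-up path (the folder and file names are hypothetical, not the folders used in this notebook):

```python
# Hypothetical example path, chosen because it contains \n and \t sequences
raw_path = r'C:\new_results\table.xlsx'        # raw string: backslashes kept literally
escaped_path = 'C:\\new_results\\table.xlsx'   # same path with doubled backslashes
plain_path = 'C:\new_results\table.xlsx'       # \n and \t become a newline and a tab

print(raw_path == escaped_path)
print('\n' in plain_path, '\t' in plain_path)
```

The raw string and the doubled-backslash string are identical, while the plain string silently contains a newline and a tab and would not point to a valid file.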

In [None]:
# Choose the format of the raw data ('excel', 'csv' or 'feather') by uncommenting the matching line.
# data_format = 'excel'
data_format = 'csv'
# data_format = 'feather'

# Specify the experimental parameters (section_thickness in micrometers)
spacing = 5
section_thickness = 40

# Specify folder locations
folder_raw_data = r'C:\Users\...\Raw_data_TH_Cell'
folder_output_results = r'C:\Users\...\Output_Results_TH_Cell'
folder_output_results_injected = r'C:\Users\...\Output_Results_Injected_TH_Cell'
file_brainregions_to_replace =  r'C:\Users\...\Brainregions_To_Replace_TH_Cell.xlsx'
file_brainregions_injected =  r'C:\Users\...\Brainregions_Hemisphere_TH_Cell.xlsx'

In [None]:
# Create the output folders (including missing parent folders) if they do not exist yet
os.makedirs(folder_output_results, exist_ok=True)
os.makedirs(folder_output_results_injected, exist_ok=True)

### Part 1.3 – Function to Load All Image Files for Analysis

In [None]:
def load_all_file_locations_S2(folder_raw_data: str, data_format: str) -> list[str]:
    """
    Create a list of all file locations for S2 images in the specified folder. 
    It is assumed that only S2 images are present in the folder, so '_S2' does not need to be in the filename here.

    Parameters
    ----------
    folder_raw_data : str
        Path to the folder containing raw data files.
    data_format : str
        Format of the raw data files ('excel', 'csv', or 'feather').

    Returns
    -------
    all_raw_data_file_locations_S2: list[str]
        Sorted list of full file paths for S2 images.
    """
    
    # Determine the file pattern based on the data format
    if data_format == 'excel':
        pattern = "*.xlsx"
    elif data_format == 'csv':
        pattern = "*.csv"
    elif data_format == 'feather':
        pattern = "*.feather"
    else:
        raise ValueError(
            "Invalid data format specified. Please set 'data_format' to 'excel', 'csv', or 'feather'."
        )
    
    # Get all matching files and sort them
    all_raw_data_file_locations_S2 = glob.glob(os.path.join(folder_raw_data, pattern))
    all_raw_data_file_locations_S2.sort()
    
    # Print the locations
    print("The location of all raw data files:")
    for file_location in all_raw_data_file_locations_S2:
        print(f" - {file_location}")
    
    return all_raw_data_file_locations_S2
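
The function above collects every file with the chosen extension and assumes the folder holds only S2 images. If a folder mixes several slides per brain, one option is to filter on the file name first. The sketch below is only an illustration: `filter_s2_files` is a hypothetical helper that is not part of this notebook, and the file names are invented.

```python
import os
import re


def filter_s2_files(file_locations):
    """Hypothetical helper: keep only paths whose file name contains '_S2'.

    The negative lookahead avoids matching names such as '_S20'.
    """
    pattern = re.compile(r'_S2(?!\d)')
    return sorted(
        path for path in file_locations
        if pattern.search(os.path.basename(path))
    )


# Invented file names for illustration
example_files = [
    'data/Brain02_S20.csv',
    'data/Brain03_S2_rescan.csv',
    'data/Brain01_S1.csv',
    'data/Brain01_S2.csv',
]
selected = filter_s2_files(example_files)
print(selected)
```

Only `Brain01_S2.csv` and `Brain03_S2_rescan.csv` are kept; the exact naming convention of the real exports is an assumption here and may need a different pattern.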


### Part 1.4 – Function to Load the Brain Region Correction File

In [None]:
def load_data_brainregions_to_replace(file_brainregions_to_replace: str) -> pd.DataFrame:
    """
    Load and clean the file containing corrections for brain regions that need to be replaced for each image.

    Parameters
    ----------
    file_brainregions_to_replace : str
        Path to the Excel file with columns: 'Image', 'Brainregion_Wrong', 'Brainregion_Correct'.

    Returns
    -------
    pd.DataFrame
        Cleaned DataFrame with brain regions to replace for each image. 
        All brain region names are stripped of spaces and converted to uppercase.
    """

    if not os.path.exists(file_brainregions_to_replace):
        raise FileNotFoundError(f"The specified file does not exist: {file_brainregions_to_replace}")

    # Load the relevant columns from the Excel file
    df = pd.read_excel(
        file_brainregions_to_replace,
        usecols=['Image', 'Brainregion_Wrong', 'Brainregion_Correct'],
        dtype={'Image': 'str', 'Brainregion_Wrong': 'str', 'Brainregion_Correct': 'str'}
    )

    # Clean the data in place
    df['Image'] = df['Image'].str.strip()
    df['Brainregion_Wrong'] = df['Brainregion_Wrong'].str.upper().str.strip()
    df['Brainregion_Correct'] = df['Brainregion_Correct'].str.upper().str.strip()

    print("The modified table of brain regions to replace for each image:")
    display(df)
    
    return df


### Part 1.5 – Function to Load the File Indicating Which Brain Regions Were Injected

In [None]:
def load_data_brainregions_injected(file_brainregions_injected: str) -> pd.DataFrame:
    """
    Load and clean the file specifying which brain regions were on the injected side for each specific image.

    Parameters
    ----------
    file_brainregions_injected : str
        Path to the Excel file containing injected brain region information.

    Returns
    -------
    pd.DataFrame
        Cleaned DataFrame with columns:
        - 'Image': image identifier
        - 'Brainregion': uppercase, stripped brain region name
        - 'Parent_Injected': uppercase, stripped hemisphere 
        - 'Daughter1_Injected': uppercase, stripped hemisphere 

    """
    if not os.path.exists(file_brainregions_injected):
        raise FileNotFoundError(f"The specified file does not exist: {file_brainregions_injected}")

    # Load relevant columns from the Excel file
    df = pd.read_excel(
        file_brainregions_injected,
        usecols=['Image', 'Brainregion', 'Hemisphere'],
        dtype={'Image': 'str', 'Brainregion': 'str', 'Hemisphere': 'str'}
    )

    # Clean the data in place
    df['Image'] = df['Image'].str.strip()
    df['Brainregion'] = df['Brainregion'].str.upper().str.strip()
    df['Parent_Injected'] = df['Hemisphere'].str.upper().str.strip()
    df['Daughter1_Injected'] = df['Hemisphere'].str.upper().str.strip()

    # Drop the original 'Hemisphere' column
    df.drop(columns=['Hemisphere'], inplace=True)

    print("The modified table of injected brain regions for each image:")
    display(df)
    
    return df

### Part 1.6 – Function to Load a DataFrame and Clean It

In [None]:
def dataframe_cleaning(file_location: str, df_brainregions_to_replace: pd.DataFrame, data_format: str) -> pd.DataFrame:
    """
    Load a raw data file for a specific image and clean it using the brain regions replacement table.
    Outputs a cleaned dataframe with additional calculated values like circularity and area/perimeter ratio.

    Parameters
    ----------
    file_location : str
        Path to the raw data file for a single image.
    df_brainregions_to_replace : pd.DataFrame
        DataFrame specifying which brain regions need to be replaced for each image.
    data_format : str
        Format of the raw data file: 'excel', 'csv', or 'feather'.

    Returns
    -------
    pd.DataFrame
        Cleaned dataframe with additional calculated columns.
    """
    # ------------------------
    # Load the raw data
    # ------------------------
    columns = ['Image', 'Parent area name', 'Area/object name', 'Class label',
               'Area (μm²)', 'Class confidence (%)', 'Circumference (µm)']
    
    dtypes = {
        'Image': 'str',
        'Parent area name': 'str',
        'Area/object name': 'str',
        'Class label': 'str',
        'Area (μm²)': 'float64',
        'Class confidence (%)': 'float64',
        'Circumference (µm)': 'float64'
    }
    
    if data_format == 'excel':
        df = pd.read_excel(file_location, usecols=columns, dtype=dtypes, keep_default_na=True)
    elif data_format == 'csv':
        df = pd.read_csv(file_location, sep='\t', usecols=columns, dtype=dtypes, keep_default_na=True)
    elif data_format == 'feather':
        df = pd.read_feather(file_location, columns=columns)  # Load only the needed columns
        df = df.astype(dtypes)  # Ensure proper types
    else:
        raise ValueError("Invalid data format. Choose 'excel', 'csv', or 'feather'.")

    # ------------------------
    # Extract image name from file name
    # ------------------------
    image_name = os.path.splitext(os.path.basename(file_location))[0]
    print('The present image =', image_name)
    df['Image'] = image_name

    # ------------------------
    # Delete the rows with an empty 'Parent area name', 'Area (μm²)', 'Area/object name' or 'Class label'
    # ------------------------
    df.dropna(subset=['Parent area name', 'Area (μm²)', 'Area/object name', 'Class label'], inplace=True)

    # ------------------------
    # Convert the name columns to uppercase so that comparisons do not depend on capitalization
    # ------------------------
    df['Parent area name'] = df['Parent area name'].str.upper()
    df['Area/object name'] = df['Area/object name'].str.upper()
    df['Class label'] = df['Class label'].str.upper()

    print('The full raw data =')
    display(df)

    # ------------------------
    # Replace incorrect brain regions
    # ------------------------
    # Create the dictionary of brain regions that should be replaced for this specific image
    df_image_replacements = df_brainregions_to_replace[df_brainregions_to_replace['Image'] == image_name]
    dict_replace = pd.Series(
        df_image_replacements.Brainregion_Correct.values,
        index=df_image_replacements.Brainregion_Wrong
    ).to_dict()
    print(f"The dictionary of brain regions to replace for {image_name} is:", dict_replace)
    
    # Replace the value in the rows that have a Parent area name or an Area/object name that is in dict_replace
    df['Parent area name'] = df['Parent area name'].replace(dict_replace, regex=False)
    df['Area/object name'] = df['Area/object name'].replace(dict_replace, regex=False)

    # ------------------------
    # Create the columns 'Parent area name merged' and 'Area/object name merged', in which trailing numbers are removed
    # ------------------------
    df['Parent area name merged'] = df['Parent area name'].str.replace(r'\d+$', '', regex=True).str.strip()
    df['Area/object name merged'] = df['Area/object name'].str.replace(r'\d+$', '', regex=True).str.strip()

    # ------------------------
    # Filter unwanted rows 
    # ------------------------
    # Delete the TH POSITIVE rows that
    # - have an area > 450 μm²,
    # - have an area in [400, 450] or [81, 95] μm² and a class confidence < 60 %, or
    # - have an area < 81 μm².
    # Side note: pandas' between() is inclusive on both ends.
    mask_exclude = (
        ((df['Area (μm²)'] > 450) & (df['Class label'] == 'TH POSITIVE')) |
        ((df['Area (μm²)'].between(400, 450)) & (df['Class confidence (%)'] < 60) & (df['Class label'] == 'TH POSITIVE')) |
        ((df['Area (μm²)'].between(81, 95)) & (df['Class confidence (%)'] < 60) & (df['Class label'] == 'TH POSITIVE')) |
        ((df['Area (μm²)'] < 81) & (df['Class label'] == 'TH POSITIVE'))
    )
    df = df[~mask_exclude].copy()  # Copy so the derived columns below are added to an independent DataFrame

    # ------------------------
    # Derived metrics
    # ------------------------
    df['Area/Perimeter (μm)'] = df['Area (μm²)'] / df['Circumference (µm)']
    df['Circularity'] = (4 * math.pi * df['Area (μm²)']) / (df['Circumference (µm)'] ** 2)

    print('The new table with "Area/Perimeter (μm)" and "Circularity" =')
    display(df)

    return df
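
To make the exclusion rule in `dataframe_cleaning` concrete, the sketch below applies it to a small invented table. It assumes that every row is a TH POSITIVE object, so the class label check is left out, and it writes the four conditions in a condensed but equivalent form. The values are made up for illustration only.

```python
import pandas as pd

# Invented TH POSITIVE objects: area in μm² and class confidence in %
cells = pd.DataFrame({
    'Area (μm²)':           [500.0, 420.0, 420.0, 90.0, 90.0, 50.0, 200.0],
    'Class confidence (%)': [99.0,  50.0,  80.0,  50.0, 80.0, 99.0, 10.0],
})

area = cells['Area (μm²)']
confidence = cells['Class confidence (%)']

# Too large, too small, or in a border range with low confidence
borderline = area.between(400, 450) | area.between(81, 95)  # inclusive bounds
exclude = (area > 450) | (area < 81) | (borderline & (confidence < 60))

kept = cells[~exclude]
print(kept)
```

Rows 2, 4 and 6 remain. Note that the confidence threshold only applies inside the two border ranges: the 200 μm² object is kept even though its confidence is 10 %.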

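The circularity computed above is 4π · area / perimeter², which equals 1 for a perfect circle and decreases for less compact shapes. The following sketch checks this on two idealized shapes; the `circularity` helper and the shape sizes are illustrative assumptions, not part of the notebook.

```python
import math


def circularity(area, perimeter):
    """Illustrative helper using the same formula as the 'Circularity' column."""
    return 4 * math.pi * area / perimeter ** 2


radius = 10.0
side = 10.0

circle_value = circularity(math.pi * radius ** 2, 2 * math.pi * radius)
square_value = circularity(side ** 2, 4 * side)

print(f'circle: {circle_value:.3f}')
print(f'square: {square_value:.3f}')
```

The circle gives a value of 1 and the square gives π/4 (about 0.785), independent of the chosen size.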

### Part 1.7 - Function to Create Hierarchical Dataframes


Note: This function is not used for the TH-Cell detector code, as the hierarchy does not extend to Daughter 3.

Hierarchy:

One type of **Parent:** Tissue parent detector for Nigra 1, 2, 3 ... \
One type of **Daughter 1:** Substantia Nigra 1, 2, 3, …  \
One type of **Daughter 2:** TH Positive 963, TH Positive 2111, …

In [None]:
def make_hierarchy() -> None:
    """Note: This function is not used for the TH-Cell detector code, as the hierarchy does not extend to Daughter 3."""
    pass

### Part 1.8 - Function to Calculate all Statistics for Total SUBSTANTIA NIGRA region (Ignoring Hemispheres)

In [None]:
def all_calculations_no_hemisphere(df: pd.DataFrame, spacing: float, section_thickness: float) -> pd.DataFrame:
    """
    Perform the main calculations (counts, total/average areas, shape metrics, derived density measures etc.) on the cleaned data, 
    ignoring the hemispheres. Output: dataframe with all calculations.

    Parameters
    ----------
    df : pandas.DataFrame
        Input dataframe containing at least the following columns:
        - 'Parent area name merged'
        - 'Area/object name merged'
        - 'Area (μm²)'
        - 'Area/Perimeter (μm)'
        - 'Circularity'

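    spacing : float
        Spacing between the analyzed sections; counts are multiplied by this value to extrapolate.

    section_thickness : float
        Section thickness in micrometers.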

    Returns
    -------
    pandas.DataFrame
        A summary table indexed by merged region name, including:
        - Counts: Number of cells per region.
        - Total Region Area (μm²)
        - Total Cell Area (μm²)
        - Average Cell Area (μm²)
        - Average Area/Perimeter (μm)
        - Average Circularity
        - Extrapolated Cell Count
        - Cells/Region Area (mm²)
        - Cells/Region Volume (mm³)

    Notes
    -----
    - `spacing` and `section_thickness` are passed in as arguments; no global variables are required.
    """

    groupby_column1 = 'Parent area name merged'
    groupby_column2 = 'Area/object name merged'

    # Helper to compute groupby + aggregation + renaming in one go
    def group_stat(df, col, agg, target, new_name):
        out = (
            df.groupby(col, as_index=False)
            .agg({target: agg})
            .rename(columns={target: new_name})
        )
        out.rename(columns={col: 'Merged area name'}, inplace=True)
        return out

    # Calculations
    df_counts_merged = df[groupby_column1].value_counts(sort=True).rename_axis('Merged area name').reset_index(name='Counts')
    df_total_region_area_merged = group_stat(df, groupby_column2, 'sum', 'Area (μm²)', 'Total Region Area (μm²)')
    df_total_cell_area_merged   = group_stat(df, groupby_column1, 'sum', 'Area (μm²)', 'Total Cell Area (μm²)')
    df_average_cell_area_merged = group_stat(df, groupby_column1, 'mean', 'Area (μm²)', 'Average Cell Area (μm²)')
    df_avg_area_perimeter       = group_stat(df, groupby_column1, 'mean', 'Area/Perimeter (μm)', 'Average Area/Perimeter (μm)')
    df_avg_circularity          = group_stat(df, groupby_column1, 'mean', 'Circularity', 'Average Circularity')

    # Merge all calculated dataframes
    dfs = [
        df_counts_merged,
        df_total_region_area_merged,
        df_total_cell_area_merged,
        df_average_cell_area_merged,
        df_avg_area_perimeter,
        df_avg_circularity,
    ]
    df_all = functools.reduce(lambda l, r: pd.merge(l, r, on='Merged area name', how='outer'), dfs)

    # Derived metrics
    df_all['Extrapolated Cell Count']       = df_all['Counts'] * spacing
    df_all['Cells/Region Area (per μm²)']   = df_all['Counts'] / df_all['Total Region Area (μm²)']
    df_all['Cells/Region Volume (per μm³)'] = df_all['Cells/Region Area (per μm²)'] / section_thickness

    # Convert to mm²/mm³ and clean up
    df_all['Cells/Region Area (mm²)']   = df_all['Cells/Region Area (per μm²)'] * 1_000_000
    df_all['Cells/Region Volume (mm³)'] = df_all['Cells/Region Volume (per μm³)'] * 1_000_000_000

    df_all.drop(columns=['Cells/Region Area (per μm²)', 'Cells/Region Volume (per μm³)'], inplace=True)
    df_all.sort_values('Merged area name', inplace=True)

    return df_all
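
The derived metrics follow simple arithmetic: the count is multiplied by `spacing`, divided by the region area to get a density per μm², and divided again by the section thickness to get a density per μm³. The factors 1,000,000 and 1,000,000,000 convert to mm² and mm³. A worked example with invented numbers (none of these values come from real data):

```python
# Invented values for a single region
counts = 120
total_region_area_um2 = 600_000.0
spacing = 5
section_thickness_um = 40

extrapolated_cell_count = counts * spacing

cells_per_um2 = counts / total_region_area_um2
cells_per_um3 = cells_per_um2 / section_thickness_um

# 1 mm² = 1e6 μm² and 1 mm³ = 1e9 μm³
cells_per_mm2 = cells_per_um2 * 1_000_000
cells_per_mm3 = cells_per_um3 * 1_000_000_000

print(f'Extrapolated cell count: {extrapolated_cell_count}')
print(f'Cells per mm²: {cells_per_mm2:.1f}')
print(f'Cells per mm³: {cells_per_mm3:.1f}')
```

With these numbers the region contains about 200 cells per mm² and about 5000 cells per mm³, and the extrapolated count is 600.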


### Part 1.9 - Function to Calculate all Statistics for Each SUBSTANTIA NIGRA Subregion 1, 2, 3... X (With Hemisphere Info Added)

In [None]:
def all_calculations_hemisphere(df: pd.DataFrame, df_brainregions_injected: pd.DataFrame) -> pd.DataFrame:
    """
    Perform the main calculations (counts, total/average areas, shape metrics, derived density measures etc.) on the cleaned data, 
    considering the hemispheres. Output: dataframe with all calculations.

    Parameters
    ----------
    df : pandas.DataFrame
        Input dataframe containing at least the following columns:
        - 'Parent area name'
        - 'Area/object name'
        - 'Area (μm²)'
        - 'Area/Perimeter (μm)'
        - 'Circularity'
        - 'Image'

    df_brainregions_injected : pandas.DataFrame
        Dataframe containing 'Image' and 'Brainregion' columns with injected/uninjected info.

    Returns
    -------
    pandas.DataFrame
        Summary table with all calculated metrics per region and injected/uninjected status.
        Includes:
        - Counts
        - Average Cell Area (μm²)
        - Average Area/Perimeter (μm)
        - Average Circularity
        - Total Region Area (μm²)
    """

    groupby_column = 'Parent area name'

    # Filter out "TISSUE" rows for calculations
    df_filtered = df[~df[groupby_column].str.contains('TISSUE')]

    # Helper for mean aggregation
    def group_mean(df, target, new_name):
        out = (
            df
            .groupby(groupby_column)
            .mean(numeric_only=True)[target]
            .rename_axis(groupby_column)
            .reset_index(name=new_name)
        )
        return out
    
    # Counts
    df_counts = df_filtered.value_counts(groupby_column, sort=True).rename_axis(groupby_column).reset_index(name='Counts')

    # Average metrics
    df_avg_area = group_mean(df_filtered, 'Area (μm²)', 'Average Cell Area (μm²)')
    df_avg_perimeter = group_mean(df_filtered, 'Area/Perimeter (μm)', 'Average Area/Perimeter (μm)')
    df_avg_circularity = group_mean(df_filtered, 'Circularity', 'Average Circularity')

    # Region areas
    df_region_area = df[['Image', 'Area/object name', 'Area (μm²)']].copy()
    df_region_area.rename(columns={'Area/object name': groupby_column, 'Area (μm²)': 'Total Region Area (μm²)'}, inplace=True)
    df_region_area = df_region_area[df_region_area[groupby_column].str.contains('SUBSTANTIA NIGRA')]

    # Merge all calculated metrics
    dfs_to_merge = [df_counts, df_avg_area, df_avg_perimeter, df_avg_circularity, df_region_area]
    df_all_calcs = functools.reduce(lambda left, right: pd.merge(left, right, on=groupby_column, how='outer'), dfs_to_merge)

    # Add injected/uninjected information
    df_all_calcs_total = df_brainregions_injected.merge(
        df_all_calcs,
        left_on=['Image', 'Brainregion'],
        right_on=['Image', groupby_column],
        how='inner'
    )
    df_all_calcs_total.drop(columns=['Brainregion', 'Daughter1_Injected'], inplace=True)

    return df_all_calcs_total
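
The last step of `all_calculations_hemisphere` attaches the injected/uninjected label with an inner merge on the image and region name. The sketch below reproduces that step on invented data (image name, counts and the third region are made up) to show one consequence: a region that is missing from the hemisphere file is dropped from the result.

```python
import pandas as pd

# Invented hemisphere mapping: only regions 1 and 2 are listed
mapping = pd.DataFrame({
    'Image': ['Brain01_S2', 'Brain01_S2'],
    'Brainregion': ['SUBSTANTIA NIGRA 1', 'SUBSTANTIA NIGRA 2'],
    'Parent_Injected': ['SUBSTANTIA NIGRA INJECTED', 'SUBSTANTIA NIGRA UNINJECTED'],
})

# Invented per-region statistics, including a region absent from the mapping
stats = pd.DataFrame({
    'Image': ['Brain01_S2'] * 3,
    'Parent area name': ['SUBSTANTIA NIGRA 1', 'SUBSTANTIA NIGRA 2', 'SUBSTANTIA NIGRA 3'],
    'Counts': [40, 55, 7],
})

labelled = mapping.merge(
    stats,
    left_on=['Image', 'Brainregion'],
    right_on=['Image', 'Parent area name'],
    how='inner',
).drop(columns=['Brainregion'])

print(labelled)
```

Because the merge is an inner join, `SUBSTANTIA NIGRA 3` does not appear in the output. It is therefore worth checking that the hemisphere file lists every region of every image.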


### Part 1.10 - Function to Calculate all Statistics for SUBSTANTIA NIGRA UNINJECTED and SUBSTANTIA NIGRA INJECTED	

In [None]:
def all_calculations_hemisphere_aggregated(df: pd.DataFrame, df_brainregions_injected: pd.DataFrame, spacing: float, section_thickness: float) -> pd.DataFrame:
    """
    Compute metrics for SUBSTANTIA NIGRA UNINJECTED and SUBSTANTIA NIGRA INJECTED based on df and df_brainregions_injected.
    Output includes counts, area metrics, circularity, total region area, and derived density metrics.

    Parameters
    ----------
    df : pandas.DataFrame
        Input dataframe containing at least:
        - 'Parent area name'
        - 'Area/object name'
        - 'Area (μm²)'
        - 'Area/Perimeter (μm)'
        - 'Circularity'
        - 'Image'

    df_brainregions_injected : pandas.DataFrame
        Dataframe containing 'Image' and 'Brainregion' columns with injected/uninjected info.

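    spacing : float
        Spacing between the analyzed sections; counts are multiplied by this value to extrapolate.

    section_thickness : float
        Section thickness in micrometers.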

    Returns
    -------
    pandas.DataFrame
        Summary table with injected/uninjected metrics:
        - Counts
        - Average Cell Area (μm²)
        - Average Area/Perimeter (μm)
        - Average Circularity
        - Total Region Area (μm²)
        - Extrapolated Cell Count
        - Cells/Region Area (mm²)
        - Cells/Region Volume (mm³)
    """

    # Fixed groupby columns
    parent_col = 'Parent_Injected'
    daughter_col = 'Daughter1_Injected'

    # Merge df with injected info for parent-level and object-level
    df_injected_parent = df_brainregions_injected.merge(
        df, left_on=['Image', 'Brainregion'], right_on=['Image', 'Parent area name'], how='inner'
    ).drop(columns=['Brainregion', 'Daughter1_Injected', 'Area/object name'])

    df_injected_object = df_brainregions_injected.merge(
        df, left_on=['Image', 'Brainregion'], right_on=['Image', 'Area/object name'], how='inner'
    ).drop(columns=['Brainregion', 'Parent_Injected'])

    # Helper for mean aggregation
    def group_mean(df, target, new_name):
        out = (
            df
            .groupby(parent_col)
            .mean(numeric_only=True)[target]
            .rename_axis(parent_col)
            .reset_index(name=new_name)
        )
        return out

    # Counts
    df_counts = df_injected_parent.value_counts(parent_col, sort=True).rename_axis(parent_col).reset_index(name='Counts')

    # Average metrics
    df_avg_area = group_mean(df_injected_parent, 'Area (μm²)', 'Average Cell Area (μm²)')
    df_avg_perimeter = group_mean(df_injected_parent, 'Area/Perimeter (μm)', 'Average Area/Perimeter (μm)')
    df_avg_circularity = group_mean(df_injected_parent, 'Circularity', 'Average Circularity')

    # Total region area
    df_region_area = (
        df_injected_object.groupby(daughter_col)['Area (μm²)']
        .sum()
        .rename_axis(parent_col)
        .reset_index(name='Total Region Area (μm²)')
    )

    # Merge all metrics
    dfs_to_merge = [df_counts, df_avg_area, df_avg_perimeter, df_avg_circularity, df_region_area]
    df_all = functools.reduce(lambda l, r: pd.merge(l, r, on=parent_col, how='outer'), dfs_to_merge)

    # Derived density metrics
    df_all['Extrapolated Cell Count'] = df_all['Counts'] * spacing
    df_all['Cells/Region Area (per μm²)'] = df_all['Counts'] / df_all['Total Region Area (μm²)']
    df_all['Cells/Region Volume (per μm³)'] = df_all['Cells/Region Area (per μm²)'] / section_thickness
    df_all['Cells/Region Area (mm²)'] = df_all['Cells/Region Area (per μm²)'] * 1_000_000
    df_all['Cells/Region Volume (mm³)'] = df_all['Cells/Region Volume (per μm³)'] * 1_000_000_000

    # Sort and select columns
    df_all.sort_values(by=parent_col, ascending=False, inplace=True)
    columns_order = [
        parent_col, 'Counts', 'Extrapolated Cell Count', 'Average Cell Area (μm²)',
        'Average Area/Perimeter (μm)', 'Average Circularity', 'Total Region Area (μm²)',
        'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)'
    ]
    df_all = df_all[columns_order]

    return df_all


### Part 1.11 - Function to Calculate all Transposed Statistics for Each SUBSTANTIA NIGRA Subregion 1, 2, 3... X

In [None]:
def all_calculations_transposed(df1: pd.DataFrame, df2: pd.DataFrame, spacing: float) -> dict[str, pd.DataFrame]:
    """
    Create transposed tables for each SUBSTANTIA NIGRA subregion 1, 2, 3... X.
    Calculates Counts, Total Region Area, Average Cell Area, and Extrapolated Cell Count.
    
    Parameters
    ----------
    df1 : pandas.DataFrame
        Dataframe containing at least:
        - 'Parent_Injected'
        - 'Parent area name'
        - 'Counts'
        - 'Total Region Area (μm²)'
        - 'Average Cell Area (μm²)'

    df2 : pandas.DataFrame
        Dataframe containing at least:
        - 'Parent_Injected'
        - 'Counts'
        - 'Total Region Area (μm²)'
        - 'Average Cell Area (μm²)'

    spacing : float
        Spacing between the analyzed sections; the totals are multiplied by this value to extrapolate.

    Returns
    -------
    dict[str, pd.DataFrame]
        Dictionary with keys 'SUBSTANTIA NIGRA UNINJECTED' and 'SUBSTANTIA NIGRA INJECTED'.
        Values are transposed DataFrames with statistics.
    """

    filter_col = 'Parent_Injected'

    df_short1 = df1[['Parent_Injected', 'Parent area name', 'Counts', 'Total Region Area (μm²)', 'Average Cell Area (μm²)']]
    df_short2 = df2[['Parent_Injected', 'Counts', 'Total Region Area (μm²)', 'Average Cell Area (μm²)']]

    dictionary_SN = {}

    for region in ['SUBSTANTIA NIGRA UNINJECTED', 'SUBSTANTIA NIGRA INJECTED']:
        # Extract whether region is UNINJECTED or INJECTED
        inj_uninj = region.split()[-1]

        # Filter and transpose
        df1_trans = df_short1[df_short1[filter_col] == region].transpose()
        df2_trans = df_short2[df_short2[filter_col] == region].transpose()

        # Concatenate along columns
        df_trans = pd.concat([df1_trans, df2_trans], axis=1)

        # Set column names based on second row: 'SUBSTANTIA NIGRA 1' and so forth
        df_trans.columns = df_trans.iloc[1]

        # Label the aggregated (NaN) column as '<INJECTED/UNINJECTED> Total' and shorten 'SUBSTANTIA NIGRA X' to 'SNX'
        df_trans.rename(columns={np.nan: f'{inj_uninj} Total'}, inplace=True)
        df_trans.rename(columns=lambda x: re.sub(r'SUBSTANTIA NIGRA ', 'SN', x), inplace=True)

        # Drop unnecessary rows
        df_trans.drop([filter_col, 'Parent area name'], inplace=True)

        # Add extrapolated cell count
        df_trans[f'{inj_uninj} Extrapolated Cell Count'] = df_trans[f'{inj_uninj} Total'] * spacing

        dictionary_SN[region] = df_trans

    return dictionary_SN


## Part 2 - Automatic Analysis of all N S2 Slides of all N Brains 


In [None]:
%%time
# Measure the execution time of this cell

# Load the file specifying brain regions to replace/delete for each image
df_brainregions_to_replace = load_data_brainregions_to_replace(file_brainregions_to_replace)

# Get all raw data files (the folder is assumed to contain only S2 images)
all_raw_data_file_locations_S2 = load_all_file_locations_S2(folder_raw_data, data_format)

# Initialize dictionary to store all overview dataframes
dictionary_overview_dataframes = {}

# Define which calculations to extract for overview tables
list_calculation_results = [
    'Counts', 'Extrapolated Cell Count', 
    'Average Cell Area (μm²)', 'Total Region Area (μm²)', 
    'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)',
    'Total Cell Area (μm²)', 
    'Average Area/Perimeter (μm)', 'Average Circularity'
]

# Define brain regions to exclude
brainregions_not_needed = ['TISSUE PARENT DETECTOR FOR NIGRA', 'TH POSITIVE']

# Loop over all S2 images in the raw_data folder
for count, file_location_S2 in enumerate(all_raw_data_file_locations_S2):

    print(f'\nAnalysis of {file_location_S2}')
    
    # Extract image name from file path
    image_name_S2 = os.path.splitext(os.path.basename(file_location_S2))[0]
     
    # Clean the S2 data
    df_S2_final = dataframe_cleaning(file_location_S2, df_brainregions_to_replace, data_format)

    # Perform calculations
    df_S2_all_calcs = all_calculations_no_hemisphere(df_S2_final, spacing, section_thickness)
    print(f'All calculations for {image_name_S2}')
    display(df_S2_all_calcs)

    # Save per-image results to Excel in the output folder specified at the beginning of this notebook
    output_file = os.path.join(folder_output_results, f'{image_name_S2}_Results.xlsx')
    df_S2_all_calcs.to_excel(output_file, sheet_name='Results', index=False, float_format="%.3f")

    # Prepare overview dataframes
    # The overview Excel file will have multiple sheets, each stored in dictionary_overview_dataframes.
    # dictionary_overview_dataframes = {Total Region area: df, Extrapolated Cell Count:df, Cells/Region Area:df, .... }
  
    for calculation_result in list_calculation_results:
        df_calc = df_S2_all_calcs[['Merged area name', calculation_result]].copy()
        df_calc = df_calc[~df_calc['Merged area name'].isin(brainregions_not_needed)]
        df_calc.rename(columns={calculation_result: image_name_S2}, inplace=True)
    
        if count == 0:
            # Initialize dictionary entries on first loop
            dictionary_overview_dataframes[calculation_result] = df_calc
        else:
            # Merge subsequent results
            dictionary_overview_dataframes[calculation_result] = dictionary_overview_dataframes[calculation_result].merge(
                df_calc, how='outer', on='Merged area name'
            )

    # Clean up memory for next iteration
    del df_S2_final, df_S2_all_calcs
    

# Export overview tables to Excel
output_file_name_overview = os.path.join(folder_output_results, 'Overview_TH_Cells_Results.xlsx')
    
with pd.ExcelWriter(output_file_name_overview) as writer:
    for calculation_result in list_calculation_results:
        sheet_name = (
            calculation_result.replace('/', ' per ')
            .replace('Volume', 'Vol')
        )
        print(f'Overview dataframe with all {sheet_name} for all brains')
        display(dictionary_overview_dataframes[calculation_result])

        dictionary_overview_dataframes[calculation_result].to_excel(
            writer, sheet_name=sheet_name, index=False, float_format="%.3f"
        )
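
The overview tables are built by renaming each image's result column to the image name and outer-merging on `Merged area name`. The sketch below shows the same idea in isolation with `functools.reduce`, which is an alternative to the `count == 0` branch used above rather than the notebook's own code. The image names, counts and the extra region are invented.

```python
import functools

import pandas as pd

# Invented per-image results for one metric ('Counts')
per_image = {
    'Brain01_S2': pd.DataFrame({
        'Merged area name': ['SUBSTANTIA NIGRA'],
        'Counts': [95],
    }),
    'Brain02_S2': pd.DataFrame({
        'Merged area name': ['SUBSTANTIA NIGRA', 'HYPOTHETICAL REGION'],
        'Counts': [110, 12],
    }),
}

# One column per image, named after the image
columns = [
    df.rename(columns={'Counts': image_name})
    for image_name, df in per_image.items()
]

# Outer merge keeps regions that are missing in some images (filled with NaN)
overview = functools.reduce(
    lambda left, right: left.merge(right, on='Merged area name', how='outer'),
    columns,
)
print(overview)
```

The result has one row per region and one column per image. A region that is absent from an image shows up as NaN instead of being dropped.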

## Part 3 – Automatic Analysis of all N S2 Slides of all N Brains (Including Injected and Uninjected Hemispheres)

In [None]:
%%time
# Measure the execution time of this cell

# Load the file specifying brain regions to replace/delete for each image
df_brainregions_to_replace = load_data_brainregions_to_replace(file_brainregions_to_replace)

# Load the file specifying which brain regions belong to which hemisphere for each image
df_brainregions_injected = load_data_brainregions_injected(file_brainregions_injected)

# Get all raw data files (the folder is assumed to contain only S2 images)
all_raw_data_file_locations_S2 = load_all_file_locations_S2(folder_raw_data, data_format)

# Initialize dictionary to store all overview dataframes
dictionary_overview_dataframes_injected = {}

# Define which calculations to extract for overview tables
list_calculation_results = [
    'Extrapolated Cell Count', 'Average Cell Area (μm²)', 'Total Region Area (μm²)',
    'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)',
    'Average Area/Perimeter (μm)', 'Average Circularity'
]

# Loop over all S2 images in the raw_data folder
for count, file_location_S2 in enumerate(all_raw_data_file_locations_S2):
    
    print(f'\nAnalysis of {file_location_S2}')
    
    # Extract image name from file path
    image_name_S2 = os.path.splitext(os.path.basename(file_location_S2))[0]
    
    # Clean the S2 data
    df_S2_final = dataframe_cleaning(file_location_S2, df_brainregions_to_replace, data_format)

    # Remove unnecessary columns
    df_S2_final.drop(columns=['Parent area name merged', 'Area/object name merged'], inplace=True, errors='ignore')

    # Perform calculations
    df_S2_all_calcs = all_calculations_hemisphere(df_S2_final, df_brainregions_injected)
    df_S2_all_calcs_injected = all_calculations_hemisphere_aggregated(
        df_S2_final, df_brainregions_injected, spacing, section_thickness
    )
    dictionary_S2_SN = all_calculations_transposed(df_S2_all_calcs, df_S2_all_calcs_injected, spacing)
    df_counts_uninjected = dictionary_S2_SN['SUBSTANTIA NIGRA UNINJECTED']
    df_counts_injected   = dictionary_S2_SN['SUBSTANTIA NIGRA INJECTED']


    # Save per-image results to Excel in the output folder specified at the beginning of this notebook
    output_file = os.path.join(folder_output_results_injected, f'{image_name_S2}_Hemisphere_Results.xlsx')
    with pd.ExcelWriter(output_file) as writer:
        df_S2_all_calcs.to_excel(writer, sheet_name='All Areas Results', index=False, float_format="%.3f")
        df_counts_uninjected.to_excel(writer, sheet_name='Counts Horizontal', startrow=0,  index=True, float_format="%.3f")
        df_counts_injected.to_excel(writer,   sheet_name='Counts Horizontal', startrow=5,  index=True, float_format="%.3f")
        df_S2_all_calcs_injected.to_excel(writer, sheet_name='Injected Results', index=False, float_format="%.3f")
    
    ############################
    #   Prepare overview file  #
    ############################
    
    # For the overview file, the counts are concatenated horizontally. To make this possible, the SN column
    # names are replaced by 'U1', 'U2', 'U3', ... and 'I1', 'I2', 'I3', ..., because a given part (e.g. SN7)
    # can be on the injected side in some animals and on the uninjected side in others.

    # Reset the column headers so that the original headers (SN14, SN7, ...) become the first row
    # (with index 'Parent area name') and the column headers become 0, 1, 2, 3, ...
    # Then blank out the labels '(UN)INJECTED Total' and '(UN)INJECTED Extrapolated Cell Count' in that first row.
    df_counts_uninj_t = df_counts_uninjected.T.reset_index().T
    df_counts_inj_t   = df_counts_injected.T.reset_index().T

    df_counts_uninj_t.replace({'UNINJECTED Total': '', 'UNINJECTED Extrapolated Cell Count': ''}, inplace=True)
    df_counts_inj_t.replace({'INJECTED Total': '', 'INJECTED Extrapolated Cell Count': ''}, inplace=True)

    # Now give the column headers the desired names: 'U1', 'U2', 'U3', ... and 'I1', 'I2', 'I3', ...
    column_names_uninj = [f'U{x}' for x in range(1, len(df_counts_uninj_t.columns)-1)] + ['U_Total', 'U_Extrapolated']
    column_names_inj   = [f'I{x}' for x in range(1, len(df_counts_inj_t.columns)-1)] + ['I_Total', 'I_Extrapolated']
    df_counts_uninj_t.columns = column_names_uninj
    df_counts_inj_t.columns   = column_names_inj

    # Now concatenate the uninjected and injected data horizontally, and rename the indices so that the animal name is contained in them
    df_counts_horizontal = pd.concat([df_counts_uninj_t, df_counts_inj_t], axis=1)
    df_counts_horizontal = df_counts_horizontal.filter(items=['Parent area name', 'Counts', 'Total Region Area (μm²)'], axis=0)
    df_counts_horizontal.rename(index={
        'Parent area name': f'Parent area name {image_name_S2}',
        'Counts': f'Counts {image_name_S2}',
        'Total Region Area (μm²)': f'Total Region Area (μm²) {image_name_S2}'
    }, inplace=True)

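    # Lastly, calculate the percentage loss on the injected side relative to the uninjected side,
    # i.e. 100 * (1 - I_Total / U_Total). Non-numeric rows (e.g. the area names) become NaN.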
    a = pd.to_numeric(df_counts_horizontal['I_Total'], errors='coerce')
    b = pd.to_numeric(df_counts_horizontal['U_Total'], errors='coerce')
    df_counts_horizontal['Loss'] = 100 * (1 - a / b)

    print('Transposed injected and uninjected regions horizontally merged')
    display(df_counts_horizontal)


    # Prepare overview dataframes
    # The overview Excel file will have multiple sheets, each stored as a dataframe in
    # dictionary_overview_dataframes_injected, for example:
    # {'Cell Count Overview': df, 'Extrapolated Cell Count': df, 'Average Cell Area (μm²)': df, ...}

    # In the first iteration, the cell count overview is created from the current brain;
    # in later iterations, the rows of the current brain are appended to it.
    if count == 0:
        dictionary_overview_dataframes_injected['Cell Count Overview'] = df_counts_horizontal
    else:
        dictionary_overview_dataframes_injected['Cell Count Overview'] = pd.concat(
            [dictionary_overview_dataframes_injected['Cell Count Overview'], df_counts_horizontal]
        )

     
    # Add calculation results to overview dictionary
    for calculation_result in list_calculation_results:
        df_calc = df_S2_all_calcs_injected[['Parent_Injected', calculation_result]].copy()
        df_calc.rename(columns={calculation_result: image_name_S2}, inplace=True)
        
        if count == 0:
            dictionary_overview_dataframes_injected[calculation_result] = df_calc
        else:
            dictionary_overview_dataframes_injected[calculation_result] = dictionary_overview_dataframes_injected[calculation_result].merge(
                df_calc, how='outer', on='Parent_Injected'
            )

    # Clean up memory for next iteration
    del df_S2_final, df_S2_all_calcs_injected, df_S2_all_calcs

                
# Start by rearranging the columns of the dataframe dictionary_overview_dataframes_injected['Cell Count Overview']: 
# first all Uninjected columns, then all Injected columns
df_overview = dictionary_overview_dataframes_injected['Cell Count Overview']
cols = df_overview.columns
cols_loss = ['Loss']
cols_U = sorted([x for x in cols if re.match(r'U\d+$', x)], key=lambda x: int(x[1:]))
cols_I = sorted([x for x in cols if re.match(r'I\d+$', x)], key=lambda x: int(x[1:]))
cols_ordered = cols_loss + cols_U + ['U_Total', 'U_Extrapolated'] + cols_I + ['I_Total', 'I_Extrapolated']
dictionary_overview_dataframes_injected['Cell Count Overview'] = df_overview[cols_ordered]

# Also extract the 'Loss' column on its own, keeping only the 'Counts' row of each animal
df_loss_intensities = df_overview[df_overview.index.str.contains('Counts')]
dictionary_overview_dataframes_injected['Cell Count Loss'] = df_loss_intensities['Loss']


# Export overview tables to Excel
output_file_name_overview = os.path.join(folder_output_results_injected, 'Overview_TH_Cells_Hemisphere_Results.xlsx')

overview_sheets = ['Cell Count Loss', 'Cell Count Overview'] + list_calculation_results

with pd.ExcelWriter(output_file_name_overview) as writer:
    for sheet in overview_sheets:
        sheet_name = sheet.replace('/', ' per ').replace('Volume', 'Vol')
        if sheet in list_calculation_results:
            dictionary_overview_dataframes_injected[sheet].sort_values('Parent_Injected', ascending=False, inplace=True)

        print(f'Overview dataframe with all {sheet_name} for all brains')
        display(dictionary_overview_dataframes_injected[sheet])
        
        float_fmt = "%.4f" if sheet == 'Average Circularity' else "%.3f"
        dictionary_overview_dataframes_injected[sheet].to_excel(
            writer,
            sheet_name=sheet_name,
            index=(sheet in ['Cell Count Overview', 'Cell Count Loss']),  # Keep the index only where it holds the per-animal row labels
            float_format=float_fmt
        )