# Analysis of Phosphorylated Synuclein Detector (pSynD) model

## 0. Outline
This notebook automates the processing of raw data from mouse brains analysed with the “Phosphorylated Synuclein Detector” model developed in Aiforia® Create. We typically start from Excel/CSV files collected in a local folder whose location is specified in the code. To change the format of a series of files automatically, refer to the Change_Name_Format_Input_Data.ipynb notebook. The code takes into account that a mouse brain can be mounted across several slides. Slides for each animal are named identically, except for a numeric suffix denoting the slide number: '_S1', '_S2', etc.

The present notebook is divided into 3 sections:

**1) Make the necessary functions for part 2 and part 3**


**2) Automatic analysis of X * N Slides of N Brains**

Here we automate the analysis of all X*N slide images of all N brains (X slides per brain, which is a parameter that the user can choose) in the folder with raw data. The approach is as follows:

1) We collect all the X*N names of the raw data files in the folder and put them in a list.

2) We make a list containing only the N filenames with an '_S1' in the name. These are the N first slides of the N brains.

3) We loop over these N first slide images belonging to the N brains and perform the following steps in each loop:

    a) We retrieve the second slide (containing an '_S2' in the filename of the raw data) belonging to this specific brain. We do the same for the third ('_S3'), fourth ('_S4'), ..., X'th ('_SX')  slides of the specific brain.   
    b) We perform the data analysis steps on the S1, S2, ..., SX slides separately, and also on the concatenated data of S1+S2+...+SX.     
    c) We output the results to an excel file for this specific brain.
    
After each iteration, we add the output for this brain to an overview table that collects the results for all brains. After the last iteration, this overview table is also exported to an Excel file.

**3) Automatic analysis of X * N Slides of N Brains after determining to which hemisphere they belong**

Here we specify on which hemisphere each brain region lies and compare the injected and uninjected sides. The analysis proceeds as in section 2.
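
The slide-collection steps of section 2 can be sketched as follows. This is only a minimal illustration: the helper `sibling_slide_paths` and the example file names are hypothetical, and the loop in Part 2 may retrieve the slides differently.

```python
import os

def sibling_slide_paths(path_s1, amount_of_slides):
    """Return the expected paths of slides S1..SX, derived from the S1 path."""
    folder, full_name = os.path.split(path_s1)
    stem, extension = os.path.splitext(full_name)
    # Split on the last '_S1' in the stem so earlier occurrences are untouched
    prefix, _, suffix = stem.rpartition('_S1')
    return [os.path.join(folder, f"{prefix}_S{i}{suffix}{extension}")
            for i in range(1, amount_of_slides + 1)]

# Hypothetical file names, for illustration only
example_files = ['Mouse01_S1.csv', 'Mouse01_S2.csv', 'Mouse02_S1.csv']

# Step 2: keep only the first slide of each brain
first_slides = [name for name in example_files if '_S1' in name]

# Step 3a: for each brain, derive the names of all expected slides
slides_per_brain = {name: sibling_slide_paths(name, 2) for name in first_slides}
print(slides_per_brain)
```

Note that 'Mouse02_S2.csv' is listed as expected even though it is absent from the example folder; the notebook handles such missing slides by creating empty data files.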

## Part 1 - Make the necessary functions


### Part 1.1 - Load all necessary Python packages

In [None]:
# Import the required Python packages
import pandas as pd                                # For data analysis with dataframes
import math                                        # To get the value for pi
import functools                                   # For higher-order functions that work on other functions
from IPython.display import display                # Enables the display of more than one dataframe per code cell
import numpy as np                                 # For data analysis
import glob                                        # To get all raw data file locations
import os                                          # To get all raw data file locations
pd.options.display.float_format = '{:.2f}'.format  # Display all numbers in dataframes with 2 decimals


### Part 1.2 - Data locations

**TO DO:** 
- Specify the format of the raw data and the raw data folder location, as well as some experimental parameters.
- Specify the file paths of the excel file containing your quality control revisions and the excel file mapping each brain region to a hemisphere.
- Specify the folder locations where you would like to collect the output excel files (for whole brain and hemisphere analysis).  

The format is: <font color='darkred'>r'file_location'</font> 

In [None]:
# Specify what data format you want to use for your raw data: excel, csv or feather. Do this by uncommenting the data_format that you want.
data_format = 'csv'
# data_format = 'excel'
# data_format = 'feather'

# Specify the maximum number of slides per animal brain. For instance, if this is 4, we expect filenames containing '_S1', '_S2', '_S3' and '_S4'.
# Animal brains with fewer slides are fine: the code creates empty data files for the missing slides so that it can run properly.
amount_of_slides = 2

# Specify the experimental parameters (section_thickness in micrometers) and locations:
# The spacing parameter refers to the serial section spacing interval. It's the interval at which you sample the brain volume for analysis, not the physical distance between each section. For example, if you have a spacing parameter of 10, you would take every 10th section for your analysis.  
spacing=5
section_thickness = 40
folder_raw_data = r'C:\Users\...\Raw_Data_pSynD'
file_brainregions_to_replace =  r'C:\Users\...\Brainregions_To_Replace_pSynD.xlsx'
file_brainregions_injected =  r'C:\Users\...\Brainregions_Hemisphere_pSynD.xlsx'
folder_output_results = r'C:\Users\...\Results_Wholebrain_pSynD'
folder_output_results_injected = r'C:\Users\...\Results_Hemisphere_pSynD'


In [None]:
# Make the output folders if they do not exist yet (missing parent folders are created as well)
os.makedirs(folder_output_results, exist_ok=True)
os.makedirs(folder_output_results_injected, exist_ok=True)

# Make the list of filename appendices that are expected. For instance if amount_of_slides = 4, then appendices_list = ['_S1', '_S2', '_S3', '_S4']
appendices_list = [f"_S{i}" for i in range(1, amount_of_slides + 1)]

### Part 1.3 - Function to load all image files that need to be analyzed

In [None]:
def load_all_file_locations_S1(folder_raw_data):
    """
    Make a list of all file locations for S1 images present in the folder with all raw data files.
    It is therefore important that the filenames contain '_S1', even if there is no '_S2' counterpart.
    If you do not work with '_S1' and '_S2', just append '_S1' to the filenames to make the code work.
    Output: list of all file locations for S1 images.
    """
    
    if data_format == 'csv':
        all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, "*.csv"))
    elif data_format == 'excel':
        all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, "*.xlsx"))
    elif data_format == 'feather':
        all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, "*.feather"))
    else:
        # Stop here: without a valid format there is no file list to work with
        raise ValueError(f"Unknown data_format '{data_format}' in Part 1.2; expected 'csv', 'excel' or 'feather'.")
        
    all_raw_data_file_locations.sort()

    print('The location of all the raw data files = ')
    for file_location in all_raw_data_file_locations:
        print(file_location)

    # Extract the file names that contain '_S1' in the file name. These are the N first images of the N unique brains.
    all_raw_data_file_locations_S1= [x for x in all_raw_data_file_locations if '_S1' in x]
    print('\nThe location of all the raw S1 data files = ')
    for file_location_S1 in all_raw_data_file_locations_S1:
        print(file_location_S1)
        
    return all_raw_data_file_locations_S1
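
The test `'_S1' in x` looks at the whole path, so it would also accept a file such as '..._S10' or any file inside a folder whose name contains '_S1'. A stricter filter could look like the sketch below; `is_first_slide` and the example paths are hypothetical and not part of the notebook.

```python
import os
import re

def is_first_slide(file_location):
    """True if the file name (not the folder) carries the slide tag '_S1' exactly."""
    stem = os.path.splitext(os.path.basename(file_location))[0]
    # '_S1' must not be followed by another digit, so '_S10' is rejected
    return re.search(r'_S1(?!\d)', stem) is not None

# Hypothetical paths, for illustration only
candidates = ['raw/Mouse01_S1.csv', 'raw/Mouse01_S10.csv', 'raw_S1/Mouse02_S2.csv']
selected = [c for c in candidates if is_first_slide(c)]
print(selected)
```

Whether this extra strictness is needed depends on how the raw data folder and files are named.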

### Part 1.4 - Function to load the file with corrections for the brainregions

In [None]:
def load_data_brainregions_to_replace(file_brainregions_to_replace):
    """
    Load the file containing the corrections for brain regions that need to be replaced for each specific image.
    Output: cleaned dataframe with brain regions that need to be replaced for each image.
    """
    
    df_brainregions_to_replace_raw=pd.read_excel(file_brainregions_to_replace,
                                                 usecols=['Image', 'Brainregion_Wrong', 'Brainregion_Correct'],
                                                 dtype={'Image': 'str', 'Brainregion_Wrong': 'str', 'Brainregion_Correct': 'str'}
                                                )

    # Strip accidental leading/trailing whitespace and put the brain regions in upper case
    df_brainregions_to_replace=df_brainregions_to_replace_raw.copy()
    df_brainregions_to_replace['Image'] = df_brainregions_to_replace_raw['Image'].str.strip()
    df_brainregions_to_replace['Brainregion_Wrong'] = df_brainregions_to_replace_raw['Brainregion_Wrong'].str.upper().str.strip()
    df_brainregions_to_replace['Brainregion_Correct'] = df_brainregions_to_replace_raw['Brainregion_Correct'].str.upper().str.strip()

    #     print('The raw table of the brain regions to replace for each image = ')
    #     display(df_brainregions_to_replace_raw)

    print('The modified table of the brain regions to replace for each image = ')
    display(df_brainregions_to_replace)
    
    return df_brainregions_to_replace
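
As an illustration of what the corrections table looks like after cleaning and how it is later turned into a per-image replacement dictionary, here is a small sketch. The table contents are invented; only the three column names and the 'EMPTY' convention come from the notebook.

```python
import pandas as pd

# Hypothetical corrections table with the three expected columns
df_corrections = pd.DataFrame({
    'Image': [' Mouse01_S1', 'Mouse01_S1', 'Mouse02_S1'],
    'Brainregion_Wrong': ['amygdala 1 ', 'Striatum 2', 'cortex 3'],
    'Brainregion_Correct': ['Striatum 4', 'empty', 'Amygdala 1'],
})

# Same cleaning as in the function above: strip whitespace, upper-case the regions
df_corrections['Image'] = df_corrections['Image'].str.strip()
for column in ['Brainregion_Wrong', 'Brainregion_Correct']:
    df_corrections[column] = df_corrections[column].str.upper().str.strip()

# Corrections that apply to one image, as a {wrong: correct} dictionary
rows = df_corrections[df_corrections['Image'] == 'Mouse01_S1']
corrections_for_image = dict(zip(rows['Brainregion_Wrong'], rows['Brainregion_Correct']))
print(corrections_for_image)
```

A region mapped to 'EMPTY' is removed from the data during cleaning in Part 1.6.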

### Part 1.5 - Function to load the file with which brainregions were injected


In [None]:
def load_data_brainregions_injected(file_brainregions_injected):
    """
    Load the file specifying which brainregions were on the injected side for each specific image.
    Output: cleaned dataframe with brain regions that were injected for each image.
    """
    
    df_brainregions_injected_raw=pd.read_excel(file_brainregions_injected,
                                               usecols=['Image', 'Brainregion', 'Hemisphere'],
                                               dtype={'Image': 'str', 'Brainregion': 'str', 'Hemisphere': 'str'}
                                               )
    
    # Strip accidental leading/trailing whitespace and put the brain regions in upper case
    df_brainregions_injected=df_brainregions_injected_raw.copy()
    df_brainregions_injected['Image'] = df_brainregions_injected_raw['Image'].str.strip()
    df_brainregions_injected['Brainregion'] = df_brainregions_injected_raw['Brainregion'].str.upper().str.strip()
    df_brainregions_injected['Daughter1_Injected'] = df_brainregions_injected_raw['Hemisphere'].str.upper().str.strip()
    df_brainregions_injected.drop(columns=['Hemisphere'], inplace=True)

    #     print('The raw table of the brain regions injected for each image = ')
    #     display(df_brainregions_injected_raw)

    print('The modified table of the brain regions injected for each image = ')
    display(df_brainregions_injected)
    
    return df_brainregions_injected

### Part 1.6 - Function to load dataframe and clean it


In [None]:
def dataframe_cleaning(file_location, df_brainregions_to_replace):
    """
    Load the specific file location in a dataframe and clean it with df_brainregions_to_replace.
    Output: loaded and cleaned dataframe with some additional calculated values.
    """

    if data_format == 'csv':
        # Check if the  file exists. If not, we make an empty CSV file with the right columns:
        try:
            df_1=pd.read_csv(file_location, sep='\t',
                             usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Class confidence (%)', 'Area (μm²)', 'Circumference (µm)'],
                             dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                    'Class label': 'str', 'Class confidence (%)': 'float64', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                             keep_default_na = True) 
            
        except FileNotFoundError:
            df_empty = pd.DataFrame(columns=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Class confidence (%)', 'Area (μm²)', 'Circumference (µm)'])
            df_empty.reset_index(inplace=True)
            df_empty.to_csv(file_location, sep='\t')
            df_1=pd.read_csv(file_location, sep='\t',
                             usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Class confidence (%)', 'Area (μm²)', 'Circumference (µm)'],
                             dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                    'Class label': 'str', 'Class confidence (%)': 'float64', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                             keep_default_na = True) 
            print(f'\n A dataframe at location {file_location} did not exist, so we made an empty dataframe.')

            
    elif data_format == 'excel':
        # Check if the  file exists. If not, we make an empty excel file with the right columns:
        try:
            df_1=pd.read_excel(file_location,
                               usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Class confidence (%)', 'Area (μm²)','Circumference (µm)'],
                               dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                      'Class label': 'str', 'Class confidence (%)': 'float64', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                               keep_default_na = True)
        except FileNotFoundError:
            df_empty = pd.DataFrame(columns=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Class confidence (%)', 'Area (μm²)', 'Circumference (µm)'])
            df_empty.reset_index(inplace=True)
            df_empty.to_excel(file_location)
            df_1=pd.read_excel(file_location,
                               usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Class confidence (%)', 'Area (μm²)','Circumference (µm)'],
                               dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                      'Class label': 'str', 'Class confidence (%)': 'float64', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                               keep_default_na = True)
            print(f'\n A dataframe at location {file_location} did not exist, so we made an empty dataframe.')
            
    elif data_format == 'feather':
        # Check if the  file exists. If not, we make an empty feather file with the right columns:
        try:
            df_1=pd.read_feather(file_location) 
            dtype_dictionary = {'Image': 'object', 'Parent area name': 'object', 'Area/object name': 'object', 
                                'Class label': 'object', 'Class confidence (%)': 'float64', 'Area (μm²)': 'float64', 'Circumference (µm)': 'float64' }
            df_1=df_1.astype(dtype_dictionary)
        except FileNotFoundError:
            df_empty = pd.DataFrame(columns=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Class confidence (%)', 'Area (μm²)', 'Circumference (µm)'])
            df_empty.reset_index(inplace=True)
            df_empty.to_feather(file_location)
            df_1=pd.read_feather(file_location) 
            dtype_dictionary = {'Image': 'object', 'Parent area name': 'object', 'Area/object name': 'object', 
                                'Class label': 'object', 'Class confidence (%)': 'float64', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' }
            df_1=df_1.astype(dtype_dictionary)
            print(f'\n A dataframe at location {file_location} did not exist, so we made an empty dataframe.')

    else:
        # Stop here: without a valid format no dataframe can be loaded
        raise ValueError(f"Unknown data_format '{data_format}' in Part 1.2; expected 'csv', 'excel' or 'feather'.")

    # Delete the rows with an empty  Area (μm²) or Area/object name 
    df_1.dropna(subset =['Area (μm²)', 'Area/object name'] , how='any', inplace=True)
    
    # Put the name and label columns in upper case so that matching never fails on capitalization
    df_1['Parent area name'] = df_1['Parent area name'].str.upper()
    df_1['Area/object name'] = df_1['Area/object name'].str.upper()
    df_1['Class label']      = df_1['Class label'].str.upper()

    # Get the image name out of the file_path (getting image name from dataframe first column is hard because some are empty, 
    # and getting from filename makes more sense anyway) and change some field based on the recipe. 
    full_name = os.path.basename(file_location)
    file_name = os.path.splitext(full_name)
    image_name = file_name[0]
    print('The present image=', image_name)

    # Make sure the image name across the whole first column is correct
    df_1['Image']=image_name
    
    print('The full raw data=')
    display(df_1)

    # Determine the dictionary of brain regions that should be replaced for this specific image
    df_brainregions_to_replace = df_brainregions_to_replace[df_brainregions_to_replace['Image']==image_name]
    dict_brainregions_to_replace= pd.Series(df_brainregions_to_replace.Brainregion_Correct.values, index=df_brainregions_to_replace.Brainregion_Wrong).to_dict()

    print('The dictionary of brain regions to replace for this specific image', image_name, 'is', dict_brainregions_to_replace)

    # Replace the values of Parent area name and Area/object name that occur as a key in dict_brainregions_to_replace
    df_2=df_1.copy()
    df_2['Parent area name'] = df_1['Parent area name'].replace(dict_brainregions_to_replace, regex=False)
    df_2['Area/object name'] = df_1['Area/object name'].replace(dict_brainregions_to_replace, regex=False)
    
    # Create a column 'Parent area name merged' and 'Area/object name merged' where the numbers are deleted from these columns:
    df_2['Parent area name merged'] = df_2['Parent area name'].str.replace('\\d+', '', regex=True).str.strip()
    df_2['Area/object name merged'] = df_2['Area/object name'].str.replace('\\d+', '', regex=True).str.strip()

    # Rows whose parent was set to 'EMPTY' can have an Area/object name that itself occurs as a parent; the rows below it should also be deleted
    # (basically, the daughter 3 of such an object should also be deleted)
    df_empty_parent = df_2[df_2['Parent area name']=='EMPTY']
    list_of_area_objects_that_should_be_empty = df_empty_parent['Area/object name'].to_list()
    print('list_of_area_objects_that_should_be_empty = ', list_of_area_objects_that_should_be_empty)
    df_2.loc[df_2["Parent area name"].isin(list_of_area_objects_that_should_be_empty), "Parent area name"] = "EMPTY"

    # Delete the rows in which we just made the Parent area name or Area/object name 'EMPTY' by replacing them with the dictionary
    df_3 = df_2[(df_2['Parent area name']!='EMPTY') &  (df_2['Area/object name']!='EMPTY')]


    # Delete the rows whose 'Area/object name merged' contains 'CELLULAR DIFFUSE INCLUSION' (A | or O |) and
    # whose area is larger than 600 or smaller than 25. We also delete the 'A | NEURITIC SEEDED INCLUSION' rows with an area
    # smaller than 4 (that is dirt on the sample) and the 'CELLULAR SEEDED INCLUSION' rows with an area smaller than 15.
    # Side note: the comparisons are strict, so rows exactly at a threshold are kept.
    # regex=False matches the label literally; by default the '|' would be read as a regular-expression "or".
    df_3x = df_3[ ~( (df_3['Area (μm²)'] > 600) & (df_3['Area/object name merged'].str.contains('CELLULAR DIFFUSE INCLUSION')) )
                  &
                  ~( (df_3['Area (μm²)'] <  25) & (df_3['Area/object name merged'].str.contains('CELLULAR DIFFUSE INCLUSION')) )
                  &
                  ~( (df_3['Area (μm²)'] <  4 ) & (df_3['Area/object name merged'].str.contains('A | NEURITIC SEEDED INCLUSION', regex=False)) )
                  &
                  ~( (df_3['Area (μm²)'] <  15) & (df_3['Area/object name merged'].str.contains('CELLULAR SEEDED INCLUSION')) )
                ]

    # Calculate the Area/Perimeter (μm) and the circularity
    df_4 = df_3x.copy()
    df_4['Area/Perimeter (μm)'] = df_3x['Area (μm²)']/df_3x['Circumference (µm)']
    df_4['Circularity'] = (4 * math.pi * df_3x['Area (μm²)'])/ (df_3x['Circumference (µm)'])**2

    # Show the full updated dataframe:
    print('The fully cleaned table with "Area/Perimeter", "Circularity" and so on:')
    display(df_4)
    
    return df_4
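
Because the inclusion labels contain a '|' character, it is worth checking how `Series.str.contains` interprets such a pattern. The sketch below uses invented example names; it only demonstrates the documented difference between the default regular-expression matching and literal matching with `regex=False`.

```python
import pandas as pd

# Hypothetical object names, for illustration only
names = pd.Series(['A | NEURITIC SEEDED INCLUSION',
                   'A | CELLULAR DIFFUSE INCLUSION',
                   'STRIATUM'])

# Default (regex=True): '|' means "or", so the pattern matches any name
# containing 'A ' or ' NEURITIC SEEDED INCLUSION'
as_regex = names.str.contains('A | NEURITIC SEEDED INCLUSION')

# regex=False: the whole pattern, including '|', is matched literally
as_literal = names.str.contains('A | NEURITIC SEEDED INCLUSION', regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

With the default setting the cellular diffuse label is matched as well, which is usually not what a filter on neuritic inclusions intends.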

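The circularity computed above is 4π·Area / Circumference², which equals 1 for a perfect circle and decreases for less compact shapes. A quick check on two ideal shapes (the helper name `circularity` and the dimensions are chosen for illustration only):

```python
import math

def circularity(area, circumference):
    """Circularity = 4 * pi * area / circumference**2."""
    return 4 * math.pi * area / circumference ** 2

# A circle with radius 5: area = pi * r**2, circumference = 2 * pi * r
radius = 5.0
circle_value = circularity(math.pi * radius ** 2, 2 * math.pi * radius)

# A square with side 10: area = side**2, circumference = 4 * side
side = 10.0
square_value = circularity(side ** 2, 4 * side)

print(circle_value, square_value)
```

The circle gives a value of 1 (up to floating-point rounding) and the square gives π/4, roughly 0.79.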

### Part 1.7 - Function to make hierarchical dataframes


In [None]:
# The hierarchy: We have 1 type of parent (BRAIN TISSUE 1, 2, 3 etc) with many types of daughter 1 (Amygdala 1, 2, 3 etc,  Striatum 1, 2, 3 etc) 
# and 3 types of daughter 2 (A | CELLULAR DIFFUSE INCLUSION X, A | CELLULAR SEEDED INCLUSION X, A | NEURITIC SEEDED INCLUSION)
# and 2 types of daughter 3 (O | CELLULAR DIFFUSE INCLUSION X, O | CELLULAR SEEDED INCLUSIONS X)
# where X is a number specifying the specific object

def make_hierarchy(df):
    """ 
    Here we make the hierarchical structure of the data in the dataframe more clear. 
    The field 'Parent area name' is always the parent of the 'Area/object name' in the same row. 
    The area in the row always belongs to the 'Area/object name'.
    Output: four dataframes in which gradually more hierarchy is added.
    """

    # The rows with the top parent (= BRAIN TISSUE X) are the rows that don't have an own Parent area name
    df_parent_almost = df[df['Parent area name'].isna()]
    dict_parent={'Area/object name':'Parent name', 'Area/object name merged': 'Parent name merged', 'Area (μm²)': 'Area Parent (μm²)',
                'Area/Perimeter (μm)': 'Area/Perimeter Parent (μm)', 'Circularity': 'Circularity Parent'}
    df_parent=df_parent_almost.rename(columns=dict_parent)
    df_parent.drop(columns=['Parent area name', 'Class label', 'Parent area name merged','Area/Perimeter Parent (μm)', 'Circularity Parent' ], inplace=True)

    # Then we add the first daughter = the daughter of the top parents
    df_parent_daughter1_almost=df_parent.merge(df[['Parent area name', 'Area/object name', 'Area/object name merged', 'Area (μm²)']], left_on='Parent name', right_on='Parent area name', how='inner')
    dict_daughter1 = {'Parent area name': 'Parent name copy', 'Area/object name':'Daughter1', 
                      'Area/object name merged': 'Daughter1 merged', 'Area (μm²)': 'Area Daughter1 (μm²)'}

    df_parent_daughter1_almost2=df_parent_daughter1_almost.rename(columns=dict_daughter1)
    # Groupby is needed because there now can be for instance 2 Striatum 4's 
    # (one of them originated from e.g. changing Amygdala 1 to Striatum 4 in the brainregion corrections)
    # We need to turn this Striatum 4 into a single row, because otherwise its rows would be duplicated in the next join when making df_parent_daughter2
    df_parent_daughter1=df_parent_daughter1_almost2.groupby(['Daughter1'], as_index=False).agg(
        {'Image': 'first', 'Parent name': 'first', 'Area Parent (μm²)': 'first',
         'Parent name merged': 'first', 'Parent name copy': 'first', 'Daughter1': 'first',
         'Daughter1 merged': 'first', 'Area Daughter1 (μm²)': 'sum'})

    # Then we add the second daughter = the daughter of daughter 1
    df_parent_daughter2_almost=df_parent_daughter1.merge(df[['Parent area name', 'Area/object name', 'Area/object name merged', 'Area (μm²)', 'Area/Perimeter (μm)' , 'Circularity' ]], left_on='Daughter1', right_on='Parent area name', how='inner')
    dict_daughter2 = {'Parent area name': 'Daughter1 copy','Area/object name':'Daughter2', 
                      'Area/object name merged': 'Daughter2 merged', 'Area (μm²)': 'Area Daughter2 (μm²)',
                      'Area/Perimeter (μm)': 'Area/Perimeter Daughter2 (μm)', 'Circularity': 'Circularity Daughter2'}
    df_parent_daughter2=df_parent_daughter2_almost.rename(columns=dict_daughter2)

    # Then we add the third daughter = the daughter of daughter 2
    df_parent_daughter3_almost=df_parent_daughter2.merge(df[['Parent area name', 'Area/object name','Area/object name merged', 'Area (μm²)', 'Area/Perimeter (μm)' , 'Circularity' ]], left_on='Daughter2', right_on='Parent area name', how='inner')
    dict_daughter3 = {'Parent area name': 'Daughter2 copy','Area/object name':'Daughter3', 
                      'Area/object name merged': 'Daughter3 merged', 'Area (μm²)': 'Area Daughter3 (μm²)',
                      'Area/Perimeter (μm)': 'Area/Perimeter Daughter3 (μm)', 'Circularity': 'Circularity Daughter3'}
    df_parent_daughter3=df_parent_daughter3_almost.rename(columns=dict_daughter3)

#     print('Original df')
#     display(df)
#     print('df_parent')
#     display(df_parent)
#     print('df_parent_daughter1')
#     display(df_parent_daughter1)
#     print('df_parent_daughter2')
#     display(df_parent_daughter2)
#     print('df_parent_daughter3')
#     display(df_parent_daughter3)
    

    return df_parent, df_parent_daughter1, df_parent_daughter2, df_parent_daughter3
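
The idea behind `make_hierarchy` is a repeated self-join: each level is obtained by merging the previous level with the flat table on "name of the object = parent of the next object". The sketch below shows this on an invented five-row table; it is a simplified illustration and leaves out the renaming of the extra columns and the groupby step of the real function.

```python
import pandas as pd

# Hypothetical flat table: each row names an object and its direct parent
df_flat = pd.DataFrame({
    'Parent area name': [None, 'BRAIN TISSUE 1', 'BRAIN TISSUE 1', 'STRIATUM 1', 'STRIATUM 1'],
    'Area/object name': ['BRAIN TISSUE 1', 'STRIATUM 1', 'AMYGDALA 1',
                         'A | CELLULAR SEEDED INCLUSION 1', 'A | NEURITIC SEEDED INCLUSION 1'],
    'Area (μm²)': [1000.0, 400.0, 300.0, 20.0, 5.0],
})

# Top level: rows without a parent
parents = df_flat[df_flat['Parent area name'].isna()][['Area/object name']]
parents = parents.rename(columns={'Area/object name': 'Parent name'})

# Daughter 1: rows whose parent is a top-level parent
level1 = parents.merge(df_flat, left_on='Parent name', right_on='Parent area name', how='inner')
level1 = level1.rename(columns={'Area/object name': 'Daughter1', 'Area (μm²)': 'Area Daughter1 (μm²)'})
level1 = level1[['Parent name', 'Daughter1', 'Area Daughter1 (μm²)']]

# Daughter 2: rows whose parent is a Daughter1
level2 = level1.merge(df_flat, left_on='Daughter1', right_on='Parent area name', how='inner')
level2 = level2.rename(columns={'Area/object name': 'Daughter2', 'Area (μm²)': 'Area Daughter2 (μm²)'})
level2 = level2[['Parent name', 'Daughter1', 'Daughter2', 'Area Daughter2 (μm²)']]

print(level1)
print(level2)
```

Because the joins are inner joins, regions without inclusions (AMYGDALA 1 in this example) do not appear at the Daughter2 level.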

### Part 1.8 - Function to calculate all information for all daughter 1's, but subdivided into the three A types that occur


In [None]:
def all_calculations_per_a_type(df1, df2, groupby_column1='Area/object name merged', groupby_column2='Daughter1 merged'):
    """
    Make the main calculations for all A-type inclusions, based on two dataframes with different hierarchies (df1 and df2),
    and based on two groupby columns that can be chosen.
    A-type can be: 'A | CELLULAR DIFFUSE INCLUSION', 'A | CELLULAR SEEDED INCLUSION', 'A | NEURITIC SEEDED INCLUSION'
    Output: dictionary: {A-type : dataframe with A-type calculations}
    """
    
    # Compute the total region area of each Daughter1 merged (so the second layer in the hierarchy, e.g. total area of Amygdala = Area Amygdala 1 + Area Amygdala 2 + ... Area Amygdala 7)

    # First method to do this: just group by over all Area/object name merged in the original dataframe
    # and discard the elements that are not in the second layer of the hierarchy, namely BRAIN TISSUE (layer 1)
    # The area column is selected before summing, so only that column is aggregated
    df_region_areas_merged_almost = df1.groupby(groupby_column1)['Area (μm²)'].sum().rename_axis('Merged area name').reset_index(name='Total Region Area (μm²)')
    regions_exclude=['BRAIN TISSUE']
    df_region_areas_merged1=df_region_areas_merged_almost[~df_region_areas_merged_almost['Merged area name'].isin(regions_exclude)]

    print(f'The total area of each {groupby_column1}')
    display(df_region_areas_merged1)
    
    # Second method to do this: just group by over all 'Daughter1 merged' in df_parent_daughter1 
    # (where those daughter 1 rows have not been multiplied due to another join as is the case in df_parent_daughter2)
    # df_region_areas_merged2 = df_parent_daughter1.groupby(groupby_column2).sum()['Area Daughter1 (μm²)'].rename_axis('Merged area name').reset_index(name='Total Region Area (μm²)')

    # print('METHOD 2: The total area of each Daughter1 merged')
    # display(df_region_areas_merged2)
    
    A_type_list = ['A | CELLULAR DIFFUSE INCLUSION', 'A | CELLULAR SEEDED INCLUSION', 'A | NEURITIC SEEDED INCLUSION']
    A_type_dictionary = {}
    for A_type in A_type_list:

        # Compute the total A_type area, the counts and so on (layer 3 = Daughter2 area) of the inclusions belonging to each Daughter1 merged
        df_parent_daughter2_A_type = df2[df2['Daughter2 merged']==A_type]
        df_total_areas_merged_A_type = df_parent_daughter2_A_type.groupby(groupby_column2)['Area Daughter2 (μm²)'].sum().rename_axis('Merged area name').reset_index(name= A_type + ' ' + 'Total Inclusion Area (μm²)')
        df_counts_merged_A_type = df_parent_daughter2_A_type.value_counts(groupby_column2, sort=True).rename_axis('Merged area name').reset_index(name=A_type + ' ' + 'Counts')
        df_average_areas_merged_A_type = df_parent_daughter2_A_type.groupby(groupby_column2).mean(numeric_only=True)['Area Daughter2 (μm²)'].rename_axis('Merged area name').reset_index(name=A_type + ' ' + 'Average Inclusion Area (μm²)')
        df_areaperimeter_merged_A_type = df_parent_daughter2_A_type.groupby(groupby_column2).mean(numeric_only=True)['Area/Perimeter Daughter2 (μm)'].rename_axis('Merged area name').reset_index(name=A_type + ' ' + 'Average Area/Perimeter (μm)')
        df_circularity_merged_A_type = df_parent_daughter2_A_type.groupby(groupby_column2).mean(numeric_only=True)['Circularity Daughter2'].rename_axis('Merged area name').reset_index(name=A_type + ' ' + 'Average Circularity')

        #         print(f"The total {A_type} Area of each Daughter1 merged")
        #         display(df_total_areas_merged_A_type)
        #         print(f"The total {A_type} Counts of each Daughter1 merged")
        #         display(df_counts_merged_A_type)
        #         print(f"The average {A_type} Area of each Daughter1 merged")
        #         display(df_average_areas_merged_A_type)
        #         print(f"The average {A_type} Area/Perimeter of each Daughter1 merged")
        #         display(df_areaperimeter_merged_A_type)
        #         print(f"The average {A_type} Circularity of each Daughter1 merged")
        #         display(df_circularity_merged_A_type)

        # Join all these dataframes together on the 'Merged area name' column

        dfs_to_merge_A_type = [df_region_areas_merged1, df_total_areas_merged_A_type, df_counts_merged_A_type,
                               df_average_areas_merged_A_type, df_areaperimeter_merged_A_type, df_circularity_merged_A_type]
        df_merged_A_type = functools.reduce(lambda left, right: pd.merge(left,right,on='Merged area name', how='outer'), dfs_to_merge_A_type)

        # Put all calculated results together
        df_merged_A_type[A_type + ' ' + 'Percentage PSYN Positive Area']            = df_merged_A_type[A_type + ' ' + 'Total Inclusion Area (μm²)']/df_merged_A_type['Total Region Area (μm²)']*100
        df_merged_A_type[A_type + ' ' + 'Extrapolated Inclusion Count']    = df_merged_A_type[A_type + ' ' + 'Counts']*spacing
        df_merged_A_type[A_type + ' ' + 'Inclusions/Region Area (per μm²)']  = df_merged_A_type[A_type + ' ' + 'Counts']/df_merged_A_type['Total Region Area (μm²)']
        df_merged_A_type[A_type + ' ' + 'Inclusions/Region Volume (per μm³)']= df_merged_A_type[A_type + ' ' + 'Inclusions/Region Area (per μm²)']/section_thickness
        df_merged_A_type[A_type + ' ' + 'Inclusions/Region Area (mm²)']  = df_merged_A_type[A_type + ' ' + 'Inclusions/Region Area (per μm²)']*1000000
        df_merged_A_type[A_type + ' ' + 'Inclusions/Region Volume (mm³)']= df_merged_A_type[A_type + ' ' + 'Inclusions/Region Volume (per μm³)']*1000000000

        df_merged_A_type.drop(columns=[A_type + ' ' + 'Inclusions/Region Area (per μm²)', A_type + ' ' + 'Inclusions/Region Volume (per μm³)'], inplace=True)

        print(f'The total {A_type} Calculations of each Daughter1 merged')
        display(df_merged_A_type)
        
        # Fill the dictionary with the resulting dataframe:
        A_type_dictionary[A_type] = df_merged_A_type
    
    return A_type_dictionary
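
To make the derived quantities concrete, here is the same arithmetic for a single region. The measurements are invented; `spacing` and `section_thickness` use the values set in Part 1.2.

```python
spacing = 5              # serial section spacing interval (value from Part 1.2)
section_thickness = 40   # in micrometers (value from Part 1.2)

# Hypothetical measurements for one merged region
total_region_area_um2 = 2_000_000.0
total_inclusion_area_um2 = 5_000.0
counts = 100

# Percentage of the region area that is pSyn positive
percentage_positive_area = total_inclusion_area_um2 / total_region_area_um2 * 100

# Counts extrapolated from the sampled sections to all sections
extrapolated_count = counts * spacing

# Densities per μm² and per μm³, then converted to per mm² and per mm³
per_um2 = counts / total_region_area_um2
per_um3 = per_um2 / section_thickness
per_mm2 = per_um2 * 1_000_000        # 1 mm² = 10^6 μm²
per_mm3 = per_um3 * 1_000_000_000    # 1 mm³ = 10^9 μm³

print(percentage_positive_area, extrapolated_count, per_mm2, per_mm3)
```

For these numbers the region is 0.25 % pSyn positive, with an extrapolated count of 500 inclusions, about 50 inclusions per mm² and about 1250 inclusions per mm³.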

### Part 1.9 - Function to calculate all information for all daughter 1's, but subdivided into the two O types that occur


In [None]:
def all_calculations_per_o_type(df1, df2, groupby_column1='Area/object name merged', groupby_column2='Daughter1 merged'):
    """
    Make the main calculations for all O-type inclusions, based on two dataframes with different hierarchies (df1 and df2),
    and based on two groupby columns that can be chosen.
    O-type can be: 'O | CELLULAR DIFFUSE INCLUSION', 'O | CELLULAR SEEDED INCLUSIONS'
    Output: dictionary: {O-type : dataframe with O-type calculations}
    """
    
    # Compute the total region area of each Daughter1 merged (so the second layer in the hierarchy, e.g. total area of Amygdala = Area Amygdala 1 + Area Amygdala 2 + ... Area Amygdala 7)

    # First method to do this: just group by over all Area/object name merged in the original dataframe
    # and discard the elements that are not in the second layer of the hierarchy, namely BRAIN TISSUE (layer 1)
    # The area column is selected before summing, so only that column is aggregated
    df_region_areas_merged_almost = df1.groupby(groupby_column1)['Area (μm²)'].sum().rename_axis('Merged area name').reset_index(name='Total Region Area (μm²)')
    regions_exclude=['BRAIN TISSUE']
    df_region_areas_merged1=df_region_areas_merged_almost[~df_region_areas_merged_almost['Merged area name'].isin(regions_exclude)]

    print(f'The total area of each {groupby_column1}')
    display(df_region_areas_merged1)
    
    # Second method to do this: just group by over all 'Daughter1 merged' in df_parent_daughter1 
    # (where those daughter 1 rows have not been multiplied due to another join as is the case in df_parent_daughter2)
    # df_region_areas_merged2 = df_parent_daughter1.groupby(groupby_column2).sum()['Area Daughter1 (μm²)'].rename_axis('Merged area name').reset_index(name='Total Region Area (μm²)')

    # print('METHOD 2: The total area of each Daughter1 merged')
    # display(df_region_areas_merged2)
    
    
    O_type_list = ['O | CELLULAR DIFFUSE INCLUSION', 'O | CELLULAR SEEDED INCLUSIONS']
    O_type_dictionary = {}
    for O_type in O_type_list:

        # Compute the total area and the counts of the O_type inclusions (stored at the Daughter3 level) belonging to each merged Daughter1
        df_parent_daughter3_O_type = df2[df2['Daughter3 merged']==O_type]
        df_areas_merged_O_type = df_parent_daughter3_O_type.groupby(groupby_column2)['Area Daughter3 (μm²)'].sum().rename_axis('Merged area name').reset_index(name= O_type + ' ' + 'Total Inclusion Area (μm²)')
        df_counts_merged_O_type = df_parent_daughter3_O_type.value_counts(groupby_column2, sort=True).rename_axis('Merged area name').reset_index(name=O_type + ' ' + 'Counts')
        df_average_areas_merged_O_type = df_parent_daughter3_O_type.groupby(groupby_column2).mean(numeric_only=True)['Area Daughter3 (μm²)'].rename_axis('Merged area name').reset_index(name=O_type + ' ' + 'Average Inclusion Area (μm²)')
        df_areaperimeter_merged_O_type = df_parent_daughter3_O_type.groupby(groupby_column2).mean(numeric_only=True)['Area/Perimeter Daughter3 (μm)'].rename_axis('Merged area name').reset_index(name=O_type + ' ' + 'Average Area/Perimeter (μm)')
        df_circularity_merged_O_type = df_parent_daughter3_O_type.groupby(groupby_column2).mean(numeric_only=True)['Circularity Daughter3'].rename_axis('Merged area name').reset_index(name=O_type + ' ' + 'Average Circularity')

        #         print(f"The total {O_type} Area of each Daugher1 merged")
        #         display(df_areas_merged_O_type)
        #         print(f"The total {O_type} Counts of each Daugher1 merged")
        #         display(df_counts_merged_O_type)
        #         print(f"The average {O_type} Area of each Daugher1 merged")
        #         display(df_average_areas_merged_O_type)
        #         print(f"The average {O_type} Area/Perimeter of each Daugher1 merged")
        #         display(df_areaperimeter_merged_O_type)
        #         print(f"The average {O_type} Circularity of each Daugher1 merged")
        #         display(df_circularity_merged_O_type)

        # Join all these dataframes together on the 'Merged area name' column

        dfs_to_merge_O_type = [df_region_areas_merged1, df_areas_merged_O_type, df_counts_merged_O_type,
                               df_average_areas_merged_O_type, df_areaperimeter_merged_O_type, df_circularity_merged_O_type]
        df_merged_O_type = functools.reduce(lambda left, right: pd.merge(left,right,on='Merged area name', how='outer'), dfs_to_merge_O_type)

        # Put all calculated results together
        df_merged_O_type[O_type + ' ' + 'Percentage PSYN Positive Area']            = df_merged_O_type[O_type + ' ' + 'Total Inclusion Area (μm²)']/df_merged_O_type['Total Region Area (μm²)']*100
        df_merged_O_type[O_type + ' ' + 'Extrapolated Inclusion Count']    = df_merged_O_type[O_type + ' ' + 'Counts']*spacing
        df_merged_O_type[O_type + ' ' + 'Inclusions/Region Area (per μm²)']  = df_merged_O_type[O_type + ' ' + 'Counts']/df_merged_O_type['Total Region Area (μm²)']
        df_merged_O_type[O_type + ' ' + 'Inclusions/Region Volume (per μm³)']= df_merged_O_type[O_type + ' ' + 'Inclusions/Region Area (per μm²)']/section_thickness
        df_merged_O_type[O_type + ' ' + 'Inclusions/Region Area (mm²)']  = df_merged_O_type[O_type + ' ' + 'Inclusions/Region Area (per μm²)']*1000000
        df_merged_O_type[O_type + ' ' + 'Inclusions/Region Volume (mm³)']= df_merged_O_type[O_type + ' ' + 'Inclusions/Region Volume (per μm³)']*1000000000

        df_merged_O_type.drop(columns=[O_type + ' ' + 'Inclusions/Region Area (per μm²)', O_type + ' ' + 'Inclusions/Region Volume (per μm³)'], inplace=True)

        print(f'All {O_type} calculations for each {groupby_column2}')
        display(df_merged_O_type)
        
        # Fill the dictionary with the resulting dataframe:
        O_type_dictionary[O_type] = df_merged_O_type
        
    return O_type_dictionary
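
The derived columns computed at the end of the loop above reduce to a few arithmetic steps per region. The following is a minimal sketch in plain Python, not the notebook's own code: the function name `density_metrics` and the example numbers are hypothetical. It assumes, as the column names suggest, that areas are in μm², that `section_thickness` is in μm, and that `spacing` is the sampling interval between analysed sections.

```python
def density_metrics(count, inclusion_area_um2, region_area_um2, spacing, section_thickness_um):
    """Return the derived metrics for one region, mirroring the column arithmetic above."""
    per_um2 = count / region_area_um2          # inclusions per μm² of region area
    per_um3 = per_um2 / section_thickness_um   # inclusions per μm³ of region volume
    return {
        'Percentage PSYN Positive Area': inclusion_area_um2 / region_area_um2 * 100,
        'Extrapolated Inclusion Count': count * spacing,
        'Inclusions/Region Area (mm²)': per_um2 * 1_000_000,        # 1 mm² = 10^6 μm²
        'Inclusions/Region Volume (mm³)': per_um3 * 1_000_000_000,  # 1 mm³ = 10^9 μm³
    }


# Invented example: 50 inclusions covering 10 000 μm² in a 2 mm² region,
# with 40 μm thick sections and (assumed) every 10th section analysed.
metrics = density_metrics(count=50, inclusion_area_um2=10_000, region_area_um2=2_000_000,
                          spacing=10, section_thickness_um=40)
for name, value in metrics.items():
    print(f'{name}: {value:.3f}')
```

The intermediate per-μm² and per-μm³ values are only stepping stones, which is why the function above drops them from its output, just as the notebook drops the corresponding columns.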



### Part 1.10 - Function to merge the A and O dictionary together and also merge all dataframes in these dictionaries

In [None]:
def merge_all_a_and_o_calculations(dict1, dict2): 
    """
    Merge the dictionaries with all A-type calculations, and all O-type calculations. 
    Sort the merged dictionary in the way most convenient for the future excel output.
    Also merge all dataframes in the dictionary into one wide table, in case it is ever needed.
    Output 1: merged sorted dictionary: {A/O-type : dataframe with A/O-type calculations}
    Output 2: dataframe that is the outer join of all dataframes in the dictionary of output 1
    """
    
    A_and_O_type_dictionary = dict1.copy()
    A_and_O_type_dictionary.update(dict2)
    
    # Order the dictionary in the way most convenient for the output:
    list_order_of_dict = ['A | NEURITIC SEEDED INCLUSION', 'A | CELLULAR SEEDED INCLUSION', 'O | CELLULAR SEEDED INCLUSIONS', 
                          'A | CELLULAR DIFFUSE INCLUSION', 'O | CELLULAR DIFFUSE INCLUSION']
    A_and_O_type_dictionary_sorted = {i:A_and_O_type_dictionary[i] for i in list_order_of_dict}
    
    dfs_to_merge_A_and_O_type = [A_and_O_type_dictionary_sorted[i] for i in A_and_O_type_dictionary_sorted.keys()]
    df_all_calcs_merged = functools.reduce(lambda left, right: pd.merge(left,right,on=['Merged area name','Total Region Area (μm²)' ], how='outer'), dfs_to_merge_A_and_O_type)
    
    return A_and_O_type_dictionary_sorted, df_all_calcs_merged
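
To illustrate what the outer join in this function does, here is a small sketch on two toy tables. The column names imitate the notebook's, but the shortened names `A Counts` and `O Counts` and all values are invented for the example.

```python
import functools

import pandas as pd

# Two toy per-type tables that share the region name and region area columns.
df_a = pd.DataFrame({
    'Merged area name': ['Amygdala', 'Cortex'],
    'Total Region Area (μm²)': [100.0, 200.0],
    'A Counts': [3, 5],
})
df_o = pd.DataFrame({
    'Merged area name': ['Amygdala', 'Striatum'],
    'Total Region Area (μm²)': [100.0, 300.0],
    'O Counts': [1, 2],
})

# Fold the list of tables into one wide table; an outer join keeps regions
# that occur in only one of the tables and fills the missing values with NaN.
keys = ['Merged area name', 'Total Region Area (μm²)']
df_all = functools.reduce(
    lambda left, right: pd.merge(left, right, on=keys, how='outer'),
    [df_a, df_o],
)
df_all = df_all.sort_values('Merged area name').reset_index(drop=True)
print(df_all)
```

Joining on both key columns avoids duplicated `Total Region Area (μm²)` columns with `_x`/`_y` suffixes, which is what a join on the region name alone would produce.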


## Part 2 - Automatic Wholebrain Analysis of all X*N Slides of all N Brains


In [None]:
%%time   
# For curiosity we measure the time the code in this cell takes to run

# Load the modified file with brain regions to replace/delete for each specific image 
df_brainregions_to_replace=load_data_brainregions_to_replace(file_brainregions_to_replace)

# Extract the file names that contain '_S1' in the file name. These are the N first images of the N unique brains.
all_raw_data_file_locations_S1= load_all_file_locations_S1(folder_raw_data)

# We initiate a counter to keep track in which loop we are below:
count = 0

# Loop over all the S1 pictures in the raw_data folder
for file_location_S1 in all_raw_data_file_locations_S1:
    count = count +1 # Counts the loop; first loop: counter = 1
    
    # Get the image name out of the file_path (getting image name from dataframe first column is hard because some are empty)
    full_name = os.path.basename(file_location_S1)
    file_name = os.path.splitext(full_name)
    image_name_S1 = file_name[0]

    dict_df_SX_final = {}    # {'_S1' : df_S1_final, '_S2' : df_S2_final, ..., '_SX' : df_SX_final}
    dict_df_SX_parent = {}   # {'_S1' : df_S1_parent, '_S2' : df_S2_parent, ..., '_SX' : df_SX_parent}
    dict_df_SX_parent_daughter1 = {}
    dict_df_SX_parent_daughter2 = {}
    dict_df_SX_parent_daughter3 = {}
    dict_df_SX_all_calcs_merged = {}
    dict_df_A_and_O_type_dictionary= {}  # dictionary of dictionaries
    
    for appendix in appendices_list:   
        # appendix is in ['_S1', '_S2', '_S3', ... , '_SX']
        file_location = file_location_S1.replace('_S1', appendix)
    
        # Do the data cleaning, making use of the functions defined above
        print('\n Analysis of ', file_location)
        dict_df_SX_final[appendix] = dataframe_cleaning(file_location, df_brainregions_to_replace)
        dict_df_SX_parent[appendix], dict_df_SX_parent_daughter1[appendix], dict_df_SX_parent_daughter2[appendix], dict_df_SX_parent_daughter3[appendix] = make_hierarchy(dict_df_SX_final[appendix])

        # Do all the calculations, making use of the functions defined above.
        # The except branch handles slides for which no raw data was available, so that the cleaned dataframe is empty
        # (this never happens for appendix = '_S1'). The slide is then skipped and the reason is printed instead of being silently ignored.
        try:
            A_type_dictionary_SX = all_calculations_per_a_type(dict_df_SX_final[appendix], dict_df_SX_parent_daughter2[appendix])
            O_type_dictionary_SX = all_calculations_per_o_type(dict_df_SX_final[appendix], dict_df_SX_parent_daughter3[appendix])
            dict_df_A_and_O_type_dictionary[appendix], dict_df_SX_all_calcs_merged[appendix] = merge_all_a_and_o_calculations(A_type_dictionary_SX, O_type_dictionary_SX)
            print(f"All A type and O type calculations together for {appendix} for {file_location}")
            display(dict_df_SX_all_calcs_merged[appendix])
        except Exception as exc:
            print(f"No calculations for {appendix} ({file_location}): {type(exc).__name__}: {exc}")

        # A_type_dictionary_SX and O_type_dictionary_SX are temporary dictionaries; they are reset so they cannot leak into the next slide.
        # (A plain `del` would raise a NameError whenever the try block failed before the names were assigned.)
        A_type_dictionary_SX = None
        O_type_dictionary_SX = None
      
   
    # Concatenate the full dataframes of all S1, S2, ..., SX
    print('\n Analysis of all SX files of', image_name_S1)
    df_SX_final_concat            = pd.concat(dict_df_SX_final.values(), axis=0)
    df_SX_parent_daughter1_concat = pd.concat(dict_df_SX_parent_daughter1.values(), axis=0)
    df_SX_parent_daughter2_concat = pd.concat(dict_df_SX_parent_daughter2.values(), axis=0)
    df_SX_parent_daughter3_concat = pd.concat(dict_df_SX_parent_daughter3.values(), axis=0)

    # Do all the S1 + S2 + ... + SX calculations, making use of the functions defined above.
    # The concatenated dataframe always contains at least the S1 data, so an exception here points to a real problem.
    # We therefore do not catch it: a swallowed error would leave A_and_O_type_dictionary_SX undefined (or stale from the previous brain).
    A_type_dictionary_SX_concat = all_calculations_per_a_type(df_SX_final_concat, df_SX_parent_daughter2_concat)
    O_type_dictionary_SX_concat = all_calculations_per_o_type(df_SX_final_concat, df_SX_parent_daughter3_concat)
    A_and_O_type_dictionary_SX, df_SX_all_calcs_concat = merge_all_a_and_o_calculations(A_type_dictionary_SX_concat, O_type_dictionary_SX_concat)
    print('All A type and O type calculations together for S1+S2+...+SX for ', file_location_S1)
    display(df_SX_all_calcs_concat)

    # A_type_dictionary_SX_concat and O_type_dictionary_SX_concat were temporary dictionaries and can be deleted
    del A_type_dictionary_SX_concat
    del O_type_dictionary_SX_concat
    
    # Output the results to an excel file that is created in the output folder specified at the beginning of this notebook.
    # Sheet 1 contains 5 tables (1 per A/O type) with all S1 information, sheet 2 for S2, sheet X for SX,  and sheet X+1 for concatenated S1+S2+...SX.
    output_file_name_SX = image_name_S1.replace('_S1', '_SX') + '_Results.xlsx'
    output_file_location_SX = os.path.join(folder_output_results, output_file_name_SX)
 
    with pd.ExcelWriter(output_file_location_SX) as writer:
        for appendix in appendices_list:
            # appendix is in ['_S1', '_S2', '_S3', ... , '_SX']
            # Slides without calculations (e.g. a missing raw data file) have no entry and get no sheet.
            if appendix not in dict_df_A_and_O_type_dictionary:
                continue
            count_df = 0
            for df_type in dict_df_A_and_O_type_dictionary[appendix].values():
                # Stack the tables below each other, leaving one empty row between them
                df_type.to_excel(writer, sheet_name=appendix[1:]+'_Results', index=False, float_format="%.3f", startrow=count_df)
                count_df += df_type.shape[0] + 2

        # Last sheet: the concatenated S1+S2+...+SX results
        count_df = 0
        for df_type in A_and_O_type_dictionary_SX.values():
            df_type.to_excel(writer, sheet_name='SX_Results', index=False, float_format="%.3f", startrow=count_df)
            count_df += df_type.shape[0] + 2
    
    
    # For the overview excel file, only the 5 dataframes for the 5 type_names in A_and_O_type_dictionary_SX are needed. 
    # We will make 5 overview Excel files with 10 sheets each (one per calculation result) that we store in dictionary_overview_dataframes:
    # dictionary_overview_dataframes = {'A | NEURITIC SEEDED INCLUSION' : {total_region_area: df, total_inclusion_area:df, extrapolated_inclusions:df, .... },
    #                                   'A | CELLULAR SEEDED INCLUSION' : {total_region_area: df, total_inclusion_area:df, extrapolated_inclusions:df, .... },
    #                                    ... }
    
    # In the first loop we initiate an empty overview dictionary that will be filled with dictionaries that will be filled with dataframes. 
    if count==1:
        dictionary_overview_dataframes={}

    for type_name in A_and_O_type_dictionary_SX:
        # type_name can be ['A | NEURITIC SEEDED INCLUSION', 'A | CELLULAR SEEDED INCLUSION', 'O | CELLULAR SEEDED INCLUSIONS', 'A | CELLULAR DIFFUSE INCLUSION', 'O | CELLULAR DIFFUSE INCLUSION'] 
        
        # In the first loop we initiate empty dictionaries (that will be filled with dataframes) within the overview dictionary.
        if count == 1:
            dictionary_overview_dataframes[type_name]={}

        # Prepare the dataframes that are needed for the overview excel file: choose the needed columns,
        # and rename the header of the column with the values to the image_name 
        # (we go from e.g. image_name_S1 = 155118-3_S1_PSYN_8 to column name = 155118-3_8, because only '_S1_PSYN' is removed)
        list_calculation_results=['Total Region Area (μm²)', type_name + ' ' + 'Total Inclusion Area (μm²)', type_name + ' ' + 'Percentage PSYN Positive Area',
                                  type_name + ' ' + 'Counts', type_name + ' ' + 'Extrapolated Inclusion Count', 
                                  type_name + ' ' + 'Inclusions/Region Area (mm²)',  type_name + ' ' + 'Inclusions/Region Volume (mm³)',
                                  type_name + ' ' + 'Average Inclusion Area (μm²)',
                                  type_name + ' ' + 'Average Area/Perimeter (μm)', type_name + ' ' + 'Average Circularity']
        
        # print('list_calculation_results = ', list_calculation_results)
        
        for calculation_result in list_calculation_results:
            df_SX_all_calcs_merged_calculation= A_and_O_type_dictionary_SX[type_name][['Merged area name', calculation_result]].copy()
            df_SX_all_calcs_merged_calculation.rename(columns={calculation_result: image_name_S1.replace('_S1_PSYN', '')}, inplace=True)
        
            if count==1:
                # In the first loop we fill the empty dictionaries within the overview dictionary.
                dictionary_overview_dataframes[type_name][calculation_result]  = df_SX_all_calcs_merged_calculation.copy()

            elif count > 1 :
                # In the subsequent loops we will add the values of those loops to the dataframes in the overview dictionary.
                dictionary_overview_dataframes[type_name][calculation_result] = dictionary_overview_dataframes[type_name][calculation_result].merge(df_SX_all_calcs_merged_calculation, how='outer', on='Merged area name')

    
    # At the end, we delete some of the dataframes, to ensure they cannot be used in the next loop
    del(dict_df_SX_final)
    del(dict_df_SX_parent)
    del(dict_df_SX_parent_daughter1)
    del(dict_df_SX_parent_daughter2)
    del(dict_df_SX_parent_daughter3)
    del(dict_df_A_and_O_type_dictionary)
    del(df_SX_final_concat)
    del(df_SX_all_calcs_concat)

                
# After the for loops, we print the final overview tables
for type_name in dictionary_overview_dataframes:
    type_name_clean = type_name.replace(' | ', '_').replace(' ', '_')
    
    # Output the final overview tables to an excel file Overview_PSYN_Results_'type_name'.xlsx that is created in the output folder specified at the beginning of this notebook
    output_file_name_overview = os.path.join(folder_output_results, 'Overview_PSYN_Results_' + type_name_clean + '.xlsx')
    
    list_calculation_results=['Total Region Area (μm²)',  
                              type_name + ' ' + 'Total Inclusion Area (μm²)',  type_name + ' ' + 'Percentage PSYN Positive Area',
                              type_name + ' ' + 'Counts',      type_name + ' ' + 'Extrapolated Inclusion Count', 
                              type_name + ' ' + 'Inclusions/Region Area (mm²)',     type_name + ' ' + 'Inclusions/Region Volume (mm³)',
                              type_name + ' ' + 'Average Inclusion Area (μm²)',
                              type_name + ' ' + 'Average Area/Perimeter (μm)',   type_name + ' ' + 'Average Circularity']
    
    with pd.ExcelWriter(output_file_name_overview) as writer: 
        for calculation_result in list_calculation_results:
            calculation_result_clean = calculation_result.replace(type_name + ' ', '').replace('Inclusions/', 'Incl/').replace('/', ' per ').replace('Volume', 'Vol')
            
            print(f'Overview dataframe with all {calculation_result_clean} for {type_name} for all brains')
            display(dictionary_overview_dataframes[type_name][calculation_result])

            dictionary_overview_dataframes[type_name][calculation_result].to_excel(writer, sheet_name=calculation_result_clean, index=False, float_format = "%.3f")
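
The replacements used above to build `calculation_result_clean` matter because Excel does not accept `/` in a worksheet name and limits names to 31 characters. The sketch below reproduces the same chain of replacements in a helper (the name `clean_sheet_name` is hypothetical) and checks the resulting names against that limit for one inclusion type.

```python
EXCEL_MAX_SHEET_NAME = 31  # Excel rejects worksheet names longer than 31 characters


def clean_sheet_name(calculation_result, type_name):
    """Shorten a result column name into a sheet name, using the same replacements as the cell above."""
    return (calculation_result.replace(type_name + ' ', '')
                              .replace('Inclusions/', 'Incl/')
                              .replace('/', ' per ')
                              .replace('Volume', 'Vol'))


type_name = 'O | CELLULAR SEEDED INCLUSIONS'
metric_names = ['Total Inclusion Area (μm²)', 'Percentage PSYN Positive Area', 'Counts',
                'Extrapolated Inclusion Count', 'Inclusions/Region Area (mm²)',
                'Inclusions/Region Volume (mm³)', 'Average Inclusion Area (μm²)',
                'Average Area/Perimeter (μm)', 'Average Circularity']
list_calculation_results = ['Total Region Area (μm²)'] + [type_name + ' ' + m for m in metric_names]

sheet_names = [clean_sheet_name(c, type_name) for c in list_calculation_results]
too_long = [s for s in sheet_names if len(s) > EXCEL_MAX_SHEET_NAME]

for name in sheet_names:
    print(f'{len(name):2d}  {name}')
print('Names exceeding the limit:', too_long)
```

A check like this is worth rerunning whenever a calculation result is added or renamed, since the longest of the current names already sits at the limit.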


## Part 3 - Automatic Hemisphere Analysis of all X*N Slides of all N Brains (injected vs uninjected)


In [None]:
%%time   
# For curiosity we measure the time the code in this cell takes to run

# Load the modified file with brain regions to replace/delete for each specific image 
df_brainregions_to_replace=load_data_brainregions_to_replace(file_brainregions_to_replace)

# Load the modified file with hemisphere analysis for each specific image 
df_brainregions_injected=load_data_brainregions_injected(file_brainregions_injected)

# Extract the file names that contain '_S1' in the file name. These are the N first images of the N unique brains.
all_raw_data_file_locations_S1= load_all_file_locations_S1(folder_raw_data)

# We initiate a counter to keep track in which loop we are below:
count = 0

# Loop over all the S1 pictures in the raw_data folder
for file_location_S1 in all_raw_data_file_locations_S1:
    count = count +1 # Counts the loop; first loop: counter = 1
    
    # Get the image name out of the file_path (getting image name from dataframe first column is hard because some are empty)
    full_name = os.path.basename(file_location_S1)
    file_name = os.path.splitext(full_name)
    image_name_S1 = file_name[0]

    dict_df_SX_final = {}    # {'_S1' : df_S1_final, '_S2' : df_S2_final, ..., '_SX' : df_SX_final}
    dict_df_SX_parent = {}   # {'_S1' : df_S1_parent, '_S2' : df_S2_parent, ..., '_SX' : df_SX_parent}
    dict_df_SX_parent_daughter1 = {}
    dict_df_SX_parent_daughter2 = {}
    dict_df_SX_parent_daughter3 = {}
    
    for appendix in appendices_list:   
        # appendix is in ['_S1', '_S2', '_S3', ... , '_SX']
        file_location = file_location_S1.replace('_S1', appendix)
    
        # Do the data cleaning, making use of the functions defined above
        print('\n Analysis of ', file_location)
        dict_df_SX_final[appendix] = dataframe_cleaning(file_location, df_brainregions_to_replace)
        dict_df_SX_parent[appendix], dict_df_SX_parent_daughter1[appendix], dict_df_SX_parent_daughter2[appendix], dict_df_SX_parent_daughter3[appendix] = make_hierarchy(dict_df_SX_final[appendix])


    # Concatenate the full dataframes of all S1, S2, ..., SX
    print('\n Analysis of all SX files of', image_name_S1)
    df_SX_final_concat            = pd.concat(dict_df_SX_final.values(), axis=0)
    df_SX_parent_daughter1_concat = pd.concat(dict_df_SX_parent_daughter1.values(), axis=0)
    df_SX_parent_daughter2_concat = pd.concat(dict_df_SX_parent_daughter2.values(), axis=0)
    df_SX_parent_daughter3_concat = pd.concat(dict_df_SX_parent_daughter3.values(), axis=0)

    # Start from the fully concatenated dataframes of S1, S2, ..., SX after replacing the wrong brain regions.
    # For those dataframes, we determine for each row whether Daughter 1 was Injected or Uninjected.
    # Brain regions that are not listed in df_brainregions_injected are dropped, because we use an inner join.
    df_SX_final_concat_injected = df_SX_final_concat.merge(df_brainregions_injected, left_on=['Image', 'Area/object name'], right_on=['Image', 'Brainregion'], how='inner')
    df_SX_parent_daughter1_injected = df_SX_parent_daughter1_concat.merge(df_brainregions_injected, left_on=['Image', 'Daughter1'], right_on=['Image', 'Brainregion'], how='inner')
    df_SX_parent_daughter2_injected = df_SX_parent_daughter2_concat.merge(df_brainregions_injected, left_on=['Image', 'Daughter1'], right_on=['Image', 'Brainregion'], how='inner')
    df_SX_parent_daughter3_injected = df_SX_parent_daughter3_concat.merge(df_brainregions_injected, left_on=['Image', 'Daughter1'], right_on=['Image', 'Brainregion'], how='inner')

    
    # Do all the calculations, making use of the functions defined above. 
    A_type_dictionary_SX_injected = all_calculations_per_a_type(df_SX_final_concat_injected, df_SX_parent_daughter2_injected, 'Daughter1_Injected', 'Daughter1_Injected')
    O_type_dictionary_SX_injected = all_calculations_per_o_type(df_SX_final_concat_injected, df_SX_parent_daughter3_injected, 'Daughter1_Injected', 'Daughter1_Injected')
    A_and_O_type_dictionary_SX_injected, df_SX_all_calcs_injected = merge_all_a_and_o_calculations(A_type_dictionary_SX_injected, O_type_dictionary_SX_injected)
    print('All A type and O type calculations together for S1+S2+...+SX injected')
    display(df_SX_all_calcs_injected)

    
    # Output the results to an excel file that is created in the output folder specified at the beginning of this notebook.
    # The sheet contains 5 tables (1 per A/O type) with all S1+S2+...+SX hemisphere (injected/uninjected) information.
    output_file_name_SX = image_name_S1.replace('_S1', '_SX') + '_Hemisphere_Results.xlsx'
    output_file_location_SX = os.path.join(folder_output_results_injected, output_file_name_SX)
 
    with pd.ExcelWriter(output_file_location_SX) as writer:
        count_df=0
        for df in A_and_O_type_dictionary_SX_injected:
            A_and_O_type_dictionary_SX_injected[df].sort_values('Merged area name', ascending=False, inplace=True)
            A_and_O_type_dictionary_SX_injected[df].to_excel(writer, sheet_name='SX_Hemisphere_Results', index=False, float_format = "%.3f", startrow=count_df)
            count_df += A_and_O_type_dictionary_SX_injected[df].shape[0] + 2
        
   
    # For the overview excel file, only the 5 dataframes for the 5 type_names in A_and_O_type_dictionary_SX_injected are needed. 
    # We will make 5 overview Excel files with 10 sheets each (one per calculation result) that we store in dictionary_overview_dataframes_injected:
    # dictionary_overview_dataframes_injected = {'A | NEURITIC SEEDED INCLUSION' : {total_region_area: df, total_inclusion_area:df, extrapolated_inclusions:df, .... },
    #                                   'A | CELLULAR SEEDED INCLUSION' : {total_region_area: df, total_inclusion_area:df, extrapolated_inclusions:df, .... },
    #                                    ... }
        
    # In the first loop we initiate an empty overview dictionary that will be filled with dictionaries that will be filled with dataframes. 
    if count==1:
        dictionary_overview_dataframes_injected={}

    for type_name in A_and_O_type_dictionary_SX_injected:
        # type_name can be ['A | NEURITIC SEEDED INCLUSION', 'A | CELLULAR SEEDED INCLUSION', 'O | CELLULAR SEEDED INCLUSIONS', 'A | CELLULAR DIFFUSE INCLUSION', 'O | CELLULAR DIFFUSE INCLUSION'] 
        
        # In the first loop we initiate empty dictionaries (that will be filled with dataframes) within the overview dictionary.
        if count == 1:
            dictionary_overview_dataframes_injected[type_name]={}
        
        # Prepare the dataframes that are needed for the overview excel file: choose the needed columns,
        # and rename the header of the column with the values to the image_name 
        # (we go from e.g. image_name_S1 = 155118-3_S1_PSYN_8 to column name = 155118-3_8, because only '_S1_PSYN' is removed)
        list_calculation_results=['Total Region Area (μm²)', type_name + ' ' + 'Total Inclusion Area (μm²)', type_name + ' ' + 'Percentage PSYN Positive Area',
                                  type_name + ' ' + 'Counts', type_name + ' ' + 'Extrapolated Inclusion Count', 
                                  type_name + ' ' + 'Inclusions/Region Area (mm²)',  type_name + ' ' + 'Inclusions/Region Volume (mm³)',
                                  type_name + ' ' + 'Average Inclusion Area (μm²)',
                                  type_name + ' ' + 'Average Area/Perimeter (μm)', type_name + ' ' + 'Average Circularity']
        
        # print('list_calculation_results = ', list_calculation_results)
        
        for calculation_result in list_calculation_results:
            df_SX_all_calcs_injected_calculation= A_and_O_type_dictionary_SX_injected[type_name][['Merged area name', calculation_result]].copy()
            df_SX_all_calcs_injected_calculation.rename(columns={calculation_result: image_name_S1.replace('_S1_PSYN', '')}, inplace=True)
        
            if count==1:
                # In the first loop we fill the empty dictionaries within the overview dictionary.
                dictionary_overview_dataframes_injected[type_name][calculation_result]  = df_SX_all_calcs_injected_calculation.copy()

            elif count > 1 :
                # In the subsequent loops we will add the values of those loops to the dataframes in the overview dictionary.
                dictionary_overview_dataframes_injected[type_name][calculation_result] = dictionary_overview_dataframes_injected[type_name][calculation_result].merge(df_SX_all_calcs_injected_calculation, how='outer', on='Merged area name')

    # At the end, we delete some of the dataframes, to ensure they cannot be used in the next loop
    del(dict_df_SX_final)
    del(dict_df_SX_parent)
    del(dict_df_SX_parent_daughter1)
    del(dict_df_SX_parent_daughter2)
    del(dict_df_SX_parent_daughter3)
    del(df_SX_final_concat)
    del(df_SX_all_calcs_injected)
                
# After the for loops, we print the final overview tables
for type_name in dictionary_overview_dataframes_injected:
    type_name_clean = type_name.replace(' | ', '_').replace(' ', '_')

    # Output the final overview tables to an excel file Overview_PSYN_Hemisphere_Results_'type_name'.xlsx that is created in the output folder specified at the beginning of this notebook
    output_file_name_overview = os.path.join(folder_output_results_injected, 'Overview_PSYN_Hemisphere_Results_' + type_name_clean + '.xlsx')
    
    list_calculation_results=['Total Region Area (μm²)', 
                              type_name + ' ' + 'Total Inclusion Area (μm²)',  type_name + ' ' + 'Percentage PSYN Positive Area', 
                              type_name + ' ' + 'Counts',      type_name + ' ' + 'Extrapolated Inclusion Count', 
                              type_name + ' ' + 'Inclusions/Region Area (mm²)',     type_name + ' ' + 'Inclusions/Region Volume (mm³)',
                              type_name + ' ' + 'Average Inclusion Area (μm²)',
                              type_name + ' ' + 'Average Area/Perimeter (μm)',   type_name + ' ' + 'Average Circularity']
    
    with pd.ExcelWriter(output_file_name_overview) as writer: 
        for calculation_result in list_calculation_results:
            calculation_result_clean = calculation_result.replace(type_name + ' ', '').replace('Inclusions/', 'Incl/').replace('/', ' per ').replace('Volume', 'Vol')
            
            print(f'Overview dataframe with all {calculation_result_clean} for {type_name} for all brains')
            display(dictionary_overview_dataframes_injected[type_name][calculation_result])

            dictionary_overview_dataframes_injected[type_name][calculation_result].to_excel(writer, sheet_name=calculation_result_clean, index=False, float_format = "%.3f")
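
The hemisphere assignment in Part 3 relies on an inner join with a lookup table. The sketch below shows the effect on toy data. The layout of the lookup table (columns `Image`, `Brainregion` and `Daughter1_Injected`) is an assumption inferred from the join keys and group-by column used above, and all values are invented.

```python
import pandas as pd

# Toy measurements: one row per detected object, with the Daughter1 region it belongs to.
df_objects = pd.DataFrame({
    'Image': ['brain1_S1', 'brain1_S1', 'brain1_S1', 'brain1_S2'],
    'Daughter1': ['Amygdala 1', 'Amygdala 2', 'Cortex 1', 'Amygdala 1'],
    'Area Daughter3 (μm²)': [12.0, 8.0, 5.0, 20.0],
})

# Toy lookup table (assumed layout): hemisphere label per image and brain region.
df_lookup = pd.DataFrame({
    'Image': ['brain1_S1', 'brain1_S1', 'brain1_S2'],
    'Brainregion': ['Amygdala 1', 'Amygdala 2', 'Amygdala 1'],
    'Daughter1_Injected': ['Amygdala Injected', 'Amygdala Uninjected', 'Amygdala Injected'],
})

# Inner join: rows whose (Image, region) pair is not in the lookup table are dropped,
# here the 'Cortex 1' row.
df_joined = df_objects.merge(df_lookup, left_on=['Image', 'Daughter1'],
                             right_on=['Image', 'Brainregion'], how='inner')

# Aggregate per hemisphere label instead of per merged region name.
area_per_hemisphere = df_joined.groupby('Daughter1_Injected')['Area Daughter3 (μm²)'].sum()
print(area_per_hemisphere)
```

Because the join is an inner join, a region that is misspelled in the lookup file disappears from the hemisphere results without an error, so comparing row counts before and after the merge is a cheap sanity check.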
