# Analysis of Neuronal Cell Detector (NCD) model

## 0. Outline
This notebook automates the processing of raw data from mouse brains analysed with the “Neuronal Cell Detector” (NCD) model developed in Aiforia® Create. The input is a set of Excel, CSV or Feather files collected in a local folder whose path is specified in the code. To automatically change the name format of a series of files, refer to the Change_Name_Format_Input_Data.ipynb notebook. The code takes into account that a mouse brain can be mounted over several slides. Slides of the same animal are named identically, except for a numeric suffix denoting the slide number: '_S1', '_S2', etc.

The present notebook is divided into 3 sections:

**1) Make the necessary functions for part 2 and part 3**


**2) Automatic analysis of X * N Slides of N Brains**

Here we automate the analysis of all X*N slide images of all N brains (X slides per brain, which is a parameter that the user can choose) in the folder with raw data. The approach is as follows:

1) We collect all the X*N names of the raw data files in the folder and put them in a list.

2) We make a list containing only the N filenames with an '_S1' in the name. These are the N first slides of the N brains.

3) We loop over these N first slide images belonging to the N brains and perform the following steps in each loop:

    a) We retrieve the second slide (with '_S2' in the raw data filename) belonging to this specific brain. We do the same for the third ('_S3'), fourth ('_S4'), ..., X-th ('_SX') slides of that brain.  
    b) We perform the data analysis steps on the S1, S2, ..., SX slides separately, and also on the concatenated data of S1+S2+...+SX.  
    c) We output the results to an Excel file for this specific brain.
    
After each loop, we add the output of this specific brain to an overview table that will contain all results for all brains. After the last loop, this overview table is also exported to an excel file.
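
The file handling in steps 1 to 3a can be sketched as follows. This is a minimal illustration, not the notebook's own code: the file names (`brainA_NeuN_S1.csv`, etc.) and the variable `slides_per_brain` are hypothetical, and no files are read from disk.

```python
import os

# Hypothetical file names for two brains; brain B has no second slide.
all_files = sorted([
    "brainA_NeuN_S1.csv", "brainA_NeuN_S2.csv",
    "brainB_NeuN_S1.csv",
])
amount_of_slides = 2
appendices_list = [f"_S{i}" for i in range(1, amount_of_slides + 1)]

# Step 2: keep only the first slide of every brain
first_slides = [f for f in all_files if "_S1" in os.path.basename(f)]

# Step 3a: derive the expected sibling slide names for each brain
slides_per_brain = {}
for first_slide in first_slides:
    brain = os.path.splitext(first_slide)[0].replace("_S1", "")
    slides_per_brain[brain] = [first_slide.replace("_S1", appendix)
                               for appendix in appendices_list]

for brain, slides in slides_per_brain.items():
    present = [s for s in slides if s in all_files]
    print(brain, "expects", slides, "and has", present)
```

Because the sibling names are derived from the S1 name, a brain without an '_S1' file is never visited, which is why every brain needs an '_S1' file.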

**3) Automatic analysis of X * N Slides of N Brains after determining to which hemisphere they belong**

Here we record which hemisphere each brain region lies in, and compare the injected and uninjected sides. The analysis proceeds as in section 2.

## Part 1 - Make the necessary functions


### Part 1.1 - Load all necessary Python packages

In [None]:
# Import the required Python packages
import pandas as pd                                # For data analysis with dataframes
import math                                        # To get the value for pi
import functools                                   # For higher-order functions that work on other functions
from IPython.display import display                # Enables the display of more than one dataframe per code cell
import numpy as np                                 # For data analysis
import glob                                        # To get all raw data file locations
import os                                          # To get all raw data file locations
pd.options.display.float_format = '{:.2f}'.format  # Display all numbers in dataframes with 2 decimals


### Part 1.2 - Data locations

**TO DO:** 
- Specify the format of the raw data and the raw data folder location, as well as some experimental parameters.
- Specify the file paths of the excel file containing your quality control revisions and the excel file mapping each brain region to a hemisphere.
- Specify the folder locations where you would like to collect the output excel files (for whole brain and hemisphere analysis).  

The format is: <font color='darkred'>r'file_location'</font> 

In [None]:
# Specify what data format you want to use for your raw data: excel, csv or feather. Do this by uncommenting the data_format that you want.
data_format = 'csv'
# data_format = 'excel'
# data_format = 'feather'

# Specify the maximum number of slides you have per animal brain. If this is 4, for instance, we expect filenames containing '_S1', '_S2', '_S3' and '_S4'.
# If some animal brains have fewer slides, that is no problem: the code creates empty data files for the missing slides so it can run properly.
amount_of_slides = 4

# Specify the experimental parameters (section_thickness in micrometers) and locations:
# The spacing parameter is the serial section spacing interval: the interval at which the brain volume is sampled for analysis,
# not the physical distance between sections. For example, with a spacing of 10, every 10th section is analysed.
spacing = 12
section_thickness = 40
folder_raw_data = r'C:\Users\...\Raw_Data_NCD'
file_brainregions_to_replace =  r'C:\Users\...\Brainregions_To_Replace_NCD.xlsx'
file_brainregions_injected =  r'C:\Users\...\Brainregions_Hemisphere_NCD.xlsx'
folder_output_results = r'C:\Users\...\Results_Wholebrain_NCD'
folder_output_results_injected = r'C:\Users\...\Results_Hemisphere_NCD'


In [None]:
# Make the output folders if they do not exist yet
os.makedirs(folder_output_results, exist_ok=True)
os.makedirs(folder_output_results_injected, exist_ok=True)

# Make the list of filename appendices that are expected. For instance if amount_of_slides = 4, then appendices_list = ['_S1', '_S2', '_S3', '_S4']
appendices_list = [f"_S{i}" for i in range(1, amount_of_slides + 1)]


### Part 1.3 - Function to load all image files that need to be analyzed

In [None]:
def load_all_file_locations_S1(folder_raw_data):
    """
    Make a list of all file locations for S1 images present in the folder with all raw data files.
    It is thus important that the filenames contain '_S1', even if there is no '_S2' counterpart.
    If you don't work with '_S1' and '_S2', just append '_S1' to the filenames to make the code work.
    Output: list of all file locations for S1 images.
    """
    
    if data_format == 'csv':
        all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, "*.csv"))
    elif data_format == 'excel':
        all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, "*.xlsx"))
    elif data_format == 'feather':
        all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, "*.feather"))
    else:
        # Without a valid format there is no file list to work with, so stop here with a clear message
        raise ValueError("Unknown data_format; choose 'csv', 'excel' or 'feather' in Part 1.2.")
        
    all_raw_data_file_locations.sort()

    print('The location of all the raw data files = ')
    for file_location in all_raw_data_file_locations:
        print(file_location)

    # Extract the file names that contain '_S1' in the file name. These are the N first images of the N unique brains.
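    # Only the file name is checked, so that an '_S1' in the folder path cannot cause false matches.
    all_raw_data_file_locations_S1 = [x for x in all_raw_data_file_locations if '_S1' in os.path.basename(x)]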
    print('\nThe location of all the raw S1 data files = ')
    for file_location_S1 in all_raw_data_file_locations_S1:
        print(file_location_S1)
        
    return all_raw_data_file_locations_S1
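
A plain substring test for '_S1' also matches '_S10', '_S11' and so on, which matters once a brain has ten or more slides. A stricter check could look like the sketch below. The helper `is_first_slide` and the example paths are hypothetical and not part of the notebook.

```python
import os
import re

def is_first_slide(path):
    """Return True if the file name (not the folder) contains '_S1'
    and that '_S1' is not the start of '_S10', '_S11', ..."""
    name = os.path.splitext(os.path.basename(path))[0]
    return re.search(r"_S1(?!\d)", name) is not None

# Hypothetical paths; note that the folder name itself contains '_S1'
paths = [
    os.path.join("Raw_S1_batch", "mouse1_S1.csv"),
    os.path.join("Raw_S1_batch", "mouse1_S2.csv"),
    os.path.join("Raw_S1_batch", "mouse1_S10.csv"),
]
print([os.path.basename(p) for p in paths if is_first_slide(p)])
```

The negative lookahead `(?!\d)` rejects any '_S1' that is followed by another digit, and working on the base name keeps folder names out of the comparison.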

### Part 1.4 - Function to load the file with corrections for the brainregions

In [None]:
def load_data_brainregions_to_replace(file_brainregions_to_replace):
    """
    Load the file containing the corrections for brain regions that need to be replaced for each specific image.
    Output: cleaned dataframe with brain regions that need to be replaced for each image.
    """
    
    df_brainregions_to_replace_raw=pd.read_excel(file_brainregions_to_replace,
                                                 usecols=['Image', 'Brainregion_Wrong', 'Brainregion_Correct'],
                                                 dtype={'Image': 'str', 'Brainregion_Wrong': 'str', 'Brainregion_Correct': 'str'}
                                                )

    # Strip accidental leading/trailing spaces and put the brain regions in upper case
    df_brainregions_to_replace=df_brainregions_to_replace_raw.copy()
    df_brainregions_to_replace['Image'] = df_brainregions_to_replace_raw['Image'].str.strip()
    df_brainregions_to_replace['Brainregion_Wrong'] = df_brainregions_to_replace_raw['Brainregion_Wrong'].str.upper().str.strip()
    df_brainregions_to_replace['Brainregion_Correct'] = df_brainregions_to_replace_raw['Brainregion_Correct'].str.upper().str.strip()

    #     print('The raw table of the brain regions to replace for each image = ')
    #     display(df_brainregions_to_replace_raw)

    print('The modified table of the brain regions to replace for each image = ')
    display(df_brainregions_to_replace)
    
    return df_brainregions_to_replace
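
To illustrate what the cleaning does and how the table is later turned into a per-image lookup, here is a small sketch on an in-memory table. The image name and region names are made up; only the column names come from the function above.

```python
import pandas as pd

# Hypothetical corrections table, as it might be typed into Excel
df_raw = pd.DataFrame({
    "Image": [" mouse1_NeuN_S1", "mouse1_NeuN_S1 "],
    "Brainregion_Wrong": ["amygdala 1 ", "Cortex 3"],
    "Brainregion_Correct": ["Striatum 4", " empty"],
})

# Same cleaning as in the function: strip spaces, upper-case the regions
df_clean = df_raw.copy()
df_clean["Image"] = df_raw["Image"].str.strip()
for column in ["Brainregion_Wrong", "Brainregion_Correct"]:
    df_clean[column] = df_raw[column].str.upper().str.strip()

# Per-image lookup from the wrong name to the correct name
subset = df_clean[df_clean["Image"] == "mouse1_NeuN_S1"]
replacements = dict(zip(subset["Brainregion_Wrong"], subset["Brainregion_Correct"]))
print(replacements)
```

A correct value of 'EMPTY' marks a region for deletion during the cleaning in Part 1.6.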

### Part 1.5 - Function to load the file with which brainregions were injected


In [None]:
def load_data_brainregions_injected(file_brainregions_injected):
    """
    Load the file specifying which brainregions were on the injected side for each specific image.
    Output: cleaned dataframe with brain regions that were injected for each image.
    """
    
    df_brainregions_injected_raw=pd.read_excel(file_brainregions_injected,
                                               usecols=['Image', 'Brainregion', 'Hemisphere'],
                                               dtype={'Image': 'str', 'Brainregion': 'str', 'Hemisphere': 'str'}
                                               )
    
    # Strip accidental leading/trailing spaces and put the brain regions in upper case
    df_brainregions_injected=df_brainregions_injected_raw.copy()
    df_brainregions_injected['Image'] = df_brainregions_injected_raw['Image'].str.strip()
    df_brainregions_injected['Brainregion'] = df_brainregions_injected_raw['Brainregion'].str.upper().str.strip()
    df_brainregions_injected['Daughter1_Injected'] = df_brainregions_injected_raw['Hemisphere'].str.upper().str.strip()
    df_brainregions_injected.drop(columns=['Hemisphere'], inplace=True)

    #     print('The raw table of the brain regions injected for each image = ')
    #     display(df_brainregions_injected_raw)

    print('The modified table of the brain regions injected for each image = ')
    display(df_brainregions_injected)
    
    return df_brainregions_injected

### Part 1.6 - Function to load dataframe and clean it


In [None]:
def dataframe_cleaning(file_location, df_brainregions_to_replace):
    """
    Load the specific file location in a dataframe and clean it with df_brainregions_to_replace.
    Output: loaded and cleaned dataframe with some additional calculated values.
    """
    if data_format == 'csv':
        # Check if the  file exists. If not, we make an empty CSV file with the right columns:
        try:
            df_1=pd.read_csv(file_location, sep='\t',
                             usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)', 'Circumference (µm)'],
                             dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                    'Class label': 'str', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                             keep_default_na = True) 
            
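        except FileNotFoundError:  # Only a missing file triggers the empty placeholder; other errors must not overwrite raw data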
            df_empty = pd.DataFrame(columns=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)', 'Circumference (µm)'])
            df_empty.reset_index(inplace=True)
            df_empty.to_csv(file_location, sep='\t')
            df_1=pd.read_csv(file_location, sep='\t',
                             usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)', 'Circumference (µm)'],
                             dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                    'Class label': 'str', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                             keep_default_na = True) 
            print(f'\n A dataframe at location {file_location} did not exist, so we made an empty dataframe.')

            
    elif data_format == 'excel':
        # Check if the  file exists. If not, we make an empty excel file with the right columns:
        try:
            df_1=pd.read_excel(file_location,
                               usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)','Circumference (µm)'],
                               dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                      'Class label': 'str', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                               keep_default_na = True)
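        except FileNotFoundError:  # Only a missing file triggers the empty placeholder; other errors must not overwrite raw data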
            df_empty = pd.DataFrame(columns=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)', 'Circumference (µm)'])
            df_empty.reset_index(inplace=True)
            df_empty.to_excel(file_location)
            df_1=pd.read_excel(file_location,
                               usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)','Circumference (µm)'],
                               dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                      'Class label': 'str', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                               keep_default_na = True)
            print(f'\n A dataframe at location {file_location} did not exist, so we made an empty dataframe.')
            
    elif data_format == 'feather':
        # Check if the  file exists. If not, we make an empty feather file with the right columns:
        try:
            df_1=pd.read_feather(file_location) 
            dtype_dictionary = {'Image': 'object', 'Parent area name': 'object', 'Area/object name': 'object', 
                                'Class label': 'object', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' }
            df_1=df_1.astype(dtype_dictionary)
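        except FileNotFoundError:  # Only a missing file triggers the empty placeholder; other errors must not overwrite raw data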
            df_empty = pd.DataFrame(columns=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)', 'Circumference (µm)'])
            df_empty.reset_index(inplace=True)
            df_empty.to_feather(file_location)
            df_1=pd.read_feather(file_location) 
            dtype_dictionary = {'Image': 'object', 'Parent area name': 'object', 'Area/object name': 'object', 
                                'Class label': 'object', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' }
            df_1=df_1.astype(dtype_dictionary)
            print(f'\n A dataframe at location {file_location} did not exist, so we made an empty dataframe.')

    else:
        # Without a valid format no dataframe can be loaded, so stop here with a clear message
        raise ValueError("Unknown data_format; choose 'csv', 'excel' or 'feather' in Part 1.2.")
        


    # Convert the name columns to upper case so that comparisons do not depend on capitalization
    df_1['Parent area name'] = df_1['Parent area name'].str.upper()
    df_1['Area/object name'] = df_1['Area/object name'].str.upper()
    df_1['Class label']      = df_1['Class label'].str.upper()

    # Get the image name from the file path rather than from the first dataframe column:
    # that column is empty in some rows, and the file name is the more reliable source anyway.
    full_name = os.path.basename(file_location)
    file_name = os.path.splitext(full_name)
    image_name = file_name[0]
    print('The present image=', image_name)
    
    # Make sure the image name across the whole first column is correct
    df_1['Image']=image_name
    
    print('The full raw data=')
    display(df_1)

    # Determine the dictionary of brain regions that should be replaced for this specific image
    df_brainregions_to_replace = df_brainregions_to_replace[df_brainregions_to_replace['Image']==image_name]
    dict_brainregions_to_replace= pd.Series(df_brainregions_to_replace.Brainregion_Correct.values, index=df_brainregions_to_replace.Brainregion_Wrong).to_dict()

    print('The dictionary of brain regions to replace for this specific image', image_name, 'is', dict_brainregions_to_replace)

    # Replace the values in the rows whose Parent area name or Area/object name is a key in dict_brainregions_to_replace
    df_2=df_1.copy()
    df_2['Parent area name'] = df_1['Parent area name'].replace(dict_brainregions_to_replace, regex=False)
    df_2['Area/object name'] = df_1['Area/object name'].replace(dict_brainregions_to_replace, regex=False)
    
    # Create a column 'Parent area name merged' and 'Area/object name merged' where the numbers are deleted from these columns:
    df_2['Parent area name merged'] = df_2['Parent area name'].str.replace('\\d+', '', regex=True).str.strip()
    df_2['Area/object name merged'] = df_2['Area/object name'].str.replace('\\d+', '', regex=True).str.strip()

    # A row whose parent was set to 'EMPTY' can have an Area/object name that itself occurs as a parent in other rows.
    # Those rows should be deleted as well (in effect, the Daughter 3 rows of that parent are also deleted).
    df_empty_parent = df_2[df_2['Parent area name']=='EMPTY']
    list_of_area_objects_that_should_be_empty = df_empty_parent['Area/object name'].to_list()
    print('list_of_area_objects_that_should_be_empty = ', list_of_area_objects_that_should_be_empty)
    df_2.loc[df_2["Parent area name"].isin(list_of_area_objects_that_should_be_empty), "Parent area name"] = "EMPTY"

    # Delete the rows in which we just made the Parent area name or Area/object name 'EMPTY' by replacing them with the dictionary
    df_3 = df_2[(df_2['Parent area name']!='EMPTY') &  (df_2['Area/object name']!='EMPTY')]

    # Show the full updated dataframe:
    print('The fully cleaned table with "Parent area name merged", "Area/object name merged" and with the replaced brainregions:')
    display(df_3)
    
    return df_3
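
The replacement, the removal of 'EMPTY' rows and their children, and the creation of the merged names can be followed on a toy table. This is a simplified sketch with hypothetical region names; it leaves out the area columns and the file loading.

```python
import pandas as pd

df = pd.DataFrame({
    "Parent area name": [None, "TISSUE 1", "TISSUE 1", "CORTEX 3",
                         "NEUN POSITIVE AREA 10", "AMYGDALA 1"],
    "Area/object name": ["TISSUE 1", "AMYGDALA 1", "CORTEX 3", "NEUN POSITIVE AREA 10",
                         "NEUN POSITIVE CELL 100", "NEUN POSITIVE AREA 11"],
})
# Hypothetical corrections: rename AMYGDALA 1, delete CORTEX 3
replacements = {"AMYGDALA 1": "STRIATUM 4", "CORTEX 3": "EMPTY"}

for column in ["Parent area name", "Area/object name"]:
    df[column] = df[column].replace(replacements, regex=False)

# Objects below a removed parent: mark the rows that have them as parent
orphans = df.loc[df["Parent area name"] == "EMPTY", "Area/object name"].tolist()
df.loc[df["Parent area name"].isin(orphans), "Parent area name"] = "EMPTY"

# Drop every row that touches 'EMPTY', then strip the numbers for the merged name
df = df[(df["Parent area name"] != "EMPTY") & (df["Area/object name"] != "EMPTY")].copy()
df["Area/object name merged"] = (
    df["Area/object name"].str.replace(r"\d+", "", regex=True).str.strip()
)
print(df)
```

In this example CORTEX 3, its NeuN positive area and the cell inside that area all disappear, while AMYGDALA 1 lives on as STRIATUM 4.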


### Part 1.7 - Function to make hierarchical dataframes


In [None]:
# The hierarchy: We have 1 type of parent (TISSUE 1, 2, 3 etc) with many types of daughter 1 (Amygdala 1, 2, 3 etc,  Striatum 1, 2, 3 etc) 
# and 1 type of daughter 2 (NEUN POSITIVE AREA 1010, NEUN POSITIVE AREA 1011 etc)
# and 1 type of daughter 3 (NEUN POSITIVE CELL 5495, NEUN POSITIVE CELL 5711 etc)
def make_hierarchy(df):
    """ 
    Here we make the hierarchical structure of the data in the dataframe more clear. 
    The field 'Parent area name' is always the parent of the 'Area/object name' in the same row. 
    The area in the row always belongs to the 'Area/object name'.
    Output: four dataframes in which gradually more hierarchy is added.
    """

    # The rows with the top parent (= TISSUE X) are the rows that have no Parent area name of their own
    df_parent_almost = df[df['Parent area name'].isna()]
    dict_parent={'Area/object name':'Parent name', 'Area/object name merged': 'Parent name merged', 'Area (μm²)': 'Area Parent (μm²)'}
    df_parent=df_parent_almost.rename(columns=dict_parent)
    df_parent.drop(columns=['Parent area name', 'Class label', 'Parent area name merged'], inplace=True)

    # Then we add the first daughter = the daughter of the top parents
    df_parent_daughter1_almost=df_parent.merge(df[['Parent area name', 'Area/object name', 'Area/object name merged', 'Area (μm²)']], left_on='Parent name', right_on='Parent area name', how='inner')
    dict_daughter1 = {'Parent area name': 'Parent name copy', 'Area/object name':'Daughter1', 
                      'Area/object name merged': 'Daughter1 merged', 'Area (μm²)': 'Area Daughter1 (μm²)'}

    df_parent_daughter1_almost2=df_parent_daughter1_almost.rename(columns=dict_daughter1)
    # The groupby is needed because there can now be, for instance, two rows named Striatum 4
    # (one of them created by e.g. changing Amygdala 1 to Striatum 4 in the brain region corrections).
    # We collapse them into one unique row, because otherwise the rows would be duplicated in the next join when making df_parent_daughter2
    df_parent_daughter1=df_parent_daughter1_almost2.groupby(['Daughter1'], as_index=False).agg(
        {'Image': 'first', 'Parent name': 'first', 'Area Parent (μm²)': 'first',
         'Parent name merged': 'first', 'Parent name copy': 'first', 'Daughter1': 'first',
         'Daughter1 merged': 'first', 'Area Daughter1 (μm²)': 'sum'})

    # Then we add the second daughter = the daughter of daughter 1
    df_parent_daughter2_almost=df_parent_daughter1.merge(df[['Parent area name', 'Area/object name', 'Area/object name merged', 'Area (μm²)']], left_on='Daughter1', right_on='Parent area name', how='inner')
    dict_daughter2 = {'Parent area name': 'Daughter1 copy','Area/object name':'Daughter2', 
                      'Area/object name merged': 'Daughter2 merged', 'Area (μm²)': 'Area Daughter2 (μm²)'}
    df_parent_daughter2=df_parent_daughter2_almost.rename(columns=dict_daughter2)

    # Then we add the third daughter = the daughter of daughter 2
    df_parent_daughter3_almost=df_parent_daughter2.merge(df[['Parent area name', 'Area/object name','Area/object name merged', 'Area (μm²)' ]], left_on='Daughter2', right_on='Parent area name', how='inner')
    dict_daughter3 = {'Parent area name': 'Daughter2 copy','Area/object name':'Daughter3', 
                      'Area/object name merged': 'Daughter3 merged', 'Area (μm²)': 'Area Daughter3 (μm²)'}
    df_parent_daughter3=df_parent_daughter3_almost.rename(columns=dict_daughter3)

#     print('Original df')
#     display(df)
#     print('df_parent')
#     display(df_parent)
#     print('df_parent_daughter1')
#     display(df_parent_daughter1)
#     print('df_parent_daughter2')
#     display(df_parent_daughter2)
#     print('df_parent_daughter3')
#     display(df_parent_daughter3)
    
    return df_parent, df_parent_daughter1, df_parent_daughter2, df_parent_daughter3
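
The idea behind the repeated joins is that the flat table is merged with itself once per hierarchy level. The sketch below shows this on a toy table. It is a simplification under stated assumptions: the area column is called `Area` instead of the full column name, the helper `add_level` is hypothetical, and the groupby step for duplicate Daughter1 rows is omitted.

```python
import pandas as pd

df = pd.DataFrame({
    "Parent area name": [None, "TISSUE 1", "STRIATUM 1",
                         "NEUN POSITIVE AREA 7", "NEUN POSITIVE AREA 7"],
    "Area/object name": ["TISSUE 1", "STRIATUM 1", "NEUN POSITIVE AREA 7",
                         "NEUN POSITIVE CELL 1", "NEUN POSITIVE CELL 2"],
    "Area": [1000.0, 400.0, 50.0, 20.0, 25.0],
})
links = df[["Parent area name", "Area/object name", "Area"]]

# Top level: rows without a parent
parent = df[df["Parent area name"].isna()][["Area/object name", "Area"]].rename(
    columns={"Area/object name": "Parent name", "Area": "Area Parent"})

def add_level(left, key, child, area):
    """Attach the children of column `key` as new columns `child` and `area`."""
    merged = left.merge(links, left_on=key, right_on="Parent area name", how="inner")
    merged = merged.rename(columns={"Area/object name": child, "Area": area})
    return merged.drop(columns="Parent area name")

level1 = add_level(parent, "Parent name", "Daughter1", "Area Daughter1")
level2 = add_level(level1, "Daughter1", "Daughter2", "Area Daughter2")
level3 = add_level(level2, "Daughter2", "Daughter3", "Area Daughter3")
print(level3)
```

Each row of `level3` is one detected cell together with its full ancestry, which is what makes counting cells per region a simple group-and-count afterwards.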

### Part 1.8 - Function to calculate all information for all daughter 1's


In [None]:
def all_calculations(df1, df2, df3, groupby_column1='Daughter1 merged'):
    """
    Make the main calculations (areas, counts...) based on three dataframes with different hierarchies (df1, df2 and df3),
    and based on a groupby column that can be chosen.
    Output: dataframe with all calculations.
    """

    # Sum the total region area of each Daughter 1 merged (the second layer in the hierarchy, e.g. total area of Amygdala = Area Amygdala 1 + Area Amygdala 2 + ... + Area Amygdala 7)

    # First method to do this: just group by over all Area/object name merged and discard the elements that are not 
    # in the second layer of the hierarchy, namely TISSUE (layer 1), NEUN POSITIVE AREA (layer 3) and NEUN POSITIVE CELL (layer 4)
    # df_region_areas_merged_almost = df1.groupby(groupby_column2).sum()['Area (μm²)'].rename_axis('Merged area name').reset_index(name='Total Region Area (μm²)')
    # regions_exclude=['TISSUE','NEUN POSITIVE CELL','NEUN POSITIVE AREA']
    # df_region_areas_merged1=df_region_areas_merged_almost[~df_region_areas_merged_almost['Merged area name'].isin(regions_exclude)]

    # print('The total area of each Daughter1 merged')
    # display(df_region_areas_merged1)

    # Second method to do this: just group by over all 'Daughter1 merged' in df_parent_daughter1 
    # (where those daughter 1 rows have not been multiplied due to another join as is the case in df_parent_daughter2)
    # The area column is selected before summing, so that only this numeric column is aggregated
    df_region_areas_merged2 = df1.groupby(groupby_column1)['Area Daughter1 (μm²)'].sum().rename_axis('Merged area name').reset_index(name='Total Region Area (μm²)')

    # print('METHOD 2: The total area of each Daughter1 merged')
    # display(df_region_areas_merged2)
    

    # Sum the NeuN Positive Area (layer 3 = Daughter2 area) belonging to each Daughter1 merged
    df_positive_areas_merged = df2.groupby(groupby_column1)['Area Daughter2 (μm²)'].sum().rename_axis('Merged area name').reset_index(name='Total NeuN Positive Area (μm²)')

    # Count the number of rows (each row has a cell name like NEUN POSITIVE CELL 29144 in Daughter 3, corresponding to layer 4) for each Daughter 1 merged
    df_positive_counts = df3.value_counts(groupby_column1, sort=True).rename_axis('Merged area name').reset_index(name='Counts')
    
    # print('The total NeuN Positive Area of each Daughter1 merged')
    # display(df_positive_areas_merged)
    # print('The total NeuN Positive cell count of each Daughter1 merged')       
    # display(df_positive_counts)

    # Join all these dataframes together on the 'Merged area name' column

    dfs_to_merge = [df_region_areas_merged2, df_positive_areas_merged, df_positive_counts]
    df_all_calcs_merged  = functools.reduce(lambda left, right: pd.merge(left,right,on='Merged area name', how='outer'), dfs_to_merge)

    # Put all calculated results together
    df_all_calcs_merged['Extrapolated Cell Count']            = df_all_calcs_merged['Counts']*spacing
    df_all_calcs_merged['Percentage NeuN Positive Area'] = df_all_calcs_merged['Total NeuN Positive Area (μm²)']/df_all_calcs_merged['Total Region Area (μm²)']*100
    df_all_calcs_merged['Cells/Region Area (per μm²)']          = df_all_calcs_merged['Counts']/df_all_calcs_merged['Total Region Area (μm²)']
    df_all_calcs_merged['Cells/Region Volume (per μm³)']        = df_all_calcs_merged['Cells/Region Area (per μm²)']/section_thickness
    df_all_calcs_merged['Cells/Region Area (mm²)']          = df_all_calcs_merged['Cells/Region Area (per μm²)']*1000000
    df_all_calcs_merged['Cells/Region Volume (mm³)']        = df_all_calcs_merged['Cells/Region Volume (per μm³)']*1000000000
    
    df_all_calcs_merged.drop(columns=['Cells/Region Area (per μm²)', 'Cells/Region Volume (per μm³)'], inplace=True)
    df_all_calcs_merged.sort_values('Merged area name',inplace=True)
    
    print(f'The total Calculations of each {groupby_column1}')
    display(df_all_calcs_merged)
    
    return df_all_calcs_merged
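
The unit conversions in this function can be checked by hand on a single region. The sketch below uses the `spacing` and `section_thickness` values from Part 1.2; the cell count and the region area are hypothetical.

```python
spacing = 12             # every 12th section is analysed
section_thickness = 40   # micrometers

counts = 500                    # hypothetical number of detected cells
region_area_um2 = 2_000_000.0   # hypothetical region area in μm² (equal to 2 mm²)

extrapolated_count = counts * spacing               # estimate for all sections
cells_per_um2 = counts / region_area_um2            # areal density
cells_per_um3 = cells_per_um2 / section_thickness   # volumetric density
cells_per_mm2 = cells_per_um2 * 1e6                 # 1 mm² = 10^6 μm²
cells_per_mm3 = cells_per_um3 * 1e9                 # 1 mm³ = 10^9 μm³

print(f"Extrapolated cell count: {extrapolated_count}")
print(f"Cells per mm²: {cells_per_mm2:.1f}")
print(f"Cells per mm³: {cells_per_mm3:.1f}")
```

With these numbers, 500 cells on 2 mm² give 250 cells per mm², and dividing by the 0.04 mm section thickness gives 6250 cells per mm³.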

## Part 2 - Automatic Wholebrain Analysis of all X*N Slides of all N Brains


In [None]:
%%time   
# Out of curiosity, we measure the time the code in this cell takes to run

# Load the modified file with brain regions to replace/delete for each specific image 
df_brainregions_to_replace=load_data_brainregions_to_replace(file_brainregions_to_replace)

# Extract the file names that contain '_S1' in the file name. These are the N first slide images of the N unique brains.
all_raw_data_file_locations_S1= load_all_file_locations_S1(folder_raw_data)

# We initiate a counter to keep track in which loop we are below:
count = 0

# Loop over all the S1 pictures in the raw_data folder, corresponding to all N unique brains
for file_location_S1 in all_raw_data_file_locations_S1:
    count = count + 1  # Counts the loop; first loop: count = 1
    
    # Get the image name out of the file_path (getting image name from dataframe first column is hard because some are empty)
    full_name = os.path.basename(file_location_S1)
    file_name = os.path.splitext(full_name)
    image_name_S1 = file_name[0]

    dict_df_SX_final = {}    # {'_S1' : df_S1_final, '_S2' : df_S2_final, ..., '_SX' : df_SX_final}
    dict_df_SX_parent = {}   # {'_S1' : df_S1_parent, '_S2' : df_S2_parent, ..., '_SX' : df_SX_parent}
    dict_df_SX_parent_daughter1 = {}
    dict_df_SX_parent_daughter2 = {}
    dict_df_SX_parent_daughter3 = {}
    dict_df_SX_all_calcs_merged = {}
    
    for appendix in appendices_list:   
        # appendix is in ['_S1', '_S2', '_S3', ... , '_SX']
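        # Replace '_S1' in the file name only, so that an '_S1' in the folder path is left untouched
        file_location = os.path.join(os.path.dirname(file_location_S1), os.path.basename(file_location_S1).replace('_S1', appendix))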
    
        # Do the data cleaning, making use of the functions defined above
        print('\n Analysis of ', file_location)
        dict_df_SX_final[appendix] = dataframe_cleaning(file_location, df_brainregions_to_replace)
        dict_df_SX_parent[appendix], dict_df_SX_parent_daughter1[appendix], dict_df_SX_parent_daughter2[appendix], dict_df_SX_parent_daughter3[appendix] = make_hierarchy(dict_df_SX_final[appendix])

        # Do all the calculations, making use of the functions defined above. 
        # The except part is for when we have made an empty dataframe because no dataframe was available (will never be the case for appendix =_S1).
        try: 
            dict_df_SX_all_calcs_merged[appendix] = all_calculations(dict_df_SX_parent_daughter1[appendix], dict_df_SX_parent_daughter2[appendix], dict_df_SX_parent_daughter3[appendix])
            print(f"All calculations together for {appendix} for {file_location}")
            display(dict_df_SX_all_calcs_merged[appendix])
        except:
            pass
   

    # Concatenate the full dataframes of all S1, S2, ..., SX
    print('\n Analysis of all SX files of', image_name_S1)
    df_SX_final_concat            = pd.concat(dict_df_SX_final.values(), axis=0)
    df_SX_parent_daughter1_concat = pd.concat(dict_df_SX_parent_daughter1.values(), axis=0)
    df_SX_parent_daughter2_concat = pd.concat(dict_df_SX_parent_daughter2.values(), axis=0)
    df_SX_parent_daughter3_concat = pd.concat(dict_df_SX_parent_daughter3.values(), axis=0)

    # Do all the S1 + S2 + ... + SX calculations, making use of the functions defined above. 
    # The except part is for when we have made an empty dataframe because no dataframe was available (will never be the case for S1+S2).
    try: 
        df_SX_all_calcs_concat = all_calculations(df_SX_parent_daughter1_concat, df_SX_parent_daughter2_concat, df_SX_parent_daughter3_concat)
        print('All calculations together for S1+S2+...+SX concatenated for ', file_location_S1)
        display(df_SX_all_calcs_concat)
    except:
        pass
    
    
    # Output the results to an excel file that is created in the output folder specified at the beginning of this notebook.
    output_file_name_SX = image_name_S1.replace('_S1', '_SX') + '_Results.xlsx'
    output_file_location_SX = os.path.join(folder_output_results, output_file_name_SX)
 
    with pd.ExcelWriter(output_file_location_SX) as writer:
        for appendix in appendices_list:   
        # appendix is in ['_S1', '_S2', '_S3', ... , '_SX']
            try:
                dict_df_SX_all_calcs_merged[appendix].to_excel(writer, sheet_name=appendix[1:]+'_Results', index=False, float_format = "%.3f")
            except:
                pass  # No SX dataframe was available, and the empty one would lead to errors in the try clause
        
        df_SX_all_calcs_concat.to_excel(writer, sheet_name='SX_Combined_Results', index=False, float_format = "%.3f")
    
    
    # For the overview excel file, only the df_SX_all_calcs_concat dataframe is needed. 
    # We will make one overview Excel file with several sheets, whose dataframes we store in dictionary_overview_dataframes:
    # dictionary_overview_dataframes = {Total Region Area: df, Percentage NeuN Positive Area: df, Extrapolated Cell Count:df, Cells/Region Area:df, .... }
    
    # In the first loop we initiate an empty overview dictionary that will be filled with dataframes. 
    if count==1:
        dictionary_overview_dataframes={}

    # Prepare the dataframes that are needed for the overview excel file: choose the needed columns,
    # and rename the header of the column with the values to the image_name 
    # (we go from e.g. image_name_S1 = 131297-1_NeuN_S1 to column name = 131297-1)
    list_calculation_results=['Total Region Area (μm²)', 'Counts', 'Total NeuN Positive Area (μm²)', 'Extrapolated Cell Count', 'Cells/Region Area (mm²)', 
                              'Cells/Region Volume (mm³)', 'Percentage NeuN Positive Area']
    
    # print('list_calculation_results = ', list_calculation_results)
    
    for calculation_result in list_calculation_results:
        df_SX_all_calcs_concat_calculation= df_SX_all_calcs_concat[['Merged area name', calculation_result]].copy()
        df_SX_all_calcs_concat_calculation.rename(columns={calculation_result: image_name_S1.replace('_NeuN_S1', '')}, inplace=True)
    
        if count==1:
            # In the first loop we fill the empty overview dictionary with a dataframe with the values calculated in loop 1 
            dictionary_overview_dataframes[calculation_result]  = df_SX_all_calcs_concat_calculation.copy()

        elif count > 1 :
            # In the subsequent loops we will add the values of those loops to the dataframes in the overview dictionary
            dictionary_overview_dataframes[calculation_result] = dictionary_overview_dataframes[calculation_result].merge(df_SX_all_calcs_concat_calculation, how='outer', on='Merged area name')

    # At the end, we delete some of the dataframes, to ensure they cannot be used in the next loop
    del dict_df_SX_final, dict_df_SX_parent
    del dict_df_SX_parent_daughter1, dict_df_SX_parent_daughter2, dict_df_SX_parent_daughter3
    del df_SX_all_calcs_concat

    
# After the for loops, we print the final overview tables
# Output the final overview tables to an excel file Overview_NeuN_Results.xlsx that is created in the output folder specified at the beginning of this notebook
output_file_name_overview = os.path.join(folder_output_results, 'Overview_NeuN_Results.xlsx')
    
list_calculation_results=['Total Region Area (μm²)', 'Total NeuN Positive Area (μm²)',
                          'Counts', 'Extrapolated Cell Count', 
                          'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)', 
                          'Percentage NeuN Positive Area'
                          ]

with pd.ExcelWriter(output_file_name_overview) as writer: 
    for calculation_result in list_calculation_results:
        calculation_result_clean = calculation_result.replace('/', ' per ').replace('Volume', 'Vol')
        
        print(f'Overview dataframe with all {calculation_result_clean} for all brains')
        display(dictionary_overview_dataframes[calculation_result])

        dictionary_overview_dataframes[calculation_result].to_excel(writer, sheet_name=calculation_result_clean, index=False, float_format = "%.3f")
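
The overview tables above are built by outer-merging one value column per brain on 'Merged area name'. The following is a minimal sketch of that accumulation pattern on toy data; the brain names, region names, and values are made up for illustration and do not come from the real data set.

```python
import pandas as pd

# Hypothetical per-brain results: one row per merged region, one value column.
per_brain_results = {
    "131297-1": pd.DataFrame({"Merged area name": ["Cortex", "Striatum"],
                              "Counts": [120, 80]}),
    "131297-2": pd.DataFrame({"Merged area name": ["Cortex", "Thalamus"],
                              "Counts": [110, 45]}),
}

overview = None
for brain_name, df_brain in per_brain_results.items():
    # Rename the value column to the brain name so each brain gets its own column.
    df_column = df_brain.rename(columns={"Counts": brain_name})
    if overview is None:
        # First brain: the overview starts as a copy of its table.
        overview = df_column.copy()
    else:
        # Later brains: an outer merge keeps regions that are missing in some brains.
        overview = overview.merge(df_column, how="outer", on="Merged area name")

print(overview)
```

Regions that are absent for a given brain appear as NaN in that brain's column, so no region is lost from the overview.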



## Part 3 - Automatic Hemisphere Analysis of all X*N Slides of all N Brains (injected vs uninjected)
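
The hemisphere analysis labels each measured region as injected or uninjected by an inner join with a lookup table, which also drops every region that is not listed for that image. The sketch below illustrates the idea on toy data. The column names 'Image', 'Daughter1', 'Brainregion' and 'Daughter1_Injected' follow the names used in this notebook, but the layout of the lookup table, the region names, and the values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical measurements for one image.
measurements = pd.DataFrame({
    "Image": ["brain1_NeuN_S1"] * 3,
    "Daughter1": ["Left hemisphere", "Right hemisphere", "Artifact"],
    "Counts": [100, 90, 5],
})

# Hypothetical lookup table: which region of which image was injected.
lookup = pd.DataFrame({
    "Image": ["brain1_NeuN_S1"] * 2,
    "Brainregion": ["Left hemisphere", "Right hemisphere"],
    "Daughter1_Injected": ["Injected", "Uninjected"],
})

# The inner join keeps only rows whose (Image, region) pair occurs in the lookup table.
labelled = measurements.merge(
    lookup,
    left_on=["Image", "Daughter1"],
    right_on=["Image", "Brainregion"],
    how="inner",
)

# Aggregate per hemisphere label.
totals = labelled.groupby("Daughter1_Injected")["Counts"].sum()
print(totals)
```

Because the join is an inner join, a region that is misspelled in the lookup file disappears silently, so it is worth comparing row counts before and after the merge.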

In [None]:
%%time   
# Out of curiosity, we measure how long the code in this cell takes to run

# Load the modified file with brain regions to replace/delete for each specific image 
df_brainregions_to_replace=load_data_brainregions_to_replace(file_brainregions_to_replace)

# Load the modified file with hemisphere analysis for each specific image 
df_brainregions_injected=load_data_brainregions_injected(file_brainregions_injected)

# Extract the file names that contain '_S1' in the file name. These are the N first images of the N unique brains.
all_raw_data_file_locations_S1= load_all_file_locations_S1(folder_raw_data)

# We initiate a counter to keep track in which loop we are below:
count = 0

# Loop over all the S1 pictures in the raw_data folder, corresponding to the N unique brains
for file_location_S1 in all_raw_data_file_locations_S1:
    count += 1  # Counts the loops; in the first loop, count = 1
    
    # Get the image name out of the file_path (getting image name from dataframe first column is hard because some are empty)
    full_name = os.path.basename(file_location_S1)
    file_name = os.path.splitext(full_name)
    image_name_S1 = file_name[0]

    dict_df_SX_final = {}    # {'_S1' : df_S1_final, '_S2' : df_S2_final, ..., '_SX' : df_SX_final}
    dict_df_SX_parent = {}   # {'_S1' : df_S1_parent, '_S2' : df_S2_parent, ..., '_SX' : df_SX_parent}
    dict_df_SX_parent_daughter1 = {}
    dict_df_SX_parent_daughter2 = {}
    dict_df_SX_parent_daughter3 = {}
    dict_df_SX_all_calcs_merged = {}
    
    for appendix in appendices_list:   
        # appendix is in ['_S1', '_S2', '_S3', ... , '_SX']
        file_location = file_location_S1.replace('_S1', appendix)
    
        # Do the data cleaning, making use of the functions defined above
        print('\n Analysis of ', file_location)
        dict_df_SX_final[appendix] = dataframe_cleaning(file_location, df_brainregions_to_replace)
        dict_df_SX_parent[appendix], dict_df_SX_parent_daughter1[appendix], dict_df_SX_parent_daughter2[appendix], dict_df_SX_parent_daughter3[appendix] = make_hierarchy(dict_df_SX_final[appendix])

    # Concatenate the full dataframes of all S1, S2, ..., SX
    print('\n Analysis of all SX files of', image_name_S1)
    df_SX_final_concat            = pd.concat(dict_df_SX_final.values(), axis=0)
    df_SX_parent_daughter1_concat = pd.concat(dict_df_SX_parent_daughter1.values(), axis=0)
    df_SX_parent_daughter2_concat = pd.concat(dict_df_SX_parent_daughter2.values(), axis=0)
    df_SX_parent_daughter3_concat = pd.concat(dict_df_SX_parent_daughter3.values(), axis=0)

    # Start from the fully concatenated dataframes of S1, S2, ..., SX after replacing the wrong brain regions.
    # For those dataframes, we determine for each row whether Daughter 1 was Injected or Uninjected.
    # Brain regions that are not listed in df_brainregions_injected are dropped, because we use an inner join.
    df_SX_final_injected            = df_SX_final_concat.merge(df_brainregions_injected, left_on=['Image', 'Area/object name'], right_on=['Image', 'Brainregion'], how='inner')
    df_SX_parent_daughter1_injected = df_SX_parent_daughter1_concat.merge(df_brainregions_injected, left_on=['Image', 'Daughter1'], right_on=['Image', 'Brainregion'], how='inner')
    df_SX_parent_daughter2_injected = df_SX_parent_daughter2_concat.merge(df_brainregions_injected, left_on=['Image', 'Daughter1'], right_on=['Image', 'Brainregion'], how='inner')
    df_SX_parent_daughter3_injected = df_SX_parent_daughter3_concat.merge(df_brainregions_injected, left_on=['Image', 'Daughter1'], right_on=['Image', 'Brainregion'], how='inner')

    
    # Do all the S1 + S2 + ... + SX INJECTED calculations, making use of the functions defined above.
    # Every later step in this loop needs df_SX_all_calcs_injected, so a failure is reported together
    # with the file it belongs to and re-raised instead of being silently ignored.
    try: 
        df_SX_all_calcs_injected = all_calculations(df_SX_parent_daughter1_injected, df_SX_parent_daughter2_injected, df_SX_parent_daughter3_injected, groupby_column1='Daughter1_Injected')
        df_SX_all_calcs_injected.sort_values('Merged area name', ascending=False, inplace=True)
        print('All calculations together for all SX INJECTED for ', file_location_S1)
        display(df_SX_all_calcs_injected)
    except Exception as error:
        print('Hemisphere calculations failed for', file_location_S1, ':', error)
        raise
    
  
    # Output the results to an excel file that is created in the output folder specified at the beginning of this notebook.
    output_file_name_SX = image_name_S1.replace('_S1', '_SX') + '_Hemisphere_Results.xlsx'
    output_file_location_SX = os.path.join(folder_output_results_injected, output_file_name_SX)
 
    with pd.ExcelWriter(output_file_location_SX) as writer:
        df_SX_all_calcs_injected.to_excel(writer, sheet_name='SX_Hemisphere_Results', index=False, float_format = "%.3f")
        
   
    # For the overview Excel file, only the df_SX_all_calcs_injected dataframe is needed.
    # We make one overview Excel file with several sheets, which we store in dictionary_overview_dataframes_injected:
    # dictionary_overview_dataframes_injected = {Total Region Area: df, Percentage NeuN Positive Area: df, Extrapolated Cell Count: df, Cells/Region Area: df, ...}

    # In the first loop we initiate an empty overview dictionary that will be filled with dataframes. 
    if count==1:
        dictionary_overview_dataframes_injected={}
        
    # Prepare the dataframes that are needed for the overview Excel file: select the needed columns,
    # and rename the header of the value column to the brain name derived from image_name_S1
    # (we go from e.g. image_name_S1 = 131297-1_NeuN_S1 to column name = 131297-1)
    list_calculation_results=['Total Region Area (μm²)', 'Total NeuN Positive Area (μm²)', 'Counts', 'Extrapolated Cell Count', 'Cells/Region Area (mm²)', 
                              'Cells/Region Volume (mm³)', 'Percentage NeuN Positive Area']
    
    # print('list_calculation_results = ', list_calculation_results)
    
    for calculation_result in list_calculation_results:
        df_SX_all_calcs_injected_calculation= df_SX_all_calcs_injected[['Merged area name', calculation_result]].copy()
        df_SX_all_calcs_injected_calculation.rename(columns={calculation_result: image_name_S1.replace('_NeuN_S1', '')}, inplace=True)
    
        if count==1:
            # In the first loop we fill the empty overview dictionary with a dataframe with the values calculated in loop 1 
            dictionary_overview_dataframes_injected[calculation_result]  = df_SX_all_calcs_injected_calculation.copy()

        elif count > 1 :
            # In the subsequent loops we will add the values of those loops to the dataframes in the overview dictionary
            dictionary_overview_dataframes_injected[calculation_result] = dictionary_overview_dataframes_injected[calculation_result].merge(df_SX_all_calcs_injected_calculation, how='outer', on='Merged area name')

    # At the end, we delete some of the dataframes, to ensure they cannot be used in the next loop
    del dict_df_SX_final, dict_df_SX_parent
    del dict_df_SX_parent_daughter1, dict_df_SX_parent_daughter2, dict_df_SX_parent_daughter3
    del df_SX_all_calcs_injected

    
# After the for loops, we print the final overview tables
# Output the final overview tables to an excel file Overview_NeuN_Hemisphere_Results.xlsx that is created in the output folder specified at the beginning of this notebook
output_file_name_overview = os.path.join(folder_output_results_injected, 'Overview_NeuN_Hemisphere_Results.xlsx')
    
list_calculation_results=['Total Region Area (μm²)', 'Total NeuN Positive Area (μm²)',
                          'Counts', 'Extrapolated Cell Count', 
                          'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)', 
                          'Percentage NeuN Positive Area'
                          ]    
with pd.ExcelWriter(output_file_name_overview) as writer: 
    for calculation_result in list_calculation_results:
        calculation_result_clean = calculation_result.replace('/', ' per ').replace('Volume', 'Vol')
        
        print(f'Overview dataframe with all {calculation_result_clean} for all brains')
        display(dictionary_overview_dataframes_injected[calculation_result])

        dictionary_overview_dataframes_injected[calculation_result].to_excel(writer, sheet_name=calculation_result_clean, index=False, float_format = "%.3f")
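
The column names are cleaned before being used as sheet names because Excel does not allow the characters `\ / ? * [ ] :` in a sheet name and limits it to 31 characters. A minimal sketch of a reusable helper is given below; the function name `make_sheet_name` is hypothetical and not part of this notebook, and it only extends the two replacements already used above.

```python
import re

MAX_SHEET_NAME_LENGTH = 31  # Excel limit on the length of a sheet name


def make_sheet_name(column_name: str) -> str:
    """Turn a result column name into a string that is valid as an Excel sheet name."""
    # Same replacements as in the notebook: readable slash, shorter 'Volume'.
    name = column_name.replace('/', ' per ').replace('Volume', 'Vol')
    # Replace the remaining characters that Excel forbids in sheet names.
    name = re.sub(r'[\\?*\[\]:]', ' ', name)
    # Truncate to the maximum length and drop surrounding whitespace.
    return name[:MAX_SHEET_NAME_LENGTH].strip()


list_calculation_results = ['Total Region Area (μm²)', 'Total NeuN Positive Area (μm²)',
                            'Counts', 'Extrapolated Cell Count',
                            'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)',
                            'Percentage NeuN Positive Area']

sheet_names = [make_sheet_name(name) for name in list_calculation_results]
print(sheet_names)
```

Truncation can make two long names collide, so a check that all sheet names are unique is advisable if more result columns are added.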
