# Analysis of Microglial Cell Detector (MCD) model

## 0. Outline
This code deals with the automatic processing of raw data from mouse brains analysed with “Microglial Cell Detector” model developed in Aiforia® Create. We typically start from Excel/CSV files collected in a local folder on the computer that is specified in the code. To automatically change the format of a series of files, refer to the Change_Name_Format_Input_Data.ipynb notebook. The code is developed to take into account that a mouse brain can be mounted over several slides. Slides for each animal are named identically, except for a numeric postfix denoting the slide number: '_S1', '_S2', etc.

The present notebook is divided into 3 sections:

**1) Make the necessary functions for part 2 and part 3**


**2) Automatic analysis of X * N Slides of N Brains**

Here we automate the analysis of all X*N slide images of all N brains (X slides per brain, which is a parameter that the user can choose) in the folder with raw data. The approach is as follows:

1) We collect all the X*N names of the raw data files in the folder and put them in a list.

2) We make a list containing only the N filenames with an '_S1' in the name. These are the N first slides of the N brains.

3) We loop over these N first slide images belonging to the N brains and perform the following steps in each loop:

    a) We retrieve the second slide (containing an '_S2' in the filename of the raw data) belonging to this specific brain. We do the same for the third ('_S3'), fourth ('_S4'), ..., X'th ('_SX')  slides of the specific brain.   
    b) We perform the data analysis steps on the S1, S2, ..., SX slides separately, and also on the concatenated data of S1+S2+...+SX.     
    c) We output the results to an excel file for this specific brain.
    
After each loop, we add the output of this specific brain to an overview table that will contain all results for all brains. After the last loop, this overview table is also exported to an excel file.

**3) Automatic analysis of X * N Slides of N Brains after determining to which hemisphere they belong**

Here we add which brain regions are on each hemisphere, and compare the injected vs uninjected sides. Analysis occurs similar to section 2. 

## Part 1 - Make the necessary functions


### Part 1.1 - Load all necessary Python packages

In [1]:
# Import the required Python packages
import pandas as pd                                # For data analysis with dataframes
import math                                        # To get the value for pi
import functools                                   # For higher-order functions that work on other functions
from IPython.display import display                # Enables the display of more than one dataframe per code cell
import numpy as np                                 # For data analysis
import glob                                        # To get all raw data file locations
import os                                          # To get all raw data file locations
pd.options.display.float_format = '{:.2f}'.format  # Display all numbers in dataframes with 2 decimals


### Part 1.2 - Data locations

**TO DO:** 
- Specify the format of the raw data and the raw data folder location, as well as some experimental parameters.
- Specify the file paths of the excel file containing your quality control revisions and the excel file mapping each brain region to a hemisphere.
- Specify the folder locations where you would like to collect the output excel files (for whole brain and hemisphere analysis).  

The format is: <font color='darkred'>r'file_location'</font> 

In [2]:
# Specify what data format you want to use for your raw data: excel, csv or feather. Do this by uncommenting the data_format that you want.
data_format = 'csv'
# data_format = 'excel'
# data_format = 'feather'

# Specify the maximal amount of slides you have per animal brain. If this for instance is 4, we expect filenames containing '_S1', '_S2', '_S3' and '_S4'.
# If some animal brains have less slides, no problem. The code will create empty data files for the missing slides so it can run properly.
amount_of_slides = 4

# Specify the experimental parameters (section_thickness in micrometers) and locations:
# The spacing parameter refers to the serial section spacing interval. It's the interval at which you sample the brain volume for analysis, not the physical distance between each section. For example, if you have a spacing parameter of 10, you would take every 10th section for your analysis.  
spacing=12
section_thickness = 40
folder_raw_data = r'C:\Users\...\Raw_Data_MCD'
file_brainregions_to_replace =  r'C:\Users\...\Brainregions_To_Replace_MCD.xlsx'
file_brainregions_injected =  r'C:\Users\...\Brainregions_Hemisphere_MCD.xlsx'
folder_output_results = r'C:\Users\...\Results_Wholebrain_MCD'
folder_output_results_injected = r'C:\Users\...\Results_Hemisphere_MCD'

In [3]:
# Make the output folders if they did not exist yet
if not os.path.isdir(folder_output_results):
    os.mkdir(folder_output_results)
if not os.path.isdir(folder_output_results_injected):
    os.mkdir(folder_output_results_injected)

# Make the list of filename appendices that are expected. For instance if amount_of_slides = 4, then appendices_list = ['_S1', '_S2', '_S3', '_S4']
appendices_list = [f"_S{i}" for i in range(1, amount_of_slides + 1)]

### Part 1.3 - Function to load all image files that need to be analyzed

In [4]:
def load_all_file_locations_S1(folder_raw_data):
    """
    Make a list of all file locations for S1 images present in the folder with all raw data files.
    It's thus important that the filenames contain '_S1' in their name, even if there is no '_S2' counterparty. 
    If you don't work with '_S1' and '_S2', then just append '_S1' to the filenames to make the code work.
    Output: list of all file locations for S1 images.
    """

    if data_format == 'csv':
        all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, "*.csv"))
    elif data_format == 'excel':
        all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, "*.xlsx"))
    elif data_format == 'feather':
        all_raw_data_file_locations = glob.glob(os.path.join(folder_raw_data, "*.feather"))
    else:
        print('You did not specify a correct data-format in Part 1.2 and can expect some errors in the rest of the code')
        
    all_raw_data_file_locations.sort()

    print('The location of all the raw data files = ')
    for file_location in all_raw_data_file_locations:
        print(file_location)

    # Extract the file names that contain '_S1' in the file name. These are the N first images of the N unique brains.
    all_raw_data_file_locations_S1= [x for x in all_raw_data_file_locations if '_S1' in x]
    print('\nThe location of all the raw S1 data files = ')
    for file_location_S1 in all_raw_data_file_locations_S1:
        print(file_location_S1)
        
    return all_raw_data_file_locations_S1

### Part 1.4 - Function to load the file with corrections for the brainregions

In [5]:
def load_data_brainregions_to_replace(file_brainregions_to_replace):
    """
    Load the file containing the corrections for brain regions that need to be replaced for each specific image.
    Output: cleaned dataframe with brain regions that need to be replaced for each image.
    """
    
    df_brainregions_to_replace_raw=pd.read_excel(file_brainregions_to_replace,
                                                 usecols=['Image', 'Brainregion_Wrong', 'Brainregion_Correct'],
                                                 dtype={'Image': 'str', 'Brainregion_Wrong': 'str', 'Brainregion_Correct': 'str'}
                                                )

    # Modify the dataframe to delete spaces that are by accident there, and put the brainregions in upper case 
    df_brainregions_to_replace=df_brainregions_to_replace_raw.copy()
    df_brainregions_to_replace['Image'] = df_brainregions_to_replace_raw['Image'].str.strip()
    df_brainregions_to_replace['Brainregion_Wrong'] = df_brainregions_to_replace_raw['Brainregion_Wrong'].str.upper().str.strip()
    df_brainregions_to_replace['Brainregion_Correct'] = df_brainregions_to_replace_raw['Brainregion_Correct'].str.upper().str.strip()

    #     print('The raw table of the brain regions to replace for each image = ')
    #     display(df_brainregions_to_replace_raw)

    print('The modified table of the brain regions to replace for each image = ')
    display(df_brainregions_to_replace)
    
    return df_brainregions_to_replace

### Part 1.5 - Function to load the file with which brainregions were injected


In [6]:
def load_data_brainregions_injected(file_brainregions_injected):
    """
    Load the file specifying which brainregions were on the injected side for each specific image.
    Output: cleaned dataframe with brain regions that were injected for each image.
    """
    
    df_brainregions_injected_raw=pd.read_excel(file_brainregions_injected,
                                               usecols=['Image', 'Brainregion', 'Hemisphere'],
                                               dtype={'Image': 'str', 'Brainregion': 'str', 'Hemisphere': 'str'}
                                               )
    
    # Modify the dataframe to delete spaces that are by accident there, and put the brainregions in upper case 
    df_brainregions_injected=df_brainregions_injected_raw.copy()
    df_brainregions_injected['Image'] = df_brainregions_injected_raw['Image'].str.strip()
    df_brainregions_injected['Brainregion'] = df_brainregions_injected_raw['Brainregion'].str.upper().str.strip()
    df_brainregions_injected['Parent_Injected'] = df_brainregions_injected_raw['Hemisphere'].str.upper().str.strip()
    df_brainregions_injected['Daughter1_Injected'] = df_brainregions_injected_raw['Hemisphere'].str.upper().str.strip()
    df_brainregions_injected.drop(columns=['Hemisphere'], inplace=True)

    #     print('The raw table of the brain regions injected for each image = ')
    #     display(df_brainregions_injected_raw)

    print('The modified table of the brain regions injected for each image = ')
    display(df_brainregions_injected)
    
    return df_brainregions_injected

### Part 1.6 - Function to load dataframe and clean it


In [7]:
def dataframe_cleaning(file_location, df_brainregions_to_replace):
    """
    Load the specific file location in a dataframe and clean it with df_brainregions_to_replace.
    Output: loaded and cleaned dataframe with some additional calculated values.
    """
    if data_format == 'csv':
        # Check if the  file exists. If not, we make an empty CSV file with the right columns:
        try:
            df_1=pd.read_csv(file_location, sep='\t',
                             usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)','Circumference (µm)'],
                             dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                    'Class label': 'str', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                             keep_default_na = True) 
            
        except:
            df_empty = pd.DataFrame(columns=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)', 'Circumference (µm)'])
            df_empty.reset_index(inplace=True)
            df_empty.to_csv(file_location, sep='\t')
            df_1=pd.read_csv(file_location, sep='\t',
                             usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)', 'Circumference (µm)'],
                             dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                    'Class label': 'str', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                             keep_default_na = True) 
            print(f'\n A dataframe at location {file_location} did not exist, so we made an empty dataframe.')
            
    elif data_format == 'excel':
        # Check if the  file exists. If not, we make an empty excel file with the right columns:
        try:
            df_1=pd.read_excel(file_location,
                               usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)','Circumference (µm)'],
                               dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                      'Class label': 'str', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                               keep_default_na = True)
        except:
            df_empty = pd.DataFrame(columns=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)', 'Circumference (µm)'])
            df_empty.reset_index(inplace=True)
            df_empty.to_excel(file_location)
            df_1=pd.read_excel(file_location,
                               usecols=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)','Circumference (µm)'],
                               dtype={'Image': 'str', 'Parent area name': 'str', 'Area/object name': 'str', 
                                      'Class label': 'str', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' },
                               keep_default_na = True)
            print(f'\n A dataframe at location {file_location} did not exist, so we made an empty dataframe.')
            
    elif data_format == 'feather':
        # Check if the  file exists. If not, we make an empty feather file with the right columns:
        try:
            df_1=pd.read_feather(file_location) 
            dtype_dictionary = {'Image': 'object', 'Parent area name': 'object', 'Area/object name': 'object', 
                                'Class label': 'object', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' }
            df_1=df_1.astype(dtype_dictionary)
        except:
            df_empty = pd.DataFrame(columns=['Image', 'Parent area name', 'Area/object name', 'Class label', 'Area (μm²)', 'Circumference (µm)'])
            df_empty.reset_index(inplace=True)
            df_empty.to_feather(file_location)
            df_1=pd.read_feather(file_location) 
            dtype_dictionary = {'Image': 'object', 'Parent area name': 'object', 'Area/object name': 'object', 
                                'Class label': 'object', 'Area (μm²)': 'float64','Circumference (µm)': 'float64' }
            df_1=df_1.astype(dtype_dictionary)
            print(f'\n A dataframe at location {file_location} did not exist, so we made an empty dataframe.')

    else:
        print('You did not specify a correct data-format in Part 1.2 and can expect some errors in the rest of the code')
        

    # Delete the rows with an empty 'Parent area name' or empty Area (μm²) or Area/object name or Class label
    df_1.dropna(subset =['Parent area name', 'Area (μm²)', 'Area/object name', 'Class label'] , how='any', inplace=True)

    # Put all columns in capitals to never make mistakes against capitalization
    df_1['Parent area name'] = df_1['Parent area name'].str.upper()
    df_1['Area/object name'] = df_1['Area/object name'].str.upper()
    df_1['Class label']      = df_1['Class label'].str.upper()

    # Get the image name out of the file_path (getting image name from dataframe first column is hard because some are empty, 
    # and getting from filename makes more sense anyway) and change some field based on the recipe. 
    full_name = os.path.basename(file_location)
    file_name = os.path.splitext(full_name)
    image_name = file_name[0]
    print('The present image=', image_name)
    
    # Make sure the image name across the whole first column is correct
    df_1['Image']=image_name
    
    print('The full raw data=')
    display(df_1)

    # Determine the dictionary of brain regions that should be replaced for this specific image
    df_brainregions_to_replace = df_brainregions_to_replace[df_brainregions_to_replace['Image']==image_name]
    dict_brainregions_to_replace= pd.Series(df_brainregions_to_replace.Brainregion_Correct.values, index=df_brainregions_to_replace.Brainregion_Wrong).to_dict()

    print('The dictionary of brain regions to replace for this specific image', image_name, 'is', dict_brainregions_to_replace)

    # Replace the value in the rows that have a Parent area name or Area/object name that is in list_brainregions_replace
    df_2=df_1.copy()
    df_2['Parent area name'] = df_1['Parent area name'].replace(dict_brainregions_to_replace, regex=False)
    df_2['Area/object name'] = df_1['Area/object name'].replace(dict_brainregions_to_replace, regex=False)
    
    # Create a column 'Parent area name merged' and 'Area/object name merged' where the numbers are deleted from these columns:
    df_2['Parent area name merged'] = df_2['Parent area name'].str.replace('\\d+', '', regex=True).str.strip()
    df_2['Area/object name merged'] = df_2['Area/object name'].str.replace('\\d+', '', regex=True).str.strip()

    # This part is not needed for the IBA1 code as the hierarchy doesn't extend till a daugher 3.
    # The rows in which we put a parent empty, can have an area/object name that itself occurs as parent and that should also be deleted
    # (basically the daugher 3 of the empty parent should also be deleted)
    # df_empty_parent = df_2[df_2['Parent area name']=='EMPTY']
    # list_of_area_objects_that_should_be_empty = df_empty_parent['Area/object name'].to_list()
    # print('list_of_area_objects_that_should_be_empty = ', list_of_area_objects_that_should_be_empty)
    # df_2.loc[df_2["Parent area name"].isin(list_of_area_objects_that_should_be_empty), "Parent area name"] = "EMPTY"

    # Delete the rows in which we just made the Parent area name or Area/object name 'EMPTY' by replacing them with the dictionary
    df_3 = df_2[(df_2['Parent area name']!='EMPTY') &  (df_2['Area/object name']!='EMPTY')]

    # We delete the rows that 
    # - have an area < 45 and class label = 'Iba1 Positive Cell'
    df_3x = df_3[ ~( (df_3['Area (μm²)'] < 45) & (df_3['Class label'] == 'Iba1 Positive Cell') )]    

    # Calculate the Area/Perimeter (μm) and the circularity
    df_4 = df_3x.copy()
    df_4['Area/Perimeter (μm)'] = df_3x['Area (μm²)']/df_4['Circumference (µm)']
    df_4['Circularity'] = (4 * math.pi * df_3x['Area (μm²)'])/ (df_3x['Circumference (µm)'])**2

    # Show the full updated dataframe:
    print('The fully cleaned table with "Area/Perimeter", "Circularity" and so on:')
    display(df_4)
    
    return df_4


### Part 1.7 - Function to make hierarchical dataframes


In [8]:
# This function will not be used for IBA-1 code, as the hierarchy does not extend till daughter 3. 
# We just have 1 type of parent (Tissue 1, 2, 3 etc) with many types of daughter 1 (Amygdala 1, 2, 3 etc,  Striatum 1, 2, 3 etc) 
# and 1 type of daughter 2 (Iba1 Positive Cell 60451, Iba1 Positive Cell 354269 etc)
def make_hierarchy(df):
    """ 
    Here we make the hierarchical structure of the data in the dataframe more clear. 
    The field 'Parent area name' is always the parent of the 'Area/object name' in the same row. 
    The area in the row always belongs to the 'Area/object name'.
    Output: four dataframes in which gradually more hierarchy is added.
    """

    # The rows with the top parent (= BRAIN TISSUE X) are the rows that don't have an own Parent area name
    df_parent_almost = df[df['Parent area name'].isna()]
    dict_parent={'Area/object name':'Parent name', 'Area/object name merged': 'Parent name merged', 'Area (μm²)': 'Area Parent (μm²)',
                'Area/Perimeter (μm)': 'Area/Perimeter Parent (μm)', 'Circularity': 'Circularity Parent'}
    df_parent=df_parent_almost.rename(columns=dict_parent)
    df_parent.drop(columns=['Parent area name', 'Class label', 'Parent area name merged','Area/Perimeter Parent (μm)', 'Circularity Parent' ], inplace=True)

    # Then we add the first daughter = the daughter of the top parents
    df_parent_daughter1_almost=df_parent.merge(df[['Parent area name', 'Area/object name', 'Area/object name merged', 'Area (μm²)']], left_on='Parent name', right_on='Parent area name', how='inner')
    dict_daughter1 = {'Parent area name': 'Parent name copy', 'Area/object name':'Daughter1', 
                      'Area/object name merged': 'Daughter1 merged', 'Area (μm²)': 'Area Daughter1 (μm²)'}

    df_parent_daughter1_almost2=df_parent_daughter1_almost.rename(columns=dict_daughter1)
    # Groupby is needed because there now can be for instance 2 Striatum 4's 
    # (one of them originated from e.g. changing Amygdala 1 to Striatum 4 in the brainregion corrections)
    # We need to turn this Striatum 4 into a unique row because otherwise we will double in the next join when making df_parent_daughter2
    df_parent_daughter1=df_parent_daughter1_almost2.groupby(['Daughter1'], as_index=False).agg(
        {'Image': 'first', 'Parent name': 'first', 'Area Parent (μm²)': 'first',
         'Parent name merged': 'first', 'Parent name copy': 'first', 'Daughter1': 'first',
         'Daughter1 merged': 'first', 'Area Daughter1 (μm²)': 'sum'})

    # Then we add the second daughter = the daughter of daughter 1
    df_parent_daughter2_almost=df_parent_daughter1.merge(df[['Parent area name', 'Area/object name', 'Area/object name merged', 'Area (μm²)', 'Area/Perimeter (μm)' , 'Circularity' ]], left_on='Daughter1', right_on='Parent area name', how='inner')
    dict_daughter2 = {'Parent area name': 'Daughter1 copy','Area/object name':'Daughter2', 
                      'Area/object name merged': 'Daughter2 merged', 'Area (μm²)': 'Area Daughter2 (μm²)',
                      'Area/Perimeter (μm)': 'Area/Perimeter Daughter2 (μm)', 'Circularity': 'Circularity Daughter2'}
    df_parent_daughter2=df_parent_daughter2_almost.rename(columns=dict_daughter2)


#     print('Original df')
#     display(df)
#     print('df_parent')
#     display(df_parent)
#     print('df_parent_daughter1')
#     display(df_parent_daughter1)
#     print('df_parent_daughter2')
#     display(df_parent_daughter2)
    
    return df_parent, df_parent_daughter1, df_parent_daughter2

### Part 1.8 - Function to calculate all information for all daughter 1's


In [9]:
def all_calculations(df1, df2, groupby_column1='Parent area name merged', groupby_column2='Area/object name merged'):
    """
    Make the main calculations (areas, counts...) based on 2 dataframes (df1 and df2, but df1=df2 for the calculations without 'injected'), 
    and based on a groupby columns that can be chosen.
    Output: dataframe with all calculations.
    """

    # Count the number of rows for each parent area name merged 
    df_counts_merged = df1.value_counts(groupby_column1, sort=True).rename_axis('Merged area name').reset_index(name='Counts')
    # print('The number of rows for each Parent area name merged =')
    # display(df_counts_merged)
    
    # Count the total area of each Area/object name merged (e.g. Amygdala 1 + Amygdala 7 + ... area)
    df_total_region_area_merged = df2.groupby(groupby_column2).sum()['Area (μm²)'].rename_axis('Merged area name').reset_index(name='Total Region Area (μm²)')
    # print('The total region area of each Area/object name merged')
    # display(df_total_region_area_merged)

    # Calculate the total Area (μm²) of the cells belonging to each Parent area name merged
    df_total_cell_area_merged = df1.groupby(groupby_column1).sum(numeric_only=True)['Area (μm²)'].rename_axis('Merged area name').reset_index(name='Total Cell Area (μm²)')
    # print('The total Area (μm²) of the cells belonging to each Parent area name merged =')
    # display(df_total_cell_area_merged)
    
    # Calculate the average Area (μm²) of the cells belonging to each Parent area name merged
    df_average_cell_area_merged = df1.groupby(groupby_column1).mean(numeric_only=True)['Area (μm²)'].rename_axis('Merged area name').reset_index(name='Average Cell Area (μm²)')
    # print('The average Area (μm²) of the cells belonging to each Parent area name merged =')
    # display(df_average_cell_area_merged)
   
    # Calculate the average Area/Perimeter (μm) of the cells belonging to each Parent area name merged
    df_average_area_perimeter_merged = df1.groupby(groupby_column1).mean(numeric_only=True)['Area/Perimeter (μm)'].rename_axis('Merged area name').reset_index(name='Average Area/Perimeter (μm)')
    # print('The average Area/Perimeter (μm) of the cells belonging to each Parent area name merged =')
    # display(df_average_area_perimeter_merged)

    # Calculate the average circularity of the cells belonging to each Parent area name merged
    df_average_circularity_merged = df1.groupby(groupby_column1).mean(numeric_only=True)['Circularity'].rename_axis('Merged area name').reset_index(name='Average Circularity')
    # print('The average Circularity of the cells belonging to each Parent area name merged =')
    # display(df_average_circularity_merged)

    dfs_to_merge = [df_counts_merged, df_total_region_area_merged, df_total_cell_area_merged, df_average_cell_area_merged, 
                    df_average_area_perimeter_merged, df_average_circularity_merged]
    df_all_calcs_merged  = functools.reduce(lambda left, right: pd.merge(left,right,on='Merged area name', how='outer'), dfs_to_merge)

    # Put all calculated results together
    df_all_calcs_merged['Extrapolated Cell Count']    = df_all_calcs_merged['Counts']*spacing
    df_all_calcs_merged['Percentage IBA1 Positive Area']    = 100* df_all_calcs_merged['Total Cell Area (μm²)']/df_all_calcs_merged['Total Region Area (μm²)']
    df_all_calcs_merged['Cells/Region Area (per μm²)']  = df_all_calcs_merged['Counts']/df_all_calcs_merged['Total Region Area (μm²)']
    df_all_calcs_merged['Cells/Region Volume (per μm³)']= df_all_calcs_merged['Cells/Region Area (per μm²)']/section_thickness
    df_all_calcs_merged['Cells/Region Area (mm²)']  = df_all_calcs_merged['Cells/Region Area (per μm²)']*1000000
    df_all_calcs_merged['Cells/Region Volume (mm³)']= df_all_calcs_merged['Cells/Region Volume (per μm³)']*1000000000
    
    df_all_calcs_merged.drop(columns=['Cells/Region Area (per μm²)', 'Cells/Region Volume (per μm³)'], inplace=True)
    df_all_calcs_merged.sort_values('Merged area name',inplace=True)
    
    print('The total Calculations of each Daugher1 merged:')
    display(df_all_calcs_merged)
    
    return df_all_calcs_merged

## Part 2 - Automatic Wholebrain Analysis of all X*N Slides of all N Brains


In [10]:
%%time   
# For curiosity we measure the time the code in this cell takes to run

# Load the modified file with brain regions to replace/delete for each specific image 
df_brainregions_to_replace=load_data_brainregions_to_replace(file_brainregions_to_replace)

# Extract the file names that contain '_S1' in the file name. These are the N first images of the N unique brains.
all_raw_data_file_locations_S1= load_all_file_locations_S1(folder_raw_data)

# We initiate a counter to keep track in which loop we are below:
count = 0

# Loop over all the S1 pictures in the raw_data folder
for file_location_S1 in all_raw_data_file_locations_S1:
    count = count +1 # Counts the loop; first loop: counter = 1
    
    # Get the image name out of the file_path (getting image name from dataframe first column is hard because some are empty)
    full_name = os.path.basename(file_location_S1)
    file_name = os.path.splitext(full_name)
    image_name_S1 = file_name[0]

    
    dict_df_SX_final = {}    # {'_S1' : df_S1_final, '_S2' : df_S2_final, ..., '_SX' : df_SX_final}
    dict_df_SX_all_calcs_merged = {} # {'_S1' : df_S1_all_calcs_merged, '_S2' : df_S2_all_calcs_merged, ..., '_SX' : df_SX_all_calcs_merged}
    
    for appendix in appendices_list:   
        # appendix is in ['_S1', '_S2', '_S3', ... , '_SX']
        file_location = file_location_S1.replace('_S1', appendix)
    
        # Do the data cleaning, making use of the functions defined above    
        dict_df_SX_final[appendix] = dataframe_cleaning(file_location, df_brainregions_to_replace)

        # Do all the calculations, making use of the functions defined above. 
        # The except part is for when we have made an empty dataframe because no dataframe was available (will never be the case for appendix =_S1).
        try: 
            dict_df_SX_all_calcs_merged[appendix] = all_calculations(dict_df_SX_final[appendix], dict_df_SX_final[appendix])
            print(f"All calculations together for {appendix} for {file_location}")
            display(dict_df_SX_all_calcs_merged[appendix])
        except:
            pass

    
    # Concatenate the full dataframes of all S1, S2, ..., SX
    print('\n Analysis of all SX files of', image_name_S1)
    df_SX_final_concat = pd.concat(dict_df_SX_final.values(), axis=0)


    # Do all the S1 + S2 + ... + SX calculations, making use of the functions defined above. 
    # The except part is for when we have made an empty dataframe because no dataframe was available (will never be the case for this concatenated df).
    try: 
        df_SX_all_calcs_concat = all_calculations(df_SX_final_concat, df_SX_final_concat)
        print('All calculations together for S1+S2+...+SX concatenated for ', file_location_S1)
        display(df_SX_all_calcs_concat)
    except:
        pass

    
    # Output the results to an excel file that is created in the output folder specified at the beginning of this notebook.
    output_file_name_SX = image_name_S1.replace('_S1', '_SX') + '_Results.xlsx'
    output_file_location_SX = os.path.join(folder_output_results, output_file_name_SX)
 
    # Output the results to an excel file that is created in the output folder specified at the beginning of this notebook.
    output_file_name_SX = image_name_S1.replace('_S1', '_SX') + '_Results.xlsx'
    output_file_location_SX = os.path.join(folder_output_results, output_file_name_SX)
 
    with pd.ExcelWriter(output_file_location_SX) as writer:
        for appendix in appendices_list:   
        # appendix is in ['_S1', '_S2', '_S3', ... , '_SX']
            try:
                dict_df_SX_all_calcs_merged[appendix].to_excel(writer, sheet_name=appendix[1:]+'_Results', index=False, float_format = "%.3f")
            except:
                pass  # No SX dataframe was available, and the empty one would lead to errors in the try clause
        
        df_SX_all_calcs_concat.to_excel(writer, sheet_name='SX_Combined_Results', index=False, float_format = "%.3f")
    
    
    # For the overview excel file, only the df_SX_all_calcs_concat dataframe is needed. 
    # We will make 1 overview excelfiles with a few tabpages that we store in dictionary_overview_dataframes:
    # dictionary_overview_dataframes = {Total Region area: df, Extrapolated Cell Count:df, Cells/Region Area:df, .... }
    
    # In the first loop we initiate an empty overview dictionary that will be filled with dataframes. 
    if count==1:
        dictionary_overview_dataframes={}

    # Prepare the dataframes that are needed for the overview excel file: choose the needed columns,
    # and rename the header of the column with the values to the image_name 
    # (we go from e.g. image_name_S1 = 131297-1_S1_IBA1 to column name = 131297-1)
    list_calculation_results=['Total Region Area (μm²)', 'Counts', 'Extrapolated Cell Count', 
                              'Total Cell Area (μm²)', 'Average Cell Area (μm²)', 'Percentage IBA1 Positive Area',
                              'Average Area/Perimeter (μm)', 'Average Circularity', 
                              'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)',
                              ]
    
    # print('list_calculation_results = ', list_calculation_results)
    brainregions_not_needed = ['Iba1 Positive Cell', 'TISSUE']
    for calculation_result in list_calculation_results:
        df_SX_all_calcs_concat_calculation= df_SX_all_calcs_concat[['Merged area name', calculation_result]].copy()
        df_SX_all_calcs_concat_calculation = df_SX_all_calcs_concat_calculation[~df_SX_all_calcs_concat_calculation['Merged area name'].isin(brainregions_not_needed)]
        df_SX_all_calcs_concat_calculation.rename(columns={calculation_result: image_name_S1.replace('_S1_IBA1', '')}, inplace=True)
    
        if count==1:
            # In the first loop we fill the empty overview dictionary with a dataframe with the values calculated in loop 1  
            dictionary_overview_dataframes[calculation_result]  = df_SX_all_calcs_concat_calculation.copy()

        elif count > 1 :
            # In the subsequent loops we will add the values of those loops to the dataframes in the overview dictionary
            dictionary_overview_dataframes[calculation_result] = dictionary_overview_dataframes[calculation_result].merge(df_SX_all_calcs_concat_calculation, how='outer', on='Merged area name')

    # At the end, we delete some of the dataframes, so they cannot be used in the next loop
    del(dict_df_SX_final)
    del(df_SX_final_concat)
    del(df_SX_all_calcs_concat)
    

# After the for loops, we print the final overview tables
# Output the final overview tables to an excel file Overview_IBA1_Results.xlsx that is created in the output folder specified at the beginning of this notebook
output_file_name_overview = os.path.join(folder_output_results, 'Overview_IBA1_Results.xlsx')
    
list_calculation_results=['Total Region Area (μm²)', 'Counts', 'Extrapolated Cell Count', 
                          'Total Cell Area (μm²)', 'Average Cell Area (μm²)', 'Percentage IBA1 Positive Area',
                          'Average Area/Perimeter (μm)', 'Average Circularity', 
                          'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)',
                          ]
    
with pd.ExcelWriter(output_file_name_overview) as writer: 
    for calculation_result in list_calculation_results:
        calculation_result_clean = calculation_result.replace('/', ' per ').replace('Volume', 'Vol')
        
        print(f'Overview dataframe with all {calculation_result_clean} for all brains')
        display(dictionary_overview_dataframes[calculation_result])

        dictionary_overview_dataframes[calculation_result].to_excel(writer, sheet_name=calculation_result_clean, index=False, float_format = "%.3f")


The modified table of the brain regions to replace for each image = 


Unnamed: 0,Image,Brainregion_Wrong,Brainregion_Correct
0,131297-1_S1_IBA1,CEREBELLAR PEDUNCLES 1,CINGULATE CORTEX 6
1,131297-1_S1_IBA1,GLOBUS PALLIDUS 3,STRIATUM 7
2,131297-1_S2_IBA1,FIMBRIA 1,EMPTY
3,131297-1_S2_IBA1,FIMBRIA 2,HIPPOCAMPUS 5
4,131297-1_S2_IBA1,THALAMUS 3,PERIAQUEDUCTAL GRAY 10
...,...,...,...
64,,,
65,,,
66,,,
67,,,


The location of all the raw data files = 
C:\Users\u0133542\OneDrive - KU Leuven\PhD Leuven\Python Code for Aiforia\MAD\Raw_Data_IBA1\131297-1_S1_IBA1.csv
C:\Users\u0133542\OneDrive - KU Leuven\PhD Leuven\Python Code for Aiforia\MAD\Raw_Data_IBA1\131297-1_S2_IBA1.csv
C:\Users\u0133542\OneDrive - KU Leuven\PhD Leuven\Python Code for Aiforia\MAD\Raw_Data_IBA1\131297-1_S3_IBA1.csv
C:\Users\u0133542\OneDrive - KU Leuven\PhD Leuven\Python Code for Aiforia\MAD\Raw_Data_IBA1\131297-1_S4_IBA1.csv

The location of all the raw S1 data files = 
C:\Users\u0133542\OneDrive - KU Leuven\PhD Leuven\Python Code for Aiforia\MAD\Raw_Data_IBA1\131297-1_S1_IBA1.csv
The present image= 131297-1_S1_IBA1
The full raw data=


Unnamed: 0,Image,Parent area name,Area/object name,Class label,Area (μm²),Circumference (µm)
0,131297-1_S1_IBA1,TISSUE 6,AMYGDALA 1,AMYGDALA,175928.54,
1,131297-1_S1_IBA1,TISSUE 6,AMYGDALA 6,AMYGDALA,226329.21,
2,131297-1_S1_IBA1,TISSUE 4,AMYGDALA 4,AMYGDALA,1063228.80,
3,131297-1_S1_IBA1,TISSUE 4,AMYGDALA 5,AMYGDALA,998080.06,
4,131297-1_S1_IBA1,TISSUE 5,AMYGDALA 2,AMYGDALA,784435.85,
...,...,...,...,...,...,...
63764,131297-1_S1_IBA1,TISSUE 11,TAENIA TECTA 1,TAENIA TECTA,1145515.81,
63765,131297-1_S1_IBA1,TISSUE 3,THALAMUS 3,THALAMUS,4484718.23,
63766,131297-1_S1_IBA1,TISSUE 6,THALAMUS 4,THALAMUS,7437031.46,
63767,131297-1_S1_IBA1,TISSUE 4,THALAMUS 1,THALAMUS,9592859.15,


The dictionary of brain regions to replace for this specific image 131297-1_S1_IBA1 is {'CEREBELLAR PEDUNCLES 1': 'CINGULATE CORTEX 6', 'GLOBUS PALLIDUS 3': 'STRIATUM 7'}
The fully cleaned table with "Area/Perimeter", "Circularity" and so on:


Unnamed: 0,Image,Parent area name,Area/object name,Class label,Area (μm²),Circumference (µm),Parent area name merged,Area/object name merged,Area/Perimeter (μm),Circularity
0,131297-1_S1_IBA1,TISSUE 6,AMYGDALA 1,AMYGDALA,175928.54,,TISSUE,AMYGDALA,,
1,131297-1_S1_IBA1,TISSUE 6,AMYGDALA 6,AMYGDALA,226329.21,,TISSUE,AMYGDALA,,
2,131297-1_S1_IBA1,TISSUE 4,AMYGDALA 4,AMYGDALA,1063228.80,,TISSUE,AMYGDALA,,
3,131297-1_S1_IBA1,TISSUE 4,AMYGDALA 5,AMYGDALA,998080.06,,TISSUE,AMYGDALA,,
4,131297-1_S1_IBA1,TISSUE 5,AMYGDALA 2,AMYGDALA,784435.85,,TISSUE,AMYGDALA,,
...,...,...,...,...,...,...,...,...,...,...
63764,131297-1_S1_IBA1,TISSUE 11,TAENIA TECTA 1,TAENIA TECTA,1145515.81,,TISSUE,TAENIA TECTA,,
63765,131297-1_S1_IBA1,TISSUE 3,THALAMUS 3,THALAMUS,4484718.23,,TISSUE,THALAMUS,,
63766,131297-1_S1_IBA1,TISSUE 6,THALAMUS 4,THALAMUS,7437031.46,,TISSUE,THALAMUS,,
63767,131297-1_S1_IBA1,TISSUE 4,THALAMUS 1,THALAMUS,9592859.15,,TISSUE,THALAMUS,,


The total Calculations of each Daugher1 merged:


Unnamed: 0,Merged area name,Counts,Total Region Area (μm²),Total Cell Area (μm²),Average Cell Area (μm²),Average Area/Perimeter (μm),Average Circularity,Extrapolated Cell Count,Percentage IBA1 Positive Area,Cells/Region Area (mm²),Cells/Region Volume (mm³)
12,AMYGDALA,1301.0,3966632.93,847877.56,651.71,2.07,0.09,15612.0,21.38,327.99,8199.65
14,ANTERIOR COMMISSURE,273.0,917708.86,94067.98,344.57,1.63,0.12,3276.0,10.25,297.48,7437.0
9,CINGULATE CORTEX,2222.0,8392166.21,1278183.97,575.24,1.98,0.1,26664.0,15.23,264.77,6619.27
7,CORPUS CALLOSUM,3468.0,11845885.99,1227059.87,353.82,1.58,0.11,41616.0,10.36,292.76,7319.0
11,FIMBRIA,1672.0,5159742.22,606190.26,362.55,1.64,0.11,20064.0,11.75,324.05,8101.18
10,GLOBUS PALLIDUS,2020.0,4059853.2,1129029.68,558.93,2.19,0.12,24240.0,27.81,497.55,12438.87
5,HIPPOCAMPUS,4845.0,14599578.01,2997587.08,618.7,2.05,0.1,58140.0,20.53,331.86,8296.47
4,HYPOTHALAMUS,4912.0,17330539.5,2666778.14,542.91,1.85,0.1,58944.0,15.39,283.43,7085.76
16,IBA POSITIVE CELL,,36308105.62,,,,,,,,
2,MOTOR CORTEX,6523.0,22114419.32,3512398.2,538.46,1.92,0.1,78276.0,15.88,294.97,7374.15


All calculations together for _S1 for C:\Users\u0133542\OneDrive - KU Leuven\PhD Leuven\Python Code for Aiforia\MAD\Raw_Data_IBA1\131297-1_S1_IBA1.csv


Unnamed: 0,Merged area name,Counts,Total Region Area (μm²),Total Cell Area (μm²),Average Cell Area (μm²),Average Area/Perimeter (μm),Average Circularity,Extrapolated Cell Count,Percentage IBA1 Positive Area,Cells/Region Area (mm²),Cells/Region Volume (mm³)
12,AMYGDALA,1301.0,3966632.93,847877.56,651.71,2.07,0.09,15612.0,21.38,327.99,8199.65
14,ANTERIOR COMMISSURE,273.0,917708.86,94067.98,344.57,1.63,0.12,3276.0,10.25,297.48,7437.0
9,CINGULATE CORTEX,2222.0,8392166.21,1278183.97,575.24,1.98,0.1,26664.0,15.23,264.77,6619.27
7,CORPUS CALLOSUM,3468.0,11845885.99,1227059.87,353.82,1.58,0.11,41616.0,10.36,292.76,7319.0
11,FIMBRIA,1672.0,5159742.22,606190.26,362.55,1.64,0.11,20064.0,11.75,324.05,8101.18
10,GLOBUS PALLIDUS,2020.0,4059853.2,1129029.68,558.93,2.19,0.12,24240.0,27.81,497.55,12438.87
5,HIPPOCAMPUS,4845.0,14599578.01,2997587.08,618.7,2.05,0.1,58140.0,20.53,331.86,8296.47
4,HYPOTHALAMUS,4912.0,17330539.5,2666778.14,542.91,1.85,0.1,58944.0,15.39,283.43,7085.76
16,IBA POSITIVE CELL,,36308105.62,,,,,,,,
2,MOTOR CORTEX,6523.0,22114419.32,3512398.2,538.46,1.92,0.1,78276.0,15.88,294.97,7374.15


The present image= 131297-1_S2_IBA1
The full raw data=


Unnamed: 0,Image,Parent area name,Area/object name,Class label,Area (μm²),Circumference (µm)
0,131297-1_S2_IBA1,TISSUE 2,CEREBELLAR PEDUNCLES 4,CEREBELLAR PEDUNCLES,681842.22,
1,131297-1_S2_IBA1,TISSUE 3,CEREBELLAR PEDUNCLES 1,CEREBELLAR PEDUNCLES,376768.11,
2,131297-1_S2_IBA1,TISSUE 3,CEREBELLAR PEDUNCLES 2,CEREBELLAR PEDUNCLES,66803.21,
3,131297-1_S2_IBA1,TISSUE 3,CEREBELLAR PEDUNCLES 5,CEREBELLAR PEDUNCLES,71056.61,
4,131297-1_S2_IBA1,TISSUE 1,CEREBELLAR PEDUNCLES 3,CEREBELLAR PEDUNCLES,660902.58,
...,...,...,...,...,...,...
44069,131297-1_S2_IBA1,TISSUE 7,THALAMUS 7,THALAMUS,886660.93,
44070,131297-1_S2_IBA1,TISSUE 1,THALAMUS 5,THALAMUS,299906.08,
44071,131297-1_S2_IBA1,TISSUE 1,THALAMUS 8,THALAMUS,271536.01,
44072,131297-1_S2_IBA1,TISSUE 5,THALAMUS 1,THALAMUS,431103.35,


The dictionary of brain regions to replace for this specific image 131297-1_S2_IBA1 is {'FIMBRIA 1': 'EMPTY', 'FIMBRIA 2': 'HIPPOCAMPUS 5', 'THALAMUS 3': 'PERIAQUEDUCTAL GRAY 10', 'THALAMUS 12': 'PERIAQUEDUCTAL GRAY 10', 'HIPPOCAMPUS 9': 'PERIAQUEDUCTAL GRAY 10', 'HIPPOCAMPUS 7': 'PERIAQUEDUCTAL GRAY 10', 'SEPTAL NUCLEUS 1': 'PERIAQUEDUCTAL GRAY 10', 'THALAMUS 8': 'SUBSTANTIA NIGRA 5', 'MIDBRAIN 4': 'EMPTY', 'PERIAQUEDUCTAL GRAY 5': 'EMPTY', 'PONS 3': 'EMPTY', 'CEREBELLAR PEDUNCLES 1': 'EMPTY', 'CEREBELLAR PEDUNCLES 2': 'EMPTY', 'CEREBELLAR PEDUNCLES 5': 'EMPTY'}
The fully cleaned table with "Area/Perimeter", "Circularity" and so on:


Unnamed: 0,Image,Parent area name,Area/object name,Class label,Area (μm²),Circumference (µm),Parent area name merged,Area/object name merged,Area/Perimeter (μm),Circularity
0,131297-1_S2_IBA1,TISSUE 2,CEREBELLAR PEDUNCLES 4,CEREBELLAR PEDUNCLES,681842.22,,TISSUE,CEREBELLAR PEDUNCLES,,
4,131297-1_S2_IBA1,TISSUE 1,CEREBELLAR PEDUNCLES 3,CEREBELLAR PEDUNCLES,660902.58,,TISSUE,CEREBELLAR PEDUNCLES,,
5,131297-1_S2_IBA1,TISSUE 4,CORPUS CALLOSUM 1,CORPUS CALLOSUM,23103.73,,TISSUE,CORPUS CALLOSUM,,
7,131297-1_S2_IBA1,TISSUE 4,FIMBRIA 3,FIMBRIA,230932.70,,TISSUE,FIMBRIA,,
8,131297-1_S2_IBA1,TISSUE 6,HIPPOCAMPUS 5,FIMBRIA,27898.54,,TISSUE,HIPPOCAMPUS,,
...,...,...,...,...,...,...,...,...,...,...
44069,131297-1_S2_IBA1,TISSUE 7,THALAMUS 7,THALAMUS,886660.93,,TISSUE,THALAMUS,,
44070,131297-1_S2_IBA1,TISSUE 1,THALAMUS 5,THALAMUS,299906.08,,TISSUE,THALAMUS,,
44071,131297-1_S2_IBA1,TISSUE 1,SUBSTANTIA NIGRA 5,THALAMUS,271536.01,,TISSUE,SUBSTANTIA NIGRA,,
44072,131297-1_S2_IBA1,TISSUE 5,THALAMUS 1,THALAMUS,431103.35,,TISSUE,THALAMUS,,


The total Calculations of each Daugher1 merged:


Unnamed: 0,Merged area name,Counts,Total Region Area (μm²),Total Cell Area (μm²),Average Cell Area (μm²),Average Area/Perimeter (μm),Average Circularity,Extrapolated Cell Count,Percentage IBA1 Positive Area,Cells/Region Area (mm²),Cells/Region Volume (mm³)
7,CEREBELLAR PEDUNCLES,424.0,1342744.8,148357.74,349.9,1.82,0.14,5088.0,11.05,315.77,7894.28
10,CORPUS CALLOSUM,10.0,23103.73,3299.37,329.94,1.85,0.16,120.0,14.28,432.83,10820.76
8,FIMBRIA,76.0,230932.7,32044.07,421.63,1.67,0.1,912.0,13.88,329.1,8227.51
1,HIPPOCAMPUS,9701.0,31760034.95,6279402.93,647.29,2.06,0.1,116412.0,19.77,305.45,7636.17
3,HYPOTHALAMUS,2713.0,9644502.57,1413290.49,520.93,1.81,0.1,32556.0,14.65,281.3,7032.5
11,IBA POSITIVE CELL,,20739888.3,,,,,,,,
0,MIDBRAIN,14199.0,48490353.16,7566571.81,532.89,1.76,0.09,170388.0,15.6,292.82,7320.53
6,PERIAQUEDUCTAL GRAY,2485.0,8629867.4,1348856.65,542.8,1.8,0.09,29820.0,15.63,287.95,7198.84
4,PONS,2619.0,8362390.88,1026063.21,391.78,1.68,0.11,31428.0,12.27,313.19,7829.7
5,SUBSTANTIA NIGRA,2577.0,5603576.83,1385417.91,537.61,2.09,0.11,30924.0,24.72,459.88,11497.12


All calculations together for _S2 for C:\Users\u0133542\OneDrive - KU Leuven\PhD Leuven\Python Code for Aiforia\MAD\Raw_Data_IBA1\131297-1_S2_IBA1.csv


Unnamed: 0,Merged area name,Counts,Total Region Area (μm²),Total Cell Area (μm²),Average Cell Area (μm²),Average Area/Perimeter (μm),Average Circularity,Extrapolated Cell Count,Percentage IBA1 Positive Area,Cells/Region Area (mm²),Cells/Region Volume (mm³)
7,CEREBELLAR PEDUNCLES,424.0,1342744.8,148357.74,349.9,1.82,0.14,5088.0,11.05,315.77,7894.28
10,CORPUS CALLOSUM,10.0,23103.73,3299.37,329.94,1.85,0.16,120.0,14.28,432.83,10820.76
8,FIMBRIA,76.0,230932.7,32044.07,421.63,1.67,0.1,912.0,13.88,329.1,8227.51
1,HIPPOCAMPUS,9701.0,31760034.95,6279402.93,647.29,2.06,0.1,116412.0,19.77,305.45,7636.17
3,HYPOTHALAMUS,2713.0,9644502.57,1413290.49,520.93,1.81,0.1,32556.0,14.65,281.3,7032.5
11,IBA POSITIVE CELL,,20739888.3,,,,,,,,
0,MIDBRAIN,14199.0,48490353.16,7566571.81,532.89,1.76,0.09,170388.0,15.6,292.82,7320.53
6,PERIAQUEDUCTAL GRAY,2485.0,8629867.4,1348856.65,542.8,1.8,0.09,29820.0,15.63,287.95,7198.84
4,PONS,2619.0,8362390.88,1026063.21,391.78,1.68,0.11,31428.0,12.27,313.19,7829.7
5,SUBSTANTIA NIGRA,2577.0,5603576.83,1385417.91,537.61,2.09,0.11,30924.0,24.72,459.88,11497.12


The present image= 131297-1_S3_IBA1
The full raw data=


Unnamed: 0,Image,Parent area name,Area/object name,Class label,Area (μm²),Circumference (µm)


The dictionary of brain regions to replace for this specific image 131297-1_S3_IBA1 is {}
The fully cleaned table with "Area/Perimeter", "Circularity" and so on:


Unnamed: 0,Image,Parent area name,Area/object name,Class label,Area (μm²),Circumference (µm),Parent area name merged,Area/object name merged,Area/Perimeter (μm),Circularity


The total Calculations of each Daugher1 merged:


Unnamed: 0,Counts,Total Region Area (μm²),Total Cell Area (μm²),Average Cell Area (μm²),Average Area/Perimeter (μm),Merged area name,Average Circularity,Extrapolated Cell Count,Percentage IBA1 Positive Area,Cells/Region Area (mm²),Cells/Region Volume (mm³)


All calculations together for _S3 for C:\Users\u0133542\OneDrive - KU Leuven\PhD Leuven\Python Code for Aiforia\MAD\Raw_Data_IBA1\131297-1_S3_IBA1.csv


Unnamed: 0,Counts,Total Region Area (μm²),Total Cell Area (μm²),Average Cell Area (μm²),Average Area/Perimeter (μm),Merged area name,Average Circularity,Extrapolated Cell Count,Percentage IBA1 Positive Area,Cells/Region Area (mm²),Cells/Region Volume (mm³)


The present image= 131297-1_S4_IBA1
The full raw data=


Unnamed: 0,Image,Parent area name,Area/object name,Class label,Area (μm²),Circumference (µm)


The dictionary of brain regions to replace for this specific image 131297-1_S4_IBA1 is {}
The fully cleaned table with "Area/Perimeter", "Circularity" and so on:


Unnamed: 0,Image,Parent area name,Area/object name,Class label,Area (μm²),Circumference (µm),Parent area name merged,Area/object name merged,Area/Perimeter (μm),Circularity


The total Calculations of each Daugher1 merged:


Unnamed: 0,Counts,Total Region Area (μm²),Total Cell Area (μm²),Average Cell Area (μm²),Average Area/Perimeter (μm),Merged area name,Average Circularity,Extrapolated Cell Count,Percentage IBA1 Positive Area,Cells/Region Area (mm²),Cells/Region Volume (mm³)


All calculations together for _S4 for C:\Users\u0133542\OneDrive - KU Leuven\PhD Leuven\Python Code for Aiforia\MAD\Raw_Data_IBA1\131297-1_S4_IBA1.csv


Unnamed: 0,Counts,Total Region Area (μm²),Total Cell Area (μm²),Average Cell Area (μm²),Average Area/Perimeter (μm),Merged area name,Average Circularity,Extrapolated Cell Count,Percentage IBA1 Positive Area,Cells/Region Area (mm²),Cells/Region Volume (mm³)



 Analysis of all SX files of 131297-1_S1_IBA1
The total Calculations of each Daugher1 merged:


Unnamed: 0,Merged area name,Counts,Total Region Area (μm²),Total Cell Area (μm²),Average Cell Area (μm²),Average Area/Perimeter (μm),Average Circularity,Extrapolated Cell Count,Percentage IBA1 Positive Area,Cells/Region Area (mm²),Cells/Region Volume (mm³)
16,AMYGDALA,1301.0,3966632.93,847877.56,651.71,2.07,0.09,15612.0,21.38,327.99,8199.65
19,ANTERIOR COMMISSURE,273.0,917708.86,94067.98,344.57,1.63,0.12,3276.0,10.25,297.48,7437.0
17,CEREBELLAR PEDUNCLES,424.0,1342744.8,148357.74,349.9,1.82,0.14,5088.0,11.05,315.77,7894.28
13,CINGULATE CORTEX,2222.0,8392166.21,1278183.97,575.24,1.98,0.1,26664.0,15.23,264.77,6619.27
8,CORPUS CALLOSUM,3478.0,11868989.73,1230359.24,353.75,1.58,0.11,41736.0,10.37,293.03,7325.81
15,FIMBRIA,1748.0,5390674.92,638234.33,365.12,1.64,0.11,20976.0,11.84,324.26,8106.59
14,GLOBUS PALLIDUS,2020.0,4059853.2,1129029.68,558.93,2.19,0.12,24240.0,27.81,497.55,12438.87
0,HIPPOCAMPUS,14546.0,46359612.95,9276990.01,637.77,2.06,0.1,174552.0,20.01,313.76,7844.11
4,HYPOTHALAMUS,7625.0,26975042.07,4080068.63,535.09,1.84,0.1,91500.0,15.13,282.67,7066.72
21,IBA POSITIVE CELL,,57047993.92,,,,,,,,


All calculations together for S1+S2+...+SX concatenated for  C:\Users\u0133542\OneDrive - KU Leuven\PhD Leuven\Python Code for Aiforia\MAD\Raw_Data_IBA1\131297-1_S1_IBA1.csv


Unnamed: 0,Merged area name,Counts,Total Region Area (μm²),Total Cell Area (μm²),Average Cell Area (μm²),Average Area/Perimeter (μm),Average Circularity,Extrapolated Cell Count,Percentage IBA1 Positive Area,Cells/Region Area (mm²),Cells/Region Volume (mm³)
16,AMYGDALA,1301.0,3966632.93,847877.56,651.71,2.07,0.09,15612.0,21.38,327.99,8199.65
19,ANTERIOR COMMISSURE,273.0,917708.86,94067.98,344.57,1.63,0.12,3276.0,10.25,297.48,7437.0
17,CEREBELLAR PEDUNCLES,424.0,1342744.8,148357.74,349.9,1.82,0.14,5088.0,11.05,315.77,7894.28
13,CINGULATE CORTEX,2222.0,8392166.21,1278183.97,575.24,1.98,0.1,26664.0,15.23,264.77,6619.27
8,CORPUS CALLOSUM,3478.0,11868989.73,1230359.24,353.75,1.58,0.11,41736.0,10.37,293.03,7325.81
15,FIMBRIA,1748.0,5390674.92,638234.33,365.12,1.64,0.11,20976.0,11.84,324.26,8106.59
14,GLOBUS PALLIDUS,2020.0,4059853.2,1129029.68,558.93,2.19,0.12,24240.0,27.81,497.55,12438.87
0,HIPPOCAMPUS,14546.0,46359612.95,9276990.01,637.77,2.06,0.1,174552.0,20.01,313.76,7844.11
4,HYPOTHALAMUS,7625.0,26975042.07,4080068.63,535.09,1.84,0.1,91500.0,15.13,282.67,7066.72
21,IBA POSITIVE CELL,,57047993.92,,,,,,,,


Overview dataframe with all Total Region Area (μm²) for all brains


Unnamed: 0,Merged area name,131297-1
16,AMYGDALA,3966632.93
19,ANTERIOR COMMISSURE,917708.86
17,CEREBELLAR PEDUNCLES,1342744.8
13,CINGULATE CORTEX,8392166.21
8,CORPUS CALLOSUM,11868989.73
15,FIMBRIA,5390674.92
14,GLOBUS PALLIDUS,4059853.2
0,HIPPOCAMPUS,46359612.95
4,HYPOTHALAMUS,26975042.07
21,IBA POSITIVE CELL,57047993.92


Overview dataframe with all Counts for all brains


Unnamed: 0,Merged area name,131297-1
16,AMYGDALA,1301.0
19,ANTERIOR COMMISSURE,273.0
17,CEREBELLAR PEDUNCLES,424.0
13,CINGULATE CORTEX,2222.0
8,CORPUS CALLOSUM,3478.0
15,FIMBRIA,1748.0
14,GLOBUS PALLIDUS,2020.0
0,HIPPOCAMPUS,14546.0
4,HYPOTHALAMUS,7625.0
21,IBA POSITIVE CELL,


Overview dataframe with all Extrapolated Cell Count for all brains


Unnamed: 0,Merged area name,131297-1
16,AMYGDALA,15612.0
19,ANTERIOR COMMISSURE,3276.0
17,CEREBELLAR PEDUNCLES,5088.0
13,CINGULATE CORTEX,26664.0
8,CORPUS CALLOSUM,41736.0
15,FIMBRIA,20976.0
14,GLOBUS PALLIDUS,24240.0
0,HIPPOCAMPUS,174552.0
4,HYPOTHALAMUS,91500.0
21,IBA POSITIVE CELL,


Overview dataframe with all Total Cell Area (μm²) for all brains


Unnamed: 0,Merged area name,131297-1
16,AMYGDALA,847877.56
19,ANTERIOR COMMISSURE,94067.98
17,CEREBELLAR PEDUNCLES,148357.74
13,CINGULATE CORTEX,1278183.97
8,CORPUS CALLOSUM,1230359.24
15,FIMBRIA,638234.33
14,GLOBUS PALLIDUS,1129029.68
0,HIPPOCAMPUS,9276990.01
4,HYPOTHALAMUS,4080068.63
21,IBA POSITIVE CELL,


Overview dataframe with all Average Cell Area (μm²) for all brains


Unnamed: 0,Merged area name,131297-1
16,AMYGDALA,651.71
19,ANTERIOR COMMISSURE,344.57
17,CEREBELLAR PEDUNCLES,349.9
13,CINGULATE CORTEX,575.24
8,CORPUS CALLOSUM,353.75
15,FIMBRIA,365.12
14,GLOBUS PALLIDUS,558.93
0,HIPPOCAMPUS,637.77
4,HYPOTHALAMUS,535.09
21,IBA POSITIVE CELL,


Overview dataframe with all Percentage IBA1 Positive Area for all brains


Unnamed: 0,Merged area name,131297-1
16,AMYGDALA,21.38
19,ANTERIOR COMMISSURE,10.25
17,CEREBELLAR PEDUNCLES,11.05
13,CINGULATE CORTEX,15.23
8,CORPUS CALLOSUM,10.37
15,FIMBRIA,11.84
14,GLOBUS PALLIDUS,27.81
0,HIPPOCAMPUS,20.01
4,HYPOTHALAMUS,15.13
21,IBA POSITIVE CELL,


Overview dataframe with all Average Area per Perimeter (μm) for all brains


Unnamed: 0,Merged area name,131297-1
16,AMYGDALA,2.07
19,ANTERIOR COMMISSURE,1.63
17,CEREBELLAR PEDUNCLES,1.82
13,CINGULATE CORTEX,1.98
8,CORPUS CALLOSUM,1.58
15,FIMBRIA,1.64
14,GLOBUS PALLIDUS,2.19
0,HIPPOCAMPUS,2.06
4,HYPOTHALAMUS,1.84
21,IBA POSITIVE CELL,


Overview dataframe with all Average Circularity for all brains


Unnamed: 0,Merged area name,131297-1
16,AMYGDALA,0.09
19,ANTERIOR COMMISSURE,0.12
17,CEREBELLAR PEDUNCLES,0.14
13,CINGULATE CORTEX,0.1
8,CORPUS CALLOSUM,0.11
15,FIMBRIA,0.11
14,GLOBUS PALLIDUS,0.12
0,HIPPOCAMPUS,0.1
4,HYPOTHALAMUS,0.1
21,IBA POSITIVE CELL,


Overview dataframe with all Cells per Region Area (mm²) for all brains


Unnamed: 0,Merged area name,131297-1
16,AMYGDALA,327.99
19,ANTERIOR COMMISSURE,297.48
17,CEREBELLAR PEDUNCLES,315.77
13,CINGULATE CORTEX,264.77
8,CORPUS CALLOSUM,293.03
15,FIMBRIA,324.26
14,GLOBUS PALLIDUS,497.55
0,HIPPOCAMPUS,313.76
4,HYPOTHALAMUS,282.67
21,IBA POSITIVE CELL,


Overview dataframe with all Cells per Region Vol (mm³) for all brains


Unnamed: 0,Merged area name,131297-1
16,AMYGDALA,8199.65
19,ANTERIOR COMMISSURE,7437.0
17,CEREBELLAR PEDUNCLES,7894.28
13,CINGULATE CORTEX,6619.27
8,CORPUS CALLOSUM,7325.81
15,FIMBRIA,8106.59
14,GLOBUS PALLIDUS,12438.87
0,HIPPOCAMPUS,7844.11
4,HYPOTHALAMUS,7066.72
21,IBA POSITIVE CELL,


CPU times: total: 2min 29s
Wall time: 2min 35s


## Part 3 - Automatic Hemisphere Analysis of all X*N Slides of all N Brains (injected vs uninjected)


In [None]:
%%time   
# For curiosity we measure the time the code in this cell takes to run

# Load the modified file with brain regions to replace/delete for each specific image 
df_brainregions_to_replace=load_data_brainregions_to_replace(file_brainregions_to_replace)

# Load the modified file with hemisphere analysis for each specific image 
df_brainregions_injected=load_data_brainregions_injected(file_brainregions_injected)

# Extract the file names that contain '_S1' in the file name. These are the N first images of the N unique brains.
all_raw_data_file_locations_S1= load_all_file_locations_S1(folder_raw_data)

# We initiate a counter to keep track in which loop we are below:
count = 0

# Loop over all the S1 pictures in the raw_data folder
for file_location_S1 in all_raw_data_file_locations_S1:
    count = count +1 # Counts the loop; first loop: counter = 1
    
    # Get the image name out of the file_path (getting image name from dataframe first column is hard because some are empty)
    full_name = os.path.basename(file_location_S1)
    file_name = os.path.splitext(full_name)
    image_name_S1 = file_name[0]

    dict_df_SX_final = {}    # {'_S1' : df_S1_final, '_S2' : df_S2_final, ..., '_SX' : df_SX_final}
    dict_df_SX_all_calcs_merged = {} # {'_S1' : df_S1_all_calcs_merged, '_S2' : df_S2_all_calcs_merged, ..., '_SX' : df_SX_all_calcs_merged}
    
    for appendix in appendices_list:   
        # appendix is in ['_S1', '_S2', '_S3', ... , '_SX']
        file_location = file_location_S1.replace('_S1', appendix)
    
        # Do the data cleaning, making use of the functions defined above    
        dict_df_SX_final[appendix] = dataframe_cleaning(file_location, df_brainregions_to_replace)

    # Concatenate the full dataframes of all S1, S2, ..., SX
    print('\n Analysis of all SX files of', image_name_S1)
    df_SX_final_concat = pd.concat(dict_df_SX_final.values(), axis=0)


    # Start from the fully concatenated dataframe of S1, S2... SX after replacing the wrong brainregions
    # For that dataframe, we will determine for each row whether the brainregion (Parent area name or Area/object name) was injected or not. 
    # The brainregions we don't care about will be deleted because we are doing an inner join
    df_SX_injected_parent= df_SX_final_concat.merge(df_brainregions_injected, left_on=['Image', 'Parent area name'], right_on=['Image', 'Brainregion'], how='inner')
    df_SX_injected_object= df_SX_final_concat.merge(df_brainregions_injected, left_on=['Image', 'Area/object name'], right_on=['Image', 'Brainregion'], how='inner')
    
    # Do all the S1+ S2...+ SX INJECTED calculations, making use of the functions defined above. 
    # The except part is for when we have made an empty dataframe because no dataframe was available (will never be the case for S1+S2 +...SX).
    try: 
        df_SX_all_calcs_injected = all_calculations(df_SX_injected_parent, df_SX_injected_object, groupby_column1='Parent_Injected', groupby_column2='Daughter1_Injected')
        df_SX_all_calcs_injected.sort_values('Merged area name', ascending=False, inplace=True)
        print('All calculations together for all SX INJECTED for ', file_location_S1)
        display(df_SX_all_calcs_injected)
    except:
        pass
    
  
    # Output the results to an excel file that is created in the output folder specified at the beginning of this notebook.
    output_file_name_SX = image_name_S1.replace('_S1', '_SX') + '_Hemisphere_Results.xlsx'
    output_file_location_SX = os.path.join(folder_output_results_injected, output_file_name_SX)
 
    with pd.ExcelWriter(output_file_location_SX) as writer:
        df_SX_all_calcs_injected.to_excel(writer, sheet_name='SX_Hemisphere_Results', index=False, float_format = "%.3f")
        
   
    # For the overview excel file, only the df_SX_all_calcs_injected dataframe is needed. 
    # We will make 1 overview excelfiles with a few tabpages that we store in dictionary_overview_dataframes_injected:
    # dictionary_overview_dataframes_injected = {Total Region area: df, Extrapolated Cell Count:df, Cells/Region Area:df, .... }

    # In the first loop we initiate an empty overview dictionary that will be filled with dataframes. 
    if count==1:
        dictionary_overview_dataframes_injected={}
        
    # Prepare the dataframes that are needed for the overview excel file: choose the needed columns,
    # and rename the header of the column with the values to the image_name 
    # (we go from e.g. image_name_S1 = 131297-1_S1_IBA1 to column name = 131297-1)
    list_calculation_results=['Total Region Area (μm²)', 'Counts', 'Extrapolated Cell Count', 
                              'Total Cell Area (μm²)', 'Average Cell Area (μm²)', 'Percentage IBA1 Positive Area',
                              'Average Area/Perimeter (μm)', 'Average Circularity', 
                              'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)',
                              ]
                                   
    # print('list_calculation_results = ', list_calculation_results)
    
    for calculation_result in list_calculation_results:
        df_SX_all_calcs_injected_calculation= df_SX_all_calcs_injected[['Merged area name', calculation_result]].copy()
        df_SX_all_calcs_injected_calculation = df_SX_all_calcs_injected_calculation[~df_SX_all_calcs_injected_calculation['Merged area name'].isin(brainregions_not_needed)]
        df_SX_all_calcs_injected_calculation.rename(columns={calculation_result: image_name_S1.replace('_S1_IBA1', '')}, inplace=True)
    
        if count==1:
            # In the first loop we fill the empty overview dictionary with a dataframe with the values calculated in loop 1 
            dictionary_overview_dataframes_injected[calculation_result]  = df_SX_all_calcs_injected_calculation.copy()

        elif count > 1 :
            # In the subsequent loops we will add the values of those loops to the dataframes in the overview dictionary
            dictionary_overview_dataframes_injected[calculation_result] = dictionary_overview_dataframes_injected[calculation_result].merge(df_SX_all_calcs_injected_calculation, how='outer', on='Merged area name')

    # At the end, we delete some of the dataframes, so they cannot be used in the next loop
    del(dict_df_SX_final)
    del(df_SX_final_concat)
    del(df_SX_all_calcs_injected)
    

# After the for loops, we print the final overview tables
# Output the final overview tables to an excel file Overview_IBA1_Hemisphere_Results.xlsx that is created in the output folder specified at the beginning of this notebook
output_file_name_overview = os.path.join(folder_output_results_injected, 'Overview_IBA1_Hemisphere_Results.xlsx')
    
list_calculation_results=['Total Region Area (μm²)', 'Counts', 'Extrapolated Cell Count', 
                          'Total Cell Area (μm²)', 'Average Cell Area (μm²)', 'Percentage IBA1 Positive Area',
                          'Average Area/Perimeter (μm)', 'Average Circularity', 
                          'Cells/Region Area (mm²)', 'Cells/Region Volume (mm³)',
                          ]
                              
with pd.ExcelWriter(output_file_name_overview) as writer: 
    for calculation_result in list_calculation_results:
        calculation_result_clean = calculation_result.replace('/', ' per ').replace('Volume', 'Vol')
        
        print(f'Overview dataframe with all {calculation_result_clean} for all brains')
        display(dictionary_overview_dataframes_injected[calculation_result])

        dictionary_overview_dataframes_injected[calculation_result].to_excel(writer, sheet_name=calculation_result_clean, index=False, float_format = "%.3f")