<h2>Imports</h2>

In [15]:
from utils import read_results_csv, extract_analysis_parameters, show_exploratory_data, qc_filter_dataset
import pandas as pd
import plotly.express as px

<h2>Data analysis input</h2>

In [16]:
# Define the dataset results that you want to analyze below ("microglia" or "astrocyte")
dataset = "microglia"

# Define the .csv results you want to explore and quality check
csv_path = "./results/microglia_results_cellpdia30_sigma1_dilrad4_dnad_obj_seg_v1_gliaero6_gliathr20_dnadero2.csv"

# Read the results and mouse_id .csv files and load them into DataFrames
df, df_mouse_id, merged_df = read_results_csv(dataset, csv_path)

# Print the analysis settings and extract them into variables
(
    cellpose_nuclei_diameter,
    gaussian_sigma,
    dilation_radius_nuclei,
    dna_damage_segmenter_version,
    glia_nuclei_colocalization_erosion,
    glia_channel_threshold,
    glia_segmenter,
    glia_segmenter_version,
    dna_damage_erosion,
    parameters_title,
) = extract_analysis_parameters(csv_path)

# Display the first few rows of the DataFrame
merged_df.head()


The following dataset will be analyzed: microglia
Cellpose nuclei diameter: 30
Gaussian sigma: 1
Dilation radius nuclei: 4
Dna damage segmenter version: 1
Glia erosion: 6
Glia threshold: 20
Glia semantic segmentation version: None
DNA damage foci erosion: 2


Unnamed: 0,index,filename,avg_dna_damage_foci/glia_+,avg_dna_damage_foci/glia_+_damage_+,avg_dna_damage_foci/all_nuclei,avg_dna_damage_foci/all_nuclei_damage_+,nr_+_dna_damage_glia_nuclei,nr_+_dna_damage_all_nuclei,nr_-_dna_damage_glia_nuclei,nr_glia_+_nuclei,...,%_dna_damage_signal,%_glia+_signal,damage_load_ratio,tissue_location,staining_id,animal_id,sex,genotype,dna_damage_stain_quality_manual,manual_qc
0,0,DSB Iba1 101_40X_CA1,1.0,1.333333,0.573333,1.409836,9,61,3,12,...,0.783348,1.67799,0.406667,CA1,101,2119,female,APP/PS1,good,passed
1,1,DSB Iba1 101_40X_CA3,0.777778,1.0,0.934959,1.513158,7,76,2,9,...,1.286697,2.135658,0.617886,CA3,101,2119,female,APP/PS1,good,passed
2,2,DSB Iba1 101_40X_CTX1,1.1,1.375,0.958084,1.415929,24,113,6,30,...,2.621174,5.073738,0.676647,CTX1,101,2119,female,APP/PS1,good,passed
3,3,DSB Iba1 101_40X_CTX2,1.363636,1.666667,0.898374,1.407643,9,157,2,11,...,1.908875,5.266762,0.638211,CTX2,101,2119,female,APP/PS1,good,passed
4,4,DSB Iba1 101_40X_CTX3,0.533333,1.333333,0.759657,1.301471,6,136,9,15,...,1.623058,3.178596,0.583691,CTX3,101,2119,female,APP/PS1,good,passed


<h2>Initial data exploration</h2>

In [17]:
show_exploratory_data(df, dataset, parameters_title)

<h2>Data filtration and quality control (QC)</h2>

We can observe a number of outliers in the glial and DNA damage mask detections because the staining is suboptimal in some of the samples. I will filter the data to remove those suboptimal stains and plot only the optimal ones, where the automated image analysis gives reliable results. The images passing and failing quality control can be checked individually in the next two Jupyter notebooks.
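The rule that qc_filter_dataset applies lives in utils and is not shown in this notebook. As a rough illustration of the general idea only, the sketch below flags a stain as suboptimal when its signal percentage lies too far from the dataset mean. The helper name flag_by_mean_deviation, the max_sd cutoff, and the toy values are all assumptions made for this example and are not taken from the real pipeline.

```python
import pandas as pd


def flag_by_mean_deviation(values: pd.Series, max_sd: float = 1.5) -> pd.Series:
    """Hypothetical rule: 'optimal' if within max_sd standard deviations of the mean."""
    mean = values.mean()
    sd = values.std()
    within = (values - mean).abs() <= max_sd * sd
    return within.map({True: "optimal", False: "suboptimal"})


# Toy values, not taken from the real results file
toy_df = pd.DataFrame({
    "filename": ["img_a", "img_b", "img_c", "img_d", "img_e"],
    "%_glia+_signal": [2.0, 2.5, 3.0, 2.5, 30.0],
    "%_dna_damage_signal": [1.0, 1.2, 0.8, 1.0, 1.0],
})

toy_df["glia_stain_quality_auto"] = flag_by_mean_deviation(toy_df["%_glia+_signal"])
toy_df["dna_damage_stain_quality_auto"] = flag_by_mean_deviation(toy_df["%_dna_damage_signal"])

# In this sketch a stain passes only if both channels are flagged optimal
toy_df["staining_qc_passed"] = (
    (toy_df["glia_stain_quality_auto"] == "optimal")
    & (toy_df["dna_damage_stain_quality_auto"] == "optimal")
)

print(toy_df[["filename", "staining_qc_passed"]])
```

In this toy example only img_e, whose glia signal is far above the others, fails the check. The real function may use a different cutoff and may also take the manual QC columns into account.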

In [18]:
# Quality check the analyzed stainings based on deviations from the mean of both %_glia+_signal and %_dna_damage_signal
merged_df = qc_filter_dataset(merged_df,
                              dataset, 
                              cellpose_nuclei_diameter, 
                              gaussian_sigma, 
                              dilation_radius_nuclei, 
                              dna_damage_segmenter_version, 
                              glia_nuclei_colocalization_erosion, 
                              glia_channel_threshold, 
                              glia_segmenter, 
                              glia_segmenter_version, 
                              dna_damage_erosion)

# Dataframe now displays the QC values
merged_df.head()

Glia_mask_area_%_mean: 9.674813516163907, Dna_damage_mask_area_%_mean: 8.994491602664807


Unnamed: 0,index,filename,avg_dna_damage_foci/glia_+,avg_dna_damage_foci/glia_+_damage_+,avg_dna_damage_foci/all_nuclei,avg_dna_damage_foci/all_nuclei_damage_+,nr_+_dna_damage_glia_nuclei,nr_+_dna_damage_all_nuclei,nr_-_dna_damage_glia_nuclei,nr_glia_+_nuclei,...,tissue_location,staining_id,animal_id,sex,genotype,dna_damage_stain_quality_manual,manual_qc,glia_stain_quality_auto,dna_damage_stain_quality_auto,staining_qc_passed
0,0,DSB Iba1 101_40X_CA1,1.0,1.333333,0.573333,1.409836,9,61,3,12,...,CA1,101,2119,female,APP/PS1,good,passed,optimal,optimal,True
1,1,DSB Iba1 101_40X_CA3,0.777778,1.0,0.934959,1.513158,7,76,2,9,...,CA3,101,2119,female,APP/PS1,good,passed,optimal,optimal,True
2,2,DSB Iba1 101_40X_CTX1,1.1,1.375,0.958084,1.415929,24,113,6,30,...,CTX1,101,2119,female,APP/PS1,good,passed,optimal,optimal,True
3,3,DSB Iba1 101_40X_CTX2,1.363636,1.666667,0.898374,1.407643,9,157,2,11,...,CTX2,101,2119,female,APP/PS1,good,passed,optimal,optimal,True
4,4,DSB Iba1 101_40X_CTX3,0.533333,1.333333,0.759657,1.301471,6,136,9,15,...,CTX3,101,2119,female,APP/PS1,good,passed,optimal,optimal,True


First, we will plot the technical replicates without averaging them into biological replicates.

In [30]:
# Remove data from images with a poor quality stain (auto QC), copy to avoid warnings
auto_filtered_df = merged_df[merged_df['staining_qc_passed'] == True].copy()

# Create a new column for combined sex and tissue location
auto_filtered_df['sex_tissue'] = auto_filtered_df['sex'] + ' - ' + auto_filtered_df['tissue_location']

# Define the order of the categories to ensure that male and female for each location are side by side
categories = [
    'male - CA1', 'female - CA1',
    'male - CA3', 'female - CA3',
    'male - CTX1', 'female - CTX1',
    'male - CTX2', 'female - CTX2',
    'male - CTX3', 'female - CTX3',
    'male - DG', 'female - DG'
]
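The same ordering can also be generated rather than typed out, which may be less error-prone if tissue locations are added later. This is a small sketch that assumes the label format 'sex - location' used above; the names locations, sexes and generated_categories are introduced here for illustration only.

```python
from itertools import product

# Tissue locations and sexes as they appear in the hand-written list above
locations = ["CA1", "CA3", "CTX1", "CTX2", "CTX3", "DG"]
sexes = ["male", "female"]

# product() varies the last argument fastest, so male and female
# for the same location end up next to each other
generated_categories = [
    f"{sex} - {location}" for location, sex in product(locations, sexes)
]

print(generated_categories)
```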

In [32]:
# Create the boxplot with ordered categories
fig = px.box(auto_filtered_df, x='sex_tissue', y='avg_dna_damage_foci/glia_+',
             color='genotype',  # Different genotypes will be shown in different colors
             category_orders={'sex_tissue': categories},  # Ensuring the specified order
             title=f'DNA Damage Foci nr in All Glia Nuclei by Tissue Location and Genotype (separated by sex) - Auto stain QC - {parameters_title}')

# Show the plot
fig.show()

In [33]:
# Create the boxplot with ordered categories
fig = px.box(auto_filtered_df, x='sex_tissue', y='avg_dna_damage_foci/glia_+_damage_+',
             color='genotype',  # Different genotypes will be shown in different colors
             category_orders={'sex_tissue': categories},  # Ensuring the specified order
             title=f'DNA Damage Foci nr in Damage+ Glia Nuclei by Tissue Location and Genotype (separated by sex) - Auto stain QC - {parameters_title}')

# Show the plot
fig.show()

In [34]:
# Create the boxplot with ordered categories
fig = px.box(auto_filtered_df, x='sex_tissue', y='avg_dna_damage_foci/all_nuclei',
             color='genotype',  # Different genotypes will be shown in different colors
             category_orders={'sex_tissue': categories},  # Ensuring the specified order
             title=f'DNA Damage Foci nr in All Nuclei by Tissue Location and Genotype (separated by sex) - Auto stain QC - {parameters_title}')

# Show the plot
fig.show()

In [35]:
# Create the boxplot with ordered categories
fig = px.box(auto_filtered_df, x='sex_tissue', y='avg_dna_damage_foci/all_nuclei_damage_+',
             color='genotype',  # Different genotypes will be shown in different colors
             category_orders={'sex_tissue': categories},  # Ensuring the specified order
             title=f'DNA Damage Foci nr in Damage+ Nuclei by Tissue Location and Genotype (separated by sex) - Auto stain QC - {parameters_title}')

# Show the plot
fig.show()

In [25]:
# Create the boxplot
fig = px.box(auto_filtered_df, x='tissue_location', y='damage_load_ratio',
             color='genotype', # Different genotypes will be shown in different colors
             title=f'Damage load ratio by Tissue Location and Genotype (sex-aggregated) - Auto stain QC - {parameters_title}')

# Show the plot
fig.show()
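The plots above treat every image as its own data point. If the technical replicates were to be averaged into biological replicates, one possible approach is a groupby on the animal-level columns. The sketch below uses toy rows and assumes that images sharing the same animal_id and tissue_location are the technical replicates; that grouping, the genotype label 'WT', and the name per_animal_df are assumptions for this example.

```python
import pandas as pd

# Toy rows standing in for auto_filtered_df (values are made up)
toy_df = pd.DataFrame({
    "animal_id": [1, 1, 1, 2, 2],
    "sex": ["female", "female", "female", "male", "male"],
    "genotype": ["APP/PS1", "APP/PS1", "APP/PS1", "WT", "WT"],
    "tissue_location": ["CA1", "CA1", "CA3", "CA1", "CA1"],
    "damage_load_ratio": [0.4, 0.6, 0.5, 0.2, 0.4],
})

# One row per animal and tissue location, holding the mean of its images
per_animal_df = (
    toy_df
    .groupby(["animal_id", "sex", "genotype", "tissue_location"], as_index=False)["damage_load_ratio"]
    .mean()
)

print(per_animal_df)
```

The resulting DataFrame keeps the sex, genotype and tissue_location columns, so it could be passed to px.box in the same way as auto_filtered_df.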

<h2>Explore the failed-QC DataFrame</h2>

In [36]:
qc_failed_df = merged_df[merged_df['staining_qc_passed'] == False]

print(f"{qc_failed_df.shape[0]} stains have not passed QC and have been discarded")

qc_failed_df


60 stains have not passed QC and have been discarded


Unnamed: 0,index,filename,avg_dna_damage_foci/glia_+,avg_dna_damage_foci/glia_+_damage_+,avg_dna_damage_foci/all_nuclei,avg_dna_damage_foci/all_nuclei_damage_+,nr_+_dna_damage_glia_nuclei,nr_+_dna_damage_all_nuclei,nr_-_dna_damage_glia_nuclei,nr_glia_+_nuclei,...,tissue_location,staining_id,animal_id,sex,genotype,dna_damage_stain_quality_manual,manual_qc,glia_stain_quality_auto,dna_damage_stain_quality_auto,staining_qc_passed
47,47,DSB Iba1 16_40X_CA1,0.936508,1.923913,0.836207,1.796296,92,108,97,189,...,CA1,16,887,male,APP/PS1,poor,failed,suboptimal,optimal,False
48,48,DSB Iba1 16_40X_CA3,4.513514,4.717514,4.40625,4.726257,177,179,8,185,...,CA3,16,887,male,APP/PS1,poor,failed,suboptimal,optimal,False
49,49,DSB Iba1 16_40X_CTX1,0.642857,1.6875,0.648352,1.882979,16,94,26,42,...,CTX1,16,887,male,APP/PS1,poor,failed,optimal,optimal,False
50,50,DSB Iba1 16_40X_CTX2,2.240664,2.268908,2.327068,2.389961,238,259,3,241,...,CTX2,16,887,male,APP/PS1,poor,failed,suboptimal,suboptimal,False
51,51,DSB Iba1 16_40X_CTX3,2.019231,2.1,1.738589,2.053922,100,204,4,104,...,CTX3,16,887,male,APP/PS1,poor,failed,suboptimal,optimal,False
52,52,DSB Iba1 16_40X_DG,0.662896,1.382075,0.653509,1.37963,212,216,230,438,...,DG,16,887,male,APP/PS1,poor,failed,suboptimal,optimal,False
53,53,DSB Iba1 17_40X_CA1,1.818182,2.5,2.765306,3.985294,8,136,3,11,...,CA1,17,892,female,APP/PS1,poor,failed,optimal,optimal,False
54,54,DSB Iba1 17_40X_CA3,1.315789,1.923077,3.617486,4.76259,13,139,6,19,...,CA3,17,892,female,APP/PS1,poor,failed,optimal,optimal,False
55,55,DSB Iba1 17_40X_CTX1,2.411765,2.928571,1.276923,2.141935,14,155,3,17,...,CTX1,17,892,female,APP/PS1,poor,failed,optimal,optimal,False
56,56,DSB Iba1 17_40X_CTX2,2.846154,2.901961,2.55298,2.640411,51,292,1,52,...,CTX2,17,892,female,APP/PS1,poor,failed,optimal,optimal,False
