<h2>Data Analysis - Batch Processing - Quantification of cell populations</h2>

This notebook processes the .csv files produced by Batch Processing (Average Intensity or Colocalization) to:

1. Define cell populations based on one or more markers (positive, negative, or a combination of both).
2. Plot the resulting data using Plotly.
3. Count the cells positive for a marker based on colocalization (using a user-defined threshold).
4. Aggregate all per-label results in a single .csv file ("BP_populations_marker_+_summary_{method}.csv").
5. Save summary percentages per cell population in a .csv file ("BP_populations_marker_+_summary_{method}.csv").

In [5]:
from pathlib import Path
from utils_data_analysis import calculate_perc_pops, plot_perc_pop_per_filename_roi

In [6]:
# Define the path containing your results
results_path = Path("./results/test_data/3D/MEC0.1")

# Method used to define cells as positive for a marker: "avg_int" or "coloc" (TODO: "pixel_class")
method = "avg_int"

# Define the channels you want to analyze using the following structure:
# markers = [(channel_name, channel_nr, cellular_location),(..., ..., ...)]
markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]

# WARNING: These settings override the thresholds you used in 003_BP_Avg_intensity to define your populations (i.e. what is considered positive)
# ATTENTION: These settings do not affect or change the analysis results of 003_BP_Colocalization
# Define the min/max average intensity range used to select each population of interest
# You can define several populations for the same marker (e.g. neun_high and neun_low)
# Max values are set to 255 because the test input images are 8-bit; higher bit depths can produce higher avg_int values
min_max_per_marker = [
    {"marker": "ki67", "min_max": (110,255), "population":"ki67"},
    {"marker": "neun", "min_max": (20,80), "population":"neun_low"},
    {"marker": "neun", "min_max": (80,255), "population":"neun_high"},
    {"marker": "calbindin", "min_max": (10,255), "population":"calbindin"},]

# Define cell populations based on multiple markers (e.g. double positive (True, True) or marker1 positive and marker2 negative (True, False))
# Subpopulations refer to the population names in min_max_per_marker (see above), not to marker names, since a marker can have several populations (as with "neun")
# For a cell_pop defined by a single population, append "+" to its name so it does not clash with the population name in min_max_per_marker
cell_populations = [
    {"cell_pop": "neun_high+", "subpopulations": [("neun_high", True)]},
    {"cell_pop": "neun_low+", "subpopulations": [("neun_low", True)]},
    {"cell_pop": "non_prolif", "subpopulations": [("ki67", False)]},
    {"cell_pop": "prolif_neun_high", "subpopulations": [("neun_high", True), ("ki67", True)]},
    {"cell_pop": "prolif_neun_low", "subpopulations": [("neun_low", True), ("ki67", True)]},
    {"cell_pop": "non_prolif_neun_high", "subpopulations": [("neun_high", True), ("ki67", False)]},
    {"cell_pop": "non_prolif_neun_low", "subpopulations": [("neun_low", True), ("ki67", False)]},
    {"cell_pop": "neun_high_+_calbindin_+", "subpopulations": [("neun_high", True), ("calbindin", True)]},
    {"cell_pop": "neun_low_+_calbindin_+", "subpopulations": [("neun_low", True), ("calbindin", True)]},]
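The two configuration lists above can be read as a two-stage rule: `min_max_per_marker` turns each cell's average intensity per marker into population flags, and `cell_populations` combines those flags into cell populations. The plain-Python sketch below illustrates that reading; it is not the code in `utils_data_analysis`, and the helper names (`validate_config`, `population_flags`, `cell_pop_flags`) and the example cell are hypothetical. It assumes the `min_max` bounds are inclusive and uses only a subset of `cell_populations`.

```python
# Minimal sketch, NOT the implementation in utils_data_analysis.
# Assumptions: "min_max" bounds are inclusive, and a cell_pop requires every
# (population, expected) pair in "subpopulations" to match the cell's flags.
markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]
min_max_per_marker = [
    {"marker": "ki67", "min_max": (110, 255), "population": "ki67"},
    {"marker": "neun", "min_max": (20, 80), "population": "neun_low"},
    {"marker": "neun", "min_max": (80, 255), "population": "neun_high"},
    {"marker": "calbindin", "min_max": (10, 255), "population": "calbindin"},
]
cell_populations = [  # subset of the notebook's list
    {"cell_pop": "neun_high+", "subpopulations": [("neun_high", True)]},
    {"cell_pop": "non_prolif", "subpopulations": [("ki67", False)]},
    {"cell_pop": "prolif_neun_high", "subpopulations": [("neun_high", True), ("ki67", True)]},
    {"cell_pop": "non_prolif_neun_high", "subpopulations": [("neun_high", True), ("ki67", False)]},
]


def validate_config(markers, min_max_per_marker, cell_populations):
    """Hypothetical check: raise ValueError on the naming mistakes the comments warn about."""
    marker_names = {name for name, _, _ in markers}
    population_names = set()
    for entry in min_max_per_marker:
        if entry["marker"] not in marker_names:
            raise ValueError(f"unknown marker: {entry['marker']!r}")
        population_names.add(entry["population"])
    for pop in cell_populations:
        # A cell_pop sharing a population's name is what the "+" suffix avoids
        if pop["cell_pop"] in population_names:
            raise ValueError(f"cell_pop {pop['cell_pop']!r} clashes with a population name")
        for name, _ in pop["subpopulations"]:
            if name not in population_names:
                raise ValueError(f"unknown population: {name!r}")


def population_flags(avg_int, min_max_per_marker):
    """Stage 1: {marker: average intensity} -> {population: bool}."""
    flags = {}
    for entry in min_max_per_marker:
        low, high = entry["min_max"]
        flags[entry["population"]] = low <= avg_int[entry["marker"]] <= high
    return flags


def cell_pop_flags(flags, cell_populations):
    """Stage 2: {population: bool} -> {cell_pop: bool}."""
    return {
        pop["cell_pop"]: all(flags[name] == expected for name, expected in pop["subpopulations"])
        for pop in cell_populations
    }


validate_config(markers, min_max_per_marker, cell_populations)
cell = {"ki67": 35.0, "neun": 120.5, "calbindin": 8.0}  # hypothetical cell
flags = population_flags(cell, min_max_per_marker)
print(flags)
# {'ki67': False, 'neun_low': False, 'neun_high': True, 'calbindin': False}
print(cell_pop_flags(flags, cell_populations))
# {'neun_high+': True, 'non_prolif': True, 'prolif_neun_high': False, 'non_prolif_neun_high': True}
```

Under the inclusive-bounds assumption, a cell with an average NeuN intensity of exactly 80 would count as both `neun_low` and `neun_high`, because the two ranges share that boundary; whether `calculate_perc_pops` treats the bounds as inclusive or half-open is not stated in this notebook.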

In [7]:
# Calculate the percentage of cells in each population and save them as a summary .csv
# The model name and segmentation type are extracted from results_path
percentage_true, model_name, segmentation_type = calculate_perc_pops(results_path, method, min_max_per_marker, cell_populations)

percentage_true

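<table>
<caption>percentage_true: percentage (%) of cells in each population, per filename and ROI</caption>
<thead>
<tr><th></th><th>filename</th><th>ROI</th><th>ki67</th><th>neun_low</th><th>neun_high</th><th>calbindin</th><th>neun_high+</th><th>neun_low+</th><th>non_prolif</th><th>prolif_neun_high</th><th>prolif_neun_low</th><th>non_prolif_neun_high</th><th>non_prolif_neun_low</th><th>neun_high_+_calbindin_+</th><th>neun_low_+_calbindin_+</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>HI1_CONTRA_M8_S6_TR1</td><td>CA</td><td>1.976285</td><td>26.416337</td><td>58.300395</td><td>63.306983</td><td>58.300395</td><td>26.416337</td><td>98.023715</td><td>0.527009</td><td>0.724638</td><td>57.773386</td><td>25.6917</td><td>48.418972</td><td>12.121212</td></tr>
<tr><td>1</td><td>HI1_CONTRA_M8_S6_TR1</td><td>DG</td><td>5.395173</td><td>65.215334</td><td>16.706105</td><td>28.300994</td><td>16.706105</td><td>65.215334</td><td>94.604827</td><td>0.189304</td><td>2.886891</td><td>16.516801</td><td>62.328443</td><td>15.286323</td><td>11.831519</td></tr>
<tr><td>2</td><td>HI1_CONTRA_M8_S6_TR2</td><td>CA</td><td>4.944697</td><td>27.065712</td><td>58.425504</td><td>77.033182</td><td>58.425504</td><td>27.065712</td><td>95.055303</td><td>2.602472</td><td>0.650618</td><td>55.823032</td><td>26.415094</td><td>54.456734</td><td>17.957059</td></tr>
<tr><td>3</td><td>HI1_CONTRA_M8_S6_TR2</td><td>DG</td><td>8.037943</td><td>67.948078</td><td>16.674988</td><td>40.639041</td><td>16.674988</td><td>67.948078</td><td>91.962057</td><td>0.998502</td><td>4.59311</td><td>15.676485</td><td>63.354968</td><td>15.62656</td><td>21.617574</td></tr>
<tr><td>4</td><td>HI1_CONTRA_M8_S7_TR1</td><td>CA</td><td>0.15444</td><td>18.455598</td><td>74.749035</td><td>95.984556</td><td>74.749035</td><td>18.455598</td><td>99.84556</td><td>0.07722</td><td>0.07722</td><td>74.671815</td><td>18.378378</td><td>73.899614</td><td>17.220077</td></tr>
<tr><td>5</td><td>HI1_CONTRA_M8_S7_TR1</td><td>DG</td><td>1.320321</td><td>76.349024</td><td>18.197474</td><td>87.428243</td><td>18.197474</td><td>76.349024</td><td>98.679679</td><td>0.0</td><td>1.262916</td><td>18.197474</td><td>75.086108</td><td>17.910448</td><td>65.269805</td></tr>
<tr><td>6</td><td>HI1_CONTRA_M8_S7_TR2</td><td>CA</td><td>0.0</td><td>20.629921</td><td>69.448819</td><td>50.15748</td><td>69.448819</td><td>20.629921</td><td>100.0</td><td>0.0</td><td>0.0</td><td>69.448819</td><td>20.629921</td><td>40.15748</td><td>7.952756</td></tr>
<tr><td>7</td><td>HI1_CONTRA_M8_S7_TR2</td><td>DG</td><td>0.644405</td><td>76.157001</td><td>12.712361</td><td>42.12068</td><td>12.712361</td><td>76.157001</td><td>99.355595</td><td>0.0</td><td>0.644405</td><td>12.712361</td><td>75.512595</td><td>12.478032</td><td>27.182191</td></tr>
<tr><td>8</td><td>HI1_IPSI_M8_S6_TR1</td><td>CA</td><td>1.17773</td><td>68.736617</td><td>2.890792</td><td>35.653105</td><td>2.890792</td><td>68.736617</td><td>98.82227</td><td>0.0</td><td>0.214133</td><td>2.890792</td><td>68.522484</td><td>2.248394</td><td>28.158458</td></tr>
<tr><td>9</td><td>HI1_IPSI_M8_S6_TR1</td><td>DG</td><td>2.669762</td><td>56.065003</td><td>3.192107</td><td>28.43877</td><td>3.192107</td><td>56.065003</td><td>97.330238</td><td>0.174115</td><td>0.812536</td><td>3.017992</td><td>55.252467</td><td>2.959954</td><td>18.862449</td></tr>
</tbody>
</table>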
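`calculate_perc_pops` reports, for each filename and ROI, the percentage of cells in each population. A plausible reading, assumed here rather than taken from `utils_data_analysis`, is 100 × (cells flagged True) / (all cells in that filename and ROI). The sketch below computes that from hypothetical per-cell flags (`percentages_per_group` and the `img_A` data are invented for illustration), then checks identities that follow from the population definitions against row 0 of the table above. The first identity, `ki67 + non_prolif = 100`, is consistent with (though it does not prove) the all-cells denominator.

```python
import math
from collections import defaultdict


def percentages_per_group(cells, columns):
    """Hypothetical helper: {(filename, ROI): {column: % of cells with column True}}."""
    groups = defaultdict(list)
    for cell in cells:
        groups[(cell["filename"], cell["ROI"])].append(cell)
    # Assumption: the denominator is every cell in the filename/ROI group
    return {
        key: {col: 100 * sum(c[col] for c in members) / len(members) for col in columns}
        for key, members in groups.items()
    }


# Hypothetical per-cell flags for a single filename/ROI
cells = [
    {"filename": "img_A", "ROI": "CA", "ki67": True, "non_prolif": False},
    {"filename": "img_A", "ROI": "CA", "ki67": False, "non_prolif": True},
    {"filename": "img_A", "ROI": "CA", "ki67": False, "non_prolif": True},
    {"filename": "img_A", "ROI": "CA", "ki67": False, "non_prolif": True},
]
print(percentages_per_group(cells, ["ki67", "non_prolif"]))
# {('img_A', 'CA'): {'ki67': 25.0, 'non_prolif': 75.0}}

# Values copied from row 0 (HI1_CONTRA_M8_S6_TR1, CA) of the table above
row0 = {
    "ki67": 1.976285, "non_prolif": 98.023715,
    "neun_high": 58.300395, "neun_high+": 58.300395,
    "prolif_neun_high": 0.527009, "non_prolif_neun_high": 57.773386,
    "neun_low": 26.416337, "prolif_neun_low": 0.724638, "non_prolif_neun_low": 25.6917,
}


def consistency_checks(row, tol=1e-4):
    """Identities implied by the population definitions, up to rounding."""
    pairs = {
        "ki67 + non_prolif = 100": (row["ki67"] + row["non_prolif"], 100.0),
        "neun_high+ = neun_high": (row["neun_high+"], row["neun_high"]),
        "prolif + non_prolif neun_high = neun_high": (
            row["prolif_neun_high"] + row["non_prolif_neun_high"], row["neun_high"]),
        "prolif + non_prolif neun_low = neun_low": (
            row["prolif_neun_low"] + row["non_prolif_neun_low"], row["neun_low"]),
    }
    return {name: math.isclose(a, b, abs_tol=tol) for name, (a, b) in pairs.items()}


print(consistency_checks(row0))  # every check is True for this row
```

The tolerance absorbs the six-decimal rounding in the table: for example, 0.724638 + 25.6917 = 26.416338, which differs from the reported `neun_low` value of 26.416337 only in the last digit.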


In [8]:
# Plot the resulting cell population percentages on a per-filename, per-ROI basis
plot_perc_pop_per_filename_roi(percentage_true, model_name, segmentation_type)
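The internals of `plot_perc_pop_per_filename_roi` are not shown here. `percentage_true` is in wide format (one column per population), while Plotly Express is often most convenient with long-format data (one row per filename, ROI and population), since a single column can then drive color or faceting. The hypothetical helper `to_long_format` below sketches that reshape in plain Python; with pandas, `DataFrame.melt(id_vars=["filename", "ROI"], var_name="population", value_name="percentage")` does the same. Whether the notebook's plotting function reshapes its input this way is an assumption.

```python
def to_long_format(rows, id_columns=("filename", "ROI")):
    """Hypothetical helper: one wide row per filename/ROI -> one row per population."""
    long_rows = []
    for row in rows:
        ids = {col: row[col] for col in id_columns}
        for column, value in row.items():
            if column not in id_columns:
                long_rows.append({**ids, "population": column, "percentage": value})
    return long_rows


# Two population columns copied from row 0 of the table above
wide = [{"filename": "HI1_CONTRA_M8_S6_TR1", "ROI": "CA", "ki67": 1.976285, "neun_high": 58.300395}]
for long_row in to_long_format(wide):
    print(long_row)
# {'filename': 'HI1_CONTRA_M8_S6_TR1', 'ROI': 'CA', 'population': 'ki67', 'percentage': 1.976285}
# {'filename': 'HI1_CONTRA_M8_S6_TR1', 'ROI': 'CA', 'population': 'neun_high', 'percentage': 58.300395}
```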