<h2>Data Analysis - Batch Processing - Quantification of cell populations</h2>

This notebook processes the .csv files produced by Batch Processing (Average Intensity or Colocalization) and lets you:

1. Define cell populations based on single or multiple markers (positive, negative, or a combination of both).
2. Plot the resulting data using Plotly.
3. Extract the number of cells positive for a marker based on colocalization (using a user-defined threshold).
4. Aggregate all per-label results in a single .csv file ("BP_populations_marker_+_summary_{method}.csv").
5. Save the summary percentage results per cell population in a .csv file ("BP_populations_marker_+_summary_{method}.csv").

In [13]:
from pathlib import Path
from utils_data_analysis import calculate_perc_pops, plot_perc_pop_per_filename_roi

In [14]:
# Define the path containing your results
results_path = Path("./results/test_data/2D/Cellpose")

# Input the method used to define cells as positive for a marker ("avg_int", "coloc") #TODO: "pixel_class"
method = "avg_int"

# Define the channels you want to analyze using the following structure:
# markers = [(channel_name, channel_nr, cellular_location),(..., ..., ...)]
markers = [("ki67", 0, "nucleus"), ("neun", 1, "nucleus"), ("calbindin", 2, "cytoplasm")]

# Define the (min, max) average intensity range that selects each population of interest (used by the avg_int method)
# You can define several populations for the same marker (e.g. neun high and neun low)
# Max values are set to 255 because the test input images are 8-bit; higher bit depths can produce higher max avg_int values
min_max_per_marker = [{"marker": "ki67", "min_max": (200,255), "population":"ki67"},
                      {"marker": "neun", "min_max": (50,115), "population":"neun_low"},
                      {"marker": "neun", "min_max": (115,255), "population":"neun_high"},
                      {"marker": "calbindin", "min_max": (65,255), "population":"calbindin"},]

# Define cell populations based on one or more markers (e.g. both markers positive (True, True), or marker1 positive (True) and marker2 negative (False))
# Subpopulation names refer to the "population" entries in min_max_per_marker, which matters when a marker has several populations, as with "neun"
# For a cell_pop defined by a single population, append a + so its name differs from the population name in min_max_per_marker
cell_populations = [
    {"cell_pop": "neun_high+", "subpopulations": [("neun_high", True)]},
    {"cell_pop": "neun_low+", "subpopulations": [("neun_low", True)]},
    {"cell_pop": "non_prolif", "subpopulations": [("ki67", False)]},
    {"cell_pop": "prolif_neun_high", "subpopulations": [("neun_high", True), ("ki67", True)]},
    {"cell_pop": "prolif_neun_low", "subpopulations": [("neun_low", True), ("ki67", True)]},
    {"cell_pop": "non_prolif_neun_high", "subpopulations": [("neun_high", True), ("ki67", False)]},
    {"cell_pop": "non_prolif_neun_low", "subpopulations": [("neun_low", True), ("ki67", False)]},
    {"cell_pop": "neun_high_+_calbindin_+", "subpopulations": [("neun_high", True), ("calbindin", True)]},
    {"cell_pop": "neun_low_+_calbindin_+", "subpopulations": [("neun_low", True), ("calbindin", True)]},]
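
To make the configuration above more concrete, here is a minimal sketch of how intensity ranges and True/False subpopulation rules could combine into population percentages. It is not the code inside `calculate_perc_pops`: the helper names (`population_flags`, `in_cell_pop`, `percentages`), the made-up `cells` list, and the inclusive `min <= value <= max` comparison are all assumptions for illustration. In particular, how the real function treats a value that sits exactly on a shared boundary (such as 115 for neun_low and neun_high) is not stated in this notebook.

```python
# Hypothetical sketch, NOT the implementation of calculate_perc_pops.
# Assumption: a cell belongs to a population when its average intensity for
# that marker satisfies min <= value <= max (boundary handling is a guess).

min_max_per_marker = [
    {"marker": "ki67", "min_max": (200, 255), "population": "ki67"},
    {"marker": "neun", "min_max": (50, 115), "population": "neun_low"},
    {"marker": "neun", "min_max": (115, 255), "population": "neun_high"},
]

cell_populations = [
    {"cell_pop": "non_prolif", "subpopulations": [("ki67", False)]},
    {"cell_pop": "prolif_neun_high", "subpopulations": [("neun_high", True), ("ki67", True)]},
    {"cell_pop": "non_prolif_neun_low", "subpopulations": [("neun_low", True), ("ki67", False)]},
]

# Made-up per-cell average intensities, one dict per segmented cell.
cells = [
    {"ki67": 230.0, "neun": 180.0},  # proliferating, neun high
    {"ki67": 12.0, "neun": 90.0},    # non-proliferating, neun low
    {"ki67": 8.0, "neun": 20.0},     # non-proliferating, neun negative
    {"ki67": 15.0, "neun": 140.0},   # non-proliferating, neun high
]


def population_flags(cell, min_max_per_marker):
    """Return {population_name: bool} for one cell."""
    flags = {}
    for entry in min_max_per_marker:
        low, high = entry["min_max"]
        flags[entry["population"]] = low <= cell[entry["marker"]] <= high
    return flags


def in_cell_pop(flags, subpopulations):
    """True when every (population, expected) pair matches the cell's flags."""
    return all(flags[name] == expected for name, expected in subpopulations)


def percentages(cells, min_max_per_marker, cell_populations):
    """Percentage of cells that fall in each combined cell population."""
    all_flags = [population_flags(c, min_max_per_marker) for c in cells]
    result = {}
    for pop in cell_populations:
        hits = sum(in_cell_pop(f, pop["subpopulations"]) for f in all_flags)
        result[pop["cell_pop"]] = 100.0 * hits / len(all_flags)
    return result


perc = percentages(cells, min_max_per_marker, cell_populations)
print(perc)
```

Under these assumptions, a `False` entry such as `("ki67", False)` selects every cell outside the ki67 range, which is why `non_prolif` also counts cells that are negative for every other marker.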

In [15]:
# Extract the model name and segmentation type from results_path
# Calculate the percentage of cells in each population and save the result as a summary .csv
percentage_true, model_name, segmentation_type = calculate_perc_pops(results_path, method, min_max_per_marker, cell_populations)

percentage_true

Unnamed: 0,filename,ROI,ki67,neun_low,neun_high,calbindin,neun_high+,neun_low+,non_prolif,prolif_neun_high,prolif_neun_low,non_prolif_neun_high,non_prolif_neun_low,neun_high_+_calbindin_+,neun_low_+_calbindin_+
0,HI1_CONTRA_M8_S6_TR1,CA,0.708661,17.559055,68.582677,0.0,68.582677,17.559055,99.291339,0.07874,0.314961,68.503937,17.244094,0.0,0.0
1,HI1_CONTRA_M8_S6_TR1,DG,4.103535,56.25,22.916667,0.757576,22.916667,56.25,95.896465,0.0,1.515152,22.916667,54.734848,0.757576,0.0
2,HI1_CONTRA_M8_S6_TR2,CA,1.603053,19.389313,66.641221,0.076336,66.641221,19.389313,98.396947,0.076336,0.305344,66.564885,19.083969,0.0,0.076336
3,HI1_CONTRA_M8_S6_TR2,DG,4.816054,57.056856,22.876254,2.474916,22.876254,57.056856,95.183946,0.06689,1.337793,22.809365,55.719064,2.408027,0.06689
4,HI1_CONTRA_M8_S7_TR1,CA,0.087184,13.251962,76.809067,0.261552,76.809067,13.251962,99.912816,0.087184,0.0,76.721883,13.251962,0.174368,0.087184
5,HI1_CONTRA_M8_S7_TR1,DG,0.509091,65.163636,21.745455,4.872727,21.745455,65.163636,99.490909,0.072727,0.218182,21.672727,64.945455,4.872727,0.0
6,HI1_CONTRA_M8_S7_TR2,CA,0.0,14.54389,72.719449,0.086059,72.719449,14.54389,100.0,0.0,0.0,72.719449,14.54389,0.0,0.086059
7,HI1_CONTRA_M8_S7_TR2,DG,0.206897,63.034483,15.034483,0.0,15.034483,63.034483,99.793103,0.0,0.206897,15.034483,62.827586,0.0,0.0
8,HI1_IPSI_M8_S6_TR1,CA,0.206186,43.917526,3.402062,0.515464,3.402062,43.917526,99.793814,0.0,0.0,3.402062,43.917526,0.0,0.0
9,HI1_IPSI_M8_S6_TR1,DG,1.872659,40.012484,3.870162,0.374532,3.870162,40.012484,98.127341,0.062422,0.249688,3.80774,39.762797,0.0,0.062422
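
One quick way to sanity-check a table like the one above is to verify identities that follow directly from the population definitions: ki67 and non_prolif partition all cells, and each neun population splits into its prolif and non_prolif parts. The sketch below does this in plain Python on three rows copied from the output (a subset of columns). The names `IDENTITIES` and `failed_identities` and the 1e-4 tolerance are assumptions chosen here to absorb the six-decimal rounding of the displayed values.

```python
import math

# Three rows copied from the percentage_true output (subset of columns).
rows = [
    {"filename": "HI1_CONTRA_M8_S6_TR1", "ROI": "CA",
     "ki67": 0.708661, "neun_low": 17.559055, "neun_high": 68.582677,
     "non_prolif": 99.291339,
     "prolif_neun_high": 0.07874, "prolif_neun_low": 0.314961,
     "non_prolif_neun_high": 68.503937, "non_prolif_neun_low": 17.244094},
    {"filename": "HI1_CONTRA_M8_S6_TR1", "ROI": "DG",
     "ki67": 4.103535, "neun_low": 56.25, "neun_high": 22.916667,
     "non_prolif": 95.896465,
     "prolif_neun_high": 0.0, "prolif_neun_low": 1.515152,
     "non_prolif_neun_high": 22.916667, "non_prolif_neun_low": 54.734848},
    {"filename": "HI1_IPSI_M8_S6_TR1", "ROI": "DG",
     "ki67": 1.872659, "neun_low": 40.012484, "neun_high": 3.870162,
     "non_prolif": 98.127341,
     "prolif_neun_high": 0.062422, "prolif_neun_low": 0.249688,
     "non_prolif_neun_high": 3.80774, "non_prolif_neun_low": 39.762797},
]

# (columns to add up, column holding the expected total).
# None means the parts are expected to sum to 100 percent.
IDENTITIES = [
    (("ki67", "non_prolif"), None),
    (("prolif_neun_high", "non_prolif_neun_high"), "neun_high"),
    (("prolif_neun_low", "non_prolif_neun_low"), "neun_low"),
]

TOLERANCE = 1e-4  # assumed tolerance for values rounded to six decimals


def failed_identities(row):
    """Return the identities (as tuples of part names) that do not hold."""
    failures = []
    for parts, total_column in IDENTITIES:
        expected = 100.0 if total_column is None else row[total_column]
        observed = sum(row[name] for name in parts)
        if not math.isclose(observed, expected, rel_tol=0.0, abs_tol=TOLERANCE):
            failures.append(parts)
    return failures


for row in rows:
    status = failed_identities(row) or "ok"
    print(row["filename"], row["ROI"], status)
```

A non-empty failure list would suggest a mismatch between the names used in `cell_populations` and those in `min_max_per_marker`, or a problem in how the summary was computed.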


In [16]:
# Plot the resulting cell population percentages on a per-filename, per-ROI basis
plot_perc_pop_per_filename_roi(percentage_true, model_name, segmentation_type)