<h2>Data Analysis - Batch Processing - Quantification of cell populations</h2>

This notebook processes the .csv files produced by Batch Processing (Average Intensity or Colocalization). It lets you:

1. Define cell populations based on single or multiple markers (positive, negative or a combination of both).
2. Plot the resulting data using Plotly.
3. Extract the number of cells positive for a marker based on colocalization (using a user-defined threshold).
4. Aggregate all per-label results in a single .csv file ("BP_populations_marker_+_summary_{method}.csv").
5. Save the summary percentages for each cell population in a .csv file ("BP_populations_marker_+_summary_{method}.csv").

In [1]:
from pathlib import Path
from utils_data_analysis import calculate_perc_pops, plot_perc_pop_per_filename_roi

In [20]:
# Define the path containing your results
results_path = Path("./results/Ker c11 staining/2D/test")  # forward slashes work on Windows, macOS and Linux

# Input the method used to define cells as positive for a marker ("avg_int", "coloc", "obj_class") #TODO: "pixel_class"
method = "avg_int"

# Define the channels you want to analyze using the following structure:
# markers = [(channel_name, channel_nr, cellular_location),(..., ..., ...)]
markers = [("kerc11", 2, "cytoplasm")]

# WARNING: These settings override the ones used in 003_BP_Avg_intensity to define your populations (what is considered positive)
# NOTE: These settings do not affect or change the analysis results of 003_BP_Colocalization
# Define the (min, max) average intensity range used to select each population of interest
# You can define several populations for the same marker (e.g. neun high and neun low)
# For 8-bit input images the average intensity cannot exceed 255; higher bit depths can result in higher avg_int values
min_max_per_marker = [{"marker": "kerc11", "min_max": (4,50), "population":"kerc11+"}]

# Define cell populations based on multiple markers (e.g. double marker positive (True, True), or marker1 positive and marker2 negative (True, False))
# Subpopulation names refer to the populations defined in min_max_per_marker above (a marker can have several, as in the case of "neun")
# For a cell_pop defined by a single population, give it a distinct name (e.g. "kerc11_positive") so it does not clash with the population name in min_max_per_marker
cell_populations = [
    {"cell_pop": "kerc11_positive", "subpopulations": [("kerc11+", True)]}]
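
To make the two configuration structures above more concrete, here is a minimal sketch of how a (min, max) range and a list of (population, True/False) pairs could be turned into per-cell flags and percentages with pandas. It is not the implementation inside `utils_data_analysis`. The per-cell table, its `*_avg_int` column names, the second marker `neun` with its `neun_high` population, the `kerc11_only` cell population, and the inclusive treatment of both bounds are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical per-cell measurements (one row per segmented label).
# The "<marker>_avg_int" column naming is an assumption, not the real schema.
cells = pd.DataFrame({
    "label": [1, 2, 3, 4],
    "kerc11_avg_int": [2.0, 4.0, 27.5, 80.0],
    "neun_avg_int": [10.0, 120.0, 30.0, 200.0],
})

# Same structure as in the notebook; "neun_high" is an illustrative addition.
min_max_per_marker = [
    {"marker": "kerc11", "min_max": (4, 50), "population": "kerc11+"},
    {"marker": "neun", "min_max": (100, 255), "population": "neun_high"},
]
cell_populations = [
    {"cell_pop": "kerc11_positive", "subpopulations": [("kerc11+", True)]},
    {"cell_pop": "kerc11_only",
     "subpopulations": [("kerc11+", True), ("neun_high", False)]},
]

# Step 1: flag each cell as inside or outside the range of every population.
# Inclusive bounds are an assumption; the real utility may differ.
for entry in min_max_per_marker:
    low, high = entry["min_max"]
    intensity = cells[f"{entry['marker']}_avg_int"]
    cells[entry["population"]] = intensity.between(low, high)

# Step 2: a cell belongs to a cell_pop only if every listed subpopulation
# flag matches the requested True/False value.
for pop in cell_populations:
    mask = pd.Series(True, index=cells.index)
    for sub_name, expected in pop["subpopulations"]:
        mask &= (cells[sub_name] == expected)
    cells[pop["cell_pop"]] = mask

# Step 3: the percentage of cells in each population is the mean of its flags.
pop_columns = [p["cell_pop"] for p in cell_populations]
percentages = cells[pop_columns].mean() * 100
print(percentages)
```

In this toy table, two of the four cells fall inside the kerc11 range, and only one of those is also outside the `neun_high` range, which is how a (True, False) combination narrows a population.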

In [21]:
# Extract model and segmentation type from results Path
# Calculate percentages of each cell population, save them as a summary .csv
percentage_true, model_name, segmentation_type = calculate_perc_pops(results_path, method, min_max_per_marker, cell_populations)

percentage_true

Unnamed: 0,filename,ROI,kerc11+,kerc11_positive
0,1361 436 MD Ker c11 - 2016-04-15 07.36.57,full_image,99.578158,99.578158
1,1361 436+912 MI Ker c11 - 2016-04-15 07.47.04,full_image,21.708432,21.708432
2,1362 436 MD Ker c11 - 2016-04-15 07.57.50,full_image,99.671808,99.671808
3,1362 436+912 MI Ker c11 - 2016-04-15 08.02.34,full_image,46.739674,46.739674
4,1363 436 MD Ker c11 - 2016-04-15 08.13.16,full_image,54.484714,54.484714
5,1363 436+907 MI Ker c11 - 2016-04-15 08.25.41,full_image,67.939534,67.939534
6,1364 436 MD Ker c11 - 2016-04-15 08.30.40,full_image,32.731003,32.731003
7,1364 436+907 MI Ker c11 - 2016-04-15 09.04.08,full_image,60.347209,60.347209
8,1368 436+907 MI Ker c11 - 2016-04-15 09.21.50,full_image,64.514092,64.514092
9,1368 436+912 MD Ker c11 - 2016-04-15 09.16.00,full_image,99.927977,99.927977
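
The `Unnamed: 0` column in the output above is what pandas typically produces when a DataFrame is written to .csv together with its index and then read back without declaring that index. Whether this is what happens inside `calculate_perc_pops` is an assumption. The sketch below reproduces the effect with placeholder data (the filenames and values are hypothetical) and shows how `index_col=0` avoids the extra column.

```python
import io

import pandas as pd

# Placeholder summary table; filenames and values are hypothetical.
summary = pd.DataFrame({
    "filename": ["image_a", "image_b"],
    "ROI": ["full_image", "full_image"],
    "kerc11+": [99.5, 21.7],
})

# Write to an in-memory buffer; to_csv includes the index by default.
buffer = io.StringIO()
summary.to_csv(buffer)

# Reading back without index_col turns the saved index into "Unnamed: 0".
buffer.seek(0)
with_extra_column = pd.read_csv(buffer)

# Declaring the first column as the index restores the original layout.
buffer.seek(0)
clean = pd.read_csv(buffer, index_col=0)

print(list(with_extra_column.columns))
print(list(clean.columns))
```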


In [19]:
# Plot the resulting cell population percentages on a per-filename, per-ROI basis
plot_perc_pop_per_filename_roi(percentage_true, model_name, segmentation_type)
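
If you want to build your own plot from the percentages table, a wide table with one column per population usually needs reshaping to long format first. The sketch below shows that step with pandas only. The table contents are placeholder values, and the `cell_population` and `percentage` column names are assumptions. It does not describe what `plot_perc_pop_per_filename_roi` does internally.

```python
import pandas as pd

# Placeholder table with the same column layout as percentage_true.
percentage_true = pd.DataFrame({
    "filename": ["image_a", "image_b"],
    "ROI": ["full_image", "full_image"],
    "kerc11+": [99.6, 21.7],
    "kerc11_positive": [99.6, 21.7],
})

# Wide to long: one row per (filename, ROI, population) combination.
long_format = percentage_true.melt(
    id_vars=["filename", "ROI"],
    var_name="cell_population",
    value_name="percentage",
)
print(long_format)
```

A long table like this can be passed to a grouped bar chart, for example `plotly.express.bar` with `x="filename"`, `y="percentage"`, `color="cell_population"` and `barmode="group"`.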