# Cell Type Assignment

---
This notebook is based on the code from "https://github.com/GVS-Lab/germinal_center/" by Daniel Paysan and Saradha Venkatachalapathy (2023) 

---

## Setting up the environment

As a first step, we again load the external software packages that we will use.



In [None]:
import os
cwd=os.getcwd()
print(cwd)

In [1]:
# import libraries
import sys
from pathlib import Path
from glob import glob
import pandas as pd
import numpy as np
from collections import Counter
import os
from tqdm.notebook import tqdm
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
from scipy.stats import pearsonr, spearmanr
from skimage.measure import regionprops
import cv2
from statannotations.Annotator import Annotator
import warnings

import sklearn

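# Suppress all warnings to keep the notebook output readable
warnings.filterwarnings("ignore")
# Fixed random seed for reproducibility
seed = 1234
# Render figures at high resolution
plt.rcParams["figure.dpi"] = 300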

#%load_ext nb_black

We also load a number of functions defined within this repository. Please refer to their source code for a better understanding of what they do.

In [2]:
sys.path.append("../..")

from src.utils.cell_type_detection import *
from src.utils.data_viz import *
from src.utils.data_processing import clean_data, remove_correlated_features
from src.utils.discrimination import *



---

## Read in the preprocessed data


Next, we read in the data generated by the preceding image processing steps, namely the segmentation and feature profiling tasks.

In [3]:
# Set up directories
root_dir = "../../DeepMel_data/stitched"
merged_image_dir = os.path.join(root_dir, "Merged")
#gc_mask_dir = os.path.join(root_dir, "mask")

#merged_image_dir = os.path.join(root_dir, "merged") #input raw image
analysis_dir = os.path.join(root_dir, "DeepMel_2X1_ROI2.4.5.6.7_InOutTLS_feature_generation_notebook_Results_20250117")

# Per-channel image directories
dapi_image_dir = os.path.join(analysis_dir, "dapi_raw")
cd3_image_dir = os.path.join(analysis_dir, "cd3_raw")
cd20_image_dir = os.path.join(analysis_dir, "cd20_raw")

scaled_dapi_image_dir = os.path.join(analysis_dir, "dapi_scaled")
scaled_cd3_image_dir = os.path.join(analysis_dir, "cd3_scaled")
scaled_cd20_image_dir = os.path.join(analysis_dir, "cd20_scaled")

segmented_nuclei_dir = os.path.join(analysis_dir, "segmented_nucleus")
nuclei_rois_dir = os.path.join(analysis_dir, "segmented_nuclei_ijroi")
nuclear_features_dir = os.path.join(analysis_dir, "chrometric_features")
segmented_cells_dir = os.path.join(analysis_dir, "segmented_cells")

raw_dapi_levels_dir = os.path.join(analysis_dir, "dapi_rawlevel")
raw_cd3_levels_dir = os.path.join(analysis_dir, "cd3_rawlevel")
raw_cd20_levels_dir = os.path.join(analysis_dir, "cd20_rawlevel")

cellular_cd3_levels_dir = os.path.join(analysis_dir, "cd3_scaledlevel")
cellular_cd20_levels_dir = os.path.join(analysis_dir, "cd20_scaledlevel")

#germinal_center_loc_dir = os.path.join(analysis_dir, "position_wrt_germinal_center")

spatial_cordiates_dir = os.path.join(analysis_dir, "spatial_cordiates")
consolidated_features_dir = os.path.join(analysis_dir, "consolidated_features")


In [4]:
nuc_features = pd.read_csv(os.path.join(consolidated_features_dir, "nuc_features.csv"), index_col=0)

spatial_cord = pd.read_csv(os.path.join(consolidated_features_dir, "spatial_coordiates.csv"), index_col=0)
spatial_cord.index = spatial_cord["nuc_id"]

cd3_levels = pd.read_csv(os.path.join(consolidated_features_dir, "cd3_levels.csv"), index_col=0)
cd20_levels = pd.read_csv(os.path.join(consolidated_features_dir, "cd20_levels.csv"), index_col=0)

While this is not required, we recommend renaming the chrometric features according to their updated description. This is achieved by running the code below.

In [None]:
# NOT RUN!!!!
# Read the raw file: the GitHub "blob" URL serves an HTML page, not the CSV itself.
nuc_feature_description = pd.read_csv(
    "https://raw.githubusercontent.com/GVS-Lab/chrometrics/main/chrometric_feature_description.csv", index_col=0
)
feature_name_dict = dict(
    zip(
        list(nuc_feature_description.loc[:, "feature"]),
        list(nuc_feature_description.loc[:, "long_name"]),
    )
)
nuc_features = nuc_features.rename(columns=feature_name_dict)

Note that the linked ``.csv`` file also contains a description of the features, which might be helpful to better understand what these features are.

---

## Identify cell type labels

To identify the cell type labels, we use the expression of the marker proteins that were measured. In this example these are CD3 and CD20, but the procedure shown below can be run in the same way if more marker stains are available and have been profiled using the preceding image processing described in the feature generation notebook.

To determine whether a cell stains positively for a given marker, we look at the average intensity of the corresponding protein within the identified cellular mask. Assuming sufficient specificity of the staining, the mean intensities of all cells in a given image should follow a bimodal distribution: cells that are positive for the marker contribute to the higher mode and cells that are negative to the lower mode. We therefore fit a 2-component Gaussian mixture model to the average cellular intensities of the marker protein. Cells assigned to the component with the larger mean are labeled positive, all others negative.
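The thresholding idea described above can be illustrated with a minimal sketch. It assumes scikit-learn's `GaussianMixture` and uses synthetic intensities; the helper name `label_positive_by_gmm` is hypothetical, and the repository's own `get_positive_cells_batch` may be implemented differently.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def label_positive_by_gmm(intensities, seed=1234):
    """Return a boolean array: True where a cell falls in the brighter component.

    Hypothetical helper for illustration, not the repository's implementation.
    """
    x = np.asarray(intensities, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=seed).fit(x)
    # The component with the larger mean is taken as the "positive" mode.
    positive_component = int(np.argmax(gmm.means_.ravel()))
    return gmm.predict(x) == positive_component


# Synthetic, well-separated intensities (illustrative values only).
rng = np.random.default_rng(0)
dim = rng.normal(loc=10.0, scale=1.0, size=200)     # marker-negative cells
bright = rng.normal(loc=50.0, scale=3.0, size=100)  # marker-positive cells
is_positive = label_positive_by_gmm(np.concatenate([dim, bright]))
print(is_positive.sum(), "of", is_positive.size, "cells labeled positive")
```

When the two modes overlap strongly, the assignment becomes less reliable, so it is worth inspecting the intensity histogram of each image before trusting the labels.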

In [5]:
# Collect the image (field-of-view) names and identify CD3-positive cells image by image
(_, fovs) = pd.factorize(cd3_levels["image"].astype("category"))
img_names = fovs.categories
cd3_positive_cells = get_positive_cells_batch(cd3_levels, img_names)

# Repeat the same procedure for CD20
(_, fovs) = pd.factorize(cd20_levels["image"].astype("category"))
img_names = fovs.categories
cd20_positive_cells = get_positive_cells_batch(cd20_levels, img_names)

Note that this is done individually for each image, as the intensity distribution of the markers might vary between images.
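To make the per-image fitting concrete, the following sketch fits one mixture model per image on a toy table. The column names `image` and `nuc_id` follow the data used in this notebook; the intensity column `mean_intensity`, the helper `positive_ids_per_image`, and all values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture


def positive_ids_per_image(levels, value_col, image_col="image", id_col="nuc_id", seed=1234):
    """Fit one 2-component GMM per image and collect the IDs of positive cells.

    Hypothetical helper for illustration, not the repository's implementation.
    """
    positive_ids = []
    for _, group in levels.groupby(image_col):
        x = group[value_col].to_numpy(dtype=float).reshape(-1, 1)
        gmm = GaussianMixture(n_components=2, random_state=seed).fit(x)
        positive_component = int(np.argmax(gmm.means_.ravel()))
        mask = gmm.predict(x) == positive_component
        positive_ids.extend(group.loc[mask, id_col].tolist())
    return positive_ids


# Two synthetic images; image "B" is brighter overall than image "A".
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        "image": ["A"] * 100 + ["B"] * 100,
        "nuc_id": [f"A_{i}" for i in range(100)] + [f"B_{i}" for i in range(100)],
        "mean_intensity": np.concatenate(
            [
                rng.normal(10, 1, 60), rng.normal(50, 3, 40),   # image A: negative, positive
                rng.normal(60, 3, 60), rng.normal(150, 5, 40),  # image B: negative, positive
            ]
        ),
    }
)
ids = positive_ids_per_image(toy, value_col="mean_intensity")
print(len(ids), "positive cells found across both images")
```

In this toy example the negative cells of image "B" are brighter than the positive cells of image "A", which is the kind of shift that a single model fitted to all images at once could not account for.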

We store the identified labels as two new features, ``cd3_status`` and ``cd20_status``, as part of our nuclear features.

In [7]:
nuc_features["cd3_status"] = "negative"
nuc_features.loc[
    nuc_features.nuc_id.isin(cd3_positive_cells), "cd3_status"
] = "positive"

nuc_features["cd20_status"] = "negative"
nuc_features.loc[
    nuc_features.nuc_id.isin(cd20_positive_cells), "cd20_status"
] = "positive"

nuc_features.head(5)

| Unnamed: 0 | Unnamed: 0.1 | label | min_calliper | max_calliper | smallest_largest_calliper | min_radius | max_radius | med_radius | avg_radius | mode_radius | ... | moments_hu-0 | moments_hu-1 | moments_hu-2 | moments_hu-3 | moments_hu-4 | moments_hu-5 | moments_hu-6 | nuc_id | cd3_status | cd20_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 22 | 30 | 0.733333 | 10.060333 | 15.47021 | 13.015767 | 12.89732 | [10.06033257] | ... | 0.165832 | 0.002018 | 4.9e-05 | 1.04956e-06 | 7.505779e-12 | 4.715221e-08 | 2.761999e-14 | 2X1_ROI2_inside1_1 | positive | negative |
| 1 | 1 | 2 | 23 | 27 | 0.851852 | 10.431119 | 13.846464 | 11.947439 | 11.998488 | [10.43111948] | ... | 0.161765 | 0.000314 | 0.000167 | 6.368191e-07 | 1.727502e-12 | 3.197225e-09 | -6.341798e-12 | 2X1_ROI2_inside1_2 | positive | positive |
| 2 | 2 | 3 | 23 | 29 | 0.793103 | 11.006346 | 15.39611 | 12.331931 | 12.59895 | [11.00634551] | ... | 0.163833 | 0.000793 | 0.000184 | 8.64595e-07 | -7.730831e-13 | -1.392427e-08 | -1.086376e-11 | 2X1_ROI2_inside1_3 | positive | negative |
| 3 | 3 | 4 | 24 | 34 | 0.705882 | 10.842629 | 16.826671 | 13.408338 | 13.51641 | [10.84262896] | ... | 0.169017 | 0.003072 | 4.3e-05 | 1.246372e-06 | 3.925141e-12 | 1.269106e-08 | -8.208301e-12 | 2X1_ROI2_inside1_4 | negative | positive |
| 4 | 4 | 5 | 23 | 31 | 0.741935 | 10.248505 | 15.651777 | 12.071983 | 12.498454 | [10.24850524] | ... | 0.167725 | 0.002385 | 8.3e-05 | 1.000233e-06 | -7.101288e-12 | -4.6467e-08 | 5.729124e-12 | 2X1_ROI2_inside1_5 | positive | negative |


In [8]:
nuc_features.tail(5)

| Unnamed: 0 | Unnamed: 0.1 | label | min_calliper | max_calliper | smallest_largest_calliper | min_radius | max_radius | med_radius | avg_radius | mode_radius | ... | moments_hu-0 | moments_hu-1 | moments_hu-2 | moments_hu-3 | moments_hu-4 | moments_hu-5 | moments_hu-6 | nuc_id | cd3_status | cd20_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2575 | 2575 | 2576 | 7 | 10 | 0.7 | 2.891467 | 4.519906 | 3.967353 | 3.92873 | [4.51990577] | ... | 0.162332 | 0.001131 | 0.000112 | 1e-06 | -1.35625e-11 | -3.930957e-08 | -3.446695e-12 | 2X1_ROI7_outside2_2576 | negative | negative |
| 2576 | 2576 | 2577 | 12 | 17 | 0.705882 | 4.922895 | 8.847196 | 6.539613 | 6.679315 | [5.12200139] | ... | 0.173966 | 0.004048 | 0.000139 | 4e-06 | -1.360235e-11 | -1.395102e-07 | 8.497533e-11 | 2X1_ROI7_outside2_2577 | negative | negative |
| 2577 | 2577 | 2578 | 6 | 10 | 0.6 | 2.588979 | 5.193798 | 3.678764 | 3.749541 | [2.9159578] | ... | 0.181602 | 0.006723 | 0.000429 | 3.1e-05 | 2.908823e-09 | 1.805473e-06 | -2.177573e-09 | 2X1_ROI7_outside2_2578 | negative | negative |
| 2578 | 2578 | 2579 | 9 | 19 | 0.473684 | 2.851489 | 9.572797 | 5.451635 | 5.969072 | [2.85148853] | ... | 0.223237 | 0.022552 | 0.000532 | 5.1e-05 | -3.684819e-09 | -5.302607e-06 | -7.630608e-09 | 2X1_ROI7_outside2_2579 | negative | negative |
| 2579 | 2579 | 2580 | 7 | 15 | 0.466667 | 2.804885 | 7.554596 | 4.552911 | 4.892669 | [2.80488524] | ... | 0.212122 | 0.019112 | 0.000176 | 2.5e-05 | 1.504078e-09 | 2.329691e-06 | 7.581141e-10 | 2X1_ROI7_outside2_2580 | negative | negative |
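One possible way to check the assigned statuses is to cross-tabulate the two marker columns and count how many cells fall into each combination. The sketch below does this on a small toy frame; the values are illustrative only and are not taken from the data set. On the real data the same call could be applied to `nuc_features`.

```python
import pandas as pd

# Toy frame mirroring the two status columns created above (illustrative values).
toy = pd.DataFrame(
    {
        "cd3_status": ["positive", "positive", "positive", "negative", "positive", "negative"],
        "cd20_status": ["negative", "positive", "negative", "positive", "negative", "negative"],
    }
)

# Rows: CD3 status, columns: CD20 status, entries: number of cells.
counts = pd.crosstab(toy["cd3_status"], toy["cd20_status"])
print(counts)
```

A large share of cells in the positive/positive cell of this table might suggest that the staining or the per-image thresholds deserve a closer look.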


In [9]:
# Save the added marker status to a new file
nuc_features.to_csv(
    os.path.join(
        consolidated_features_dir,
        "DeepMel_2X1_ROI2.4.5.6.7_nuc_features_with_marker_status_20250117.csv",
    )
)