# Cell Neighbors Analysis
This notebook has two components: **neighborhood diversity** and **cell distance analysis**. They are independent of each other, so you can run only the one you're interested in; however, you must first run the setup cells below to set the notebook paths and generate the distance matrices for your data.

In [1]:
import os
import pandas as pd 
from ark.utils import example_dataset
import ark.settings as settings
from ark.analysis import spatial_analysis_utils
from ark.analysis.neighborhood_analysis import create_neighborhood_matrix
from ark.analysis.cell_neighborhood_stats import generate_neighborhood_diversity_analysis, generate_cell_distance_analysis


## Path Setup
* `base_dir`: the path to the main folder containing all of your imaging data. This directory will also store all of the directories and files created during analysis
* `spatial_analysis_dir`: the path to the directory containing the spatial analysis output
* `cell_table_path`: the path to the cell table that contains columns for fov, cell label, and cell phenotype (generated from `3_Pixie_Cluster_Cells.ipynb`)
* `dist_mat_dir`: the path to the directory containing the distance matrices for your data
* `neighbors_mat_dir`: the path to the directory containing the neighborhood matrices for your data
* `output_dir`: the path to a new directory that will be created to hold the output generated below

In [2]:
base_dir = "/Users/boberlto/Library/CloudStorage/GoogleDrive-boberlto@stanford.edu/Shared drives/BRUCE_data/cell_tables/combined_panel"

If you would like to test the features in Ark with an example dataset, run the cell below. It will download a dataset consisting of 11 FOVs with 22 channels. You may find more information about the example dataset in the [README](../README.md#example-dataset).

If you are using your own data, skip the cell below.

* `overwrite_existing`: If `False`, existing data in `data/example_dataset` will not be overwritten. Set it to `False` if you are running Notebooks 1, 2, 3, and 4 in succession, and to `True` if you are only running Notebook 4.

In [3]:
# example_dataset.get_example_dataset(dataset="post_clustering", save_dir=base_dir, overwrite_existing=False)

In [4]:
spatial_analysis_dir = os.path.join(base_dir, "spatial_analysis_updated")
# segmentation_dir = os.path.join(base_dir, "segmentation/deepcell_output")
cell_table_path = os.path.join(base_dir, "20240922_spatial_notebook_prepped_combined_panel.csv")
dist_mat_dir = os.path.join(spatial_analysis_dir, "dist_mats")

neighbors_mat_dir = os.path.join(spatial_analysis_dir, "neighborhood_mats")
output_dir = os.path.join(spatial_analysis_dir, "cell_neighbor_analysis")

In [5]:
# os.listdir(dist_mat_dir)

In [6]:
import os

import numpy as np
import pandas as pd
import scipy
import skimage.measure
import sklearn.metrics
import xarray as xr
from alpineer import io_utils, load_utils, misc_utils
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from statsmodels.stats.multitest import multipletests
from tqdm.notebook import tqdm

import ark.settings as settings
from ark.utils._bootstrapping import compute_close_num_rand


def calc_dist_matrix(cell_table, save_path, fov_id='fov', label_id='label'):
    """Generate matrix of distances between center of pairs of cells.

    Saves each one individually to `save_path`.

    Args:
        cell_table (pandas.DataFrame):
            data frame with fov, label, and centroid information
        save_path (str):
            path to the directory where the distance matrices are saved
        fov_id (str):
            the column name containing the fov
        label_id (str):
            the column name containing the cell label
    """

    # check that save_path exists
    io_utils.validate_paths([save_path])

    # get the unique FOV names from the cell table
    fovs = cell_table[fov_id].unique()

    # iterate for each fov
    with tqdm(total=len(fovs), desc="Distance Matrix Generation", unit="FOVs") \
            as dist_mat_progress:
        for fov in fovs:
            dist_mat_progress.set_postfix(FOV=fov)

            fov_table = cell_table[cell_table[fov_id] == fov]

            # get centroid and label info
            centroids = fov_table[['centroid-0', 'centroid-1']].to_numpy()
            centroid_labels = list(fov_table[label_id])

            # generate the distance matrix, then assign centroid_labels as coords
            dist_matrix = cdist(centroids, centroids).astype(np.float32)
            dist_mat_xarr = xr.DataArray(dist_matrix, coords=[centroid_labels, centroid_labels])

            # save the distance matrix to save_path
            dist_mat_xarr.to_netcdf(
                os.path.join(save_path, fov + '_dist_mat.xr'),
                format='NETCDF3_64BIT'
            )

            dist_mat_progress.update(1)
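
To make the output of `calc_dist_matrix` concrete, here is a minimal sketch of the pairwise centroid distance computation on three made-up cells. It uses plain NumPy broadcasting in place of `cdist`, and it skips the xarray wrapping and the save step. The centroids and the helper name `pairwise_centroid_distances` are illustrative assumptions, not part of ark.

```python
import numpy as np


def pairwise_centroid_distances(centroids):
    """Return the matrix of Euclidean distances between all centroid pairs."""
    pts = np.asarray(centroids, dtype=np.float64)
    # diff[i, j] is the vector from centroid j to centroid i
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).astype(np.float32)


# hypothetical (row, col) centroids for three cells in one FOV
toy_centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
toy_dist = pairwise_centroid_distances(toy_centroids)
print(toy_dist)
```

The result is a symmetric matrix with zeros on the diagonal; entry `[i, j]` is the distance in pixels between the centroids of cells `i` and `j`.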

In [7]:
# generate distance matrices if needed
if not os.path.exists(dist_mat_dir):
    os.makedirs(dist_mat_dir)
    
    # read in cell table
    cell_table = pd.read_csv(cell_table_path)
    
    # calculate distance matrices
    calc_dist_matrix(cell_table, dist_mat_dir, fov_id='fov', label_id='label')

# create neighbors matrix and output directories
for directory in [neighbors_mat_dir, output_dir]:
    if not os.path.exists(directory):
        os.makedirs(directory)

Distance Matrix Generation:   0%|          | 0/574 [00:00<?, ?FOVs/s]

## Cell Neighborhood Diversity
This part of the notebook can be used to determine the homogeneity/diversity of the neighbors surrounding each of the cells in our images.

### 1. Neighborhood Matrices
You must specify which neighbors matrix should be used based on the pixel radius and cell type column.
- `pixel_radius`: radius used to define the neighbors of each cell
- `cell_type_col`: column in your cell table containing the cell phenotypes

**If you have not previously generated a neighbors matrix with the `pixel_radius` and `cell_type_col` specified, it will be created below.**
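
As a rough illustration of what a neighbors matrix contains, the sketch below counts, for each cell, how many cells of each phenotype lie within a given radius, and then normalizes the counts into frequencies. This is a simplified stand-in and not the implementation of `create_neighborhood_matrix`: the helper name `count_neighbors`, the toy data, the strict `<` comparison, and the exclusion of the cell itself are all assumptions.

```python
import numpy as np


def count_neighbors(dist_matrix, cell_types, radius):
    """Count, for each cell, the neighbors of each type within `radius`."""
    dist = np.asarray(dist_matrix, dtype=float)
    types = np.asarray(cell_types)
    type_names = sorted(set(cell_types))

    # assumption: a neighbor is any *other* cell strictly closer than radius
    is_neighbor = dist < radius
    np.fill_diagonal(is_neighbor, False)

    # one column of counts per cell type, in the order of type_names
    counts = np.stack(
        [is_neighbor[:, types == name].sum(axis=1) for name in type_names],
        axis=1,
    )
    totals = counts.sum(axis=1, keepdims=True)
    # frequencies are per-cell normalized counts; cells without neighbors get 0
    freqs = np.divide(counts, totals, out=np.zeros(counts.shape), where=totals > 0)
    return type_names, counts, freqs


# hypothetical distance matrix and phenotypes for three cells
toy_dist = np.array([[0.0, 5.0, 10.0], [5.0, 0.0, 5.0], [10.0, 5.0, 0.0]])
toy_types = ["tumor", "immune", "tumor"]
names, counts, freqs = count_neighbors(toy_dist, toy_types, radius=6)
print(names)
print(counts)
print(freqs)
```

Each row corresponds to one cell and each column to one phenotype, which mirrors the layout of the counts and frequencies tables used in the rest of this section.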

In [8]:
all_data = pd.read_csv(cell_table_path)

In [9]:
pixel_radius = 50
cell_type_col = "cell_meta_cluster"

counts_path = os.path.join(neighbors_mat_dir, f"neighborhood_counts-{cell_type_col}_radius{pixel_radius}.csv")
freqs_path = os.path.join(neighbors_mat_dir, f"neighborhood_freqs-{cell_type_col}_radius{pixel_radius}.csv")

In [10]:
# Check for existing neighbors matrix
if os.path.exists(counts_path) and os.path.exists(freqs_path):
    neighbor_counts = pd.read_csv(counts_path)
    neighbor_freqs = pd.read_csv(freqs_path) 

else:
    # Create new matrix with the radius and cell column specified above
    neighbor_counts, neighbor_freqs = create_neighborhood_matrix(
        all_data, dist_mat_dir, distlim=pixel_radius, cell_type_col=cell_type_col)

    # Save neighbor matrices
    neighbor_counts.to_csv(counts_path, index=False)
    neighbor_freqs.to_csv(freqs_path, index=False)

modified fov check


Neighbors Matrix Generation:   0%|          | 0/574 [00:00<?, ?FOVs/s]

In [11]:
pixel_radius = 50
cell_type_col = "cell_meta_cluster_IT"

counts_path = os.path.join(neighbors_mat_dir, f"neighborhood_counts-{cell_type_col}_radius{pixel_radius}.csv")
freqs_path = os.path.join(neighbors_mat_dir, f"neighborhood_freqs-{cell_type_col}_radius{pixel_radius}.csv")

In [12]:
# Check for existing neighbors matrix
if os.path.exists(counts_path) and os.path.exists(freqs_path):
    neighbor_counts = pd.read_csv(counts_path)
    neighbor_freqs = pd.read_csv(freqs_path) 

else:
    # Create new matrix with the radius and cell column specified above
    neighbor_counts, neighbor_freqs = create_neighborhood_matrix(
        all_data, dist_mat_dir, distlim=pixel_radius, cell_type_col=cell_type_col)

    # Save neighbor matrices
    neighbor_counts.to_csv(counts_path, index=False)
    neighbor_freqs.to_csv(freqs_path, index=False)

modified fov check


Neighbors Matrix Generation:   0%|          | 0/574 [00:00<?, ?FOVs/s]

In [13]:
pixel_radius = 50
cell_type_col = "cell_meta_cluster_ML"

counts_path = os.path.join(neighbors_mat_dir, f"neighborhood_counts-{cell_type_col}_radius{pixel_radius}.csv")
freqs_path = os.path.join(neighbors_mat_dir, f"neighborhood_freqs-{cell_type_col}_radius{pixel_radius}.csv")

In [14]:
# Check for existing neighbors matrix
if os.path.exists(counts_path) and os.path.exists(freqs_path):
    neighbor_counts = pd.read_csv(counts_path)
    neighbor_freqs = pd.read_csv(freqs_path) 

else:
    # Create new matrix with the radius and cell column specified above
    neighbor_counts, neighbor_freqs = create_neighborhood_matrix(
        all_data, dist_mat_dir, distlim=pixel_radius, cell_type_col=cell_type_col)

    # Save neighbor matrices
    neighbor_counts.to_csv(counts_path, index=False)
    neighbor_freqs.to_csv(freqs_path, index=False)

modified fov check


Neighbors Matrix Generation:   0%|          | 0/574 [00:00<?, ?FOVs/s]
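
The cells above repeat the same path construction for each cell type column. A small helper can build both file paths in one place; the sketch below is a hypothetical convenience function (`neighborhood_matrix_paths` is not part of ark) that reproduces the file naming used above.

```python
import os


def neighborhood_matrix_paths(neighbors_mat_dir, cell_type_col, pixel_radius):
    """Build the counts and freqs CSV paths for one cell type column."""
    suffix = f"{cell_type_col}_radius{pixel_radius}.csv"
    counts = os.path.join(neighbors_mat_dir, f"neighborhood_counts-{suffix}")
    freqs = os.path.join(neighbors_mat_dir, f"neighborhood_freqs-{suffix}")
    return counts, freqs


# "neighborhood_mats" is a placeholder directory name for illustration
for col in ["cell_meta_cluster", "cell_meta_cluster_IT", "cell_meta_cluster_ML"]:
    c_path, f_path = neighborhood_matrix_paths("neighborhood_mats", col, 50)
    print(os.path.basename(c_path), os.path.basename(f_path))
```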

### 2. Compute Shannon Diversity
The code below calculates the Shannon Diversity Index of the neighborhood around each cell. **The resulting file will be saved to `save_path` in the new `cell_neighbor_analysis` subdirectory; its name includes the cell type column(s) and `pixel_radius`.**
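
For reference, the Shannon Diversity Index of a neighborhood with phenotype proportions $p_i$ is $H = -\sum_i p_i \log p_i$, summed over the phenotypes that are present. The sketch below computes it for a single cell's neighbor frequencies. The base-2 logarithm is an assumption here; a different base only rescales the values. The function is a standalone illustration, not the one called by `generate_neighborhood_diversity_analysis`.

```python
import math


def shannon_diversity(proportions):
    """Shannon diversity H = -sum(p * log2(p)) over nonzero proportions."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


# hypothetical neighbor frequencies for three different cells
homogeneous = shannon_diversity([1.0, 0.0, 0.0])   # all neighbors share one type
two_types = shannon_diversity([0.5, 0.5])          # two equally common types
four_types = shannon_diversity([0.25] * 4)         # four equally common types
print(homogeneous, two_types, four_types)
```

A neighborhood made of a single phenotype has zero diversity, and the index grows as more phenotypes are represented in more even proportions.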

In [15]:
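cell_type_columns = ["cell_meta_cluster"]
# join the column names so the file name does not contain the list's brackets and quotes
cell_type_str = "_".join(cell_type_columns)
save_path = os.path.join(output_dir, f'neighborhood_diversity_{cell_type_str}_radius{pixel_radius}.csv')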

diversity_data = generate_neighborhood_diversity_analysis(neighbors_mat_dir, pixel_radius, cell_type_columns)
diversity_data.to_csv(save_path, index=False)

Calculate Neighborhood Diversity:   0%|          | 0/574 [00:00<?, ?FOVs/s]

## Cell Distances
This part of the notebook can be used to analyze the proximity/distance between cell phenotypes in samples.

### 1. Compute Average Distances
For every cell in an image, the code below will compute the average distance to the *k* closest cells of each phenotype. You must specify *k* and the cell phenotype column name below.

- `k`: how many nearest cells of a specific phenotype to average the distance over
- `cell_type_col`: column in your cell table containing the cell phenotypes

**The resulting file will be saved to the save_path (`{cell_type_col}_avg_dists-nearest_{k}.csv`) in the new cell_neighbor_analysis subdirectory.**
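
The idea behind the average distance can be sketched on a toy distance matrix: for each cell, take the distances to all cells of one phenotype, keep the *k* smallest, and average them. The helper `mean_dist_to_nearest`, the toy positions, the exclusion of the cell itself, and the NaN result when fewer than *k* cells are available are assumptions made for illustration; `generate_cell_distance_analysis` may handle these cases differently.

```python
import numpy as np


def mean_dist_to_nearest(dist_matrix, cell_types, target_type, k):
    """Mean distance from every cell to its k nearest cells of `target_type`."""
    dist = np.asarray(dist_matrix, dtype=float).copy()
    # assumption: a cell is not counted as its own neighbor
    np.fill_diagonal(dist, np.nan)

    target_cols = np.asarray(cell_types) == target_type
    # sort each row ascending (NaN sorts last) and keep the k smallest
    nearest = np.sort(dist[:, target_cols], axis=1)[:, :k]
    # assumption: with fewer than k target cells overall, every result is NaN
    if nearest.shape[1] < k:
        return np.full(dist.shape[0], np.nan)
    # rows whose k smallest values include a NaN also average to NaN
    return nearest.mean(axis=1)


# four hypothetical cells on a line at positions 0, 1, 3, and 7
positions = np.array([0.0, 1.0, 3.0, 7.0])
toy_dist = np.abs(positions[:, None] - positions[None, :])
toy_types = ["A", "B", "B", "B"]
avg_to_b = mean_dist_to_nearest(toy_dist, toy_types, target_type="B", k=2)
print(avg_to_b)
```

Repeating this for every phenotype gives one column of average distances per phenotype, with one row per cell.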

In [16]:
k = 5
cell_type_col = "cell_meta_cluster"

In [17]:
all_data = pd.read_csv(cell_table_path)
    
save_path = os.path.join(output_dir, f"{cell_type_col}_avg_dists-nearest_{k}.csv")
distance_data = generate_cell_distance_analysis(all_data, dist_mat_dir, save_path, k, cell_type_col)

Calculate Average Distances:   0%|          | 0/574 [00:00<?, ?FOVs/s]