# Cell Neighbor Analysis
This notebook has two components: **neighborhood diversity** and **phenotype distance analysis**. The two are independent of each other, so you can run only the one you're interested in.

In [None]:
import os
import pandas as pd

import ark.settings as settings
from ark.analysis.neighborhood_analysis import create_neighborhood_matrix
from ark.analysis.cell_neighborhood_stats import neighborhood_diversity_analysis, cell_neighbor_distance_analysis

## Cell Neighborhood Diversity
This part of the notebook measures how homogeneous or diverse the neighbors surrounding each cell in your images are.

### 1. Path Setup
* `base_dir`: the path to the main folder containing all of your imaging data. This directory will also store all of the directories and files created during analysis.
* `spatial_analysis_dir`: the path to the directory containing the spatial analysis output
* `cell_table_path`: the path to the cell table that contains columns for fov, cell label, and cell phenotype (generated from `3_Pixie_Cluster_Cells.ipynb`)
* `dist_mat_dir`: the path to the directory containing the distance matrices for your data
* `neighbors_mat_dir`: the path to the directory containing the neighborhood matrices for your data
* `output_dir`: path for a new directory that will be created for the output below

In [None]:
base_dir = '../data/example_dataset'

spatial_analysis_dir = os.path.join(base_dir, "spatial_analysis")
cell_table_path = os.path.join(base_dir, "segmentation/cell_table/cell_table_size_normalized_cell_labels.csv")
dist_mat_dir = os.path.join(spatial_analysis_dir, "dist_mats")

neighbors_mat_dir = os.path.join(spatial_analysis_dir, "neighborhood_mats")
output_dir = os.path.join(spatial_analysis_dir, "cell_neighbor_analysis")

# Create the output directories if they do not already exist
for directory in [neighbors_mat_dir, output_dir]:
    os.makedirs(directory, exist_ok=True)

### 2. Neighborhood Matrices
Specify which neighbors matrix to use by setting the pixel radius and the cell type column. If you provide multiple cell cluster columns, the notebook will look for a neighbors matrix for each of them.
- `pixel_radius`: radius used to define the neighbors of each cell
- `cell_type_columns`: list of columns in your cell table containing different classifications of cell phenotypes

**If you have not previously generated neighbors matrices with the `pixel_radius` and `cell_type_columns` specified, they will be created below.**

In [None]:
pixel_radius = 50
cell_type_columns = [settings.CELL_TYPE]

In [None]:
for cell_type_col in cell_type_columns:
    freqs_path = os.path.join(neighbors_mat_dir, f"neighborhood_freqs-{cell_type_col}_radius{pixel_radius}.csv")
    
    # Check for existing neighbors matrix and if not, create a new one
    if not os.path.exists(freqs_path):
        print(f"Generating neighbors matrix for {cell_type_col}.")
        
        all_data = pd.read_csv(cell_table_path)
        _, neighbor_freqs = create_neighborhood_matrix(
            all_data, dist_mat_dir, distlim=pixel_radius, cell_type_col=cell_type_col)
        
        # Save neighbors frequency matrix
        neighbor_freqs.to_csv(freqs_path, index=False)
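
To illustrate what a neighbors matrix contains, here is a minimal pure-Python sketch that counts, for each cell, the neighbors of each phenotype within a pixel radius. It is not ark's implementation: the toy `distances` and `cell_types` data and the helpers `count_neighbors` and `to_frequencies` are hypothetical, and the choices to exclude the cell itself and to use a strict `<` threshold are assumptions of this sketch.

```python
from collections import Counter

# Hypothetical toy data: pairwise distances (in pixels) between four cells
cell_types = ["tumor", "tumor", "immune", "stroma"]
distances = [
    [0, 30, 45, 120],
    [30, 0, 60, 80],
    [45, 60, 0, 40],
    [120, 80, 40, 0],
]


def count_neighbors(distances, cell_types, radius):
    """For each cell, count the neighbors of each type closer than `radius`.

    Assumption: a cell is not counted as its own neighbor.
    """
    counts = []
    for i, row in enumerate(distances):
        neighbors = Counter(
            cell_types[j] for j, d in enumerate(row) if j != i and d < radius
        )
        counts.append(dict(neighbors))
    return counts


def to_frequencies(counts):
    """Convert neighbor counts for one cell into proportions."""
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()} if total else {}


neighbor_counts = count_neighbors(distances, cell_types, radius=50)
neighbor_freqs_toy = [to_frequencies(c) for c in neighbor_counts]
print(neighbor_counts[0])     # counts for the first cell
print(neighbor_freqs_toy[0])  # proportions for the first cell
```

In this toy example the first cell has one tumor and one immune neighbor within 50 pixels, so each type makes up half of its neighborhood.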

### 3. Compute Shannon Diversity
The code below calculates the Shannon Diversity Index of the neighborhood surrounding each cell. The diversity is calculated for each cell cluster column specified above, and the results are combined into a single table. **The resulting file will be saved to `save_path` (`neighborhood_diversity_radius{pixel_radius}.csv`) in the new `cell_neighbor_analysis` subdirectory.**

In [None]:
save_path = os.path.join(output_dir, f'neighborhood_diversity_radius{pixel_radius}.csv')

diversity_data = neighborhood_diversity_analysis(neighbors_mat_dir, pixel_radius, cell_type_columns)
diversity_data.to_csv(save_path, index=False)
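
For intuition, the Shannon Diversity Index of a neighborhood with phenotype proportions $p_i$ is $H = -\sum_i p_i \log p_i$. The sketch below computes it for a few toy neighborhoods. The helper `shannon_diversity` is hypothetical and the base-2 logarithm is an assumption of this example; the log base used by `neighborhood_diversity_analysis` may differ, which would rescale the values without changing their ordering.

```python
import math


def shannon_diversity(proportions):
    """Shannon diversity (base 2, an assumption) of a list of proportions.

    Zero proportions are skipped, since p * log(p) tends to 0 as p -> 0.
    """
    return sum(-p * math.log2(p) for p in proportions if p > 0)


# A neighborhood made up of a single phenotype has no diversity
homogeneous = shannon_diversity([1.0])

# Two equally common phenotypes
two_types = shannon_diversity([0.5, 0.5])

# Four equally common phenotypes: the most diverse of the three cases
four_types = shannon_diversity([0.25, 0.25, 0.25, 0.25])

print(homogeneous, two_types, four_types)
```

Higher values indicate a more mixed neighborhood: the index is 0 when all neighbors share one phenotype and grows as more phenotypes appear in more even proportions.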

## Phenotype Distances
This part of the notebook can be used to analyze the proximity/distance between cell phenotypes in samples.

### 1. Path Setup
* `base_dir`: the path to the main folder containing all of your imaging data. This directory will also store all of the directories and files created during analysis.
* `spatial_analysis_dir`: the path to the directory containing the spatial analysis output
* `cell_table_path`: the path to the cell table that contains columns for fov, cell label, and cell phenotype (generated from `3_Pixie_Cluster_Cells.ipynb`)
* `dist_mat_dir`: the path to the directory containing the distance matrices for your data
* `output_dir`: path for a new directory that will be created for the output below

In [None]:
base_dir = '../data/example_dataset'

spatial_analysis_dir = os.path.join(base_dir, "spatial_analysis")
cell_table_path = os.path.join(base_dir, "segmentation/cell_table/cell_table_size_normalized_cell_labels.csv")
dist_mat_dir = os.path.join(spatial_analysis_dir, "dist_mats")

output_dir = os.path.join(spatial_analysis_dir, "cell_neighbor_analysis")
# Create the output directory if it does not already exist
os.makedirs(output_dir, exist_ok=True)

### 2. Compute Average Distances
For every cell in an image, the code below will compute the average distance of the *k* closest cells of each phenotype. You must specify *k* and the cell phenotype column name below.

- `k`: how many nearest cells of a specific phenotype to average the distance over
- `cell_type_col`: the column in your cell table containing the cell phenotypes

**The resulting file will be saved to `save_path` (`{cell_type_col}_avg_dists-nearest_{k}.csv`) in the new `cell_neighbor_analysis` subdirectory.**

In [None]:
k = 5
cell_type_col = settings.CELL_TYPE

In [None]:
save_path = os.path.join(output_dir, f"{cell_type_col}_avg_dists-nearest_{k}.csv")
all_data = pd.read_csv(cell_table_path)

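# Pass cell_type_col explicitly so the analysis uses the same column named in save_path
distance_data = cell_neighbor_distance_analysis(
    all_data, dist_mat_dir, save_path, k=k, cell_type_col=cell_type_col)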
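
The averaging step described above can be sketched in pure Python. This is an illustration, not ark's implementation: the toy `distances` and `cell_types` data and the helper `avg_knn_distance` are hypothetical. When fewer than *k* cells of a phenotype exist, this sketch averages over those available and returns NaN if there are none; `cell_neighbor_distance_analysis` may handle that case differently.

```python
# Hypothetical toy data: pairwise distances (in pixels) between five cells
cell_types = ["tumor", "tumor", "tumor", "immune", "immune"]
distances = [
    [0, 10, 20, 30, 50],
    [10, 0, 15, 25, 40],
    [20, 15, 0, 35, 45],
    [30, 25, 35, 0, 12],
    [50, 40, 45, 12, 0],
]


def avg_knn_distance(distances, cell_types, k):
    """For each cell, average the distance to its k nearest cells of each phenotype.

    Assumption: the cell itself is excluded from its own nearest neighbors.
    """
    phenotypes = sorted(set(cell_types))
    results = []
    for i, row in enumerate(distances):
        cell_result = {}
        for phenotype in phenotypes:
            # Distances from cell i to every other cell of this phenotype
            candidates = sorted(
                d for j, d in enumerate(row)
                if j != i and cell_types[j] == phenotype
            )
            nearest = candidates[:k]
            cell_result[phenotype] = (
                sum(nearest) / len(nearest) if nearest else float("nan")
            )
        results.append(cell_result)
    return results


avg_dists = avg_knn_distance(distances, cell_types, k=2)
print(avg_dists[0])  # average distances from the first cell to each phenotype
```

For the first cell, the two nearest tumor cells are 10 and 20 pixels away, giving an average of 15, while the two immune cells are 30 and 50 pixels away, giving an average of 40.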