# How to use generated data from the Nucleus Cluster Analysis Script

The goal of this notebook is to explain the outputs of the Nucleus Cluster Analysis Script, and to demonstrate how these results can be used.

### Overview of script outputs:
* [file_name]_[output-folder-name]_clusters.pkl #list of lists
* [file_name]_[output-folder-name]_graph_stats.pkl #list of lists
* [file_name]_[output-folder-name]_global_CCA.pkl #dataframe
* [file_name]_fat_and_nucs.pkl #modified dataframe

Now we'll go into the details of each output.


### [file_name]_[output-folder-name]_clusters.pkl

This is a list of lists. Each inner list holds the coordinates of every nucleus in one cluster: `[[(x1,y1), (x2,y2)], [(x3,y3)], ... ]`.


In [None]:
import pickle

# set the path to [file_name]_[output-folder-name]_clusters.pkl
with open("", 'rb') as f:
    clusters = pickle.load(f)

print(f" total number of nucleus clusters in WSI: {len(clusters)}")
print(f" number of nuclei in first cluster: {len(clusters[0])}")
print(f" coordinates of nuclei in first cluster: {clusters[0]}")
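
As a follow-up to loading the file, here is a minimal sketch of how the nested list might be summarized: the size and centroid of each cluster, plus a histogram of cluster sizes. The `clusters` value below is a toy stand-in with made-up coordinates, and `summarize_clusters` is a hypothetical helper, not part of the script.

```python
from collections import Counter

# Toy stand-in for the unpickled `clusters` object (made-up coordinates).
clusters = [
    [(10, 12), (14, 18), (12, 20)],
    [(100, 105), (104, 101)],
    [(50, 50), (52, 54), (54, 50), (56, 54)],
]


def summarize_clusters(clusters):
    """Return one dict per non-empty cluster with its size and centroid."""
    summary = []
    for idx, coords in enumerate(clusters):
        n = len(coords)
        if n == 0:
            continue  # skip empty clusters to avoid dividing by zero
        cx = sum(x for x, _ in coords) / n
        cy = sum(y for _, y in coords) / n
        summary.append({"cluster": idx, "size": n, "centroid": (cx, cy)})
    return summary


summary = summarize_clusters(clusters)
size_histogram = Counter(row["size"] for row in summary)

for row in summary:
    print(row)
print(f"cluster sizes -> number of clusters: {dict(size_histogram)}")
```

With real data, replace the toy list with the object loaded from the `_clusters.pkl` file.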


### [file_name]_[output-folder-name]_graph_stats.pkl

This file holds a list of lists with per-nucleus graph statistics:


graph_stats[0] = average clustering coefficient (a list containing a single float)<br>

Every remaining element is a list with one entry per relevant nucleus detected, so they all have the same length.

graph_stats[1] = clustering coefficients (list)<br>
graph_stats[2] = number of neighbors (list)<br>
graph_stats[3] = number of non-neighbors (list)<br>
graph_stats[4] = average distance (in pixels) to neighbors (list)<br>
graph_stats[5] = average distance (in pixels) to non-neighbors (list)<br>
graph_stats[6] = number of common neighbors (list)<br>
graph_stats[7] = degree (list)<br>
graph_stats[8] = whether the nucleus is in a cluster (list of booleans)


In [None]:
import pickle

# set the path to [file_name]_[output-folder-name]_graph_stats.pkl
with open("", 'rb') as f:
    graph_stats = pickle.load(f)

print(f"average clustering coefficient: {graph_stats[0]}")
# the remaining elements (indices 1 to 8) all have the same length:
for i in range(1, 9):
    print(f"length of graph_stats[{i}] = {len(graph_stats[i])}")

# to find the average distance to neighbors for nucleus number x
x = 10 #nucleus number
print(f"average distance to neighbors for nucleus number {x} =  {graph_stats[4][x]}")
print(f"nucleus number {x} in a cluster? {graph_stats[8][x]}")
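
Indexing by position is easy to get wrong, so it can help to attach names to the per-nucleus lists. The sketch below turns `graph_stats[1]` through `graph_stats[8]` into one record per nucleus. The field names are illustrative choices that follow the index layout described above, `graph_stats_to_records` is a hypothetical helper, and the toy values are made up.

```python
# Illustrative field names for indices 1..8 (not defined by the script itself).
FIELDS = {
    1: "clustering_coefficient",
    2: "num_neighbors",
    3: "num_non_neighbors",
    4: "avg_dist_to_neighbors_px",
    5: "avg_dist_to_non_neighbors_px",
    6: "num_common_neighbors",
    7: "degree",
    8: "in_cluster",
}


def graph_stats_to_records(graph_stats):
    """Return a list with one dict per nucleus, keyed by the names in FIELDS."""
    indices = sorted(FIELDS)
    columns = [graph_stats[i] for i in indices]
    if len({len(col) for col in columns}) != 1:
        raise ValueError("per-nucleus lists must all have the same length")
    names = [FIELDS[i] for i in indices]
    return [dict(zip(names, values)) for values in zip(*columns)]


# Toy stand-in for the unpickled object: three nuclei, made-up values.
graph_stats = [
    [0.5],                # [0] average clustering coefficient
    [1.0, 0.5, 0.0],      # [1] clustering coefficients
    [2, 2, 0],            # [2] number of neighbors
    [0, 0, 2],            # [3] number of non-neighbors
    [12.5, 14.0, 0.0],    # [4] average distance to neighbors (px)
    [0.0, 0.0, 80.0],     # [5] average distance to non-neighbors (px)
    [1, 1, 0],            # [6] number of common neighbors
    [2, 2, 0],            # [7] degree
    [True, True, False],  # [8] nucleus is in a cluster
]

records = graph_stats_to_records(graph_stats)
clustered = [r for r in records if r["in_cluster"]]
print(f"{len(clustered)} of {len(records)} nuclei are in a cluster")
print(records[1])
```

A list of dicts like `records` can also be passed to `pd.DataFrame(records)` if a dataframe is more convenient.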



### [file_name]_[output-folder-name]_global_CCA.pkl

This dataframe puts nucleus information on a global (i.e. WSI) scale. It contains information about fatty areas, nuclei, and the area taken up by nucleus clusters. It holds the same information as the [file_name]_fat_and_nucs.pkl dataframe, plus the nucleus cluster area. To see how to use this information, see fat_and_nucleus_data_analysis_example.ipynb.

The columns are organized as follows: 


* WSI Information
    * `mpps` = patch-level micrometers per pixel; stays constant

* Patch Information
    * `original_key`

* Subpatch Information
    * `subpatch_size(px)` = subpatch size in pixels; stays constant
    * `subpatch_key`
    * `global_x_coords(px)` = x coordinates of subpatch, with respect to WSI
    * `global_y_coords(px)` = y coordinates of subpatch, with respect to WSI
    * `fat_area(px)` = fat area in pixels in the subpatch
    * `black_area(px)` = black area in pixels in the subpatch
    * `area(px)_taken_up_by_nucleus_clusters_max_dis_{max_distance}(micrometers)` = area taken up by nucleus clusters, determined by calculating the convex hull around the nuclei in each cluster. The column name varies with the max distance (in micrometers) that was used.
    * nucleus information for each relevant nucleus in the subpatch. Unless noted otherwise, each of the following is a list with one element per nucleus
        * `nuclei_coords_global(px)` = list of lists of nucleus coordinates (with respect to the WSI)
        * `nuclei_area(pxs)` = list of nuclei areas
        * `nuclei_contour_global(px)` = list of nuclei contours (with respect to the WSI)
        * `nuclei_bbox_global(px)` = list of bounding boxes (with respect to the WSI)
        * `nuclei_type` = list of nucleus types, if using HoVer-Net weights that support type prediction. See HoVer-Net documentation for more info.
        * `nuclei_type_probability` = list of nucleus type probabilities, if using HoVer-Net weights that support type prediction.
        * `number_of_nuclei_in_subpatch` = (int) Number of nuclei located in the subpatch.
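
To illustrate what a convex-hull area means for the cluster-area column, here is a minimal sketch in plain Python. It is not the script's implementation: it assumes the hull is taken over the nucleus coordinates of one cluster, uses Andrew's monotone chain for the hull and the shoelace formula for the area, and the helper names, toy coordinates, and `mpps` value are all hypothetical.

```python
def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a counter-clockwise turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each half is the first point of the other half
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Return the polygon area via the shoelace formula (0.0 for fewer than 3 vertices)."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Toy cluster: four corner nuclei and one interior nucleus (made-up coordinates, in px).
cluster = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
mpps = 0.5  # hypothetical micrometers per pixel

hull = convex_hull(cluster)
area_px = polygon_area(hull)
area_um2 = area_px * mpps ** 2  # px^2 to square micrometers

print(f"hull vertices: {hull}")
print(f"area: {area_px} px^2 = {area_um2} square micrometers")
```

Clusters with one or two nuclei, or with all nuclei on a line, get an area of zero in this sketch.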


In [None]:
import pandas as pd
global_CCA_df = pd.read_pickle("XXXX")
print(global_CCA_df.columns)

### Calculate cell-based scores:
#### Requirements:
* Information about fatty vesicles from fat_and_nucleus_detection.py ([file_name]_data.csv), which gives the number of detected fat objects.
* [file_name]_[output-folder-name]_graph_stats.pkl, which gives the number of detected nuclei and whether each one is in a cluster.

In [None]:
import pandas as pd
import pickle


# load the data for the same image
# set the path to [file_name]_data.csv
fat_object_df = pd.read_csv("")

# set the path to [file_name]_[output-folder-name]_graph_stats.pkl
with open("", 'rb') as f:
    graph_stats = pickle.load(f)

def find_cell_based_fat_percentage_for_one_WSI(fat_object_df, nucleus_information):
    """Return the cell-based fat percentage for one WSI, without and with cluster information."""
    # nucleus_information[8] holds one boolean per nucleus: True if the nucleus is in a cluster
    is_in_cluster = list(nucleus_information[8])
    total_num_nuclei = len(is_in_cluster)
    num_clustered_cells = sum(is_in_cluster)
    num_non_clustered_cells = total_num_nuclei - num_clustered_cells
    num_fat_cells = len(fat_object_df[fat_object_df["is_fat"] == True])

    # score without cluster information: every detected nucleus is counted
    cell_based_score_without_cluster_information = (num_fat_cells / (total_num_nuclei + num_fat_cells)) * 100
    # score with cluster information: only non-clustered nuclei are counted
    cell_based_scores_with_cluster_information = (num_fat_cells / (num_non_clustered_cells + num_fat_cells)) * 100
    return cell_based_score_without_cluster_information, cell_based_scores_with_cluster_information

cell_based_score_without_cluster_information, cell_based_scores_with_cluster_information = find_cell_based_fat_percentage_for_one_WSI(fat_object_df=fat_object_df, nucleus_information=graph_stats)

print(f"cell based fat percentage without differentiating between clustered and non-clustered nuclei: {cell_based_score_without_cluster_information}")
print(f"cell based fat percentage while differentiating between clustered and non-clustered nuclei: {cell_based_scores_with_cluster_information}") 
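
To make the two formulas concrete, here is a worked toy case that needs no input files. It assumes 8 nuclei, 3 of them clustered, and 2 fat cells; all numbers are made up. `cell_based_scores` is a hypothetical variant of the function above that takes the counts directly and returns NaN instead of raising when a denominator is zero.

```python
import math


def cell_based_scores(is_in_cluster, num_fat_cells):
    """Return (score without cluster info, score with cluster info) in percent."""
    total_nuclei = len(is_in_cluster)
    clustered = sum(bool(flag) for flag in is_in_cluster)
    non_clustered = total_nuclei - clustered

    def percentage(numerator, denominator):
        # NaN instead of ZeroDivisionError when nothing was detected
        return 100.0 * numerator / denominator if denominator else math.nan

    without_clusters = percentage(num_fat_cells, total_nuclei + num_fat_cells)
    with_clusters = percentage(num_fat_cells, non_clustered + num_fat_cells)
    return without_clusters, with_clusters


# Toy input: 8 nuclei, the first 3 are in a cluster; 2 fat cells.
is_in_cluster = [True, True, True, False, False, False, False, False]
without_clusters, with_clusters = cell_based_scores(is_in_cluster, num_fat_cells=2)

print(f"without cluster information: {without_clusters:.2f}%")  # 2 / (8 + 2)
print(f"with cluster information:    {with_clusters:.2f}%")     # 2 / (5 + 2)
```

Because clustered nuclei are dropped from the denominator, the score with cluster information is never lower than the score without it.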
