# How to use generated data from Fat and Nucleus Detection Script

The goal of this notebook is to explain the outputs of the fat and nucleus detection scripts, and to demonstrate how these results can be used.

### Overview
Dataframes generated as outputs:
* [file_name]_data.csv
* [file_name]_subpatch_df.pkl
* [file_name]_raw_global.pkl (only if nucleus detection is used)
* [file_name]_fat_and_nucs.pkl (only if nucleus detection is used)

Now we'll go into the details of each dataframe.

### [file_name]_data.csv

This dataframe contains general information about the WSI as well as information about white objects that could represent fatty objects. Based on the parameters defined when running the script, these objects are classified as "fat" or "not fat". Each row of this dataframe represents one object.
The dataframe has the following columns:
* WSI Information (stays constant for whole WSI):
    * `WSI` = name of the WSI
    * `global_threshold` = the threshold calculated with Otsu's method on the entire WSI (at a lower resolution so that the image can be loaded)
    * `patch_level` = the level at which the patch is processed.
    * `th_mpp` = the micrometers per pixel at which the global threshold is calculated.
    * `Mpp` = the patch-level micrometers per pixel
    * `Magnification` = the magnification at which the WSI was scanned.

* Patch Information (stays constant for whole patch):
    * `patch_key` = the key of the patch in which the object is located.
    * `patch_x` = the x coordinate (pixel) of the patch in which the object is located, with respect to the WSI. 
    * `patch_y` = the y coordinate (pixel) of the patch in which the object is located, with respect to the WSI.
    * `Black_area` = the total tissue (*not* including fatty area) in the processed patch.

* Object information:
    * `Centroid_x` = the local x coordinate (pixel) of the object, with respect to the patch.
    * `Centroid_y` = the local y coordinate (pixel) of the object, with respect to the patch.
    * `Area` = the area of the object (pixels)
    * `Bbox_x` = the x coordinate of the object's bounding box (pixel)
    * `Bbox_y` = the y coordinate of the object's bounding box (pixel)
    * `Bbox_width` = width of the object's bounding box (pixels)
    * `Bbox_height` = height of the object's bounding box (pixels)
    * `Extent` = the object's extent.
    * `Axis_ratio` = the object's axis ratio
    * `Circularity` = the object's circularity
    * `is_fat` = whether the object is considered fat, given the parameter settings (boolean)
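
Because `Area` is reported in pixels and `Mpp` gives the patch-level micrometers per pixel, object areas can be converted to physical units. The following is a minimal sketch on a toy dataframe. It assumes that `Area` is measured at the same resolution that `Mpp` describes; the helper name `add_physical_area` and the output column `Area_um2` are hypothetical and are not produced by the script.

```python
import pandas as pd

def add_physical_area(objects_df):
    """Return a copy with a hypothetical `Area_um2` column (area in square micrometers)."""
    out = objects_df.copy()
    # One pixel covers Mpp x Mpp square micrometers
    # (assumption: Area is measured at the patch level that Mpp describes).
    out["Area_um2"] = out["Area"] * out["Mpp"] ** 2
    return out

# Toy data that mimics a few columns of [file_name]_data.csv (values are made up).
toy_objects = pd.DataFrame({
    "patch_key": [0, 0, 1],
    "Mpp": [0.5, 0.5, 0.5],
    "Area": [400, 1000, 200],
    "is_fat": [True, False, True],
})

toy_with_area = add_physical_area(toy_objects)
fat_area_um2 = toy_with_area.loc[toy_with_area["is_fat"], "Area_um2"].sum()
print(f"fat area in toy data = {fat_area_um2} square micrometers")
```

The same helper could be applied to the dataframe loaded from the real CSV file, provided the assumption about the resolution holds for your run.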

In [None]:
### How to work with [file_name]_data.csv ###

import pandas as pd
import numpy as np

# load the dataframe:
object_data_csv = pd.read_csv("XXXXXXX")
print(f"Information about {object_data_csv['WSI'].iloc[0]}")
print(f"--- magnification = {object_data_csv['Magnification'].iloc[0]}")
print(f"--- mpps at patch level = {object_data_csv['Mpp'].iloc[0]}")

# select only fat objects
fat_df = object_data_csv[object_data_csv["is_fat"] == True]
print(f"--- number of fat objects in this WSI = {len(fat_df)}")


# find number of patches in dataframe
num_patches_in_df = len(np.unique(object_data_csv["patch_key"]))
print(f"--- number of patches in dataframe = {num_patches_in_df}")

# get total tissue area of WSI. Also return only fatty area.
def analyze_tissue_area_objects_df(objects_df):
    # Black_area stays constant for a whole patch, so count it once per patch.
    total_black_area = objects_df.groupby("patch_key")["Black_area"].first().sum()
    # Sum the areas of all objects that are classified as fat.
    total_fat_area = objects_df.loc[objects_df["is_fat"] == True, "Area"].astype(float).sum()
    total_tissue = total_black_area + total_fat_area
    return total_tissue, total_fat_area

total_tissue, total_fat_area = analyze_tissue_area_objects_df(objects_df = object_data_csv)
print(f"total tissue (in pixels) found in WSI = {total_tissue}.")
print(f"----- fat area (in pixels) = {total_fat_area}.")
print(f"----- tissue area without fat (in pixels) = {total_tissue - total_fat_area}.")
print(f"-> percent fat in WSI = {(total_fat_area/ total_tissue) * 100}.")



### [file_name]_subpatch_df.pkl

This dataframe is generated when a larger patch is processed in smaller patches to be able to process the image at a lower resolution. This could be used to analyze fat distribution within the WSI. 
The dataframe has the following structure, with each row representing a subpatch:

* `original_key` = the key of the (large) patch that is then divided into subpatches
* `subpatch_size(px)` = edge length of the subpatch (stays constant)
* `subpatch_key` = key of the subpatch
* `global_x_coords(px)` = the x coordinate of the subpatch, with respect to the entire WSI
* `global_y_coords(px)` = the y coordinate of the subpatch, with respect to the entire WSI
* `fat_area(px)` = the number of fat pixels in that subpatch
* `black_area(px)` = the number of black pixels (tissue area not including fat pixels) in that subpatch
* `mpps` = the micrometers per pixel at which the patch was processed (stays constant)


This dataframe can also be used to find total tissue area, and to calculate the fat percentage.
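
To look at the fat distribution spatially, the fat fraction of each subpatch can be arranged on a grid by its global coordinates. The following is a minimal sketch on toy data. The helper name `fat_fraction_grid` and the column `fat_fraction` are hypothetical, and the sketch assumes that each pair of global coordinates occurs only once in the dataframe.

```python
import numpy as np
import pandas as pd

def fat_fraction_grid(subpatch_df):
    """Return a 2D grid (rows = y, columns = x) holding the fat fraction of each subpatch."""
    df = subpatch_df.copy()
    tissue = df["fat_area(px)"] + df["black_area(px)"]
    # Subpatches without any tissue get NaN instead of a meaningless ratio.
    df["fat_fraction"] = (df["fat_area(px)"] / tissue).where(tissue > 0)
    # Assumption: every (x, y) coordinate pair is unique, otherwise pivot raises an error.
    return df.pivot(
        index="global_y_coords(px)",
        columns="global_x_coords(px)",
        values="fat_fraction",
    )

# Toy data that mimics a few columns of [file_name]_subpatch_df.pkl (values are made up).
toy_subpatches = pd.DataFrame({
    "global_x_coords(px)": [0, 512, 0, 512],
    "global_y_coords(px)": [0, 0, 512, 512],
    "fat_area(px)": [25, 0, 50, 0],
    "black_area(px)": [75, 100, 50, 0],
})

grid = fat_fraction_grid(toy_subpatches)
print(grid)
```

The resulting grid is a regular dataframe, so it could be passed to a plotting function such as matplotlib's `imshow` to obtain a rough fat heatmap of the WSI.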


In [None]:
### How to work with [file_name]_subpatch_df.pkl ###

import pandas as pd
subpatch_df = pd.read_pickle("XXXXXXXX")

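# count the distinct subpatch keys per original patch (works for 0-based and 1-based keys)
print(f"number of subpatches in a patch: {subpatch_df.groupby('original_key')['subpatch_key'].nunique().max()}")
print(f"edge length of a subpatch (in pixels): {subpatch_df['subpatch_size(px)'].iloc[0]}")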

#find the total tissue_area, calculate the fat percentage
def analyze_tissue_area_subpatch_df(subpatch_df):
    black_area = subpatch_df["black_area(px)"].sum()
    fat_area = subpatch_df["fat_area(px)"].sum()
    total_tissue_area = black_area + fat_area
    return total_tissue_area, fat_area

total_tissue, total_fat_area = analyze_tissue_area_subpatch_df(subpatch_df)
print(f"total tissue (in pixels) found in WSI = {total_tissue}.")
print(f"----- fat area (in pixels) = {total_fat_area}.")
print(f"----- tissue area without fat (in pixels) = {total_tissue - total_fat_area}.")
print(f"-> percent fat in WSI = {(total_fat_area/ total_tissue) * 100}.")


### [file_name]_raw_global.pkl 
This dataframe provides information about nuclei detected by HoVer-Net. Each row represents one detected nucleus. The columns are structured as follows:
* `Name` = name of the WSI
* `patch_key` = the key of the patch in which the nucleus is located.
* `local_centroid(px)` = the centroid of the nucleus with respect to the patch
* `centroid_global(px)` = the centroid of the nucleus with respect to the WSI
* `bbox_global(px)` = the bounding box of the nucleus, coordinates with respect to the WSI. Format: `[[coords_x_0, coords_x_1], [coords_y_0, coords_y_1]]`
* `contour_global(px)` = the contour of the nucleus, coordinates with respect to the WSI
* `type` = type of the nucleus, if using HoVer-Net weights that support type prediction. See HoVer-Net documentation for more info.
* `type_probability` = probability of the nucleus type. See HoVer-Net documentation for more info.
* `area_nuc(pxs)` = area of the nucleus, in pixels
* `is_a_relevant_nucleus` = whether the nucleus is relevant (= valid). To be relevant, the nucleus must be (a) located in the scanned area, and (b) located within an area that contains tissue.
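
The nested bounding box format can be unpacked into a width and a height per nucleus. The following is a minimal sketch on toy data. It assumes the `[[x_0, x_1], [y_0, y_1]]` layout described above; the helper name `bbox_width_height` and the columns `bbox_width(px)` and `bbox_height(px)` are hypothetical and not part of the generated dataframe.

```python
import pandas as pd

def bbox_width_height(bbox):
    """Unpack [[x_0, x_1], [y_0, y_1]] into (width, height) in pixels."""
    (x_0, x_1), (y_0, y_1) = bbox
    return x_1 - x_0, y_1 - y_0

# Toy data that mimics a few columns of [file_name]_raw_global.pkl (values are made up).
toy_nuclei = pd.DataFrame({
    "patch_key": [0, 0, 1],
    "bbox_global(px)": [
        [[10, 20], [30, 45]],
        [[100, 108], [200, 206]],
        [[5, 9], [7, 10]],
    ],
    "is_a_relevant_nucleus": [True, False, True],
})

# Keep only relevant nuclei, then derive the bounding box size of each one.
relevant = toy_nuclei[toy_nuclei["is_a_relevant_nucleus"]].copy()
sizes = relevant["bbox_global(px)"].apply(bbox_width_height)
relevant["bbox_width(px)"] = [width for width, _ in sizes]
relevant["bbox_height(px)"] = [height for _, height in sizes]

# Count the relevant nuclei in each patch.
nuclei_per_patch = relevant.groupby("patch_key").size()
print(relevant[["patch_key", "bbox_width(px)", "bbox_height(px)"]])
print(nuclei_per_patch)
```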


In [None]:
### How to work with [file_name]_raw_global.pkl ###
import pandas as pd
import numpy as np

#load dataframe
raw_nucleus_info = pd.read_pickle("XXXXXX")
print(f"total number of detected nuclei: {len(raw_nucleus_info)}")
relevant_nuclei = raw_nucleus_info[raw_nucleus_info['is_a_relevant_nucleus'] == True] # get only relevant nuclei
print(f"total number of relevant nuclei: {len(relevant_nuclei)}")
print(f"average area of relevant nuclei (in pixels): {np.mean(relevant_nuclei['area_nuc(pxs)'])}")


### [file_name]_fat_and_nucs.pkl

This dataframe combines information about fat distribution with nucleus information. It is calculated at the subpatch level, so each row represents a subpatch.
The columns are organized as follows:
* WSI Information
    * `mpps` = patch_level micrometers per pixel. Stays constant

* Patch Information
    * `original_key`

* Subpatch Information
    * `subpatch_size(px)` (stays constant)
    * `subpatch_key` 
    * `global_x_coords(px)` = x coordinates of subpatch, with respect to WSI
    * `global_y_coords(px)`= y coordinates of subpatch, with respect to WSI
    * `fat_area(px)` = fat area in pixels in the subpatch
    * `black_area(px)`= black area in pixels in the subpatch
    * Nucleus information for each relevant nucleus in the subpatch. Each of the following columns holds a list in which every element represents the data of one nucleus:
        * `nuclei_coords_global(px)` = list of lists of nucleus coordinates (coordinates with respect to the WSI)
        * `nuclei_area(pxs)` = list of nuclei areas
        * `nuclei_contour_global(px)` = list of nuclei contours (with respect to the WSI)
        * `nuclei_bbox_global(px)` = list of bounding boxes (with respect to the WSI)
        * `nuclei_type` = list of nucleus types, if using HoVer-Net weights that support type prediction. See HoVer-Net documentation for more info.
        * `nuclei_type_probability` = list of nucleus type probabilities, if using HoVer-Net weights that support type prediction.
    * `number_of_nuclei_in_subpatch` = (int) number of nuclei located in the subpatch.
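
Because the nucleus columns store one list per subpatch, it can be convenient to reshape them into one row per nucleus. The following is a minimal sketch on toy data using `DataFrame.explode`. It assumes pandas 1.3 or newer (needed to explode several columns at once) and that all list columns of a row have the same length; only two of the list columns are shown.

```python
import pandas as pd

# List-valued columns to expand (a subset of the columns described above).
list_columns = ["nuclei_coords_global(px)", "nuclei_area(pxs)"]

# Toy data that mimics a few columns of [file_name]_fat_and_nucs.pkl (values are made up).
toy_fat_and_nucs = pd.DataFrame({
    "subpatch_key": [1, 2],
    "nuclei_coords_global(px)": [[[10, 12], [40, 44]], []],
    "nuclei_area(pxs)": [[30, 45], []],
    "number_of_nuclei_in_subpatch": [2, 0],
})

# Drop subpatches without nuclei, then expand each list element into its own row.
with_nuclei = toy_fat_and_nucs[toy_fat_and_nucs["number_of_nuclei_in_subpatch"] > 0]
per_nucleus = with_nuclei.explode(list_columns, ignore_index=True)
print(per_nucleus[["subpatch_key"] + list_columns])
```

Each row of `per_nucleus` then describes a single nucleus together with the key of the subpatch it belongs to.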


In [None]:
### How to work with [file_name]_fat_and_nucs.pkl ###
import pandas as pd
import numpy as np

#load the dataframe
fat_and_nucs_df = pd.read_pickle("XXXXXX")

#fat area and tissue area can be calculated in the same way as for subpatch_df.pkl (see analyze_tissue_area_subpatch_df())
print(f"average number of nuclei per subpatch = {np.mean(fat_and_nucs_df['number_of_nuclei_in_subpatch'].to_list())}")

#let's look at one subpatch (that contains nuclei)
subpatch_with_nucs = fat_and_nucs_df[fat_and_nucs_df["number_of_nuclei_in_subpatch"] > 0].iloc[0]
print(f"number of nuclei in chosen subpatch: {subpatch_with_nucs['number_of_nuclei_in_subpatch']}")
print(f"coordinates of the first nucleus in chosen subpatch: {subpatch_with_nucs['nuclei_coords_global(px)'][0]}") #the column holds a list of lists, [0] selects the first nucleus
print(f"areas of nuclei in chosen subpatch: {subpatch_with_nucs['nuclei_area(pxs)']}")

