# Cohort Plot Generation

This notebook generates two kinds of plots for your cohort:
1. Generate cluster plots for each of the FOVs.
2. Generate plots measuring some kind of continuous variable.

In [None]:
import os
import pandas as pd
from alpineer import io_utils
from ark.utils.plot_utils import cohort_cluster_plot, plot_cohort_continuous_variable
import natsort as ns
import ark.settings as settings

## 1. Cluster Plots

We can mass produce cluster plots with this section of the notebook.

You will need the following:

1. A directory containing segmentation masks labeled with integers $[0, 1, 2, 3, \ldots, N]$, where $0$ is reserved for the background.
2. A `.csv` with a column of FOV / image names, a column of segmentation IDs for each image, and a column of cluster IDs for each image. Here is an example of the format:


| fov | segmentation_id | cluster_id |
|-----|-----------------|------------|
| 1   | 0               | Background |
| 1   | 1               | Cluster 1  |
| 1   | 2               | Cluster 2  |
| ... | ...             | ...        |
| m   | N               | Cluster C  |

3. (Optional) A `.csv` with a column for the cluster name and a column for its color. Here is an example of the format:


| cluster_id | color    |
|------------|----------|
| Cluster 1  | "red"    |
| Cluster 2  | "blue"   |
| ...        | ...      |
| Cluster C  | "yellow" |
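As a minimal sketch of the optional color table in item 3, the snippet below writes a few rows to CSV text and reads them back into a cluster-to-color lookup. It uses only Python's standard `csv` module. The rows are hypothetical, and the column names `cluster_id` and `color` are taken from the example table above; the names your own files need may differ.

```python
import csv
import io

# Hypothetical rows that follow the color table shown above.
color_rows = [
    {"cluster_id": "Cluster 1", "color": "red"},
    {"cluster_id": "Cluster 2", "color": "blue"},
    {"cluster_id": "Cluster C", "color": "yellow"},
]

# Write the table to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["cluster_id", "color"])
writer.writeheader()
writer.writerows(color_rows)
csv_text = buffer.getvalue()

# Read it back as a cluster name -> color lookup.
color_lookup = {
    row["cluster_id"]: row["color"]
    for row in csv.DictReader(io.StringIO(csv_text))
}
print(color_lookup)
```

To save a real file, replace the `io.StringIO` buffer with `open(path, "w", newline="")`.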

### 1. Set the File Paths

In [None]:
base_dir = "../data/example_dataset/"

- `image_dir`: Sets the path to the directory containing the images.
- `seg_dir`: Sets the path to the directory containing the segmentation labels.

In [None]:
image_dir = os.path.join(base_dir, "images")
seg_dir = os.path.join(base_dir, "segmentation", "deepcell_output")

- `fov_cell_cluster_file`: Sets the path to the `.csv` file containing the FOV / image names, segmentation IDs and cluster IDs.

In [None]:
fov_cell_cluster_file = os.path.join(base_dir, "segmentation", "cell_table", "cell_table_size_normalized_cell_labels.csv")

In [None]:
fov_cell_cluster_df = pd.read_csv(
    fov_cell_cluster_file,
)
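Before plotting, it can help to confirm that the loaded table has the columns you expect. The helper below is a hypothetical sketch, not part of `ark` or `alpineer`. It takes any iterable of column names and reports which required names are missing. The column names used here come from the example table above and are assumptions about your data.

```python
def missing_columns(columns, required):
    """Return the required column names that are absent, preserving their order."""
    present = set(columns)
    return [name for name in required if name not in present]


# Hypothetical header, using the column names from the example table above.
header = ["fov", "segmentation_id", "cluster_id"]
required = ["fov", "segmentation_id", "cluster_id"]

missing = missing_columns(header, required)
if missing:
    raise ValueError(f"Cell table is missing columns: {missing}")
```

In this notebook, `fov_cell_cluster_df.columns` could be passed as `columns`, with `required` set to the names you supply for `fov_col`, `label_col`, and `cluster_col`.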

- `cluster_color_mapping_file`: Sets the path to the `.csv` file containing the cluster names and colors.

In [None]:
cluster_color_mapping_file = os.path.join(base_dir, "segmentation", "cell_table", "cluster_color_mapping.csv")

In [None]:
cluster_color_mapping_df = pd.read_csv(cluster_color_mapping_file)

In [None]:
fovs = ns.natsorted(io_utils.list_files(image_dir))
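The natural sort above matters because plain string sorting places `fov10` before `fov2`. The sketch below illustrates the idea with the standard library only. It is a simplified stand-in for illustration, not how `natsort` is implemented, and the file names are hypothetical.

```python
import re


def natural_key(name: str):
    """Split a name into text and integer chunks so numbers compare numerically."""
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", name)
    ]


# Hypothetical FOV file names.
fov_names = ["fov10.tiff", "fov2.tiff", "fov1.tiff"]

plain_order = sorted(fov_names)
natural_order = sorted(fov_names, key=natural_key)

print(plain_order)    # ['fov1.tiff', 'fov10.tiff', 'fov2.tiff']
print(natural_order)  # ['fov1.tiff', 'fov2.tiff', 'fov10.tiff']
```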

### 2. Notes on Color Selection

To use a predefined color map instead of a `.csv` file of colors, set the `cmap` parameter below to the name of that color map. You can find a list of the color maps available in matplotlib [here](https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html).


Be *very* mindful of your color selection: how you choose to display your information is just as important as the information itself. The resources below can help you choose appropriate colors for your data and appreciate the complexity of the topic.


**Extra colormap packages to consider:**
- [colorcet](https://colorcet.holoviz.org/) contains a set of colormaps which are specifically designed for categorical data.
- [Fabio Crameri's Scientific Colour Maps](https://www.fabiocrameri.ch/colourmaps.php) contains a set of colormaps which are specifically designed for scientific data. They are also inclusive for various types of color vision deficiencies. 
- [Fabio Crameri's Categorical Colour Maps](https://www.fabiocrameri.ch/categorical-colour-maps/) contains a set of colormaps which are specifically designed for categorical data. They are also inclusive for various types of color vision deficiencies, and are easily readable in black and white print.

**Color vision deficiencies:**
- Make sure that the colors you choose are colorblind friendly. If you are using a `.csv` file containing custom colors, you can check your selection with [this tool](https://davidmathlogic.com/colorblind/#%23EA4335-%238E24AA-%234E342E-%234E342E).
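One rough, automatable check is whether any two colors have nearly the same luminance, since such pairs tend to be hard to tell apart in grayscale print. The sketch below uses the WCAG relative-luminance formula for sRGB colors. It is only a proxy and does not simulate color vision deficiencies, so it does not replace the tool linked above. The function names, the cluster names, and the `min_difference` threshold are hypothetical. The hex values are the ones embedded in the link above, and the code accepts `#RRGGBB` strings only, not named colors such as `"red"`.

```python
from itertools import combinations


def relative_luminance(hex_color: str) -> float:
    """Return the WCAG relative luminance (0 = black, 1 = white) of '#RRGGBB'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding before weighting the channels.
    r, g, b = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def similar_luminance_pairs(colors, min_difference=0.05):
    """Return pairs of names whose luminance differs by less than min_difference."""
    lum = {name: relative_luminance(value) for name, value in colors.items()}
    return [
        (a, b)
        for a, b in combinations(lum, 2)
        if abs(lum[a] - lum[b]) < min_difference
    ]


# Hypothetical palette; the last two entries are deliberately identical.
palette = {
    "Cluster 1": "#EA4335",
    "Cluster 2": "#8E24AA",
    "Cluster 3": "#4E342E",
    "Cluster 4": "#4E342E",
}

flagged = similar_luminance_pairs(palette)
print(flagged)
```

Any pair that is flagged is worth inspecting by eye; a pair that is not flagged may still be confusable for some viewers.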

**Further reading:**
- [Colour Displays for categorical images](https://strathprints.strath.ac.uk/30312/1/colorpaper_2006.pdf)
- [How rainbow colour maps can distort data and be misleading](https://theconversation.com/how-rainbow-colour-maps-can-distort-data-and-be-misleading-167159)
- Crameri, F., Shephard, G.E. & Heron, P.J. The misuse of colour in science communication. Nat Commun 11, 5444 (2020). https://doi.org/10.1038/s41467-020-19160-7 
  - Highly recommended read

**Other resources:**
- [Coolors](https://coolors.co/) is an easy tool for generating color palettes. You can lock in colors you like and generate more based on those colors.
- [Color Brewer](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3) is a great tool for generating color palettes, with a visualization example of how the colors will look on a map.

We strongly recommend using these resources, and the many others available, to develop appropriate and inclusive color maps for your data.

In [None]:
cohort_cluster_plot(
    fovs=fovs,
    save_dir="./cluster_masks",
    seg_dir=seg_dir,
    cell_data=fov_cell_cluster_df,
    erode=True,
    fov_col=settings.FOV_ID,
    label_col=settings.CELL_LABEL,
    cluster_col=settings.CELL_TYPE,
    seg_suffix="_whole_cell.tiff",
    cmap=cluster_color_mapping_df,
)

## 2. Continuous Variable Plots

This section of the notebook allows you to plot many images with a color map of your choice. You will need the following:

- `images`: A directory consisting of images measuring a continuous variable of some kind.

The same note about color maps applies here as well.

**Continuous color map resources:**

- [cmocean](https://matplotlib.org/cmocean/): A collection of commonly used oceanographic color maps adjusted to be perceptually uniform.
- [cmasher](https://cmasher.readthedocs.io/index.html): A collection of color maps designed to be perceptually uniform and colorblind friendly.

In [None]:
images = io_utils.list_files(dir_name="./images")

In [None]:
plot_cohort_continuous_variable(
    images=images,
    image_dir="./images",
    save_dir="./continuous_variables",
    cmap="viridis",
    figsize=(3, 3),
    display_fig=False,
)