# `AnnData` Conversion

The purpose of this notebook is to convert the cell table to an [`AnnData`](https://anndata.readthedocs.io/en/latest/index.html) object.

`AnnData` (Annotated Data) is a data structure well suited to single-cell data. It combines matrices and DataFrames in a single object, which lets us store and interact with our data efficiently.

The following is a representation of the `AnnData` object schema:

<p align="center">
  <img width="50%" src="../docs/_images/Anndata_Schema.png" alt="AnnData Schema"/>
</p>

This notebook moves the following portions of the cell table into an `AnnData` object:
- Marker / channel columns are stored in `.X`
- The X and Y centroids are stored in `.obsm`
- The rest of the cell table is stored in `.obs` (including columns such as `area`, `perimeter`, and `cell_meta_cluster`)
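The column split described above can be sketched in plain Python. This is a hypothetical illustration, not ark's implementation: the helper `partition_columns` and the toy column names (including `centroid-0` / `centroid-1`) are assumptions made for the example.

```python
def partition_columns(columns, markers, centroid_names=("centroid-0", "centroid-1")):
    """Split cell-table column names into marker, centroid, and remaining groups.

    Hypothetical helper for illustration only; ark's ConvertToAnnData may differ.
    """
    marker_set = set(markers)
    centroid_set = set(centroid_names)
    # Marker / channel intensities: the columns that become the .X matrix
    marker_cols = [c for c in columns if c in marker_set]
    # Spatial coordinates, kept apart from the other per-cell annotations
    centroid_cols = [c for c in columns if c in centroid_set]
    # Everything else: per-cell annotations destined for .obs
    other_cols = [c for c in columns if c not in marker_set and c not in centroid_set]
    return marker_cols, centroid_cols, other_cols


# Toy cell-table header (assumed column names)
columns = ["CD4", "CD8", "label", "area", "perimeter",
           "centroid-0", "centroid-1", "fov", "cell_meta_cluster"]
marker_cols, centroid_cols, other_cols = partition_columns(columns, markers=["CD4", "CD8"])
print(marker_cols)    # ['CD4', 'CD8']
print(centroid_cols)  # ['centroid-0', 'centroid-1']
print(other_cols)     # ['label', 'area', 'perimeter', 'fov', 'cell_meta_cluster']
```

Each column lands in exactly one group, so no data is duplicated between the matrix and the annotation tables.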

In [None]:
from anndata import read_zarr
from ark.utils.data_utils import ConvertToAnnData
import os

In [None]:
base_dir = "../data/example_dataset/"

## 0. Download the Example Dataset

Here we use the example data located in `/data/example_dataset/input_data/`. To run this notebook on your own data, change `base_dir` to point to your own sub-directory within the data folder.

- `base_dir`: the path to all of your imaging data. This directory will contain all of the data generated by this notebook, as well as the data previously generated by segmentation and cell clustering.

In [None]:
from ark.utils.example_dataset import get_example_dataset

get_example_dataset(dataset="post_clustering", save_dir=base_dir, overwrite_existing=True)

## 1. Convert the Cell Table to `AnnData` Objects

- `cell_table_path`: The path to the cell table that you wish to convert to `AnnData` objects. 
- `anndata_save_dir`: The directory where you would like to save the `AnnData` objects. This directory will be created if it does not already exist.

In [None]:
cell_table_path = os.path.join(base_dir, "segmentation/cell_table/cell_table_size_normalized_cell_labels.csv")
anndata_save_dir = os.path.join(base_dir, "anndata")

- `markers`: The names of the markers (channel columns) to extract from the cell table. List each marker you want to keep, or set this to `"auto"` to grab all markers.
- `extra_obs_parameters`: By default, the conversion extracts a specific set of columns for the `obs` DataFrame, plus all columns to the left of `"label"`. To add more columns to the `obs` DataFrame, list them with this parameter.
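One way to picture how these two parameters are resolved is the sketch below. Both helpers (`resolve_markers`, `merge_obs_columns`) and the rule used for `"auto"` (keep every column outside a known set of non-marker columns) are hypothetical; ark's own logic may differ. The explicit-list branch shows a useful habit: check that every requested marker exists in the cell table before converting.

```python
import csv
import io


def resolve_markers(header, markers, non_marker_cols):
    """Hypothetical helper: turn the `markers` argument into a concrete list.

    "auto" keeps every column that is not a known non-marker column (an assumed
    rule); an explicit list is checked against the header so typos fail early.
    """
    if markers == "auto":
        return [c for c in header if c not in non_marker_cols]
    missing = [m for m in markers if m not in header]
    if missing:
        raise ValueError(f"Markers not found in cell table: {missing}")
    return list(markers)


def merge_obs_columns(default_obs, extra_obs_parameters=None):
    """Hypothetical helper: append extra obs columns to the defaults, skipping duplicates."""
    extra = extra_obs_parameters or []
    # dict.fromkeys de-duplicates while preserving first-seen order
    return list(dict.fromkeys([*default_obs, *extra]))


# Toy cell-table header (assumed column names), parsed the same way a CSV file would be
toy_csv = "cell_size,CD4,CD8,label,area,fov,cell_meta_cluster\n"
header = next(csv.reader(io.StringIO(toy_csv)))
non_markers = {"cell_size", "label", "area", "fov", "cell_meta_cluster"}

print(resolve_markers(header, "auto", non_markers))        # ['CD4', 'CD8']
print(resolve_markers(header, ["CD8"], non_markers))        # ['CD8']
print(merge_obs_columns(["label", "area"], ["fov", "area"]))  # ['label', 'area', 'fov']
```

Passing a list such as `["CD4", "CD99"]` to `resolve_markers` raises a `ValueError` naming `CD99`, which is easier to debug than a missing column discovered after conversion.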

In [None]:
# markers = ["CD14", "CD163", "CD20", "CD3", "CD31", "CD4", "CD45", "CD68", "CD8", "CK17", "Collagen1", "ECAD",
#            "Fibronectin", "GLUT1", "H3K27me3", "H3K9ac", "HLADR", "IDO", "Ki67", "PD1", "SMA", "Vim"]
markers = "auto"
extra_obs_parameters = None

In [None]:
convert_to_anndata = ConvertToAnnData(cell_table_path, markers=markers, extra_obs_parameters=extra_obs_parameters)

In [None]:
fov_adata_paths = convert_to_anndata.convert_to_adata(save_dir=anndata_save_dir)
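The variable name `fov_adata_paths` suggests one `AnnData` object (and one saved store) per FOV. Assuming that is the case, the grouping step can be sketched with the standard library. The helpers `group_rows_by_fov` and `planned_output_paths`, and the `<fov>.zarr` naming scheme, are hypothetical and only illustrate the idea; inspect `fov_adata_paths` to see the paths ark actually writes.

```python
import csv
import io
import os
from collections import defaultdict


def group_rows_by_fov(csv_text, fov_col="fov"):
    """Group cell-table rows by FOV, preserving row order within each FOV."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[fov_col]].append(row)
    return dict(groups)


def planned_output_paths(fovs, save_dir):
    """Hypothetical naming scheme: one Zarr store per FOV inside save_dir."""
    return {fov: os.path.join(save_dir, f"{fov}.zarr") for fov in fovs}


# Toy cell table: note that `label` restarts in each FOV
toy_csv = (
    "CD4,CD8,label,fov\n"
    "1.2,0.0,1,fov0\n"
    "0.3,2.1,2,fov0\n"
    "0.0,0.7,1,fov1\n"
)
groups = group_rows_by_fov(toy_csv)
print({fov: len(rows) for fov, rows in groups.items()})  # {'fov0': 2, 'fov1': 1}
print(planned_output_paths(groups, "anndata"))
```

Splitting per FOV keeps each object small and avoids clashes between cell labels that are only unique within a single FOV.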

We recommend reading both the brief overview of the `AnnData` data type in the [ark documentation](https://ark-analysis.readthedocs.io/en/latest/_rtd/data_types.html) and the [official `AnnData` documentation](https://anndata.readthedocs.io/en/latest/index.html).