# `AnnData` Conversion

The purpose of this notebook is to convert the cell table to an [`AnnData`](https://anndata.readthedocs.io/en/latest/index.html) object.

`AnnData` stands for Annotated Data, and is a data structure well suited to single-cell data. It is a multi-faceted object composed of matrices and DataFrames.

In [1]:
from dask.distributed import Client
import dask.dataframe as dd
from anndata import AnnData, read_zarr
import os

In [2]:
import dask
dask.config.set({"visualization.engine": "graphviz"});

In [3]:
Client(n_workers=4, threads_per_worker=2)

Client
Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
Workers: 4
Total threads: 8
Total memory: 64.00 GiB
Status: running
Using processes: True
Comm: tcp://127.0.0.1:52288
Started: Just now
Per worker: 2 threads, 16.00 GiB memory


In [4]:
base_dir = "../data/example_dataset/"

## 0. Download the Example Dataset

Here we are using the example data located in `/data/example_dataset/input_data`. To modify this notebook to run using your own data, simply change `base_dir` to point to your own sub-directory within the data folder.

* `base_dir`: the path to all of your imaging data. This directory will contain all of the data generated by this notebook, as well as the data previously generated by segmentation and cell clustering.
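As a rough sketch of how the paths used in this notebook derive from `base_dir`, the helper below collects them in one place and reports which ones already exist. `describe_layout` is a hypothetical helper written for this illustration and is not part of `ark`; the sub-paths are the ones used in the cells of this notebook.

```python
import os


def describe_layout(base_dir):
    """Return the paths this notebook reads from and writes to under base_dir.

    Hypothetical helper for illustration only; it is not an ark function.
    """
    cell_table_path = os.path.join(
        base_dir,
        "segmentation/cell_table/cell_table_size_normalized_cell_labels.csv",
    )
    anndata_dir = os.path.join(base_dir, "anndata")
    return {
        "cell_table_path": cell_table_path,
        "cell_table_exists": os.path.exists(cell_table_path),
        "anndata_dir": anndata_dir,
        "anndata_dir_exists": os.path.isdir(anndata_dir),
    }


# Point this at your own sub-directory within the data folder.
layout = describe_layout("../data/example_dataset/")
for key, value in layout.items():
    print(f"{key}: {value}")
```

Running a check like this before conversion makes a wrong `base_dir` show up as a missing cell table rather than as a failure partway through the notebook.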

In [5]:
from ark.utils.example_dataset import get_example_dataset

get_example_dataset(dataset="post_clustering", save_dir=base_dir, overwrite_existing=True)



In [6]:
cell_table_path = os.path.join(base_dir, "segmentation/cell_table/cell_table_size_normalized_cell_labels.csv")

- `markers`: The names of the markers to extract from the cell table. You can list each marker you would like to use, or set it to `None` to extract all markers.

In [7]:
# markers = ["CD14", "CD163", "CD20", "CD3", "CD31", "CD4", "CD45", "CD68", "CD8", "CK17", "Collagen1", "ECAD",
#              "Fibronectin", "GLUT1", "H3K27me3", "H3K9ac", "HLADR", "IDO", "Ki67", "PD1", "SMA", "Vim"]
markers = None
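The two ways of setting `markers` can be sketched with a small stand-alone function. This is only an illustration of the behavior described above, not the logic `ark` uses internally; `resolve_markers` and the `available` list are hypothetical, and how `ark` decides which cell table columns count as markers is not shown here.

```python
def resolve_markers(available, markers=None):
    """Pick which marker names to keep.

    Hypothetical illustration: `None` means "keep every available marker",
    while a list keeps only the requested names, in the requested order.
    """
    if markers is None:
        return list(available)
    missing = [m for m in markers if m not in available]
    if missing:
        raise ValueError(f"Markers not found in the cell table: {missing}")
    return list(markers)


# Hypothetical subset of marker columns, for illustration.
available = ["CD4", "CD8", "CD45"]

print(resolve_markers(available))                  # all markers
print(resolve_markers(available, ["CD8", "CD4"]))  # explicit selection
```

Listing markers explicitly keeps the resulting `AnnData` objects smaller and makes the set of variables reproducible if the cell table later gains columns.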

In [8]:
from ark.utils.data_utils import ConvertToAnnData, load_anndatas

In [9]:
convert_to_anndata = ConvertToAnnData(cell_table_path, markers=markers)

In [10]:
f = convert_to_anndata.convert_to_adata(save_dir=os.path.join(base_dir, "anndata"))

In [11]:
# Build the path from base_dir so this cell still works with your own data.
fov0 = read_zarr(os.path.join(base_dir, "anndata", "fov0.zarr"))

In [12]:
fovs_AC = load_anndatas(anndata_dir=os.path.join(base_dir, "anndata"), join_obsm="inner")



In [16]:
from ark.utils.data_utils import AnnDataIterDataPipe

fovs_dp = AnnDataIterDataPipe(fovs_AC).filter(lambda x: x.obs.fov.iloc[0] in ("fov0", "fov1", "fov2")).shuffle()

In [17]:
from torchdata.dataloader2 import DataLoader2
dl = DataLoader2(datapipe=fovs_dp)

In [18]:
for fov in dl:
    print(fov)

AnnData object with n_obs × n_vars = 1278 × 22
    obs: 'label', 'area', 'eccentricity', 'major_axis_length', 'minor_axis_length', 'perimeter', 'convex_area', 'equivalent_diameter', 'major_minor_axis_ratio', 'perim_square_over_area', 'major_axis_equiv_diam_ratio', 'convex_hull_resid', 'centroid_dif', 'num_concavities', 'fov', 'cell_meta_cluster'
    obsm: 'spatial'
AnnData object with n_obs × n_vars = 745 × 22
    obs: 'label', 'area', 'eccentricity', 'major_axis_length', 'minor_axis_length', 'perimeter', 'convex_area', 'equivalent_diameter', 'major_minor_axis_ratio', 'perim_square_over_area', 'major_axis_equiv_diam_ratio', 'convex_hull_resid', 'centroid_dif', 'num_concavities', 'fov', 'cell_meta_cluster'
    obsm: 'spatial'
AnnData object with n_obs × n_vars = 669 × 22
    obs: 'label', 'area', 'eccentricity', 'major_axis_length', 'minor_axis_length', 'perimeter', 'convex_area', 'equivalent_diameter', 'major_minor_axis_ratio', 'perim_square_over_area', 'major_axis_equiv_diam_ratio', '

ImportError: attempted relative import with no known parent package