# CytoNormPy - AnnData objects

This vignette walks through a typical CytoNormPy normalization workflow on AnnData objects.

First, we import the necessary libraries and create the anndata object.

In [1]:
import cytonormpy as cnp

import anndata as ad
import pandas as pd
import os
import numpy as np

from cytonormpy import FCSFile

## AnnData creation

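We read each FCS file with CytoNormPy's internal `FCSFile` representation, repeat the matching metadata row once per event to build `.obs`, store the events in the layer `compensated`, and concatenate the per-file objects into a single AnnData object: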

In [2]:
input_directory = "../_resources/"
fcs_files = [file for file in os.listdir(input_directory) if file.endswith(".fcs")]
adatas = []

metadata = pd.read_csv(os.path.join(input_directory, "metadata_sid.csv"))
for file_no, file in enumerate(fcs_files):
    fcs = FCSFile(input_directory = input_directory,
                  file_name = file)
    events = fcs.original_events
    md_row = metadata.loc[metadata["file_name"] == file, :].to_numpy()
    obs = np.repeat(md_row, events.shape[0], axis = 0)
    var_frame = fcs.channels
    obs_frame = pd.DataFrame(
        data = obs,
        columns = metadata.columns,
        index = pd.Index([f"{file_no}-{str(i)}" for i in range(events.shape[0])])
    )
    adata = ad.AnnData(
        obs = obs_frame,
        var = var_frame,
        layers = {"compensated": events}
    )
    adata.obs_names_make_unique()
    adata.var_names_make_unique()
    adatas.append(adata)

dataset = ad.concat(adatas, axis = 0, join = "outer", merge = "same")
dataset.obs = dataset.obs.astype("object")
dataset.var = dataset.var.astype("object")
dataset.obs_names_make_unique()
dataset.var_names_make_unique()

In [3]:
dataset

AnnData object with n_obs × n_vars = 6000 × 55
    obs: 'file_name', 'reference', 'batch', 'sample_ID'
    var: 'pns', 'png', 'pne', 'channel_numbers'
    layers: 'compensated'
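
The per-event metadata step in the loop above can be tried in isolation. The following is a minimal sketch on a hypothetical two-file metadata table (the file names, values and event count are made up for illustration); it only uses `numpy` and `pandas` and does not read any FCS file.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata table with the same columns as in the vignette.
metadata = pd.DataFrame({
    "file_name": ["a.fcs", "b.fcs"],
    "reference": ["ref", "other"],
    "batch": [1, 2],
    "sample_ID": [1, 2],
})

n_events = 3  # pretend that "b.fcs" contains three events
file_no = 1   # position of "b.fcs" in the (hypothetical) file list

# Select the single metadata row of this file as a (1, 4) array ...
md_row = metadata.loc[metadata["file_name"] == "b.fcs", :].to_numpy()
# ... and repeat it once per event, giving an (n_events, 4) array.
obs = np.repeat(md_row, n_events, axis=0)

# One row per event, indexed as "<file number>-<event number>".
obs_frame = pd.DataFrame(
    data=obs,
    columns=metadata.columns,
    index=pd.Index([f"{file_no}-{i}" for i in range(n_events)]),
)
print(obs_frame)
```

Because the metadata row mixes strings and numbers, `to_numpy()` returns an object array, so every column of `obs_frame` has object dtype.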

## Data setup

We instantiate the `CytoNorm` object, then add a transformer that applies an asinh transformation to the data and a FlowSOM clusterer that assigns the cells to clusters.

In [4]:
cn = cnp.CytoNorm()

t = cnp.AsinhTransformer()
fs = cnp.FlowSOM(n_clusters = 10)

cn.add_transformer(t)
cn.add_clusterer(fs)
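
To make the asinh step concrete, here is a minimal sketch of an asinh transformation and its inverse in plain `numpy`. The helper names and the cofactor of 5 are assumptions chosen for illustration; they are not taken from `AsinhTransformer`, whose defaults may differ.

```python
import numpy as np

def asinh_transform(x, cofactor=5.0):
    """Scale by the cofactor, then apply the inverse hyperbolic sine."""
    return np.arcsinh(x / cofactor)

def asinh_inverse(y, cofactor=5.0):
    """Undo asinh_transform: apply sinh, then rescale by the cofactor."""
    return np.sinh(y) * cofactor

# Hypothetical raw intensities, including a negative value and a zero.
raw = np.array([-10.0, 0.0, 5.0, 100.0, 10_000.0])

transformed = asinh_transform(raw)
restored = asinh_inverse(transformed)

print(transformed)
print(np.allclose(restored, raw))
```

The transformation is roughly linear near zero and logarithmic for large values, and unlike a plain logarithm it is defined for zero and negative intensities.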



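Next, we call `run_anndata_setup()`, passing the AnnData object, the layer to read the data from (`layer`) and the name of the layer that will hold the normalized values (`key_added`).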

In [5]:
cn.run_anndata_setup(dataset,
                     layer = "compensated",
                     key_added = "normalized")

## Clustering

We run the FlowSOM clustering with a `cluster_cv_threshold` of 2. This threshold is used to check whether the files are distributed evenly enough within each cluster; a warning is raised if that is not the case.

In [6]:
cn.run_clustering(cluster_cv_threshold = 2)
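
One way to picture such a check is to compute, for each cluster, the coefficient of variation (standard deviation divided by mean) of the fraction of each file's cells that fall into it. The sketch below does this on made-up labels. The function `cluster_cv` is hypothetical, and the exact statistic CytoNormPy computes (for example the choice of `ddof`) is an assumption here, not its confirmed implementation.

```python
import numpy as np

def cluster_cv(file_ids, cluster_ids):
    """Per-cluster CV of the per-file cluster proportions (hypothetical helper)."""
    files = np.unique(file_ids)
    clusters = np.unique(cluster_ids)
    # proportions[i, j]: fraction of file i's cells assigned to cluster j
    proportions = np.array([
        [np.mean(cluster_ids[file_ids == f] == c) for c in clusters]
        for f in files
    ])
    cv = proportions.std(axis=0, ddof=1) / proportions.mean(axis=0)
    return dict(zip(clusters.tolist(), cv.tolist()))

# Five made-up files with ten cells each.
file_ids = np.repeat(np.arange(5), 10)
# Files 0-3 split evenly between clusters 0 and 1;
# only file 4 contributes cells to cluster 2.
cluster_ids = np.concatenate(
    [np.array([0] * 5 + [1] * 5)] * 4
    + [np.array([0] * 5 + [1] * 3 + [2] * 2)]
)

cv = cluster_cv(file_ids, cluster_ids)
threshold = 2
flagged = [c for c, value in cv.items() if value > threshold]
print(cv)
print("clusters above threshold:", flagged)
```

In this toy data only cluster 2 is flagged, because all of its cells come from a single file.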

## Calculation

Finally, we calculate the quantiles per batch and cluster, fit the spline functions that map each batch's quantiles onto the goal distribution, and normalize the expression values with them.

The normalized data are saved automatically to the AnnData object in the layer `normalized`. To change the layer name, pass a different `key_added` to the `run_anndata_setup()` method above.

In [7]:
cn.calculate_quantiles()
cn.calculate_splines(goal = "batch_mean")
cn.normalize_data()

normalized file Gates_PTLG021_Unstim_Control_2.fcs
normalized file Gates_PTLG028_Unstim_Control_2.fcs
normalized file Gates_PTLG034_Unstim_Control_2.fcs
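
The idea behind the three calls above can be sketched for a single channel and a single cluster. This is a simplified illustration and not CytoNormPy's implementation: the data are simulated, the goal is taken to be the mean of the batch quantiles (an assumed reading of `goal = "batch_mean"`), and linear interpolation with `np.interp` stands in for the spline functions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated values of one channel in one cluster for two batches,
# with a shift and a scale difference playing the role of a batch effect.
batch_a = rng.normal(loc=1.0, scale=1.0, size=5000)
batch_b = rng.normal(loc=1.5, scale=1.3, size=5000)

# Step 1: quantiles per batch (here the 1st to 99th percentile).
probs = np.linspace(0.01, 0.99, 99)
q_a = np.quantile(batch_a, probs)
q_b = np.quantile(batch_b, probs)

# Step 2: goal distribution, assumed to be the mean of the batch quantiles.
goal = (q_a + q_b) / 2

def normalize(values, batch_quantiles, goal_quantiles):
    """Map values from the batch quantiles onto the goal quantiles.

    Linear interpolation is used in place of a spline; values outside
    the quantile range are clamped to the outermost goal quantiles.
    """
    return np.interp(values, batch_quantiles, goal_quantiles)

# Step 3: transform the values of each batch with its own mapping.
norm_a = normalize(batch_a, q_a, goal)
norm_b = normalize(batch_b, q_b, goal)

print("median difference before:", abs(np.median(batch_a) - np.median(batch_b)))
print("median difference after: ", abs(np.median(norm_a) - np.median(norm_b)))
```

After the mapping, both batches follow approximately the same distribution, so the difference between their medians shrinks.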


In [8]:
dataset

AnnData object with n_obs × n_vars = 6000 × 55
    obs: 'file_name', 'reference', 'batch', 'sample_ID'
    var: 'pns', 'png', 'pne', 'channel_numbers'
    layers: 'compensated', 'normalized'