# CytoNormPy - AnnData objects

In this vignette, we showcase a typical analysis workflow using AnnData objects.

First, we import the necessary libraries.

In [1]:
import cytonormpy as cnp

import anndata as ad
import pandas as pd
import os
import numpy as np

from cytonormpy import FCSFile

## AnnData creation

We read each FCS file with CytoNormPy's internal `FCSFile` class and build an AnnData object from it as follows:

In [2]:
def _fcs_to_anndata(input_directory,
                    file,
                    file_no,
                    metadata) -> ad.AnnData:
    """Read one FCS file and return it as an AnnData object."""
    fcs = FCSFile(input_directory = input_directory,
                  file_name = file)
    events = fcs.original_events
    # Repeat the metadata row of this file once per event.
    md_row = metadata.loc[metadata["file_name"] == file, :].to_numpy()
    obs = np.repeat(md_row, events.shape[0], axis = 0)
    var_frame = fcs.channels
    # Prefix each event index with the file number to keep obs names unique.
    obs_frame = pd.DataFrame(
        data = obs,
        columns = metadata.columns,
        index = pd.Index([f"{file_no}-{i}" for i in range(events.shape[0])])
    )
    # Store the events in the "compensated" layer.
    adata = ad.AnnData(
        obs = obs_frame,
        var = var_frame,
        layers = {"compensated": events}
    )
    adata.obs_names_make_unique()
    adata.var_names_make_unique()
    return adata
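
The per-event `obs` table in `_fcs_to_anndata` is built by repeating the single metadata row of a file once per event. A minimal sketch of that step on toy data follows; the metadata table, the file names and the event count are made up for illustration and are not part of the dataset used here.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata table; the values are illustrative only.
metadata = pd.DataFrame({
    "file_name": ["a.fcs", "b.fcs"],
    "reference": ["ref", "other"],
    "batch": [1, 2],
})

n_events = 3   # stands in for events.shape[0]
file_no = 0    # position of the file in the input list

# Select the row of the current file as a (1, n_columns) array ...
md_row = metadata.loc[metadata["file_name"] == "a.fcs", :].to_numpy()
# ... and repeat it once per event.
obs = np.repeat(md_row, n_events, axis=0)

obs_frame = pd.DataFrame(
    data=obs,
    columns=metadata.columns,
    index=pd.Index([f"{file_no}-{i}" for i in range(n_events)]),
)
print(obs_frame)
```

Because the row passes through a mixed-type NumPy array, the resulting columns have `object` dtype.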

In [3]:
input_directory = "../_resources/"
fcs_files = [
    'Gates_PTLG021_Unstim_Control_1.fcs',
    'Gates_PTLG021_Unstim_Control_2.fcs',
    'Gates_PTLG028_Unstim_Control_1.fcs',
    'Gates_PTLG028_Unstim_Control_2.fcs',
    'Gates_PTLG034_Unstim_Control_1.fcs',
    'Gates_PTLG034_Unstim_Control_2.fcs'
]
adatas = []
metadata = pd.read_csv(os.path.join(input_directory, "metadata_sid.csv"))
for file_no, file in enumerate(fcs_files):
    adatas.append(
        _fcs_to_anndata(input_directory, file, file_no, metadata)
    )

dataset = ad.concat(adatas, axis = 0, join = "outer", merge = "same")
dataset.obs = dataset.obs.astype("object")
dataset.var = dataset.var.astype("object")
dataset.obs_names_make_unique()
dataset.var_names_make_unique()

In [4]:
dataset

AnnData object with n_obs × n_vars = 6000 × 55
    obs: 'file_name', 'reference', 'batch', 'sample_ID'
    var: 'pns', 'png', 'pne', 'channel_numbers'
    layers: 'compensated'

## Data setup

We instantiate the `CytoNorm` object and add two components: a transformer that applies an asinh transformation to the data, and a FlowSOM clusterer that assigns the cells to clusters.

In [5]:
cn = cnp.CytoNorm()

t = cnp.AsinhTransformer()
fs = cnp.FlowSOM(n_clusters = 10)

cn.add_transformer(t)
cn.add_clusterer(fs)
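
To illustrate what an asinh transformation does, here is a minimal NumPy sketch. The function names and the cofactor of 5 are assumptions chosen for illustration (a cofactor of 5 is a common choice for mass cytometry data); they are not taken from the `cnp.AsinhTransformer` implementation.

```python
import numpy as np

def asinh_transform(x, cofactor=5.0):
    """Scale by the cofactor, then apply the inverse hyperbolic sine."""
    return np.arcsinh(np.asarray(x) / cofactor)

def asinh_inverse(y, cofactor=5.0):
    """Undo asinh_transform: apply sinh, then rescale by the cofactor."""
    return np.sinh(np.asarray(y)) * cofactor

raw = np.array([0.0, 5.0, 100.0, 10000.0])
transformed = asinh_transform(raw)
restored = asinh_inverse(transformed)

print(transformed)
print(restored)
```

The transformation is roughly linear near zero and roughly logarithmic for large values, which compresses the wide dynamic range of cytometry intensities.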

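Next, we run the `run_anndata_setup()` method. It takes the AnnData object, the layer that holds the input data (`layer`) and the name of the layer that will store the normalized values (`key_added`).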

In [6]:
cn.run_anndata_setup(dataset,
                     layer = "compensated",
                     key_added = "normalized")

## Clustering

We run the FlowSOM clustering and pass a `cluster_cv_threshold` of 2. This threshold on the coefficient of variation (CV) is used to check whether the files are distributed evenly enough within each cluster. A warning is raised for clusters where that is not the case.

In [7]:
cn.run_clustering(cluster_cv_threshold = 2)
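
The idea behind such a check can be sketched as follows, assuming that the CV is computed per cluster from the fraction of each file's cells assigned to that cluster. This definition, the use of the sample standard deviation and the toy assignments below are assumptions for illustration; the exact computation inside `run_clustering()` may differ.

```python
import pandas as pd

# Hypothetical cluster assignments for the events of three files.
assignments = pd.DataFrame({
    "file_name": ["f1"] * 4 + ["f2"] * 4 + ["f3"] * 4,
    "cluster":   [0, 0, 1, 1,  0, 0, 1, 1,  0, 1, 1, 1],
})

# Fraction of each file's events per cluster (rows: files, columns: clusters).
fractions = pd.crosstab(
    assignments["file_name"], assignments["cluster"], normalize="index"
)

# Coefficient of variation per cluster across files: std / mean.
cv = fractions.std(axis=0, ddof=1) / fractions.mean(axis=0)

threshold = 2
flagged = cv[cv > threshold].index.tolist()
print(cv)
print("clusters above threshold:", flagged)
```

A cluster that is dominated by cells from a single file yields a high CV, which suggests that the cluster captures a batch effect rather than a cell population.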

## Calculation

Finally, we calculate the quantiles per batch and cluster, fit the spline functions and normalize the expression values accordingly.

The data will automatically be saved to the AnnData object in the layer "normalized". To change the layer name, use the keyword `key_added` in the `run_anndata_setup()` method from above.

In [8]:
cn.calculate_quantiles()
cn.calculate_splines(goal = "batch_mean")
cn.normalize_data()

normalized file Gates_PTLG028_Unstim_Control_2.fcs
normalized file Gates_PTLG021_Unstim_Control_2.fcs
normalized file Gates_PTLG034_Unstim_Control_2.fcs
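
The quantile-based normalization can be illustrated with a small NumPy sketch for one channel in one cluster. It assumes that the goal distribution is the mean of the batch quantiles, and it uses linear interpolation (`np.interp`) as a simple stand-in for the spline functions; the simulated data, the number of quantiles and the helper `normalize` are hypothetical and not part of CytoNormPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reference data of one channel in one cluster for two batches.
batch_values = {
    1: rng.normal(loc=1.0, scale=1.0, size=1000),
    2: rng.normal(loc=2.0, scale=1.5, size=1000),
}

# Quantiles per batch.
probs = np.linspace(0.01, 0.99, 99)
quantiles = {b: np.quantile(v, probs) for b, v in batch_values.items()}

# Goal distribution: mean of the batch quantiles (assumption).
goal = np.mean(np.stack(list(quantiles.values())), axis=0)

def normalize(values, batch):
    """Map values of one batch onto the goal distribution.

    Linear interpolation between quantiles replaces the spline here;
    values outside the quantile range are clamped to the end points.
    """
    return np.interp(values, quantiles[batch], goal)

normalized = {b: normalize(v, b) for b, v in batch_values.items()}

for b in batch_values:
    print(b, np.median(batch_values[b]), np.median(normalized[b]))
```

After the mapping, the medians of the two batches coincide closely, while they differ by about one unit before.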


In [9]:
dataset

AnnData object with n_obs × n_vars = 6000 × 55
    obs: 'file_name', 'reference', 'batch', 'sample_ID'
    var: 'pns', 'png', 'pne', 'channel_numbers'
    layers: 'compensated', 'normalized'

To run the algorithm on new data, we pass the updated AnnData object and specify the file names and batches of the new files.

We first create a new AnnData object that contains an additional file.

In [10]:
filename = "Gates_PTLG034_Unstim_Control_2_dup.fcs"
metadata = pd.DataFrame(
    data = [[filename, "other", 3]],
    columns = ["file_name", "reference", "batch"]
)
new_adata = _fcs_to_anndata(input_directory, filename, 7, metadata)

dataset = ad.concat([dataset, new_adata], axis = 0, join = "outer")
dataset

AnnData object with n_obs × n_vars = 7000 × 55
    obs: 'file_name', 'reference', 'batch', 'sample_ID'
    layers: 'compensated', 'normalized'

Currently, all 'normalized' values for the new file are NaN:

In [11]:
dataset[dataset.obs["file_name"] == filename,:].to_df(layer = "normalized").head()

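| | Time | Event_length | Y89Di | Pd102Di | Pd104Di | Pd105Di | Pd106Di | Pd108Di | Pd110Di | In113Di | ... | Yb171Di | Yb172Di | Yb173Di | Yb174Di | Lu175Di | Yb176Di | Ir191Di | Ir193Di | Pt195Di | beadDist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7-0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7-1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7-2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7-3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 7-4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |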


In [12]:
cn.normalize_data(adata = dataset,
                  file_names = filename,
                  batches = 3)

normalized file Gates_PTLG034_Unstim_Control_2_dup.fcs


The normalized values are now stored in place in the "normalized" layer.

In [13]:
dataset[dataset.obs["file_name"] == filename,:].to_df(layer = "normalized").head()

| | Time | Event_length | Y89Di | Pd102Di | Pd104Di | Pd105Di | Pd106Di | Pd108Di | Pd110Di | In113Di | ... | Yb171Di | Yb172Di | Yb173Di | Yb174Di | Lu175Di | Yb176Di | Ir191Di | Ir193Di | Pt195Di | beadDist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7-0 | 134.582993 | 16.0 | 0.0 | 7.228584 | 7.189367 | 71.29483 | 5.702826 | 104.989067 | 98.768669 | 0.0 | ... | 0.0 | 2.360246 | 0.0 | 2.092115 | 0.883527 | 23.012224 | 36.423241 | 115.555214 | 0.0 | 30.672935 |
| 7-1 | 307.86499 | 25.0 | 0.002206 | 12.507555 | 9.873809 | 163.776979 | -58890.808302 | 257.224193 | 95.971925 | 0.015925 | ... | 8.336418 | 2.261871 | 44.503762 | 292.58863 | 27.54992 | 9.856425 | 45.391734 | 55.241609 | 0.0 | 24.536996 |
| 7-2 | 370.299011 | 13.0 | 0.003463 | 36.799025 | 13.417882 | 211.015165 | 20.976627 | 276.136718 | 149.921257 | 0.004231 | ... | 7.125834 | 91.484564 | 2.062176 | 0.01485 | 0.014355 | 0.868086 | 123.887066 | 262.643249 | 0.00123 | 36.182745 |
| 7-3 | 390.078003 | 25.0 | 0.002691 | 3.249339 | 6.472832 | 135.29266 | 3.016704 | 168.964218 | 1647.904436 | 0.000168 | ... | 2.134535 | 2.635778 | 45.804745 | 7.486548 | 0.000412 | 16.518124 | 78.197299 | 151.034121 | 0.0 | 33.435956 |
| 7-4 | 723.723999 | 15.0 | 0.0 | 4.033677 | 0.0 | 23.49243 | 0.0 | 48.940914 | 30.778446 | 3.79425 | ... | 0.0 | 0.0 | 0.0 | 0.18023 | 0.0 | 3.118176 | 4.195136 | 9.201713 | 0.0 | 31.036688 |


In [14]:
dataset

AnnData object with n_obs × n_vars = 7000 × 55
    obs: 'file_name', 'reference', 'batch', 'sample_ID'
    layers: 'compensated', 'normalized'