# Zebrafish pigmentation

This tutorial uses data from [Saunders et al. (2019)](https://elifesciences.org/articles/45181). Special thanks go to [Lauren](https://twitter.com/LSaund11) for helping improve this tutorial.

In this [study](https://elifesciences.org/articles/45181), the authors profiled thousands of neural crest-derived cells from the trunks of post-embryonic zebrafish. The cell classes include pigment cells, multipotent pigment cell progenitors, peripheral neurons, Schwann cells, chromaffin cells and others. The cells were collected during an active period of post-embryonic development, when many of these cell types are migrating and differentiating as the animal transitions into its adult form. This period has many similarities to fetal and neonatal development in mammals. The study also explores the role of thyroid hormone (TH), a common endocrine factor, in the development of these different cell types.

Developmental and other dynamical processes like these are especially well suited to dynamo, which is designed to accurately estimate the direction and magnitude of expression dynamics (`RNA velocity`), predict the entire lineage trajectory from any initial cell state (`vector field`), characterize the structure of the full gene expression space (`vector field topology`), and quantify fate commitment potential (`single cell potential`).

Import the package and silence some warnings (mostly the `is_categorical_dtype` warning from anndata).

In [1]:
import warnings
warnings.filterwarnings('ignore')

import dynamo as dyn 
from dynamo.configuration import DKM
import numpy as np

This is similar to R's `sessionInfo()` and helps you debug version-related issues, if any.

In [2]:
dyn.get_all_dependencies_version()

| package | version |
|---|---|
| dynamo-release | 1.0.0 |
| pre-commit | 2.15.0 |
| colorcet | 2.0.6 |
| cvxopt | 1.2.7 |
| hdbscan | 0.8.27 |
| loompy | 3.0.6 |
| matplotlib | 3.4.3 |
| networkx | 2.6.3 |
| numba | 0.54.0 |
| numdifftools | 0.9.40 |
| numpy | 1.20.3 |
| pandas | 1.3.3 |
| pynndescent | 0.5.4 |
| python-igraph | 0.9.6 |
| scikit-learn | 0.24.2 |
| scipy | 1.7.1 |
| seaborn | 0.11.2 |
| setuptools | 58.0.4 |
| statsmodels | 0.12.2 |
| tqdm | 4.62.3 |
| trimap | 1.0.15 |
| umap-learn | 0.5.1 |
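
If you need a similar report for packages that dynamo does not list, the standard library can produce one. The following is a minimal sketch using `importlib.metadata`; the helper name `collect_versions` and the package names passed to it are illustrative assumptions, not part of dynamo.

```python
from importlib import metadata


def collect_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Hypothetical package list; the last entry is deliberately bogus.
report = collect_versions(["numpy", "scipy", "definitely-not-a-real-package-xyz"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Returning `None` for missing packages, rather than raising, keeps the report usable when only part of the environment is installed.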


## Load data

In [3]:
adata = dyn.sample_data.zebrafish()


|-----> Downloading data to ./data/zebrafish.h5ad


## Run Pearson residual functions

In [4]:
print(adata.X.data.mean())
print(adata.X.data.std())

2.93308
7.64057
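
A note on reading these numbers: the `.data` attribute suggests that `adata.X` is a SciPy sparse matrix, in which case `.data` holds only the explicitly stored entries (usually the nonzero values). Under that assumption, the mean printed above is a mean over stored entries, not over every cell-gene pair. The toy matrix below is made up purely to illustrate the difference.

```python
import numpy as np
from scipy import sparse

# Toy 3x3 count matrix with three nonzero entries (illustrative only).
dense = np.array([[0, 0, 4],
                  [2, 0, 0],
                  [0, 6, 0]], dtype=float)
X = sparse.csr_matrix(dense)

# Mean over the stored entries only: (4 + 2 + 6) / 3
mean_stored = X.data.mean()

# Mean over all 9 entries, zeros included: 12 / 9
mean_all = X.mean()

print(mean_stored, mean_all)
```

Comparing `X.data` statistics before and after a step is still a quick way to check whether that step modified the matrix in place, which is how the values are used in this tutorial.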


In [5]:
import pearson_residual_normalization_recipe

In [6]:
pearson_residual_normalization_recipe.select_genes_by_pearson_residual(adata)

|-----> Gene selection and normalization on layer: X
|-----> extracting highly variable genes


In [7]:
print(adata.X.data.mean())
print(adata.X.data.std())

2.93308
7.6405706


In [8]:
adata.uns["pp"].keys()

dict_keys(['hvg'])

## Apply Pearson residual normalization

Apply Pearson residual normalization to different layers according to `select_genes_key`. Other gene selection functions can also be combined with Pearson residual normalization.

In [13]:
pearson_residual_normalization_recipe.normalize_pearson_residuals(adata, select_genes_key=DKM.VAR_GENE_HIGHLY_VARIABLE_KEY)
pearson_residual_normalization_recipe.normalize_pearson_residuals(adata, select_genes_key=DKM.VAR_USE_FOR_PCA)
pearson_residual_normalization_recipe.normalize_layers_pearson_residuals(adata, layers=["spliced", "unspliced"], select_genes_key=DKM.VAR_USE_FOR_PCA)

|-----> normalize with selected genes.
|-----> applying Pearson residuals to X
|-----> replacing layer <X> with pearson residual normalized data.


pp pearson store key: X_pearson_residuals_normalization_params


|-----> [pearson residual normalization] in progress: 100.0000%
|-----> [pearson residual normalization] finished [1.9587s]
|-----> normalize with selected genes.
|-----> applying Pearson residuals to X


pp pearson store key: X_pearson_residuals_normalization_params


|-----> replacing layer <X> with pearson residual normalized data.
|-----> [pearson residual normalization] in progress: 100.0000%
|-----> [pearson residual normalization] finished [1.2813s]
|-----> normalize with selected genes.
|-----> applying Pearson residuals to spliced


pp pearson store key: spliced_pearson_residuals_normalization_params


|-----> replacing layer <spliced> with pearson residual normalized data.
|-----> [pearson residual normalization] in progress: 100.0000%
|-----> [pearson residual normalization] finished [0.4625s]
|-----> normalize with selected genes.
|-----> applying Pearson residuals to unspliced


pp pearson store key: unspliced_pearson_residuals_normalization_params


|-----> replacing layer <unspliced> with pearson residual normalized data.
|-----> [pearson residual normalization] in progress: 100.0000%
|-----> [pearson residual normalization] finished [0.4915s]
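
For intuition about what the calls above compute, here is a minimal NumPy sketch of analytic Pearson residuals under a negative binomial null model, the general technique that this kind of normalization is based on. It is not dynamo's implementation: the function name, the overdispersion default `theta=100`, and the clipping at the square root of the number of cells are assumptions chosen for illustration, and the recipe used in this tutorial may differ in defaults and in how it handles sparse input.

```python
import numpy as np


def pearson_residuals(counts, theta=100.0, clip=None):
    """Analytic Pearson residuals for a cells x genes count matrix.

    Assumes every cell and every gene has at least one count, so the
    expected value mu is strictly positive everywhere.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    cell_sums = counts.sum(axis=1, keepdims=True)   # shape (n_cells, 1)
    gene_sums = counts.sum(axis=0, keepdims=True)   # shape (1, n_genes)

    # Expected count if each gene took a fixed share of every cell's depth.
    mu = cell_sums @ gene_sums / total

    # Negative binomial variance: mu + mu^2 / theta.
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)

    # Clip extreme residuals; sqrt(n_cells) is an assumed default here.
    if clip is None:
        clip = np.sqrt(counts.shape[0])
    return np.clip(residuals, -clip, clip)


# Toy data: 50 cells x 8 genes of Poisson counts (made up for illustration).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=2.0, size=(50, 8))

res = pearson_residuals(toy_counts)

# Genes can then be ranked by the variance of their residuals.
residual_variance = res.var(axis=0)
top_genes = np.argsort(residual_variance)[::-1][:3]
print(res.shape, top_genes)
```

Ranking genes by residual variance, as in the last lines, is one common way to pick highly variable genes with this model; whether the selection function used earlier ranks genes exactly this way is an assumption.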


In [10]:
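# Optionally reload the data first; the cells above already replaced X and
# the spliced/unspliced layers with Pearson residual normalized values.
# adata = dyn.sample_data.zebrafish()
from dynamo.preprocessing import Preprocessor

# Plug the Pearson residual recipe into the preprocessing pipeline:
# one function selects genes, the other normalizes the layers.
preprocessor = Preprocessor(
    select_genes_function=pearson_residual_normalization_recipe.select_genes_by_pearson_residual,
    normalize_selected_genes_function=pearson_residual_normalization_recipe.normalize_layers_pearson_residuals,
)
preprocessor.preprocess_adata(adata)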


|-----> Running preprocessing pipeline...
|-----------> <insert> {} to uns['pp'] in AnnData Object.
|-----------> <insert> tkey=None to uns['pp'] in AnnData Object.
|-----------> <insert> experiment_type=conventional to uns['pp'] in AnnData Object.
|-----> making adata observation index unique...
|-----> applying collapse species adata...
|-----> applying convert_gene_name function...
|-----> filtering outlier cells...
|-----------> filtering cells by layer:X
|-----------> filtering cells by layer:spliced
|-----------> filtering cells by layer:unspliced
|-----> skip filtering by layer:protein as it is not in adata.
|-----> <insert> pass_basic_filter to obs in AnnData Object.
|-----------> inplace subsetting adata by filtered genes
|-----> filtering outlier genes...
|-----> applying normalizing by cells function...
|-----> selecting genes...
|-----> Gene selection and normalization on layer: X
|-----> extracting highly variable genes
|-----> normalizing selected genes...
|-----> applyin

pp pearson store key: spliced_pearson_residuals_normalization_params


|-----> replacing layer <spliced> with pearson residual normalized data.
|-----> [pearson residual normalization] in progress: 100.0000%
|-----> [pearson residual normalization] finished [2.5014s]
|-----> applying Pearson residuals to unspliced


pp pearson store key: unspliced_pearson_residuals_normalization_params


|-----> replacing layer <unspliced> with pearson residual normalized data.
|-----> [pearson residual normalization] in progress: 100.0000%
|-----> [pearson residual normalization] finished [2.5837s]
|-----> applying log1p transformation on expression matrix data (adata.X)...
|-----> applying filter genes function...
|-----> Gene selection and normalization on layer: X
|-----> extracting highly variable genes
|-----> appended 0 extra genes as required...
|-----> excluded 0 genes as required...
