# Zebrafish pigmentation

This tutorial uses data from [Saunders et al. (2019)](https://elifesciences.org/articles/45181). Special thanks go to [Lauren](https://twitter.com/LSaund11) for helping improve the tutorial.

In this [study](https://elifesciences.org/articles/45181), the authors profiled thousands of neural crest-derived cells from trunks of post-embryonic zebrafish. These cell classes include pigment cells, multipotent pigment cell progenitors, peripheral neurons, Schwann cells, chromaffin cells and others. The cells were collected during an active period of post-embryonic development, when many of these cell types are migrating and differentiating as the animal transitions into its adult form. This period has many similarities to fetal and neonatal development in mammals. The study also explores the role of thyroid hormone (TH), a common endocrine factor, in the development of these different cell types.

Such developmental and other dynamic processes are especially suitable for dynamo analysis, as dynamo is designed to accurately estimate the direction and magnitude of expression dynamics (`RNA velocity`), predict the entire lineage trajectory of any initial cell state (`vector field`), and characterize the structure of the full gene expression space (`vector field topology`) as well as fate commitment potential (`single cell potential`).
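To make the idea of RNA velocity concrete, here is a minimal sketch of the textbook steady-state model for a single gene. It is a simplified illustration, not dynamo's estimator: the function name `steady_state_velocity` and the toy counts are hypothetical, the splicing rate is fixed to 1, and the degradation rate `gamma` is fitted as a least-squares slope through the origin.

```python
import numpy as np

def steady_state_velocity(unspliced, spliced):
    """Toy steady-state RNA velocity for one gene (not dynamo's estimator).

    Fits gamma as the least-squares slope through the origin of
    unspliced vs. spliced counts, then returns
    velocity = unspliced - gamma * spliced (splicing rate fixed to 1).
    """
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    gamma = float(u @ s) / float(s @ s)
    velocity = u - gamma * s
    return gamma, velocity

# Hypothetical counts for one gene across four cells
spliced = np.array([1.0, 2.0, 3.0, 4.0])
unspliced = np.array([0.5, 1.5, 1.0, 2.0])
gamma, velocity = steady_state_velocity(unspliced, spliced)
print(gamma, velocity)
```

Cells with positive velocity have more unspliced RNA than the fitted steady state predicts (the gene is being induced), and cells with negative velocity have less (the gene is being repressed).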

Import the package and silence warnings (mostly the `is_categorical_dtype` warning from anndata).

In [5]:
import warnings
warnings.filterwarnings('ignore')

import dynamo as dyn 
from dynamo.configuration import DKM
import numpy as np

Printing the installed package versions is like R's `sessionInfo()`; it helps you debug version-related bugs, if any.
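A generic way to collect such version information uses only the standard library. This is a sketch, not dynamo's own reporting utility: `report_versions` is a hypothetical helper, and the distribution names passed to it are assumptions about a typical dynamo environment.

```python
from importlib import metadata

def report_versions(packages):
    """Return {package: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Distribution names are assumptions; adjust them to your environment.
for name, version in report_versions(["dynamo-release", "anndata", "numpy", "scipy"]).items():
    print(f"{name}: {version or 'not installed'}")
```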

## Load data

In [6]:
adata = dyn.sample_data.pancreatic_endocrinogenesis()
adata_origin = dyn.sample_data.pancreatic_endocrinogenesis()

|-----> Downloading data to ./data/endocrinogenesis_day15.h5ad
|-----> Downloading data to ./data/endocrinogenesis_day15.h5ad


In [7]:
# Summary statistics of the expression matrix and of the spliced/unspliced layers
print(adata.X.min(), adata.X.max(), adata.X.mean())
# Densifying a layer allocates a full cells x genes array and can exhaust memory
adata.layers["spliced"] = adata.layers["spliced"].toarray()
temp = adata.layers["spliced"][~np.isnan(adata.layers["spliced"])]
print("spliced data")
print(temp.min(), temp.max(), temp.mean(), temp.std())
adata.layers["unspliced"] = adata.layers["unspliced"].toarray()
temp = adata.layers["unspliced"][~np.isnan(adata.layers["unspliced"])]
print("unspliced data")
print(temp.min(), temp.max(), temp.mean(), temp.std())

0.0 2286.0 0.23841056


MemoryError: Unable to allocate 395. MiB for an array with shape (103480608,) and data type float32
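The `MemoryError` above appears to come from materializing a layer as a dense array. The same summary statistics can be computed directly on the sparse matrix. The sketch below is one way to do that, assuming the layer is a SciPy sparse matrix with no NaN entries; `sparse_summary` is a hypothetical helper and the random matrix stands in for a real layer.

```python
import numpy as np
from scipy import sparse

def sparse_summary(matrix):
    """Min, max, mean and std over all entries (zeros included) of a sparse matrix."""
    # Work in float64 so sums over many float32 entries stay accurate
    matrix = sparse.csr_matrix(matrix, dtype=np.float64)
    n_entries = matrix.shape[0] * matrix.shape[1]
    mean = matrix.sum() / n_entries
    mean_of_squares = matrix.multiply(matrix).sum() / n_entries
    # Population standard deviation, matching numpy's default (ddof=0)
    std = np.sqrt(max(mean_of_squares - mean**2, 0.0))
    return matrix.min(), matrix.max(), mean, std

# Toy stand-in for a layer such as adata.layers["spliced"]
toy = sparse.random(200, 50, density=0.1, format="csr", random_state=0, dtype=np.float32)
print(sparse_summary(toy))
```

Only the nonzero entries are copied and squared, so memory use scales with the number of nonzeros rather than with cells times genes.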

## Apply Pearson residual normalization

Pearson residual normalization is applied to different layers according to `select_genes_key`. Other gene-selection functions may also be combined with Pearson residual normalization.
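As a rough illustration of what Pearson residual normalization computes, the sketch below implements the analytic form on a small dense count matrix. It is not dynamo's implementation: `pearson_residuals` is a hypothetical function, and the overdispersion `theta=100` and the clipping at the square root of the number of cells are commonly used defaults assumed here. It also assumes every cell and every gene has at least one count.

```python
import numpy as np

def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals for a dense cells x genes count matrix.

    Assumes no all-zero rows or columns (otherwise the expected count is 0
    and the residual is undefined).
    """
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]
    # Expected count if each gene took a fixed share of every cell's depth
    mu = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / counts.sum()
    # Negative binomial variance: mu + mu^2 / theta
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)
    clip = np.sqrt(n_cells)
    return np.clip(residuals, -clip, clip)

# Toy matrix: 4 cells x 3 genes
toy_counts = np.array([[10, 0, 5], [2, 3, 1], [0, 8, 4], [6, 1, 2]])
toy_residuals = pearson_residuals(toy_counts)
print(toy_residuals)
```

Genes whose counts deviate strongly from the depth-scaled expectation receive large residuals, which is why the variance of the residuals can be used to rank genes for selection.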

In [None]:
# adata = dyn.sample_data.zebrafish()
from dynamo.preprocessing import Preprocessor

preprocessor = Preprocessor()
# Select the Pearson residuals recipe explicitly; without `recipe` the default recipe is used
preprocessor.preprocess_adata(adata, recipe="pearson_residuals")
dyn.tl.reduceDimension(adata, basis="pca")
dyn.pl.umap(adata, color=["S_score", "G2M_score", "clusters"], figsize=(12, 12))
dyn.tl.dynamics(adata, model='stochastic', cores=3)
dyn.pl.streamline_plot(adata, color=['clusters'], basis='umap', show_legend='on data', show_arrowed_spines=True);


In [None]:
adata

In [None]:
dyn.tl.reduceDimension(adata, basis="pca")
dyn.pl.umap(adata, color=["S_score", "G2M_score", "clusters"], figsize=(12, 12))

In [None]:
adata

In [None]:
dyn.tl.dynamics(adata, model='stochastic', cores=3) 

In [None]:
# Genes flagged as highly variable during preprocessing
adata.var_names[adata.var[DKM.VAR_GENE_HIGHLY_VARIABLE_KEY]]
print("#highly variable genes", (~adata.var["highly_variable_rank"].isna()).sum())

# Keep only genes that received a rank (rank is not NaN)
highly_variable_genes = adata.var_names[~adata.var["highly_variable_rank"].isna()]
adata.var.loc[highly_variable_genes, "highly_variable_rank"].sort_values(ascending=False);

In [None]:
dyn.pl.streamline_plot(adata, color=['clusters'], basis='umap', show_legend='on data', show_arrowed_spines=True);


In [None]:
dyn.pl.umap(adata, color=['tfec', 'pnp4a'])

In [None]:
adata[:, highly_variable_genes].var["highly_variable_rank"].sort_values(ascending=False)

In [None]:
dyn.pl.phase_portraits(adata, genes=['Abcb7', 'Hectd3'],  figsize=(6, 4), color='clusters')

In [None]:
# Statistics of adata.X only; `temp` at this point would still hold values from an earlier cell
print(adata.X.min(), adata.X.max(), adata.X.mean())
temp = adata.layers["spliced"][~np.isnan(adata.layers["spliced"])]
print(temp.min(), temp.max(), temp.mean(),  temp.std())
temp = adata.layers["unspliced"][~np.isnan(adata.layers["unspliced"])]
print(temp.min(), temp.max(), temp.mean(), temp.std())

In [None]:
adata.uns["pp"]