# RNA Velocity Basics

Here you will learn the basics of RNA velocity analysis.

For illustration, it is applied to endocrine development in the pancreas, with lineage commitment to four major fates: α, β, δ and ε-cells. <br/> 
See [here](https://scvelo.readthedocs.io/scvelo.datasets.pancreas.html) for more details. It can be applied to your own data along the same lines. 

The notebook is also available at
[Google Colab](https://colab.research.google.com/github/theislab/scvelo_notebooks/blob/master/VelocityBasics.ipynb)
and [nbviewer](https://nbviewer.jupyter.org/github/theislab/scvelo_notebooks/blob/master/VelocityBasics.ipynb).

In [1]:
# update to the latest version, if not done yet.
!pip install scvelo --upgrade --quiet

In [2]:
import scvelo as scv
scv.logging.print_version()

Running scvelo 0.2.4 (python 3.8.12) on 2021-11-03 12:49.


In [3]:
import dynamo
import dynamo as dyn
from dynamo.preprocessing import Preprocessor

dyn_adata = dyn.sample_data.zebrafish()
print("original data shape:", dyn_adata.shape)

preprocessor = Preprocessor()
preprocessor.config_monocle_recipe(dyn_adata) # use monocle as default base config
preprocessor.config_seurat_recipe()
preprocessor.preprocess_adata(dyn_adata);

|-----> Downloading data to ./data/zebrafish.h5ad
|-----------> <insert> {} to uns['pp'] in AnnData Object.
|-----> Running preprocessing pipeline...
|-----------> <insert> {} to uns['pp'] in AnnData Object.
|-----------> <insert> tkey=None to uns['pp'] in AnnData Object.
|-----------> <insert> experiment_type=None to uns['pp'] in AnnData Object.
|-----> making adata observation index unique...
|-----> applying collapse species adata...
|-----> applying convert_gene_name function...
|-----> filtering outlier cells...
|-----> cell filter kwargs:{'filter_bool': None, 'layer': 'all', 'min_expr_genes_s': 50, 'min_expr_genes_u': 25, 'min_expr_genes_p': 2, 'max_expr_genes_s': inf, 'max_expr_genes_u': inf, 'max_expr_genes_p': inf, 'shared_count': None}
|-----------> filtering cells by layer:X
|-----------> filtering cells by layer:spliced


original data shape: (4181, 16940)


|-----------> filtering cells by layer:unspliced
|-----> skip filtering by layer:protein as it is not in adata.
|-----> <insert> pass_basic_filter to obs in AnnData Object.
|-----------> inplace subsetting adata by filtered genes
|-----> filtering outlier genes...
|-----> gene filter kwargs:{'filter_bool': None, 'layer': 'all', 'min_cell_s': 41.81, 'min_cell_u': 20.905, 'min_cell_p': 20.905, 'min_avg_exp_s': 0, 'min_avg_exp_u': 0, 'min_avg_exp_p': 0, 'max_avg_exp': inf, 'min_count_s': 0, 'min_count_u': 0, 'min_count_p': 0, 'shared_count': 30}
|-----> selecting genes...
|-----> filtering genes by dispersion...
|-----> select genes by recipe: seurat
|-----------> choose 2000 top genes
|-----> <insert> pp_gene_means to var in AnnData Object.
|-----> <insert> gene_vars to var in AnnData Object.
|-----> <insert> gene_highly_variable to var in AnnData Object.
|-----> number of selected highly variable genes: 2000
|-----> [filter genes by dispersion] in progress: 100.0000%
|-----> [filter gen

preprocessor pca inputs:
count    463787.000000
mean          0.894726
std           0.772611
min           0.117937
25%           0.379343
50%           0.665709
75%           1.099282
max           6.724962
dtype: float64


|-----> [preprocess] in progress: 100.0000%
|-----> [preprocess] finished [2.9535s]
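The log above reports `filtering genes by dispersion`, `select genes by recipe: seurat` and `choose 2000 top genes`. scvelo's `flavor="seurat"` option in the next section also refers to Seurat-style selection. The idea is to rank genes by dispersion (variance / mean) after normalising that dispersion within bins of similar mean expression, so highly expressed genes do not dominate. The sketch below is a simplified illustration under that assumption. `seurat_style_hvg` is a hypothetical helper, not dynamo's or scvelo's implementation, and their exact binning and edge-case handling may differ.

```python
import numpy as np


def seurat_style_hvg(X, n_top_genes, n_bins=20):
    """Rank genes by mean-binned, z-scored log dispersion (simplified sketch).

    X: cells x genes matrix of normalised, non-log expression.
    Returns the column indices of the top-ranked genes.
    """
    mean = X.mean(axis=0)
    var = X.var(axis=0, ddof=1)
    # Log dispersion; zero-mean genes become NaN and are never selected.
    with np.errstate(divide="ignore", invalid="ignore"):
        dispersion = np.log(var / mean)
    log_mean = np.log1p(mean)

    # Assign each gene to one of n_bins equal-width bins of log mean expression.
    edges = np.linspace(log_mean.min(), log_mean.max(), n_bins + 1)
    bins = np.clip(np.digitize(log_mean, edges[1:-1]), 0, n_bins - 1)

    # z-score the dispersion within each bin.
    norm_disp = np.full_like(dispersion, np.nan)
    for b in np.unique(bins):
        in_bin = bins == b
        d = dispersion[in_bin]
        finite = np.isfinite(d)
        if finite.sum() < 2:
            continue  # sketch: skip bins too small to estimate a spread
        mu, sd = d[finite].mean(), d[finite].std(ddof=1)
        if sd > 0:
            norm_disp[in_bin] = (d - mu) / sd

    # Genes without a finite score rank last.
    score = np.where(np.isfinite(norm_disp), norm_disp, -np.inf)
    return np.argsort(-score)[:n_top_genes]


# Toy data: two groups of genes with mean 5 and mean 1, each containing one "bursty" gene.
n_cells = 100
alt = np.tile([0.0, 1.0], n_cells // 2)
X = np.zeros((n_cells, 42))
X[:, :20] = 4.0 + 2.0 * alt[:, None]   # mean 5, low variance
X[:, 20:40] = 2.0 * alt[:, None]       # mean 1, low variance
X[:25, 40] = 20.0                      # mean 5, high variance
X[:10, 41] = 10.0                      # mean 1, high variance
top = seurat_style_hvg(X, n_top_genes=2)
print(sorted(top.tolist()))  # [40, 41]
```

Binning by mean is what lets gene 41 (low mean, high relative variance) compete with gene 40, even though its raw dispersion is smaller.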


In [4]:
scv.settings.verbosity = 3  # show errors(0), warnings(1), info(2), hints(3)
scv.settings.presenter_view = True  # set max width size for presenter view
scv.set_figure_params('scvelo')  # for beautified visualization


### Load the Data

The scvelo tutorial is based on the in-built [pancreas data](https://scvelo.readthedocs.io/scvelo.datasets.pancreas).<br/>
This notebook loads the zebrafish sample data shipped with dynamo instead, so that the scvelo preprocessing below can be compared with the dynamo preprocessing run above.
To run velocity analysis on your own data, read your file (loom, h5ad, csv, …) into an AnnData object with `adata = scv.read('path/file.loom', cache=True)`. To merge your loom file into an already existing AnnData object, use `scv.utils.merge(adata, adata_loom)`.
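Merging only works for cells and genes present in both objects, so it helps to see what "aligning on shared names" means. The sketch below is a minimal illustration with plain NumPy arrays and name lists. It is not `scv.utils.merge`, and `align_on_shared_names` is a hypothetical helper. In practice, barcode formats often differ between the loom file and the existing object (for example sample prefixes), so cell names may need cleaning before they overlap at all.

```python
import numpy as np


def align_on_shared_names(obs_a, var_a, X_a, obs_b, var_b, X_b):
    """Subset two cell x gene matrices to their shared cells and genes.

    Hypothetical helper: keeps the ordering of the first object and
    reorders the second object to match it.
    """
    obs_a_pos = {name: i for i, name in enumerate(obs_a)}
    var_a_pos = {name: j for j, name in enumerate(var_a)}
    obs_b_pos = {name: i for i, name in enumerate(obs_b)}
    var_b_pos = {name: j for j, name in enumerate(var_b)}

    shared_obs = [name for name in obs_a if name in obs_b_pos]
    shared_var = [name for name in var_a if name in var_b_pos]

    rows_a = [obs_a_pos[n] for n in shared_obs]
    cols_a = [var_a_pos[n] for n in shared_var]
    rows_b = [obs_b_pos[n] for n in shared_obs]
    cols_b = [var_b_pos[n] for n in shared_var]

    # np.ix_ selects the row/column sub-grid in the requested order.
    return (shared_obs, shared_var,
            X_a[np.ix_(rows_a, cols_a)], X_b[np.ix_(rows_b, cols_b)])


# Toy example: an existing expression matrix and a loom-derived spliced matrix.
obs_existing = ["cellA", "cellB", "cellC"]
var_existing = ["g1", "g2", "g3"]
X_existing = np.arange(9).reshape(3, 3)

obs_loom = ["cellC", "cellA", "cellD"]
var_loom = ["g3", "g1"]
spliced_loom = np.array([[10, 11], [20, 21], [30, 31]])

cells, genes, X_aligned, spliced_aligned = align_on_shared_names(
    obs_existing, var_existing, X_existing, obs_loom, var_loom, spliced_loom
)
print(cells, genes)           # ['cellA', 'cellC'] ['g1', 'g3']
print(spliced_aligned.tolist())  # [[21, 20], [11, 10]]
```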

In [5]:
adata = dynamo.sample_data.zebrafish()

|-----> Downloading data to ./data/zebrafish.h5ad


### Preprocess the Data with scvelo

In [6]:
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000, flavor="seurat")
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)


Filtered out 11388 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Extracted 2000 highly variable genes.
Logarithmized X.
computing neighbors
    finished (0:00:02) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:00) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
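The first part of the log reflects the steps of `scv.pp.filter_and_normalize`: removing genes with too few shared counts, normalising X, spliced and unspliced per cell, selecting highly variable genes, and log-transforming X only. The sketch below illustrates the filtering and normalisation steps with NumPy. It assumes that a "shared" count is one where the cell has both spliced and unspliced reads for the gene, that a gene is kept when its summed shared counts reach `min_shared_counts`, and that each cell is scaled to the median total count. `shared_count_filter` and `normalize_per_cell` are hypothetical names, not scvelo functions.

```python
import numpy as np


def shared_count_filter(spliced, unspliced, min_shared_counts):
    """Boolean gene mask: keep genes whose shared counts reach the threshold."""
    # A count is "shared" only in cells with both spliced and unspliced reads.
    both = (spliced > 0) & (unspliced > 0)
    shared = np.where(both, spliced + unspliced, 0).sum(axis=0)
    return shared >= min_shared_counts


def normalize_per_cell(counts, target=None):
    """Scale each cell (row) to the same total; defaults to the median total."""
    totals = counts.sum(axis=1, keepdims=True)
    if target is None:
        target = np.median(totals)
    # Guard against empty cells to avoid division by zero.
    return counts / np.where(totals > 0, totals, 1) * target


spliced = np.array([[3, 0, 1],
                    [2, 5, 0],
                    [1, 0, 0]], dtype=float)
unspliced = np.array([[1, 4, 0],
                      [2, 0, 0],
                      [0, 0, 2]], dtype=float)

# Gene 1 has 9 counts in total but none shared, so it is dropped.
keep = shared_count_filter(spliced, unspliced, min_shared_counts=5)
print(keep)  # [ True False False]

norm = normalize_per_cell(spliced)   # each row now sums to the median total (4)
X_log = np.log1p(norm)               # in this sketch, only X is log-transformed
```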


In [10]:
print(adata.obsm["X_pca"].shape)
print(dyn_adata.obsm["X_pca"].shape)

(4181, 30)
(4167, 30)


AnnData object with n_obs × n_vars = 4181 × 2000
    obs: 'split_id', 'sample', 'Size_Factor', 'condition', 'Cluster', 'Cell_type', 'umap_1', 'umap_2', 'batch', 'initial_size_spliced', 'initial_size_unspliced', 'initial_size', 'n_counts'
    var: 'gene_count_corr', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'pca', 'neighbors'
    obsm: 'X_pca'
    varm: 'PCs'
    layers: 'spliced', 'unspliced', 'Ms', 'Mu'
    obsp: 'distances', 'connectivities'
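The `Ms` and `Mu` layers listed above are first-order moments: each cell's spliced and unspliced values averaged over its neighbours in the kNN graph stored in `adata.obsp['connectivities']`. The simplified sketch below shows the idea. It finds neighbours by exact Euclidean distance in PCA space, counts each cell as its own neighbour, and weights all neighbours equally. scvelo builds and weights its graph differently, so treat this as an illustration. `knn_moments` is a hypothetical helper.

```python
import numpy as np


def knn_moments(pcs, layer, n_neighbors):
    """Average each cell's values over its n_neighbors nearest cells (incl. itself)."""
    n_cells = pcs.shape[0]
    # Pairwise squared Euclidean distances in PCA space.
    sq = (pcs ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * pcs @ pcs.T
    np.fill_diagonal(d2, -np.inf)  # make each cell its own first neighbour
    nn = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Binary kNN connectivity, row-normalised so each row averages its neighbours.
    conn = np.zeros((n_cells, n_cells))
    rows = np.repeat(np.arange(n_cells), n_neighbors)
    conn[rows, nn.ravel()] = 1.0
    conn /= conn.sum(axis=1, keepdims=True)
    return conn @ layer


# Two tight pairs of cells in a 1-D "PCA" space.
pcs = np.array([[0.0], [0.1], [5.0], [5.1]])
spliced = np.array([[2.0, 0.0], [4.0, 0.0], [0.0, 6.0], [0.0, 8.0]])
Ms = knn_moments(pcs, spliced, n_neighbors=2)
print(Ms.tolist())  # [[3.0, 0.0], [3.0, 0.0], [0.0, 7.0], [0.0, 7.0]]
```

The same function applied to the unspliced layer would give the analogue of `Mu`. Smoothing over neighbours reduces the sparsity of single-cell counts before velocities are estimated.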

In [17]:
import pandas as pd

# The two objects keep different numbers of cells (4181 vs. 4167), so their PC
# coordinates cannot be plotted against each other element-wise; compare
# summary statistics of the flattened PC coordinates instead.
print("scvelo seurat X_pca stats:")
print(pd.Series(adata.obsm["X_pca"].flatten()).describe())
print("dynamo seurat X_pca stats:")
pd.Series(dyn_adata.obsm["X_pca"].flatten()).describe()

scvelo seurat X_pca stats:
count    1.254300e+05
mean    -2.070514e-07
std      1.605636e+00
min     -8.343900e+00
25%     -6.412283e-01
50%     -2.956188e-02
75%      5.880693e-01
max      1.131858e+01
dtype: float64
dynamo seurat X_pca stats:


count    1.250100e+05
mean     2.404593e-07
std      1.253125e+00
min     -5.792617e+00
25%     -4.875923e-01
50%     -1.182934e-02
75%      4.618027e-01
max      1.245750e+01
dtype: float64
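The counts differ (125430 = 4181 × 30 vs. 125010 = 4167 × 30) because dynamo removed 14 cells, and the spreads differ (std 1.61 vs. 1.25). Neither statistic says whether the two pipelines recover the same structure, though. Principal components are only defined up to sign, and their scale depends on normalisation. A more structure-focused check is to restrict both objects to shared cells and compute the absolute correlation of each matched component. For the real objects, this would mean subsetting with something like `shared = adata.obs_names.intersection(dyn_adata.obs_names)`, assuming the cell names were not altered by preprocessing. The sketch below uses synthetic data. `pca_scores` and `matched_component_correlation` are hypothetical helpers.

```python
import numpy as np


def pca_scores(X, n_components):
    """PCA via SVD of the column-centred matrix; returns cell scores."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def matched_component_correlation(scores_a, scores_b):
    """|Pearson r| per component; rows of both inputs must be the same cells."""
    return np.array([
        abs(np.corrcoef(scores_a[:, k], scores_b[:, k])[0, 1])
        for k in range(scores_a.shape[1])
    ])


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0])
scores_a = pca_scores(X, n_components=3)
# An affine rescaling changes the scale (and possibly the sign) of the scores,
# but not the structure, so every matched component should give |r| close to 1.
scores_b = pca_scores(3.0 * X + 7.0, n_components=3)
r = matched_component_correlation(scores_a, scores_b)
print(r.round(6))
```

Values well below 1 for the leading components of the real data would point to genuine differences between the two preprocessing pipelines, rather than differences in scale alone.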