Inclusive model of expression dynamics with scSLAM-seq and multiomics, vector field reconstruction and potential landscape mapping
Xiaojie Qiu
Latest commit 1e797f7 Aug 23, 2019


Dynamo: Mapping Vector Field of Single Cells

Dynamo

Understanding how gene expression in single cells progresses over time is vital for revealing the mechanisms governing cell fate transitions. RNA velocity, which infers immediate changes in gene expression by comparing levels of new (unspliced) versus mature (spliced) transcripts (La Manno et al. 2018), represents an important advance in these efforts. A key remaining question is whether it is possible to predict the most probable cell state backward or forward over arbitrary time scales. To this end, we introduce an inclusive model (termed Dynamo) capable of predicting cell states over extended time periods. The model incorporates promoter state switching, transcription, splicing, translation and RNA/protein degradation by taking advantage of scRNA-seq and the co-assay of transcriptome and proteome. We also implement scSLAM-seq by extending SLAM-seq to plate-based scRNA-seq (Hendriks et al. 2018; Erhard et al. 2019; Cao, Zhou, et al. 2019) and augment the model by explicitly incorporating the metabolic labelling of nascent RNA. We show that, through careful design of labelling experiments and an efficient mathematical framework, the entire kinetic behavior of a cell under this model can be robustly and accurately inferred. Aided by the improved framework, we show that it is possible to analytically reconstruct the transcriptomic vector field from the sparse and noisy vector samples generated by single-cell experiments. The analytically reconstructed vector field further enables global mapping of potential landscapes that reflect the relative stability of a given cell state, as well as the minimal transition time and most probable paths between any cell states in the state space. This work thus foreshadows the possibility of predicting long-term trajectories of cells during a dynamic process instead of short-time velocity estimates. Our methods are implemented as an open-source tool, dynamo.

Why single cell SLAM-seq datasets give better RNA velocity estimations

In the scSLAM-seq paper (Erhard et al. 2019), the authors mention that they used "new and total RNA levels obtained by scSLAM-seq to replace intronic and exonic read levels and determine ‘NTR velocities’" (personally, I think this is still just RNA velocity, but it takes advantage of metabolic labelling instead of splicing data). They show that "NTR velocities" produce much better results than RNA velocity based on splicing data. However, they did not provide an explicit explanation for this conclusion. In the following, I will try to demonstrate the underlying mathematical reasons (please also check our Jupyter notebooks, which apply this to the recently published NASC-seq dataset (Hendriks et al. 2019)). To start with, if we denote labelled new mRNA as $n$ and total mRNA as $t$, then the above statement essentially replaces the following ODE from the RNA velocity paper (La Manno et al. 2018):

$$\dot{s} = \beta u - \gamma s$$

by

$$\dot{t} = n - \gamma t,$$

where $t$ and $n$ are the total mRNA and the labelled new mRNA, respectively. Note that the amount of labelled new mRNA over a fixed labelling time (for example, 1 hour) can be calculated as $\lambda \alpha$, where $\alpha$ is the transcription rate corresponding to that time period and $\lambda$ is the labelling rate for the gene. This equation can be read as follows: the labelled new RNA, like unspliced mRNA, represents the cell's future state and will "convert" into the current state of total RNA, like spliced mRNA; and, again like spliced mRNA, the total RNA will degrade. Under this model, if the total RNA remains constant during the labelling period, then the newly synthesized (labelled) RNA and the degraded RNA are equal, and the velocity is zero. If more RNA is synthesized (and thus labelled) than is degraded, the velocity is positive, and vice versa.
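As a minimal sketch of how the equation above is evaluated, the snippet below computes $\dot{t} = n - \gamma t$ for a few genes. The function name `ntr_velocity` and the toy numbers are hypothetical and purely illustrative; this is not dynamo's API, and the degradation rates are assumed to be known rather than estimated.

```python
def ntr_velocity(new, total, gamma):
    """Evaluate dt/dt = n - gamma * t for each gene.

    new   : labelled new mRNA levels (n), one value per gene
    total : total mRNA levels (t), one value per gene
    gamma : degradation rates, one value per gene (assumed known here)
    """
    return [n - g * t for n, t, g in zip(new, total, gamma)]


# Toy values for three hypothetical genes (not real data).
new = [2.0, 1.0, 0.5]
total = [10.0, 10.0, 10.0]
gamma = [0.1, 0.1, 0.1]

velocity = ntr_velocity(new, total, gamma)
for gene, v in zip(("gene_a", "gene_b", "gene_c"), velocity):
    trend = "increasing" if v > 0 else "decreasing" if v < 0 else "steady"
    print(f"{gene}: velocity = {v:+.2f} ({trend})")
```

With these toy values, the first gene has more synthesis than degradation (positive velocity), the second is balanced (zero velocity), and the third is degrading faster than it is synthesized (negative velocity).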

Although the 4sU misincorporation rate for each T base is low (~1.5%), a gene contains many Ts, so the overall labelling rate for a gene, as measured by sequencing, adds up and becomes high (> 60%). This capture rate is significantly higher than the ~20% of intronic reads in regular scRNA-seq, which were used in the original RNA velocity paper. Thus, RNA velocity based on scSLAM-seq data is dramatically better than velocity based on splicing data alone.
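The "adds up" argument can be illustrated with a back-of-the-envelope calculation. The sketch below rests on simplifying assumptions that are not from the papers cited here: each sequenced T position is converted independently with the same probability (1.5%), and a gene counts as detectably labelled if at least one conversion is observed. The function names are hypothetical.

```python
import math


def detection_probability(per_base_rate, n_sites):
    """Probability of at least one conversion among n_sites independent T positions."""
    return 1.0 - (1.0 - per_base_rate) ** n_sites


def sites_needed(per_base_rate, target):
    """Smallest number of T positions whose detection probability reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - per_base_rate))


rate = 0.015  # per-T rate quoted in the text (~1.5%)

for n_sites in (10, 50, 100, 200):
    p = detection_probability(rate, n_sites)
    print(f"{n_sites:>3} covered T positions -> P(labelled) = {p:.2f}")

needed = sites_needed(rate, 0.60)
print(f"T positions needed to exceed 60%: {needed}")
```

Under these assumptions, roughly 61 covered T positions are enough to pass the 60% mark, which is easily reached once the reads from a gene are pooled.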

Installation

Note that this is our first alpha version of Dynamo (as of July 9th, 2019). Dynamo is still under active development, and a stable version will be released when it is ready. Until then, please use Dynamo with caution. We welcome bug reports (via the GitHub issue tracker) and especially code contributions (via GitHub pull requests) from users to make Dynamo an accessible, useful and extensible tool.

For discussion of different use cases, comments or suggestions related to our manuscript, and questions regarding the underlying mathematical formulation of dynamo, we provide a Google group. Dynamo developers can be reached at xqiu.sc@gmail.com.

To install the newest version of dynamo, you can git clone our repo and then use:

pip install directory_to_dynamo_release_repo/

Alternatively, you can install Dynamo directly from GitHub using the following command:

pip install git+https://github.com/aristoteleo/dynamo-release

Citation

Xiaojie Qiu, Yan Zhang, Dian Yang, Shayan Hosseinzadeh, Li Wang, Ruoshi Yuan, Song Xu, Yian Ma, Joseph Replogle, Spyros Darmanis, Jianhua Xing, Jonathan S Weissman (2019): Mapping Vector Field of Single Cells. bioRxiv

bioRxiv link: https://www.biorxiv.org/content/10.1101/696724v1

Theory behind dynamo

For the vector field reconstruction and potential landscape mapping, please refer to our preprint. We have also released the complete derivation of the matrix form of the moment generating functions used for parameter estimation in the full_derivation.pdf file in the dynamo-notebook GitHub repo.

The dynamo-notebook repo also provides tutorials on how to use dynamo for reconstructing the vector field and for calculating the least action path and the potential of cell states.

Acknowledgement

We would like to sincerely thank the developers of velocyto (Gioele La Manno and others), scanpy (Alex Wolf and others) and scvelo (Volker Bergen and others) for their amazing tools, which demonstrate best practice in scientific programming in Python. Dynamo takes various technical inspiration from those packages. We aim to make Dynamo fully compatible with those tools (and hope users can contribute to this), so that velocity estimates from either scvelo or velocyto can be used as input to learn the functional form of the vector field, predict cell fate over extended time periods and map the global cell state potential.

Contribution

If you want to contribute to the development of dynamo, please check out the contribution instructions: Contribution

Documentation

The documentation of the dynamo package is available at readthedocs.
