Examples | Tools | Comparable Software | Installation | References
!! This is just a testing version that solely includes DPT and Diffusion Maps. !!
!! Comments are welcome. !!
Tools for analyzing and simulating single-cell data that aim at an understanding of dynamic biological processes from snapshots of transcriptome or proteome.
- dpt.py - Perform Diffusion Pseudotime analysis of data, following Haghverdi et al., Nat. Meth. 13, 845 (2016).
- diffmap.py - Compute the Diffusion Map representation of data, following Coifman et al., PNAS 102, 7426 (2005).
The following examples assume you use the Python scripts in tools, which work without installation. You can modify these scripts to your own taste, for example, by adding more examples in tools/preprocess.py. If you prefer working with Jupyter notebooks, have a look at the examples in examples/examples.ipynb.
Download or clone the repository and cd into its root directory. The package has been tested using Anaconda environments for Python 2 and 3.
Data of Paul et al. (2015)
Segment 1 corresponds to a branch of granulocyte/macrophage progenitors (GMP), segment 3 corresponds to a branch of megakaryocyte/erythrocyte progenitors (MEP).
$ python tools/dpt.py paul15
Data of Moignard et al. (2015)
Segment 3 corresponds to a branch of erythrocytes, segments 1 and 2 to a branch of endothelial cells.
$ python tools/dpt.py moignard15
If you just want a quick visualization using the diffusion map representation, run
$ python tools/diffmap.py moignard15
Suppose we are not satisfied with taking the logarithm of the count matrix before running DPT for the data of Paul et al. (2015), as in example paul15 above. We copy the entry paul15 from the dictionary examples in scanpy/preprocess.py and paste it into the dictionary examples in tools/preprocess.py. We then rename the key of the new entry to "paul15_nolog". We do the same with the function paul15, where we remove the log transform and rename it to paul15_nolog.
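The copy-and-rename steps above can be sketched as follows. This is only an illustration: the real contents of scanpy/preprocess.py and tools/preprocess.py are not shown in this README, so the field note, the function signatures and the use of log(x + 1) are all assumptions made for the sketch.

```python
import copy
import numpy as np

# Hypothetical stand-in for the dictionary `examples` in tools/preprocess.py;
# the real entries and their fields are not shown in this README.
examples = {
    'paul15': {'note': 'placeholder for the copied entry'},
}

def paul15(counts):
    """Assumed shape of the original function: log-transform the counts."""
    # log(x + 1) is an assumption made here to keep zero counts finite.
    return np.log(np.asarray(counts, dtype=float) + 1)

# Step 1: copy the entry and store it under the new key.
examples['paul15_nolog'] = copy.deepcopy(examples['paul15'])

# Step 2: copy the function, remove the log transform, and rename it.
def paul15_nolog(counts):
    return np.asarray(counts, dtype=float)

counts = np.array([[0, 9], [99, 999]])
print(paul15(counts).max(), paul15_nolog(counts).max())
```

Without the log transform, large counts dominate distances between cells, which is one plausible reason for the changed representation.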
Running paul15_nolog, we observe a considerably changed representation. Here, we identify segment 3 with the branch of granulocyte/macrophage progenitors (GMP) and segment 2 with the branch of megakaryocyte/erythrocyte progenitors (MEP).
$ python tools/dpt.py paul15_nolog
Simulated myeloid progenitor data (Krumsiek et al., 2011)
Here, we simulate data using a literature-curated Boolean gene regulatory network, which is believed to describe myeloid differentiation (Krumsiek et al., 2011). Using sim.py, the Boolean model is translated into a stochastic ordinary differential equation (Wittmann et al., 2009). Simulations result in branching time series of gene expression, where each branch corresponds to a certain cell fate of common myeloid progenitors (megakaryocytes, erythrocytes, granulocytes and monocytes).
$ python tools/sim.py krumsiek11
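To give an idea of what translating a Boolean model into a stochastic differential equation can look like, here is a minimal sketch for a toy network of two mutually inhibiting genes (A = NOT B, B = NOT A). It is not the code of sim.py and not the Krumsiek network. The Hill-function parameters, time step and noise level are illustrative assumptions, and the names hill and simulate are hypothetical.

```python
import numpy as np

def hill(x, k=0.5, n=4):
    """Sigmoidal Hill function, a smooth replacement for a Boolean input."""
    return x**n / (x**n + k**n)

def simulate(x0, n_steps=500, dt=0.05, noise=0.01, seed=0):
    """Euler-Maruyama integration of dx/dt = f(x) - x plus white noise.

    f is the continuous version of the Boolean update rule
    A = NOT B, B = NOT A, with NOT y replaced by 1 - hill(y).
    """
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps, 2))
    x[0] = x0
    for t in range(1, n_steps):
        a, b = x[t - 1]
        target = np.array([1 - hill(b), 1 - hill(a)])
        drift = target - x[t - 1]
        step = dt * drift + noise * np.sqrt(dt) * rng.standard_normal(2)
        # Keep expression levels in the unit interval.
        x[t] = np.clip(x[t - 1] + step, 0, 1)
    return x

# A slight initial bias towards gene A decides the fate of the trajectory.
traj = simulate([0.6, 0.4])
print(traj[-1])
```

Starting closer to the other gene, for example simulate([0.4, 0.6]), sends the trajectory to the opposite state, which is the toy analogue of a branching into two cell fates.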
If the order is shuffled, as in a snapshot, the same data looks as follows
Let us reconstruct an order by estimating geodesic distance with DPT. This yields the branching lineage:
$ python tools/dpt.py krumsiek11
The left panel illustrates how the data is organized according to a pseudotime and different segments. Pseudotime is an estimator of geodesic distance on the manifold from an initial point. Segments are discrete partitions of the data. Both can be visualized in the diffusion map representation.
Here, each tool is described in more detail.
diffmap.py implements diffusion maps (Coifman et al., 2005), which have been proposed for visualizing single-cell data by Haghverdi et al. (2015). In addition, diffmap.py accounts for the modifications to the original algorithm proposed by Haghverdi et al. (2016).
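For orientation, the basic diffusion map construction can be sketched in a few lines of NumPy. This is a textbook-style sketch, not the code of diffmap.py: it uses a dense Gaussian kernel with a fixed width sigma and density normalization, and it omits the modifications of Haghverdi et al. (2016). The function name diffusion_map and its parameters are hypothetical.

```python
import numpy as np

def diffusion_map(X, sigma=1.0, n_comps=2):
    """Return the leading non-trivial eigenvalues and diffusion components."""
    # Squared Euclidean distances between all pairs of observations.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.dot(X.T), 0)
    # Gaussian kernel.
    K = np.exp(-d2 / (2 * sigma**2))
    # Density normalization: divide out the sampling density estimate q.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    # Symmetric matrix with the same eigenvalues as the row-normalized
    # transition matrix.
    d = K.sum(axis=1)
    T_sym = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(T_sym)
    # eigh sorts ascending; keep the largest n_comps + 1 in descending order.
    order = np.argsort(evals)[::-1][:n_comps + 1]
    evals, evecs = evals[order], evecs[:, order]
    # Convert to right eigenvectors of the transition matrix.
    psi = evecs / np.sqrt(d)[:, None]
    # Drop the trivial first pair (eigenvalue 1, constant eigenvector).
    return evals[1:], psi[:, 1:]

# Points along a line: the first diffusion component should order them.
X = np.linspace(0, 10, 50)[:, None]
evals, psi = diffusion_map(X)
print(evals)
```

Plotting the columns of psi against each other gives the kind of low-dimensional representation used in the examples above.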
dpt.py implements Diffusion Pseudotime as introduced by Haghverdi et al. (2016).
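The pseudotime part of DPT can be sketched as follows, without the branch detection. The sketch assumes the spectral form of the distance, in which each non-trivial eigenvector is weighted by lambda / (1 - lambda). The fixed-width dense kernel, the use of orthonormal eigenvectors of the symmetrized transition matrix, and the name dpt_from_root are assumptions of this sketch and need not match dpt.py.

```python
import numpy as np

def dpt_from_root(X, root=0, sigma=1.0):
    """Diffusion pseudotime of every observation relative to a root."""
    # Gaussian kernel with density normalization.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X.dot(X.T), 0)
    K = np.exp(-d2 / (2 * sigma**2))
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    # Symmetrized transition matrix and its eigendecomposition.
    d = K.sum(axis=1)
    T_sym = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(T_sym)  # ascending eigenvalues
    # Drop the stationary eigenpair (largest eigenvalue, equal to 1).
    evals, evecs = evals[:-1], evecs[:, :-1]
    # Summing over all numbers of random-walk steps gives this weight.
    weights = evals / (1 - evals)
    # Distance of each row to the root row in the weighted eigenbasis.
    diff = evecs - evecs[root]
    return np.sqrt(np.sum((weights * diff)**2, axis=1))

# Points along a line, with the root at one end.
X = np.linspace(0, 10, 50)[:, None]
pseudotime = dpt_from_root(X, root=0)
print(pseudotime[[0, 25, 49]])
```

On this toy line, pseudotime grows with distance from the root, which is the behaviour the estimator of geodesic distance is meant to have.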
The functions of these two tools compare to the R package destiny of Angerer et al. (2015).
This section compiles software packages that are comparable to scanpy, but differ substantially in implementation, usage and tools provided. A more comprehensive list can be found here.
- Destiny - [R] - Diffusion Maps and Diffusion Pseudotime in R, following Angerer et al. (2015).
For usage of the scripts in tools from the root of the repository, no installation is needed.
If you want to import scanpy from anywhere on your system, you can install it locally via
$ pip install .
You can also install the package with symlinks, so that changes to your version of the package become available immediately
$ pip install -e .
Your work on the scripts in tools will not be affected by installation. These scripts insert the root of the repository at the beginning of the search path via sys.path.insert(0,'.') and hence load scanpy locally.
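The effect of the path insertion can be checked with a short snippet, assuming Python 3 and that it is run from the root of the repository. It only reports where an import of scanpy would be resolved from and prints None if no copy is found.

```python
import importlib.util
import sys

# Put the current directory first, as the scripts in tools do.
sys.path.insert(0, '.')

# Look up where `import scanpy` would be resolved from, without importing it.
spec = importlib.util.find_spec('scanpy')
origin = spec.origin if spec is not None else None
print(origin)
```

If the printed path points into the repository rather than into site-packages, the local copy takes precedence over any installed version.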
Angerer et al. (2015), destiny - diffusion maps for large-scale single-cell data in R, Bioinformatics 32, 1241.
Coifman et al. (2005), Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, PNAS 102, 7426.
Haghverdi et al. (2015), Diffusion maps for high-dimensional single-cell analysis of differentiation data, Bioinformatics 31, 2989.
Haghverdi et al. (2016), Diffusion pseudotime robustly reconstructs branching cellular lineages, Nature Methods 13, 845.
Moignard et al. (2015), Decoding the regulatory network of early blood development from single-cell gene expression measurements, Nature Biotechnology 33, 269.
Paul et al. (2015), Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors, Cell 163, 1663.
Wittmann et al. (2009), Transforming Boolean models to continuous models: methodology and application to T-cell receptor signaling, BMC Systems Biology 3, 98.