AI-assisted Python port of the DDRTree R package — Discriminative Dimensionality Reduction via learning a Tree — from the KDD'15 paper by Qi Mao, Li Wang, Steve Goodison and Yijun Sun.
Tracks the R CRAN release DDRTree 0.1.6 (2026-02-24).
DDRTree simultaneously:
- Reduces high-dimensional data to a low-dimensional latent space Z,
- Learns an explicit smooth principal tree graph embedded in that space, and
- Obtains a soft clustering of points onto the tree nodes.
It is the dimensionality-reduction backbone used by Monocle 2 for single-cell pseudotime / branching-trajectory inference, but is a general-purpose algorithm for any data with a tree-like intrinsic structure.
```bash
# From PyPI (distribution: ddrtree-python, import: ddrtree)
pip install ddrtree-python            # NumPy backend only
pip install "ddrtree-python[torch]"   # + PyTorch backend (CPU / CUDA)
```

For local development:

```bash
git clone https://github.com/Bio-Babel/DDRTree-python.git
cd DDRTree-python
pip install -e ".[dev]"
```

The core depends on numpy, scipy, and scikit-learn. The optional `torch` extra enables the GPU-friendly backend; no C/C++ extensions are built either way.
```python
import numpy as np
from ddrtree import DDRTree

# X is a D x N matrix (features x samples), matching the R convention.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 200))

res = DDRTree(X, dimensions=2, max_iter=20,
              sigma=1e-3, lambda_=None, ncenter=50,
              gamma=10.0, tol=1e-3, verbose=False)

res.Z               # 2 x 200 reduced-dimension embedding
res.Y               # 2 x 50 principal-graph node coordinates
res.W               # 10 x 2 orthogonal projection basis
res.stree           # N x N scipy.sparse MST weights (first K x K block populated)
res.objective_vals  # objective value at each iteration
```

DDRTree dispatches to one of two computational backends via the `backend` argument; the public function signature is otherwise unchanged. Backend selection is always explicit: there is no auto-detection.
| `backend` | Executes on | Typical use |
|---|---|---|
| `"numpy"` | CPU (NumPy) | Default. Reference path, aligned with R. |
| `"torch"` | CPU / CUDA | GPU acceleration (Borůvka MST, fast BLAS). |
```python
# GPU path: requires torch with CUDA
res = DDRTree(X, ncenter=500, backend="torch", device="cuda")

# Single precision (torch backend only): half the memory of the default
# float64, ~1.5–2× throughput on CUDA, ~1e-3 relative drift in the
# converged embedding.
res = DDRTree(X, ncenter=500, backend="torch", device="cuda", dtype="float32")
```

The torch backend runs a pure-torch parallel Borůvka MST (O(log K) rounds), so there is no host round trip per iteration. Passing `mst_algorithm="prim"` or `mst_algorithm="kruskal"` routes the MST step through NumPy / SciPy instead, for strict parity testing against the R gold standard; see tests/test_boruvka_integration.py.
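The NumPy / SciPy MST route mentioned above can be sketched with SciPy's `minimum_spanning_tree`. The snippet below is illustrative only, not the package's internal code; the node matrix `Y` and its shape are invented for the example.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

# Hypothetical sketch: build an MST over principal-graph node
# coordinates Y (d x K), as a CPU-side MST step might do.
rng = np.random.default_rng(0)
Y = rng.standard_normal((2, 6))       # 2 x K node coordinates

dists = squareform(pdist(Y.T))        # K x K pairwise Euclidean distances
stree = minimum_spanning_tree(dists)  # sparse K x K upper-triangular tree

# A spanning tree on K nodes has exactly K - 1 edges.
assert stree.nnz == Y.shape[1] - 1
```

SciPy's implementation is Kruskal-based and returns a `csr_matrix` holding only the tree edges, which matches the sparse `stree` attribute's layout.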
- PCA initialisation. NumPy uses `scipy.sparse.linalg.svds` (iterative Lanczos; matches R's `irlba`). Torch uses a direct `torch.linalg.svd` for the truncated branch: identical subspace, small per-iteration numerical drift.
- K-means. Both backends call `sklearn.cluster.KMeans` on CPU (small K, negligible overhead). No GPU K-means yet.
- Cholesky fallback. When the `tmp_M` system drifts non-PSD, both backends fall back to LU and emit a `RuntimeWarning`. The Y-update Cholesky is strict in both backends: the `(λ/γ)L + Γ` system is PSD by construction, so any failure there is surfaced rather than masked.
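The Cholesky-with-LU-fallback pattern described for the `tmp_M` solve can be mimicked with NumPy alone. This is a hedged sketch of the general technique, not DDRTree's actual implementation; the helper name is invented.

```python
import warnings
import numpy as np

def solve_spd_with_fallback(A, b):
    """Solve A x = b via Cholesky; fall back to a general LU-based solve
    with a RuntimeWarning when A is not positive definite.
    Illustrative only -- not the package's internal code."""
    try:
        L = np.linalg.cholesky(A)        # raises LinAlgError if A is not PD
        y = np.linalg.solve(L, b)        # forward substitution  L y = b
        return np.linalg.solve(L.T, y)   # back substitution     L^T x = y
    except np.linalg.LinAlgError:
        warnings.warn("system not PSD; falling back to LU solve",
                      RuntimeWarning)
        return np.linalg.solve(A, b)     # general LU solve with pivoting

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])              # symmetric positive definite
b = np.array([1.0, 2.0])
x = solve_spd_with_fallback(A, b)
assert np.allclose(A @ x, b)
```

Keeping the fallback behind a warning, rather than silent, matches the behaviour described above: numerical drift is tolerated but never hidden.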
The test suite runs the same inputs through the R DDRTree package and compares
the results for both backends. Eigenvector sign flips (inherent to
eigen-decompositions) are handled in the tests. See
tests/scripts/generate_gold_standard.R, tests/test_ddrtree.py (NumPy)
and tests/test_backend_torch_gold.py (Torch).
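Sign-invariant comparison of the kind the tests need can be done by flipping each column to best match its gold-standard counterpart before asserting closeness. A minimal sketch (the helper name is invented, not the actual test utility):

```python
import numpy as np

def align_signs(gold, candidate):
    """Flip columns of `candidate` so each matches the sign of the
    corresponding column in `gold`. Eigen/SVD vectors are only defined
    up to sign, so comparisons must be sign-invariant.
    Illustrative helper, not the repository's test code."""
    signs = np.sign(np.sum(gold * candidate, axis=0))  # per-column dot products
    signs[signs == 0] = 1.0                            # guard: orthogonal columns
    return candidate * signs

gold = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
candidate = np.array([[-1.0, 0.0],
                      [0.0, 1.0]])                     # first column sign-flipped
assert np.allclose(align_signs(gold, candidate), gold)
```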
Qi Mao, Li Wang, Steve Goodison, Yijun Sun. Dimensionality Reduction via Graph Structure Learning. KDD'15. https://dl.acm.org/doi/10.1145/2783258.2783309