mlx-qre

Quantum Relative Entropy + Petz Recovery toolkit on Apple Silicon via MLX.

$$\Sigma = D(\rho \,\|\, \sigma) = \mathrm{Tr}[\rho(\ln\rho - \ln\sigma)]$$

A complete, self-contained library for quantum information quantities (QRE, von Neumann entropy, Rényi / JSD), quantum channels (thermal attenuator, depolarizing, dephasing) and Petz recovery analysis (recovery map, fidelity, retrodiction). Built as the computational companion to the Σ = 2 ln Q entropy-production framework — see petz-recovery-unification and tau-chrono.

Performance note. Eigendecomposition runs on the Metal GPU via MLX. For small matrices (N < 500), NumPy + Accelerate (CPU) is typically faster due to lower dispatch overhead — use NumPy if you only need a handful of small QREs. The GPU path becomes useful for batched evaluation and N ≥ 500, where it pulls ahead of NumPy. See benchmark_results.md for the full breakdown.
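For the small-N cases where NumPy is the right tool, the formula above has a direct reference implementation. This is a standalone NumPy sketch for cross-checking, not part of the mlx-qre API:

```python
import numpy as np

def qre_numpy(rho: np.ndarray, sigma: np.ndarray) -> float:
    """D(rho || sigma) = Tr[rho (ln rho - ln sigma)] via dense eigendecomposition."""
    w_r, V_r = np.linalg.eigh(rho)
    w_s, V_s = np.linalg.eigh(sigma)
    # Matrix log: ln A = V diag(ln w) V^dagger; clip to dodge log(0) on
    # numerically rank-deficient states.
    log_rho = (V_r * np.log(np.clip(w_r, 1e-300, None))) @ V_r.conj().T
    log_sigma = (V_s * np.log(np.clip(w_s, 1e-300, None))) @ V_s.conj().T
    return float(np.trace(rho @ (log_rho - log_sigma)).real)

# Random density matrices: normalized Wishart samples.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
rho = A @ A.conj().T / np.trace(A @ A.conj().T).real
B = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
sigma = B @ B.conj().T / np.trace(B @ B.conj().T).real

print(qre_numpy(rho, rho))    # 0 (up to roundoff)
print(qre_numpy(rho, sigma))  # strictly positive (Klein's inequality)
```

For commuting (diagonal) states this reduces to the classical KL divergence of the eigenvalue distributions, which makes a convenient unit test.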

Installation

```bash
pip install -e .
```

Requires Python 3.10+ and Apple Silicon (M1/M2/M3/M4).

Quick Start

```python
import mlx.core as mx
from mlx_qre import quantum_relative_entropy, random_density_matrix

# Two 100x100 density matrices on GPU
rho = random_density_matrix(100)
sigma = random_density_matrix(100)

# Compute D(rho || sigma) — eigendecomposition runs on Metal GPU
D = quantum_relative_entropy(rho, sigma)
mx.eval(D)
print(f"D(rho || sigma) = {D.item():.6f}")

# Batched: 500 pairs simultaneously
rho_batch = random_density_matrix(50, batch_size=500)
sigma_batch = random_density_matrix(50, batch_size=500)
D_batch = quantum_relative_entropy(rho_batch, sigma_batch)
```

Stochastic Lanczos backend (large N)

For N >= 1000, the O(N^3) eigendecomposition becomes the bottleneck, and the Metal eigh kernel can hit GPU command-buffer timeouts around N = 2000 and beyond. mlx-qre ships a Stochastic Lanczos Quadrature (SLQ) estimator that replaces the eigendecomposition with O(k * m * N^2) matrix-vector products.

This implementation runs entirely on MLX (no NumPy fallback in the hot path). All m probe vectors are stacked as a single (N, m) matrix, and each Lanczos step is a single block matmul A @ V, which both amortises GPU dispatch overhead and lets MLX tile the m probes across the GPU. Only the small (k, k) tridiagonal eigh is delegated to MLX's CPU eigh stream (at k <= 30, the CPU is the right place for it).
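The estimator itself can be sketched in plain NumPy. This is an illustrative loop-form reimplementation for readability, not the package's MLX kernel (which batches all m probes as one (N, m) block per step): estimate Tr[f(A)] = n * E_v[v^H f(A) v] over random unit probes, approximating each quadratic form by Gauss quadrature built from a k-step Lanczos tridiagonalization.

```python
import numpy as np

def slq_tr_f(A, f, k=25, m=20, seed=0):
    """Estimate Tr[f(A)] for Hermitian PSD A by Stochastic Lanczos Quadrature."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    acc = 0.0
    for _ in range(m):
        # Unit-norm Rademacher probe (Hutchinson trace estimator).
        v = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
        V = np.zeros((n, k), dtype=complex)
        alpha, beta = np.zeros(k), np.zeros(k - 1)
        V[:, 0] = v
        steps = k
        for j in range(k):
            w = A @ V[:, j]
            alpha[j] = np.vdot(V[:, j], w).real
            # Full reorthogonalization: cheap at small k, keeps T accurate.
            w = w - V[:, : j + 1] @ (V[:, : j + 1].conj().T @ w)
            if j == k - 1:
                break
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-12:          # Krylov space exhausted: stop early
                steps = j + 1
                break
            V[:, j + 1] = w / beta[j]
        # Gauss quadrature: nodes are the Ritz values (eigenvalues of the
        # tridiagonal T), weights the squared first eigenvector components.
        T = (np.diag(alpha[:steps])
             + np.diag(beta[:steps - 1], 1)
             + np.diag(beta[:steps - 1], -1))
        theta, U = np.linalg.eigh(T)
        acc += np.sum(U[0, :] ** 2 * f(theta))
    return n * acc / m  # E[v^H f(A) v] = Tr[f(A)] / n for unit-norm probes

def tr_a_log_a(A, **kw):
    """Tr[A ln A]; von Neumann entropy is S(rho) = -Tr[rho ln rho]."""
    return slq_tr_f(A, lambda t: np.maximum(t, 1e-30) * np.log(np.maximum(t, 1e-30)), **kw)

# Compare against the exact spectrum on a small random density matrix.
rng = np.random.default_rng(1)
d = 64
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T / np.trace(G @ G.conj().T).real
est = tr_a_log_a(rho, k=25, m=20, seed=0)
w = np.linalg.eigvalsh(rho)
exact = float(np.sum(w * np.log(w)))
print(est, exact)  # should agree to within a few percent
```

The loop over probes is where the MLX version wins: replacing m separate matvecs per Lanczos step with one block matmul is what amortises GPU dispatch overhead.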

```python
import mlx.core as mx
from mlx_qre import (
    random_density_matrix,
    quantum_relative_entropy,
    von_neumann_entropy_lanczos,
    quantum_relative_entropy_lanczos,
)

rho = random_density_matrix(2000)
sigma = random_density_matrix(2000)

# Direct API
S = von_neumann_entropy_lanczos(rho, k=25, m=20, seed=0)
D = quantum_relative_entropy_lanczos(rho, sigma, k=25, m=20, seed=0)

# Or via the main API with method="lanczos"
D_mx = quantum_relative_entropy(rho, sigma, method="lanczos", k=25, m=20, seed=0)
```

Typical accuracy at default k=25, m=20:

  • S(rho): ~1-2% median relative error (clean SLQ on a single matrix)
  • D(rho || sigma): ~3-7% median relative error (cross-term has higher variance because Tr[rho ln sigma] is not a single-matrix spectral function). Increase m for tighter accuracy.

Speed crossover (M1 Max, dense complex Haar-random density matrices, SLQ at k=25, m=20):

| N | MLX eigh (ms) | NumPy eigh (ms) | SLQ pure-MLX (ms) |
| --- | --- | --- | --- |
| 100 | 4 | 4 | 19 |
| 500 | 160 | 287 | 20 |
| 1000 | 1077 | 1988 | 24 |
| 2000 | timeout | 24048 | 37 |

SLQ wins from N >= 500, and the gap widens sharply at N >= 1000 (~50-80x vs MLX eigh, ~80-660x vs NumPy eigh). The pure-MLX block-matvec hot path is also ~30-140x faster than the previous NumPy-Accelerate hot path on the same machine. See benchmark_results.md for the full breakdown.

Features

| Module | Function | Description |
| --- | --- | --- |
| qre | `quantum_relative_entropy(rho, sigma, method="exact"\|"lanczos")` | D(rho \|\| sigma) via Metal eigendecomposition or SLQ |
| qre | `von_neumann_entropy(rho)` | S(rho) = -Tr[rho ln rho] |
| qre | `relative_entropy_pure_state(psi, sigma)` | Efficient D for pure states: -ln ⟨psi\|sigma\|psi⟩ |
| lanczos | `von_neumann_entropy_lanczos(rho, k, m)` | S(rho) via Stochastic Lanczos Quadrature |
| lanczos | `quantum_relative_entropy_lanczos(rho, sigma, k, m)` | D via Hutchinson + Lanczos |
| lanczos | `stochastic_lanczos_logtr(A, k, m)` | Tr[A ln A] for any Hermitian PSD A |
| classical | `kl_divergence(p, q)` | Classical KL divergence |
| classical | `jensen_shannon_divergence(p, q)` | Symmetric JSD |
| classical | `renyi_divergence(p, q, alpha)` | Rényi divergence of order alpha |
| channels | `thermal_attenuator(eta)` | Gravitational channel eta = -g_00 |
| channels | `channel_entropy_production(K, rho, sigma)` | Sigma through channel |
| channels | `depolarizing_channel(p)` | Depolarizing noise |
| channels | `dephasing_channel(gamma)` | Dephasing noise |
| petz | `petz_recovery_map(K, sigma)` | Construct Petz recovery map R |
| petz | `petz_recovery_fidelity(rho, sigma, K)` | F(rho, R o N(rho)) |
| petz | `verify_petz_bound(rho, sigma, K)` | Check F >= exp(-Sigma/2) |
| petz | `retrodiction_quality(rho, sigma, K)` | tau = 1 - F |
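The classical divergences in the table have short NumPy analogues. These are illustrative sketches of the definitions (the names mirror the table, but the eps smoothing is this sketch's choice; the package versions operate on MLX arrays):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i ln(p_i / q_i)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def jensen_shannon_divergence(p, q):
    """JSD(p, q) = (KL(p||m) + KL(q||m)) / 2 with m = (p+q)/2; bounded by ln 2."""
    m = (np.asarray(p, float) + np.asarray(q, float)) / 2
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def renyi_divergence(p, q, alpha, eps=1e-12):
    """D_alpha(p || q) = ln(sum_i p_i^alpha q_i^(1-alpha)) / (alpha - 1)."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    return float(np.log(np.sum(p ** alpha * q ** (1 - alpha))) / (alpha - 1))

p, q = np.array([0.7, 0.2, 0.1]), np.array([0.4, 0.4, 0.2])
print(kl_divergence(p, q))               # > 0
print(jensen_shannon_divergence(p, q))   # symmetric, <= ln 2
print(renyi_divergence(p, q, alpha=2.0)) # order-2 Rényi
```

As alpha -> 1 the Rényi divergence recovers KL, a quick consistency check for any implementation.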

Use Cases

  • Gravitational entropy production: Sigma_grav = D(N_eta(rho) || N_eta(sigma)) with eta = 1/Q^2
  • Quantum channel analysis: entropy production, data processing inequality
  • Petz recovery bounds: F >= exp(-Sigma/2), retrodiction quality
  • Quantum ML: kernel methods using QRE as a distance measure
  • Neural entropy: EEG/neural signal entropy production analysis
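The data processing inequality from the list above is easy to check numerically. The sketch below uses a qubit depolarizing channel and a standalone eigendecomposition-based D (hypothetical helper names, not the package API):

```python
import numpy as np

def qre(rho, sigma):
    """D(rho || sigma) via dense eigendecomposition (small-matrix reference)."""
    wr, Vr = np.linalg.eigh(rho)
    ws, Vs = np.linalg.eigh(sigma)
    log_r = (Vr * np.log(np.clip(wr, 1e-300, None))) @ Vr.conj().T
    log_s = (Vs * np.log(np.clip(ws, 1e-300, None))) @ Vs.conj().T
    return float(np.trace(rho @ (log_r - log_s)).real)

def depolarize(rho, p):
    """Qubit depolarizing channel: N(rho) = (1 - p) rho + p I/2."""
    return (1 - p) * rho + p * np.eye(2, dtype=complex) / 2

rho = np.array([[0.8, 0.2], [0.2, 0.2]], dtype=complex)
sigma = np.array([[0.6, -0.1], [-0.1, 0.4]], dtype=complex)

D_in = qre(rho, sigma)
D_out = qre(depolarize(rho, 0.3), depolarize(sigma, 0.3))
Sigma = D_in - D_out  # entropy production through the channel
print(D_out <= D_in, Sigma >= 0)  # True True: DPI holds
```

At p = 1 both states collapse to I/2, D_out vanishes, and the entropy production saturates at D(rho || sigma).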

Benchmark

```bash
python -m mlx_qre.benchmark
```

Compares MLX (Apple Silicon GPU) vs NumPy (CPU) across matrix sizes N = 10 to 1000. Summary on M1 Max:

| N | MLX (ms) | NumPy (ms) | MLX vs NumPy |
| --- | --- | --- | --- |
| 10 | 0.74 | 0.04 | 0.06× (NumPy wins) |
| 100 | 3.95 | 3.57 | 0.90× |
| 500 | 158 | 278 | 1.76× |
| 1000 | 1042 | 2010 | 1.93× |

For batched QRE at N=100 the GPU sustains ~460 pairs/sec. Use NumPy for one-off small problems; reach for mlx-qre for batched / large-N work and for the integrated Petz / channel utilities. See benchmark_results.md for the full table.

Tests

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

Theory

The quantum relative entropy D(rho || sigma) is the quantum generalization of KL divergence. In the retrocausality framework:

  • Sigma = 2 ln Q: unified entropy production formula
  • Petz bound: F >= exp(-Sigma/2) quantifies retrodiction cost
  • tau = 1 - F: retrodiction deficit (0 = perfect, 1 = irreversible)
  • Zero-entropy limit: Sigma -> 0 implies perfect retrodiction (no time arrow)
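The Petz map behind these bounds, R_{sigma,N}(X) = sigma^(1/2) N^dag(N(sigma)^(-1/2) X N(sigma)^(-1/2)) sigma^(1/2), can be sketched in NumPy for a Kraus-represented channel, and its defining property R(N(sigma)) = sigma verified directly. This is a standalone sketch with hypothetical names, not the package API:

```python
import numpy as np

def _powh(A, p):
    """A^p for a Hermitian PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 1e-12, None) ** p) @ V.conj().T

def channel(Ks, rho):
    """N(rho) = sum_i K_i rho K_i^dag."""
    return sum(K @ rho @ K.conj().T for K in Ks)

def adjoint_channel(Ks, X):
    """Adjoint map: N^dag(X) = sum_i K_i^dag X K_i."""
    return sum(K.conj().T @ X @ K for K in Ks)

def petz_recovery(Ks, sigma, X):
    """Petz map of the channel {K_i} with respect to reference state sigma."""
    Ns_inv_sqrt = _powh(channel(Ks, sigma), -0.5)
    s_sqrt = _powh(sigma, 0.5)
    return s_sqrt @ adjoint_channel(Ks, Ns_inv_sqrt @ X @ Ns_inv_sqrt) @ s_sqrt

# Qubit dephasing channel: K0 = sqrt(1-g) I, K1 = sqrt(g) Z.
g = 0.4
Ks = [np.sqrt(1 - g) * np.eye(2, dtype=complex),
      np.sqrt(g) * np.diag([1.0 + 0j, -1.0 + 0j])]
sigma = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)

recovered = petz_recovery(Ks, sigma, channel(Ks, sigma))
print(np.allclose(recovered, sigma))  # True: the Petz map inverts N on sigma
```

The check follows from trace preservation of N: N^dag(I) = sum_i K_i^dag K_i = I, so R(N(sigma)) collapses to sigma^(1/2) I sigma^(1/2) = sigma. For full-rank N(sigma) the Petz map is itself trace-preserving, so recovered outputs are valid states.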

License

MIT License. Copyright (c) 2026 Sheng-Kai Huang.
