Research: low-precision acceleration for projected Hamiltonian construction and eigensolve

## Background

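Mixed-precision eigenvalue decomposition is constrained by the first-order forward error of a backward-stable eigensolver, which scales as `|λ_computed - λ_true| ~ κ(λ) × ε_machine × ||H||`, where κ(λ) = 1 for Hermitian H. For chemical accuracy (1.6 mHa), FP16 (ε ≈ 1e-3) and lower-precision formats are fundamentally insufficient for a direct eigensolve when ||H|| ≈ O(1-100) Ha. The error scale ε × ||H|| is then about 1e-3 to 1e-1 Ha, comparable to or well above the 1.6 mHa target even before the dimension-dependent growth of a real eigensolver.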
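As a back-of-the-envelope check, the sketch below evaluates the error scale ε × ||H|| for several formats against 1.6 mHa. It also measures the eigenvalue shift caused purely by storing a random symmetric stand-in H in FP16, and compares it with Weyl's bound ||E||₂. The constant names, the random test matrix, and the hard-coded BF16 epsilon (NumPy has no BF16 dtype) are illustrative assumptions. The estimate leaves out the dimension-dependent constant of a real eigensolver, so it is optimistic.

```python
import numpy as np

CHEM_ACC_HA = 1.6e-3  # chemical accuracy target in Hartree

# Machine epsilon per format; NumPy has no BF16 dtype, so 2**-7 is hard-coded (assumption).
EPS = {
    "fp64": float(np.finfo(np.float64).eps),
    "fp32": float(np.finfo(np.float32).eps),
    "fp16": float(np.finfo(np.float16).eps),
    "bf16": 2.0**-7,
}


def eig_error_scale(eps, h_norm, kappa=1.0):
    """Order-of-magnitude eigenvalue error kappa * eps * ||H|| (kappa = 1 for Hermitian H)."""
    return kappa * eps * h_norm


def fp16_storage_shift(n=200, h_norm=50.0, seed=0):
    """Largest eigenvalue shift from rounding H to FP16 (solve in FP64), plus Weyl's bound ||E||_2."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    h = (a + a.T) / 2
    h *= h_norm / np.linalg.norm(h, 2)             # scale so that ||H||_2 = h_norm
    h16 = h.astype(np.float16).astype(np.float64)  # storage rounding only; the solve stays FP64
    shift = np.max(np.abs(np.linalg.eigvalsh(h16) - np.linalg.eigvalsh(h)))
    weyl_bound = np.linalg.norm(h16 - h, 2)        # Weyl: |delta lambda_i| <= ||E||_2
    return shift, weyl_bound


for fmt, eps in EPS.items():
    for h_norm in (1.0, 10.0, 100.0):
        err = eig_error_scale(eps, h_norm)
        verdict = "within" if err < CHEM_ACC_HA else "exceeds"
        print(f"{fmt} ||H||={h_norm:>5} Ha: ~{err:.1e} Ha ({verdict} 1.6 mHa)")

shift, bound = fp16_storage_shift()
print(f"FP16 storage shift {shift:.2e} Ha <= Weyl bound {bound:.2e} Ha")
```

With these inputs, FP32 stays far below 1.6 mHa across the whole range. FP16 is borderline at ||H|| = 1 Ha and exceeds the target from 10 Ha upward, and BF16 exceeds it everywhere.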

However, low precision CAN accelerate specific stages of the computation while preserving FP64 final accuracy.

## Research directions

### P1: FP32 projected Hamiltonian construction

Build H_proj in FP32 instead of FP64, halving its memory footprint, and upcast to FP64 only immediately before the eigensolve. Validated by CIM-QS(H)CI (arXiv:2603.13160, March 2026).

Target: `matrix_elements_fast()` in `hamiltonians/molecular/hamiltonian.py`
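A minimal sketch of the FP32-build, FP64-solve pattern follows. It assumes a generic Rayleigh-Ritz projection `V^T H V` onto an orthonormal basis as a stand-in for whatever `matrix_elements_fast()` actually computes. `project_hamiltonian` and `ground_energy_fp32_build` are hypothetical names, and the dense random H is a toy. The upcast does not recover bits lost during the FP32 build. It only keeps the eigensolve itself in FP64.

```python
import numpy as np


def project_hamiltonian(h, v, dtype):
    """Hypothetical stand-in for matrix_elements_fast(): H_proj = V^T H V computed in `dtype`."""
    h_c = h.astype(dtype)
    v_c = v.astype(dtype)
    return v_c.T @ (h_c @ v_c)


def ground_energy_fp32_build(h, v):
    """Build H_proj in FP32, upcast once to FP64, then solve in FP64."""
    h_proj32 = project_hamiltonian(h, v, np.float32)  # half the bytes of an FP64 build
    h_proj = h_proj32.astype(np.float64)              # upcast immediately before the eigensolve
    h_proj = (h_proj + h_proj.T) / 2                  # remove FP32 rounding asymmetry
    return np.linalg.eigvalsh(h_proj)[0], h_proj32.nbytes


rng = np.random.default_rng(0)
n, k = 400, 40
a = rng.standard_normal((n, n))
h = (a + a.T) / 2
h *= 50.0 / np.linalg.norm(h, 2)                      # toy scale: ||H||_2 = 50 "Ha"
v, _ = np.linalg.qr(rng.standard_normal((n, k)))      # orthonormal subspace basis

e_mixed, nbytes32 = ground_energy_fp32_build(h, v)
e_ref = np.linalg.eigvalsh(project_hamiltonian(h, v, np.float64))[0]
print(f"|E_mixed - E_fp64| = {abs(e_mixed - e_ref):.2e} Ha; H_proj is {nbytes32} bytes in FP32")
```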

### P2: Chebyshev filtered subspace iteration with TF32/BF16

Replace `scipy.sparse.linalg.expm_multiply` in SKQD with Chebyshev polynomial filtering, using TF32 tensor cores for the matvecs. R-ChFSI (arXiv:2503.22652) demonstrated a 2.1x speedup with BF16 communication and TF32 filtering, and showed that the method tolerates inexact matvecs.
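The precision split can be illustrated with plain Chebyshev-filtered subspace iteration (Zhou and Saad style). This is not the R-ChFSI variant cited above. In this sketch the filter matvecs run in low precision while orthonormalization and Rayleigh-Ritz stay in FP64. NumPy has no TF32 type, so FP32 stands in for the low-precision matvec. On GPU, PyTorch's `torch.backends.cuda.matmul.allow_tf32 = True` would route FP32 matmuls through TF32 tensor cores. The damped interval `[a, b]` is assumed known here; in practice it would be estimated, for example from a few Lanczos steps. Function names are hypothetical.

```python
import numpy as np


def chebyshev_filter_fp32(h32, x, degree, a, b):
    """Apply T_degree of H mapped so that [a, b] -> [-1, 1]; every matvec runs in float32.

    Eigencomponents inside [a, b] stay bounded by 1; those below a are amplified.
    """
    c = (a + b) / 2.0                    # center of the damped interval
    e = (b - a) / 2.0                    # half-width of the damped interval
    y_prev = x.astype(np.float32)
    y = (h32 @ y_prev - c * y_prev) / e  # T_1
    for _ in range(2, degree + 1):       # three-term recurrence T_{j+1} = 2 t T_j - T_{j-1}
        y_next = 2.0 * (h32 @ y - c * y) / e - y_prev
        y_prev, y = y, y_next
    return y


def chfsi_lowest(h, k, m, a, b, degree=20, iters=5, seed=0):
    """Lowest k eigenpairs: FP32 filtering, FP64 orthonormalization and Rayleigh-Ritz (m >= k)."""
    rng = np.random.default_rng(seed)
    h32 = h.astype(np.float32)           # low-precision copy used only inside the filter
    x, _ = np.linalg.qr(rng.standard_normal((h.shape[0], m)))
    for _ in range(iters):
        y = chebyshev_filter_fp32(h32, x, degree, a, b).astype(np.float64)
        q, _ = np.linalg.qr(y)                   # FP64 orthonormalization
        ritz, s = np.linalg.eigh(q.T @ h @ q)    # FP64 Rayleigh-Ritz with the original H
        x = q @ s
    return ritz[:k], x[:, :k]


# Toy spectrum: 5 wanted eigenvalues in [-10, -9], the rest in the damped interval [-5, 10].
rng = np.random.default_rng(1)
n = 300
evals = np.concatenate([np.linspace(-10.0, -9.0, 5), np.linspace(-5.0, 10.0, n - 5)])
q, _ = np.linalg.qr(rng.standard_normal((n, n)))
h = (q * evals) @ q.T
h = (h + h.T) / 2
vals, vecs = chfsi_lowest(h, k=5, m=8, a=-5.0, b=10.0)
print(np.max(np.abs(vals - evals[:5])))
```

Because Ritz values are quadratic in the eigenvector error, FP32-level noise in the filtered basis still yields eigenvalues far below chemical accuracy in this toy case.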

### P3: Randomized basis selection with FP16 projection

Use FP16 random projection for initial basis selection (noise-tolerant by construction), then FP64 for the small dense eigenvalue problem. Theoretical foundation in arXiv:2601.19250 (Jan 2026).
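One way to read this direction is sketched below. It assumes the selection step is a Halko-Martinsson-Tropp-style randomized range finder, not the specific scheme of arXiv:2601.19250: sketch in FP16, then solve the small projected problem in FP64. The FP16 GEMM is emulated on CPU with FP16 inputs, FP32 accumulation, and FP16 output. The rank-5 PSD toy operator and `fp16_sketch_eigs` are hypothetical. The sketch targets the largest eigenpairs; for ground states one would sketch a shifted operator such as σI - H (an assumption, not shown).

```python
import numpy as np


def fp16_sketch_eigs(h, k, oversample=10, seed=0):
    """Top-k eigenpairs of symmetric h: FP16 random sketch for the basis, FP64 Rayleigh-Ritz."""
    rng = np.random.default_rng(seed)
    n = h.shape[0]
    omega16 = rng.standard_normal((n, k + oversample)).astype(np.float16)  # FP16 test matrix
    h16 = h.astype(np.float16)
    # Emulate an FP16 tensor-core GEMM: FP16 inputs, FP32 accumulation, FP16 output.
    y16 = (h16.astype(np.float32) @ omega16.astype(np.float32)).astype(np.float16)
    q, _ = np.linalg.qr(y16.astype(np.float64))  # orthonormal basis in FP64
    theta, s = np.linalg.eigh(q.T @ h @ q)       # small dense FP64 eigenproblem
    order = np.argsort(theta)[::-1][:k]          # keep the k largest Ritz pairs
    return theta[order], q @ s[:, order]


rng = np.random.default_rng(2)
n = 200
lam = np.array([10.0, 8.0, 6.0, 4.0, 2.0])
u, _ = np.linalg.qr(rng.standard_normal((n, 5)))
h = (u * lam) @ u.T                              # rank-5 PSD toy operator
theta, vecs = fp16_sketch_eigs(h, k=5)
print(theta)
```

The "noise-tolerant" property has a concrete form here. By Cauchy interlacing, Ritz values from any orthonormal basis never exceed the true eigenvalues, so FP16 noise can only degrade the basis. It cannot produce spurious eigenvalues above the true ones.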

### P4: cuSOLVER BF16x9 math mode

cuSOLVER 13.2 supports `CUSOLVER_FP32_EMULATED_BF16X9_MATH` for syevd. Internal GEMMs use 9x BF16 tensor core operations to emulate FP32 accuracy with higher throughput on Blackwell GPUs. Requires calling cuSOLVER directly (not exposed via PyTorch).
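The sketch below does not call cuSOLVER. It illustrates the arithmetic idea behind BF16x9 emulation, assuming the common description of it: each FP32 operand is split into three BF16 pieces, and each product becomes the sum of the 9 pairwise BF16 products accumulated in FP32. BF16 is emulated in NumPy by rounding away the low 16 bits of a float32. `to_bf16`, `split3`, and `gemm_bf16x9` are hypothetical helpers, and NVIDIA's kernels may differ in accumulation order and in handling of special values.

```python
import numpy as np


def to_bf16(x):
    """Round float32 values to BF16 (round-to-nearest-even), returned as float32. Finite inputs only."""
    bits = np.ascontiguousarray(x, dtype=np.float32).view(np.uint32)
    rounding = ((bits >> 16) & 1) + 0x7FFF
    return ((bits + rounding) & 0xFFFF0000).view(np.float32)


def split3(a):
    """Split float32 a into three BF16 pieces with a ~= a0 + a1 + a2."""
    a = np.asarray(a, dtype=np.float32)
    a0 = to_bf16(a)
    r = a - a0                  # exact in float32
    a1 = to_bf16(r)
    a2 = to_bf16(r - a1)
    return a0, a1, a2


def gemm_bf16x9(a, b):
    """Emulated FP32 GEMM from 9 BF16 x BF16 matrix products accumulated in float32."""
    a_parts, b_parts = split3(a), split3(b)
    c = np.zeros((a.shape[0], b.shape[1]), dtype=np.float32)
    for ai in a_parts:
        for bj in b_parts:
            c += ai @ bj        # each BF16 x BF16 scalar product is exact in float32
    return c


rng = np.random.default_rng(4)
a = rng.standard_normal((128, 128)).astype(np.float32)
b = rng.standard_normal((128, 128)).astype(np.float32)
ref = a.astype(np.float64) @ b.astype(np.float64)


def rel_err(c):
    return np.linalg.norm(c - ref) / np.linalg.norm(ref)


print(f"BF16 only: {rel_err(to_bf16(a) @ to_bf16(b)):.1e}")
print(f"BF16x9:    {rel_err(gemm_bf16x9(a, b)):.1e}")
```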

## What's already done

- `mixed_precision_eigh` (PR #5): FP32 solve + FP64 Rayleigh quotient refinement with TF32 enabled. 6.3x speedup on DGX Spark GB10 at n=2000.
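For reference, the FP32-solve plus FP64-Rayleigh-quotient pattern can be sketched on CPU as follows. This is not the PR #5 code. `mixed_precision_eigh_sketch` is a hypothetical name, TF32 plays no role on CPU, and only eigenvalues are refined. `scipy.linalg.eigh` is used because it dispatches float32 input to single-precision LAPACK. The Rayleigh quotient error is quadratic in the eigenvector error, which is why FP32 eigenvectors can yield near-FP64 eigenvalues when the gaps are not tiny.

```python
import numpy as np
import scipy.linalg


def mixed_precision_eigh_sketch(h):
    """FP32 eigensolve followed by FP64 Rayleigh-quotient refinement of every eigenvalue."""
    w32, v32 = scipy.linalg.eigh(h.astype(np.float32))  # single-precision LAPACK solve
    v = v32.astype(np.float64)
    hv = h @ v                                           # FP64 matvecs with the original H
    rq = np.einsum("ij,ij->j", v, hv) / np.einsum("ij,ij->j", v, v)
    return rq, v, w32


rng = np.random.default_rng(3)
n = 200
evals = np.linspace(-50.0, 50.0, n)                      # well-separated toy spectrum
q, _ = np.linalg.qr(rng.standard_normal((n, n)))
h = (q * evals) @ q.T
h = (h + h.T) / 2

rq, v, w32 = mixed_precision_eigh_sketch(h)
ref = np.linalg.eigvalsh(h)
err_fp32 = np.max(np.abs(w32.astype(np.float64) - ref))
err_refined = np.max(np.abs(rq - ref))
print(f"FP32 eigenvalues: {err_fp32:.1e} Ha, refined: {err_refined:.1e} Ha")
```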

## References

- Higham & Mary, "Mixed Precision Algorithms in Numerical Linear Algebra", Acta Numerica (2022)
- CIM-QS(H)CI (arXiv:2603.13160) — FP32 Hamiltonian construction validated
- R-ChFSI (arXiv:2503.22652) — TF32/BF16 Chebyshev filtering, 2.1x speedup
- JCTC 2026 — BF16 preconditioning for DFT eigensolver on AI-focused GPUs
- Xu et al. (arXiv:2601.19250) — Precision-adaptive randomized SVD
- cuSOLVER 13.2 docs — BF16x9 emulated math mode
