feat: mixed-precision eigensolve, persistent integral cache, pre-commit hooks by thc1006 · Pull Request #5 · QuantumNoLab/qvartools

thc1006 · 2026-03-26T13:16:58Z

TLDR

Three independent improvements: GPU-accelerated eigendecomposition, joblib-based integral caching, and pre-commit hook configuration. No algorithm changes, no public API breaks. All changes are in files untouched by PR #4 (zero conflict verified).

What changed

1. Mixed-Precision Eigensolve (`_utils/gpu/linear_algebra.py`)

mixed_precision_eigh(H) — FP32 solve + FP64 Rayleigh quotient refinement.

Benchmarked on DGX Spark NVIDIA GB10 (UMA 128GB):

n	CPU FP64	GPU Mixed	Speedup
1000	0.119s	0.418s	0.3x
2000	0.570s	0.090s	6.3x
4000	4.089s	1.551s	2.6x

Handles known torch.linalg.eigh FP32 GPU bug (NaN/Inf on low-rank matrices) with automatic fallback to FP64. Small matrices (n<=64) skip mixed-precision. Output always FP64.

2. Persistent Integral Cache (`hamiltonians/integrals.py`)

cached_compute_molecular_integrals() — joblib.Memory disk caching for PySCF integral computations. Second call with same geometry/basis returns instantly from disk, saving 10-50s per molecule.

get_integral_cache(cache_dir) for custom cache location
clear_integral_cache() to remove cached data
Default: ~/.cache/qvartools/integrals (configurable via QVARTOOLS_CACHE_DIR)

3. Pre-commit Hooks (`.pre-commit-config.yaml`)

Ruff lint + format hooks compatible with both pre-commit and prek (Rust reimplementation, 5-6x faster, adopted by CPython/FastAPI). Enforces same policy locally that CI enforces remotely.

Test plan

10 mixed-precision tests (8 CPU + 2 GPU on DGX Spark)
5 persistent cache tests (with PySCF)
Full suite: 308 passed, 0 failed
ruff check + format clean (145 files)
Zero conflict with PR refactor: internal optimizations for ConnectionCache and eigensolver #4 (dry-run merge verified)

FP32 solve + FP64 Rayleigh quotient refinement. Benchmarked on DGX Spark GB10: 6.3x speedup at n=2000, 2.6x at n=4000 vs CPU FP64. Handles known torch.linalg.eigh FP32 GPU bug (NaN/Inf on low-rank matrices) with automatic fallback to FP64. Small matrices (n<=64) skip mixed-precision entirely. 10 tests: correctness, sorting, dtype, orthonormality, GPU (2 tests on DGX Spark NVIDIA GB10). Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

cached_compute_molecular_integrals() wraps compute_molecular_integrals with joblib.Memory disk caching. Second call with same geometry/basis returns instantly from disk, saving 10-50s of PySCF computation. - get_integral_cache(cache_dir) for custom cache location - clear_integral_cache(cache_dir) to remove cached data - Default cache: ~/.cache/qvartools/integrals (configurable via QVARTOOLS_CACHE_DIR env var) - 5 tests: import, correctness, cache hit, different geometry, clear Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

Compatible with both pre-commit and prek (Rust reimplementation, 5-6x faster, adopted by CPython and FastAPI). Enforces the same ruff check + ruff format policy locally that CI enforces remotely — catches issues before push. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

Copilot

Pull request overview

This PR introduces three independent developer-facing improvements: a mixed-precision eigensolver utility (Torch-based), a persistent on-disk cache for PySCF integral computation (joblib.Memory), and a pre-commit configuration to enforce Ruff lint/format locally.

Changes:

Added mixed_precision_eigh(H) for FP32 eigensolve with FP64 Rayleigh-quotient refinement and FP64 fallback handling.
Added persistent integral caching utilities: get_integral_cache, cached_compute_molecular_integrals, and clear_integral_cache.
Added Ruff-based pre-commit hooks plus new unit tests for both features.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`src/qvartools/_utils/gpu/linear_algebra.py`	Adds `mixed_precision_eigh` mixed-precision eigendecomposition utility.
`src/qvartools/hamiltonians/integrals.py`	Adds joblib-based persistent caching helpers around `compute_molecular_integrals`.
`tests/test_utils/test_mixed_precision.py`	Adds correctness/behavior tests for `mixed_precision_eigh` (CPU + optional GPU).
`tests/test_utils/test_hamiltonian_cache.py`	Adds tests validating the integral caching and cache clearing behavior.
`.pre-commit-config.yaml`	Adds Ruff lint/format hooks for local enforcement.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T13:21:48Z

+    # Symmetrize to eliminate floating-point asymmetry
+    H = 0.5 * (H + H.T)
+
+    # For small matrices or already-FP64 on CPU: direct FP64 solve
+    if n <= _MIXED_PREC_MIN_SIZE or (H.dtype == torch.float64 and device.type == "cpu"):
+        H_fp64 = H.to(dtype=torch.float64)
+        vals, vecs = torch.linalg.eigh(H_fp64)
+        order = torch.argsort(vals)
+        return vals[order].cpu(), vecs[:, order].cpu()
+
+    # --- Mixed precision path ---
+    H_fp64 = H.to(dtype=torch.float64, device=device)
+
+    # Step 1: FP32 solve for speed
+    H_fp32 = H_fp64.to(dtype=torch.float32)
+    try:
+        vals_fp32, vecs_fp32 = torch.linalg.eigh(H_fp32)
+
+        # Validate: check for NaN/Inf (known FP32 GPU bug)
+        if not torch.isfinite(vals_fp32).all() or not torch.isfinite(vecs_fp32).all():
+            raise RuntimeError("FP32 eigh produced non-finite values")
+
+    except (RuntimeError, torch.linalg.LinAlgError):
+        logger.warning("FP32 eigh failed; falling back to FP64.")
+        vals_fp64, vecs_fp64 = torch.linalg.eigh(H_fp64)
+        order = torch.argsort(vals_fp64)
+        return vals_fp64[order].cpu(), vecs_fp64[:, order].cpu()
+
+    # Step 2: Rayleigh quotient refinement in FP64
+    vecs_fp64 = vecs_fp32.to(dtype=torch.float64)
+    Hv = H_fp64 @ vecs_fp64  # (n, n) @ (n, n) in FP64
+    # E_i = v_i^T H v_i / (v_i^T v_i)
+    numerator = (vecs_fp64 * Hv).sum(dim=0)  # diag(V^T H V)
+    denominator = (vecs_fp64 * vecs_fp64).sum(dim=0)  # diag(V^T V)
+    vals_refined = numerator / denominator


mixed_precision_eigh claims to accept “any dtype”, but the implementation assumes a real symmetric matrix: it symmetrizes with H.T (no conjugate) and the Rayleigh-quotient refinement uses elementwise products without conjugation. If callers pass a complex Hermitian matrix, the refinement will be mathematically wrong. Consider either (a) explicitly rejecting complex inputs (raise with a clear error) or (b) supporting complex Hermitian inputs by using conjugate-transpose for symmetrization and v.conj() in the Rayleigh quotient.

Copilot · 2026-03-26T13:21:48Z

+    For matrices larger than 64x64, solves in float32 (leveraging TF32
+    matmul on NVIDIA GPUs for ~10x speedup), then refines eigenvalues
+    via Rayleigh quotient in float64 to recover full precision.
+
+    For small matrices or float64 input on CPU, delegates directly to
+    ``torch.linalg.eigh`` in float64.
+
+    Parameters
+    ----------
+    H : torch.Tensor
+        Real symmetric matrix, shape ``(n, n)``.  Any dtype accepted;
+        output is always float64.
+
+    Returns
+    -------
+    eigenvalues : torch.Tensor
+        Eigenvalues sorted ascending, shape ``(n,)``, dtype float64,
+        on CPU.
+    eigenvectors : torch.Tensor
+        Corresponding eigenvectors as columns, shape ``(n, n)``,
+        dtype float64, on CPU.
+
+    Notes
+    -----
+    ``torch.linalg.eigh`` has a known issue with float32 on GPU for
+    large low-rank matrices (may produce NaN/Inf).  This function
+    detects such failures and falls back to float64 automatically.
+
+    Reference: Higham & Mary, "Mixed Precision Algorithms in Numerical
+    Linear Algebra", Acta Numerica (2022).
+    """
+    n = H.shape[0]
+    device = H.device
+
+    # Symmetrize to eliminate floating-point asymmetry
+    H = 0.5 * (H + H.T)
+
+    # For small matrices or already-FP64 on CPU: direct FP64 solve
+    if n <= _MIXED_PREC_MIN_SIZE or (H.dtype == torch.float64 and device.type == "cpu"):
+        H_fp64 = H.to(dtype=torch.float64)
+        vals, vecs = torch.linalg.eigh(H_fp64)
+        order = torch.argsort(vals)
+        return vals[order].cpu(), vecs[:, order].cpu()
+
+    # --- Mixed precision path ---
+    H_fp64 = H.to(dtype=torch.float64, device=device)
+
+    # Step 1: FP32 solve for speed
+    H_fp32 = H_fp64.to(dtype=torch.float32)
+    try:


The mixed-precision path currently runs on CPU for large FP32 inputs as well (only skips when H is FP64 on CPU). On CPU there’s no TF32 acceleration benefit, so this adds extra casts/memory and does an FP64 matmul refinement, which is likely slower than a straight FP64 torch.linalg.eigh. Consider gating the mixed-precision branch to device.type == "cuda" (and otherwise always do the direct FP64 solve), or update the docstring to reflect intended CPU behavior.

Copilot · 2026-03-26T13:21:49Z

+        """Second call with same args should use cache (faster)."""
+        import time
+
+        from qvartools.hamiltonians.integrals import get_integral_cache
+
+        cache = get_integral_cache(str(tmp_path / "cache"))
+        geometry = [("H", (0.0, 0.0, 0.0)), ("H", (0.0, 0.0, 0.74))]
+
+        # First call — computes
+        t0 = time.perf_counter()
+        r1 = cache(geometry, basis="sto-3g")
+        t_first = time.perf_counter() - t0
+
+        # Second call — should be cached
+        t0 = time.perf_counter()
+        r2 = cache(geometry, basis="sto-3g")
+        t_second = time.perf_counter() - t0
+
+        # Cached call should be significantly faster
+        assert t_second < t_first or t_first < 0.1  # very fast compute is OK too


test_second_call_is_cached relies on wall-clock timing (t_second < t_first) to assert caching, which is inherently flaky on shared/variable CI environments. Since this is specifically testing joblib.Memory behavior, it should use a deterministic signal (e.g., joblib’s check_call_in_cache(...) / cache metadata inspection, or verifying the cache directory contents change only on the first call) rather than performance measurements.

Suggested change

"""Second call with same args should use cache (faster)."""

import time

from qvartools.hamiltonians.integrals import get_integral_cache

cache = get_integral_cache(str(tmp_path / "cache"))

geometry = [("H", (0.0, 0.0, 0.0)), ("H", (0.0, 0.0, 0.74))]

# First call — computes

t0 = time.perf_counter()

r1 = cache(geometry, basis="sto-3g")

t_first = time.perf_counter() - t0

# Second call — should be cached

t0 = time.perf_counter()

r2 = cache(geometry, basis="sto-3g")

t_second = time.perf_counter() - t0

# Cached call should be significantly faster

assert t_second < t_first or t_first < 0.1 # very fast compute is OK too

"""Second call with same args should use cache (reuse on-disk data)."""

import pathlib

from qvartools.hamiltonians.integrals import get_integral_cache

cache_dir = tmp_path / "cache"

cache = get_integral_cache(str(cache_dir))

geometry = [("H", (0.0, 0.0, 0.0)), ("H", (0.0, 0.0, 0.74))]

# First call — should create cache entries on disk

r1 = cache(geometry, basis="sto-3g")

cache_files_after_first = sorted(

p.relative_to(cache_dir)

for p in pathlib.Path(cache_dir).rglob("*")

if p.is_file()

)

# Ensure that something was actually cached

assert len(cache_files_after_first) > 0

# Second call — should reuse existing cache entries (no new files)

r2 = cache(geometry, basis="sto-3g")

cache_files_after_second = sorted(

p.relative_to(cache_dir)

for p in pathlib.Path(cache_dir).rglob("*")

if p.is_file()

)

assert cache_files_after_second == cache_files_after_first

Copilot · 2026-03-26T13:21:49Z

+            # joblib may leave directory but should have no cached results
+            cached_files = list(cache_path.rglob("*.npy"))


test_clear_cache only checks for leftover *.npy files, but joblib caches may store results as other file types (e.g., pickles/metadata). This can let the test pass even if cached data remains. Consider asserting that the cache directory does not exist after clear_integral_cache, or that it contains no files of any kind (excluding empty directories).

Suggested change

# joblib may leave directory but should have no cached results

cached_files = list(cache_path.rglob("*.npy"))

# joblib may leave directory but should have no cached results of any type

cached_files = [p for p in cache_path.rglob("*") if p.is_file()]

mixed_precision_eigh: - Reject complex input with TypeError (was silently wrong) - Gate mixed-precision to CUDA only (no benefit on CPU) - Enable TF32 tensor cores during FP32 solve (Blackwell/Hopper) - Restore TF32 setting in finally block - Move import torch to module top (remove noqa: E402) - Fix docstring: real benchmarks (6.3x not ~10x) integrals.py: - Move logging/os/shutil imports to file top - Add safety check in clear_integral_cache (refuse to delete directories without 'qvartools' or 'cache' in path) Tests: - Add complex input rejection test - Replace timing-based cache test with deterministic file check - Fix clear_cache test to check all files (not just *.npy) - Use tmp_path in all cache tests (no home dir pollution) Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

George930502 · 2026-03-26T14:47:48Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

Critical fixes: - #1/#2: Use persistent prev_coeffs/prev_batch_configs for PT2 scoring (was using best_coeffs before it was defined in current iteration) - QuantumNoLab#5: Eviction checks cumulative_basis size (was checking best_batch_configs) - QuantumNoLab#7: Convert numpy indices to torch.LongTensor before indexing CUDA tensor - QuantumNoLab#8: Default energy_weight=0.0, entropy_weight=0.0 (was 0.1/0.05, silently changing existing behavior) Lower priority fixes: - QuantumNoLab#6: CIPSI docstring mentions build_sparse_hamiltonian requirement - QuantumNoLab#9: Warn when energy_weight>0 but hamiltonian=None Not applicable (won't fix): - QuantumNoLab#3/QuantumNoLab#4: config_integer_hash returns tuple for n_sites>=64; our CAS max is 58Q (<64), so int cast is safe Co-authored-by: leo07010 <leo07010@gmail.com>

thc1006 added 3 commits March 26, 2026 21:12

Copilot AI review requested due to automatic review settings March 26, 2026 13:16

Copilot started reviewing on behalf of thc1006 March 26, 2026 13:17 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

This was referenced Mar 26, 2026

Research: low-precision acceleration for projected Hamiltonian construction and eigensolve #7

Open

chore: GPU memory cleanup between pipeline stages + module-scope test fixtures #8

Merged

George930502 merged commit 1f140be into QuantumNoLab:main Mar 26, 2026
5 checks passed

thc1006 deleted the feat/mixed-precision-and-cache branch March 26, 2026 15:17

thc1006 mentioned this pull request Mar 28, 2026

Refactor: eliminate unnecessary torch/numpy/cupy conversions in pipeline data flow #12

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: mixed-precision eigensolve, persistent integral cache, pre-commit hooks#5

feat: mixed-precision eigensolve, persistent integral cache, pre-commit hooks#5
George930502 merged 4 commits into
QuantumNoLab:mainfrom
thc1006:feat/mixed-precision-and-cache

thc1006 commented Mar 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

George930502 commented Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		# joblib may leave directory but should have no cached results
		cached_files = list(cache_path.rglob("*.npy"))

Conversation

thc1006 commented Mar 26, 2026

TLDR

What changed

1. Mixed-Precision Eigensolve (_utils/gpu/linear_algebra.py)

2. Persistent Integral Cache (hamiltonians/integrals.py)

3. Pre-commit Hooks (.pre-commit-config.yaml)

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

George930502 commented Mar 26, 2026

Code review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

1. Mixed-Precision Eigensolve (`_utils/gpu/linear_algebra.py`)

2. Persistent Integral Cache (`hamiltonians/integrals.py`)

3. Pre-commit Hooks (`.pre-commit-config.yaml`)