refactor: internal optimizations for ConnectionCache and eigensolver by thc1006 · Pull Request #4 · QuantumNoLab/qvartools

thc1006 · 2026-03-26T12:14:07Z

TLDR

Six internal optimizations to ConnectionCache and eigensolver dispatch. No public API changes, all existing imports remain valid. 317 tests pass, 42% coverage.

What changed

ConnectionCache (4 improvements)

Unified duplicate implementations -- The standalone ConnectionCache in loss_functions.py is replaced with a re-export from _utils.hashing.connection_cache. Added get_or_compute(config, hamiltonian) method that auto-computes on cache miss, eliminating the need for a separate class.
FIFO to LRU eviction -- get() now promotes entries to most-recently-used. Eviction removes the least-recently-used entry instead of the oldest inserted. This keeps frequently accessed configs (e.g., HF state) in cache.
Powers tensor caching -- The powers-of-2 tensor used for hashing is now computed once and reused across all get()/put() calls, eliminating per-call tensor allocation overhead.
Batch encoding -- Added hash_batch(configs) (single matmul) and get_batch(configs) for vectorized cache operations on batches of configurations.

Eigensolver (2 improvements)

Adaptive Davidson dispatch -- solve_generalized_eigenvalue() now accepts a davidson_threshold parameter (default 500). For dense matrices with n > threshold, the Davidson iterative solver is used automatically instead of full O(n^3) dense eigh. Handles generalized eigenvalue problems via Cholesky transform of S, with graceful fallback to dense eigh if Cholesky or Davidson fails.
Sparse-aware GPU path -- _solve_gpu() now checks if input is sparse before attempting GPU solve. Sparse matrices use cupyx.scipy.sparse.linalg.eigsh instead of densifying first, preventing OOM on large projected Hamiltonians.

Test plan

19 new tests across ConnectionCache and eigensolver modules
Full suite: 317 passed, 0 failed, 42% coverage
ruff check + ruff format clean (156 files)
Backward compatibility verified: all existing imports work, loss_functions.ConnectionCache is same class as _utils.hashing.ConnectionCache
davidson_threshold defaults to 500, preserving existing behavior for small matrices

- Add get_or_compute(config, hamiltonian) for auto-compute-on-miss pattern, unifying the two ConnectionCache interfaces - Change eviction from FIFO to LRU: get() promotes entry to most-recently-used, eviction removes least-recently-used - Cache powers-of-2 tensor across calls instead of rebuilding per hash - Add 7 new tests: get_or_compute (4), LRU eviction (2), powers (1) - All 300 tests pass, 0 regressions Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

Vectorized batch hashing via single matmul (configs * powers).sum(). get_batch returns list of cached results with LRU promotion. 5 new tests covering shape, consistency, uniqueness, and batch lookup. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

Remove the standalone ConnectionCache class from loss_functions.py and re-export from the canonical _utils.hashing.connection_cache module. Update call site to use get_or_compute() instead of the old get(config, hamiltonian) interface. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

Add davidson_threshold parameter to solve_generalized_eigenvalue(). For dense matrices with n > threshold (default 500), automatically use the Davidson iterative solver instead of full O(n^3) dense eigh. Handles generalized eigenvalue problems via Cholesky transform of S, with graceful fallback to dense eigh if Cholesky or Davidson fails. 4 new tests covering adaptive dispatch, threshold parameter, and correctness verification against numpy reference. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

When _solve_gpu receives sparse H or S, it now attempts CuPy sparse eigsh before falling back to dense GPU eigh. This prevents OOM on large systems where H is sparse (e.g., 10K x 10K projected Hamiltonian). 3 new tests covering sparse input handling, fallback paths, and dense GPU correctness. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

Copilot

Pull request overview

Refactors and optimizes internal connection caching and eigensolver dispatch to reduce redundant computation, improve cache efficiency, and better handle large/sparse eigenproblems without changing import paths.

Changes:

Replaced the inlined ConnectionCache in loss_functions.py with a re-exported implementation and updated local-energy computation to use get_or_compute.
Enhanced ConnectionCache with LRU eviction, cached powers tensor, and batch hashing/lookup helpers.
Added adaptive Davidson dispatch for large dense problems and a sparse-aware GPU path for eigenvalue solves, plus expanded test coverage.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`src/qvartools/_utils/hashing/connection_cache.py`	Implements LRU cache behavior, powers tensor reuse, and batch hash/lookup APIs.
`src/qvartools/flows/training/loss_functions.py`	Switches to the shared `ConnectionCache` and uses `get_or_compute` in local energy computation.
`src/qvartools/diag/eigen/eigenvalue.py`	Adds Davidson-based dispatch for large dense matrices and sparse-aware GPU solving.
`tests/test_utils/test_connection_cache.py`	Adds tests for `get_or_compute`, LRU behavior, powers caching, and batch APIs.
`tests/test_diag/test_eigensolver.py`	Adds tests for adaptive Davidson dispatch and sparse-aware GPU behavior/fallbacks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T12:20:42Z

@@ -31,81 +32,6 @@
 ]


loss_functions.ConnectionCache is still exported in __all__, but it now re-exports _utils.hashing.connection_cache.ConnectionCache, whose get() signature/behavior differs from the previously inlined class (the old get(config, hamiltonian) always computed on miss). This is a breaking change for callers that used ConnectionCache.get(config, hamiltonian), despite the PR description claiming no public API changes. Consider keeping a thin compatibility wrapper in this module (or making ConnectionCache.get accept an optional hamiltonian and compute on miss, delegating to get_or_compute).

- Cache powers tensor on target device to avoid repeated CPU->GPU copies - Use matmul (configs @ powers) instead of elementwise multiply+sum in hash_batch for better BLAS/cuBLAS utilization - Replace O(n^2) np.allclose(S, np.eye(n)) identity check with O(n) diagonal + off-diagonal norm check to avoid allocating full identity - Fix inconsistent order[:k] slice in _solve_gpu sparse path Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

1. Sparse GPU failure now returns None instead of densifying (OOM risk) 2. Add n_sites >= 64 overflow guard with clear error message 3. ConnectionCache.get() accepts optional hamiltonian for backward compat with the old loss_functions.ConnectionCache.get(config, hamiltonian) 4. Davidson dispatch only activates when which="SA" (smallest algebraic) 5. Thread which parameter through _solve_gpu to sparse GPU eigsh path Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

hash_batch bypassed _hash's overflow check. Add the same _MAX_SITES_INT64 guard to prevent silent int64 overflow for n_sites >= 64. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T12:34:54Z

+        # Transform to standard form: L^{-1} H L^{-T} where S = L L^T
+        try:
+            L = scipy.linalg.cholesky(S_dense, lower=True)
+            L_inv = scipy.linalg.solve_triangular(L, np.eye(L.shape[0]), lower=True)
+            H_work = L_inv @ H_dense @ L_inv.T
+        except scipy.linalg.LinAlgError:


The generalized-to-standard transform forms an explicit inverse L_inv by solving against a dense identity, then does two dense matmuls. For n > davidson_threshold, this allocates O(n^2) extra memory and adds an O(n^3) step that can dominate runtime, undermining the purpose of Davidson. Prefer applying L^{-1} / L^{-T} via solve_triangular (or cho_solve) without constructing L_inv/np.eye(n).

Copilot · 2026-03-26T12:34:55Z

+from qvartools._utils.hashing.connection_cache import ConnectionCache
 from qvartools.hamiltonians.hamiltonian import Hamiltonian



Switching loss_functions.ConnectionCache to the _utils.hashing.connection_cache.ConnectionCache introduces a hard limit of 63 sites (it raises ValueError on larger configs). The previous in-file cache keyed by tuple(config.tolist()) supported arbitrary lengths, so this can break physics-guided training for larger Hamiltonians. Consider adding an overflow-safe key path (e.g., reuse config_integer_hash/tuple keys for n_sites>=64) or keeping the old tuple-key fallback to preserve behavior.

Copilot · 2026-03-26T12:34:55Z

+    _MAX_SITES_INT64 = 63  # int64 can represent 2^0 .. 2^62 without overflow
+
+    def _hash(self, config: torch.Tensor) -> int:
+        """Convert a binary configuration to an integer hash.

-    def get(self, config: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor] | None:
+        Raises
+        ------
+        ValueError
+            If ``n_sites`` exceeds 63 (int64 overflow).  For larger
+            systems use :func:`config_integer_hash` which splits into
+            ``(hi, lo)`` tuples.
+        """
+        n = config.shape[0]
+        if n > self._MAX_SITES_INT64:
+            raise ValueError(
+                f"ConnectionCache supports at most {self._MAX_SITES_INT64} sites "
+                f"(got {n}). Use config_integer_hash for larger systems."
+            )
+        powers = self._get_powers(n, config.device)


_hash() now raises for n_sites > 63, but the docstring suggests using config_integer_hash for larger systems while ConnectionCache itself cannot store tuple keys. If this cache is used for fermionic problems with >=64 sites, this will be a runtime break. Consider extending the cache key type to int | tuple[int,int] (or using config_integer_hash internally) so large systems remain supported without overflow.

1. Replace explicit L_inv with solve_triangular in Davidson Cholesky transform — avoids O(n^2) np.eye allocation and extra O(n^3) matmul 2. Reduce Davidson test matrix from 800x800 to 300x300 for faster CI 3. Fall back to tuple keys for n_sites >= 64 instead of raising ValueError — preserves backward compatibility with the old loss_functions.ConnectionCache that used tuple(config.tolist()) 4. Update hash_batch and get_batch to handle tuple key fallback Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

George930502 · 2026-03-26T14:47:53Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

_{- If this code review was useful, please react with 👍. Otherwise, react with 👎.}

Previously deferred 3 items — all now fixed: #1: Cr₂ singlet test: assert e_core in [-2040, -2030] Ha range (septet would give different e_core, catches missing fix_spin_) QuantumNoLab#4: get_integral_cache() now returns CAS-aware wrapper that bypasses joblib when cas is not None (was a real cache-escape bug) QuantumNoLab#7: Add test_cache_actually_bypassed_for_cas using mock.patch to verify compute_molecular_integrals is called directly Also from new comment: QuantumNoLab#13: Remove dead tuple/list branch for nelecas in _compute_cas_integrals (public API is tuple[int, int], nelecas is always int)

Critical fixes: - #1/#2: Use persistent prev_coeffs/prev_batch_configs for PT2 scoring (was using best_coeffs before it was defined in current iteration) - QuantumNoLab#5: Eviction checks cumulative_basis size (was checking best_batch_configs) - QuantumNoLab#7: Convert numpy indices to torch.LongTensor before indexing CUDA tensor - QuantumNoLab#8: Default energy_weight=0.0, entropy_weight=0.0 (was 0.1/0.05, silently changing existing behavior) Lower priority fixes: - QuantumNoLab#6: CIPSI docstring mentions build_sparse_hamiltonian requirement - QuantumNoLab#9: Warn when energy_weight>0 but hamiltonian=None Not applicable (won't fix): - QuantumNoLab#3/QuantumNoLab#4: config_integer_hash returns tuple for n_sites>=64; our CAS max is 58Q (<64), so int cast is safe Co-authored-by: leo07010 <leo07010@gmail.com>

Copilot fixes: - #1: top_idx on same device as unique_new (CUDA compat) - #2: use prev_energy (consistent with prev_coeffs eigenvector) - QuantumNoLab#3: eviction log only when eviction actually occurred - QuantumNoLab#4: remove dead test code, clarify as smoke test New: - 64+Q hash support (remove int() casts, use hash values directly) - Cr₂-CAS(12,32) @ 64Q and Cr₂-CAS(12,36) @ 72Q registry entries - Both MOLECULE_REGISTRY and _MOLECULE_INFO_REGISTRY synced (26 total) Co-authored-by: leo07010 <leo07010@gmail.com>

thc1006 added 5 commits March 26, 2026 20:05

Copilot AI review requested due to automatic review settings March 26, 2026 12:14

Copilot started reviewing on behalf of thc1006 March 26, 2026 12:14 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

thc1006 added 3 commits March 26, 2026 20:22

fix: add n_sites overflow guard to hash_batch

83d0b6d

hash_batch bypassed _hash's overflow check. Add the same _MAX_SITES_INT64 guard to prevent silent int64 overflow for n_sites >= 64. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>

thc1006 requested a review from Copilot March 26, 2026 12:29

Copilot started reviewing on behalf of thc1006 March 26, 2026 12:29 View session

Copilot AI reviewed Mar 26, 2026

View reviewed changes

thc1006 requested a review from George930502 March 26, 2026 12:54

This was referenced Mar 26, 2026

feat: mixed-precision eigensolve, persistent integral cache, pre-commit hooks #5

Merged

chore: GPU memory cleanup between pipeline stages + module-scope test fixtures #8

Merged

George930502 merged commit c29917e into QuantumNoLab:main Mar 26, 2026
5 checks passed

thc1006 deleted the refactor/internal-optimizations branch March 26, 2026 15:17

thc1006 mentioned this pull request Mar 26, 2026

refactor: redesign pipeline groups 05/07/08, expand to 24 pipelines #9

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: internal optimizations for ConnectionCache and eigensolver#4

refactor: internal optimizations for ConnectionCache and eigensolver#4
George930502 merged 9 commits into
QuantumNoLab:mainfrom
thc1006:refactor/internal-optimizations

thc1006 commented Mar 26, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

Copilot AI Mar 26, 2026

Uh oh!

George930502 commented Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		from qvartools._utils.hashing.connection_cache import ConnectionCache
		from qvartools.hamiltonians.hamiltonian import Hamiltonian

Conversation

thc1006 commented Mar 26, 2026

TLDR

What changed

ConnectionCache (4 improvements)

Eigensolver (2 improvements)

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

Uh oh!

George930502 commented Mar 26, 2026

Code review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants