refactor: internal optimizations for ConnectionCache and eigensolver#4
Conversation
- Add get_or_compute(config, hamiltonian) for auto-compute-on-miss pattern, unifying the two ConnectionCache interfaces - Change eviction from FIFO to LRU: get() promotes entry to most-recently-used, eviction removes least-recently-used - Cache powers-of-2 tensor across calls instead of rebuilding per hash - Add 7 new tests: get_or_compute (4), LRU eviction (2), powers (1) - All 300 tests pass, 0 regressions Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Vectorized batch hashing via single matmul (configs * powers).sum(). get_batch returns list of cached results with LRU promotion. 5 new tests covering shape, consistency, uniqueness, and batch lookup. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Remove the standalone ConnectionCache class from loss_functions.py and re-export from the canonical _utils.hashing.connection_cache module. Update call site to use get_or_compute() instead of the old get(config, hamiltonian) interface. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Add davidson_threshold parameter to solve_generalized_eigenvalue(). For dense matrices with n > threshold (default 500), automatically use the Davidson iterative solver instead of full O(n^3) dense eigh. Handles generalized eigenvalue problems via Cholesky transform of S, with graceful fallback to dense eigh if Cholesky or Davidson fails. 4 new tests covering adaptive dispatch, threshold parameter, and correctness verification against numpy reference. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
When _solve_gpu receives sparse H or S, it now attempts CuPy sparse eigsh before falling back to dense GPU eigh. This prevents OOM on large systems where H is sparse (e.g., 10K x 10K projected Hamiltonian). 3 new tests covering sparse input handling, fallback paths, and dense GPU correctness. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Refactors and optimizes internal connection caching and eigensolver dispatch to reduce redundant computation, improve cache efficiency, and better handle large/sparse eigenproblems without changing import paths.
Changes:
- Replaced the inlined
ConnectionCacheinloss_functions.pywith a re-exported implementation and updated local-energy computation to useget_or_compute. - Enhanced
ConnectionCachewith LRU eviction, cached powers tensor, and batch hashing/lookup helpers. - Added adaptive Davidson dispatch for large dense problems and a sparse-aware GPU path for eigenvalue solves, plus expanded test coverage.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
src/qvartools/_utils/hashing/connection_cache.py |
Implements LRU cache behavior, powers tensor reuse, and batch hash/lookup APIs. |
src/qvartools/flows/training/loss_functions.py |
Switches to the shared ConnectionCache and uses get_or_compute in local energy computation. |
src/qvartools/diag/eigen/eigenvalue.py |
Adds Davidson-based dispatch for large dense matrices and sparse-aware GPU solving. |
tests/test_utils/test_connection_cache.py |
Adds tests for get_or_compute, LRU behavior, powers caching, and batch APIs. |
tests/test_diag/test_eigensolver.py |
Adds tests for adaptive Davidson dispatch and sparse-aware GPU behavior/fallbacks. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -31,81 +32,6 @@ | |||
| ] | |||
There was a problem hiding this comment.
loss_functions.ConnectionCache is still exported in __all__, but it now re-exports _utils.hashing.connection_cache.ConnectionCache, whose get() signature/behavior differs from the previously inlined class (the old get(config, hamiltonian) always computed on miss). This is a breaking change for callers that used ConnectionCache.get(config, hamiltonian), despite the PR description claiming no public API changes. Consider keeping a thin compatibility wrapper in this module (or making ConnectionCache.get accept an optional hamiltonian and compute on miss, delegating to get_or_compute).
- Cache powers tensor on target device to avoid repeated CPU->GPU copies - Use matmul (configs @ powers) instead of elementwise multiply+sum in hash_batch for better BLAS/cuBLAS utilization - Replace O(n^2) np.allclose(S, np.eye(n)) identity check with O(n) diagonal + off-diagonal norm check to avoid allocating full identity - Fix inconsistent order[:k] slice in _solve_gpu sparse path Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
1. Sparse GPU failure now returns None instead of densifying (OOM risk) 2. Add n_sites >= 64 overflow guard with clear error message 3. ConnectionCache.get() accepts optional hamiltonian for backward compat with the old loss_functions.ConnectionCache.get(config, hamiltonian) 4. Davidson dispatch only activates when which="SA" (smallest algebraic) 5. Thread which parameter through _solve_gpu to sparse GPU eigsh path Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
hash_batch bypassed _hash's overflow check. Add the same _MAX_SITES_INT64 guard to prevent silent int64 overflow for n_sites >= 64. Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Transform to standard form: L^{-1} H L^{-T} where S = L L^T | ||
| try: | ||
| L = scipy.linalg.cholesky(S_dense, lower=True) | ||
| L_inv = scipy.linalg.solve_triangular(L, np.eye(L.shape[0]), lower=True) | ||
| H_work = L_inv @ H_dense @ L_inv.T | ||
| except scipy.linalg.LinAlgError: |
There was a problem hiding this comment.
The generalized-to-standard transform forms an explicit inverse L_inv by solving against a dense identity, then does two dense matmuls. For n > davidson_threshold, this allocates O(n^2) extra memory and adds an O(n^3) step that can dominate runtime, undermining the purpose of Davidson. Prefer applying L^{-1} / L^{-T} via solve_triangular (or cho_solve) without constructing L_inv/np.eye(n).
| from qvartools._utils.hashing.connection_cache import ConnectionCache | ||
| from qvartools.hamiltonians.hamiltonian import Hamiltonian | ||
|
|
There was a problem hiding this comment.
Switching loss_functions.ConnectionCache to the _utils.hashing.connection_cache.ConnectionCache introduces a hard limit of 63 sites (it raises ValueError on larger configs). The previous in-file cache keyed by tuple(config.tolist()) supported arbitrary lengths, so this can break physics-guided training for larger Hamiltonians. Consider adding an overflow-safe key path (e.g., reuse config_integer_hash/tuple keys for n_sites>=64) or keeping the old tuple-key fallback to preserve behavior.
| _MAX_SITES_INT64 = 63 # int64 can represent 2^0 .. 2^62 without overflow | ||
|
|
||
| def _hash(self, config: torch.Tensor) -> int: | ||
| """Convert a binary configuration to an integer hash. | ||
|
|
||
| def get(self, config: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor] | None: | ||
| Raises | ||
| ------ | ||
| ValueError | ||
| If ``n_sites`` exceeds 63 (int64 overflow). For larger | ||
| systems use :func:`config_integer_hash` which splits into | ||
| ``(hi, lo)`` tuples. | ||
| """ | ||
| n = config.shape[0] | ||
| if n > self._MAX_SITES_INT64: | ||
| raise ValueError( | ||
| f"ConnectionCache supports at most {self._MAX_SITES_INT64} sites " | ||
| f"(got {n}). Use config_integer_hash for larger systems." | ||
| ) | ||
| powers = self._get_powers(n, config.device) |
There was a problem hiding this comment.
_hash() now raises for n_sites > 63, but the docstring suggests using config_integer_hash for larger systems while ConnectionCache itself cannot store tuple keys. If this cache is used for fermionic problems with >=64 sites, this will be a runtime break. Consider extending the cache key type to int | tuple[int,int] (or using config_integer_hash internally) so large systems remain supported without overflow.
1. Replace explicit L_inv with solve_triangular in Davidson Cholesky transform — avoids O(n^2) np.eye allocation and extra O(n^3) matmul 2. Reduce Davidson test matrix from 800x800 to 300x300 for faster CI 3. Fall back to tuple keys for n_sites >= 64 instead of raising ValueError — preserves backward compatibility with the old loss_functions.ConnectionCache that used tuple(config.tolist()) 4. Update hash_batch and get_batch to handle tuple key fallback Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
Previously deferred 3 items — all now fixed: #1: Cr₂ singlet test: assert e_core in [-2040, -2030] Ha range (septet would give different e_core, catches missing fix_spin_) QuantumNoLab#4: get_integral_cache() now returns CAS-aware wrapper that bypasses joblib when cas is not None (was a real cache-escape bug) QuantumNoLab#7: Add test_cache_actually_bypassed_for_cas using mock.patch to verify compute_molecular_integrals is called directly Also from new comment: QuantumNoLab#13: Remove dead tuple/list branch for nelecas in _compute_cas_integrals (public API is tuple[int, int], nelecas is always int)
Critical fixes: - #1/#2: Use persistent prev_coeffs/prev_batch_configs for PT2 scoring (was using best_coeffs before it was defined in current iteration) - QuantumNoLab#5: Eviction checks cumulative_basis size (was checking best_batch_configs) - QuantumNoLab#7: Convert numpy indices to torch.LongTensor before indexing CUDA tensor - QuantumNoLab#8: Default energy_weight=0.0, entropy_weight=0.0 (was 0.1/0.05, silently changing existing behavior) Lower priority fixes: - QuantumNoLab#6: CIPSI docstring mentions build_sparse_hamiltonian requirement - QuantumNoLab#9: Warn when energy_weight>0 but hamiltonian=None Not applicable (won't fix): - QuantumNoLab#3/QuantumNoLab#4: config_integer_hash returns tuple for n_sites>=64; our CAS max is 58Q (<64), so int cast is safe Co-authored-by: leo07010 <leo07010@gmail.com>
Copilot fixes: - #1: top_idx on same device as unique_new (CUDA compat) - #2: use prev_energy (consistent with prev_coeffs eigenvector) - QuantumNoLab#3: eviction log only when eviction actually occurred - QuantumNoLab#4: remove dead test code, clarify as smoke test New: - 64+Q hash support (remove int() casts, use hash values directly) - Cr₂-CAS(12,32) @ 64Q and Cr₂-CAS(12,36) @ 72Q registry entries - Both MOLECULE_REGISTRY and _MOLECULE_INFO_REGISTRY synced (26 total) Co-authored-by: leo07010 <leo07010@gmail.com>
TLDR
Six internal optimizations to ConnectionCache and eigensolver dispatch. No public API changes, all existing imports remain valid. 317 tests pass, 42% coverage.
What changed
ConnectionCache (4 improvements)
Unified duplicate implementations -- The standalone
ConnectionCacheinloss_functions.pyis replaced with a re-export from_utils.hashing.connection_cache. Addedget_or_compute(config, hamiltonian)method that auto-computes on cache miss, eliminating the need for a separate class.FIFO to LRU eviction --
get()now promotes entries to most-recently-used. Eviction removes the least-recently-used entry instead of the oldest inserted. This keeps frequently accessed configs (e.g., HF state) in cache.Powers tensor caching -- The powers-of-2 tensor used for hashing is now computed once and reused across all
get()/put()calls, eliminating per-call tensor allocation overhead.Batch encoding -- Added
hash_batch(configs)(single matmul) andget_batch(configs)for vectorized cache operations on batches of configurations.Eigensolver (2 improvements)
Adaptive Davidson dispatch --
solve_generalized_eigenvalue()now accepts adavidson_thresholdparameter (default 500). For dense matrices with n > threshold, the Davidson iterative solver is used automatically instead of full O(n^3) dense eigh. Handles generalized eigenvalue problems via Cholesky transform of S, with graceful fallback to dense eigh if Cholesky or Davidson fails.Sparse-aware GPU path --
_solve_gpu()now checks if input is sparse before attempting GPU solve. Sparse matrices usecupyx.scipy.sparse.linalg.eigshinstead of densifying first, preventing OOM on large projected Hamiltonians.Test plan
ruff check+ruff formatclean (156 files)loss_functions.ConnectionCacheis same class as_utils.hashing.ConnectionCachedavidson_thresholddefaults to 500, preserving existing behavior for small matrices