Skip to content

refactor: internal optimizations for ConnectionCache and eigensolver#4

Merged
George930502 merged 9 commits into
QuantumNoLab:mainfrom
thc1006:refactor/internal-optimizations
Mar 26, 2026
Merged

refactor: internal optimizations for ConnectionCache and eigensolver#4
George930502 merged 9 commits into
QuantumNoLab:mainfrom
thc1006:refactor/internal-optimizations

Conversation

@thc1006
Copy link
Copy Markdown
Member

@thc1006 thc1006 commented Mar 26, 2026

TLDR

Six internal optimizations to ConnectionCache and eigensolver dispatch. No public API changes, all existing imports remain valid. 317 tests pass, 42% coverage.

What changed

ConnectionCache (4 improvements)

  1. Unified duplicate implementations -- The standalone ConnectionCache in loss_functions.py is replaced with a re-export from _utils.hashing.connection_cache. Added get_or_compute(config, hamiltonian) method that auto-computes on cache miss, eliminating the need for a separate class.

  2. FIFO to LRU eviction -- get() now promotes entries to most-recently-used. Eviction removes the least-recently-used entry instead of the oldest inserted. This keeps frequently accessed configs (e.g., HF state) in cache.

  3. Powers tensor caching -- The powers-of-2 tensor used for hashing is now computed once and reused across all get()/put() calls, eliminating per-call tensor allocation overhead.

  4. Batch encoding -- Added hash_batch(configs) (single matmul) and get_batch(configs) for vectorized cache operations on batches of configurations.

Eigensolver (2 improvements)

  1. Adaptive Davidson dispatch -- solve_generalized_eigenvalue() now accepts a davidson_threshold parameter (default 500). For dense matrices with n > threshold, the Davidson iterative solver is used automatically instead of full O(n^3) dense eigh. Handles generalized eigenvalue problems via Cholesky transform of S, with graceful fallback to dense eigh if Cholesky or Davidson fails.

  2. Sparse-aware GPU path -- _solve_gpu() now checks if input is sparse before attempting GPU solve. Sparse matrices use cupyx.scipy.sparse.linalg.eigsh instead of densifying first, preventing OOM on large projected Hamiltonians.

Test plan

  • 19 new tests across ConnectionCache and eigensolver modules
  • Full suite: 317 passed, 0 failed, 42% coverage
  • ruff check + ruff format clean (156 files)
  • Backward compatibility verified: all existing imports work, loss_functions.ConnectionCache is same class as _utils.hashing.ConnectionCache
  • davidson_threshold defaults to 500, preserving existing behavior for small matrices

thc1006 added 5 commits March 26, 2026 20:05
- Add get_or_compute(config, hamiltonian) for auto-compute-on-miss
  pattern, unifying the two ConnectionCache interfaces
- Change eviction from FIFO to LRU: get() promotes entry to
  most-recently-used, eviction removes least-recently-used
- Cache powers-of-2 tensor across calls instead of rebuilding per hash
- Add 7 new tests: get_or_compute (4), LRU eviction (2), powers (1)
- All 300 tests pass, 0 regressions

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Vectorized batch hashing via single matmul (configs * powers).sum().
get_batch returns list of cached results with LRU promotion.
5 new tests covering shape, consistency, uniqueness, and batch lookup.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Remove the standalone ConnectionCache class from loss_functions.py and
re-export from the canonical _utils.hashing.connection_cache module.
Update call site to use get_or_compute() instead of the old get(config,
hamiltonian) interface.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Add davidson_threshold parameter to solve_generalized_eigenvalue().
For dense matrices with n > threshold (default 500), automatically
use the Davidson iterative solver instead of full O(n^3) dense eigh.

Handles generalized eigenvalue problems via Cholesky transform of S,
with graceful fallback to dense eigh if Cholesky or Davidson fails.

4 new tests covering adaptive dispatch, threshold parameter, and
correctness verification against numpy reference.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
When _solve_gpu receives sparse H or S, it now attempts CuPy sparse
eigsh before falling back to dense GPU eigh. This prevents OOM on
large systems where H is sparse (e.g., 10K x 10K projected
Hamiltonian).

3 new tests covering sparse input handling, fallback paths, and
dense GPU correctness.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 26, 2026 12:14
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors and optimizes internal connection caching and eigensolver dispatch to reduce redundant computation, improve cache efficiency, and better handle large/sparse eigenproblems without changing import paths.

Changes:

  • Replaced the inlined ConnectionCache in loss_functions.py with a re-exported implementation and updated local-energy computation to use get_or_compute.
  • Enhanced ConnectionCache with LRU eviction, cached powers tensor, and batch hashing/lookup helpers.
  • Added adaptive Davidson dispatch for large dense problems and a sparse-aware GPU path for eigenvalue solves, plus expanded test coverage.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/qvartools/_utils/hashing/connection_cache.py Implements LRU cache behavior, powers tensor reuse, and batch hash/lookup APIs.
src/qvartools/flows/training/loss_functions.py Switches to the shared ConnectionCache and uses get_or_compute in local energy computation.
src/qvartools/diag/eigen/eigenvalue.py Adds Davidson-based dispatch for large dense matrices and sparse-aware GPU solving.
tests/test_utils/test_connection_cache.py Adds tests for get_or_compute, LRU behavior, powers caching, and batch APIs.
tests/test_diag/test_eigensolver.py Adds tests for adaptive Davidson dispatch and sparse-aware GPU behavior/fallbacks.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/qvartools/diag/eigen/eigenvalue.py Outdated
Comment thread src/qvartools/_utils/hashing/connection_cache.py Outdated
Comment thread src/qvartools/_utils/hashing/connection_cache.py Outdated
Comment on lines 23 to 32
@@ -31,81 +32,6 @@
]
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loss_functions.ConnectionCache is still exported in __all__, but it now re-exports _utils.hashing.connection_cache.ConnectionCache, whose get() signature/behavior differs from the previously inlined class (the old get(config, hamiltonian) always computed on miss). This is a breaking change for callers that used ConnectionCache.get(config, hamiltonian), despite the PR description claiming no public API changes. Consider keeping a thin compatibility wrapper in this module (or making ConnectionCache.get accept an optional hamiltonian and compute on miss, delegating to get_or_compute).

Copilot uses AI. Check for mistakes.
Comment thread src/qvartools/diag/eigen/eigenvalue.py Outdated
Comment thread src/qvartools/diag/eigen/eigenvalue.py Outdated
thc1006 added 3 commits March 26, 2026 20:22
- Cache powers tensor on target device to avoid repeated CPU->GPU copies
- Use matmul (configs @ powers) instead of elementwise multiply+sum in
  hash_batch for better BLAS/cuBLAS utilization
- Replace O(n^2) np.allclose(S, np.eye(n)) identity check with O(n)
  diagonal + off-diagonal norm check to avoid allocating full identity
- Fix inconsistent order[:k] slice in _solve_gpu sparse path

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
1. Sparse GPU failure now returns None instead of densifying (OOM risk)
2. Add n_sites >= 64 overflow guard with clear error message
3. ConnectionCache.get() accepts optional hamiltonian for backward compat
   with the old loss_functions.ConnectionCache.get(config, hamiltonian)
4. Davidson dispatch only activates when which="SA" (smallest algebraic)
5. Thread which parameter through _solve_gpu to sparse GPU eigsh path

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
hash_batch bypassed _hash's overflow check. Add the same
_MAX_SITES_INT64 guard to prevent silent int64 overflow for
n_sites >= 64.

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +226 to +231
# Transform to standard form: L^{-1} H L^{-T} where S = L L^T
try:
L = scipy.linalg.cholesky(S_dense, lower=True)
L_inv = scipy.linalg.solve_triangular(L, np.eye(L.shape[0]), lower=True)
H_work = L_inv @ H_dense @ L_inv.T
except scipy.linalg.LinAlgError:
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generalized-to-standard transform forms an explicit inverse L_inv by solving against a dense identity, then does two dense matmuls. For n > davidson_threshold, this allocates O(n^2) extra memory and adds an O(n^3) step that can dominate runtime, undermining the purpose of Davidson. Prefer applying L^{-1} / L^{-T} via solve_triangular (or cho_solve) without constructing L_inv/np.eye(n).

Copilot uses AI. Check for mistakes.
Comment thread tests/test_diag/test_eigensolver.py Outdated
Comment on lines +23 to 25
from qvartools._utils.hashing.connection_cache import ConnectionCache
from qvartools.hamiltonians.hamiltonian import Hamiltonian

Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching loss_functions.ConnectionCache to the _utils.hashing.connection_cache.ConnectionCache introduces a hard limit of 63 sites (it raises ValueError on larger configs). The previous in-file cache keyed by tuple(config.tolist()) supported arbitrary lengths, so this can break physics-guided training for larger Hamiltonians. Consider adding an overflow-safe key path (e.g., reuse config_integer_hash/tuple keys for n_sites>=64) or keeping the old tuple-key fallback to preserve behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +95 to +113
_MAX_SITES_INT64 = 63 # int64 can represent 2^0 .. 2^62 without overflow

def _hash(self, config: torch.Tensor) -> int:
"""Convert a binary configuration to an integer hash.

def get(self, config: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor] | None:
Raises
------
ValueError
If ``n_sites`` exceeds 63 (int64 overflow). For larger
systems use :func:`config_integer_hash` which splits into
``(hi, lo)`` tuples.
"""
n = config.shape[0]
if n > self._MAX_SITES_INT64:
raise ValueError(
f"ConnectionCache supports at most {self._MAX_SITES_INT64} sites "
f"(got {n}). Use config_integer_hash for larger systems."
)
powers = self._get_powers(n, config.device)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_hash() now raises for n_sites > 63, but the docstring suggests using config_integer_hash for larger systems while ConnectionCache itself cannot store tuple keys. If this cache is used for fermionic problems with >=64 sites, this will be a runtime break. Consider extending the cache key type to int | tuple[int,int] (or using config_integer_hash internally) so large systems remain supported without overflow.

Copilot uses AI. Check for mistakes.
1. Replace explicit L_inv with solve_triangular in Davidson Cholesky
   transform — avoids O(n^2) np.eye allocation and extra O(n^3) matmul
2. Reduce Davidson test matrix from 800x800 to 300x300 for faster CI
3. Fall back to tuple keys for n_sites >= 64 instead of raising
   ValueError — preserves backward compatibility with the old
   loss_functions.ConnectionCache that used tuple(config.tolist())
4. Update hash_batch and get_batch to handle tuple key fallback

Signed-off-by: thc1006 <84045975+thc1006@users.noreply.github.com>
@George930502
Copy link
Copy Markdown
Collaborator

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@George930502 George930502 merged commit c29917e into QuantumNoLab:main Mar 26, 2026
5 checks passed
@thc1006 thc1006 deleted the refactor/internal-optimizations branch March 26, 2026 15:17
thc1006 added a commit to thc1006/qvartools that referenced this pull request Mar 30, 2026
Previously deferred 3 items — all now fixed:

#1: Cr₂ singlet test: assert e_core in [-2040, -2030] Ha range
    (septet would give different e_core, catches missing fix_spin_)
QuantumNoLab#4: get_integral_cache() now returns CAS-aware wrapper that bypasses
    joblib when cas is not None (was a real cache-escape bug)
QuantumNoLab#7: Add test_cache_actually_bypassed_for_cas using mock.patch to
    verify compute_molecular_integrals is called directly

Also from new comment:
QuantumNoLab#13: Remove dead tuple/list branch for nelecas in _compute_cas_integrals
     (public API is tuple[int, int], nelecas is always int)
thc1006 added a commit to thc1006/qvartools that referenced this pull request Apr 1, 2026
Critical fixes:
- #1/#2: Use persistent prev_coeffs/prev_batch_configs for PT2 scoring
  (was using best_coeffs before it was defined in current iteration)
- QuantumNoLab#5: Eviction checks cumulative_basis size (was checking best_batch_configs)
- QuantumNoLab#7: Convert numpy indices to torch.LongTensor before indexing CUDA tensor
- QuantumNoLab#8: Default energy_weight=0.0, entropy_weight=0.0 (was 0.1/0.05,
  silently changing existing behavior)

Lower priority fixes:
- QuantumNoLab#6: CIPSI docstring mentions build_sparse_hamiltonian requirement
- QuantumNoLab#9: Warn when energy_weight>0 but hamiltonian=None

Not applicable (won't fix):
- QuantumNoLab#3/QuantumNoLab#4: config_integer_hash returns tuple for n_sites>=64; our CAS max
  is 58Q (<64), so int cast is safe

Co-authored-by: leo07010 <leo07010@gmail.com>
thc1006 added a commit to thc1006/qvartools that referenced this pull request Apr 1, 2026
Copilot fixes:
- #1: top_idx on same device as unique_new (CUDA compat)
- #2: use prev_energy (consistent with prev_coeffs eigenvector)
- QuantumNoLab#3: eviction log only when eviction actually occurred
- QuantumNoLab#4: remove dead test code, clarify as smoke test

New:
- 64+Q hash support (remove int() casts, use hash values directly)
- Cr₂-CAS(12,32) @ 64Q and Cr₂-CAS(12,36) @ 72Q registry entries
- Both MOLECULE_REGISTRY and _MOLECULE_INFO_REGISTRY synced (26 total)

Co-authored-by: leo07010 <leo07010@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants