Refactor optimization pipeline.#520
Conversation
…raction total mem.
…urable
- Add OptFor { Flops, Memsize } and ReorderSum { Reorder, NoReorder } in
a shared optimize/flags.hpp; thread them through single_term_opt and
the top-level optimize() API.
- Add index_to_extent_t alias for the type-erased provider used by the
public API; templated single_term_opt callers still pass callables
directly via the has_index_extent concept.
- Parallelize the outermost Sum's per-summand single-term optimization
with sequant::for_each; recurse sequentially on nested Sums.
- Have opt::reorder reuse pre-binarized eval nodes instead of rebuilding
them inside clusters().
- Harden opt_mixed_product placeholder labels (non-identifier prefix,
starts_with match, digit-walk suffix parse, asserted invariants).
- Fix std::forward misuse in flops_counter / memsize_counter that
prevented compilation under clang.
- Rename optimize/flags.hpp → optimize/options.hpp; move OptFor,
ReorderSum, and index_to_extent_t into it alongside a new
OptimizeOptions struct (opt_for / reorder / idx_to_extent with
sensible defaults).
- Collapse the six optimize() overloads into three (ExprPtr&,
ResultExpr&, ResultExpr&&), each taking OptimizeOptions{} by
default. Existing default-argument call sites are source-compatible.
- Stop including <SeQuant/core/optimize/single_term.hpp> from
optimize.hpp. The public optimize() API now needs only the
lightweight options header, so single_term.hpp (and its range-v3
transitives) no longer leak into consumers. This fixes the CI
unity-build failure in utilities/external-interface, where
range/v3/view/indirect.hpp's use of meta:: became ambiguous against
sequant::meta once a sibling .cpp in the same TU had issued
`using namespace sequant;`.
…nt optimize() API
2dc2175 to
59f4b57
Compare
|
|
||
| // Overloads for backwards compatibility | ||
|
|
||
| /// \deprecated Use the \c OptimizeOptions overload instead. |
There was a problem hiding this comment.
We can (and maybe should) make use of the [[deprecated]] attribute: https://en.cppreference.com/cpp/language/attributes/deprecated
There was a problem hiding this comment.
I thought about that. I might add it.
…ummands Add minimal tests for the refactored optimize() API: - both OptFor::Flops and OptFor::Memsize binarize a product correctly - ReorderSum knob, and bare optimize() == explicit Reorder - parallel summand optimization (set_num_threads 1 vs 4) yields identical results, guarding the optimize_impl(..., parallel_outer=true) path Also document the thread-safety invariants of the parallel branch (work on per-task clones; binarize()'s lazy Index::label() cache write is sequenced after for_each joins) and explain the deliberate IndexSet-vs-vector container choice and the exact-equality scalar sentinel in the flop/memsize counters. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Review: Refactor optimization pipeline
Solid, well-structured refactor. I did a focused thread-safety audit of the new parallel path and added minimal regression tests + documentation; details below. Pushed as a1d0264 on this branch.
Parallelism (optimize_impl(..., parallel_outer=true)) — SAFE as written, conditional
A four-angle audit (TensorCanonicalizer · TensorNetwork/bliss · Index/Expr/registry · for_each mechanics) found no data race, resting on two invariants that currently hold:
TensorNetworkclones every tensor on insertion, so the unsynchronizedmutablelazy caches (Expr::hash_value_,Index::label_/full_label_) are only ever written on thread-local clones.Index::operator==/<=>compare immutable members only, so building index sets over shared indices is safe.- The one step that lazily mutates shared
Index/Exprcaches —binarize()viaIndex::label()— runs in the sequential loop afterfor_eachjoins (optimize.cpp), never insidedo_term.
Canonicalizer/bliss global state is read-only during canonicalization; the swap counter is thread_local; the loop writes disjoint, pre-sized new_smands[i] slots.
I documented invariants (1)/(2) in a comment at the parallel branch so a future edit doesn't silently break them by moving a ->hash_value()/Index::label() call into do_term.
Caveats worth knowing:
- The default
Contextandcardinal_tensor_labelsmust be configured before enteringoptimize()(their writes are unsynchronized unlessSEQUANT_CONTEXT_MANIPULATION_THREADSAFE);optimize()only reads them. - On Linux/GCC builds
for_eachusesstd::execution::par_unseq, which ignoresnum_threads()— soset_num_threads(1)will not serialize this loop there (it does on the Apple-Clang/macOS build, which uses the manual-thread fallback).
Tests added (commit a1d0264)
OptimizeOptionsAPI: bothOptFor::FlopsandOptFor::Memsizebinarize a product correctly;ReorderSumknob; bareoptimize()== explicitReorder.- Parallel/sequential equivalence: a 3-summand sum optimized under
set_num_threads(1)vs(4)must produce identical results — the safety net for the invariants above.
Minor doc comments added
flops_countervsmemsize_counter: documented why flops needs<IndexSet>(concatenated operands repeat contracted indices) while memsize can use the default vector (per-operand, no dups).- Documented the exact-equality
== 1./!= 1.scalar sentinel convention in both counters.
Strengths (unchanged from the PR)
- Placeholder-label hardening (
@__opt_+ digit-by-digit parse with asserts) fixes the oldI_-prefixstd::stoi-on-raw-data()collision risk. reorderno longer re-binarizes (precomputed nodes threaded through, withSEQUANT_ASSERT(nodes.size() == expr.size())).- Clean extraction of
init_results/build_subnet_metadata(correctlyinline); deprecated bool overloads correctly drop their default args to avoid ambiguity with theOptimizeOptions = {}overload.
LGTM once CI is green.
No description provided.