
v0.4.21

EdvardGK tagged this 01 Jun 12:22
Replaces the previous `collect::<Vec<ProductOutcome>>()` between phase 2
and phase 3 of `mesh_ifc_streaming_framed` with a lock-free bounded
channel (crossbeam-channel) plus a reorder buffer keyed by product seq
id. Workers stream outcomes as soon as they finish tessellating; the
main thread drains them to the sink in seq order. Peak in-flight memory
is therefore bounded by `channel_cap × per_product_mesh` (≈ a few MB)
instead of scaling with total product count.
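A minimal sketch of the seq-ordered drain, assuming every outcome carries a dense `usize` seq id starting at 0. `ReorderBuffer` and `push` are hypothetical names for illustration, not this crate's API:

```rust
use std::collections::HashMap;

// Hypothetical reorder buffer: outcomes arrive in completion order and are
// released to `emit` strictly in seq order.
struct ReorderBuffer<T> {
    next_seq: usize,            // next seq id the sink is waiting for
    pending: HashMap<usize, T>, // outcomes that arrived ahead of `next_seq`
}

impl<T> ReorderBuffer<T> {
    fn new() -> Self {
        ReorderBuffer { next_seq: 0, pending: HashMap::new() }
    }

    // Accept one outcome, then emit every outcome that is now contiguous.
    fn push(&mut self, seq: usize, item: T, mut emit: impl FnMut(usize, T)) {
        self.pending.insert(seq, item);
        while let Some(ready) = self.pending.remove(&self.next_seq) {
            emit(self.next_seq, ready);
            self.next_seq += 1;
        }
    }

    // Outcomes still parked while waiting for an earlier seq.
    fn parked(&self) -> usize {
        self.pending.len()
    }
}

fn main() {
    let mut buf = ReorderBuffer::new();
    let mut emitted = Vec::new();
    // Completion order from the workers: 2, 0, 3, 1.
    for (seq, name) in [(2, "c"), (0, "a"), (3, "d"), (1, "b")] {
        buf.push(seq, name, |s, n| emitted.push((s, n)));
    }
    assert_eq!(emitted, vec![(0, "a"), (1, "b"), (2, "c"), (3, "d")]);
    assert_eq!(buf.parked(), 0);
    println!("{:?}", emitted);
}
```

One design note: in this sketch, outcomes parked in `pending` live outside the channel, so they are not counted against its capacity.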

- T=1 fast path: `RAYON_NUM_THREADS=1` bypasses the channel + scope
  entirely and runs the original serial tessellate-and-emit loop,
  so single-thread hosts match the pre-rayon baseline timing.
- T>1: crossbeam-channel (lock-free, ~10× faster than std mpsc's
  mutex+condvar SyncSender on this workload). Cap = num_threads * 16
  to keep backpressure low on heavy products. The reorder buffer is a
  HashMap keyed by seq id (O(1) average insert/remove) instead of a
  BTreeMap (O(log n)).
- `std::thread::scope` keeps `&mut sink` on the caller's frame, so no
  Send bound is added to S: ProductSink. A panicking worker drops its
  tx clone during unwinding (rayon's panic-catching machinery), and the
  channel closes once the last clone drops. The drain then exits
  cleanly, and the panic re-raises from the scope on join, propagating
  to the PyO3 catch_panic wrapper as before.
- Bench (LBK_RIBp_C, 41 MB, 34k products, T=8): cfebf02 baseline
  ~318 ms median vs GH #25 ~330 ms median, i.e. a ~3-4% T=8 cost for
  the bounded-RAM win. At T=1 it is ~5% faster than cfebf02 (it skips
  the rayon scaffolding entirely).
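A std-only sketch of the T=1 / T>1 split with a scoped, bounded pipeline. Assumptions, loudly: `std::sync::mpsc::sync_channel` stands in for crossbeam-channel's bounded channel, plain scoped threads claiming indices from an `AtomicUsize` stand in for rayon's scheduling, products are modeled as `u64`, and `tessellate`/`mesh_streaming` are hypothetical names:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical stand-in for per-product tessellation.
fn tessellate(product: u64) -> u64 {
    product * product
}

// Hypothetical driver: mesh `products` on `num_threads` workers and feed
// `sink` in input order. `S` has no `Send` bound: the sink is only ever
// called on the caller's thread.
fn mesh_streaming<S: FnMut(usize, u64)>(products: &[u64], num_threads: usize, sink: &mut S) {
    if num_threads <= 1 {
        // T=1 fast path: no channel, no threads, just the serial loop.
        for (seq, &p) in products.iter().enumerate() {
            sink(seq, tessellate(p));
        }
        return;
    }

    // Bounded queue: a worker blocks in `send` once `cap` outcomes are queued.
    let cap = num_threads * 16;
    let (tx, rx) = sync_channel::<(usize, u64)>(cap);
    let next_index = AtomicUsize::new(0);

    thread::scope(|s| {
        for _ in 0..num_threads {
            let tx = tx.clone();
            let next_index = &next_index;
            s.spawn(move || loop {
                // Claim the next unprocessed product.
                let seq = next_index.fetch_add(1, Ordering::Relaxed);
                let Some(&p) = products.get(seq) else { break };
                if tx.send((seq, tessellate(p))).is_err() {
                    break; // receiver dropped; stop early
                }
            });
        }
        // Drop the original sender so the channel closes when workers finish.
        drop(tx);

        // Drain on the caller's thread, releasing outcomes in seq order.
        let mut pending: HashMap<usize, u64> = HashMap::new();
        let mut next_seq: usize = 0;
        for (seq, mesh) in rx {
            pending.insert(seq, mesh);
            while let Some(mesh) = pending.remove(&next_seq) {
                sink(next_seq, mesh);
                next_seq += 1;
            }
        }
    });
}

fn main() {
    let products: Vec<u64> = (0..1000).collect();
    let mut serial = Vec::new();
    mesh_streaming(&products, 1, &mut |seq: usize, m: u64| serial.push((seq, m)));
    let mut parallel = Vec::new();
    mesh_streaming(&products, 8, &mut |seq: usize, m: u64| parallel.push((seq, m)));
    assert_eq!(serial, parallel); // same outcomes, same order
    println!("{} outcomes emitted in seq order", parallel.len());
}
```

Because the drain runs inside the `thread::scope` closure on the calling thread, the sink can be any `FnMut`, including one that captures non-`Send` state.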
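The panic path can be reproduced with std scoped threads alone. This is a sketch under assumptions: rayon is replaced by `std::thread::scope` spawns, `run_with_failing_worker` is a hypothetical name, and `catch_unwind` stands in for the PyO3 boundary. It relies on documented std behavior: `thread::scope` panics after joining if a spawned thread panicked and was not joined manually.

```rust
use std::panic;
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical setup: three workers share one bounded channel and worker 1
// panics mid-tessellation.
fn run_with_failing_worker() -> usize {
    let (tx, rx) = sync_channel::<usize>(4);
    let mut received = 0;
    thread::scope(|s| {
        for worker in 0..3 {
            let tx = tx.clone();
            s.spawn(move || {
                if worker == 1 {
                    // Unwinding drops this thread's `tx` clone.
                    panic!("tessellation failed");
                }
                tx.send(worker).unwrap();
            });
        }
        drop(tx);
        // Terminates once every sender is gone, the panicked worker's included.
        for _outcome in rx {
            received += 1;
        }
        // Leaving the closure joins all workers; since one panicked and was
        // not joined manually, `thread::scope` re-raises a panic here.
    });
    received
}

fn main() {
    // Stand-in for the PyO3 wrapper that turns a Rust panic into an error.
    let result = panic::catch_unwind(run_with_failing_worker);
    assert!(result.is_err());
    println!("worker panic reached the caller");
}
```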
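Plugging the T=8 bench configuration into the numbers above gives the following, with $T$ = `num_threads` and $M_{\mathrm{product}}$ = `per_product_mesh`. The percentage uses the rounded medians, so treat it as approximate:

```math
\begin{aligned}
\mathrm{cap} &= T \times 16 = 8 \times 16 = 128 \\
M_{\mathrm{peak}} &\le 128 \times M_{\mathrm{product}} \\
\frac{330 - 318}{318} &\approx 0.038 \approx 3.8\%
\end{aligned}
```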

Bumps to v0.4.21. No cache schema change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>