Replaces the previous `.collect::<Vec<ProductOutcome>>()` between phase 2
and phase 3 of `mesh_ifc_streaming_framed` with a lock-free bounded
channel (crossbeam-channel) + a reorder buffer keyed by product seq. Workers
stream outcomes as they finish tessellating; the main thread drains them
in seq order to the sink. Peak in-flight memory is bounded by
`channel_cap × per_product_mesh` (≈ a few MB), plus any outcomes parked
in the reorder buffer while an earlier product is still in flight, instead
of scaling with total product count.
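A minimal sketch of the ordered drain described above. It uses only std so it builds without dependencies: `std::sync::mpsc::sync_channel` stands in for crossbeam-channel's bounded channel, and plain scoped threads with a round-robin split stand in for rayon. `Outcome`, `tessellate` and `stream_in_order` are hypothetical names, not the crate's API.

```rust
use std::collections::HashMap;
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical stand-in for one tessellated product.
struct Outcome {
    seq: usize,
    payload: String,
}

/// Hypothetical stand-in for the per-product tessellation work.
fn tessellate(seq: usize) -> Outcome {
    Outcome { seq, payload: format!("mesh-{seq}") }
}

/// Tessellates products `0..n` on `workers` threads, streams the outcomes
/// through a channel holding at most `cap` items, and hands them to `emit`
/// in seq order via a HashMap reorder buffer.
fn stream_in_order(n: usize, workers: usize, cap: usize, mut emit: impl FnMut(Outcome)) {
    assert!(workers > 0, "need at least one worker");
    let (tx, rx) = sync_channel::<Outcome>(cap);
    thread::scope(|s| {
        for w in 0..workers {
            let tx = tx.clone();
            // Round-robin split: worker w handles seq w, w + workers, ...
            s.spawn(move || {
                for seq in (w..n).step_by(workers) {
                    // Blocks while the channel is full: this is the backpressure.
                    if tx.send(tessellate(seq)).is_err() {
                        break; // receiver dropped; stop early
                    }
                }
            });
        }
        // Drop the original sender so `rx` ends once every worker is done.
        drop(tx);

        // Outcomes that arrived ahead of `next` wait here.
        let mut pending: HashMap<usize, Outcome> = HashMap::new();
        let mut next: usize = 0;
        for outcome in rx {
            pending.insert(outcome.seq, outcome);
            // Flush the contiguous run starting at `next`.
            while let Some(ready) = pending.remove(&next) {
                emit(ready);
                next += 1;
            }
        }
    });
}
```

In this sketch `pending` is not bounded by `cap`: outcomes that arrive ahead of `next` wait there until the gap fills, so one slow early product lets it grow while the channel itself stays small.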
- T=1 fast path: `RAYON_NUM_THREADS=1` bypasses the channel + scope
entirely and runs the original serial tessellate-and-emit loop,
so single-thread hosts match the pre-rayon baseline timing.
- T>1: crossbeam-channel (lock-free, ~10× faster than std mpsc's
mutex+condvar SyncSender on this workload). Cap = num_threads × 16
keeps backpressure low on heavy products. The reorder buffer is a
HashMap keyed by seq id (O(1) insert/remove) instead of a BTreeMap.
- `std::thread::scope` keeps `&mut sink` on the caller's frame
(no Send bound added to S: ProductSink). A panicking worker drops its
tx clone while unwinding, and rayon's panic-catching machinery rethrows
the panic to the caller. Once the last clone drops the channel closes,
the drain exits cleanly, and the panic re-raises from scope on join,
propagating to the PyO3 catch_panic wrapper as before.
- Bench (LBK_RIBp_C 41 MB, 34k products, T=8): cfebf02 baseline
~318 ms median vs. GH #25 ~330 ms median, i.e. a ~4% T=8 cost for the
bounded-RAM win. T=1 is ~5% faster than cfebf02 (it skips the rayon
scaffolding entirely).
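A sketch of the T=1 fast path and the scope/panic behaviour from the bullets above, under stated assumptions: std `sync_channel` and scoped threads replace crossbeam-channel and rayon, `ProductSink` is a hypothetical one-method trait (the real trait's API is not shown here), `threads` plays the role of `RAYON_NUM_THREADS`, and `RcSink` holds an `Rc` so it is deliberately not `Send`. A panicking worker drops its sender while unwinding, the drain loop ends once every sender is gone, and `std::thread::scope` then panics on join.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical sink trait standing in for `ProductSink`.
trait ProductSink {
    fn emit(&mut self, seq: usize, mesh: String);
}

/// Holds an `Rc`, so it is deliberately not `Send`.
struct RcSink(Rc<RefCell<Vec<usize>>>);

impl ProductSink for RcSink {
    fn emit(&mut self, seq: usize, _mesh: String) {
        self.0.borrow_mut().push(seq);
    }
}

/// Hypothetical tessellation step that panics on the product `bad`, if any.
fn tessellate(seq: usize, bad: Option<usize>) -> String {
    if Some(seq) == bad {
        panic!("tessellation failed for product {seq}");
    }
    format!("mesh-{seq}")
}

/// Hypothetical driver; `threads` plays the role of RAYON_NUM_THREADS.
fn mesh_all<S: ProductSink>(n: usize, threads: usize, bad: Option<usize>, sink: &mut S) {
    if threads <= 1 {
        // T=1 fast path: no channel, no scope, just the serial loop.
        for seq in 0..n {
            sink.emit(seq, tessellate(seq, bad));
        }
        return;
    }
    let (tx, rx) = sync_channel::<(usize, String)>(threads * 16);
    thread::scope(|s| {
        for w in 0..threads {
            let tx = tx.clone();
            s.spawn(move || {
                for seq in (w..n).step_by(threads) {
                    // A panic here unwinds this worker and drops its `tx` clone.
                    if tx.send((seq, tessellate(seq, bad))).is_err() {
                        break;
                    }
                }
            });
        }
        drop(tx);
        // Drain on the caller's thread, so `S` needs no `Send` bound.
        let mut pending = HashMap::new();
        let mut next: usize = 0;
        // Ends once every sender is dropped, including a panicked worker's.
        for (seq, mesh) in rx {
            pending.insert(seq, mesh);
            while let Some(mesh) = pending.remove(&next) {
                sink.emit(next, mesh);
                next += 1;
            }
        }
    }); // Joins all workers, then panics if any of them panicked.
}
```

In this sketch `std::thread::scope` re-panics with its own generic payload rather than the worker's; if the original message matters, joining the handles explicitly and passing the `Err` payload to `std::panic::resume_unwind` preserves it.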
Bumps to v0.4.21. No cache schema change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>