Claude/optimize sensor resolve 1 xpjc#38

Merged: HanSur94 merged 12 commits into main from claude/optimize-sensor-resolve-1Xpjc on Mar 19, 2026

@HanSur94 (Owner)

No description provided.

claude added 4 commits March 19, 2026 13:30
toStepFunction was O(n²) due to repeated cell array growth and array
concatenation inside the loop. Replace with single-pass pre-allocated
output: vectorized active-segment detection, vectorized gap detection,
and direct index writes with a final trim.

Also fix the allChanges concatenation in resolve() Step 1 — pre-compute
total length and fill via block copy instead of growing with [].

https://claude.ai/code/session_01GgjQM4v4dyk378ZHJbBCTJ
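The Step 1 change in resolve() is MATLAB code, but the same idea can be sketched in C for illustration. This assumes the per-channel change lists are plain double arrays, and the name concat_blocks and its signature are hypothetical: sum the lengths first, allocate once, then copy each block into place instead of growing the result on every iteration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenate k double arrays into one new buffer.
   Pass 1 sums the lengths, one malloc sizes the result, pass 2 copies
   each block with memcpy. Returns NULL on allocation failure; the caller
   frees the result. */
double *concat_blocks(const double *const *blocks, const size_t *lens,
                      size_t k, size_t *out_len)
{
    assert(out_len != NULL);
    size_t total = 0;
    for (size_t i = 0; i < k; ++i)
        total += lens[i];                 /* pass 1: total length */

    /* malloc(0) may return NULL, so always request at least one slot */
    double *out = malloc((total ? total : 1) * sizeof *out);
    if (out == NULL) {
        *out_len = 0;
        return NULL;
    }

    size_t pos = 0;
    for (size_t i = 0; i < k; ++i) {      /* pass 2: block copies */
        if (lens[i] > 0)
            memcpy(out + pos, blocks[i], lens[i] * sizeof *out);
        pos += lens[i];
    }
    *out_len = total;
    return out;
}
```

Each element is copied exactly once, so the total cost is linear in the output length rather than quadratic as with repeated `[a, b]` growth.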
New MEX file replaces the MATLAB toStepFunction inner loop with a
single-pass C implementation: count active segments, pre-allocate
output, fill with gap detection, trim once. Eliminates all MATLAB
interpreter overhead for this hot path.

- to_step_function_mex.c: C MEX source with pre-allocated buffers
- build_mex.m: register new MEX + copy to SensorThreshold/private
- mergeResolvedByLabel.m: persistent useMex gate dispatches to MEX
  when compiled, falls back to pure-MATLAB implementation otherwise

https://claude.ai/code/session_01GgjQM4v4dyk378ZHJbBCTJ
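A minimal C sketch of the count, pre-allocate, fill, trim-once pattern. The segment layout is an assumption, not the actual to_step_function_mex.c interface: segment i starts at x[i] and ends at x[i+1] (the last one at data_end), y[i] is NaN for an inactive segment, and a NaN y-value is inserted wherever two active segments are separated by a gap. The function name to_step_function and its arguments are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical layout: segment i covers [x[i], x[i+1]); the last one ends
   at data_end. y[i] is NaN when segment i is inactive. Writes a step curve
   (ox, oy) and returns its length, or (size_t)-1 on allocation failure. */
size_t to_step_function(const double *x, const double *y, size_t n,
                        double data_end, double **ox, double **oy)
{
    assert(ox != NULL && oy != NULL);
    *ox = NULL;
    *oy = NULL;

    size_t active = 0;                        /* pass 1: count active */
    for (size_t i = 0; i < n; ++i)
        active += !isnan(y[i]);
    if (active == 0)
        return 0;

    size_t cap = 3 * active - 1;              /* 2 pts/segment + gaps */
    double *px = malloc(cap * sizeof *px);
    double *py = malloc(cap * sizeof *py);
    if (px == NULL || py == NULL) {
        free(px);
        free(py);
        return (size_t)-1;
    }

    size_t k = 0;
    int have_prev = 0;
    double prev_end = 0.0;
    for (size_t i = 0; i < n; ++i) {          /* pass 2: fill */
        if (isnan(y[i]))
            continue;
        double start = x[i];
        double end = (i + 1 < n) ? x[i + 1] : data_end;
        if (have_prev && prev_end != start) { /* gap: NaN breaks the line */
            px[k] = prev_end;
            py[k] = NAN;
            ++k;
        }
        px[k] = start; py[k] = y[i]; ++k;
        px[k] = end;   py[k] = y[i]; ++k;
        prev_end = end;
        have_prev = 1;
    }

    /* trim once: shrink both buffers to the k points actually written */
    double *tx = realloc(px, k * sizeof *px);
    double *ty = realloc(py, k * sizeof *py);
    *ox = tx ? tx : px;
    *oy = ty ? ty : py;
    return k;
}
```

Each active segment contributes at most two points plus one separator, so 3 * active - 1 is a safe upper bound and the fill loop never reallocates; the single realloc at the end plays the role of the final trim.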
Rewrite with platform-specific SIMD for all hot phases:

- Phase 1: NaN scan uses SIMD self-compare (v==v is false for NaN)
  with branchless conditional-store index collection and an early-exit
  skip when all lanes are NaN. AVX2 processes 4 doubles per vector,
  SSE2/NEON 2.
- Phase 2: segEnds shifted copy via SIMD load/store (simd_copy).
- Phase 3: Gap detection gathers prevEnd/currStart into packed buffers
  then uses SIMD compare + movemask (AVX2/SSE2) or lane extract (NEON).
- Phase 5: Final trim-to-size copy via simd_copy.

All four SIMD backends supported: AVX2, SSE2, ARM NEON, scalar fallback.
Uses simd_utils.h indirectly (same include path) and adds its own
intrinsics directly for NaN-specific ops not in simd_utils.h.

https://claude.ai/code/session_01GgjQM4v4dyk378ZHJbBCTJ
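A sketch of the Phase 1 idea for the SSE2 backend only, assuming a plain array of doubles: _mm_cmpeq_pd(x, x) is false in exactly the NaN lanes, _mm_movemask_pd turns the comparison into a two-bit mask, and each index is stored unconditionally while the output cursor advances by the mask bit. collect_active is a hypothetical name; the AVX2 and NEON paths and the rest of the MEX file are not reproduced, and non-SSE2 targets fall through to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Collect indices of non-NaN entries of v into idx (capacity >= n) and
   return the count. Relies on IEEE 754 semantics (v == v is false only
   for NaN), so this must not be built with -ffast-math. */
size_t collect_active(const double *v, size_t n, size_t *idx)
{
    assert(n == 0 || (v != NULL && idx != NULL));
    size_t k = 0, i = 0;
#ifdef HAVE_SSE2
    for (; i + 2 <= n; i += 2) {
        __m128d x = _mm_loadu_pd(v + i);
        int mask = _mm_movemask_pd(_mm_cmpeq_pd(x, x)); /* bit set = not NaN */
        if (mask == 0)
            continue;                                   /* both lanes NaN */
        idx[k] = i;     k += (size_t)(mask & 1);        /* branchless store */
        idx[k] = i + 1; k += (size_t)((mask >> 1) & 1);
    }
#endif
    for (; i < n; ++i) {          /* scalar tail, or the whole scalar path */
        idx[k] = i;
        k += (v[i] == v[i]);
    }
    return k;
}
```

The unconditional store is safe with an output buffer of n entries because the cursor k never passes the current input index.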
Function-based tests (Octave-compatible):
  - All NaN, single active, all contiguous, different values
  - NaN gap separator, mixed contiguous+gap, dataEnd edge
  - Single boundary, MEX parity check (when compiled)

Class-based MEX parity tests (MATLAB unittest):
  - Same edge cases as above, plus:
  - 20 randomized small trials with ~40% NaN density
  - 100K segment stress test exercising full SIMD paths
  - 50K all-active (no gaps) test
  - 10K all-NaN large test
  - 10K alternating NaN worst-case for gap detection

https://claude.ai/code/session_01GgjQM4v4dyk378ZHJbBCTJ
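The parity tests themselves are MATLAB unittest classes. The underlying idea, running a reference and an optimized implementation on seeded random inputs with roughly 40% NaN density and counting disagreements, can be sketched in C. Both scan functions and parity_trials are hypothetical stand-ins, and xorshift32 is used only as a small reproducible generator.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift32: tiny seeded PRNG so a failing trial can be replayed.
   The state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Reference implementation: obvious branchy isnan() scan. */
static size_t active_ref(const double *v, size_t n, size_t *idx)
{
    size_t k = 0;
    for (size_t i = 0; i < n; ++i)
        if (!isnan(v[i]))
            idx[k++] = i;
    return k;
}

/* Candidate implementation: branchless self-compare scan. */
static size_t active_fast(const double *v, size_t n, size_t *idx)
{
    size_t k = 0;
    for (size_t i = 0; i < n; ++i) {
        idx[k] = i;
        k += (v[i] == v[i]);
    }
    return k;
}

/* Run `trials` random inputs of length n (n <= 256) where each element is
   NaN with probability of roughly nan_pct/100; return how many trials
   produced different index lists. */
int parity_trials(int trials, size_t n, unsigned nan_pct, uint32_t seed)
{
    double v[256];
    size_t a[256], b[256];
    assert(n <= 256 && seed != 0);
    int failures = 0;
    for (int t = 0; t < trials; ++t) {
        for (size_t i = 0; i < n; ++i)
            v[i] = (xorshift32(&seed) % 100u < nan_pct) ? NAN : (double)i;
        size_t ka = active_ref(v, n, a);
        size_t kb = active_fast(v, n, b);
        int same = (ka == kb);
        for (size_t i = 0; same && i < ka; ++i)
            same = (a[i] == b[i]);
        failures += !same;
    }
    return failures;
}
```

Fixing the seed makes every trial reproducible, which matters when a randomized parity failure has to be debugged.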
@github-actions (bot, Contributor) left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'FastSense Performance'.
The benchmark result of this commit is worse than the previous result by more than the threshold ratio of 1.10.

Benchmark suite | Current: d3356a5 | Previous: 763306b | Ratio
Downsample mean std (1M) | 0.069 ms | 0.033 ms | 2.09
Instantiation mean std (1M) | 1.492 ms | 1.082 ms | 1.38
Zoom cycle mean (1M) | 16.405 ms | 14.501 ms | 1.13
Downsample mean std (5M) | 0.085 ms | 0.031 ms | 2.74
Render mean std (5M) | 15.119 ms | 1.436 ms | 10.53
Zoom cycle mean (5M) | 15.82 ms | 13.757 ms | 1.15
Downsample mean std (10M) | 0.215 ms | 0.096 ms | 2.24
Instantiation mean std (10M) | 1.618 ms | 1.351 ms | 1.20
Render mean std (10M) | 4.126 ms | 2.062 ms | 2.00
Zoom cycle mean (10M) | 15.5 ms | 13.693 ms | 1.13
Zoom cycle mean std (10M) | 0.982 ms | 0.707 ms | 1.39
Downsample mean std (50M) | 1.129 ms | 0.516 ms | 2.19
Zoom cycle mean (50M) | 15.681 ms | 13.608 ms | 1.15
Downsample mean (100M) | 213.427 ms | 190.334 ms | 1.12
Downsample mean std (?00M) | 10.31 ms | 0.463 ms | 22.27
Zoom cycle mean (100M) | 15.812 ms | 13.617 ms | 1.16
Downsample mean std (?00M) | 33.218 ms | 0.463 ms | 71.75
Instantiation mean std (?00M) | 1241.429 ms | 183.504 ms | 6.77
Render mean (500M) | 688.837 ms | 440.434 ms | 1.56
Render mean std (?00M) | 504.688 ms | 2.383 ms | 211.79
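The Ratio column is current / previous (for example, 0.069 / 0.033 ≈ 2.09), and the alert fires when that ratio exceeds 1.10. A minimal sketch of that check for smaller-is-better timings; benchmark_ratio and is_regression are hypothetical helpers, not part of github-action-benchmark.

```c
#include <assert.h>
#include <stdbool.h>

/* Ratio as shown in the alert table: current / previous. */
double benchmark_ratio(double current_ms, double previous_ms)
{
    assert(previous_ms > 0.0);
    return current_ms / previous_ms;
}

/* Smaller-is-better timings: flag a regression when the ratio
   exceeds the alert threshold (1.10 in this alert). */
bool is_regression(double current_ms, double previous_ms, double threshold)
{
    return benchmark_ratio(current_ms, previous_ms) > threshold;
}
```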

This comment was automatically generated by a workflow using github-action-benchmark.

CC: @HanSur94

HanSur94 and others added 8 commits March 19, 2026 17:53
toStepFunction was a local function inside mergeResolvedByLabel.m,
making it invisible outside that file. The Octave test failed because
local functions cannot be called from external test files, even when
the private directory is on the path.

Extracting it to its own .m file in private/ keeps the same
encapsulation (only SensorThreshold code can call it) while making it
accessible to the test's proxy-directory pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The needs_build check in install.m only probed for binary_search_mex.
If older MEX files existed but to_step_function_mex was missing,
install() would skip build_mex() entirely. Now probes both
binary_search_mex and to_step_function_mex so any missing MEX
triggers an incremental rebuild.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
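A sketch of the revised probe, assuming the compiled MEX files sit in one directory and share a platform-specific extension (for example mexa64 on 64-bit Linux): any missing binary makes the check return true, which is what triggers the incremental rebuild. needs_build here is a hypothetical C stand-in for the MATLAB logic in install.m, and it probes file paths directly rather than relying on a name lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Return true if any of the named MEX binaries is missing from dir, so a
   single missing file triggers a rebuild. The extension is a parameter
   because it differs per platform. */
bool needs_build(const char *dir, const char *const *names, size_t count,
                 const char *ext)
{
    assert(dir != NULL && ext != NULL);
    char path[1024];
    for (size_t i = 0; i < count; ++i) {
        int len = snprintf(path, sizeof path, "%s/%s.%s", dir, names[i], ext);
        if (len < 0 || (size_t)len >= sizeof path)
            return true;            /* path unusable: rebuild to be safe */
        FILE *f = fopen(path, "rb");
        if (f == NULL)
            return true;            /* binary missing: rebuild */
        fclose(f);
    }
    return false;                   /* every required binary is present */
}
```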
Stress-tests the full resolve pipeline with 500M datapoints, 2 state
channels (~9K total transitions), and 4 threshold rules with different
condition types (single-condition, multi-condition, upper, lower).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MEX files in private/ directories are invisible to exist() when it is
called from outside the private folder's parent folder. Check the actual
file paths instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renders the 500M-point sensor with all resolved thresholds and
violations after the timing runs complete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves a tiny 4-point sensor before the timed runs to force MATLAB's
JIT compiler to compile all code paths (Sensor.resolve, binary_search,
compute_violations, toStepFunction, mergeResolvedByLabel). This way
all 3 timed runs measure steady-state performance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runs a tiny end-to-end workflow (Sensor, StateChannel, resolve,
FastSense render) on trivial data during install(). This forces
MATLAB's JIT to compile all hot code paths once per session, so the
first real call to resolve() or render() has no warmup penalty.

Uses a persistent flag so repeated install() calls skip the warmup.
Wrapped in try/catch so it never blocks installation on failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
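The persistent-flag plus try/catch pattern can be sketched in C with a function-local static flag and an ignored return code. warmup_once and tiny_warmup are hypothetical; whether the MATLAB code sets its flag before or after the warmup runs is not stated, and this sketch sets it first so a failing warmup is not retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical warmup: stands in for a trivial call through the hot path.
   Returns 0 on success, non-zero on failure. */
static int warmup_runs = 0;
int tiny_warmup(void)
{
    ++warmup_runs;
    return 0;
}

/* Run fn at most once per process. The static flag stands in for a
   MATLAB persistent variable; ignoring the return code stands in for the
   try/catch that keeps a failed warmup from blocking installation.
   Returns true only on the call that actually ran the warmup. */
bool warmup_once(int (*fn)(void))
{
    static bool done = false;
    assert(fn != NULL);
    if (done)
        return false;      /* already warmed up this session: skip */
    done = true;           /* set first so a failing warmup is not retried */
    (void)fn();            /* failure is deliberately ignored */
    return true;
}
```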
Prevents a visible window flash during install() and avoids display
issues on headless CI runners.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HanSur94 merged commit a1a4ec3 into main on Mar 19, 2026
9 checks passed
2 participants