CAMEL-23865: Use EWMA smoothing for throughput and fix TUI sparkline scaling#24356
Conversation
|
🌟 Thank you for your contribution to the Apache Camel project! 🌟 🐫 Apache Camel Committers, please review the following items:
|
gnodet
left a comment
There was a problem hiding this comment.
Thanks for this improvement, Claus — the EWMA smoothing approach is sound and consistent with the existing LoadTriplet pattern, and the TUI sparkline scaling is a nice enhancement. The formula is correct and the overall direction is good.
A few items to address before merging (details in inline comments):
Blocking:
- The new test introduces
Thread.sleep()calls — project conventions require Awaitility instead. - Negative deltas are no longer guarded, which can produce persistent negative throughput via EWMA memory.
Suggestions:
3. reset() doesn't stop the watch — the first post-reset update can produce a spike that persists ~60s via EWMA.
4. niceMax and formatThroughput are duplicated between OverviewTab and EndpointsTab with divergent implementations.
5. The throughput JMX MBean attribute changed semantics (instantaneous → EWMA) — this should get an upgrade guide entry per project conventions.
This review covers project conventions and code correctness. It does not replace specialized review tools such as CodeRabbit, Sourcery, or SonarCloud.
This review was generated by an AI agent and may contain inaccuracies. Please verify all suggestions before applying.
| } else { | ||
| thp = 0; | ||
| // instantaneous rate in exchanges/second for this interval | ||
| double instantRate = (1000d / time) * delta; |
There was a problem hiding this comment.
The old code guarded if (delta > 0) and set thp = 0 for non-positive deltas. With EWMA, a negative instantRate (from currentReading < last, e.g. after a route restart that resets the exchange counter) bleeds into thp and takes ~60 seconds to decay back toward zero.
Consider clamping:
| double instantRate = (1000d / time) * delta; | |
| double instantRate = Math.max(0, (1000d / time) * delta); |
There was a problem hiding this comment.
Done — added Math.max(0, ...) clamping on the instantaneous rate. Also added watch.stop() in reset() so the next update() re-enters the initialization branch cleanly, avoiding post-reset spikes.
| t.update(total); | ||
| Thread.sleep(10); | ||
| for (int i = 0; i < 120; i++) { | ||
| total++; |
There was a problem hiding this comment.
The test introduces multiple Thread.sleep() calls. Per project conventions:
"New test code MUST NOT introduce
Thread.sleep()calls. Use the Awaitility library instead."
Since LoadThroughput relies on StopWatch internally, it requires real elapsed time. A better long-term approach would be an injectable time source for fully deterministic tests. At minimum, the sleep-based timing should be replaced with Awaitility polling assertions.
There was a problem hiding this comment.
Done — rewrote all tests using Awaitility's pollInterval + untilAsserted pattern. Since LoadThroughput relies on real elapsed time via StopWatch, the Awaitility polling drives the updates at 10ms intervals while the assertions verify convergence.
| @Test | ||
| public void testSmoothing() throws Exception { | ||
| LoadThroughput t = new LoadThroughput(); | ||
|
|
There was a problem hiding this comment.
This assertion (> 0) would also pass with the old raw throughput calculation. Consider tightening it to verify that EWMA smoothing is actually converging toward the expected steady-state rate (e.g., within a reasonable tolerance of the true rate).
There was a problem hiding this comment.
Done — tightened the convergence assertion to check throughput > 5.0 (verifying actual EWMA convergence toward the steady-state rate rather than just > 0). The smoothing test now also checks the value is bounded within a reasonable range.
| import static org.junit.jupiter.api.Assertions.assertTrue; | ||
|
|
||
| @DisabledOnOs(OS.AIX) | ||
| public class LoadThroughputTest { |
There was a problem hiding this comment.
This test is pure Java math + Thread.sleep — no JMX interaction. The @DisabledOnOs(OS.AIX) annotation may have been cargo-culted from other management tests that interact with platform MBeans. Is there a specific AIX reason to skip this test? If not, consider removing it.
There was a problem hiding this comment.
Good catch — removed. This is a pure Java math test with no JMX interaction, so the AIX exclusion was cargo-culted from other management tests.
| * 0.20 → 1, 1.5 → 2, 3 → 5, 7 → 10, 15 → 20, 35 → 50, 80 → 100, etc. | ||
| */ | ||
| private static long niceMax(long rawMax) { | ||
| long s = MetricsCollector.THROUGHPUT_SCALE; |
There was a problem hiding this comment.
niceMax and formatThroughput are duplicated between OverviewTab and EndpointsTab with different implementations:
- Here (
OverviewTab): hardcoded array of 10 steps up to1000 * THROUGHPUT_SCALE; falls back torawMax(no rounding) beyond that. EndpointsTab: uses awhile(true)loop with 1-2-5 step sequence that scales indefinitely.
The same data could get different Y-axis ceilings depending on which chart displays it. Consider extracting both methods to a shared utility (e.g., on MetricsCollector where THROUGHPUT_SCALE already lives) with a single correct implementation.
There was a problem hiding this comment.
Done — extracted both niceMax and formatThroughput to MetricsCollector as package-private static utility methods. Used the more robust EndpointsTab version (1-2-5 loop that scales indefinitely) as the single implementation. Both tabs now delegate to MetricsCollector.niceMax() and MetricsCollector.formatThroughput().
|
🧪 CI tested the following changed modules:
Build reactor — dependencies compiled but only changed modules were tested (3 modules)
|
Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
Use THROUGHPUT_SCALE (x100) in MetricsCollector so fractional msg/s rates are preserved as longs. Apply niceMax (1-2-5 step sequence) for Y-axis scaling. Use Locale.US for consistent dot decimal formatting. Endpoint charts use showYAxis(false) until tamboui supports yAxisFormatter. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
- Clamp negative instantaneous rate to zero to prevent EWMA memory bleed - Stop StopWatch on reset() to avoid post-reset throughput spike - Replace Thread.sleep() with Awaitility in tests per project conventions - Remove unnecessary @DisabledOnOs(OS.AIX) from pure-math test - Tighten test assertions to verify EWMA convergence behavior - Extract niceMax/formatThroughput to MetricsCollector shared utility - Add upgrade guide entry for throughput MBean semantics change Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Claus Ibsen <claus.ibsen@gmail.com>
a8138de to
54b6cab
Compare
Summary
Switch the throughput metric in LoadThroughput from raw instantaneous rate to EWMA (exponentially weighted moving average) smoothing with a 1-minute decay window. This produces stable, converging throughput readings instead of oscillating 0/spike values — consistent with how LoadTriplet already computes CPU load averages.
Also fixes the TUI sparkline charts to properly display sub-1.0 msg/s rates by scaling throughput values by 100x internally.
Changes
camel-management (core):
camel-jbang-plugin-tui:
docs:
Claude Code on behalf of Claus Ibsen