Conversation
…lculate); reduce session record cap to 100
- MAL terminal output sample now carries payload.value rendered per holder
type: Long/Int/Double scalars as JSON numbers (NaN / +-Infinity as strings),
LabeledValueHolder DataTables as JSON object {key: long}. Force
Metrics.calculate() at probe time for two-phase functions
(AvgHistogramPercentileFunction / SumHistogramPercentileFunction) so the
captured value carries the percentile result keyed by p=<rank> rather
than the empty pre-calculate map.
- Per-session record cap dropped from 10000 (default 1000) to 100; the
  hard cap now equals the default. ~1 MiB worst-case heap per session
  and a UI-readable page; operators inspect a handful of executions,
  not a paginated firehose. SessionLimits constructor still rejects
  out-of-range with IllegalArgumentException -> 400 invalid_limits.
- Both changes are debug-only and zero-cost when no session is installed:
the captureMeterEmit probe gate is JIT-elided on hot paths, calculate()
is idempotent via the function's isCalculated guard, and combine()
resets isCalculated=false so cluster-merged state still recomputes on
read.
- Updated the MAL admin-API doc with a concrete payload example and the
SWIP-13 reference cap line / heap math.
- E2E MAL flow asserts payload.value is present and numeric on every
terminal output sample.
- All 16 existing dsl-debugging unit tests pass.
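The holder rendering rules described above can be sketched as follows. This is a minimal illustrative sketch, not the actual `MALDebugRecorderImpl` code: the class and method names (`ValueRenderSketch`, `renderDouble`, `renderLabeled`) are invented for this example, and the real implementation writes into a Gson `JsonObject` rather than building JSON strings by hand.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class ValueRenderSketch {
    /**
     * Render a scalar double as a JSON fragment: a plain number, or a
     * JSON string for NaN / +-Infinity so the wire stays valid JSON
     * (JSON has no literal for non-finite numbers).
     */
    static String renderDouble(double v) {
        if (Double.isNaN(v) || Double.isInfinite(v)) {
            return "\"" + v + "\"";   // e.g. "NaN", "Infinity", "-Infinity"
        }
        return String.valueOf(v);
    }

    /** Render a labeled holder as a JSON object of {key: long} entries. */
    static String renderLabeled(Map<String, Long> labeled) {
        return labeled.entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\": " + e.getValue())
            .collect(Collectors.joining(", ", "{", "}"));
    }

    public static void main(String[] args) {
        System.out.println(renderDouble(42.0));        // 42.0
        System.out.println(renderDouble(Double.NaN));  // "NaN"
        Map<String, Long> p = new LinkedHashMap<>();
        p.put("{p=50}", 250000L);
        p.put("{p=99}", 10000000L);
        System.out.println(renderLabeled(p)); // {"{p=50}": 250000, "{p=99}": 10000000}
    }
}
```

The same two branches cover both shapes named in the summary: scalar holders become JSON numbers (with the string fall-back), labeled holders become a `{key: long}` object keyed by label combo or `p=<rank>`.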
Pull request overview
This PR adjusts the MAL DSL live-debugger surface to (1) include the terminal meter emission’s computed value on the wire (including two-phase functions via an early calculate()), and (2) reduce per-session record retention to a much smaller cap to limit heap/UI impact.
Changes:
- Add `payload.value` to terminal `meterEmit` samples by rendering scalar and labeled holders, and force `Metrics.calculate()` before reading values.
- Reduce session record cap to 100 (default and hard cap) and update related tests/docs.
- Extend MAL e2e validation to assert terminal `payload.value` is present and numeric for the seeded rule.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/e2e-v2/cases/dsl-debugging/mal/dsl-debug-flow.sh | Adds an e2e assertion that terminal meterEmit includes numeric payload.value. |
| oap-server/server-admin/dsl-debugging/src/test/java/org/apache/skywalking/oap/server/admin/dsl/debugging/session/DebugSessionRegistryTest.java | Updates a test to track the new MAX_RECORD_CAP constant. |
| oap-server/server-admin/dsl-debugging/src/main/java/org/apache/skywalking/oap/server/admin/dsl/debugging/session/SessionLimits.java | Lowers MAX_RECORD_CAP to 100 and updates defaults/commentary. |
| oap-server/server-admin/dsl-debugging/src/main/java/org/apache/skywalking/oap/server/admin/dsl/debugging/session/AbstractDebugRecorder.java | Updates inline documentation referencing the new default cap. |
| oap-server/server-admin/dsl-debugging/src/main/java/org/apache/skywalking/oap/server/admin/dsl/debugging/mal/MALDebugRecorderImpl.java | Adds appendValue(...) to emit terminal payload.value and forces calculate() for two-phase functions. |
| docs/en/swip/SWIP-13.md | Updates SWIP-13 contract text to reflect the new record cap sizing and rationale. |
| docs/en/setup/backend/admin-api/dsl-debugging-mal.md | Updates the sample payload to include value in terminal output samples. |
Comment on lines +226 to +242
 * BEFORE the streaming pipeline calls calculate(), so without forcing
 * calculate() here the labeled value column would be an empty map for
 * histogram-percentile rules — exactly when operators most need to
 * verify what the rule emits. We force calculate() at probe time so
 * the captured value matches what the storage row will contain. This
 * is safe and zero-cost when debug is off:
 * <ul>
 * <li>The probe site itself is gated; when no operator has installed
 * a session the call into this method is dead code that the JIT
 * elides on hot-path inlining.</li>
 * <li>{@code calculate()} is idempotent via the function's
 * {@code isCalculated} guard, so the streaming pipeline's later
 * call is a no-op rather than double work.</li>
 * <li>Cross-node {@code combine()} resets {@code isCalculated=false},
 * so the post-merge calculate on combined state still happens
 * on read — pre-computing here on the local snapshot doesn't
 * leak stale values into the cluster.</li>
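The gating claim in the first bullet of the javadoc above can be illustrated with a toy probe site. All names here (`ProbeGateSketch`, `sessionInstalled`) are hypothetical; in the real code the gate sits on the `captureMeterEmit` path, but the shape is the same: a cheap flag check that short-circuits before any probe work when no session is installed, which is what lets the JIT treat the probe body as dead code on hot paths.

```java
public class ProbeGateSketch {
    // Stand-in for "an operator has installed a debug session".
    static volatile boolean sessionInstalled = false;

    /** Gated probe site: the body never runs while the gate is closed. */
    static void captureMeterEmit(Runnable probeBody) {
        if (!sessionInstalled) {
            return; // dead branch when debug is off; trivially cheap on hot paths
        }
        probeBody.run();
    }

    public static void main(String[] args) {
        final int[] hits = {0};
        captureMeterEmit(() -> hits[0]++); // gate closed: body never runs
        sessionInstalled = true;
        captureMeterEmit(() -> hits[0]++); // gate open: body runs once
        System.out.println(hits[0]);       // 1
    }
}
```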
Comment on lines 181 to +194
  { "type": "output",
    "sourceText": "e2e_demo_filtered_requests",
    "continueOn": true,
-   "payload": { /* terminal meter sample — metric, entity, value, timeBucket */ } }
+   "payload": {
+     "metric": "e2e_demo_filtered_requests",
+     "entity": "MeterEntity(scopeType=SERVICE, serviceName=my-svc, …)",
+     "valueType": "sum",
+     "timeBucket": 202605091036,
+     "value": 42 /* shape depends on valueType:
+                    number for Sum/Avg/Max/Min/CPM/Latest…,
+                    object {bucket: count} for histograms /
+                    *Labeled functions, omitted for non-scalar
+                    holders. NaN/±Infinity render as strings. */
+   } }
- MALDebugRecorderImpl javadoc: drop the misleading "calculate() is
idempotent via isCalculated guard" claim. Reality: most
Metrics.calculate() implementations have no guard (AvgFunction,
CPMMetrics, SumMetrics just recompute every call), and the histogram-
percentile pair (AvgHistogramPercentileFunction,
SumHistogramPercentileFunction) check isCalculated at entry but never
set it to true on exit, so even those re-run on the streaming pipeline.
Reword to call out the bounded-but-real cost (one extra calculate per
probed emission per debug session) and keep the gating story
(probe-site JIT-elision when no session is installed; cluster combine()
resets the flag so peer reads still recompute).
- dsl-debugging-{mal,oal,lal}.md Limits sections: bring defaults +
examples in line with the new SessionLimits.MAX_RECORD_CAP = 100.
Previous tables said default 1000 with override examples like
{"recordCap": 200} — the new ceiling rejects those with
400 invalid_limits. Add explicit hard-cap columns for both recordCap
and retentionMillis so operators see the validated range up front.
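The validated-range contract described above might look like the following sketch. The constant name `MAX_RECORD_CAP` and the 400 `invalid_limits` mapping come from the PR text; the constructor shape, the lower bound, and the message wording are assumptions made for illustration.

```java
public class SessionLimitsSketch {
    // Hard cap now equals the default, per the PR.
    static final int MAX_RECORD_CAP = 100;

    final int recordCap;

    SessionLimitsSketch(int recordCap) {
        if (recordCap < 1 || recordCap > MAX_RECORD_CAP) {
            // The admin API surfaces this as HTTP 400 invalid_limits.
            throw new IllegalArgumentException(
                "invalid_limits: recordCap must be in [1, " + MAX_RECORD_CAP
                    + "], got " + recordCap);
        }
        this.recordCap = recordCap;
    }

    public static void main(String[] args) {
        System.out.println(new SessionLimitsSketch(100).recordCap); // accepted
        try {
            new SessionLimitsSketch(200); // old doc example, now out of range
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This is why the old doc example `{"recordCap": 200}` had to be updated: it now falls outside the validated range and would be rejected at construction time.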
wankai123 approved these changes on May 9, 2026.
Fix two related issues in the live debugger surface for unreleased SWIP-13 code
Bug 1 — terminal MAL `output` sample shows the wrong shape in the UI.

Operators inspecting a `histogram_percentile([...])` rule see bucket-keyed samples (`(service_name=…, le=10)`, `…, le=20`, …) in the value column instead of the percentile result they expect to see in storage (`{p=50}`, `{p=90}`, `{p=99}`).

Two reasons: (1) the captured `output` sample never carried the metric's actual reading on `payload.value` — the wire only had `metric`/`entity`/`valueType`/`timeBucket`, so the UI fell back to rendering the upstream `histogram_percentile` function-stage `SampleFamily`, which legitimately has `le=`-keyed samples but isn't what the rule emits. (2) Even if we add `payload.value`, two-phase functions (`AvgHistogramPercentileFunction`, `SumHistogramPercentileFunction`) populate their user-visible field (`percentileValues`) lazily in `calculate()`, which the L1 streaming pipeline calls after the `captureMeterEmit` probe site, so a naive `getValue()` read returns the empty pre-calculate map.

This PR fixes both:
- `appendValue(JsonObject, AcceptableValue<?>)` in `MALDebugRecorderImpl` renders `payload.value` per holder type — JSON number for `LongValueHolder`/`IntValueHolder`/`DoubleValueHolder` (with `"NaN"`/`"Infinity"` string fall-back for non-finite doubles so the wire stays valid JSON), and a `JsonObject` of `{key: long}` entries for `LabeledValueHolder` (label combos for `*Labeled` functions, `p=<rank>` entries for percentile functions).
- The probe forces `Metrics.calculate()` once before the `getValue()` read. Safe because the function's `isCalculated` guard makes the streaming pipeline's later call a no-op, and cross-node `combine()` resets `isCalculated=false` so post-merge state still recomputes on read.
Both edits are inside `MALDebugRecorderImpl`, which is only reached from the gated `captureMeterEmit` probe — when no operator has installed a session the probe gate is JIT-elided on hot paths, so steady-state runtime cost is zero.
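The two-phase ordering problem that motivates forcing `calculate()` can be reduced to a toy example. All names here are hypothetical stand-ins (the real functions derive `percentileValues` with actual percentile math over histogram buckets); the point is only the read ordering: a probe that reads before `calculate()` sees the empty pre-calculate map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TwoPhaseSketch {
    /** Toy two-phase function: accumulates raw state, derives the
     *  user-visible result only when calculate() runs. */
    static class PercentileFunction {
        private final Map<String, Long> raw = new LinkedHashMap<>();
        private final Map<String, Long> percentileValues = new LinkedHashMap<>();

        void accept(String bucket, long count) {
            raw.merge(bucket, count, Long::sum);
        }

        /** Stand-in for the real percentile derivation over buckets. */
        void calculate() {
            percentileValues.clear();
            raw.forEach((rank, v) -> percentileValues.put("{p=" + rank + "}", v));
        }

        Map<String, Long> getValue() {
            return percentileValues;
        }
    }

    public static void main(String[] args) {
        PercentileFunction fn = new PercentileFunction();
        fn.accept("50", 250000L);
        System.out.println(fn.getValue().isEmpty()); // true: naive pre-calculate read
        fn.calculate();                              // what the probe now forces
        System.out.println(fn.getValue().isEmpty()); // false: result is populated
    }
}
```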
Bug 2 — `recordCap` ceiling is too generous for an operator-driven debugger.

Hard cap was `10000` records / session, default `1000`. Both burn UI page real estate and heap (per-record JSON ≈ 10 KiB → ~100 MiB worst case per session). Operators inspect a handful of executions, not a paginated firehose.

This PR drops both to `100` (default = hard cap = `MAX_RECORD_CAP`), yielding ~1 MiB worst-case heap per session. Out-of-range requests still get `400 invalid_limits` from the constructor.

Add a unit test to verify that the fix works.
Existing `dsl-debugging` unit tests pass against the new code; no fixture broke. The cap-aware test (`DebugSessionRegistryTest.reapExpired_…`) was switched to `SessionLimits.MAX_RECORD_CAP` so it tracks the cap going forward instead of pinning a stale literal.
The e2e flow asserts `payload.value` is present and numeric on every terminal `output` sample. For the seeded `histogram_percentile` rule, the terminal sample now reports `value: {"{p=50}": 250000, "{p=90}": 5000000, "{p=99}": 10000000}`.
Update the `CHANGES` log: 10.5.0 changelog entry. The wire/cap changes won't be visible to any released version.