SWIP-12 design + UITemplateInitializer auto-discovery & dev hot-reload by wu-sheng · Pull Request #13834 · apache/skywalking

wu-sheng · 2026-04-20T09:16:38Z

SWIP-12 design + UITemplateInitializer extensibility

This PR bundles three related changes that together unlock future mini-program monitoring work:

1. SWIP-12 design doc (docs/en/swip/SWIP-12.md)
Proposes WeChat & Alipay Mini-Program monitoring as a new pair of Layer values. Covers SDK alignment (histogram bucket unit), native-trace SegmentListener SPI, entity model, layer partitioning, dashboards layout, and MAL/OAL scope split. Still a design-only proposal — implementation lands in follow-up PRs.

2. UITemplateInitializer extensibility + dev hot-reload

UI_TEMPLATE_FOLDER is now computed from Layer.values() + "custom" at class-init time. Adding a new Layer enum value is enough — drop a ui-initialized-templates/<layer-name-lowercased>/ folder on disk and it's scanned on the next boot. Removes the prior hardcoded allowlist that was easy to miss.
SW_UI_TEMPLATE_FORCE_RELOAD env var switches the initializer from addIfNotExist to a new addOrReplace helper on UITemplateManagementService. When true, shipped templates overwrite any seeded copy every boot — so dev/extension edits show up after a simple OAP restart without wiping storage. Unset / false preserves the production behavior where operator UI edits persist.
UITemplateCheckerTest updated to tolerate missing folders (several Layer values have no template folders today).
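
A minimal sketch of the auto-discovery and flag parsing described above. It is not the code from this PR: `DemoLayer` is a hypothetical stand-in for the real `Layer` enum, `TemplateFolderSketch` and `forceReload` are illustrative names, and the use of `Locale.ROOT` and `Boolean.parseBoolean` are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for SkyWalking's Layer enum; only a few values are shown.
enum DemoLayer { GENERAL, MESH, WECHAT_MINI_PROGRAM, ALIPAY_MINI_PROGRAM }

final class TemplateFolderSketch {
    // Computed once at class-init time: one folder per layer value, plus "custom".
    static final List<String> UI_TEMPLATE_FOLDER = computeFolders();

    private static List<String> computeFolders() {
        List<String> folders = new ArrayList<>();
        for (DemoLayer layer : DemoLayer.values()) {
            // Folder name is the enum name lower-cased; underscores are kept.
            folders.add(layer.name().toLowerCase(Locale.ROOT));
        }
        folders.add("custom");
        return Collections.unmodifiableList(folders);
    }

    // Assumption: the env var is parsed as a plain boolean; null (unset) yields false.
    static boolean forceReload(String envValue) {
        return Boolean.parseBoolean(envValue);
    }

    public static void main(String[] args) {
        System.out.println(UI_TEMPLATE_FOLDER);
        System.out.println(forceReload(System.getenv("SW_UI_TEMPLATE_FORCE_RELOAD")));
    }
}
```

With this shape, adding a value to the enum is the only code change needed for its folder to be scanned.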

3. new-monitoring-feature skill (.claude/skills/new-monitoring-feature/SKILL.md)
A wiring map for contributors adding a new layer: which extension point handles which signal (OAL / MAL / LAL / SpanListener / SegmentListener), where contracts live, UI template + submodule touchpoints, and cross-cutting traps.

If this is a non-trivial feature, paste the links/URLs to the design doc. — docs/en/swip/SWIP-12.md
Update the documentation to include this new feature. — SWIP doc + readme index updated
Tests (including UT, IT, E2E) are added to verify the new feature. — UITemplateCheckerTest updated for auto-discovery
If it's UI related, attach the screenshots below. — N/A; no UI dashboards land in this PR (follow-up)
If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #.
Update the CHANGES log. — deferred until mini-program monitoring implementation lands

Proposes WECHAT_MINI_PROGRAM (48) / ALIPAY_MINI_PROGRAM (49) layers driven by mini-program-monitor SDK v0.3+ (OTLP + SkyWalking native segments). Reuses LAL layer:auto + sourceAttribute() from SWIP-11 and componentId-based layer mapping in CommonAnalysisListener.getLayer() — no new SPI. Reserves JS componentIds 10002 (WeChat) / 10003 (AliPay), already shipping in the SDK. Showcase data generator consumes the SDK's published sim-wechat / sim-alipay GHCR images directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

SDK alignment:
- Bump recommended SDK to ≥ v0.4.0 (clean serviceInstance default)
- Pin showcase sim images to v0.4.0; add SERVICE_INSTANCE env to mirror SDK recommendation
- Reword Mini Program Setup comment from "workaround" to "v0.4.0 recommendation"
- §8 status table: serviceInstance change is shipped, not pending

Correctness fixes from re-reading:
- §4 MAL: switch to chained .endpoint([...], [...], Layer.X) for per-page metrics (matches APISIX/RocketMQ pattern); replace incorrect "label normalization maps service.version → service_instance_id" claim with the actual behavior (agent sets it; OAP uses literal "-" if absent)
- §6: correct method to protected Layer identifyServiceLayer(SpanLayer) on the abstract base — was wrongly named getLayer and described as static
- §9 Dashboard: reword trace widget — "service list filtered by layer; trace widget shows in-scope service's traces" (filter is at service-list level, not on the trace widget)
- §Limitations: drop stale miniprogram.device span tag reference (SDK v0.4.0 dropped device id entirely; tag was never shipped)
- §Compatibility: document that OAP records literal "-" instance entity when SDK serviceInstance is unset

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…v vars, instance coherence

Four review issues fixed against ground-truth source:
- §4 MAL: stop fragmenting service-scoped meters by service_instance_id. Now follows the iOS pattern (otel-rules/ios/ios-metrickit.yaml + ios-metrickit-instance.yaml): four files (per-platform × per-scope), service-scoped meters key on service_name only, instance-scoped meters go in their own file with expSuffix: instance(...). This makes the "overall app health" service view genuinely fleet-aggregated and provides the metrics behind the per-instance dashboard.
- §9 Dashboards: UITemplateInitializer requires layer-name folders (Layer.X.name().toLowerCase()) and an entry in the hard-coded UI_TEMPLATE_FOLDER allowlist. Hyphenated folders (wechat-mini-program/) are silently skipped. Specify wechat_mini_program/ / alipay_mini_program/ folders + UI_TEMPLATE_FOLDER appends.
- §General usage: fix env var names to match application.yml — SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES (was incorrectly SW_OTEL_RECEIVER_RULES). Spell out that mini-program rules must be appended to the existing default lists for both OTEL metrics rules and lalFiles, not replace them.
- §2 + §5: instance identity made coherent across signals. SDK only emits OTLP service.instance.id when operator sets serviceInstance, and segments use serviceInstance || "-". LAL extractor changed from sourceAttribute("service.version") to sourceAttribute("service.instance.id") so logs/metrics/traces all key off the same value. Added explicit "Instance coherence" subsection documenting how the three pipelines align (and what happens when serviceInstance is unset).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…d prefix coherence, component lookup via catalog service

Three review issues fixed against ground-truth source:
- §2 "Instance coherence" + §Compatibility: stop overstating that unset serviceInstance lands as the same literal "-" across all signals. Verified per pipeline:
  - Segments: SDK substitutes "-" (request.ts:147)
  - Logs: TrafficSinkListener:83 short-circuits when serviceInstance is empty — no instance traffic at all
  - Metrics: SampleFamily.dim() collapses missing labels to "" — no instance entity built

  Per-pipeline behavior table now shown explicitly. The unset case is not "still consistent in that no view is meaningful" hand-waving; segments produce a "-" entity while logs/metrics produce nothing. Operators must set serviceInstance for instance-level dashboards to be populated; otherwise the three pipelines diverge. Inline LAL comment updated to match.
- §9 Dashboards: metric names now use the per-platform prefixes from §4 (meter_wechat_mp_*, meter_wechat_mp_instance_*, etc.) instead of the stale meter_miniprogram_* prefix. Dropped first_paint_time from the dashboard table since §3/§4 explicitly exclude it from MAL aggregation. Per-platform dashboard tables shown separately so WeChat-only navigation panels don't appear under Alipay.
- §6 Trace layer mapping: rewrite to use IComponentLibraryCatalogService for component-name → id resolution instead of fictional ComponentsDefine.WECHAT_MINI_PROGRAM constants. component-libraries.yml is the single source of truth — there are no auto-generated Java constants for component IDs (verified against ComponentLibraryCatalogService.java:75-104). Listener constructor resolves the two ids once via catalog.getComponentId("WeChat-MiniProgram") / "AliPay-MiniProgram" and caches as int fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ance regression, listener subclass list

Three review issues fixed against ground-truth source:
- §9 Dashboards: add the layer-root template requirement. Layer.vue:41-44 requires a dashboard with isRoot: true to render the menu landing page (precedent: ios/ios-root.json). Without it, clicking Mobile > WeChat Mini Program shows an empty "no dashboard" view. Both wechat_mini_program-root.json and alipay_mini_program-root.json now listed in the folder layout.
- §4 + §General usage: drop the residual "literal -" / "everything aggregates under -" wording. §2 was already corrected; §4 Notes bullet and the WeChat init example comment still asserted the uniform-fallback story. Replaced with the verified per-pipeline behavior: Analyzer.java:345 (instance traffic only emitted when non-empty) + SampleFamily.dim() (collapses missing labels to "") mean OTLP metrics produce no instance entity at all when serviceInstance is unset. Only segments substitute "-" at the SDK wire.
- §6 Trace layer mapping: correct the listener subclass list. CommonAnalysisListener is extended only by RPCAnalysisListener and EndpointDepFromCrossThreadAnalysisListener (verified via grep). SegmentAnalysisListener has its own service-meta path and does not extend the base. Doesn't change the design surface (still 5 call sites in 2 files) but clarifies which classes are touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Technical-correctness fixes (would produce broken metrics in prod):
- §4 MAL: document the ms-histogram bucket unit trap. SDK emits le in ms; MAL default assumes seconds and rescales ×1000. Without compensation, request-latency percentiles come out 1000× too large (same trap that bit MetricKit in SWIP-11). Two implementation paths documented: OTLP receiver honoring Metric.unit, or a targeted preprocessor.
- §4 MAL: document the +Inf overflow bucket rendering as ~9.2e18 in UI. Needs finite sentinel (SDK-side bound update to 30s, or OAP-side preprocessor ceiling).

Design-clarification additions:
- §3a new subsection: enumerate the OAL / topology metrics that come for free from the componentId-driven layer assignment in §6. After RPCAnalysisListener emits Service/Instance/Endpoint/Relation sources with the right layer, core.oal produces service_cpm, service_resp_time, service_percentile, endpoint_*, plus outbound topology edges. Readers would otherwise think dashboards only have the §3 metric table.
- §3b new subsection: define where error_count actually comes from. A new log-MAL rule file log-mal-rules/miniprogram.yaml converts LAL-extracted error samples into per-(service, exception_type) counters. §5 LAL rule updated with a metrics {} block emitting the raw miniprogram_error_count sample.

Process / deliverables additions:
- §11 new: OAP-side e2e test case (test/e2e-v2/cases/miniprogram/ {wechat,alipay}/), separate from the showcase demo generator. Drives sim images in MODE=once against the full OAP wiring. CI matrix entry in .github/workflows/skywalking.yaml required.
- §11 also: config-dump.yml mirror update required when application.yml defaults change (miniprogram/* in enabledOtelMetricsRules, miniprogram in lalFiles and malFiles).
- §12 new: Security Notice. Mini-program SDKs post from end-user devices on the public internet — same exposure profile as iOS / browser. Add a client-side-monitoring paragraph to docs/en/security/README.md.
- §13 new: Implementation Deliverables Checklist covering the two user-facing backend-*-mini-program-monitoring.md docs, docs/menu.yml entries, changes.md changelog, readme.md SWIP move, and booster-ui i18n PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… trap

Prior wording cast OAP's histogram-unit rescale as a silent trap needing either an OTLP receiver enhancement or a per-feature preprocessor. That was overstated.

Verified against source (SampleFamily.java:459-487, PrometheusMetricConverter.java:78): OAP doesn't enforce a unit — it just rescales le labels to ms using SampleFamily.defaultHistogramBucketUnit. Default SECONDS matches Prometheus ecosystem convention, which is what shipped rules assume. There is no silent-bug surface; it's a standard coordination between source system and MAL rule.

SWIP-12 now specifies: the SDK should align miniprogram.request.duration bounds to seconds convention (divide current ms bounds by 1000) in its next release. Clean, single-line SDK change, no OAP-side plumbing added.

The +Inf overflow bucket note is retained but downgraded from "must fix" to "low-risk dashboard-rendering concern; add a finite ceiling only if outliers surface in practice."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
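
The arithmetic behind the unit coordination can be sketched as follows. This is an illustration only, not OAP's `SampleFamily` code: `BucketUnitSketch` and `toMillis` are hypothetical names, and the 500 ms bound is an invented example value.

```java
import java.util.concurrent.TimeUnit;

final class BucketUnitSketch {
    // Rescales a histogram bucket upper bound, expressed in `unit`, to milliseconds.
    // Simplified: only meaningful for units of one millisecond or coarser.
    static double toMillis(double le, TimeUnit unit) {
        return le * unit.toMillis(1);
    }

    public static void main(String[] args) {
        // A bound emitted in ms but interpreted with the SECONDS default is inflated 1000x.
        System.out.println(toMillis(500, TimeUnit.SECONDS));   // 500000.0
        // The same bound expressed in seconds (500 / 1000) rescales to the intended value.
        System.out.println(toMillis(0.5, TimeUnit.SECONDS));   // 500.0
    }
}
```

Dividing the SDK's bounds by 1000 therefore restores the intended millisecond values without any change on the receiving side.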

…olders + dev force-reload

Motivation: adding a new feature/layer previously required two edits in UITemplateInitializer — the Layer enum plus appending to a hardcoded UI_TEMPLATE_FOLDER allowlist. Easy to miss the allowlist, silent failure (templates on disk but never loaded). Editing shipped templates also required wiping the storage container for changes to take effect.

This commit collapses both pain points:
- UI_TEMPLATE_FOLDER is now computed once from Layer.values() + "custom" at class-init time. Drop a ui-initialized-templates/<layer-name-lowercased>/ folder on disk and it's scanned on the next boot. Missing folders are silently skipped (same catch that existed before). Adding a new Layer enum value is enough; no second edit here.
- SW_UI_TEMPLATE_FORCE_RELOAD environment variable (read directly from System.getenv, not wired through application.yml) switches the initializer from addIfNotExist to a new addOrReplace helper on UITemplateManagementService. When true, shipped templates overwrite any previously seeded copy every boot — so dev/extension edits show up after a simple OAP restart. Unset / false preserves the default production behavior where operator UI edits persist across restarts.

Changes:
- UITemplateInitializer: dynamic UI_TEMPLATE_FOLDER, FORCE_RELOAD flag, branch on addIfNotExist vs addOrReplace.
- UITemplateManagementService: new addOrReplace(DashboardSetting) — addTemplate if absent, changeTemplate if present.
- UITemplateCheckerTest: tolerate missing folders (some Layer enum values — UNDEFINED, FAAS, CACHE, DATABASE, MQ, VIRTUAL_GATEWAY, GENAI — have no template folders today).
- SWIP-12 §9: drop the "append to UI_TEMPLATE_FOLDER allowlist" step; keep the folder-naming (Layer.name().toLowerCase() with underscores) and isRoot template requirements.

Verified: `mvn test -Dtest=UITemplateCheckerTest` passes with auto-discovery covering all Layer enum values, skipping those without on-disk template folders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
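
The difference between the two seeding modes can be illustrated with an in-memory stand-in. This is a sketch under assumptions, not the real `UITemplateManagementService`: `TemplateStoreSketch` is hypothetical, templates are reduced to an id and a configuration string, and storage is a plain map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for template storage; not the real service.
final class TemplateStoreSketch {
    private final Map<String, String> templates = new HashMap<>();

    // Production default: seed only when no copy with this id exists yet.
    void addIfNotExist(String id, String configuration) {
        templates.putIfAbsent(id, configuration);
    }

    // Force-reload mode: add when absent, otherwise overwrite the stored copy.
    void addOrReplace(String id, String configuration) {
        if (templates.containsKey(id)) {
            templates.replace(id, configuration); // "change" path
        } else {
            templates.put(id, configuration);     // "add" path
        }
    }

    // Branches on the flag the way the commit message describes.
    void initTemplate(String id, String shipped, boolean forceReload) {
        if (forceReload) {
            addOrReplace(id, shipped);
        } else {
            addIfNotExist(id, shipped);
        }
    }

    String get(String id) {
        return templates.get(id);
    }

    public static void main(String[] args) {
        TemplateStoreSketch store = new TemplateStoreSketch();
        store.initTemplate("root", "shipped-v1", false);
        store.initTemplate("root", "shipped-v2", false); // existing copy is kept
        System.out.println(store.get("root"));           // shipped-v1
        store.initTemplate("root", "shipped-v2", true);  // existing copy is overwritten
        System.out.println(store.get("root"));           // shipped-v2
    }
}
```

The same overwrite that makes development edits visible after a restart would also discard operator edits, which is why the flag is off unless explicitly set.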

Orients contributors to the Layer enum registration, extension-point selection (OAL / MAL / LAL / SpanListener / SegmentListener), UI template + submodule touchpoints, and cross-cutting traps that don't live in any single extension-specific skill.

Copilot

Pull request overview

This PR introduces SWIP-12 (mini-program monitoring design), improves UI template initialization by auto-discovering template folders from Layer.values(), and adds a dev-oriented hot-reload mode for shipped UI templates.

Changes:

Add SWIP-12 design doc and index it in the SWIP readme.
Auto-discover UI template folders from Layer enums (+ custom) and add SW_UI_TEMPLATE_FORCE_RELOAD to overwrite seeded templates on restart.
Update UITemplateCheckerTest to skip layers that don’t have an on-disk template folder.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file

| File | Description |
| --- | --- |
| `oap-server/server-starter/src/test/java/org/apache/skywalking/oap/server/starter/UITemplateCheckerTest.java` | Test now tolerates missing template folders to match initializer behavior. |
| `oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateManagementService.java` | Adds `addOrReplace` to support force-reload behavior. |
| `oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java` | Auto-discovers folders from `Layer.values()` and supports force reload via env var. |
| `docs/en/swip/readme.md` | Updates next SWIP number and links SWIP-12. |
| `docs/en/swip/SWIP-12.md` | New SWIP-12 design proposal document. |
| `.claude/skills/new-monitoring-feature/SKILL.md` | New contributor “wiring map” for adding a monitoring feature/layer. |

Comments suppressed due to low confidence (2)

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java:93

folder.toLowerCase() uses the JVM default locale, which can produce incorrect folder names under certain locales (e.g., Turkish 'IOS' -> 'ıos') and make shipped templates undiscoverable. Please lower-case using Locale.ROOT (or precompute UI_TEMPLATE_FOLDER as already-lowercased with Locale.ROOT).

                File[] templateFiles = ResourceUtils.getPathFiles("ui-initialized-templates/" + folder.toLowerCase());
                for (File file : templateFiles) {
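
The locale issue raised in this comment can be reproduced in isolation. The sketch below is not the initializer code: `LocaleLowerCaseSketch` and `folderName` are hypothetical names, and the Turkish locale is passed explicitly instead of changing the JVM default.

```java
import java.util.Locale;

final class LocaleLowerCaseSketch {
    // Lower-cases a layer name independently of the JVM default locale.
    static String folderName(String layerName) {
        return layerName.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Under Turkish casing rules, 'I' lower-cases to dotless 'ı' (U+0131).
        System.out.println("IOS".toLowerCase(turkish).equals("\u0131os")); // true
        // Locale.ROOT yields the ASCII folder name that exists on disk.
        System.out.println(folderName("IOS").equals("ios"));               // true
    }
}
```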

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java:98

With UI_TEMPLATE_FOLDER now containing every Layer, ResourceUtils.getPathFiles(...) will throw/catch a FileNotFoundException once per missing template folder on every boot. Since exceptions are relatively expensive (stack trace capture) and this is expected control flow, consider checking ClassLoader.getResource(path) (or adding a non-throwing helper in ResourceUtils) and skipping when null instead of relying on exceptions.

        for (String folder : UITemplateInitializer.UI_TEMPLATE_FOLDER) {
            try {
                File[] templateFiles = ResourceUtils.getPathFiles("ui-initialized-templates/" + folder.toLowerCase());
                for (File file : templateFiles) {
                    initTemplate(file);
                }
            } catch (FileNotFoundException e) {
                log.debug("No such folder of path: {}, skipping loading UI templates", folder);
            }
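
One way to read the suggestion is a null check on `ClassLoader.getResource` before loading. The sketch below is an assumption about how that could look, not a change made in this PR: `TemplateFolderProbe` and `exists` are hypothetical, and the folder names are examples. Directory lookups inside a jar succeed only when the jar contains explicit directory entries, so a real helper would need to account for that.

```java
final class TemplateFolderProbe {
    // Returns true when the classpath resource exists; a miss returns false without throwing.
    static boolean exists(ClassLoader loader, String path) {
        return loader.getResource(path) != null;
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        String[] folders = {"general", "no_such_layer"}; // hypothetical folder names
        for (String folder : folders) {
            String path = "ui-initialized-templates/" + folder;
            if (!exists(loader, path)) {
                // Expected for layers without templates; no stack trace is captured.
                System.out.println("skipping " + path);
                continue;
            }
            System.out.println("would load templates from " + path);
        }
    }
}
```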


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

wu-sheng and others added 9 commits April 20, 2026 17:02

wu-sheng requested a review from Copilot April 20, 2026 09:16

Copilot started reviewing on behalf of wu-sheng April 20, 2026 09:17

Copilot AI reviewed Apr 20, 2026


Comment thread docs/en/swip/SWIP-12.md Outdated

Update docs/en/swip/SWIP-12.md

82ce54d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

wu-sheng added the backend and enhancement labels Apr 20, 2026

wu-sheng added this to the 10.5.0 milestone Apr 20, 2026

wankai123 approved these changes Apr 20, 2026


wu-sheng merged commit 681a5be into master Apr 20, 2026
626 of 633 checks passed

wu-sheng deleted the feature/swip12-miniprogram-and-ui-template-hot-reload branch April 20, 2026 11:03

wu-sheng mentioned this pull request Apr 20, 2026

Implement SWIP-12: WeChat & Alipay Mini Program monitoring #13835

Merged

6 tasks

wu-sheng merged 10 commits into master from feature/swip12-miniprogram-and-ui-template-hot-reload

3 participants
