
SWIP-12 design + UITemplateInitializer auto-discovery & dev hot-reload#13834

Merged
wu-sheng merged 10 commits into master from
feature/swip12-miniprogram-and-ui-template-hot-reload
Apr 20, 2026

Conversation

@wu-sheng
Member

SWIP-12 design + UITemplateInitializer extensibility

This PR bundles three related changes that together unlock future mini-program monitoring work:

1. SWIP-12 design doc (docs/en/swip/SWIP-12.md)
Proposes WeChat & Alipay Mini-Program monitoring as a new pair of Layer values. Covers SDK alignment (histogram bucket unit), native-trace SegmentListener SPI, entity model, layer partitioning, dashboards layout, and MAL/OAL scope split. Still a design-only proposal — implementation lands in follow-up PRs.

2. UITemplateInitializer extensibility + dev hot-reload

  • UI_TEMPLATE_FOLDER is now computed from Layer.values() + "custom" at class-init time. Adding a new Layer enum value is enough — drop a ui-initialized-templates/<layer-name-lowercased>/ folder on disk and it's scanned on the next boot. Removes the prior hardcoded allowlist that was easy to miss.
  • SW_UI_TEMPLATE_FORCE_RELOAD env var switches the initializer from addIfNotExist to a new addOrReplace helper on UITemplateManagementService. When true, shipped templates overwrite any seeded copy every boot — so dev/extension edits show up after a simple OAP restart without wiping storage. Unset / false preserves the production behavior where operator UI edits persist.
  • UITemplateCheckerTest updated to tolerate missing folders (several Layer values have no template folders today).
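
The folder auto-discovery described in item 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the merged code: `DemoLayer` is a hypothetical stand-in for the real `Layer` enum, and lower-casing with `Locale.ROOT` follows the review suggestion further down rather than anything confirmed about the merged implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class TemplateFolderSketch {
    // Hypothetical stand-in for the real Layer enum; only a few values are shown.
    enum DemoLayer { GENERAL, MESH, IOS, WECHAT_MINI_PROGRAM }

    // One folder name per enum value (lower-cased), plus the fixed "custom" folder.
    static List<String> templateFolders() {
        List<String> folders = new ArrayList<>();
        for (DemoLayer layer : DemoLayer.values()) {
            folders.add(layer.name().toLowerCase(Locale.ROOT));
        }
        folders.add("custom");
        return Collections.unmodifiableList(folders);
    }

    public static void main(String[] args) {
        // prints [general, mesh, ios, wechat_mini_program, custom]
        System.out.println(templateFolders());
    }
}
```

Because the list is derived from the enum, a new enum value needs no second registration step; the underscore in `wechat_mini_program` comes straight from the enum name.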

3. new-monitoring-feature skill (.claude/skills/new-monitoring-feature/SKILL.md)
A wiring map for contributors adding a new layer: which extension point handles which signal (OAL / MAL / LAL / SpanListener / SegmentListener), where contracts live, UI template + submodule touchpoints, and cross-cutting traps.

  • If this is a non-trivial feature, paste the links/URLs to the design doc. — docs/en/swip/SWIP-12.md

  • Update the documentation to include this new feature. — SWIP doc + readme index updated

  • Tests (including UT, IT, E2E) are added to verify the new feature. — UITemplateCheckerTest updated for auto-discovery

  • If it's UI related, attach the screenshots below. — N/A; no UI dashboards land in this PR (follow-up)

  • If this pull request closes/resolves/fixes an existing issue, replace the issue number. Closes #.

  • Update the CHANGES log. — deferred until mini-program monitoring implementation lands

wu-sheng and others added 9 commits April 20, 2026 17:02
Proposes WECHAT_MINI_PROGRAM (48) / ALIPAY_MINI_PROGRAM (49) layers driven
by mini-program-monitor SDK v0.3+ (OTLP + SkyWalking native segments).
Reuses LAL layer:auto + sourceAttribute() from SWIP-11 and componentId-based
layer mapping in CommonAnalysisListener.getLayer() — no new SPI. Reserves
JS componentIds 10002 (WeChat) / 10003 (AliPay), already shipping in the
SDK. Showcase data generator consumes the SDK's published sim-wechat /
sim-alipay GHCR images directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SDK alignment:
- Bump recommended SDK to ≥ v0.4.0 (clean serviceInstance default)
- Pin showcase sim images to v0.4.0; add SERVICE_INSTANCE env to mirror
  SDK recommendation
- Reword Mini Program Setup comment from "workaround" to "v0.4.0 recommendation"
- §8 status table: serviceInstance change is shipped, not pending

Correctness fixes from re-reading:
- §4 MAL: switch to chained .endpoint([...], [...], Layer.X) for per-page
  metrics (matches APISIX/RocketMQ pattern); replace incorrect "label
  normalization maps service.version → service_instance_id" claim with
  the actual behavior (agent sets it; OAP uses literal "-" if absent)
- §6: correct method to protected Layer identifyServiceLayer(SpanLayer)
  on the abstract base — was wrongly named getLayer and described as static
- §9 Dashboard: reword trace widget — "service list filtered by layer;
  trace widget shows in-scope service's traces" (filter is at service-list
  level, not on the trace widget)
- §Limitations: drop stale miniprogram.device span tag reference (SDK
  v0.4.0 dropped device id entirely; tag was never shipped)
- §Compatibility: document that OAP records literal "-" instance entity
  when SDK serviceInstance is unset

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…v vars, instance coherence

Four review issues fixed against ground-truth source:

- §4 MAL: stop fragmenting service-scoped meters by service_instance_id.
  Now follows the iOS pattern (otel-rules/ios/ios-metrickit.yaml +
  ios-metrickit-instance.yaml): four files (per-platform × per-scope),
  service-scoped meters key on service_name only, instance-scoped meters
  go in their own file with expSuffix: instance(...). This makes the
  "overall app health" service view genuinely fleet-aggregated and
  provides the metrics behind the per-instance dashboard.

- §9 Dashboards: UITemplateInitializer requires layer-name folders
  (Layer.X.name().toLowerCase()) and an entry in the hard-coded
  UI_TEMPLATE_FOLDER allowlist. Hyphenated folders (wechat-mini-program/)
  are silently skipped. Specify wechat_mini_program/ /
  alipay_mini_program/ folders + UI_TEMPLATE_FOLDER appends.

- §General usage: fix env var names to match application.yml —
  SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES (was incorrectly
  SW_OTEL_RECEIVER_RULES). Spell out that mini-program rules must be
  appended to the existing default lists for both OTEL metrics rules
  and lalFiles, not replace them.

- §2 + §5: instance identity made coherent across signals. SDK only emits
  OTLP service.instance.id when operator sets serviceInstance, and
  segments use serviceInstance || "-". LAL extractor changed from
  sourceAttribute("service.version") to sourceAttribute("service.instance.id")
  so logs/metrics/traces all key off the same value. Added explicit
  "Instance coherence" subsection documenting how the three pipelines
  align (and what happens when serviceInstance is unset).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d prefix coherence, component lookup via catalog service

Three review issues fixed against ground-truth source:

- §2 "Instance coherence" + §Compatibility: stop overstating that
  unset serviceInstance lands as the same literal "-" across all signals.
  Verified per pipeline:
  - Segments: SDK substitutes "-" (request.ts:147)
  - Logs: TrafficSinkListener:83 short-circuits when serviceInstance
    is empty — no instance traffic at all
  - Metrics: SampleFamily.dim() collapses missing labels to "" — no
    instance entity built
  Per-pipeline behavior table now shown explicitly. The unset case is
  not "still consistent in that no view is meaningful" hand-waving;
  segments produce a "-" entity while logs/metrics produce nothing.
  Operators must set serviceInstance for instance-level dashboards to
  be populated; otherwise the three pipelines diverge.
  Inline LAL comment updated to match.

- §9 Dashboards: metric names now use the per-platform prefixes from
  §4 (meter_wechat_mp_*, meter_wechat_mp_instance_*, etc.) instead of
  the stale meter_miniprogram_* prefix. Dropped first_paint_time from
  the dashboard table since §3/§4 explicitly exclude it from MAL
  aggregation. Per-platform dashboard tables shown separately so
  WeChat-only navigation panels don't appear under Alipay.

- §6 Trace layer mapping: rewrite to use IComponentLibraryCatalogService
  for component-name → id resolution instead of fictional
  ComponentsDefine.WECHAT_MINI_PROGRAM constants. component-libraries.yml
  is the single source of truth — there are no auto-generated Java
  constants for component IDs (verified against
  ComponentLibraryCatalogService.java:75-104). Listener constructor
  resolves the two ids once via catalog.getComponentId("WeChat-MiniProgram")
  / "AliPay-MiniProgram" and caches as int fields.
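
The "resolve once, cache as int fields" approach from the §6 bullet can be sketched like this. The `Catalog` interface is a hypothetical, cut-down stand-in for the catalog service named above, and `MiniProgramLayer` is an illustrative enum; only the component names and the ids 10002 / 10003 come from the commit messages.

```java
import java.util.Map;

final class ComponentIdCacheSketch {
    // Hypothetical stand-in for the catalog service; only the lookup used here.
    interface Catalog {
        int getComponentId(String componentName);
    }

    // Illustrative result type, not the real Layer enum.
    enum MiniProgramLayer { WECHAT_MINI_PROGRAM, ALIPAY_MINI_PROGRAM, NONE }

    private final int wechatId;
    private final int alipayId;

    ComponentIdCacheSketch(Catalog catalog) {
        // Resolve both ids once at construction time; per-span checks are then int compares.
        this.wechatId = catalog.getComponentId("WeChat-MiniProgram");
        this.alipayId = catalog.getComponentId("AliPay-MiniProgram");
    }

    MiniProgramLayer layerFor(int componentId) {
        if (componentId == wechatId) {
            return MiniProgramLayer.WECHAT_MINI_PROGRAM;
        }
        if (componentId == alipayId) {
            return MiniProgramLayer.ALIPAY_MINI_PROGRAM;
        }
        return MiniProgramLayer.NONE;
    }

    public static void main(String[] args) {
        // In-memory catalog standing in for component-libraries.yml.
        Map<String, Integer> ids = Map.of("WeChat-MiniProgram", 10002, "AliPay-MiniProgram", 10003);
        ComponentIdCacheSketch sketch = new ComponentIdCacheSketch(ids::get);
        System.out.println(sketch.layerFor(10002)); // WECHAT_MINI_PROGRAM
        System.out.println(sketch.layerFor(7));     // NONE
    }
}
```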

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ance regression, listener subclass list

Three review issues fixed against ground-truth source:

- §9 Dashboards: add the layer-root template requirement. Layer.vue:41-44
  requires a dashboard with isRoot: true to render the menu landing page
  (precedent: ios/ios-root.json). Without it, clicking Mobile > WeChat
  Mini Program shows an empty "no dashboard" view. Both
  wechat_mini_program-root.json and alipay_mini_program-root.json now
  listed in the folder layout.

- §4 + §General usage: drop the residual "literal -" / "everything
  aggregates under -" wording. §2 was already corrected; §4 Notes
  bullet and the WeChat init example comment still asserted the
  uniform-fallback story. Replaced with the verified per-pipeline
  behavior: Analyzer.java:345 (instance traffic only emitted when
  non-empty) + SampleFamily.dim() (collapses missing labels to "")
  mean OTLP metrics produce no instance entity at all when
  serviceInstance is unset. Only segments substitute "-" at the SDK
  wire.

- §6 Trace layer mapping: correct the listener subclass list.
  CommonAnalysisListener is extended only by RPCAnalysisListener and
  EndpointDepFromCrossThreadAnalysisListener (verified via grep).
  SegmentAnalysisListener has its own service-meta path and does not
  extend the base. Doesn't change the design surface (still 5 call
  sites in 2 files) but clarifies which classes are touched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Technical-correctness fixes (would produce broken metrics in prod):
- §4 MAL: document the ms-histogram bucket unit trap. SDK emits le in
  ms; MAL default assumes seconds and rescales ×1000. Without
  compensation, request-latency percentiles come out 1000× too large
  (same trap that bit MetricKit in SWIP-11). Two implementation paths
  documented: OTLP receiver honoring Metric.unit, or a targeted
  preprocessor.
- §4 MAL: document the +Inf overflow bucket rendering as ~9.2e18 in
  UI. Needs finite sentinel (SDK-side bound update to 30s, or OAP-side
  preprocessor ceiling).
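
The ~9.2e18 figure is consistent with `Long.MAX_VALUE`. The commit message does not say where the conversion happens inside OAP, so the sketch below only shows the Java behavior that would produce such a value, assuming the +Inf bound is narrowed to a `long` at some point. The 30000 ms ceiling is a hypothetical value chosen to mirror the "30s" bound mentioned above.

```java
final class OverflowBucketSketch {
    // Narrowing a double to long: Java maps +Infinity to Long.MAX_VALUE.
    static long toLongBound(double le) {
        return (long) le;
    }

    // Hypothetical finite ceiling applied before narrowing.
    static long toClampedBound(double le, long ceilingMs) {
        return le > ceilingMs ? ceilingMs : (long) le;
    }

    public static void main(String[] args) {
        System.out.println(toLongBound(Double.POSITIVE_INFINITY));             // 9223372036854775807
        System.out.println(toClampedBound(Double.POSITIVE_INFINITY, 30_000L)); // 30000
    }
}
```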

Design-clarification additions:
- §3a new subsection: enumerate the OAL / topology metrics that come
  for free from the componentId-driven layer assignment in §6. After
  RPCAnalysisListener emits Service/Instance/Endpoint/Relation sources
  with the right layer, core.oal produces service_cpm, service_resp_time,
  service_percentile, endpoint_*, plus outbound topology edges. Readers
  would otherwise think dashboards only have the §3 metric table.
- §3b new subsection: define where error_count actually comes from.
  A new log-MAL rule file log-mal-rules/miniprogram.yaml converts
  LAL-extracted error samples into per-(service, exception_type)
  counters. §5 LAL rule updated with a metrics {} block emitting the
  raw miniprogram_error_count sample.

Process / deliverables additions:
- §11 new: OAP-side e2e test case (test/e2e-v2/cases/miniprogram/
  {wechat,alipay}/), separate from the showcase demo generator.
  Drives sim images in MODE=once against the full OAP wiring. CI
  matrix entry in .github/workflows/skywalking.yaml required.
- §11 also: config-dump.yml mirror update required when application.yml
  defaults change (miniprogram/* in enabledOtelMetricsRules, miniprogram
  in lalFiles and malFiles).
- §12 new: Security Notice. Mini-program SDKs post from end-user devices
  on the public internet — same exposure profile as iOS / browser.
  Add a client-side-monitoring paragraph to docs/en/security/README.md.
- §13 new: Implementation Deliverables Checklist covering the two
  user-facing backend-*-mini-program-monitoring.md docs, docs/menu.yml
  entries, changes.md changelog, readme.md SWIP move, and booster-ui
  i18n PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… trap

Prior wording cast OAP's histogram-unit rescale as a silent trap needing
either an OTLP receiver enhancement or a per-feature preprocessor. That
was overstated.

Verified against source (SampleFamily.java:459-487,
PrometheusMetricConverter.java:78): OAP doesn't enforce a unit — it just
rescales le labels to ms using SampleFamily.defaultHistogramBucketUnit.
Default SECONDS matches Prometheus ecosystem convention, which is what
shipped rules assume. There is no silent-bug surface; it's a standard
coordination between source system and MAL rule.

SWIP-12 now specifies: the SDK should align miniprogram.request.duration
bounds to seconds convention (divide current ms bounds by 1000) in its
next release. Clean, single-line SDK change, no OAP-side plumbing added.

The +Inf overflow bucket note is retained but downgraded from "must
fix" to "low-risk dashboard-rendering concern; add a finite ceiling only
if outliers surface in practice."
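
A worked example of the unit coordination described in this commit message. The bucket bounds are hypothetical, and `rescaleSecondsToMillis` only mimics the "treat `le` as seconds, convert to ms" behavior attributed to the SECONDS default above; it is not the OAP implementation.

```java
import java.util.Arrays;

final class BucketUnitSketch {
    // Mimics the default rescale: bounds are assumed to be seconds and converted to ms.
    static long[] rescaleSecondsToMillis(double[] leBounds) {
        long[] out = new long[leBounds.length];
        for (int i = 0; i < leBounds.length; i++) {
            out[i] = Math.round(leBounds[i] * 1000.0);
        }
        return out;
    }

    // The SDK-side change described above: divide ms bounds by 1000 before export.
    static double[] millisToSeconds(double[] msBounds) {
        double[] out = new double[msBounds.length];
        for (int i = 0; i < msBounds.length; i++) {
            out[i] = msBounds[i] / 1000.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] sdkBoundsMs = {100, 500, 1000}; // hypothetical bounds, in ms

        // Bounds sent in ms but read as seconds: 1000x too large.
        System.out.println(Arrays.toString(rescaleSecondsToMillis(sdkBoundsMs)));
        // prints [100000, 500000, 1000000]

        // Bounds aligned to seconds first: the rescale lands back on the intended ms values.
        System.out.println(Arrays.toString(rescaleSecondsToMillis(millisToSeconds(sdkBoundsMs))));
        // prints [100, 500, 1000]
    }
}
```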

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…olders + dev force-reload

Motivation: adding a new feature/layer previously required two edits: a new Layer enum value, plus an entry appended to the hardcoded UI_TEMPLATE_FOLDER allowlist in UITemplateInitializer. The allowlist was easy to miss, and missing it failed silently (templates on disk but never loaded). Editing shipped templates also required wiping the storage container for changes to take effect.

This commit collapses both pain points:

- UI_TEMPLATE_FOLDER is now computed once from Layer.values() + "custom"
  at class-init time. Drop a ui-initialized-templates/<layer-name-lowercased>/
  folder on disk and it's scanned on the next boot. Missing folders are
  silently skipped (same catch that existed before). Adding a new Layer
  enum value is enough; no second edit here.

- SW_UI_TEMPLATE_FORCE_RELOAD environment variable (read directly from
  System.getenv, not wired through application.yml) switches the
  initializer from addIfNotExist to a new addOrReplace helper on
  UITemplateManagementService. When true, shipped templates overwrite
  any previously seeded copy every boot — so dev/extension edits show
  up after a simple OAP restart. Unset / false preserves the default
  production behavior where operator UI edits persist across restarts.

Changes:
- UITemplateInitializer: dynamic UI_TEMPLATE_FOLDER, FORCE_RELOAD flag,
  branch on addIfNotExist vs addOrReplace.
- UITemplateManagementService: new addOrReplace(DashboardSetting) —
  addTemplate if absent, changeTemplate if present.
- UITemplateCheckerTest: tolerate missing folders (some Layer enum
  values — UNDEFINED, FAAS, CACHE, DATABASE, MQ, VIRTUAL_GATEWAY,
  GENAI — have no template folders today).
- SWIP-12 §9: drop the "append to UI_TEMPLATE_FOLDER allowlist" step;
  keep the folder-naming (Layer.name().toLowerCase() with underscores)
  and isRoot template requirements.
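
The branch between addIfNotExist and addOrReplace listed above can be sketched with an in-memory store. Everything here is illustrative: the `Map` stands in for template storage, the `String` payload stands in for `DashboardSetting`, and parsing the flag with `Boolean.parseBoolean` is an assumption about how the environment variable is read.

```java
import java.util.HashMap;
import java.util.Map;

final class ForceReloadSketch {
    // Hypothetical in-memory stand-in for template storage: template id -> JSON body.
    private final Map<String, String> store = new HashMap<>();

    // Assumed parsing: unset (null) or anything other than "true" means false.
    static boolean forceReload(String envValue) {
        return Boolean.parseBoolean(envValue);
    }

    // Keeps an existing copy untouched, so operator edits persist.
    void addIfNotExist(String id, String json) {
        store.putIfAbsent(id, json);
    }

    // Add if absent, change if present.
    void addOrReplace(String id, String json) {
        if (store.containsKey(id)) {
            store.replace(id, json); // "changeTemplate" path
        } else {
            store.put(id, json);     // "addTemplate" path
        }
    }

    void seed(String id, String json, boolean force) {
        if (force) {
            addOrReplace(id, json);
        } else {
            addIfNotExist(id, json);
        }
    }

    String get(String id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        boolean force = forceReload(System.getenv("SW_UI_TEMPLATE_FORCE_RELOAD"));
        ForceReloadSketch sketch = new ForceReloadSketch();
        sketch.seed("demo-root", "shipped-v1", force);
        sketch.seed("demo-root", "shipped-v2", force); // second "boot" with an edited template
        System.out.println(sketch.get("demo-root"));   // shipped-v2 only when the flag is true
    }
}
```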

Verified: `mvn test -Dtest=UITemplateCheckerTest` passes with
auto-discovery covering all Layer enum values, skipping those
without on-disk template folders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Orients contributors to the Layer enum registration, extension-point
selection (OAL / MAL / LAL / SpanListener / SegmentListener),
UI template + submodule touchpoints, and cross-cutting traps
that don't live in any single extension-specific skill.

Copilot AI left a comment


Pull request overview

This PR introduces SWIP-12 (mini-program monitoring design), improves UI template initialization by auto-discovering template folders from Layer.values(), and adds a dev-oriented hot-reload mode for shipped UI templates.

Changes:

  • Add SWIP-12 design doc and index it in the SWIP readme.
  • Auto-discover UI template folders from Layer enums (+ custom) and add SW_UI_TEMPLATE_FORCE_RELOAD to overwrite seeded templates on restart.
  • Update UITemplateCheckerTest to skip layers that don’t have an on-disk template folder.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

File | Description
--- | ---
oap-server/server-starter/src/test/java/org/apache/skywalking/oap/server/starter/UITemplateCheckerTest.java | Test now tolerates missing template folders to match initializer behavior.
oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateManagementService.java | Adds addOrReplace to support force-reload behavior.
oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java | Auto-discovers folders from Layer.values() and supports force reload via env var.
docs/en/swip/readme.md | Updates next SWIP number and links SWIP-12.
docs/en/swip/SWIP-12.md | New SWIP-12 design proposal document.
.claude/skills/new-monitoring-feature/SKILL.md | New contributor “wiring map” for adding a monitoring feature/layer.
Comments suppressed due to low confidence (2)

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java:93

  • folder.toLowerCase() uses the JVM default locale, which can produce incorrect folder names under certain locales (e.g., Turkish 'IOS' -> 'ıos') and make shipped templates undiscoverable. Please lower-case using Locale.ROOT (or precompute UI_TEMPLATE_FOLDER as already-lowercased with Locale.ROOT).
                File[] templateFiles = ResourceUtils.getPathFiles("ui-initialized-templates/" + folder.toLowerCase());
                for (File file : templateFiles) {
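
The locale issue raised in this comment can be reproduced in isolation. The sketch below is standalone and does not use any OAP class; `folderName` is a hypothetical helper that only wraps `String.toLowerCase(Locale)`.

```java
import java.util.Locale;

final class LocaleLowercaseSketch {
    // Hypothetical helper: lower-case a layer name under an explicit locale.
    static String folderName(String layerName, Locale locale) {
        return layerName.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Under Turkish rules, 'I' lower-cases to dotless 'ı' (U+0131).
        System.out.println(folderName("IOS", turkish));     // ıos
        // Locale.ROOT gives the locale-independent mapping.
        System.out.println(folderName("IOS", Locale.ROOT)); // ios
    }
}
```

A folder named `ios/` on disk would therefore not match the name computed under a Turkish default locale, which is the failure mode the comment describes.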

oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/management/ui/template/UITemplateInitializer.java:98

  • With UI_TEMPLATE_FOLDER now containing every Layer, ResourceUtils.getPathFiles(...) will throw/catch a FileNotFoundException once per missing template folder on every boot. Since exceptions are relatively expensive (stack trace capture) and this is expected control flow, consider checking ClassLoader.getResource(path) (or adding a non-throwing helper in ResourceUtils) and skipping when null instead of relying on exceptions.
        for (String folder : UITemplateInitializer.UI_TEMPLATE_FOLDER) {
            try {
                File[] templateFiles = ResourceUtils.getPathFiles("ui-initialized-templates/" + folder.toLowerCase());
                for (File file : templateFiles) {
                    initTemplate(file);
                }
            } catch (FileNotFoundException e) {
                log.debug("No such folder of path: {}, skipping loading UI templates", folder);
            }
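
One way to realize the suggestion above is to probe the classpath before listing files. This is a sketch only: `findFolder` is a hypothetical helper, not an existing `ResourceUtils` method, and it relies solely on `ClassLoader.getResource` returning null for a missing resource.

```java
import java.net.URL;
import java.util.Optional;

final class ResourceProbeSketch {
    // Hypothetical non-throwing lookup: empty when the folder is not on the classpath.
    static Optional<URL> findFolder(ClassLoader loader, String path) {
        return Optional.ofNullable(loader.getResource(path));
    }

    public static void main(String[] args) {
        ClassLoader loader = ResourceProbeSketch.class.getClassLoader();
        // Illustrative folder name that is not expected to exist.
        String path = "ui-initialized-templates/definitely_missing_folder";

        if (!findFolder(loader, path).isPresent()) {
            // Expected case for layers without templates: skip, no exception involved.
            System.out.println("No such folder: " + path + ", skipping");
        }
    }
}
```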


Comment thread docs/en/swip/SWIP-12.md Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@wu-sheng added the backend (OAP backend related.) and enhancement (Enhancement on performance or codes) labels Apr 20, 2026
@wu-sheng added this to the 10.5.0 milestone Apr 20, 2026
@wu-sheng merged commit 681a5be into master Apr 20, 2026
626 of 633 checks passed
@wu-sheng deleted the feature/swip12-miniprogram-and-ui-template-hot-reload branch April 20, 2026 11:03

3 participants