fix(telemetry): use {feature}_{name} composite key as internal metric identifier #35019

Merged
spbolton merged 5 commits into main from fix/35010-telemetry-qualified-name
Mar 18, 2026

Conversation

@spbolton
Member

Summary

  • Reverts TotalContentsDatabaseMetricType.getName() from "COUNT_CONTENT" back to "COUNT" — restores contentlets_count in downstream telemetry reporting
  • Adds MetricType.getQualifiedName() returning {FEATURE}_{NAME} for internal uniqueness (caching, map keys, i18n)
  • Removes the getMappedMetricName() disambiguation hack in UsageResource
  • Updates all i18n keys to use qualified names (e.g., CONTENTLETS_COUNT, LANGUAGES_COUNT)

Context

PR #34479 renamed the contentlets metric from "COUNT" to "COUNT_CONTENT" to resolve a naming collision with the languages metric. This broke the downstream telemetry reporting layer — the metrics_table_view in the telemetry database derives keys as lower(feature) || '_' || lower(name) and maps them into a hardcoded metric_stat PostgreSQL composite type. The rename changed the derived key from contentlets_count to contentlets_count_content, which has no matching column and is silently dropped by json_populate_record.
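
The key derivation described above can be sketched in Java. This is only an illustration: `DerivedKeyDemo` and the `KNOWN_COLUMNS` set are hypothetical stand-ins for the view expression and for the fields of the metric_stat composite type, which are not listed in this PR.

```java
import java.util.Locale;
import java.util.Set;

final class DerivedKeyDemo {
    // Hypothetical subset of the fields in the metric_stat composite type.
    static final Set<String> KNOWN_COLUMNS = Set.of("contentlets_count", "languages_count");

    // Mirrors the view expression: lower(feature) || '_' || lower(name)
    static String derivedKey(String feature, String name) {
        return feature.toLowerCase(Locale.ROOT) + "_" + name.toLowerCase(Locale.ROOT);
    }

    // json_populate_record ignores keys that match no field of the target type,
    // so a derived key outside KNOWN_COLUMNS never reaches the report.
    static boolean isReported(String feature, String name) {
        return KNOWN_COLUMNS.contains(derivedKey(feature, name));
    }
}
```

With these assumptions, `derivedKey("CONTENTLETS", "COUNT_CONTENT")` yields `contentlets_count_content`, which matches no column, while `derivedKey("CONTENTLETS", "COUNT")` yields `contentlets_count`.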

Production impact: contentlets_count has been NULL for 145 of 153 cloud clients since ~Jan 7, 2026 (~10 weeks of missing data). This blocks team bonus metrics and customer health reporting.

The fix introduces getQualifiedName() as the internal unique identifier while keeping the wire format unchanged, so the external reporting layer continues to work without modification.
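
A minimal sketch of the idea, assuming MetricType exposes getName() and a feature accessor. The `MetricFeature` enum, the `getFeature()` method, and the two implementing classes below are illustrative and may not match the real dotCMS types.

```java
// Hypothetical stand-in for the feature enum; only two values are shown.
enum MetricFeature { CONTENTLETS, LANGUAGES }

// Sketch of the interface shape; the real MetricType has more members.
interface MetricType {
    String getName();            // wire-format name, e.g. "COUNT"
    MetricFeature getFeature();  // owning feature, e.g. CONTENTLETS

    // Internal unique identifier in {FEATURE}_{NAME} form.
    default String getQualifiedName() {
        return getFeature().name() + "_" + getName();
    }
}

// Both metrics keep "COUNT" on the wire but differ in their qualified names.
final class TotalContentsMetric implements MetricType {
    @Override public String getName() { return "COUNT"; }
    @Override public MetricFeature getFeature() { return MetricFeature.CONTENTLETS; }
}

final class TotalLanguagesMetric implements MetricType {
    @Override public String getName() { return "COUNT"; }
    @Override public MetricFeature getFeature() { return MetricFeature.LANGUAGES; }
}
```

Because the qualified name is derived rather than stored, implementations only need to return the wire name, and the collision between the two "COUNT" metrics disappears for caching and map keys.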

Changes

| File | Change |
| --- | --- |
| MetricType.java | Add default getQualifiedName() returning FEATURE_NAME |
| TotalContentsDatabaseMetricType.java | Revert getName() from "COUNT_CONTENT" → "COUNT" |
| MetricStatsCollector.java | Use getQualifiedName() for cache key, name filter, logging, timing |
| UsageResource.java | Remove getMappedMetricName() hack; use qualified names for map keys |
| DashboardMetricsProvider.java | getDashboardMetricByName() matches on qualified name |
| Language*.properties (9 files) | Update i18n keys to qualified name format |
| UsageResourceIntegrationTest.java | Update assertions for qualified name keys |

Test plan

  • Verify contentlets_count column is populated in metrics_table_view after deployment
  • Verify Usage Dashboard displays all metrics correctly with new qualified name i18n keys
  • Run UsageResourceIntegrationTest — all assertions updated for qualified names
  • Verify external telemetry payload sends "name": "COUNT" with "feature": "CONTENTLETS"

Fixes #35010
Closes #34042
Fixes #35002

🤖 Generated with Claude Code

… identifier #35010

Reverts TotalContentsDatabaseMetricType.getName() from "COUNT_CONTENT" back to
"COUNT" to restore the external telemetry wire format expected by the downstream
reporting layer (metric_stat composite type keys on lower(feature)_lower(name)).

Adds getQualifiedName() default method to MetricType that returns FEATURE_NAME
(e.g., CONTENTLETS_COUNT) for internal uniqueness — used in caching, map keys,
name filtering, and i18n lookups. This eliminates the naming collision between
contentlets and languages metrics (both returned "COUNT") without changing the
wire format.

Removes the getMappedMetricName() disambiguation hack in UsageResource that
manually remapped "COUNT" to "COUNT_CONTENT" or "COUNT_LANGUAGES" based on
feature enum. Updates all i18n keys to use qualified names.

Closes #34042
Fixes #35002

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Contributor

claude bot commented Mar 17, 2026

Claude finished @spbolton's task in 2m 5s


Rollback Safety Re-Analysis (PR Synchronize)

  • Read rollback-unsafe categories reference
  • Get full PR diff
  • Analyze diff against all categories
  • Post result

Findings

No new rollback-unsafe categories introduced by the new commits. The existing M-3 finding from the prior analysis remains the only applicable category.


Category: M-3 — REST API Contract Change
Risk Level: 🟡 MEDIUM

Why it's unsafe: The /api/v1/usage response changes the name field value for every metric in the dashboard. All 17 metric keys are renamed from NAME-only format to FEATURE_NAME qualified format:

| Old name | New name |
| --- | --- |
| COUNT_CONTENT | CONTENTLETS_COUNT |
| COUNT_LANGUAGES | LANGUAGES_COUNT |
| COUNT_OF_SITES | SITES_COUNT_OF_SITES |
| COUNT_OF_USERS | USERS_COUNT_OF_USERS |
| … (13 more) | |

If N is deployed and then rolled back to N-1, the API reverts to the old name values. Any browser that cached N's Angular bundle (which reads name dynamically and uses it as an i18n key) would request usage.metric.CONTENTLETS_COUNT.label from N-1's backend, which only has usage.metric.COUNT_CONTENT.label — resulting in raw key strings shown to the user until the cache expires.

Code that makes it unsafe:

  • UsageResource.java line ~192: metricData.put("name", qualifiedName), where qualifiedName is now CONTENTLETS_COUNT (previously COUNT_CONTENT)
  • Language.properties lines 6408–6424: all 17 i18n keys renamed from usage.metric.COUNT_CONTENT.label to usage.metric.CONTENTLETS_COUNT.label (and equivalent for all other metrics)

Mitigating factors acknowledged:

  • The Angular frontend is updated in the same PR (dot-usage.service.ts, component specs) — so both sides are consistent at deployment time
  • The UI reads name dynamically and does not hardcode individual metric names — so no hard wiring breaks
  • This endpoint is only consumed by the internal Usage Dashboard, not a public headless API

The rollback risk is limited to the browser-cache window where a client holds N's JS bundle against N-1's API. Normal cache expiry resolves it without ops intervention. Despite these mitigations, the API contract change still matches M-3 per the reference document.

Alternative: Maintain both old and new keys in Language.properties for one release (the old keys could map to the same translated strings), so that a brief rollback window doesn't degrade the UI. This is low-effort and eliminates the cache-window risk entirely.
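
As a rough illustration of this alternative, the sketch below loads a properties fragment that carries both key formats and resolves a label from either name. The label text, the `DualKeyLabels` class, and the fall-back-to-raw-key behavior are assumptions made for the example, not the actual Language.properties content or dotCMS lookup code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class DualKeyLabels {
    // Illustrative entries only: the old and new keys map to the same label.
    static final String SAMPLE =
            "usage.metric.CONTENTLETS_COUNT.label=Total Content\n"
          + "usage.metric.COUNT_CONTENT.label=Total Content\n";

    static Properties load() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(SAMPLE));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the label for a metric name, or the raw key when no entry exists.
    static String label(Properties props, String metricName) {
        String key = "usage.metric." + metricName + ".label";
        return props.getProperty(key, key);
    }
}
```

With both entries present, either name resolves to the same label; a name with no entry falls back to the raw key string, which is the degraded display described above.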


All other categories were checked and are clear:

  • C-1/C-2/C-3/C-4: No database migrations, no runonce tasks, no ES mapping changes, no DROP/RENAME operations
  • H-1 through H-7: No data transformations, no PK changes, no storage provider changes
  • M-1: No column type changes
  • M-2: No push publishing bundle format changes
  • M-4: No OSGi public interface changes

@github-actions github-actions bot added the Area : Backend PR changes Java/Maven backend code label Mar 17, 2026
spbolton and others added 2 commits March 17, 2026 18:40
…esource

The non-numeric stats path was using metric.getName() as the map key (e.g.,
"LAST_LOGIN") but buildMetricsByCategory looks them up by getQualifiedName()
(e.g., "USERS_LAST_LOGIN"). Keys never matched so non-numeric metrics like
LAST_LOGIN and LAST_CONTENT_EDITED were silently dropped from the dashboard.

Fix by adding getNotNumericMetricValues() to MetricsSnapshot to expose the
raw MetricValue collection with full Metric metadata, then computing the
qualified key the same way as the numeric path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
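
The key mismatch described in this commit can be reproduced in a few lines. This is a simplified sketch: `SimpleMetric` and `NonNumericKeys` are hypothetical and stand in for the real Metric, MetricValue, and MetricsSnapshot types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal metric: feature, wire name, and a non-numeric value.
final class SimpleMetric {
    final String feature;
    final String name;
    final String value;

    SimpleMetric(String feature, String name, String value) {
        this.feature = feature;
        this.name = name;
        this.value = value;
    }

    String getQualifiedName() { return feature + "_" + name; }
}

final class NonNumericKeys {
    // Old behavior: keyed by the raw name, e.g. "LAST_LOGIN".
    static Map<String, String> keyedByName(List<SimpleMetric> metrics) {
        Map<String, String> out = new HashMap<>();
        for (SimpleMetric m : metrics) out.put(m.name, m.value);
        return out;
    }

    // Fixed behavior: keyed the same way the lookup side computes its key.
    static Map<String, String> keyedByQualifiedName(List<SimpleMetric> metrics) {
        Map<String, String> out = new HashMap<>();
        for (SimpleMetric m : metrics) out.put(m.getQualifiedName(), m.value);
        return out;
    }
}
```

A lookup for "USERS_LAST_LOGIN" returns null against the first map and the stored value against the second, which is the silent drop the commit fixes.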
…, test

- DashboardMetricsProvider: release CreationalContext after resolving each
  bean reference to avoid leaking dependent objects during CDI resolution
- DashboardMetricsProvider: track annotated count separately so the log
  message correctly shows "X annotated, Y match profile" instead of the
  same value twice
- UsageResourceIntegrationTest: testGetSummary_withDefaultProfile now
  compares default vs explicit MINIMAL results instead of passing "MINIMAL"
  for both calls

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Contributor

claude bot commented Mar 17, 2026

Pull Request Unsafe to Rollback!!!

  • Category: M-3 — REST API Contract Change
  • Risk Level: 🟡 MEDIUM
  • Why it's unsafe: The /api/v1/usage response changes the name field values for all usage dashboard metrics (e.g., COUNT_CONTENT → CONTENTLETS_COUNT, COUNT_LANGUAGES → LANGUAGES_COUNT), and the displayLabel i18n keys change format from usage.metric.COUNT_CONTENT.label to usage.metric.CONTENTLETS_COUNT.label. If N is deployed and then rolled back to N-1, the API reverts to returning the old name values. Any frontend or integration that adapted to N's new key format would break during the rollback window.
  • Code that makes it unsafe: dotCMS/src/main/java/com/dotcms/rest/api/v1/usage/UsageResource.java lines ~192 and ~247: metricData.put("name", qualifiedName) where qualifiedName is e.g. CONTENTLETS_COUNT; previously this was COUNT_CONTENT. Also dotCMS/src/main/webapp/WEB-INF/messages/Language.properties: all 18 i18n metric keys renamed (e.g. usage.metric.COUNT_CONTENT.label → usage.metric.CONTENTLETS_COUNT.label). No corresponding Angular frontend changes appear in this PR.
  • Alternative (if possible): Maintain backward compatibility by returning both the old name and the new qualifiedName as separate fields in the response for one release, then drop the old name in the following release. This allows the frontend and any consumers to migrate without a hard cut-over.
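
A sketch of what this suggested dual-field response could look like, assuming the per-metric payload is a plain map. The `DualFieldResponse` class and the "value" entry are hypothetical; only the "name" field and the qualifiedName concept come from the comment above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DualFieldResponse {
    // Builds one metric entry that carries both identifiers for a transition release.
    static Map<String, Object> metricData(String legacyName, String qualifiedName, Object value) {
        Map<String, Object> metricData = new LinkedHashMap<>();
        metricData.put("name", legacyName);             // old contract, e.g. COUNT_CONTENT
        metricData.put("qualifiedName", qualifiedName); // new identifier, e.g. CONTENTLETS_COUNT
        metricData.put("value", value);                 // assumed payload field
        return metricData;
    }
}
```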

@spbolton
Member Author

spbolton commented Mar 17, 2026

Pull Request Unsafe to Rollback!!!

  • Category: M-3 — REST API Contract Change
  • Risk Level: 🟡 MEDIUM
  • Why it's unsafe: The /api/v1/usage response changes the name field values for all usage dashboard metrics (e.g., COUNT_CONTENT → CONTENTLETS_COUNT, COUNT_LANGUAGES → LANGUAGES_COUNT), and the displayLabel i18n keys change format from usage.metric.COUNT_CONTENT.label to usage.metric.CONTENTLETS_COUNT.label. If N is deployed and then rolled back to N-1, the API reverts to returning the old name values. Any frontend or integration that adapted to N's new key format would break during the rollback window.
  • Code that makes it unsafe: dotCMS/src/main/java/com/dotcms/rest/api/v1/usage/UsageResource.java lines ~192 and ~247: metricData.put("name", qualifiedName) where qualifiedName is e.g. CONTENTLETS_COUNT; previously this was COUNT_CONTENT. Also dotCMS/src/main/webapp/WEB-INF/messages/Language.properties: all 18 i18n metric keys renamed (e.g. usage.metric.COUNT_CONTENT.label → usage.metric.CONTENTLETS_COUNT.label). No corresponding Angular frontend changes appear in this PR.
  • Alternative (if possible): Maintain backward compatibility by returning both the old name and the new qualifiedName as separate fields in the response for one release, then drop the old name in the following release. This allows the frontend and any consumers to migrate without a hard cut-over.

It is great that this was found, but I don't think we need to worry about backward compatibility in this case, as long as the frontend is internally consistent. Only the Usage Dashboard is currently tied to this API. Note that, other than in the tests, the UI reads the names and labels and uses them as is, so it does not reference individual metrics at all; the UI will handle the name change invisibly.

@github-actions github-actions bot added the Area : Frontend PR changes Angular/TypeScript frontend code label Mar 17, 2026
@spbolton spbolton enabled auto-merge March 17, 2026 21:14
Member

@yolabingo yolabingo left a comment

I don't know what a "bean" is but, beyond that, LGTM

…er contract

The PR changed getStats() to filter by getQualifiedName() ({FEATURE}_{name}),
but MetricTimeoutTest still passed the raw name. No metrics were selected,
pg_sleep never ran, and the snapshot had no errors — causing both assertions to fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
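
The filter contract behind this test failure can be sketched as follows. `NamedMetric` and `MetricNameFilter` are hypothetical simplifications, not the real getStats() signature; the point is only that a filter on qualified names selects nothing when the caller passes a raw name.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical minimal metric with a feature and a wire name.
final class NamedMetric {
    final String feature;
    final String name;

    NamedMetric(String feature, String name) {
        this.feature = feature;
        this.name = name;
    }

    String getQualifiedName() { return feature + "_" + name; }
}

final class MetricNameFilter {
    // Keeps only metrics whose qualified name was requested.
    static List<NamedMetric> select(List<NamedMetric> all, Set<String> requested) {
        return all.stream()
                .filter(m -> requested.contains(m.getQualifiedName()))
                .collect(Collectors.toList());
    }
}
```

Passing `Set.of("LAST_LOGIN")` against a USERS/LAST_LOGIN metric selects nothing, while `Set.of("USERS_LAST_LOGIN")` selects it, so callers have to be updated together with the filter.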
@spbolton spbolton added this pull request to the merge queue Mar 18, 2026
Merged via the queue into main with commit 68e93ac Mar 18, 2026
45 checks passed
@spbolton spbolton deleted the fix/35010-telemetry-qualified-name branch March 18, 2026 14:13
Labels

AI: Not Safe To Rollback
Area : Backend (PR changes Java/Maven backend code)
Area : Frontend (PR changes Angular/TypeScript frontend code)

3 participants