fix(telemetry): use {feature}_{name} composite key as internal metric identifier#35019
fix(telemetry): use {feature}_{name} composite key as internal metric identifier#35019
Conversation
… identifier #35010 Reverts TotalContentsDatabaseMetricType.getName() from "COUNT_CONTENT" back to "COUNT" to restore the external telemetry wire format expected by the downstream reporting layer (metric_stat composite type keys on lower(feature)_lower(name)). Adds getQualifiedName() default method to MetricType that returns FEATURE_NAME (e.g., CONTENTLETS_COUNT) for internal uniqueness — used in caching, map keys, name filtering, and i18n lookups. This eliminates the naming collision between contentlets and languages metrics (both returned "COUNT") without changing the wire format. Removes the getMappedMetricName() disambiguation hack in UsageResource that manually remapped "COUNT" to "COUNT_CONTENT" or "COUNT_LANGUAGES" based on feature enum. Updates all i18n keys to use qualified names. Closes #34042 Fixes #35002 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Claude finished @spbolton's task in 2m 5s —— View job Rollback Safety Re-Analysis (PR Synchronize)
FindingsNo new rollback-unsafe categories introduced by the new commits. The existing M-3 finding from the prior analysis remains the only applicable category. Category: M-3 — REST API Contract Change Why it's unsafe: The
If N is deployed and then rolled back to N-1, the API reverts to the old name values. Any browser that cached N's Angular bundle (which reads Code that makes it unsafe:
Mitigating factors acknowledged:
The rollback risk is limited to the browser-cache window where a client holds N's JS bundle against N-1's API. Normal cache expiry resolves it without ops intervention. Despite these mitigations, the API contract change still matches M-3 per the reference document. Alternative: Maintain both old and new keys in All other categories were checked and are clear:
|
…esource The non-numeric stats path was using metric.getName() as the map key (e.g., "LAST_LOGIN") but buildMetricsByCategory looks them up by getQualifiedName() (e.g., "USERS_LAST_LOGIN"). Keys never matched so non-numeric metrics like LAST_LOGIN and LAST_CONTENT_EDITED were silently dropped from the dashboard. Fix by adding getNotNumericMetricValues() to MetricsSnapshot to expose the raw MetricValue collection with full Metric metadata, then computing the qualified key the same way as the numeric path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, test - DashboardMetricsProvider: release CreationalContext after resolving each bean reference to avoid leaking dependent objects during CDI resolution - DashboardMetricsProvider: track annotated count separately so the log message correctly shows "X annotated, Y match profile" instead of the same value twice - UsageResourceIntegrationTest: testGetSummary_withDefaultProfile now compares default vs explicit MINIMAL results instead of passing "MINIMAL" for both calls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Pull Request Unsafe to Rollback!!!
|
It is great that this was found, but I don't think in this case we need to worry about the backwards compatibility as long as the frontend is internally consistent. It is only the usage dashboard currently tied to this api. Note that other than the tests, the actual names/labels are read by the ui and used as is, so it is not actually referencing individual metrics at all, UI currently will handle the name change invisibly. |
yolabingo
left a comment
There was a problem hiding this comment.
I don't know what a "bean" is but, beyond that, LGTM
…er contract
The PR changed getStats() to filter by getQualifiedName() ({FEATURE}_{name}),
but MetricTimeoutTest still passed the raw name. No metrics were selected,
pg_sleep never ran, and the snapshot had no errors — causing both assertions to fail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
TotalContentsDatabaseMetricType.getName()from"COUNT_CONTENT"back to"COUNT"— restorescontentlets_countin downstream telemetry reportingMetricType.getQualifiedName()returning{FEATURE}_{NAME}for internal uniqueness (caching, map keys, i18n)getMappedMetricName()disambiguation hack inUsageResourceCONTENTLETS_COUNT,LANGUAGES_COUNT)Context
PR #34479 renamed the contentlets metric from
"COUNT"to"COUNT_CONTENT"to resolve a naming collision with the languages metric. This broke the downstream telemetry reporting layer — themetrics_table_viewin the telemetry database derives keys aslower(feature) || '_' || lower(name)and maps them into a hardcodedmetric_statPostgreSQL composite type. The rename changed the derived key fromcontentlets_counttocontentlets_count_content, which has no matching column and is silently dropped byjson_populate_record.Production impact:
contentlets_counthas been NULL for 145 of 153 cloud clients since ~Jan 7, 2026 (~10 weeks of missing data). This blocks team bonus metrics and customer health reporting.The fix introduces
getQualifiedName()as the internal unique identifier while keeping the wire format unchanged, so the external reporting layer continues to work without modification.Changes
MetricType.javadefault getQualifiedName()returningFEATURE_NAMETotalContentsDatabaseMetricType.javagetName()from"COUNT_CONTENT"→"COUNT"MetricStatsCollector.javagetQualifiedName()for cache key, name filter, logging, timingUsageResource.javagetMappedMetricName()hack; use qualified names for map keysDashboardMetricsProvider.javagetDashboardMetricByName()matches on qualified nameLanguage*.properties(9 files)UsageResourceIntegrationTest.javaTest plan
contentlets_countcolumn is populated inmetrics_table_viewafter deploymentUsageResourceIntegrationTest— all assertions updated for qualified names"name": "COUNT"with"feature": "CONTENTLETS"Fixes #35010
Closes #34042
Fixes #35002
🤖 Generated with Claude Code