[fix](metric) Preserve labels for histogram metrics to fix wrong metric name for prometheus#63485
[fix](metric) Preserve labels for histogram metrics to fix wrong metric name for prometheus#63485seawinde wants to merge 3 commits into
Conversation
### What problem does this PR solve? Issue Number: N/A Related PR: apache#13832 Problem Summary: Histogram metrics that encode labels in the Dropwizard metric name can be rendered incorrectly when label values contain dots or other special characters. For example, a user value like qing.lu@lbk.one can be split as part of the metric name instead of being kept as the user label. Root cause: In PrometheusMetricVisitor.visitHistogram() and JsonMetricVisitor.visitHistogram(), the legacy histogram export path splits the Dropwizard histogram name by dots and infers labels from k=v name segments. MetricRepo.USER_HISTO_QUERY_LATENCY previously embedded the user label into the histogram name, so dots inside user names were treated as metric-name separators. Change Summary: | File | Change Description | |------|--------------------| | fe/fe-core/src/main/java/org/apache/doris/metric/HistogramMetric.java | Add a small wrapper that keeps the Dropwizard Histogram together with its metric name and structured labels. | | fe/fe-core/src/main/java/org/apache/doris/metric/MetricRepo.java | Export labeled histogram metrics through the new structured label path while keeping existing static histograms exported. | | fe/fe-core/src/main/java/org/apache/doris/metric/CloudMetrics.java | Keep cloud query latency and meta-service RPC latency labels structured instead of encoding them in histogram names. | | fe/fe-core/src/main/java/org/apache/doris/metric/MetricVisitor.java | Add a labeled histogram visit overload. | | fe/fe-core/src/main/java/org/apache/doris/metric/PrometheusMetricVisitor.java | Render structured histogram labels directly in Prometheus output. | | fe/fe-core/src/main/java/org/apache/doris/metric/JsonMetricVisitor.java | Render structured histogram labels directly in JSON output. | | fe/fe-core/src/test/java/org/apache/doris/metric/MetricsTest.java | Cover user and cloud histogram labels containing dots and at signs. | Design Rationale: The fix follows the existing counter and gauge approach: keep metric names stable and carry dimensions as MetricLabel values. The legacy three-argument histogram visitor remains for existing static Dropwizard histograms without labels, while dynamic histograms that need labels use the structured four-argument path. ```mermaid graph TD A[Query finished] --> B[AuditLogHelper.updateMetricsImpl] B --> C[USER_HISTO_QUERY_LATENCY.getOrAdd(user)] C --> D[HistogramMetric name + Histogram + labels] D --> E[MetricRepo.visitHistograms] E --> F[PrometheusMetricVisitor / JsonMetricVisitor] F --> G[Stable metric name with structured labels] ``` ### Release note Fixed an issue where FE histogram metrics with label values containing dots, such as user names or cloud cluster names, could be exported with malformed metric names or labels. ### Check List (For Author) - Test: Unit Test - Maven package for fe-core passed with tests skipped to verify compile and checkstyle. - org.apache.doris.metric.MetricsTest passed manually, OK (12 tests). - ./run-fe-ut.sh --run org.apache.doris.metric.MetricsTest was attempted but blocked by an unrelated testCompile classpath issue where CloudProcVersionDisplayTest imports mockit. - Behavior changed: No. This restores intended metric label rendering. - Does this need documentation: No. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 31667 ms |
TPC-DS: Total hot run time: 170399 ms |
FE Regression Coverage ReportIncrement line coverage |
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Avoid silently routing labeled histogram visitors through the legacy histogram name parser. Keep SimpleCoreMetricVisitor behavior explicit and document why meta-service RPC latency is not removed by cluster.
### Release note
None
### Check List (For Author)
- Test: Manual test
- Ran `mvn -pl fe-core -am -DskipTests -Dskip.java-formatter -DskipRat -Dspotless.check.skip=true compile` from `fe/` successfully.
- Tried `mvn -pl fe-core -am -Dtest=org.apache.doris.metric.MetricsTest -DfailIfNoTests=false -Dskip.java-formatter -DskipRat -Dspotless.check.skip=true test`, but it failed in testCompile because `CloudProcVersionDisplayTest` could not resolve `mockit`, before running the target test.
- Behavior changed: No
- Does this need documentation: No
|
run buildall |
TPC-H: Total hot run time: 31047 ms |
TPC-DS: Total hot run time: 168953 ms |
FE Regression Coverage ReportIncrement line coverage |
| import java.util.Collections; | ||
| import java.util.List; | ||
|
|
||
| public class HistogramMetric { |
There was a problem hiding this comment.
maybe extending from com.codahale.metrics.Histogram is better way
### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#13832 Problem Summary: Dynamic histogram metrics with structured labels were exported by manually enumerating each AutoMappedMetric in MetricRepo.visitHistograms(). This made future histogram metrics easy to miss and kept the registration model different from regular Doris metrics. Register labeled histogram metric groups in DorisMetricRegistry with stable group keys, preserve the existing legacy Dropwizard histogram export path, and keep cloud-only histogram export guarded by Config.isCloudMode. Re-registering the same histogram group now replaces the old group instead of exporting duplicates. ### Release note None ### Check List (For Author) - Test: Unit Test - mvn -pl fe-core -am -DskipTests -Dskip.java-formatter -DskipRat -Dspotless.check.skip=true compile - ./run-fe-ut.sh --run org.apache.doris.metric.MetricsTest did not reach test execution because existing CloudProcVersionDisplayTest testCompile fails due to missing mockit dependency - Behavior changed: No - Does this need documentation: No
|
run buildall |
TPC-H: Total hot run time: 31584 ms |
TPC-DS: Total hot run time: 169462 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
What problem does this PR solve?
Issue Number: N/A
Related PR: #13832
Problem Summary:
Histogram metrics that encode labels in the Dropwizard metric name can be rendered incorrectly when label values contain dots or other special characters. For example, a user value like
qing.lu@lbk.onecan be split as part of the metric name instead of being kept as theuserlabel.Root cause: In
PrometheusMetricVisitor.visitHistogram()andJsonMetricVisitor.visitHistogram(), the legacy histogram export path splits the Dropwizard histogram name by dots and infers labels fromk=vname segments.MetricRepo.USER_HISTO_QUERY_LATENCYpreviously embedded the user label into the histogram name, so dots inside user names were treated as metric-name separators.Change Summary:
fe/fe-core/src/main/java/org/apache/doris/metric/HistogramMetric.javaHistogramtogether with its metric name and structured labels.fe/fe-core/src/main/java/org/apache/doris/metric/MetricRepo.javafe/fe-core/src/main/java/org/apache/doris/metric/CloudMetrics.javafe/fe-core/src/main/java/org/apache/doris/metric/MetricVisitor.javafe/fe-core/src/main/java/org/apache/doris/metric/PrometheusMetricVisitor.javafe/fe-core/src/main/java/org/apache/doris/metric/JsonMetricVisitor.javafe/fe-core/src/test/java/org/apache/doris/metric/MetricsTest.javaDesign Rationale:
The fix follows the existing counter and gauge approach: keep metric names stable and carry dimensions as
MetricLabelvalues. The legacy three-argument histogram visitor remains for existing static Dropwizard histograms without labels, while dynamic histograms that need labels use the structured four-argument path.graph TD A["Query finished"] --> B["Update query metrics"] B --> C["Get or create per-user histogram"] C --> D["HistogramMetric stores name, histogram, and labels"] D --> E["MetricRepo visits histogram metrics"] E --> F["Prometheus or JSON visitor renders output"] F --> G["Stable metric name with structured labels"]Release note
Fixed an issue where FE histogram metrics with label values containing dots, such as user names or cloud cluster names, could be exported with malformed metric names or labels.
Check List (For Author)
Unit tests:
Maven package for
fe-corepassed with tests skipped to verify compile and checkstyle.org.apache.doris.metric.MetricsTestpassed manually,OK (12 tests)../run-fe-ut.sh --run org.apache.doris.metric.MetricsTestwas attempted but blocked by an unrelated testCompile classpath issue whereCloudProcVersionDisplayTestimportsmockit.Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)