branch-4.1: [fix](metric) Preserve labels for histogram metrics to fix wrong metric name for prometheus (#63485) #63716

Open
seawinde wants to merge 1 commit into apache:branch-4.1 from seawinde:pr_63485_to_branch-4.1

@seawinde (Member)

pr: #63485
commitId: 01241a9

…ic name for prometheus (apache#63485)

Related PR: apache#13832

Problem Summary:
Histogram metrics that encode labels in the Dropwizard metric name can
be rendered incorrectly when label values contain dots or other special
characters. For example, a user value like `qing.lu@lbk.one` can be
split as part of the metric name instead of being kept as the `user`
label.

Root cause: In `PrometheusMetricVisitor.visitHistogram()` and
`JsonMetricVisitor.visitHistogram()`, the legacy histogram export path
splits the Dropwizard histogram name by dots and infers labels from
`k=v` name segments. `MetricRepo.USER_HISTO_QUERY_LATENCY` previously
embedded the user label into the histogram name, so dots inside user
names were treated as metric-name separators.
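To make the failure concrete, here is a minimal Java sketch of the dot-splitting strategy described above. It is a hypothetical reconstruction of the idea, not the actual code in `PrometheusMetricVisitor` or `JsonMetricVisitor`, and the name format `query.latency.ms.user=<user>` is an assumption for illustration that may not match `USER_HISTO_QUERY_LATENCY` exactly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reproduction of the legacy strategy: split a Dropwizard-style
// histogram name on '.' and treat every "k=v" segment as a label.
// This is NOT the Doris implementation, only an illustration of the bug.
final class LegacyHistogramNameParser {

    // Parsing result: the joined metric name and the inferred labels.
    static final class Parsed {
        final String metricName;
        final Map<String, String> labels;

        Parsed(String metricName, Map<String, String> labels) {
            this.metricName = metricName;
            this.labels = labels;
        }
    }

    static Parsed parse(String dropwizardName) {
        List<String> nameParts = new ArrayList<>();
        Map<String, String> labels = new LinkedHashMap<>();
        for (String segment : dropwizardName.split("\\.")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                // A "k=v" segment is inferred to be a label.
                labels.put(segment.substring(0, eq), segment.substring(eq + 1));
            } else {
                // Any other segment is assumed to belong to the metric name,
                // including fragments of a label value that contained dots.
                nameParts.add(segment);
            }
        }
        return new Parsed(String.join("_", nameParts), labels);
    }
}
```

With the assumed name, `parse("query.latency.ms.user=qing.lu@lbk.one")` produces the metric name `query_latency_ms_lu@lbk_one` and the label `user=qing`: the tail of the user name leaks into the metric name, which is the mis-split this PR removes.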

Change Summary:

| File | Change Description |
|------|--------------------|
| `fe/fe-core/src/main/java/org/apache/doris/metric/HistogramMetric.java` | Add a small wrapper that keeps the Dropwizard `Histogram` together with its metric name and structured labels. |
| `fe/fe-core/src/main/java/org/apache/doris/metric/MetricRepo.java` | Export labeled histogram metrics through the new structured label path while still exporting the existing static histograms. |
| `fe/fe-core/src/main/java/org/apache/doris/metric/CloudMetrics.java` | Keep cloud query latency and meta-service RPC latency labels structured instead of encoding them in histogram names. |
| `fe/fe-core/src/main/java/org/apache/doris/metric/MetricVisitor.java` | Add a labeled histogram visit overload. |
| `fe/fe-core/src/main/java/org/apache/doris/metric/PrometheusMetricVisitor.java` | Render structured histogram labels directly in Prometheus output. |
| `fe/fe-core/src/main/java/org/apache/doris/metric/JsonMetricVisitor.java` | Render structured histogram labels directly in JSON output. |
| `fe/fe-core/src/test/java/org/apache/doris/metric/MetricsTest.java` | Cover user and cloud histogram labels containing dots and at signs (`@`). |
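The wrapper idea from the first table row can be sketched as below. This is an illustrative shape only: the class names `Label` and `LabeledHistogram` are hypothetical stand-ins (Doris has its own `MetricLabel` and `HistogramMetric`), and the histogram is a type parameter `H` so the sketch compiles without Dropwizard on the classpath; in the real code it would be the Dropwizard `Histogram`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a key/value metric dimension.
final class Label {
    final String key;
    final String value;

    Label(String key, String value) {
        this.key = Objects.requireNonNull(key);
        this.value = Objects.requireNonNull(value);
    }
}

// Hypothetical sketch: a histogram kept together with a stable metric name and
// structured labels, so label values never have to be encoded in the name.
final class LabeledHistogram<H> {
    private final String name;
    private final H histogram;
    private final List<Label> labels;

    LabeledHistogram(String name, H histogram, List<Label> labels) {
        this.name = Objects.requireNonNull(name);
        this.histogram = Objects.requireNonNull(histogram);
        // Defensive, read-only copy: labels identify the series and should not change.
        this.labels = Collections.unmodifiableList(new ArrayList<>(labels));
    }

    String getName() {
        return name;
    }

    H getHistogram() {
        return histogram;
    }

    List<Label> getLabels() {
        return labels;
    }
}
```

Because the user name lives only in a `Label` value, a value such as `qing.lu@lbk.one` reaches the exporter unchanged and the metric name stays fixed.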

Design Rationale:
The fix follows the existing counter and gauge approach: keep metric
names stable and carry dimensions as `MetricLabel` values. The legacy
three-argument histogram visitor remains for existing static Dropwizard
histograms without labels, while dynamic histograms that need labels use
the structured four-argument path.
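A rough sketch of the two visit paths follows. The method names and parameter lists are illustrative and do not reproduce the actual `MetricVisitor` signatures; `Snapshot` is a hypothetical stand-in for Dropwizard's histogram snapshot. The label-value escaping (backslash, double quote, newline) follows the Prometheus text exposition format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a legacy (unlabeled) and a labeled histogram visit path.
final class PromHistogramRenderer {

    // Minimal snapshot stand-in: quantile -> value, plus a sample count.
    static final class Snapshot {
        final Map<String, Double> quantiles;
        final long count;

        Snapshot(Map<String, Double> quantiles, long count) {
            this.quantiles = new LinkedHashMap<>(quantiles);
            this.count = count;
        }
    }

    // Legacy-style path: a static histogram with no labels.
    static String visitHistogram(String prefix, String name, Snapshot s) {
        return visitHistogram(prefix, name, s, new LinkedHashMap<>());
    }

    // Labeled path: the metric name is fixed and dimensions are emitted as label
    // pairs, so dots in label values never affect the metric name.
    static String visitHistogram(String prefix, String name, Snapshot s, Map<String, String> labels) {
        String fullName = prefix + "_" + name;
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Double> q : s.quantiles.entrySet()) {
            Map<String, String> withQuantile = new LinkedHashMap<>(labels);
            withQuantile.put("quantile", q.getKey());
            sb.append(fullName).append(renderLabels(withQuantile))
              .append(' ').append(q.getValue()).append('\n');
        }
        sb.append(fullName).append("_count").append(renderLabels(labels))
          .append(' ').append(s.count).append('\n');
        return sb.toString();
    }

    // Renders {k="v",...}, or an empty string when there are no labels.
    static String renderLabels(Map<String, String> labels) {
        if (labels.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : labels.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(e.getKey()).append("=\"").append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    // Label-value escaping from the Prometheus text format: \ -> \\, " -> \", newline -> \n.
    static String escape(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```

With an assumed prefix `doris_fe`, a user label `qing.lu@lbk.one`, and a single `0.99` quantile of `12.0`, the labeled path emits `doris_fe_query_latency_ms{user="qing.lu@lbk.one",quantile="0.99"} 12.0`, so the dots stay inside the quoted label value.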

```mermaid
graph TD
    A["Query finished"] --> B["Update query metrics"]
    B --> C["Get or create per-user histogram"]
    C --> D["HistogramMetric stores name, histogram, and labels"]
    D --> E["MetricRepo visits histogram metrics"]
    E --> F["Prometheus or JSON visitor renders output"]
    F --> G["Stable metric name with structured labels"]
```
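The "get or create per-user histogram" step in the diagram can be sketched with `ConcurrentHashMap.computeIfAbsent`, keyed by user. This is a hypothetical illustration rather than the `MetricRepo` code: `PerUserLatencyHistograms`, its `Entry` type (a count/sum stand-in for a Dropwizard `Histogram`), and the metric name `query_latency_ms` are all assumed for the example.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of lazily creating one labeled histogram per user.
final class PerUserLatencyHistograms {
    static final String METRIC_NAME = "query_latency_ms"; // illustrative name

    // Minimal stand-in for a histogram: tracks only sample count and sum.
    static final class Entry {
        final String name;
        final Map<String, String> labels;
        final LongAdder count = new LongAdder();
        final LongAdder sumMs = new LongAdder();

        Entry(String name, Map<String, String> labels) {
            this.name = name;
            this.labels = labels;
        }

        void update(long latencyMs) {
            count.increment();
            sumMs.add(latencyMs);
        }
    }

    private final ConcurrentHashMap<String, Entry> byUser = new ConcurrentHashMap<>();

    // Called when a query finishes: the user goes into a label, never into the name.
    void onQueryFinished(String user, long latencyMs) {
        byUser.computeIfAbsent(user,
                u -> new Entry(METRIC_NAME, Collections.singletonMap("user", u)))
              .update(latencyMs);
    }

    // What an exporter (Prometheus or JSON visitor) would iterate over.
    Collection<Entry> entries() {
        return Collections.unmodifiableCollection(byUser.values());
    }
}
```

`computeIfAbsent` creates at most one entry per user even under concurrent query completion, and every entry shares the same metric name, which is what lets the visitor emit a stable name with structured labels.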

Fixed an issue where FE histogram metrics with label values containing
dots, such as user names or cloud cluster names, could be exported with
malformed metric names or labels.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@seawinde seawinde requested a review from yiguolei as a code owner May 27, 2026 03:52
@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@seawinde (Member, Author)

run buildall

@hello-stephen (Contributor)

FE Regression Coverage Report

Increment line coverage 71.43% (55/77) 🎉
Increment coverage report
Complete coverage report
