fix: use UTF-8 to solve Chineses bug #3792

mengnankkkk · 2025-09-28T04:32:47Z

What's changed?

Incorrect UTF-8 validation logic: the validation in OnlineParser.java (lines 419-421) incorrectly rejects valid Chinese strings.
Byte stream processing issue: the original parser reads the InputStream one byte at a time, so it cannot correctly handle UTF-8 multi-byte characters (a Chinese character typically occupies 3 bytes).
Closes #3791

Changed the InputStream-based byte-by-byte parsing to UTF-8 string parsing.
Added a test for this change.
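
To illustrate the byte stream issue described above, here is a minimal sketch. It is not the code from OnlineParser.java; the class and method names are hypothetical, and a byte array stands in for the bytes the parser would read one at a time from the InputStream. It shows that treating each byte as one character garbles multi-byte UTF-8 text, while decoding the collected bytes as UTF-8 recovers it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of HertzBeat.
final class ByteWiseDecodingDemo {

    // Mimics the problematic pattern: every byte becomes one character.
    static String decodeByteByByte(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append((char) (b & 0xFF)); // wrong for multi-byte UTF-8 sequences
        }
        return sb.toString();
    }

    // Decodes the whole byte sequence as UTF-8 in one step.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character string for "Chinese (language)".
        byte[] raw = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(raw.length);                    // 6 (3 bytes per character)
        System.out.println(decodeByteByByte(raw).length()); // 6 garbled characters
        System.out.println(decodeAsUtf8(raw).length());     // 2 correct characters
    }
}
```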

Checklist

I have read the Contributing Guide
I have written the necessary doc or comment.
I have added the necessary unit tests and all cases have passed.

Add or update API

I have added the necessary e2e tests and all cases have passed.

Duansg · 2025-09-28T14:01:19Z

Hi @mengnankkkk, thank you very much for your work on this issue. I believe the minimal change required should only address the label value.

After reviewing the official documentation, it appears that only the label value currently exhibits this issue. I believe we can apply UTF-8 parsing only when necessary, to avoid unnecessary encoding conversion overhead.

For example, a possible solution: fast ASCII validation + UTF-8 fallback mechanism
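
One way the "fast ASCII validation + UTF-8 fallback" idea could look is sketched below. This is an illustration under assumptions, not the code adopted in the PR: the class `LabelValueDecoder` and its methods are hypothetical. ASCII-only input skips the strict decoder entirely, and anything else goes through a `CharsetDecoder` configured to report malformed input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of HertzBeat.
final class LabelValueDecoder {

    // Returns the decoded label value, or null if the bytes are not valid UTF-8.
    static String decode(byte[] raw) {
        if (isAscii(raw)) {
            // Fast path: ASCII bytes map one-to-one to chars.
            return new String(raw, StandardCharsets.US_ASCII);
        }
        // Fallback: strict UTF-8 decoding that reports errors instead of replacing them.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // A Java byte is signed, so any byte with the high bit set is negative.
    static boolean isAscii(byte[] raw) {
        for (byte b : raw) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }
}
```

With this shape, ASCII-only label values pay only for a single scan over the bytes, and the decoder is created only when a non-ASCII byte is present.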

mengnankkkk · 2025-09-28T14:11:57Z

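Hello, thank you very much for your revisions to this issue. I believe the minimal change required should only involve the label value.

After consulting the official documentation, it appears that only the label value currently has this issue. I believe we can use UTF-8 parsing when necessary to avoid unnecessary encoding conversion overhead.

For example, the adopted solution: fast ASCII validation + UTF-8 fallback mechanism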

Thank you for your suggestion, I will make changes according to your suggestion.

- Replace strict UTF-8 validation with performance-optimized approach
- ASCII characters (0-127) use fast path without UTF-8 conversion overhead
- Non-ASCII characters use UTF-8 fallback mechanism for proper validation
- Support Chinese and other Unicode characters in Prometheus label values
- Add comprehensive UTF-8 multi-byte character parsing in parseLabelValue
- Add test case for Chinese label values validation
- Maintain full backward compatibility with existing functionality

Resolves issue with Chinese characters in Prometheus metrics label values

Performance improvement: zero overhead for ASCII-only label values
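
As a rough illustration of multi-byte-safe label value parsing, the sketch below reads a quoted value from a stream, collects the raw bytes, and decodes them as UTF-8 only once the closing quote is found. It is not the actual parseLabelValue from the PR; `QuotedLabelValueReader` is a hypothetical name, the opening quote is assumed to be already consumed, and only the `\\`, `\"` and `\n` escapes are handled. Scanning for the quote at byte level is safe because UTF-8 continuation bytes always fall in the range 0x80-0xBF and so can never equal `"` or `\`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of HertzBeat.
final class QuotedLabelValueReader {

    // Reads up to the closing quote and returns the value decoded as UTF-8.
    // Invalid UTF-8 is replaced with U+FFFD by the String constructor.
    static String read(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '"') {
                    return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
                }
                if (b == '\\') {
                    int next = in.read();
                    if (next == -1) {
                        break; // dangling escape at end of input
                    }
                    // "\n" becomes a newline; other escaped bytes are kept as-is.
                    buffer.write(next == 'n' ? '\n' : next);
                    continue;
                }
                buffer.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        throw new IllegalArgumentException("unterminated label value");
    }
}
```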

mengxin523 · 2025-09-29T00:54:09Z

Yes, I believe using UTF-8 can fully meet the requirement of supporting all languages worldwide for label values! Moreover, I suggest that this setting should not be limited only to label values; in this way, non-English content can be used in many other scenarios, achieving full universality and getting it right in one go.

Duansg · 2025-09-29T14:09:53Z

The Prometheus specification explicitly stipulates that metric names and label names must only use [a-zA-Z0-9_:], so it does not support Chinese characters there. If we were to support Chinese metric names in HertzBeat, exported data would become unrecognizable by ecosystem tools like Prometheus/Grafana, thereby breaking compatibility.

A more reasonable approach is to maintain standardization while implementing Chinese-friendly features through label values, metadata mapping, or the frontend presentation layer. This approach adheres to the Prometheus specification while also meeting users' need for Chinese readability.
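
The naming restriction can be sketched as a pair of regular-expression checks. This is an assumption-laden illustration: the class `PrometheusNames` is hypothetical, and the patterns follow the classic Prometheus data model as I understand it, where metric names match `[a-zA-Z_:][a-zA-Z0-9_:]*` and label names match `[a-zA-Z_][a-zA-Z0-9_]*` (no colon, and no leading digit in either). Label values are not checked here because they may contain any Unicode characters.

```java
import java.util.regex.Pattern;

// Hypothetical helper; not part of HertzBeat.
final class PrometheusNames {

    // Assumed patterns from the classic Prometheus data model.
    private static final Pattern METRIC_NAME = Pattern.compile("[a-zA-Z_:][a-zA-Z0-9_:]*");
    private static final Pattern LABEL_NAME = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    static boolean isValidMetricName(String name) {
        return METRIC_NAME.matcher(name).matches();
    }

    static boolean isValidLabelName(String name) {
        return LABEL_NAME.matcher(name).matches();
    }
}
```

Under these patterns, a name such as `http_requests_total` passes, while a metric or label name written in Chinese characters is rejected, which is why the Chinese text belongs in the label value instead.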

tomsun28

LGTM! 👍

tomsun28 · 2025-10-08T11:48:58Z

Sorry, I missed this. Do we need to discuss this PR again?

Duansg · 2025-10-09T13:00:14Z

Hi @tomsun28, yes, I believe there are still some issues in this PR that require priority attention, such as:

Excessive object creation and unnecessary boxing/unboxing.
Incorrect exception fallback strategy.
Redundant inspections and existing performance issues.

I've addressed them in #3810. Please review.

Duansg mentioned this pull request Sep 28, 2025

Regarding Prometheus monitoring, it seems that it does not support Chinese characters as label values. #3791

Closed

mengnankkkk force-pushed the feat-mengnankkbug branch from cbe1247 to 1fe1830 Compare September 28, 2025 14:35

mengnankkkk and others added 2 commits October 4, 2025 23:11

Merge branch 'master' into feat-mengnankkbug

441c029

Merge branch 'master' into feat-mengnankkbug

e3fbad5

tomsun28 added the good first pull request Good for newcomers label Oct 5, 2025

github-project-automation bot added this to Apache HertzBeat Oct 5, 2025

github-project-automation bot moved this to To do in Apache HertzBeat Oct 5, 2025

tomsun28 added the bugfix label Oct 5, 2025

tomsun28 added this to the 1.8.0 milestone Oct 5, 2025

tomsun28 approved these changes Oct 5, 2025

tomsun28 merged commit 32d784e into apache:master Oct 5, 2025
3 checks passed

github-project-automation bot moved this from To do to Done in Apache HertzBeat Oct 5, 2025

mengnankkkk deleted the feat-mengnankkbug branch October 5, 2025 01:18

Duansg mentioned this pull request Oct 9, 2025

[refactor]Refactoring prometheus label value utf8 support #3810

Merged

4 tasks
