-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: use UTF-8 to solve Chineses bug #3792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi, @mengnankkkk thank you very much for your revisions to this issue. I believe the minimal change required should only address the tag value. After reviewing the officialdocumentation, it appears that only For example, the adopted solution: Fast ASCII validation + UTF-8 fallback mechanism |
Thank you for your suggestion, I will make changes according to your suggestion. |
- Replace strict UTF-8 validation with performance-optimized approach - ASCII characters (0-127) use fast path without UTF-8 conversion overhead - Non-ASCII characters use UTF-8 fallback mechanism for proper validation - Support Chinese and other Unicode characters in Prometheus label values - Add comprehensive UTF-8 multi-byte character parsing in parseLabelValue - Add test case for Chinese label values validation - Maintain full backward compatibility with existing functionality Resolves issue with Chinese characters in Prometheus metrics label values Performance improvement: zero overhead for ASCII-only label values
cbe1247 to
1fe1830
Compare
|
Yes, I believe using UTF-8 can fully meet the requirement of supporting all languages worldwide for label values! Moreover, I suggest that this setting should not be limited only to label values; in this way, non-English content can be used in many other scenarios, achieving full universality and getting it right in one go. |
The Prometheus specification explicitly stipulates that metric names and label names must only use A more reasonable approach is to maintain standardization while implementing Chinese-friendly features through |
tomsun28
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
Sorry, I missed this. Do we need to discuss this PR again? |
Hi @tomsun28, yes, i believe there are still some issues in this PR that require priority attention, such as:
I've addressed them in #3810. Please review. |
What's changed?
Incorrect UTF-8 validation logic: The validation in OnlineParser.java: lines 419-421 incorrectly identifies valid Chinese strings as invalid.
Byte stream processing issue: The original parser reads the InputStream byte by byte and cannot correctly handle UTF-8 multi-byte characters (Chinese characters occupy 3 bytes).
Close #3791
Change the byte-by-byte parsing based on InputStream to UTF-8 string parsing
add test for this change
Checklist
Add or update API