Skip to content

Parameterize metric/label name validation scheme #11848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 25, 2025

Conversation

juliusmh
Copy link
Contributor

@juliusmh juliusmh commented Jun 24, 2025

What this PR does

Direct and indirect references to the global name validation scheme were removed in favor of a per-tenant override.

  • Distributors have a validation middleware that ensures metric and label names are valid.
  • Rulers validate rule(group)s using this naming scheme.
  • Queriers use UTF8 validation everywhere.

TODO:

  • Reach consensus on whether streamingpromqlcompat.NameValidatingEngine is the right approach: Decided to do UTF8 validation in query path.
  • Get Alertmanager updated with fix, stop using our own fork

Which issue(s) this PR fixes or relates to

Depends on

Fixes: #11503

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch 3 times, most recently from e47a7e5 to 78ba72c Compare June 24, 2025 18:33
@@ -255,7 +256,8 @@ func (r RuleNamespace) Validate(groupNodes []rulefmt.RuleGroupNode) []error {
func ValidateRuleGroup(g rwrulefmt.RuleGroup, node rulefmt.RuleGroupNode) []error {
var errs []error
for i, r := range g.Rules {
for _, err := range r.Validate(node.Rules[i]) {
// TODO(juliusmh):
for _, err := range r.Validate(node.Rules[i], validation.LegacyNamingScheme) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: How should we handle validation checks in mimirtool?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think about it after the initial review :D

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch from 78ba72c to 549cafd Compare June 24, 2025 22:05
@juliusmh juliusmh requested a review from aknuds1 June 25, 2025 08:33
@aknuds1 aknuds1 requested a review from Copilot June 25, 2025 16:29
Copilot

This comment was marked as outdated.

Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, gave it a quick review. Nice work! Please check my questions :)

Comment on lines 83 to 85
if e.limits.NameValidationScheme(tenantID) == prom_validation.LegacyNamingScheme {
return prom_validation.LegacyNamingScheme, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it bad to enforce the legacy name validation scheme on the query path? All it will do is prevent the user from querying any metrics they ingested in the past with the UTF-8 name validation scheme?

Copy link
Contributor Author

@juliusmh juliusmh Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR tries to maintain the current behavior, and currently, we validate label names in label_join, label_replace and count_values with legacy scheme. Queries that don't use those functions support UTF8, today. This can be seen in the tests I did.

The question we have to answer: is there any scenario in which data returned by a querier or frontend is fed back to mimir without validation (e.g. to ingesters directly, or with label validation disabled)? If the answer is no, we can safely remove the validation. I'm assuming the answer is yes though, based on this comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charleskorn WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that data will always be validated when written back by rulers. They don't skip validation.

However, I believe we should validate names in at least label_replace and label_join, as not validating may lead to unexpected results:

  • functions that consume the results of label_replace and label_join may have undefined behaviour with invalid label names
  • a user testing a recording or alerting rule as a query may not realise that their rule will later fail to write samples when evaluated by rulers

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch 3 times, most recently from c1310a3 to 3888883 Compare June 26, 2025 16:22
@@ -262,10 +263,10 @@ func extractLabelNames(values url.Values) ([]model.LabelName, error) {

labelNames := make([]model.LabelName, 0, len(labelNamesParams))
for _, labelNameParam := range labelNamesParams {
labelName := model.LabelName(labelNameParam)
if !labelName.IsValid() {
if !validation.UTF8NamingScheme.IsValidLabelName(labelNameParam) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the naming scheme used here be configurable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering in this case whether this code should use a parameter (instead of hard coding), but not let it be user configurable since, as I explain above, I'm not sure we want to enforce the legacy naming scheme in queries. WDYT @charleskorn?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about this code path to know what makes sense here. Does Prometheus apply validation to the label names here as well?

Copy link
Contributor

@aknuds1 aknuds1 Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Prometheus apply validation to the label names here as well?

@juliusmh Did you check?

In any case, this is an example of read path isn't it, where we don't want to enforce the "legacy" naming scheme? If so, I think it could a non-configurable parameter/argument, just to avoid hard coding.

Copy link
Contributor Author

@juliusmh juliusmh Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik, the api/v1/cardinality/* endpoints are Mimir specific.

The closest is probably db stats in prometheus admin API (calculated here), where you can get some analysis on label name/value cardinality, which, as far as I can see, doesn't involve any validation. But I'm not super confident in that analysis.

@aknuds1 aknuds1 self-requested a review June 30, 2025 07:21
@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch 5 times, most recently from 756ef29 to 448dde2 Compare July 4, 2025 11:47
aknuds1
aknuds1 previously requested changes Jul 4, 2025
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is starting to look good. We should keep that test case though.

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch 2 times, most recently from 35aeae6 to cc723b6 Compare July 7, 2025 09:17
@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch from cc723b6 to 936bd5c Compare July 21, 2025 10:00
@juliusmh juliusmh changed the base branch from main to r352 July 21, 2025 10:02
@juliusmh
Copy link
Contributor Author

juliusmh commented Jul 21, 2025

As discussed @aknuds1 :

@charleskorn :

Could you elaborate? It seems like adding it to promql.QueryOpts would be a more natural fit rather than creating a wrapper like this (at least for Mimir, not sure about Prometheus).

For promql, the main problem is that query opts are directly constructed and passed to query engine in prometheus' v1.NewApi. Unless we want to rewrite that handler we need to somehow "modify" promql.QueryOptions, IMO the QueryEngine interface is the only viable place.

You're right, for MQE this can be implemented differently, see this commit. The wrapper, is/was a way of unifying this logic for promql/mimir query engine paths.

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch from 936bd5c to 0522664 Compare July 21, 2025 10:41
@juliusmh juliusmh changed the base branch from r352 to main July 21, 2025 11:50
@dimitarvdimitrov
Copy link
Contributor

oh, also the mimirtool changes i think need a changelog entry because they're a change in behaviour

@dimitarvdimitrov I think there may have been a misunderstanding here? We are postponing mimirtool (and other) behavioral changes until a follow-up PR.

you're right 👍

Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitarvdimitrov requests a test case for an invalid label name in the cardinality API. Could you add that @juliusmh?

Also, could you add a CHANGE type changelog entry about how the read path now supports all valid UTF-8 strings at least for the label value cardinality API (/api/v1/cardinality/label_values)? Please see Slack discussion regarding the latter.

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch from 6fc01ca to b83fffe Compare July 23, 2025 16:27
@juliusmh
Copy link
Contributor Author

Given this was a bit chaotic, quick summary of what was addressed:

  • refactored distributor's validation tests
  • added more tests to MetricsQueryRequestValidationRoundTripper

and what is still open:

  • Here, I don't see which case is missing.
  • Here, should I add labelMatcher/selector validation to LabelsQueryRequestValidationRoundTripper? We didn't do that previously.
  • Here, Move this relabel config modification to a better place. Do you have any ideas where that could be?
  • I added some Changelog messages, but needs refining

@dimitarvdimitrov
Copy link
Contributor

thanks @juliusmh . i think only the last thread still needs resolving. i left a suggestion

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch 2 times, most recently from 72ecc48 to 25e0428 Compare July 24, 2025 12:58
CHANGELOG.md Outdated
Comment on lines 7 to 8
* [CHANGE] Query-frontend: Add support for utf8 label/metric names in `/api/v1/cardinality/{label_values|label_values|active_series}` endpoints. #11848.
* [CHANGE] Querier: Add support for utf8 metric names, support utf8 label names in `label_join`, `label_replace` and `count_values` PromQL functions. #11848.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [CHANGE] Query-frontend: Add support for utf8 label/metric names in `/api/v1/cardinality/{label_values|label_values|active_series}` endpoints. #11848.
* [CHANGE] Querier: Add support for utf8 metric names, support utf8 label names in `label_join`, `label_replace` and `count_values` PromQL functions. #11848.
* [CHANGE] Query-frontend: Add support for UTF-8 metric and label names in `/api/v1/cardinality/{label_values|label_values|active_series}` endpoints. #11848.
* [CHANGE] Querier: Add support for UTF-8 metric and label names in `label_join`, `label_replace`, and `count_values` PromQL functions. #11848.

Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what looks like an accidental test change. Also please review my suggestions.

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch from 25e0428 to 5cc2c4b Compare July 25, 2025 09:49
@juliusmh juliusmh changed the title Remove global name validation scheme Parameterize metric/label name validation scheme Jul 25, 2025
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to update mimir-prometheus.

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch from 5cc2c4b to 65bc6f8 Compare July 25, 2025 12:44
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@juliusmh juliusmh merged commit c4dc705 into main Jul 25, 2025
35 checks passed
@juliusmh juliusmh deleted the juliusmh/remove_global_name_validation branch July 25, 2025 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement classical name validation mode without disabling through global variable
5 participants