Skip to content

Remove global name validation scheme #11848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

juliusmh
Copy link
Contributor

What this PR does

The global defining the name validation scheme was removed in favor of a per-tenant override.

  • Distributors have a validation middleware that ensures metric and label names are valid.
  • Rulers validate rule(group)s using this naming scheme.
  • Queriers use a wrapper Query engine that injects name validation into promql.QueryOpts. This ensures MQE and promql will use the correct naming scheme.

This PR uses the following forks:

Which issue(s) this PR fixes or relates to

Fixes #11503

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch 3 times, most recently from e47a7e5 to 78ba72c Compare June 24, 2025 18:33
@@ -255,7 +256,8 @@ func (r RuleNamespace) Validate(groupNodes []rulefmt.RuleGroupNode) []error {
func ValidateRuleGroup(g rwrulefmt.RuleGroup, node rulefmt.RuleGroupNode) []error {
var errs []error
for i, r := range g.Rules {
for _, err := range r.Validate(node.Rules[i]) {
// TODO(juliusmh):
for _, err := range r.Validate(node.Rules[i], validation.LegacyNamingScheme) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: How should we handle validation checks in mimirtool?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think about it after the initial review :D

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch from 78ba72c to 549cafd Compare June 24, 2025 22:05
@juliusmh juliusmh requested a review from aknuds1 June 25, 2025 08:33
@aknuds1 aknuds1 requested a review from Copilot June 25, 2025 16:29
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR eliminates the use of a global name validation scheme in favor of a per-tenant approach by propagating an explicit naming scheme parameter throughout validation, query, and rule evaluation functions. Key changes include:

  • Adding an explicit naming scheme parameter (e.g. validation.LegacyNamingScheme or validation.UTF8NamingScheme) to various functions and APIs.
  • Propagating the naming scheme into PromQL engine options, query operators, and label/metric validators.
  • Updating tests and related modules to support the new per-tenant validation scheme.

Reviewed Changes

Copilot reviewed 33 out of 88 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pkg/util/validation/limits.go Added a new NameValidationScheme method with a TODO for tenant-specific configuration.
pkg/streamingpromql/* Updated operator and query functions to pass an explicit naming scheme.
pkg/ruler/* Updated ValidateRuleGroup and manager functions to accept a userID and utilize the naming scheme.
pkg/querier/* and pkg/cardinality/* Modified request decoding and querier instantiation to inject the naming scheme.
go.mod Updated dependency versions and replacements as per upstream PRs.
Comments suppressed due to low confidence (2)

pkg/util/validation/limits.go:1377

  • The function currently returns a constant LegacyNamingScheme with a TODO to make it tenant configurable. Consider updating the function documentation with a plan or timeline for supporting per-tenant configuration.
func (o *Overrides) NameValidationScheme(_ string) validation.NamingScheme {

pkg/ruler/manager.go:444

  • The ValidateRuleGroup method now requires an explicit userID and uses the tenant's naming scheme. Ensure that all call sites in the rules API are updated accordingly and that the documentation reflects the new parameter requirement.
		for _, err := range r.Validate(node.Rules[i], namingScheme) {

Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, gave it a quick review. Nice work! Please check my questions :)

@@ -827,6 +827,8 @@ func (t *Mimir) initQueryFrontendTripperware() (serv services.Service, err error
panic(fmt.Sprintf("invalid config not caught by validation: unknown PromQL engine '%s'", t.Cfg.Querier.QueryEngine))
}

eng = streamingpromqlcompat.NameValidatingEngine(eng, t.Overrides)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering currently why we need to wrap the engine, can't e.g. a parameter be passed to promql.NewEngine?

Copy link
Contributor Author

@juliusmh juliusmh Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to go with a wrapper instead, mainly because it means less changes to the query path and it's easier to implement. This also drove the decision to add name validation scheme to promql.QueryOpts rather than promql.Engine.

At the moment queriers/frontends use a single engine to process all incoming queries. In Mimir, we use one engine in query frontends, another one is used by queriers. The alternative is to pass a QueryEngineFactory instead of a QueryEngine and either:

  • create a new engine for each request or
  • create legacy- and utf8-engines and choose the appropriate one

The question is: do we want validation scheme per-engine or per-query?

Copy link
Contributor

@charleskorn charleskorn Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is: do we want validation scheme per-engine or per-query?

If the goal is to make the scheme configurable per tenant, then making it per-query makes sense to me. Otherwise we'd need an engine instance per configuration scheme.

I decided to go with a wrapper instead, mainly because it means less changes to the query path and it's easier to implement. This also drove the decision to add name validation scheme to promql.QueryOpts rather than promql.Engine.

Could you elaborate? It seems like adding it to promql.QueryOpts would be a more natural fit rather than creating a wrapper like this (at least for Mimir, not sure about Prometheus).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charleskorn For context, I believe that we should make the metric/label name validation scheme configurable per-tenant. The actual configurability, IMO, should be done in a follow-up PR though, so we can focus this PR on just dropping the global variable usage.

I also think that for the most part, queries shouldn't enforce the legacy naming scheme. My reasoning is that if the user at one point has ingested metrics with the less restrictive UTF-8 naming scheme, and then flips the switch back to the "legacy" naming scheme, why should the query engine fail queries against the metrics ingested with the UTF-8 naming scheme? @juliusmh has made me aware that, on the other hand, there are certain parts of the query engine (label replace?) that need to use the same validation scheme as the ingest path. WDYT @charleskorn?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@juliusmh has made me aware that, on the other hand, there are certain parts of the query engine (label replace?) that need to use the same validation scheme as the ingest path.

This is correct. label_replace and label_join will need to use the same validation scheme.

I also think that for the most part, queries shouldn't enforce the legacy naming scheme. My reasoning is that if the user at one point has ingested metrics with the less restrictive UTF-8 naming scheme, and then flips the switch back to the "legacy" naming scheme, why should the query engine fail queries against the metrics ingested with the UTF-8 naming scheme

I'm not aware of anywhere apart from label_replace and label_join that will enforce label name validation, so I don't believe this is an issue - is there something I've missed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charleskorn I just want to make sure we don't introduce configurable metric/label name validation in the query engine except for where we need to (label_replace and label_join as I understand it). At most we should verify that names are valid UTF-8 (apart from the noted exceptions).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about count_values, it returns a series with the aggregation label?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good catch, you're correct.

@@ -197,6 +197,7 @@ func New(cfg Config, limits *validation.Overrides, distributor Distributor, quer
panic(fmt.Sprintf("invalid config not caught by validation: unknown PromQL engine '%s'", cfg.QueryEngine))
}

eng = compat.NameValidatingEngine(eng, limits)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wondering as above why this wrapping is necessary, but haven't looked deeply into it yet.

Comment on lines 83 to 85
if e.limits.NameValidationScheme(tenantID) == prom_validation.LegacyNamingScheme {
return prom_validation.LegacyNamingScheme, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it bad to enforce the legacy name validation scheme on the query path? All it will do is prevent the user from querying any metrics they ingested in the past with the UTF-8 name validation scheme?

Copy link
Contributor Author

@juliusmh juliusmh Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR tries to maintain the current behavior, and currently, we validate label names in label_join, label_replace and count_values with legacy scheme. Queries that don't use those functions support UTF8, today. This can be seen in the tests I did.

The question we have to answer: is there any scenario in which data returned by a querier or frontend is fed back to mimir without validation (e.g. to ingesters directly, or with label validation disabled)? If the answer is no, we can safely remove the validation. I'm assuming the answer is yes though, based on this comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charleskorn WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that data will always be validated when written back by rulers. They don't skip validation.

However, I believe we should validate names in at least label_replace and label_join, as not validating may lead to unexpected results:

  • functions that consume the results of label_replace and label_join may have undefined behaviour with invalid label names
  • a user testing a recording or alerting rule as a query may not realise that their rule will later fail to write samples when evaluated by rulers

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch 3 times, most recently from c1310a3 to 3888883 Compare June 26, 2025 16:22
@@ -262,10 +263,10 @@ func extractLabelNames(values url.Values) ([]model.LabelName, error) {

labelNames := make([]model.LabelName, 0, len(labelNamesParams))
for _, labelNameParam := range labelNamesParams {
labelName := model.LabelName(labelNameParam)
if !labelName.IsValid() {
if !validation.UTF8NamingScheme.IsValidLabelName(labelNameParam) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the naming scheme used here be configurable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering in this case whether this code should use a parameter (instead of hard coding), but not let it be user configurable since, as I explain above, I'm not sure we want to enforce the legacy naming scheme in queries. WDYT @charleskorn?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about this code path to know what makes sense here. Does Prometheus apply validation to the label names here as well?

Copy link
Contributor

@aknuds1 aknuds1 Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Prometheus apply validation to the label names here as well?

@juliusmh Did you check?

In any case, this is an example of read path isn't it, where we don't want to enforce the "legacy" naming scheme? If so, I think it could a non-configurable parameter/argument, just to avoid hard coding.

Copy link
Contributor Author

@juliusmh juliusmh Jul 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Afaik, the api/v1/cardinality/* endpoints are Mimir specific.

The closest is probably db stats in prometheus admin API (calculated here), where you can get some analysis on label name/value cardinality, which, as far as I can see, doesn't involve any validation. But I'm not super confident in that analysis.

@@ -827,6 +827,8 @@ func (t *Mimir) initQueryFrontendTripperware() (serv services.Service, err error
panic(fmt.Sprintf("invalid config not caught by validation: unknown PromQL engine '%s'", t.Cfg.Querier.QueryEngine))
}

eng = streamingpromqlcompat.NameValidatingEngine(eng, t.Overrides)
Copy link
Contributor

@charleskorn charleskorn Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is: do we want validation scheme per-engine or per-query?

If the goal is to make the scheme configurable per tenant, then making it per-query makes sense to me. Otherwise we'd need an engine instance per configuration scheme.

I decided to go with a wrapper instead, mainly because it means less changes to the query path and it's easier to implement. This also drove the decision to add name validation scheme to promql.QueryOpts rather than promql.Engine.

Could you elaborate? It seems like adding it to promql.QueryOpts would be a more natural fit rather than creating a wrapper like this (at least for Mimir, not sure about Prometheus).

@aknuds1 aknuds1 self-requested a review June 30, 2025 07:21
@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch 5 times, most recently from 756ef29 to 448dde2 Compare July 4, 2025 11:47
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is starting to look good. We should keep that test case though.

@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch from 448dde2 to 35aeae6 Compare July 7, 2025 09:03
@juliusmh juliusmh force-pushed the juliusmh/remove_global_name_validation branch from 35aeae6 to cc723b6 Compare July 7, 2025 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement classical name validation mode without disabling through global variable
3 participants