-
Notifications
You must be signed in to change notification settings - Fork 613
Remove global name validation scheme #11848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e47a7e5
to
78ba72c
Compare
pkg/mimirtool/rules/rules.go
Outdated
@@ -255,7 +256,8 @@ func (r RuleNamespace) Validate(groupNodes []rulefmt.RuleGroupNode) []error { | |||
func ValidateRuleGroup(g rwrulefmt.RuleGroup, node rulefmt.RuleGroupNode) []error { | |||
var errs []error | |||
for i, r := range g.Rules { | |||
for _, err := range r.Validate(node.Rules[i]) { | |||
// TODO(juliusmh): | |||
for _, err := range r.Validate(node.Rules[i], validation.LegacyNamingScheme) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: How should we handle validation checks in mimirtool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can think about it after the initial review :D
78ba72c
to
549cafd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR eliminates the use of a global name validation scheme in favor of a per-tenant approach by propagating an explicit naming scheme parameter throughout validation, query, and rule evaluation functions. Key changes include:
- Adding an explicit naming scheme parameter (e.g. validation.LegacyNamingScheme or validation.UTF8NamingScheme) to various functions and APIs.
- Propagating the naming scheme into PromQL engine options, query operators, and label/metric validators.
- Updating tests and related modules to support the new per-tenant validation scheme.
Reviewed Changes
Copilot reviewed 33 out of 88 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
pkg/util/validation/limits.go | Added a new NameValidationScheme method with a TODO for tenant-specific configuration. |
pkg/streamingpromql/* | Updated operator and query functions to pass an explicit naming scheme. |
pkg/ruler/* | Updated ValidateRuleGroup and manager functions to accept a userID and utilize the naming scheme. |
pkg/querier/* and pkg/cardinality/* | Modified request decoding and querier instantiation to inject the naming scheme. |
go.mod | Updated dependency versions and replacements as per upstream PRs. |
Comments suppressed due to low confidence (2)
pkg/util/validation/limits.go:1377
- The function currently returns a constant LegacyNamingScheme with a TODO to make it tenant configurable. Consider updating the function documentation with a plan or timeline for supporting per-tenant configuration.
func (o *Overrides) NameValidationScheme(_ string) validation.NamingScheme {
pkg/ruler/manager.go:444
- The ValidateRuleGroup method now requires an explicit userID and uses the tenant's naming scheme. Ensure that all call sites in the rules API are updated accordingly and that the documentation reflects the new parameter requirement.
for _, err := range r.Validate(node.Rules[i], namingScheme) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, gave it a quick review. Nice work! Please check my questions :)
@@ -827,6 +827,8 @@ func (t *Mimir) initQueryFrontendTripperware() (serv services.Service, err error | |||
panic(fmt.Sprintf("invalid config not caught by validation: unknown PromQL engine '%s'", t.Cfg.Querier.QueryEngine)) | |||
} | |||
|
|||
eng = streamingpromqlcompat.NameValidatingEngine(eng, t.Overrides) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering currently why we need to wrap the engine, can't e.g. a parameter be passed to promql.NewEngine
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to go with a wrapper instead, mainly because it means less changes to the query path and it's easier to implement. This also drove the decision to add name validation scheme to promql.QueryOpts
rather than promql.Engine
.
At the moment queriers/frontends use a single engine to process all incoming queries. In Mimir, we use one engine in query frontends, another one is used by queriers. The alternative is to pass a QueryEngineFactory
instead of a QueryEngine
and either:
- create a new engine for each request or
- create legacy- and utf8-engines and choose the appropriate one
The question is: do we want validation scheme per-engine or per-query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is: do we want validation scheme per-engine or per-query?
If the goal is to make the scheme configurable per tenant, then making it per-query makes sense to me. Otherwise we'd need an engine instance per configuration scheme.
I decided to go with a wrapper instead, mainly because it means less changes to the query path and it's easier to implement. This also drove the decision to add name validation scheme to
promql.QueryOpts
rather thanpromql.Engine
.
Could you elaborate? It seems like adding it to promql.QueryOpts
would be a more natural fit rather than creating a wrapper like this (at least for Mimir, not sure about Prometheus).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charleskorn For context, I believe that we should make the metric/label name validation scheme configurable per-tenant. The actual configurability, IMO, should be done in a follow-up PR though, so we can focus this PR on just dropping the global variable usage.
I also think that for the most part, queries shouldn't enforce the legacy naming scheme. My reasoning is that if the user at one point has ingested metrics with the less restrictive UTF-8 naming scheme, and then flips the switch back to the "legacy" naming scheme, why should the query engine fail queries against the metrics ingested with the UTF-8 naming scheme? @juliusmh has made me aware that, on the other hand, there are certain parts of the query engine (label replace?) that need to use the same validation scheme as the ingest path. WDYT @charleskorn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@juliusmh has made me aware that, on the other hand, there are certain parts of the query engine (label replace?) that need to use the same validation scheme as the ingest path.
This is correct. label_replace
and label_join
will need to use the same validation scheme.
I also think that for the most part, queries shouldn't enforce the legacy naming scheme. My reasoning is that if the user at one point has ingested metrics with the less restrictive UTF-8 naming scheme, and then flips the switch back to the "legacy" naming scheme, why should the query engine fail queries against the metrics ingested with the UTF-8 naming scheme
I'm not aware of anywhere apart from label_replace
and label_join
that will enforce label name validation, so I don't believe this is an issue - is there something I've missed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charleskorn I just want to make sure we don't introduce configurable metric/label name validation in the query engine except for where we need to (label_replace
and label_join
as I understand it). At most we should verify that names are valid UTF-8 (apart from the noted exceptions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about count_values
, it returns a series with the aggregation label?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch, you're correct.
@@ -197,6 +197,7 @@ func New(cfg Config, limits *validation.Overrides, distributor Distributor, quer | |||
panic(fmt.Sprintf("invalid config not caught by validation: unknown PromQL engine '%s'", cfg.QueryEngine)) | |||
} | |||
|
|||
eng = compat.NameValidatingEngine(eng, limits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wondering as above why this wrapping is necessary, but haven't looked deeply into it yet.
if e.limits.NameValidationScheme(tenantID) == prom_validation.LegacyNamingScheme { | ||
return prom_validation.LegacyNamingScheme, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it bad to enforce the legacy name validation scheme on the query path? All it will do is prevent the user from querying any metrics they ingested in the past with the UTF-8 name validation scheme?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR tries to maintain the current behavior, and currently, we validate label names in label_join
, label_replace
and count_values
with legacy scheme. Queries that don't use those functions support UTF8, today. This can be seen in the tests I did.
The question we have to answer: is there any scenario in which data returned by a querier or frontend is fed back to mimir without validation (e.g. to ingesters directly, or with label validation disabled)? If the answer is no, we can safely remove the validation. I'm assuming the answer is yes though, based on this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charleskorn WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that data will always be validated when written back by rulers. They don't skip validation.
However, I believe we should validate names in at least label_replace
and label_join
, as not validating may lead to unexpected results:
- functions that consume the results of
label_replace
andlabel_join
may have undefined behaviour with invalid label names - a user testing a recording or alerting rule as a query may not realise that their rule will later fail to write samples when evaluated by rulers
pkg/streamingpromql/operators/aggregations/count_values_test.go
Outdated
Show resolved
Hide resolved
c1310a3
to
3888883
Compare
pkg/cardinality/request.go
Outdated
@@ -262,10 +263,10 @@ func extractLabelNames(values url.Values) ([]model.LabelName, error) { | |||
|
|||
labelNames := make([]model.LabelName, 0, len(labelNamesParams)) | |||
for _, labelNameParam := range labelNamesParams { | |||
labelName := model.LabelName(labelNameParam) | |||
if !labelName.IsValid() { | |||
if !validation.UTF8NamingScheme.IsValidLabelName(labelNameParam) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the naming scheme used here be configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering in this case whether this code should use a parameter (instead of hard coding), but not let it be user configurable since, as I explain above, I'm not sure we want to enforce the legacy naming scheme in queries. WDYT @charleskorn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know enough about this code path to know what makes sense here. Does Prometheus apply validation to the label names here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Prometheus apply validation to the label names here as well?
@juliusmh Did you check?
In any case, this is an example of read path isn't it, where we don't want to enforce the "legacy" naming scheme? If so, I think it could a non-configurable parameter/argument, just to avoid hard coding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaik, the api/v1/cardinality/*
endpoints are Mimir specific.
The closest is probably db stats in prometheus admin API (calculated here), where you can get some analysis on label name/value cardinality, which, as far as I can see, doesn't involve any validation. But I'm not super confident in that analysis.
@@ -827,6 +827,8 @@ func (t *Mimir) initQueryFrontendTripperware() (serv services.Service, err error | |||
panic(fmt.Sprintf("invalid config not caught by validation: unknown PromQL engine '%s'", t.Cfg.Querier.QueryEngine)) | |||
} | |||
|
|||
eng = streamingpromqlcompat.NameValidatingEngine(eng, t.Overrides) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is: do we want validation scheme per-engine or per-query?
If the goal is to make the scheme configurable per tenant, then making it per-query makes sense to me. Otherwise we'd need an engine instance per configuration scheme.
I decided to go with a wrapper instead, mainly because it means less changes to the query path and it's easier to implement. This also drove the decision to add name validation scheme to
promql.QueryOpts
rather thanpromql.Engine
.
Could you elaborate? It seems like adding it to promql.QueryOpts
would be a more natural fit rather than creating a wrapper like this (at least for Mimir, not sure about Prometheus).
756ef29
to
448dde2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is starting to look good. We should keep that test case though.
448dde2
to
35aeae6
Compare
…eus/client_golang, opentelemetry-go, grafana/prometheus-alertmanager
…relabel validation scheme when unmarshaling limits
35aeae6
to
cc723b6
Compare
What this PR does
The global defining the name validation scheme was removed in favor of a per-tenant override.
This PR uses the following forks:
Which issue(s) this PR fixes or relates to
Fixes #11503
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
. If changelog entry is not needed, please add thechangelog-not-needed
label to the PR.about-versioning.md
updated with experimental features.