-
Notifications
You must be signed in to change notification settings - Fork 838
Federated ruler #4470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Federated ruler #4470
Conversation
eed1e78 to
6db00dd
Compare
|
Hi @rdooley, thanks for taking on this feature. For my use-case, I'm not sure that writing to a random (albeit consistent) tenant would work well for Ruler tenant federation. It would be ideal from my perspective to give the user control over the "source" and "destination" tenants. IMO, multitenant federation in the ruler is a special case of the more general problem of accessing another tenant's data when defining recording rules/alerts (in the multitenant case, this means multiple tenants). When running Cortex within a single large organization split by team (where tenant == team), for example, team A may want to consume team B's metrics within team A's recording rules and alerts. Or, in the case of multitenant federation, team A may want to consume teams B and C metrics in a single rule. One way to achieve this would be supporting extra metadata in the rule/alert which specifies the tenant ID (defaults to the tenant in which the rule is created), similar to how you pass a composite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! It is an interesting idea.
Should I introduce forcible sorting of subtenants when creating rule groups as well to reduce confusion?
I didn't understand this. Where would it go?
I see you ticked the "Documentation updated" box, but I do not see any documentation. As a user I would need to know how to configure this feature.
either foo or bar
As a user, this unpredictability seems unworkable.
|
Can you also describe the permissioning model - i.e. how would someone ensure that data from A and B is only supplied to users who have access to A and B? (Implementation of this is out of scope for of Cortex, but some statements about which points need to be locked down would be good) |
I don't have any issue with making the destination tenant for a recording rule more explicit, I'm just trying to think thru where to store that info in the rules file structure of |
I had assumed it would live in
Unticked for now as discussion is still ongoing and more documentation clearly needed.
Is adding an optional override to specify destination tenant for recording rules as suggested by @edma2 sufficient or should this be required? For my use case I am fine with federated rules being recorded into a random (but consistent) subtenant since we are assuming a user with access to create/view a recording rule in |
Can Cortex support and parse an additional field in the Prometheus rule format, or is compatibility a concern? To be clear, I'm assuming that the tenant which owns the rule is the destination, and the rule field would specify the source(s). |
|
It seems pretty nice to leave the Prometheus rule format unaltered. Thinking out loud if we are content determining destination tenant per namespace we could do |
…crease multitenant query max concurrency Signed-off-by: Rees Dooley <rees.dooley@shopify.com>
* Push ruler metrics to a random subtenant if rule is multitenant * random seed is set based off labelset hash to ensure series always pushed to same subtenant * add changelog message about multitenant ruler support Signed-off-by: Rees Dooley <rees.dooley@shopify.com>
6db00dd to
c7af1ac
Compare
* expand federated tenant alertmanager test case * Add tenant-federation.max-concurrency config opt (docs need regenerating later) * switch to hash mod instead of random for federated ruler subtenant selection Signed-off-by: Rees Dooley <rees.dooley@shopify.com>
c7af1ac to
4cf4576
Compare
|
Pausing this work while a more explicit and well defined proposal is workshopped here |
|
Since this is going to take some non trivial dev work on my end to implement the changes in the proposal, I'm going to close this PR while #4477 is open for discussion and I'm doing dev work |
What this PR does:
This PR adds support for multitenant rules and alerts to ruler and alertmanager. A multitenant rule is one whose query spans multiple tenants e.g. a rule stored in
foo|barwould query both tenantfooandbarwhen evaluating the rule and store the resulting rules timeseries in eitherfooorbar, but always in the same subtenant.Using a setup of 20 tenants
0,1, ...19fed bogus metrics with avalanche and the following recording rule stored in the composite tenant0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19I saw
from the following rule file
Which issue(s) this PR fixes:
Fixes #4403
Open Question:
I noticed when adding a corresponding alertmanager config for the
0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19tenant, alerts were always sent to the composite tenant with the subtenants lexographically sorted e.g.0|1|10|11|12|13|14|15|16|17|18|19|2|3|4|5|6|7|8|9. Should I introduce forcible sorting of subtenants when creating rule groups as well to reduce confusion?Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]