Skip to content

Conversation

@rdooley
Copy link

@rdooley rdooley commented Sep 9, 2021

What this PR does:
This PR adds support for multitenant rules and alerts to ruler and alertmanager. A multitenant rule is one whose query spans multiple tenants e.g. a rule stored in foo|bar would query both tenant foo and bar when evaluating the rule and store the resulting rules timeseries in either foo or bar, but always in the same subtenant.

Using a setup of 20 tenants 0, 1, ... 19 fed bogus metrics with avalanche and the following recording rule stored in the composite tenant 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19
I saw

Screenshot 2021-08-25 at 13-28-38 Explore - Cortex - Grafana

from the following rule file

name: record_rule_eg
interval: 10s
rules:
  - record: record_rule_eg
    expr: sum(avalanche_metric_m_4_10017)

Which issue(s) this PR fixes:
Fixes #4403

Open Question:
I noticed when adding a corresponding alertmanager config for the 0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19 tenant, alerts were always sent to the composite tenant with the subtenants lexographically sorted e.g. 0|1|10|11|12|13|14|15|16|17|18|19|2|3|4|5|6|7|8|9. Should I introduce forcible sorting of subtenants when creating rule groups as well to reduce confusion?

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@edma2
Copy link
Contributor

edma2 commented Sep 10, 2021

Hi @rdooley, thanks for taking on this feature. For my use-case, I'm not sure that writing to a random (albeit consistent) tenant would work well for Ruler tenant federation. It would be ideal from my perspective to give the user control over the "source" and "destination" tenants.

IMO, multitenant federation in the ruler is a special case of the more general problem of accessing another tenant's data when defining recording rules/alerts (in the multitenant case, this means multiple tenants). When running Cortex within a single large organization split by team (where tenant == team), for example, team A may want to consume team B's metrics within team A's recording rules and alerts. Or, in the case of multitenant federation, team A may want to consume teams B and C metrics in a single rule.

One way to achieve this would be supporting extra metadata in the rule/alert which specifies the tenant ID (defaults to the tenant in which the rule is created), similar to how you pass a composite X-Scope-OrgID header to the query frontend. I'm curious what you think about this idea and whether it aligns with your original vision.

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! It is an interesting idea.

Should I introduce forcible sorting of subtenants when creating rule groups as well to reduce confusion?

I didn't understand this. Where would it go?

I see you ticked the "Documentation updated" box, but I do not see any documentation. As a user I would need to know how to configure this feature.

either foo or bar

As a user, this unpredictability seems unworkable.

@rdooley rdooley changed the title Multitenant ruler Federated ruler Sep 10, 2021
@bboreham
Copy link
Contributor

Can you also describe the permissioning model - i.e. how would someone ensure that data from A and B is only supplied to users who have access to A and B? (Implementation of this is out of scope for of Cortex, but some statements about which points need to be locked down would be good)

@rdooley
Copy link
Author

rdooley commented Sep 10, 2021

One way to achieve this would be supporting extra metadata in the rule/alert which specifies the tenant ID (defaults to the tenant in which the rule is created), similar to how you pass a composite X-Scope-OrgID header to the query frontend. I'm curious what you think about this idea and whether it aligns with your original vision.

I don't have any issue with making the destination tenant for a recording rule more explicit, I'm just trying to think thru where to store that info in the rules file structure of rules/${OrgID}/${namespace}. A convenience of random consistent subtenant for federated rules is that I did not have to deal with that. 🤔

@rdooley
Copy link
Author

rdooley commented Sep 10, 2021

Thanks for the PR! It is an interesting idea.

Should I introduce forcible sorting of subtenants when creating rule groups as well to reduce confusion?

I didn't understand this. Where would it go?

I had assumed it would live in pkg/tenant/ however after a closer reading I see this sorting is already accomplished within NormalizeTenantIDs here

I see you ticked the "Documentation updated" box, but I do not see any documentation. As a user I would need to know how to configure this feature.

Unticked for now as discussion is still ongoing and more documentation clearly needed.

either foo or bar

As a user, this unpredictability seems unworkable.

Is adding an optional override to specify destination tenant for recording rules as suggested by @edma2 sufficient or should this be required? For my use case I am fine with federated rules being recorded into a random (but consistent) subtenant since we are assuming a user with access to create/view a recording rule in foo|bar|baz would also be able to query foo, bar or baz.

@edma2
Copy link
Contributor

edma2 commented Sep 10, 2021

I don't have any issue with making the destination tenant for a recording rule more explicit, I'm just trying to think thru where to store that info in the rules file structure of rules/${OrgID}/${namespace}. A convenience of random consistent subtenant for federated rules is that I did not have to deal with that. 🤔

Can Cortex support and parse an additional field in the Prometheus rule format, or is compatibility a concern?

To be clear, I'm assuming that the tenant which owns the rule is the destination, and the rule field would specify the source(s).

@rdooley
Copy link
Author

rdooley commented Sep 10, 2021

It seems pretty nice to leave the Prometheus rule format unaltered. Thinking out loud if we are content determining destination tenant per namespace we could do rules/${OrgID}/${DestTenantID}/${namespace}
EDIT
Or those last two path segments could be swapped and end up with rules/${OrgID}/${namespace}/${DestTenantID}

andrejbranch and others added 2 commits September 13, 2021 08:18
…crease multitenant query max concurrency

Signed-off-by: Rees Dooley <rees.dooley@shopify.com>
* Push ruler metrics to a random subtenant if rule is multitenant
* random seed is set based off labelset hash to ensure series always
  pushed to same subtenant
* add changelog message about multitenant ruler support

Signed-off-by: Rees Dooley <rees.dooley@shopify.com>
* expand federated tenant alertmanager test case
* Add tenant-federation.max-concurrency config opt (docs need
  regenerating later)
* switch to hash mod instead of random for federated ruler subtenant
  selection

Signed-off-by: Rees Dooley <rees.dooley@shopify.com>
@rdooley
Copy link
Author

rdooley commented Sep 14, 2021

Pausing this work while a more explicit and well defined proposal is workshopped here

@rdooley
Copy link
Author

rdooley commented Sep 22, 2021

Since this is going to take some non trivial dev work on my end to implement the changes in the proposal, I'm going to close this PR while #4477 is open for discussion and I'm doing dev work

@rdooley rdooley closed this Sep 22, 2021
@rdooley rdooley mentioned this pull request Oct 4, 2021
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ruler] Allow Federated rules

4 participants