New feature requirement - Cross-tenant channels #830

Closed
clemensv opened this issue Jun 10, 2021 · 17 comments

Comments

@clemensv
Contributor

related to #829

In large systems, multitenancy is common. Multitenancy means that shared underlying infrastructure resources are used for multiple customers who are presented with seemingly separate virtual environments.

Multitenancy presents a particular challenge when two such systems are interconnected and require a shared understanding about the mapping of tenant scopes, while the tenancy concepts on either side might be wildly different.

To further complicate matters, the identity and access control systems will likely differ such that a subscriber in environment A can't act under an identity that is understood by the publisher and its middleware in environment B.

We therefore need a mapping model that lets two such multitenant systems support cross-system event flow from a tenant scope of system A to a tenant scope of system B while not having to "cross the streams" with regard to access control and identity scopes.

Furthermore, we need a handshake concept where system A offers up events from their tenant to a tenant in system B, but the tenant in system B must explicitly accept that offer. The same is true in the reverse.

The proposal is an API at the system level (on either A or B), that creates and manages a "cross-tenant channel". The channel notion might be as simple as a stable identifier. That identifier allows systems A and B to map their tenant concepts independently to that channel.

When a tenant of A ("Contoso") wants to connect with its own tenant on system B (the systems don't know that "Contoso" is the same party), it will tell system A to connect to system B and pass an identifying reference to the tenant Contoso has in B. By some mechanism that is outside of the scope of the spec we need to write here, Contoso then approves the connection coming from system A in system B. What we need to define is the API that system B makes available to allow for the first half of the handshake.

channelId = B.createTenantChannel(reference)
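As a rough illustration of that first half of the handshake, the sketch below models system B in memory. The class name `PlatformB`, the `"proposed"`/`"active"` states, the `accept` method, and the use of a UUID as the channel identifier are all assumptions made for this example; the issue only proposes the `createTenantChannel(reference)` operation itself.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PlatformB:
    """Hypothetical receiving platform; names and states are illustrative only."""

    # channel id -> {"tenant": reference, "state": "proposed" | "active"}
    channels: dict = field(default_factory=dict)

    def create_tenant_channel(self, reference: str) -> str:
        # The channel is a stable, opaque identifier. It reveals nothing
        # about how B models its tenants internally.
        channel_id = str(uuid.uuid4())
        self.channels[channel_id] = {"tenant": reference, "state": "proposed"}
        return channel_id

    def accept(self, channel_id: str) -> None:
        # Second half of the handshake: the tenant in B explicitly accepts.
        self.channels[channel_id]["state"] = "active"


b = PlatformB()
channel_id = b.create_tenant_channel("contoso-in-b")
state_before = b.channels[channel_id]["state"]
b.accept(channel_id)
state_after = b.channels[channel_id]["state"]
```

How the tenant in B actually approves the offer is left out of scope in the text above, so `accept` is only a placeholder for that step.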

When events flow from A to B, system A will look up the "neutral" channel identifier for its tenant and annotate the delivery* with that channel identifier. System B will then use that channel identifier to find the target tenant for the delivery.

"Annotate the delivery" means that the HTTP call or AMQP (etc.) message that carries the flowing CloudEvent carries the channel identifier somewhere in the transport message, not in the CloudEvent per-se. The channel identifier is only meaningful for the handover hop.
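One way to picture this for HTTP is a structured-mode delivery where the channel identifier travels in a transport header and the CloudEvent body stays untouched. This is only a sketch: the header name `x-tenant-channel` and the helper `build_delivery` are made up for illustration and are not defined by this issue or by the CloudEvents spec.

```python
import json

# Hypothetical header name; the thread does not define one.
CHANNEL_HEADER = "x-tenant-channel"


def build_delivery(event: dict, channel_id: str) -> tuple[dict, bytes]:
    """Return (headers, body) for an HTTP structured-mode delivery."""
    headers = {
        "content-type": "application/cloudevents+json",
        # The channel id lives in the transport message only.
        CHANNEL_HEADER: channel_id,
    }
    # The CloudEvent is serialized as-is, without any channel attribute.
    body = json.dumps(event).encode("utf-8")
    return headers, body


event = {
    "specversion": "1.0",
    "id": "A234-1234-1234",
    "source": "https://github.com/cloudevents/spec/pull",
    "type": "com.github.pull_request.opened",
}
headers, body = build_delivery(event, "ch-123")
```

An intermediary on the next hop could drop or replace the header without altering the event.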

The operation may have further parameters such as passing callback URLs.

@cneijenhuis
Contributor

Sorry, I'm not sure I'm following.

Let's take our example event: https://github.com/cloudevents/spec/blob/v1.0.1/spec.md#example
Github is a multi-tenant system. The tenant is part of the source https://github.com/cloudevents/spec/pull (It's not clear if the tenant is cloudevents or cloudevents/spec, or if we really should have a concept of sub-tenants... but that's another discussion 😉 )

Now I want to send that as a Webhook to an Azure function. I set up a Webhook and get a URL like this: http://cloudevents-spec.azurewebsites.net/api/github-webhook. I add that Webhook into Github (this is like the handshake), and Github will now send the events for their tenant to the Azure tenant.

(As a practical example, when you configure Azure EventGrid as the destination for commercetools CloudEvents, essentially the same process takes place... We have customers using that in production today 😉 )

Can you explain what your proposed API would add on top of that?

@erikerikson
Member

I have to admit that I was confused by your description today on the call but let me try to respond anyway... Dumb questions follow in an attempt to support you in clarifying for a lay person.

Why is it important for the pushing/delivering system (A above) to initiate? I would expect the pulling/sinking system (B above) to initiate the connection using a credential obtained from A and to behave according to CloudEvents+Transport contract as well as providing any materials needed to initiate connection in a sender initiated push scenario.

Why aren't you just requiring the controller of rights in both systems to issue a credential from the source system (A) that provides every necessary right to the desired data so that the receiving system (B) can initiate the connection (providing the necessary credential and configuration required for the transmission protocol)?

By handing the credential to the receiving system you enable that system to use the discovery and/or delivery specifications to facilitate identification and delivery of the data. Then the authorized receiving system can parcel it out according to its own rights system as configured by the end customer. By handing the credential from A to B the user has implicitly declared consent to the security and distribution mechanisms of B. I would assume that system B will choose to have a mapping of the system A credential into an identified security context within its own system as configured by the end user. Perhaps requiring the system A credential ID to be identified in the event (or in some envelope) is what you are asking to be standardized in this issue? Alternatively... are you asking that B supply an ID for the security context (into which the data will be transferred) to system A for A to annotate the data with upon transmission to B? It seems simpler for B to keep and maintain the credential to context mapping but I can admit there's some equivalence.

Not to ignore the conversation... I heard that you were interested in a mechanism for multiplexing and demultiplexing the data from multiple authorized linkings (isolated sub or multi tenant security contexts) across the same connection (likely clustered) but this seems like an optimization (and implementation detail) regardless of your rejection of that label during the call. Otherwise each authorized source/sink pair could allocate a separate connection (i.e. no [de/]multiplexing is done) and the connecting systems would spend a lot of resources establishing and timing out those connections. This optimization seems sensible assuming certain system guarantees (e.g. order) are maintained or at least documented (though I would expect some users to require greater isolation). Is this optimization what you mean to ask us to codify as a standard in this issue?

@deissnerk
Contributor

@erikerikson Of course what you describe regarding identities and authorizations between A and B is possible. In practice it can be quite a challenge, though. I'm not a security expert, but the typical solutions all had some drawbacks, like manual handling of regular credential rotation or a complex setup if you need to create trust between different identity providers. I also wonder if that is the right approach. One benefit of event-driven integration is the decoupling. Events may flow through several intermediaries. They are routed based on their attributes and not a target address. With the channel approach I tell exactly those intermediaries how the events may flow, while the identities on both sides remain separate. The setup is rather simple: one side initiates the creation of the channel, while the other side acknowledges it.

Performance is not really the point. For the number of connections there are also other solutions possible. You may e.g. attach different tokens to each message.

The main benefit I see is simpler handling for users who have tenants in different environments and want to facilitate event flow between those.

Even if it were just about performance, would it be strange for a spec with cloud in the name to facilitate the handling of multi-tenancy in interoperability scenarios?

@erikerikson
Member

@deissnerk Thanks! Certainly multi-tenancy is an entirely appropriate topic though I suspect its use is distracting and confusing here. I also agree that performance is not a primary focus even if involved.

Automatic and regular rotation of credentials is always appreciated. I was particularly shocked after transitioning clouds that temporary credentials were not standard in every cloud. In no way did I mean to declare that the credential used to initiate the relationship had to be a persistent or operational credential. Neither did I specify that it had to belong to a persistent identity. I expect presenting the credential would establish authorization for two things: that I want to delegate 1) access to the scoped asset(s) for the holder and 2) that access should be established and maintained according to given configuration or until cancelled.

A problem in setup that has caused a lot of consternation in my past is blockages to IaC tooling. The original issue description set off perhaps undeserved alarm bells for me on this front. Some providers seem to have taken a UI driven approach and require you to reverse engineer their calls and identify the underlying REST calls so that you can use differently scoped/authenticated/etc. libraries in a layered manner to automate rotation or other functions (<cough>, IoT DPS... 👀).

> With the channel approach I tell exactly those intermediaries how the events may flow, while the identities on both sides remain separate.

Can you help me understand the distinction between this and my suggestion? I suspect we need more semantic synchronization. I believe that the channel and rights to its information on the A side would be identified by the credential supplied. The user would obtain it from A and supply it to B, B would coordinate with A to create/update the channel in a secure manner and B would respect the supplied configuration to distribute the events transmitted over the channel accordingly. Under the covers there would be nothing to block the establishment of a credential unknown to the shared requesting user or that the credentials used would be long lived.

At first I balked at the suggestion of a workflow that has a trusted entity requesting resource allocation and configuration in another context at the behest of an arbitrary user that at that point hasn't authenticated in B. This has a sort of precedent in OAuth and perhaps OAuth is a good solution to this problem but that is distinct in that a credential in the sink is supplied by the initiator as part of the flow. Recognizing this is related to what is discussed here reduces my concern and head scratching about the source reaching out to the sink which now feels like a distraction that is my fault: sorry.

I am still trying to get clear on exactly what the problem being solved is, what the concerns are, and what the solution is. I think I will try again with a new comment.

@erikerikson
Member

Attempting to summarize...

A provider knows how to manage asset access according to its own schemes but not all others and vice versa. Regardless of the possibility of writing authentication and authorization decoder rings this seems fraught and wasteful. Yet it remains user friendly to facilitate discovery and delivery across providers. To do so securely we need a vendor neutral authorization mechanism. One that is standard will reduce the friction of setting up data flows between authorized parties in a variety of contexts, including cases where the two apparent parties are, in fact, a single party.

The proposed solution is a "channel" abstraction which can have an identifier that can identify a validated linkage of one context to another allowing the use of discovery and delivery according to the hidden configurations and mechanisms on each side of the channel. Establishing the channel in a vendor neutral way would require the provision of materials that identify the scope of authorization and "proving" one has the authorization. A provider's specialism in itself will allow it to produce materials for scopes it controls and that satisfy its proof requirements. Thereby whether initiated from the producer or receiver, the materials of each system will need to be generated and exchanged. These can be more secure if pinned to the system it was interacting with at the time of generation.

This may be only enough to establish the channel but not enough to maintain it over time due to a desire to regularly rotate materials.

Such a mechanism can be used to facilitate data flow in both directions and perhaps duplexed over the same connection(s).

It seems that one or more interaction sequences with APIs for each interaction is the request of this issue.

Am I in the ballpark? Am I missing or adding significant details or concerns erroneously?

@clemensv
Contributor Author

@erikerikson I think you're in the ballpark. I am not sure what you are referring to with the materials, however.

@cneijenhuis The challenge in the case we are covering here is that, translated to the scenario you put up here, Github would allow you to set up a funnel to direct all events for a tenant into an Azure subscription, and inside of that Azure subscription exists a resource "Github events" that you can then subscribe to. That resource is securable with rules and using identities only from that Azure subscription and you don't need a Github account to set up that subscription. The subscription resource is also able to dispatch events into HTTP endpoints and queues and other endpoints that are private to the subscription and may be inside an isolated environment that Github won't be able to know about. Your example assumes public endpoints; what we propose here allows for tunneling events between completely private tenant environments. The shared elements are interconnected, multiplexed event buses and the mentioned tenant channel abstraction that allows demultiplexing of events into the respective tenant model on either side, without either side having to know about the other side's tenant model.

@erikerikson
Member

erikerikson commented Jun 16, 2021

I was intentionally obtuse. Consider a JWT. I imagine that they might contain a credential along with identifying information for the scope on the respective side and anything else, possibly obscured by encryption or something. That seems not entirely necessary but the point is that the materials would be issued by and most useful to the same provider providing evidence and identification of specific authorization(s).

@clemensv
Contributor Author

@erikerikson We are aiming to avoid that sort of cross-system authorization requirement altogether with this. Let me try to clarify.

Assume we have platforms A and B. In those are tenants A/1 and B/1. Let's also assume that tenants A/1 and B/1 represent the same customer organization, but that fact (the organization being the same) is not necessarily known to platforms A and/or B.

Let's further assume that platforms A and B have their own identity platform that is "closed" insofar that platform services of A cannot accept authorization tokens issued by an authorization service in B and vice versa, and that all authorization management for endpoints, e.g. putting users into roles and evaluating those roles to rights, is equally restricted. That is the reality for most platforms today.

Given all that:

  • A and B agree on platform level event flow channels in either direction. A holds a credential/token that allows sending to B and vice versa.
    • Practically, this may mean that if platform A is not only interfacing with platform B, but also implemented on top of platform B, that A is managing the credentials for both directions in self-service as it is using primitives that B provides. For "platform level trust", platform A's special privileges in establishing channels ought to require explicit approval as a trusted partner in B, but once that is granted, all the further technical elements can be managed just like any other resource of A in B.
  • Now the customer org wants to link event flow between their tenants A/1 and B/1 in the respective systems. They go to their A/1 scope and ask to create an event flow to B and pass the identifier of the B/1 tenant.
  • Under the covers, A now calls B.createTenantChannel("B/1") acting under its platform identity. That yields a channel identifier. B will have associated that identifier with B/1 when the call returns. A now creates an association between that identifier and A/1.
  • In B, createTenantChannel("B/1") will propose placing a resource into tenant B/1 that acts as a subscription manager. Since the customer org is doing that work, they will anticipate that and find the proposal notification in a documented place in platform B. If the customer org accepts the proposal in B/1, the channel becomes active and usable.

In that model, B authorizes A to initiate creating channels into its tenants, but the tenants of B must still explicitly accept the proposed channels.

Once the channel has been accepted, all events that are configured in A/1 to flow towards B/1 will be funneled through the platform level A-to-B event pipeline. The routing target is the neutral channel ID, since the platform-B-level identifiers of B/1 might indeed change (resources may move between billing scopes, etc).

As events arrive in B, the channel ID is mapped to the tenant identifiers and events for B/1 flow up through the channel that was accepted for that tenant.

The subscription management for whether and which events are dispatched to targets inside B/1 is a completely local concern inside of B/1. So is access control management. In A/1 it is decided which events to funnel towards B/1 and then in B/1 it is decided who gets to subscribe into that stream and there may be further, filter-based authorization on that stream.
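The flow described in this comment can be sketched end to end with two in-memory platforms. Everything here is an assumption for illustration: the class and method names, the `ch-N` identifier format, and the `move_tenant` helper that stands in for a tenant identifier changing inside B. Authentication of A's platform identity towards B is omitted.

```python
class PlatformB:
    def __init__(self):
        self.channel_to_tenant = {}  # channel id -> current tenant identifier in B
        self.active = set()          # channel ids accepted by the target tenant
        self.inboxes = {}            # tenant identifier -> events received
        self._counter = 0

    def create_tenant_channel(self, tenant_ref):
        self._counter += 1
        channel_id = f"ch-{self._counter}"
        self.channel_to_tenant[channel_id] = tenant_ref
        return channel_id

    def accept(self, channel_id):
        self.active.add(channel_id)

    def move_tenant(self, old_ref, new_ref):
        # B-internal identifiers may change; the channel id stays stable.
        for channel_id, ref in self.channel_to_tenant.items():
            if ref == old_ref:
                self.channel_to_tenant[channel_id] = new_ref

    def receive(self, channel_id, event):
        # Demultiplex from the shared A-to-B pipeline into the tenant scope.
        if channel_id not in self.active:
            return False  # proposal not accepted yet: drop
        tenant = self.channel_to_tenant[channel_id]
        self.inboxes.setdefault(tenant, []).append(event)
        return True


class PlatformA:
    def __init__(self, peer):
        self.peer = peer
        self.tenant_to_channel = {}  # A's tenant identifier -> channel id

    def link(self, tenant, remote_ref):
        self.tenant_to_channel[tenant] = self.peer.create_tenant_channel(remote_ref)

    def publish(self, tenant, event):
        return self.peer.receive(self.tenant_to_channel[tenant], event)


b = PlatformB()
a = PlatformA(b)
a.link("A/1", "B/1")
dropped = a.publish("A/1", {"id": "1"})      # channel not accepted yet
b.accept(a.tenant_to_channel["A/1"])
delivered = a.publish("A/1", {"id": "2"})
b.move_tenant("B/1", "scope-2/B/1")          # B renames its tenant identifier
after_move = a.publish("A/1", {"id": "3"})   # A is unaffected by the rename
```

Neither side ever sees the other's tenant identifier after the initial reference is passed; only the channel identifier crosses the boundary.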

@erikerikson
Member

The string "B/1" would be satisfactory "materials". It is generated by B and used by B after passing through the custody of A. What you describe is implied authorization for users (via configuration actions) so that supplying the B/1 ID is sufficient at the platform level.

One curiosity... What keeps a malicious discoverer (A/2) of the ID "B/1" (regardless of what data would be in the quotes) from crafting malicious or even just noisy channel approvals? How does B/1 consent to what B shows them?

@clemensv
Contributor Author

A/2 needs to be an authorized user in A. A/2 can technically announce through this mechanism to any tenant in B, but there are ways to restrict abuse like rate limiting. It's also imaginable that for an offer to be made, A/2 also needs to provide a key that the target tenant controls, similar to Bluetooth.

B/1 consents to an offer through their portal or an API. There might be a "pending offers" list or there might be a disabled system object created that B/1 enables.
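A minimal sketch of the two ideas above, a "pending offers" list plus rate limiting of the offering tenant, might look like the following. The `OfferInbox` class, its sliding-window limit, and its parameters are assumptions chosen for illustration; the thread does not prescribe any particular mechanism.

```python
import time
from collections import defaultdict, deque


class OfferInbox:
    """Hypothetical store of channel offers awaiting tenant consent in B."""

    def __init__(self, max_offers=3, window_seconds=60.0, clock=time.monotonic):
        self.max_offers = max_offers
        self.window = window_seconds
        self.clock = clock
        self.recent = defaultdict(deque)  # offering tenant -> offer timestamps
        self.pending = defaultdict(list)  # target tenant -> (source, channel id)

    def offer(self, source_tenant, target_tenant, channel_id):
        now = self.clock()
        stamps = self.recent[source_tenant]
        # Forget offers that have fallen out of the sliding window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_offers:
            return False  # rate limit hit: the offer is not recorded
        stamps.append(now)
        self.pending[target_tenant].append((source_tenant, channel_id))
        return True

    def accept(self, target_tenant, channel_id):
        # The target tenant consents by picking an offer off its pending list.
        for entry in self.pending[target_tenant]:
            if entry[1] == channel_id:
                self.pending[target_tenant].remove(entry)
                return True
        return False


inbox = OfferInbox(max_offers=2, clock=lambda: 0.0)
results = [inbox.offer("A/2", "B/1", f"ch-{i}") for i in range(3)]
accepted = inbox.accept("B/1", "ch-0")
remaining = inbox.pending["B/1"]
```

With a limit of two offers per window, the third offer from A/2 is refused and never reaches B/1's pending list.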

@erikerikson
Member

Seems more straightforward for A to dispense materials that can be handed to B (to hand back to A) for channel establishment. Why not do that?

@clemensv
Contributor Author

If there were a handshake that runs a secret through the system that the org admin of both A/2 and B/1 sets and validates, it would be seeded by that admin in A/2 and pop back up in B/1 where the admin can validate that this was initiated by themselves.
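Such a secret-carrying handshake could be sketched as below. The function names and the shape of the offer are hypothetical, and the sketch assumes the secret is carried in the clear alongside the channel offer, which a real design might well avoid.

```python
import hmac
import secrets


def seed_secret() -> str:
    """The admin generates a one-time secret when requesting the channel in A."""
    return secrets.token_urlsafe(16)


def make_offer(channel_id: str, secret: str) -> dict:
    """What platform B might surface in the target tenant (hypothetical shape)."""
    return {"channel_id": channel_id, "secret": secret}


def admin_confirms(offer: dict, remembered_secret: str) -> bool:
    # Constant-time comparison avoids leaking how much of the secret matched.
    return hmac.compare_digest(
        offer["secret"].encode("utf-8"), remembered_secret.encode("utf-8")
    )


secret = seed_secret()
genuine = make_offer("ch-1", secret)
forged = make_offer("ch-2", seed_secret())  # someone else's offer to the same tenant
ok = admin_confirms(genuine, secret)
rejected = admin_confirms(forged, secret)
```

An offer that does not carry the admin's own secret fails the check, which is what lets the admin in B/1 tell their own request apart from noise.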

@cneijenhuis
Contributor

> Your example assumes public endpoints; what we propose here allows for tunneling events between completely private tenant environments.

Can you give a practical example? I can't follow. I can think of e.g. VPC setups, which we've also used to connect e.g. a k8s cluster with a DBaaS, but also for that the cloud providers and the DBaaS did not have to pre-establish trust (afaik).

> Practically, this may mean that if platform A is not only interfacing with platform B, but also implemented on top of platform B, that A is managing the credentials for both directions in self-service as it is using primitives that B provides. For "platform level trust", platform A's special privileges in establishing channels ought to require explicit approval as a trusted partner in B, but once that is granted, all the further technical elements can be managed just like any other resource of A in B.

I still feel that this is anti-interop. Platform X, Y and Z may not be trusted by B for whatever reason and customers of B are effectively forced to use A.

> If there were a handshake that runs a secret through the system that the org admin of both A/2 and B/1 sets and validates, it would be seeded by that admin in A/2 and pop back up in B/1 where the admin can validate that this was initiated by themselves.

What I've seen some of our customers do with Terraform/Infrastructure as Code is that the admins validate the Terraform script/code. You can then create a secret on one platform (e.g. a cloud provider) and hand it over to another platform, and vice versa if you need a bi-directional flow. No human ever has to touch or see the secret.

@clemensv
Contributor Author

@duglin
Collaborator

duglin commented Apr 18, 2023

@clemensv do we still need this?

@duglin
Collaborator

duglin commented Sep 21, 2023

ping @clemensv do we still need this? Or is this a xReg thing and it should be moved over there?

@duglin
Collaborator

duglin commented Sep 21, 2023

per the 9/21 call I checked with @clemensv and he agreed we can close this

@duglin duglin closed this as completed Sep 21, 2023

5 participants