[RFC] proxy endpoint for kubernetes backend API #12231
Comments
@mclarke47 maybe something you want to have a read through also 🙏 |
I really like this idea. Do we foresee the need to add the entity ref to the query string when running a command against multiple clusters, and then allow the service locator to decide which clusters to route it to based on the entity? Then the endpoint could make requests against one cluster or multiple clusters — however, this might not be necessary. |
Hi everyone, I'm an intern at VMware working on an implementation of this RFC under @jamieklassen - my goal for the summer is to submit a PR with this functionality. I have a mostly functional proxy that can make a request against a single cluster, and would like to add multi-cluster requests next, so I'd like to start a discussion about what we'd want those to look like from an end-user's perspective. It seems like there are three possible cluster combinations we would want to account for:
For each scenario, we will need to ensure that the proper auth credentials are available, since the Kubernetes REST API wants an auth token with each request. Given these requirements, I think it might make sense to merge the two proposed headers (for cluster names and tokens) into one header containing a map with cluster names as keys and tokens as values. Here's what this could look like across all scenarios:

Single cluster:
{
"my-cluster": "my-token"
}

or, if the proxy should default to the existing token for the cluster (from app-config):

{
"my-cluster": ""
}

Multiple clusters:
{
"my-cluster-1": "my-token-1",
"my-cluster-2": "my-token-2",
"my-cluster-3": ""
}

All clusters (wildcard):
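However the wildcard case ends up being expressed, here's a rough client-side sketch of building such a header — purely illustrative; the header name borrows the X-Kubernetes-Auth suggestion from the RFC, and the sub-path proxy route is an assumption:

```ts
// Sketch only — header name and proxy path are assumptions, not a settled API.
const clusterTokens: Record<string, string> = {
  'my-cluster-1': 'my-token-1',
  'my-cluster-2': '', // empty string = fall back to the cluster's configured token
};

const response = await fetch('/api/kubernetes/proxy/api/v1/pods', {
  headers: {
    // One header carrying the cluster-name -> token map as JSON:
    'X-Kubernetes-Auth': JSON.stringify(clusterTokens),
  },
});
```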
Thoughts? |
Hi @liamrathke, it seems like this is for client-side auth like OIDC and Google auth, which are sent in the body of the other backend requests. I'm not a huge fan of very complicated headers — putting necessary parameters in headers can make the API harder to use and understand — however, I'm not sure the alternatives are much better.

auth in the query string

One alternative is to include the auth details in the query string and use some convention to pass them, for example: base64-encoding (URL-safe!) the auth object and putting it in the query string:
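A quick sketch of that convention — the auth parameter name and object shape are just assumptions:

```ts
// Sketch: URL-safe base64-encode the auth object and pass it as a query parameter.
const auth = { google: 'GOOGLE-ACCESS-TOKEN' };
const encoded = Buffer.from(JSON.stringify(auth)).toString('base64url');

const url = `/api/kubernetes/proxy/api/v1/pods?auth=${encoded}`;
```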
POST

The other option is switching the endpoint to a POST and passing the auth in the body like the other endpoints. The advantage here is that you can reuse the existing types. For server-side auth (like a Google service account) we don't need to pass auth at all, because the request will get decorated with auth in the backend. Let me know what you think! |
Gonna chime in and strongly discourage placing credentials in the query string. It's very common that they will be exposed through logs or proxies that way. Keep credentials in the body or headers |
@Rugvip sounds good. @liamrathke let's do either an encoded header or POST. I'll leave it up to you to decide |
@Rugvip @mclarke47 thanks for the feedback, I'm inclined to go with an encoded header so that we can have parity with the Kubernetes HTTP API as far as request methods are concerned - we won't need to decide whether an incoming POST request should be sent to Kubernetes as a GET or a POST |
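As a concrete (and purely hypothetical) sketch of that encoded-header idea — the header names borrow the RFC's suggestions, and the auth object shape mirrors what the other endpoints accept in their body:

```ts
// Sketch only: base64url-encode the same auth object the existing endpoints accept,
// and send it in a header so the HTTP method stays whatever Kubernetes expects.
const auth = { google: 'GOOGLE-ACCESS-TOKEN' };
const encodedAuth = Buffer.from(JSON.stringify(auth)).toString('base64url');

const res = await fetch('/api/kubernetes/proxy/api/v1/namespaces/default/pods', {
  method: 'GET', // parity with the Kubernetes HTTP API — no POST wrapping needed
  headers: {
    'X-Kubernetes-Cluster': 'my-cluster', // cluster-scoping header from the RFC
    'X-Kubernetes-Auth': encodedAuth,     // encoded auth, name not final
  },
});
```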
Update — I'm working on my multi-cluster request implementation and realized that there's still some ambiguity about what the response from a multi-cluster request should look like. There are two main approaches we could take:

Option 1

Aggregate the items from every cluster into a single list response.
This would be useful if we expect the multi-cluster proxy feature to be mostly used for aggregating data from multiple clusters. If users need to be able to trace an item in the aggregated list back to its source cluster, though, this format makes that harder.

Pros:

Cons:
Option 2

Map each cluster name to its response — for a multi-cluster request, each cluster's result appears under its own key.
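Roughly, this shape (illustrative only — assuming a pod query against two clusters):

```ts
// Illustrative only: each cluster name maps to the standard Kubernetes response
// that cluster returned.
const multiClusterResponse = {
  'my-cluster-1': {
    apiVersion: 'v1',
    kind: 'PodList',
    items: [/* pods from my-cluster-1 */],
  },
  'my-cluster-2': {
    apiVersion: 'v1',
    kind: 'PodList',
    items: [/* pods from my-cluster-2 */],
  },
};
```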
Pros:
Cons:
Thoughts on either approach? Are there any specific scenarios we should be optimizing the multi-cluster request feature for? CC @jamieklassen @mklanjsek |
@liamrathke the first option feels like a better approach to me, mostly because multi- and single-cluster responses have the same format, as long as filtering response entries by cluster is easy. |
@liamrathke I think the existing endpoints use option 2 because of the possibility that some clusters can return an error and some don't. It might be good to standardise our multi-cluster responses with a similar format, which would make me favour 2 |
@mclarke47 makes sense, I ended up going with that approach (Option 2). Currently working on unit tests, will open a PR once they are ready. |
I'd like to make an argument for the first response format as well. Sticking to the format that the K8s API provides will allow any standard Kubernetes client to use this proxy. I agree with @jamieklassen when he says "this might be a bit ambitious," but I think it's within the realm of possibility and worth the attempt.

Library & client compatibility

From the perspective of a plugin author taking advantage of this proxy, say some hot new query parameter gets added to the Kubernetes API. It would be great if I could simply bump my @kubernetes/client-node dependency and start using it. And from the perspective of the maintainers of this proxy, it would be great to not be in a position where changes to the K8s API require changes to the proxy. I think this is possible if we take a light touch and limit the proxy to be purely additive to the Kubernetes API — Embrace and extend.

Multi-cluster results

Limiting results to specific clusters

Adding an X-Kubernetes-Cluster header: I don't see any problem adopting this as suggested, although we might want to be a bit more specific about how its content is structured. Are we talking, like, a comma-delimited list of the cluster names?

Associating results to their clusters

Instead of dedicating a top-level section for each cluster, we could indicate the source cluster for each item in the response by appending a namespaced annotation:

{
"apiVersion": "v1",
"items": [
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"annotations": {
"backstage.io/kubernetes-cluster-name": "cluster-a"
}
}
},
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"annotations": {
"backstage.io/kubernetes-cluster-name": "cluster-b"
}
}
}
]
}

Reporting errors

As @mclarke47 points out, multi-cluster queries can result in partially successful responses, and we'll want to report the failures somehow. One additive way we could report them is by appending an errors section keyed by cluster name:

{
"apiVersion": "v1",
"items": [{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"annotations": {
"backstage.io/kubernetes-cluster-name": "cluster-a"
}
}
}],
"errors": {
"cluster-a": null, // Successful response from this cluster
"cluster-b": {
"code": "500", // sic, don't want JS to interpret this as a float
"message": "Server error" // Message copied directly from server response
},
"cluster-c": {
"code": "403",
"message": "Forbidden"
}
}
}

Maybe it's even worth giving this section a more specific name; it's tempting to go with a custom format here.
Performance

One drawback of the existing endpoints is that they fan a request out to every configured cluster. The backend waits until every one of these requests completes or a 30-second timeout has elapsed before it starts to send the concatenated response. For large numbers of managed clusters — especially in a dev-environment scenario where infrastructure can be volatile — it becomes more and more likely that at least one of those requests hangs. The effect on the end user in these cases is that every request takes a full 30 seconds before anything can be displayed in the UI.

The proxy approach mitigates this problem to a large extent. Clients gain control over both which clusters they query and how long they are willing to wait. The problem of a slow cluster remains, but I think we can dramatically improve on the latency of the response if we're careful about its format. Two properties are desirable here, both of which are satisfied by the format proposed above:

- Items from different clusters are interleaved
- Timeout errors come at the very end of the response
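As a sketch of the kind of naive fan-out this could start from — the helper names and shapes are assumptions, not proxy code — with per-cluster failures collected after the items:

```ts
// Hypothetical sketch: fan a request out to each cluster with a per-cluster timeout,
// aggregate the items, and report per-cluster failures in a trailing "errors" section.
async function proxyToClusters(
  clusters: string[],
  doRequest: (cluster: string, signal: AbortSignal) => Promise<{ items: unknown[] }>,
  timeoutMs = 30_000,
) {
  const settled = await Promise.allSettled(
    clusters.map(c => doRequest(c, AbortSignal.timeout(timeoutMs))),
  );

  const items: unknown[] = [];
  const errors: Record<string, { message: string } | null> = {};
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      items.push(...result.value.items);
      errors[clusters[i]] = null; // successful response from this cluster
    } else {
      errors[clusters[i]] = { message: String(result.reason) }; // timeout or server error
    }
  });

  return { apiVersion: 'v1', items, errors };
}
```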
To be clear, I'm not suggesting that we immediately make the proxy smart enough to interleave a streamed response like this — we should probably start with a naive implementation.

Authentication

The current auth model raises a few questions for me. My main concern would be a privilege escalation attack where a Backstage user is 1) authorized to use a plugin that relies on this proxy but 2) not authorized to make modifications to Kubernetes clusters. In particular, I worry about abuse of a service account. A single Backstage service account per cluster will necessarily be granted the union of all the privileges necessary to drive all of a plugin's use cases. Plugins can lock down parts of their UX based on the logged-in user, but system operators cannot enforce the same constraints at the Kubernetes API layer. Perhaps the use cases this proxy enables are distinct enough from what the existing endpoints support that this deserves different treatment.

My first instinct is to defer this problem to the operator. The proxy would expect calls to supply an opaque bearer token and pass it through to the cluster as-is. This eliminates the possibility of server-wide authentication, but maybe that's a good thing, because it precludes the service-account escalation attack vector.

Another benefit of this opaque-token approach is that it's supported by standard clients. Adding our own semantics to the contents of the auth header would cut against that.

Ok, that's it! Let me know what you think. Glad we're tackling this proposal — it opens up a bunch of new use cases for us. Cheers. |
Hi everyone — just wanted to give a quick update on where it looks like things are headed, based on some internal VMware discussions about the RFC. I'm a big fan of @chinigo's suggestion, but aggregating the Kubernetes API responses into a single Kubernetes response object is only really possible if the response is already structured as a list. In other words, if a response from Kubernetes is designed to return a single object only (e.g. when creating an object), we would need to fundamentally modify the Kubernetes response to accommodate multiple items. At that point, we would essentially be writing a wrapper for part of the Kubernetes API, which somewhat defeats the purpose of the proxy. For this reason, I'm inclined to stick with Option 2, which has already been implemented. Adding the ability to create objects is one of the primary reasons we're looking to add the proxy in the first place, so a format that properly handles this use case is critical. I'm not sure there's a simple implementation that balances a) developer convenience when accessing data from multiple clusters with b) parity with standard Kubernetes responses. |
Another RFC update — I have removed the "fallback" option for making proxy requests without credentials, after discussing internally with the VMware Backstage team. Allowing requests without credentials could pose a security issue: anyone with knowledge of the Backstage instance's address could perform arbitrary requests against the Kubernetes cluster without authenticating first. |
Let's just do single-cluster

After long deliberation and a brief discussion with @luchillo17, I'd like to narrow the scope of this RFC even further — in the first pass, let's make the proxy handle only a single cluster per request.
It feels right to use this highly simplified implementation for now, because it was so hard to resolve the problems with a multi-cluster proxy. These included:
In the initial post of this RFC, I introduced a number of problems that I wanted to solve, and I now feel that I introduced too many. The proxy design I'm suggesting here does help with:
The needs that this design does not address, which I think can be either postponed to a new feature or even supported by the existing endpoints:
@luchillo17 is going to take over #13026 and move it in this new direction starting now. @mclarke47 @freben, I don't imagine you'll have any issues with this, but please share your thoughts if you have them! |
To put a finer point on part of the above:
means that if a cluster has a |
I just wanted to say that I've read through this and it sounds sane. Hopefully it'll be good enough for most use cases to simply issue several concurrent requests from the client side instead of multiplexing, especially if the server side has some internal debouncing of heavy requests like cluster lists etc. |
We've got a first version of the proxy now! Plenty more ideas about how to improve and document it, but I think those can take the form of their own issues. |
Status: Open for comments
Need
VMware is building Tanzu Application Platform, a platform for building applications on Kubernetes. Most of the platform consists of CRDs and controllers for exposing high-level abstractions which make developers more productive. The Tanzu Application Platform GUI is a Backstage app, and most of the UI experiences it provides take the form of CRD UIs (similar to those discussed in #2857 and #5511) which consume the kubernetes-backend plugin for their data. However, Tanzu Application Platform GUI increasingly needs to support:
Proposal
Add a /proxy endpoint to the kubernetes-backend plugin's API. This endpoint would allow Backstage plugins to make arbitrary Kubernetes calls (including listing objects without reference to an Entity/label selector, creating/deleting objects, and non-resource-based APIs like logs) to any cluster (or multiple/all clusters), using the same or similar auth translation logic to that of the existing endpoints.

The OpenShift console has a k8s proxy mounted at /api/kubernetes, potentially routing to multiple clusters, which does a similar kind of auth translation. I'm suggesting that Backstage have something similar (but following its own idiom, and not written in Go).

Having discussed it briefly with @mclarke, it sounds like, instead of supporting any k8s endpoint as a path nested under /api/kubernetes/proxy (similar to what OpenShift does), it might be necessary to encode the URL path of the kubernetes endpoint in question. Personally I'd prefer the "sub-path" setup suggested by the OpenShift example, but maybe some maintainers could weigh in here?

Some considerations:
- A header like X-Kubernetes-Cluster (inspired by kcp; if we want to bikeshed, OpenShift's console uses X-Cluster) to scope requests to a single cluster (or all clusters with a wildcard value? There are probably API actions where this might not make sense).
- The existing endpoints accept a KubernetesRequestAuth in the request body; a similar approach seems suitable here, perhaps passing it as JSON in a different header (X-Kubernetes-Auth comes to mind? Maybe something to identify it as Backstage-specific?).
- It would be really neat if kubernetes-based UI plugins could simply make use of a library like @kubernetes/client-node and have this API transparently "just work", but this might be a bit ambitious — see the sketch just below.
Example
Suppose I've got an app-config that registers a GKE cluster named gke-cluster, whose server address IP-ADDRESS is the IP address of a real GKE cluster. Then I'd like to be able to make a request to the Backstage server's proxy endpoint where the path URL parameter is /v1/namespaces/default/pods/mypod-abc123/log?container=workload (the usual path for a kubectl -n default logs mypod-abc123 -c workload command) URL-encoded, and GOOGLE-ACCESS-TOKEN — a valid access token that has been negotiated with accounts.google.com out of band — is supplied as auth. Then the server should respond with the logs of the workload container in the mypod-abc123 pod in the default namespace of the gke-cluster cluster.
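A sketch of what such a request could look like — the path query parameter comes from the description above, while the header names and host are assumptions:

```ts
// Hypothetical request shape — not a settled API.
const path = encodeURIComponent(
  '/v1/namespaces/default/pods/mypod-abc123/log?container=workload',
);

const res = await fetch(
  `https://backstage.example.com/api/kubernetes/proxy?path=${path}`,
  {
    headers: {
      'X-Kubernetes-Cluster': 'gke-cluster',
      // KubernetesRequestAuth-style payload carrying the Google access token:
      'X-Kubernetes-Auth': JSON.stringify({ google: 'GOOGLE-ACCESS-TOKEN' }),
    },
  },
);
console.log(await res.text()); // logs of the "workload" container
```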
Alternatives
We could try adding new features (pod logs, object creation, queries against specific cluster/namespace/type instead of entity-based queries) one-by-one to the existing API structure, each with its own RFC/discussion/consensus-building -- but this felt like it made iterating on each kubernetes-related feature too expensive. In theory, as more features are developed using this "unsafe" or "plumbing" API, their implementations could be used as helper methods in the kubernetes frontend plugin, or they could be references for new, safer/idiomatic API endpoints within the kubernetes-backend.
Risks