Kubernetes backend rewrite #206
Conversation
dask-gateway-k8s-operator/pkg/apis/gateway/v1alpha1/daskcluster_types.go
@jcrist What flags did you pass when you ran …?
There are random things missing that should be there when you run … Did you update any files other than …?
Ah! Good catch. The …
Yeah, that's what the …
@jcrist Looks like the verbose CRDs were a result of passing … When I switched it to a type of … The controller will deserialize the strings using … Do you want one large, fully-functional PR / POC, or would you prefer smaller PRs along the way?
Yes, this is correct. But the cleaned up CRD files work fine, and shouldn't cause any problems. Why would you want to serialize the pod specs as a string instead of the full object? This way things still deserialize properly, and we get schema verification on object creation rather than having to handle that ourselves in the controller. Further, things other than the controller have to work with these objects - the gateway api server watches them too. Having the templates directly as objects rather than strings that need further work to interpret makes this nicer. Why would you define them as strings over the typed PodTemplate objects?
I would prefer if you formed a WIP branch and made a PR against this branch. This allows us to discuss code as you work on it, and makes it easier to incrementally merge things upstream when we find good merge points.

Copy that. It's really helpful to hear the rationale behind the decision. Thanks for the clarification!
Any particular reason that we're using a pod template as opposed to a `corev1.Pod`?

Pod templates are commonly used for other objects that create many instances of a pod from a template (jobs, deployments, statefulsets, etc...). A pod is an instantiation of a pod template.
We also add a script to reformat the generated CRD YAML. Without this reformatting, the YAML file is ~12,000 lines!
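A hypothetical sketch of the kind of CRD-cleanup script described above: prune the fully expanded OpenAPI schema below a depth cutoff, keeping only the field types, so the generated YAML stays manageable. The function and field names here are illustrative, not the actual script.

```python
def prune_schema(node, max_depth, depth=0):
    """Recursively drop nested 'properties' below max_depth, keeping only types."""
    if not isinstance(node, dict):
        return node
    if depth >= max_depth and "properties" in node:
        # Collapse the verbose sub-schema to a bare type declaration.
        return {"type": node.get("type", "object")}
    return {k: prune_schema(v, max_depth, depth + 1) for k, v in node.items()}

# Toy stand-in for a generated CRD validation schema with a deeply
# expanded pod template under "scheduler".
schema = {
    "type": "object",
    "properties": {
        "scheduler": {
            "type": "object",
            "properties": {
                "template": {
                    "type": "object",
                    "properties": {"spec": {"type": "object", "properties": {}}},
                }
            },
        }
    },
}

pruned = prune_schema(schema, max_depth=4)
```

The real script presumably targets specific verbose fields rather than a blanket depth cutoff; this only illustrates the shape of the transformation.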
Doesn't do much for now, but basic class outline is here.
Add kubernetes requirements
Just testing the functional python code for now.
The gitignore in the root directory was masking this.
- Reflect k8s state for DaskCluster and Secret objects, updating the internal cluster cache to follow the state of these objects
- Implement `start_cluster`, `get_cluster` and `list_clusters`
I have run into a problem with two solutions, and I'm not sure which is better.

To connect to a cluster, a user needs the temporary credentials we generate for it (TLS key and cert). These need to be stored in kubernetes, and also accessed by the gateway api server. Because we want the gateway to be able to manage clusters in multiple namespaces, this presents a bit of a problem. For a pod to access the secret, it must be in its own namespace. Thus, if we only store the TLS credentials in a secret, then the api server needs a cluster role to read (…

One alternative option is to store the credentials inside the … Is this a bad idea? Keeping a secret in something common like a configmap is a bad idea, since lots of users need access to configmaps. But storing a secret in a custom resource object that is only needed for the application (and other users are unlikely to have access) seems less flawed. AFAICT secrets are just like any other object in kubernetes, except that it's a convention to store secret info in them.

cc @yuvipanda, @jacobtomlinson for thoughts.
A third option would be to create the secrets in the same namespace as the DaskCluster objects, and have the controller mirror them to the namespace the cluster eventually runs in. This is more complicated than storing them in the DaskCluster objects, but keeps secret things stored in secrets. This also allows for admins who can query/modify DaskCluster objects but can't connect to them, since the permissions are separated (could be given …

I guess I'm looking for feedback from someone who knows more about what kubernetes admins might expect. Storing everything in the DaskCluster object makes our code the simplest, so if that's fine then I'd prefer that.
I lean towards storing some credentials in the CR. I don't think that pattern is unheard of, and I'm not sure there's a need to create separate service accounts for access to the CR vs the TLS key and cert. Take my input with a grain of salt, though. I need to do some more digging. I'll also sync up with our SRE team to see if they have any objections.

That would be great; those are exactly the people I want to hear from.
Fourth option: create another CRD (when you're a hammer, everything looks like a nail) that shares the same namespace as the …
@jcrist which pods need access to the TLS secrets? If it's only DG, DG will be deployed to a predetermined namespace, not a dynamically provisioned one, in which case we can store all of the secrets in that one namespace.

No, the scheduler and worker pods will also need to mount their corresponding secret as a volume, and they'll potentially be running in other namespaces.
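To make the mounting concrete, here is an illustrative sketch (plain dicts, not real client calls) of how a scheduler or worker pod spec might have its TLS secret attached as a volume, as described above. The secret name and mount path are assumptions; the `dask-credentials` volume name comes from the PR description.

```python
def with_tls_secret(pod_spec, secret_name, mount_path="/etc/dask-credentials"):
    """Return a copy of pod_spec with the TLS secret mounted into every container."""
    spec = dict(pod_spec)
    # Add a volume backed by the cluster's TLS secret.
    spec["volumes"] = list(spec.get("volumes", [])) + [
        {"name": "dask-credentials", "secret": {"secretName": secret_name}}
    ]
    # Mount that volume read-only into each container.
    spec["containers"] = [
        {
            **c,
            "volumeMounts": list(c.get("volumeMounts", []))
            + [{"name": "dask-credentials", "mountPath": mount_path, "readOnly": True}],
        }
        for c in spec.get("containers", [])
    ]
    return spec

pod = with_tls_secret(
    {"containers": [{"name": "dask-scheduler", "image": "daskgateway/dask-gateway"}]},
    secret_name="my-cluster-tls",
)
```

Because a pod can only mount secrets from its own namespace, the controller must ensure the secret exists wherever the pod runs, which is exactly the constraint being debated here.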
Thanks for clarifying!
In that case, I think that this approach makes a lot of sense. There would be no need to watch the secrets in the dynamic namespaces, so no overhead other than creation. We can make the scheduler the parent of the duplicated secret; that way, as soon as it's deleted, the duplicated secret will be deleted as well.
Scratch that, cross-namespace owner references are prohibited.
This complicates things. I have to dig deeper.
If we make the CRD cluster-scoped, namespace-scoped operands can be owned by the CRs. https://github.com/operator-framework/operator-sdk/blob/master/doc/operator-scope.md#crd-scope

I'm OOO tomorrow but really look forward to deploying and testing on Wednesday.
- Batch creation of pods to reduce load on the api server, and catch expected failures (e.g. due to resource-quotas) earlier.
- Requeue with backoff in the presence of pod operation failure.
- Add a client-side rate limiter for k8s pod operations.

This makes things work nicely in the presence of resource-quotas. If a cluster fails to start a requested pod, it will requeue itself after a backoff period. These errors are caught early to minimize failed api calls.
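The steps above can be sketched in miniature: a token-bucket rate limiter for pod operations, plus exponential backoff with jitter for requeueing after failures. This is a minimal illustration, not the actual implementation; all parameter names and defaults are assumptions.

```python
import random
import time

def backoff_delay(failures, base=0.25, factor=2.0, max_delay=60.0):
    """Exponential backoff with jitter: delay grows with consecutive failures."""
    delay = min(base * (factor ** failures), max_delay)
    return delay * random.uniform(0.5, 1.0)

class RateLimiter:
    """Simple token bucket: at most `rate` operations per second on average,
    with short bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self):
        """Return how long to sleep before the next operation may proceed."""
        now = time.monotonic()
        # Replenish tokens for the time elapsed since the last call.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        wait = (1.0 - self.tokens) / self.rate
        self.tokens -= 1.0
        return wait
```

A cluster whose pod creation fails (e.g. from a resource-quota rejection) would requeue itself after `backoff_delay(n)` seconds for its nth consecutive failure, rather than hammering the api server.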
Todo list: …
Used to fill in `stop_time` in the cluster models returned by the api server.
Cluster records (the DaskCluster objects in k8s) are persisted for a while even after the cluster is stopped. We now add configuration for deleting these objects periodically after a set period of time (defaults to 24 hours).
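The retention rule above amounts to a simple predicate: a stopped record becomes eligible for deletion once its `stop_time` is older than the configured retention period. A minimal sketch, assuming the function and parameter names (only `stop_time` and the 24-hour default come from the description):

```python
from datetime import datetime, timedelta, timezone

def should_delete(stop_time, now, retention=timedelta(hours=24)):
    """Whether a stopped cluster record is old enough to be cleaned up.

    A record with no stop_time is still running and is never deleted.
    """
    if stop_time is None:
        return False
    return now - stop_time >= retention

now = datetime(2020, 1, 2, 12, 0, tzinfo=timezone.utc)
```

A periodic task in the controller would evaluate this for each DaskCluster object and delete the expired ones.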
Hey @jcrist -- this is looking great! We just deployed this to try it out. We needed to tweak a few small things in the … We can hit the gateway, log in, …
Ah, I can see the logic bug here, but I'm curious why it's occurring on your side and not on mine. Can you post the full controller logs?

Should be fixed now.
Simplifies shutdown of queue consumers.
- Add rate limiting to all k8s calls. - Expose rate limiting parameters in the configuration - Expose backoff parameters in the configuration
Move more things to `INFO` level, more uniform/informative messages.
Sorry, changes were pushed up to some CRDs, and I think our issue was that some CRDs were missing permissions (a lot of this, I believe, is due to security restrictions on our end about how verbs are listed, etc.). The bug is fixed -- I can try to deploy the previous version a little later and try to re-trigger the traceback.
On shutdown there are many events from child objects as cascading deletes progress, which leads to many unnecessary reconciliation calls. We now check whether the cluster is shutting down before enqueueing it for reconciliation, which avoids these redundant calls.
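A minimal sketch of this event filter: child-object events for a cluster already shutting down are dropped instead of requeued. The phase names here are assumptions, not the actual status values used by the controller.

```python
# Hypothetical terminal/stopping phases for a DaskCluster object.
SHUTTING_DOWN = {"Stopping", "Stopped", "Failed"}

def should_enqueue(cluster):
    """Whether a child-object event should trigger reconciliation.

    Clusters already shutting down are skipped: cascading deletes will
    generate many events, none of which require further reconciliation.
    """
    phase = (cluster.get("status") or {}).get("phase")
    return phase not in SHUTTING_DOWN

events_to_process = [
    c for c in [{"status": {"phase": "Running"}}, {"status": {"phase": "Stopping"}}]
    if should_enqueue(c)
]
```

The filter sits in front of the work queue, so the reconcile loop itself stays unchanged.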
Did you need to change the service account permissions for the controller/api server? The permissions created by the helm chart should state the required permissions explicitly - if these aren't sufficient then this is a bug in our helm chart. I'd be interested in seeing what you needed to do to get things working.
One notable change was this in the controller RBAC: …

Hmmm, didn't know that …
Most users will probably want to serve over a single port (both HTTP and TCP traffic) rather than splitting the scheduler traffic out to a separate port. We now enable that by default. The client is also updated to infer the proper port from the scheme if the port is not provided (e.g. use port 443 if given ``https://foo.com``). Also fixes RBAC permissions for the controller.
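The port-inference change described above reduces to a small helper: when the gateway address omits an explicit port, fall back to the scheme's default (443 for https, 80 for http). A sketch of that logic (the function name is illustrative):

```python
from urllib.parse import urlsplit

# Default ports per scheme when the address omits one.
DEFAULT_PORTS = {"https": 443, "http": 80}

def infer_port(address):
    """Return the explicit port from `address`, or the scheme's default."""
    parts = urlsplit(address)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

port = infer_port("https://foo.com")  # → 443
```

With this, clients pointed at `https://foo.com` connect on 443 without the user spelling the port out.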
I've fixed the RBAC issue. I also updated the helm chart to serve over a single port by default, as this will likely be the more common configuration. |
This is getting pretty close to ready to merge. I'm going to try and expand unit tests tomorrow, and then hit the merge button. I think getting larger integration tests set up will be a subsequent PR - I've at least verified things work locally, and this PR is pretty big already.
You may find the Operator SDK's documentation on E2E tests to be a helpful reference: https://github.com/operator-framework/operator-sdk/blob/master/doc/test-framework/writing-e2e-tests.md
Do other backends rely on the Golang proxy that you wrote?

Yes. All other backends use our bundled proxy; the kubernetes backend is the only one using traefik.
Ok, I'm going to merge this. Since the design has changed a few times during this PR (and some of the above comments are out-of-date), here is an up-to-date summary of what was implemented here:
There are currently some unit tests, mostly checking that created k8s objects have the appropriate fields and that configuration is forwarded appropriately. I've poked at things locally and everything seems ok. Integration tests will be added in a subsequent PR.
This implements (or will) the design proposed in #198. The kubernetes backend is broken into 2 components:

- A `Backend` subclass, responsible for creating and querying `DaskCluster` objects
- … `DaskCluster` objects

The controller is responsible for the following:

- … each `DaskCluster` object. The secret is mounted as a volume named `dask-credentials` in this pod.
- … `dask-credentials`.
- … `cluster.spec.worker.replicas`.

Fixes #198.
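One of the controller responsibilities above is converging the running worker pods on `cluster.spec.worker.replicas`. An illustrative sketch of that reconciliation step (the function shape and scale-down policy are assumptions, not the actual controller code):

```python
def reconcile_workers(desired_replicas, running_workers):
    """Return (n_to_create, workers_to_delete) to converge on the spec.

    `running_workers` is a list of worker pod names, ordered oldest first.
    """
    diff = desired_replicas - len(running_workers)
    if diff >= 0:
        # Under the target: create the missing workers, delete nothing.
        return diff, []
    # Over the target: delete the newest workers first (an arbitrary
    # choice here; a real controller might prefer idle workers).
    return 0, running_workers[diff:]

to_create, to_delete = reconcile_workers(3, ["worker-a"])  # → (2, [])
```

Each reconcile pass recomputes this from observed state, so repeated or out-of-order events still converge on the desired replica count.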
(originally opened as #205, closed and reopened from a branch hosted on the dask upstream repo to make it easier for other contributors. GitHub doesn't let you change the source branch on an open PR)