Feature/admission webhook improvements #577

Merged

Conversation

kesselborn
Contributor

This PR adds some conveniences around the admission webhook -- every time I needed to implement a webhook, I was quite annoyed by the many moving parts that have to be coordinated.
This looks like a lot of code, but quite a bit of it is documentation and test code + the derive code is more or less trivial but annoying to implement for each project individually.
It's probably easiest to look at each commit individually, as they are split up between krator and krator_derive.
This is still blocked until #574 (or the bug that one tries to fix) is fixed, but perhaps interested people can already have a look at it.
/cc @kflansburg

@kflansburg
Collaborator

Hey @kesselborn thanks for putting this together. Something like this was definitely on my roadmap.

This will probably take some time to review, but I do have some questions. The thing that originally prevented me from doing something like this in the moose example is that it requires the operator itself to have RBAC permissions to make these configuration changes, which I don't believe is best practice. Does this require those elevated permissions? What are your thoughts on this? I had planned to implement some sort of tooling like a Cargo plugin to deploy the operator and make this configuration.

@kesselborn
Contributor Author

Yes: if you want to do it the way I added it to the moose example, you would need elevated permissions ... but as it is only an example, I think that's ok + you can see how you can use those functions.
In an operator I am currently writing, you can just output the CRD when calling it with an option.

That's why I added the type WebhookResources as a wrapper type and implemented the Display trait for it. This is analogous to the crd() function from the kube_derive macro ... so you can either just output the manifests as YAML or write code that installs those resources. The code can then be run locally or within the cluster.

With this approach, you have total flexibility when you provide an operator:

  • you could just auto-generate the files in a pre-commit hook and add the yaml-files to your repository
  • you could create a binary that installs the files and can be run once before deploying the operator
  • you could auto-generate a very specific RBAC file that only allows installing exactly these resources
  • you could document how users can create the yaml files

... the quick win is that working code and documentation don't diverge.
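
To illustrate the "print the manifests" idea, here is a minimal sketch of a wrapper type whose Display output is a multi-document YAML stream. The type `WebhookManifests` and its fields are hypothetical stand-ins: the real WebhookResources wraps typed Kubernetes objects, whereas this sketch uses plain strings to stay dependency-free.

```rust
use std::fmt;

/// Hypothetical stand-in for the wrapper type discussed above.
/// Each field holds one already-rendered YAML document.
pub struct WebhookManifests {
    pub service: String,
    pub secret: String,
    pub webhook_config: String,
}

impl fmt::Display for WebhookManifests {
    /// Renders all manifests as one YAML stream, separated by `---`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let docs = [&self.service, &self.secret, &self.webhook_config];
        for (i, doc) in docs.iter().enumerate() {
            if i > 0 {
                writeln!(f, "---")?;
            }
            // Normalize trailing whitespace so every document ends with one newline.
            writeln!(f, "{}", doc.trim_end())?;
        }
        Ok(())
    }
}
```

With something like this, a binary could print the value to stdout and the result could be piped into `kubectl apply -f -`, so no extra cluster permissions are needed by the operator itself.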

While I like cargo plugins, this would make Rust a dependency for people who want to use the operator and I think Rust is not (yet) very common for many people. As an Operator-Provider, I would like people to be able to use the operator without the need to use something other than the normal Kubernetes-Tooling

@kesselborn
Contributor Author

I am currently wondering whether it would make sense to extract the derive macro into its own little crate. It's not really related to krator and could be used independently as well.
The only remaining part would be the extraction of the certificate and the private key -- what do you think, @kflansburg?

I'll think about it and probably adjust the pull request accordingly

@kflansburg kflansburg force-pushed the feature/admission-webhook-improvements branch from 4c5e48d to 1a03183 Compare April 16, 2021 16:51
kesselborn and others added 3 commits April 16, 2021 11:56
This macro provides convenience methods for easily creating the necessary
Kubernetes resources for running an admission webhook:

- a mutating webhook configuration
- a service specification for the webhook
- a secret containing a certificate and a private key for the webhook
- add krator_derive tests to justfile
- in admission.rs: add new type WebhookResources that provides some
  convenience methods for the webhook-resources tuple (Service, Secret, MutatingWebhookConfiguration)
  that gets returned by the derive macro
- add new method

      async fn admission_hook_tls(&self) -> anyhow::Result<krator::admission::AdmissionTLS>

  to the operator trait. This method has to be implemented when the admission-webhook
  feature is enabled and needs to return an AdmissionTLS struct containing a certificate and a private key for
  the webhook web server. This way, passing this information does not have to rely on files whose path is given via
  env vars. Furthermore, add convenience methods that convert a Kubernetes Secret to an
  AdmissionTLS struct, so saving the TLS information in a
  Kubernetes Secret is an easy use case
- add brief README to the moose example
- adjust Moose example so that it uses the new admission-webhook macro
  from `krator_derive`; add functionality to automatically install the
  CRD and the necessary admission webhook resources on startup
- add build and compilation steps from krator to `justfile` so breaking
  changes in these crates make the build fail
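
The Secret-to-TLS conversion mentioned in the commit message could look roughly like the sketch below. `TlsMaterial` and `tls_from_secret_data` are hypothetical names, not the krator API, and the sketch assumes the Secret uses the conventional `kubernetes.io/tls` keys `tls.crt` and `tls.key` with PEM (UTF-8) content that has already been base64-decoded.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the certificate and key the webhook server needs.
#[derive(Debug, PartialEq)]
pub struct TlsMaterial {
    pub cert: String,
    pub private_key: String,
}

/// Builds `TlsMaterial` from the decoded `data` map of a Secret.
/// Assumption: the keys are `tls.crt` and `tls.key`, as in `kubernetes.io/tls` Secrets.
pub fn tls_from_secret_data(data: &BTreeMap<String, Vec<u8>>) -> Result<TlsMaterial, String> {
    // Looks up one key and converts its bytes to a UTF-8 string.
    let field = |key: &str| -> Result<String, String> {
        let bytes = data
            .get(key)
            .ok_or_else(|| format!("secret is missing key `{}`", key))?;
        String::from_utf8(bytes.clone())
            .map_err(|e| format!("key `{}` is not valid UTF-8: {}", key, e))
    };
    Ok(TlsMaterial {
        cert: field("tls.crt")?,
        private_key: field("tls.key")?,
    })
}
```

Returning an error for a missing key, rather than panicking, lets the operator report a misconfigured Secret at startup.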
@kflansburg kflansburg force-pushed the feature/admission-webhook-improvements branch from 1a03183 to 7c47872 Compare April 16, 2021 16:56
@kflansburg kflansburg changed the title [blocked by #574] Feature/admission webhook improvements Feature/admission webhook improvements Apr 16, 2021
Collaborator

@kflansburg kflansburg left a comment

I think this looks good, albeit very complex. IMO the best practice for this would be to emit manifests / kustomize + Makefile in a similar fashion to kubebuilder, with auto-apply being an opt-in feature flag. I do think this gets us closer to that so I'm in favor of merging after some of my comments are addressed.

@thomastaylor312 I think you should take a look at this since it touches a lot of stuff in krator-derive. I'm not sure how common it is to have so many dependencies in a derive crate but it seems like an antipattern.

crates/krator-derive/Cargo.toml (outdated)
crates/krator-derive/Cargo.toml (outdated)
crates/krator-derive/src/lib.rs
crates/krator-derive/tests/admission.rs (outdated)
crates/krator/Cargo.toml
crates/krator/examples/moose.rs
crates/krator/examples/moose.rs (outdated)
crates/krator/examples/moose.rs (outdated)
crates/krator-derive/src/admission.rs
crates/krator/examples/moose.rs (outdated)
@kesselborn
Contributor Author

@kflansburg sorry for the late reply and thanks for taking the time to review this rather big PR -- very much appreciated.

Before addressing the comments, I would like to discuss again whether it makes sense to make this part of krator_derive. I implemented it there as I was using krator and was missing the functionality. Thinking about it again, however, I think none of the additions are related to krator, and they could just as well be used when writing an operator w/o krator ... whereas pulling in krator_derive without using krator makes little sense.

I think extracting this into an individual crate is probably cleaner. What do you think?

Regarding pulling in a lot of additional dependencies via a proc macro: I was wondering about this myself, and would be interested in @thomastaylor312's opinion here as well.

Member

@thomastaylor312 thomastaylor312 left a comment

This is looking pretty good. Thank you for adding tests and cleaning up the derive stuff! Insofar as I understand, I don't think we need to import any dependencies for the derive macros to work, but I could be wrong.

crates/krator-derive/Cargo.toml (outdated)
crates/krator-derive/src/admission.rs (outdated)
crates/krator-derive/src/admission.rs (outdated)
crates/krator-derive/src/admission.rs (outdated)
crates/krator-derive/src/lib.rs
crates/krator-derive/Cargo.toml (outdated)
/// the certificate from the given service and the service of the given service as configuration
///
/// Example
/// ```
Member

This is a great example! The one problem (like in a previous comment) is that this code cannot depend on krator, or we get a weird circular dependency issue. If we can get it so that isn't needed in dev-dependencies, this will work fine. If it does need a dependency on krator, we'll want to move this example there.

Contributor Author

Is it really a problem in dev-dependencies as well? I got problems when having circular dependencies in dependencies but no complaints in dev-dependencies ... but that's perhaps just a cheap "works for me" ;)

Member

I think we are ok for now. I'll do some double checking later

crates/krator/examples/moose.rs
@kflansburg
Collaborator

@kesselborn Given that krator-derive should not depend on krator, does that address your concerns about splitting it out into a new crate? I think for now it makes most sense to me in krator-derive. I think this could still be used without krator.

@kesselborn
Contributor Author

@kflansburg regarding the splitting out into a separate crate: the transition macro creates a

impl krator::TransitionTo

block ... doesn't that make it kind of bound to krator?
However: happy to stand corrected, I'm not too familiar with what is most common

I was under the assumption that it made sense to require dependencies of
the code that gets produced by the macro so that they are available as
transitive dependencies
@thomastaylor312
Member

thomastaylor312 commented Apr 21, 2021

To clarify, it doesn't depend on krator in the Cargo dependency sense. I think my comment here explains that a bit more. However, I also think for now this works here, but as people start wanting to use it for things outside of krator, we could split it out to a new crate

Adds three possible attributes for the admission-webhook feature:

- secret
- service
- admission_webhook_config

This way, users can opt-out if they don't need all functions.

All features are documented and the resulting dependencies documented
@kesselborn
Contributor Author

@kflansburg / @thomastaylor312 : can we move this forward somehow? I know, it's quite a lot of comments and therefore some work to review.

I resolved all comments apart from the ones, where I had a follow-up

@kflansburg
Collaborator

@kflansburg / @thomastaylor312 : can we move this forward somehow? I know, it's quite a lot of comments and therefore some work to review.

I resolved all comments apart from the ones, where I had a follow-up

Apologies, I've been very busy this last week. I will try to take a final look at this soon.

@bacongobbler
Collaborator

Hey so I'm trying to run the examples, but I'm running into an issue with the admission webhook being unreachable:

><> k create -f manfred.yaml
Error from server (InternalError): error when creating "manfred.yaml": Internal error occurred: failed calling webhook "mooses.animals.com": Post "https://mooses-animals-com-admission-webhook.default.svc:443/?timeout=10s": dial tcp 10.96.19.121:443: connect: connection refused

I'm trying to follow the guide as directed in the logs:

May 05 09:36:15.807 moose: 
    
Running moose example. Try to install some of the manifests provided in examples/assets

This is with a KinD cluster, in case that matters. I'll try with AKS and see if my results differ.

@bacongobbler bacongobbler self-requested a review May 5, 2021 17:09
@bacongobbler
Collaborator

Oddly I receive a different result with AKS:

><> k create -f manfred.yaml
Error from server (InternalError): error when creating "manfred.yaml": Internal error occurred: failed calling webhook "mooses.animals.com": Post "https://mooses-animals-com-admission-webhook.default.svc:443/?timeout=10s": read unix @->/socket/konnectivity: read: connection reset by peer

@bacongobbler
Collaborator

Things work just fine without the admission webhook feature enabled.

><> k create -f manfred.yaml
moose.animals.com/manfred created
><> k create -f melissa.yaml
moose.animals.com/melissa created
Running moose example. Try to install some of the manifests provided in examples/assets
    
    
    at crates/krator/examples/moose.rs:529

  May 05 10:36:29.473 krator::runtime: Got a watch restart. Resyncing queue...
    at crates/krator/src/runtime.rs:297

  May 05 10:36:29.473 krator::runtime: Finished resync of objects.
    at crates/krator/src/runtime.rs:300

  May 05 10:36:53.484 moose: Found new moose named manfred!
    at crates/krator/examples/moose.rs:127

  May 05 10:37:00.263 moose: Found new moose named melissa!
    at crates/krator/examples/moose.rs:127

@kesselborn
Contributor Author

Hey @bacongobbler: do you run the moose example outside the cluster, or do you deploy it into the cluster? If you run it outside the cluster, you need to do some "hacking" so that the example can process the mutation requests ... if you look further up in the log, it should have the corresponding instructions:

If you run this example outside of Kubernetes (i.e. with `cargo run`), you need to make the webhook available.
Try the script example/assets/use-external-endpoint.sh to redirect webhook traffic to this process. If this
operator runs within Kubernetes and you use the webhook resources provided by the admission-webhook macro, 
make sure your deployment has the following labels set: ...

I tested locally using kind as well ... if you test it with AKS, you would have to make sure that the Kubernetes cluster can talk to your process

@kesselborn
Contributor Author

So, just to give more context: these admission webhooks are really just that: webhooks. In order to provide a webhook, the example app serves an HTTPS endpoint to which Kubernetes posts moose manifests whenever you create a new moose. This endpoint needs to be available through the service mooses-animals-com-admission-webhook. When you run the executable outside the cluster, you need to make the endpoint available to this service ... this is what the script example/assets/use-external-endpoint.sh does for local setups like kind.

If you use AKS, you should deploy the moose example into the cluster setting the appropriate labels (they show up in the log as well), as this script will most probably not work.
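
For readers unfamiliar with the protocol: the API server sends an `AdmissionReview` object to the endpoint and expects an `AdmissionReview` back whose `response.uid` echoes the request's uid. The following is a minimal, dependency-free sketch of that response body. The function name is hypothetical, and it assumes the uid contains no characters that need JSON escaping; a real implementation would use a JSON library.

```rust
/// Builds the JSON body of an `admission.k8s.io/v1` AdmissionReview response.
/// `uid` must be copied from the incoming request.
/// Assumption: `uid` needs no JSON escaping (it is normally a UUID).
pub fn admission_response(uid: &str, allowed: bool) -> String {
    format!(
        r#"{{"apiVersion":"admission.k8s.io/v1","kind":"AdmissionReview","response":{{"uid":"{}","allowed":{}}}}}"#,
        uid, allowed
    )
}
```

A mutating webhook would additionally include a patch in the response; that part is omitted here.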

@bacongobbler
Collaborator

Got it. Thanks for the context.

However, even with the script it still doesn't seem to be working. I'm not sure if the admission webhook is picking up on the new endpoint.

><> ./examples/assets/use-external-endpoint.sh
service/mooses-animals-com-admission-webhook patched
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
endpoints/mooses-animals-com-admission-webhook configured
><> k create -f examples/assets/manfred.yaml
Error from server (InternalError): error when creating "examples/assets/manfred.yaml": Internal error occurred: failed calling webhook "mooses.animals.com": Post "https://mooses-animals-com-admission-webhook.default.svc:443/?timeout=10s": dial tcp 10.96.78.189:443: connect: connection refused

Even if it did work, I don't think hard-coding to 172.17.0.1 will work for WSL2 as the network bridge from KinD to the host network is different for that operating system. We documented this process in the HOWTO guide for Krustlet if that helps.

For reference: the patch succeeded, and the endpoint exists.

><> k get svc mooses-animals-com-admission-webhook -o yaml | tail -n 12
spec:
  clusterIP: 10.96.78.189
  clusterIPs:
  - 10.96.78.189
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8443
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
><> k get endpoints
NAME                                   ENDPOINTS         AGE
kubernetes                             172.18.0.2:6443   13m
mooses-animals-com-admission-webhook   172.17.0.1:8443   9m1s
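
When the Service and Endpoints look right but the call still fails with "connection refused", one thing worth checking is whether anything is actually listening on the address the Endpoints object points to. A small sketch of such a check, using only the standard library (the function name is hypothetical):

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` succeeds within `timeout`.
/// This only proves that something accepts connections there; it does not
/// verify TLS or that the listener is the webhook.
pub fn is_reachable(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}
```

For the setup above, the address to probe from the host would be the one listed under ENDPOINTS; whether that address is reachable from inside the cluster depends on the local network bridge.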

@bacongobbler
Collaborator

bacongobbler commented May 5, 2021

As a note: I don't think my comments here are blocking this from going through. I'm just providing my (relatively naive) feedback here to inform you how someone new to the feature is experimenting. Doc/tooling fixes shouldn't block this from going through IMO.

That being said, the concern raised earlier by @kflansburg about requiring escalated privileges to register an admission webhook looks like a potential blocker. If one were to find a security flaw in an operator's admission webhook, the attacker would now have escalated cluster privileges. That seems like a very concerning attack vector.

How do other operator frameworks register admission webhooks? Are there ways to avoid the need for privilege escalation? If not, are there ways to mitigate an attack?

@kesselborn
Contributor Author

Yeah ... testing this outside the cluster is a workaround to speed up development, and there will never be a really good solution for it. I just tried to reach parity with the existing scripts in the current example code.

The correct way to test this is to deploy the moose operator into a Kubernetes cluster.

Regarding the security concerns: this is an example operator which is optimized for ease of use. You only need more privileges if you want to install necessary resources with the operator binary -- but that's not necessary. There are actually specific functions that will print out the manifests for you, so you can install the resources with kubectl, helm, etc.

Letting the operator produce these manifests is something I like as the operator and the documentation are not diverging -- you can always create the manifests that fit your operator.

Collaborator

@kflansburg kflansburg left a comment

@kesselborn We have discussed and believe that it is not a good idea to include this security vulnerability as part of the demo, as it will encourage bad behavior. If you can change it to generate the YAML for the user to manually apply, and update the README, then I will approve. If you really want to keep this functionality, then I would ask that this specific behavior is gated behind a specific feature with danger in the name.
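
As a rough illustration of the opt-in gating suggested here, the auto-apply path could be guarded by a Cargo feature. The feature name `dangerous-auto-apply` and the function below are hypothetical, not something this PR defines:

```rust
/// Succeeds only when the crate is built with the (hypothetical)
/// `dangerous-auto-apply` feature; otherwise it points the user to the
/// manual path of printing the manifests and applying them with kubectl.
pub fn auto_apply_allowed() -> Result<(), String> {
    if cfg!(feature = "dangerous-auto-apply") {
        Ok(())
    } else {
        Err("auto-apply is disabled; print the manifests and apply them manually".to_string())
    }
}
```

Because Cargo features are off unless explicitly requested, the default build would take the manual path.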

@bacongobbler Were you able to get the Endpoint working? Was the issue with the gateway being used and WSL?

@thomastaylor312 You had also requested changes. Can you indicate whether those have been resolved?

@bacongobbler
Collaborator

Thanks for the ping. In the end I was able to get it to work - just a quick misunderstanding of the networking setup on my end.

@kesselborn kesselborn force-pushed the feature/admission-webhook-improvements branch from 9da7a4d to cd74b85 Compare May 7, 2021 16:28
@kesselborn
Contributor Author

@kflansburg I do agree that this sets a bad precedent. I adjusted the example accordingly and added a comment to the apply functions noting that they should not be called in the operator.

@bacongobbler
Collaborator

Providing some additional context as discussed offline...

We (Taylor and I) are maintainers on several large open source projects. Helm being one of them. In Helm 2, our stance was to provide a permissive default configuration. This allowed first-time users to start experimenting with Helm and Kubernetes without having to dive headfirst into the security controls. Unfortunately, this permissive configuration could grant a user a broad range of permissions they weren't intended to have. And for the most part users ignored the security documentation, instead choosing to criticize the project for "not doing the right thing" by default.

Learning from our mistakes, it's best to demonstrate "best practices" as the default rather than skewing towards ease-of-use. Especially when "ease-of-use" means skirting certain security controls.

Another great example of providing insecure defaults can be shown in this article for MongoDB instances being hacked. The attackers hacked instances that did not have password-protected admin accounts. As in, they attacked MongoDB instances using the default settings.

As the Moose demonstration is the de-facto example for krator, I would not be surprised that many users will copy the example and apply it to their projects. Hence the concern.

@thomastaylor312 thomastaylor312 dismissed their stale review May 7, 2021 17:18

All comments were addressed, leaving final review for @kflansburg

@kesselborn
Contributor Author

kesselborn commented May 7, 2021

@bacongobbler thanks for the explanation -- I absolutely understand those points and do agree that this would have set a bad example :)

Looking into the failed checks now

as discussed: this doesn't reflect best practice due to necessary
permissions
@kesselborn kesselborn force-pushed the feature/admission-webhook-improvements branch from cd74b85 to ac54528 Compare May 7, 2021 17:29
Collaborator

@kflansburg kflansburg left a comment

LGTM, thanks for your patience!

@kflansburg kflansburg merged commit c42b7b9 into krustlet:main May 11, 2021

4 participants