[devworkspace] Generate webhook server cert with job on kubernetes #17855

Closed
sleshchenko opened this issue Sep 14, 2020 · 19 comments

@sleshchenko
Member

Is your enhancement related to a problem? Please describe.

Currently, the devworkspace controller with webhooks enabled needs a certificate.
On OpenShift we use the out-of-the-box certificate signing service to generate such a certificate; for Kubernetes we have a pre-install step that generates those certificates and creates them on the cluster.
That pre-install step needs to be reworked so that it happens at the same stage as on the OpenShift infra: during webhook server deployment.
To generate the certificates it's proposed to reuse the same job the Che Operator uses: https://github.com/eclipse/che-operator/blob/master/pkg/deploy/tls.go#L277
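
For illustration only (a rough sketch, not the actual che-operator job code), the core of such a job boils down to generating a self-signed key/cert pair for the webhook server and storing it; the CN and service DNS name below are assumptions:

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"os"
	"time"
)

func main() {
	// Generate a key pair for the webhook server.
	key, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		panic(err)
	}

	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "devworkspace-webhook-server"},
		// Assumed in-cluster service DNS name for the webhook server.
		DNSNames:    []string{"devworkspace-webhookserver.devworkspace-controller.svc"},
		NotBefore:   time.Now(),
		NotAfter:    time.Now().AddDate(1, 0, 0),
		KeyUsage:    x509.KeyUsageDigitalSignature | x509.KeyUsageKeyEncipherment,
		ExtKeyUsage: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}

	// Self-signed: the template acts as its own parent/issuer.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}

	// Write tls.crt/tls.key; the Job would then put these into a Secret
	// mounted by the webhook server deployment.
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "RSA PRIVATE KEY", Bytes: x509.MarshalPKCS1PrivateKey(key)})
	if err := os.WriteFile("tls.crt", certPEM, 0600); err != nil {
		panic(err)
	}
	if err := os.WriteFile("tls.key", keyPEM, 0600); err != nil {
		panic(err)
	}
}
```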

@sleshchenko sleshchenko added kind/enhancement A feature request - must adhere to the feature request template. engine/devworkspace Issues related to Che configured to use the devworkspace controller as workspace engine. labels Sep 14, 2020
@che-bot che-bot added the status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. label Sep 14, 2020
@sleshchenko sleshchenko added this to the 7.20 milestone Sep 14, 2020
@vparfonov vparfonov added severity/P2 Has a minor but important impact to the usage or development of the system. and removed status/need-triage An issue that needs to be prioritized by the curator responsible for the triage. See https://github. labels Sep 14, 2020
@sleshchenko sleshchenko added this to To do in Controller team sprint #190 via automation Sep 21, 2020
@JPinkney JPinkney self-assigned this Sep 21, 2020
@sleshchenko sleshchenko moved this from To do to In progress in Controller team sprint #190 Sep 21, 2020
@l0rd
Contributor

l0rd commented Oct 6, 2020

Why aren't we using certmanager on Kubernetes? It looks like the industry-trusted tool to manage certificates on Kubernetes.

@sleshchenko
Member Author

sleshchenko commented Oct 6, 2020

Better to ask the deploy team.
I assume we don't want to depend on cert-manager since it's not something we can easily configure out-of-the-box in 100% of cases. So a job is a really lightweight option, and the user still has the ability to pre-create/configure the TLS secrets with whichever solution they want, cert-manager or any other.
We can reuse the user's generated TLS cert for the devworkspace controller, but the secrets should have distinct names. To improve that we may want to make these secret names configurable.

@tolusha @mmorhun Do you have any comments why we moved from cert-manager on operator/chectl side?

@mmorhun
Contributor

mmorhun commented Oct 6, 2020

In the first iteration of bringing TLS by default into Eclipse Che we used Cert-Manager with the Helm installer. At that point it sounded like a perfect solution. However, experience showed that it has its own downsides. What I recall:

  • It requires a root CA key/certificate if we need a server certificate, so there is still a need for a Kubernetes job to generate it.
  • Cert-Manager is deployed via a YAML manifest, so it is not easy to customize. For example, it requires a cert-manager namespace to be installed into, but sometimes the user who installs Che doesn't have permission to create it and the installation just fails. Theoretically we could alter the YAMLs, but that doesn't sound like a good solution...
  • We need to maintain it manually: from deployment (waiting and checking that all its components are deployed and ready) to handling conflicts if it is already installed and maybe has different settings.
  • Cert-Manager has its own set of resources (and some pods always running), which consumes cluster resources and has a notable impact on Che installation time (mostly due to image pulling, but its startup time also counts).

So, for the operator we decided to use our own lightweight job which just gets the work done. The operator launches it only if there is no existing certificate, so a user can pre-create one manually or via Cert-Manager or another tool and Che will use it.
It is possible to use Cert-Manager; the question is the balance between the effort required to implement a full integration and then maintain it vs. the functionality it provides.
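
As a rough sketch of that flow (assuming a controller-runtime client; the secret/job names and image below are illustrative placeholders, not necessarily the real operator values):

```go
package deploy

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ensureTLSSecret launches the cert-generation Job only when the TLS secret
// is missing, so a pre-created secret (manual, Cert-Manager, or any other
// tool) is always preferred.
func ensureTLSSecret(ctx context.Context, c client.Client, ns string) error {
	secret := &corev1.Secret{}
	err := c.Get(ctx, client.ObjectKey{Namespace: ns, Name: "che-tls"}, secret)
	if err == nil {
		return nil // secret already exists; reuse it as-is
	}
	if !errors.IsNotFound(err) {
		return err
	}
	// No secret found: launch the lightweight cert-generation Job.
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "che-tls-job", Namespace: ns},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyOnFailure,
					Containers: []corev1.Container{{
						Name:  "cert-gen",
						Image: "quay.io/example/che-cert-gen:latest", // placeholder image
					}},
				},
			},
		},
	}
	return c.Create(ctx, job)
}
```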

@mmorhun mmorhun closed this as completed Oct 6, 2020
Controller team sprint #190 automation moved this from In progress to Done Oct 6, 2020
@mmorhun mmorhun reopened this Oct 6, 2020
Controller team sprint #190 automation moved this from Done to In progress Oct 6, 2020
@sleshchenko
Member Author

P.S. Depending on the cert-manager operator means being blocked from supporting OpenShift 4.6 until they migrate to the new bundle format. And it does not even seem to be present on operatorhub.io: https://operatorhub.io/?keyword=manager
The operator referenced in the docs returns a 404: https://cert-manager.io/docs/installation/openshift/#installing-with-cert-manager-operator

@mmorhun
Contributor

mmorhun commented Oct 6, 2020

I was always wondering how it is possible to accidentally close an issue. It is. Sorry, reopened.

@l0rd
Contributor

l0rd commented Oct 7, 2020

@sleshchenko I am sorry for not spotting that before, but I don't want us to make bad decisions that we may need to pay for in the future. We are not in the certificate management business. Maintaining a certificate manager is something I don't want us to deal with. I want to be really sure that there are no better alternatives.

P.S. Depending on the cert-manager operator means being blocked from supporting OpenShift 4.6 until they migrate to the new bundle format. And it does not even seem to be present on operatorhub.io: operatorhub.io/?keyword=manager
The operator referenced in the docs returns a 404: cert-manager.io/docs/installation/openshift/#installing-with-cert-manager-operator

On OpenShift we should not use certmanager but the cert signing service.

@mmorhun I am not sure if certmanager is the right tool (maybe cfssl is a better choice?) but we definitely should avoid using che-cert-manager-ca-cert-generator-image because that's a hack that works for a PoC.

  • It requires a root CA key/certificate if we need a server certificate, so there is still a need for a Kubernetes job to generate it.

I am perfectly ok if we provide a link to the instructions for configuring certmanager. We should not generate the CA key/certificate; the user should do that with their own PKI infra.

  • Cert-Manager is deployed via a YAML manifest, so it is not easy to customize. For example, it requires a cert-manager namespace to be installed into, but sometimes the user who installs Che doesn't have permission to create it and the installation just fails. Theoretically we could alter the YAMLs, but that doesn't sound like a good solution...

Che operator should not install/configure certmanager. It should be a pre-requisite.

  • We need to maintain it manually: from deployment (waiting and checking that all its components are deployed and ready) to handling conflicts if it is already installed and maybe has different settings.

Sure, but the effort is justified because we will support an industry standard, whereas maintaining our certificate signer image has less value: customers cannot really use it in production (we do not manage cert rotation, revocation, etc.).

  • Cert-Manager has its own set of resources (and some pods always running), which consumes cluster resources and has a notable impact on Che installation time (mostly due to image pulling, but its startup time also counts).

Again: that should not be part of Che installation time.

@sleshchenko
Member Author

@l0rd I'm OK with trying to find a better approach.
I believe we started self-signed cert support with TLS secrets as prerequisites; we had documentation with an article on how to set up self-signed certs. But at the time we were enabling TLS by default, we ran into the issue that we want to automate the certificate setup process with chectl server:start, because it's not convenient to manually set up certs over and over every time you install minikube, for example.

Maintaining our certificate signer image has less value: customers cannot really use it in production (we do not manage cert rotation, revocation, etc.).

Good point! But we still benefit from using a cert-gen job during local testing, as do users who are just getting familiar with Che on their minikubes (if any =) )
Would it be enough to get rid of responsibility for TLS in production if we do the following?

  • disable the cert-gen job by default:
    • chectl server:start fails if there are no secrets;
    • the CheCluster CR status is filled with an error if TLS secrets are not found;
  • in our documentation, reference articles on how to set up self-signed certificates on Kubernetes, describe which SAN (DNS) we need, which cert parts (CA, tls.crt, tls.key) are expected in which objects (ConfigMap, Secrets), and where their names can be configured;
  • solve the issue with local testing by:
    • introducing a field in the CheCluster CR (probably a hidden one, like CHE_MULTIUSER in customCheProperties) to enable the cert generated by the Che Operator;
    • giving chectl start an option with a clear description that it IS NOT production-ready and MUST NOT be used for real clusters, tests only;

If we agree on a plan like the one above, then on the devworkspace controller side we won't make cert generation part of the controller; as before, we'll generate certificates as deployment prerequisites (now one for the webhook server, later one more for workspace endpoints), with a non-production makefile rule that inits certificates on minikube.

Devworkspace Operator aspect:
The only difference I see here: the Che operator can be installed properly without TLS certificates and then fail/expose an error while setting up the CheCluster CR, but the DevWorkspace Operator is going to fail during installation without TLS certs, unless we agree that we don't support webhooks on the K8s cluster (which means creator-only access is omitted). (I don't think it's typical for an operator to have such prerequisites; any example of such an operator?)
AFAIK OLM could provide certificates for the webhook manager for us (not sure it's implemented for k8s), but if we use OLM for that, we end up with unsecured workspace resources (pods) once the DevWorkspace Operator is uninstalled (again, creator-only access);
Hopefully, by the time we provide the DevWorkspace Operator on K8s, OLM will solve all these issues for us, like providing certs (if not there already) and cleaning up CRs along with all related resources on operator uninstall (Angel already created an issue on the OLM side).

P.S. Sorry for so many words; I just tried to be clear enough, and it doesn't seem to be a simple topic where I can be brief.
cc @amisevsk @JPinkney

@l0rd
Contributor

l0rd commented Oct 7, 2020

I don't understand. The nginx ingress controller comes with a default TLS certificate. If the secret in the Ingress is omitted then the nginx default one will be used. What's the motivation that pushed us to generate an untrusted certificate when we could use the nginx default one? I am talking mainly about minikube, and I have a hello world sample here that works on minikube.

@l0rd
Contributor

l0rd commented Oct 7, 2020

For devworkspace I would disable the webhook if no cert+key is provided and the cluster doesn't provide a cert signing service.
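
Roughly something like this (a sketch assuming a controller-runtime based manager; the cert directory below is just the controller-runtime default, not necessarily what the devworkspace controller uses):

```go
package main

import (
	"os"
	"path/filepath"

	ctrl "sigs.k8s.io/controller-runtime"
)

// setupWebhooks registers webhooks only when a serving cert/key pair is
// actually available; otherwise the operator starts without webhooks instead
// of crash-looping on a missing certificate.
func setupWebhooks(mgr ctrl.Manager) error {
	certDir := "/tmp/k8s-webhook-server/serving-certs" // controller-runtime default
	_, certErr := os.Stat(filepath.Join(certDir, "tls.crt"))
	_, keyErr := os.Stat(filepath.Join(certDir, "tls.key"))
	if certErr != nil || keyErr != nil {
		// No cert+key provided and no signing service filled them in:
		// skip webhook registration (creator-only access checks are off).
		return nil
	}
	// Cert and key are present: register the webhook server as usual.
	// (Actual webhook registration omitted; it depends on the operator's API types.)
	_ = mgr.GetWebhookServer()
	return nil
}
```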

@tolusha
Contributor

tolusha commented Oct 8, 2020

@mmorhun
WDYT ?

@sleshchenko
Member Author

sleshchenko commented Oct 8, 2020

The nginx ingress controller comes with a default TLS certificate. If the secret in the Ingress is omitted then the nginx default one will be used.

Thanks for sharing. For some reason I missed that feature at the time I worked with self-signed certs.
I tested it, and as far as I can see, despite the ability for a user to add an exception in the browser, it's not a valid self-signed certificate for Che:
Screenshot_20201008_103928

For comparison: Che-generated self-signed cert details

Screenshot_20201008_104519

I think it means that we won't be able to properly configure the Che components' (plugin broker, Che Server, Theia) trust stores; it's going to fail even if we propagate the default nginx cert's CA, because of the DNS mismatch. The only way we can make Che work with such a default cert is to configure everything to ignore TLS verification.
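
To illustrate the mismatch concretely (a throwaway check; the address and Che host below are assumptions for a local minikube):

```go
package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	// Fetch whatever certificate the ingress controller actually serves,
	// skipping chain verification so we can inspect it.
	conn, err := tls.Dial("tcp", "192.168.99.100:443", &tls.Config{InsecureSkipVerify: true})
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	cert := conn.ConnectionState().PeerCertificates[0]
	// The default "Kubernetes Ingress Controller Fake Certificate" has no SAN
	// matching a real Che host, so this check fails regardless of CA trust.
	if err := cert.VerifyHostname("che-che.192.168.99.100.nip.io"); err != nil {
		fmt.Println("default ingress cert is not valid for the Che host:", err)
	}
}
```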

P.S.

Nginx generates certificates for its webhook server with the https://github.com/jet/kube-webhook-certgen job on minikube

Screenshot_20201008_110910
Screenshot_20201008_122830
Screenshot_20201008_122720

I'm not trying to use it as an argument to go that way; it's just a fact.

@mmorhun
Contributor

mmorhun commented Oct 8, 2020

What's the motivation that pushed us to generate an untrusted certificate when we could use the nginx default one? I am talking mainly about minikube, and I have a hello world sample here that works on minikube.

@l0rd it was done to have a valid self-signed certificate. In the case of the nginx default certificate we have a DNS mismatch. Moreover, some tools require a CA certificate, but the nginx default one is just a self-signed server certificate.

@l0rd
Contributor

l0rd commented Oct 8, 2020

@sleshchenko @mmorhun you are right about the default nginx certificate. I have created a separate issue for a kube cluster with a valid certificate though: #18079. And with single host + usage of internal services rather than ingresses we should not need to generate a cert even with that invalid cert (I will log a separate issue for that too).

@davidfestal
Contributor

P.S. Depending on the cert-manager operator means being blocked from supporting OpenShift 4.6 until they migrate to the new bundle format. And it does not even seem to be present on operatorhub.io: operatorhub.io/?keyword=manager
The operator referenced in the docs returns a 404: cert-manager.io/docs/installation/openshift/#installing-with-cert-manager-operator

On a 4.5.4 CRC cluster, I see 2 cert-manager operators proposed in the OperatorHub:

  • One with the Certified provider type,
  • One with the Marketplace provider type, but when you click on the Purchase link, it doesn't find it in the marketplace catalog (probably the same problem as when following the cert-manager doc). However, the image is the following: registry.marketplace.redhat.com/rhm/jetstack/cert-manager-operator@sha256:... so I assume it is still expected to be available in the new RH marketplace as well.

I also tested on a 4.6.0.rc0 cluster started with cluster-bot, and it has the same 2 cert-manager operators available. I could install the certified one without any problem.

As for the fact that it is not on the OperatorHub.io website, let's remember that this site is for the K8s version of operators (not for the OpenShift-dedicated nor certified ones).

@sleshchenko
Member Author

David, thanks for sharing the info. It was a mistake to reference OpenShift 4.6. I mainly wanted to solve the issue of certificate management for the devworkspace operator on Kubernetes.

@sleshchenko
Member Author

@l0rd We merged the PR since a pretty good job was done there and it solves our issue with supporting multiple deployments for k8s and OpenShift. Thank you for raising the concern about not using a Che-dependent cert-gen job. I've created a separate issue to agree on a better alternative: devfile/devworkspace-operator#170
But at this alpha stage, it's not critical. The priority is integration with Che.

@sleshchenko sleshchenko moved this from In progress to Done in Controller team sprint #190 Oct 9, 2020
@l0rd
Contributor

l0rd commented Oct 9, 2020

thank you @sleshchenko

@davidfestal
Contributor

Thanks for the details @sleshchenko.
Just wanted to point to the latest documentation about webhooks in Kubebuilder, which is now the basis of the new Operator SDK 1.0:
https://book.kubebuilder.io/cronjob-tutorial/running.html and https://book.kubebuilder.io/cronjob-tutorial/cert-manager.html
But surely you already know this doc.

@amisevsk
Contributor

amisevsk commented Oct 9, 2020

Total aside, but I'm worried about updating to Operator SDK 1.0 -- I poked a bit at it last week and it seems like it could potentially be a very error-prone process (the upgrade guide is "bootstrap a new project and copy file contents over").
