Cannot create http01 ClusterIssuer with DigitalOcean provider using new static manifests #1149

Closed
lloeki opened this issue Dec 14, 2018 · 18 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@lloeki

lloeki commented Dec 14, 2018

Describe the bug:

Following the documentation to install with static manifests and then attempting to create an Issuer or ClusterIssuer on a fresh DO k8s cluster results in the following error:

Error from server (InternalError): error when creating "30-staging-clusterissuer.yml": Internal error occurred: failed calling admission webhook "clusterissuers.admission.certmanager.k8s.io": the server is currently unable to handle the request

Issue seems quite different from #1103.

Expected behaviour:

It should be possible to create an Issuer or ClusterIssuer after following the static manifest installation instructions in the documentation.

Steps to reproduce the bug:

Anything else we need to know?:

I first tried to set up cert-manager 0.5.2 with static manifests, but the documentation is severely lacking: the CRDs are absent, and a number of other things are missing too, e.g. pods failing to start with log output missing secret "webhook-ca".

While looking for a way to solve this, I noticed that there has been quite a refactoring on master, with much better documentation and manifests. With the CRDs set up and the namespace created, everything seemed to be in order, except that I had to run apply -f with --validate=false due to #1143.

I then proceeded to create a ClusterIssuer following this part of the documentation:

$ kubectl apply -f 30-staging-clusterissuer.yml 
Error from server (InternalError): error when creating "30-staging-clusterissuer.yml": Internal error occurred: failed calling admission webhook "clusterissuers.admission.certmanager.k8s.io": the server is currently unable to handle the request
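
(The manifest contents aren't shown above; for context, a minimal Let's Encrypt staging ClusterIssuer using http01 on the certmanager.k8s.io/v1alpha1 API of that era would look roughly like this — the name and email below are placeholders:)

apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # Let's Encrypt staging endpoint
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: you@example.com                  # placeholder
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    # enable the http01 challenge solver
    http01: {}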

Nothing of significance appears in the pod logs.

Since there was no match among cert-manager issues, I looked for similar errors in other Kubernetes projects involving admission webhooks and found this:

Check that your cluster has aggregate api server enabled. Test that the configmap extension-apiserver-authentication-reader in kube-system namespace has key requestheader-client-ca-file

I checked by running kubectl describe configmap -n kube-system extension-apiserver-authentication, and it does contain requestheader-client-ca-file.
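
(In one-liner form, a rough way to confirm the key is present:)

kubectl -n kube-system get configmap extension-apiserver-authentication \
  -o jsonpath='{.data.requestheader-client-ca-file}'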

So I ran kubectl get apiservice clusterissuers.admission.certmanager.k8s.io -o yaml which I suppose is expected to return something along the lines of:

status:
  conditions:
  - lastTransitionTime: 2018-02-27T07:59:50Z
    message: all checks passed
    reason: Passed
    status: "True"
    type: Available

But returns this error instead:

Error from server (NotFound): apiservices.apiregistration.k8s.io "clusterissuers.admission.certmanager.k8s.io" not found

So I tried a more general kubectl get apiservice, whose output contains only a single reference to anything related to certmanager.k8s.io:

v1beta1.admission.certmanager.k8s.io   cert-manager/cert-manager-webhook   False (FailedDiscoveryCheck)   26m

with no trace of issuers.admission.certmanager.k8s.io, clusterissuers.admission.certmanager.k8s.io, or certificates.admission.certmanager.k8s.io.

The output of kubectl describe apiservice v1beta1.admission.certmanager.k8s.io contains:

Status:
  Conditions:
    Last Transition Time:  2018-12-14T14:34:51Z
    Message:               no response from https://10.245.20.156:443: Get https://10.245.20.156:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available

Environment details::

  • Kubernetes version (e.g. v1.10.2): v1.12.3
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): DigitalOcean
  • cert-manager version (e.g. v0.4.0): v0.6.0 (master)
  • Install method (e.g. helm or static manifests): static manifests

/kind bug

@jetstack-bot jetstack-bot added the kind/bug label Dec 14, 2018
@lloeki
Author

lloeki commented Dec 14, 2018

Note that on master there is now only a single static manifest, which does include the webhook, whereas previously there were a with-rbac.yaml and a with-rbac-webhook.yaml (removed in cdd513c).

@lloeki
Author

lloeki commented Dec 14, 2018

Using the non-webhook cert-manager from 4283138 (the parent of cdd513c) allows the ClusterIssuer to be created.

Is the removal of the non-webhook manifest intentional?

@munnerz
Member

munnerz commented Dec 14, 2018

Hey - it looks like you’re trying to install the manifests and follow the instructions for ‘master’ instead of the latest stable release (v0.5.2).

As you’ve noticed, we’ve removed the separation between different manifest types in favour of an all-in-one bundle for the upcoming v0.6 release.

For the time being, check out the documentation for ‘latest’ (or more specifically, release-0.5). This will guide you through using either the static manifests or the Helm chart as you’ve noted.

0.6 isn’t available yet, as some additional documentation needs putting together (although features in the project itself are done!)

You should have a much smoother experience if you stick to the latest release branch of the project 😄

@munnerz
Member

munnerz commented Dec 14, 2018

For a bit of clarity on this issue, can you share the step by step commands you're running, and exactly which guide you're following and running into these problems?

@lloeki
Author

lloeki commented Dec 17, 2018

Seems like you didn't quite catch it, or maybe I wasn't clear... I admit it's quite a convoluted story. As I mentioned earlier (see the links in my original message):

Attempt to install 0.5.2

I first tried to install 0.5.2 following the matching "latest" documentation (not master), the only instructions being:

With static manifests

As some users may not be able to run Tiller in their own environment, static Kubernetes deployment manifests are provided which can be used to install cert-manager.

You can get a copy of the static manifests from the deploy directory.

Commands used:

kubectl apply -f with-rbac-webhook.yaml # fails because of missing CRD
kubectl apply -f 00-crds.yml             # from master
kubectl apply -f with-rbac-webhook.yaml # success

... except now the pods won't start, due to a missing webhook-ca secret on one pod and another error (I can't recall which) on the other pod.
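
(A quick way to confirm whether those secrets exist at all would be something along these lines; the namespace depends on where the manifest installs things:)

kubectl get secret --all-namespaces | grep webhook   # expect webhook-ca / webhook-webhook-tls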

Attempt to install 0.6 from master (i.e. with webhook)

I followed the exact instructions here except with --validate=false due to #1143.

kubectl apply -f 00-crds.yml
kubectl create namespace cert-manager
kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
kubectl apply -f cert-manager.yml --validate=false

Attempt to install 0.6 without webhook

Same commands, except that cert-manager.yml comes from a few commits earlier, and without --validate=false (since that's not an issue there).

kubectl apply -f 00-crds.yml
kubectl create namespace cert-manager
kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
kubectl apply -f cert-manager.yml

@lloeki
Author

lloeki commented Dec 17, 2018

As you’ve noticed, we’ve removed the separation between different manifest types in favour of an all-in-one bundle for the upcoming v0.6 release.

RBAC-only is the way to go for sure, but how are you supposed to install the no-webhook variant with a single static manifest? Or is there no no-webhook option anymore, by design?

You should have a much smoother experience if you stick to the latest release branch of the project

That's what I thought at first, but to be honest, as a static manifest user, master was a much more pleasant experience ;)

@LEI

LEI commented Dec 24, 2018

Before stumbling on this issue, I also first tried to deploy from the master branch, without success.

With v0.5, after creating webhook-ca and webhook-webhook-tls secrets as well as a ClusterIssuer, cert-manager logs this:

controller.go:140] clusterissuers controller: syncing item 'letsencrypt-staging'
logger.go:88] Calling GetAccount
setup.go:181] letsencrypt-staging: verified existing registration with ACME server
helpers.go:147] Setting lastTransitionTime for ClusterIssuer "letsencrypt-staging" condition "Ready" to 2018-12-24 05:00:43.673152723 +0000 UTC m=+242.213380853
controller.go:149] clusterissuers controller: Re-queuing item "letsencrypt-staging" due to error processing: Internal error occurred: failed calling webhook "clusterissuers.admission.certmanager.k8s.io": the server is currently unable to handle the request

@adammartin

Is there a resolution here? It looks very similar to what I'm seeing:

Error from server (InternalError): error when creating "cluster-issuer-prod.yaml": Internal error occurred: failed calling admission webhook "clusterissuers.admission.certmanager.k8s.io": the server could not find the requested resource

@lloeki
Author

lloeki commented Jan 25, 2019

As a workaround, the last attempt described in my comment above is successful.
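
That is, roughly, with 00-crds.yml from master and cert-manager.yml taken from before the webhook was folded into the single manifest:

kubectl apply -f 00-crds.yml
kubectl create namespace cert-manager
kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
kubectl apply -f cert-manager.yml   # manifest from 4283138, i.e. without the webhook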

@munnerz
Member

munnerz commented Feb 7, 2019

Please take a read of the webhook component documentation, as well as the troubleshooting instructions now available in the docs: https://cert-manager.readthedocs.io/en/latest/getting-started/webhook.html

I'm going to close this issue, as it's a deployment configuration problem that we have documentation to cover. If you're still running into problems, feel free to jump onto our Slack channel and we can work through it to get things sorted 😄

@munnerz munnerz closed this as completed Feb 7, 2019
@tsuna

tsuna commented Feb 26, 2019

I upgraded from v0.6.0 to v0.6.2 (helm chart version v0.6.6) and I'm running into this issue too.

$ helm upgrade --version v0.6.6 cert-manager stable/cert-manager
Release "cert-manager" has been upgraded. Happy Helming!
[...]
$ kubectl logs -n cert-manager cert-manager-6d47b6c444-nndq9
I0226 12:09:09.343842       1 start.go:81] starting cert-manager v0.6.2 (revision f5e1477bd7ced69e53a233484905fea16bf4102f)
I0226 12:09:09.344818       1 controller.go:141] Using the following nameservers for DNS01 checks: [100.64.0.10:53]
I0226 12:09:09.348440       1 leaderelection.go:193] attempting to acquire leader lease  cert-manager/cert-manager-controller...
I0226 12:10:12.854995       1 leaderelection.go:202] successfully acquired lease cert-manager/cert-manager-controller
I0226 12:10:12.863311       1 metrics.go:145] Listening on http://0.0.0.0:9402
I0226 12:10:12.868304       1 controller.go:82] Starting challenges controller
I0226 12:10:12.871519       1 controller.go:82] Starting orders controller
I0226 12:10:12.871626       1 controller.go:82] Starting certificates controller
I0226 12:10:12.871677       1 controller.go:82] Starting clusterissuers controller
I0226 12:10:12.871716       1 controller.go:82] Starting ingress-shim controller
I0226 12:10:12.871766       1 controller.go:82] Starting issuers controller
I0226 12:10:12.977568       1 controller.go:142] issuers controller: syncing item 'cert-manager/cert-manager-webhook-ca'
I0226 12:10:12.982960       1 setup.go:69] Signing CA verified
I0226 12:10:12.983069       1 controller.go:148] issuers controller: Finished processing work item "cert-manager/cert-manager-webhook-ca"
I0226 12:10:12.983115       1 controller.go:142] issuers controller: syncing item 'cert-manager/cert-manager-webhook-selfsign'
I0226 12:10:12.983152       1 controller.go:148] issuers controller: Finished processing work item "cert-manager/cert-manager-webhook-selfsign"
I0226 12:10:12.983274       1 controller.go:145] certificates controller: syncing item 'cert-manager/cert-manager-webhook-ca'
I0226 12:10:12.983641       1 sync.go:399] Certificate received from server has a validity duration of 2160h0m0s. The requested certificate validity duration was 43800h0m0s
I0226 12:10:12.983694       1 sync.go:263] Certificate cert-manager/cert-manager-webhook-ca scheduled for renewal in 716h42m18.016314625s
I0226 12:10:12.984253       1 controller.go:141] clusterissuers controller: syncing item 'letsencrypt-prod-dns'
I0226 12:10:12.985293       1 setup.go:149] Skipping re-verifying ACME account as cached registration details look sufficient.
I0226 12:10:12.985392       1 controller.go:147] clusterissuers controller: Finished processing work item "letsencrypt-prod-dns"
I0226 12:10:12.985448       1 controller.go:173] ingress-shim controller: syncing item 'my-namespace/my-app-example-com'
I0226 12:10:12.985543       1 sync.go:177] Certificate "my-app-ssl-cert-tls" for ingress "my-app-example-com" already exists
I0226 12:10:12.985580       1 sync.go:180] Certificate "my-app-ssl-cert-tls" for ingress "my-app-example-com" is up to date
I0226 12:10:12.985608       1 controller.go:179] ingress-shim controller: Finished processing work item "my-namespace/my-app-example-com"
I0226 12:10:12.986120       1 controller.go:145] certificates controller: syncing item 'cert-manager/cert-manager-webhook-webhook-tls'
I0226 12:10:12.987876       1 sync.go:399] Certificate received from server has a validity duration of 2160h0m0s. The requested certificate validity duration was 8760h0m0s
I0226 12:10:12.987920       1 sync.go:263] Certificate cert-manager/cert-manager-webhook-webhook-tls scheduled for renewal in 716h42m23.012087687s
I0226 12:10:12.988278       1 controller.go:145] certificates controller: syncing item 'my-namespace/my-app-ssl-cert-tls'
I0226 12:10:12.988919       1 sync.go:263] Certificate my-namespace/my-app-ssl-cert-tls scheduled for renewal in 404h9m1.011088757s
I0226 12:10:12.992518       1 controller.go:151] certificates controller: Finished processing work item "cert-manager/cert-manager-webhook-ca"
I0226 12:10:13.003848       1 controller.go:151] certificates controller: Finished processing work item "cert-manager/cert-manager-webhook-webhook-tls"
E0226 12:10:13.016427       1 controller.go:147] certificates controller: Re-queuing item "my-namespace/my-app-ssl-cert-tls" due to error processing: Internal error occurred: failed calling admission webhook "certificates.admission.certmanager.k8s.io": the server is currently unable to handle the request
I0226 12:10:17.903387       1 controller.go:141] clusterissuers controller: syncing item 'letsencrypt-prod-dns'
I0226 12:10:17.903715       1 setup.go:149] Skipping re-verifying ACME account as cached registration details look sufficient.
I0226 12:10:17.903777       1 controller.go:147] clusterissuers controller: Finished processing work item "letsencrypt-prod-dns"
I0226 12:10:17.903826       1 controller.go:142] issuers controller: syncing item 'cert-manager/cert-manager-webhook-ca'
I0226 12:10:17.904311       1 setup.go:69] Signing CA verified
I0226 12:10:17.904383       1 controller.go:148] issuers controller: Finished processing work item "cert-manager/cert-manager-webhook-ca"
I0226 12:10:18.016559       1 controller.go:145] certificates controller: syncing item 'my-namespace/my-app-ssl-cert-tls'
I0226 12:10:18.017328       1 sync.go:263] Certificate my-namespace/my-app-ssl-cert-tls scheduled for renewal in 404h8m55.9826814s
E0226 12:10:18.023576       1 controller.go:147] certificates controller: Re-queuing item "my-namespace/my-app-ssl-cert-tls" due to error processing: Internal error occurred: failed calling admission webhook "certificates.admission.certmanager.k8s.io": the server is currently unable to handle the request
I0226 12:10:28.023852       1 controller.go:145] certificates controller: syncing item 'my-namespace/my-app-ssl-cert-tls'
I0226 12:10:28.028310       1 sync.go:263] Certificate my-namespace/my-app-ssl-cert-tls scheduled for renewal in 404h8m45.971701408s
E0226 12:10:28.033980       1 controller.go:147] certificates controller: Re-queuing item "my-namespace/my-app-ssl-cert-tls" due to error processing: Internal error occurred: failed calling admission webhook "certificates.admission.certmanager.k8s.io": the server is currently unable to handle the request

I'm not on EKS (I'm on AWS, but running k8s myself on EC2 instances). I've looked at the docs linked in the previous comment and haven't found anything helpful.

$ kubectl get issuer --namespace cert-manager
NAME                            AGE
cert-manager-webhook-ca         30d
cert-manager-webhook-selfsign   30d
$ kubectl get certificate -o wide --namespace cert-manager
NAME                               READY   SECRET                             ISSUER                          STATUS                                          AGE
cert-manager-webhook-ca            True    cert-manager-webhook-ca            cert-manager-webhook-selfsign   Certificate is up to date and has not expired   30d
cert-manager-webhook-webhook-tls   True    cert-manager-webhook-webhook-tls   cert-manager-webhook-ca         Certificate is up to date and has not expired   30d
$ kubectl get cronjob -n cert-manager
NAME                           SCHEDULE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
cert-manager-webhook-ca-sync   @weekly    False     0        2d              30d

What am I missing?
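
For completeness, the apiservice-side checks from earlier in this thread look roughly like:

kubectl get apiservice v1beta1.admission.certmanager.k8s.io
kubectl describe apiservice v1beta1.admission.certmanager.k8s.io   # check Status -> Conditions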

@Yanson

Yanson commented Feb 26, 2019

See: helm/charts#10869 (comment)

@tsuna

tsuna commented Feb 26, 2019

Thanks @Yanson, that steered me in the right direction. Found this in my k8s apiserver logs:

E0226 23:17:02.164188       1 available_controller.go:311] v1beta1.admission.certmanager.k8s.io failed with: Get https://100.71.25.209:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I0226 23:17:02.681617       1 get.go:245] Starting watch for /api/v1/secrets, rv=38009789 labels= fields= timeout=8m56s
I0226 23:17:07.445565       1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.admission.certmanager.k8s.io
E0226 23:17:07.445702       1 controller.go:111] loading OpenAPI spec for "v1beta1.admission.certmanager.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[X-Content-Type-Options:[nosniff] Content-Type:[text/plain; charset=utf-8]]
I0226 23:17:07.445713       1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.admission.certmanager.k8s.io: Rate Limited Requeue.

That API address corresponds to the service:

$ kubectl -n cert-manager get svc
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
cert-manager-webhook   ClusterIP   100.71.25.209   <none>        443/TCP   30d

I can confirm that I can't hit it from within the apiserver's pod:

$ kubectl exec -it -n kube-system kube-apiserver-ip-x-x-x-x.us-west-2.compute.internal sh
/ # wget -S https://100.71.25.209:443
Connecting to 100.71.25.209:443 (100.71.25.209:443)
<hang>

I'm not on EKS/GKE; if anyone has any pointers on how to fix this, please let me know! Edit: just to be clear, I am on a mostly default kops-created cluster on AWS.
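
A rough next check from here would be whether the webhook Service has any ready endpoints at all, e.g.:

kubectl -n cert-manager get endpoints cert-manager-webhook
kubectl -n cert-manager get pods -o wide | grep webhook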

@munnerz
Member

munnerz commented Mar 21, 2019

@tsuna looking at the output here, it looks like you've misconfigured your Kubernetes cluster's service CIDR (unless I am mistaken!)

100.71.25.209:443 is a public IP address, and so you're almost definitely going to get funky behaviour as (presumably) you don't own it, meaning the apiserver may use the public gateway as a route instead of routing locally.

@tsuna

tsuna commented Mar 21, 2019

It's not a public IP address. https://tools.ietf.org/html/rfc6598#section-7

7. IANA Considerations

IANA has recorded the allocation of an IPv4 /10 for use as Shared Address Space.

The Shared Address Space address range is 100.64.0.0/10.

It's the default range used by kops, I believe because it doesn't conflict with anything else on AWS.

@Ashraf-Hassan

I am also seeing the same error in my apiserver logs:
I0404 11:59:10.925402 1 controller.go:105] OpenAPI AggregationController: Processing item v1beta1.admission.certmanager.k8s.io
E0404 11:59:10.933941 1 controller.go:111] loading OpenAPI spec for "v1beta1.admission.certmanager.k8s.io" failed with: OpenAPI spec does not exists
I0404 11:59:10.933953 1 controller.go:119] OpenAPI AggregationController: action for item v1beta1.admission.certmanager.k8s.io: Rate Limited Requeue.

I thought of opening a new issue, but it is the same here. I have two issuers (below): one using Let's Encrypt and one self-signed. I managed to create certificates using the self-signed issuer. How can I solve this issue?

kubectl describe issuer -n ingress

Name:         cert-manager-webhook-ca
Namespace:    ingress
Labels:       app=webhook
              chart=webhook-v0.6.4
              heritage=Tiller
              release=cert-manager
Annotations:
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Issuer
Metadata:
  Creation Timestamp:  2019-03-29T10:43:06Z
  Generation:          2
  Resource Version:    3977934
  Self Link:           /apis/certmanager.k8s.io/v1alpha1/namespaces/ingress/issuers/cert-manager-webhook-ca
  UID:                 6ff231d4-520f-11e9-9078-001a4a16021e
Spec:
  Ca:
    Secret Name:  cert-manager-webhook-ca
Status:
  Conditions:
    Last Transition Time:  2019-03-29T10:44:23Z
    Message:               Signing CA verified
    Reason:                KeyPairVerified
    Status:                True
    Type:                  Ready
Events:

Name:         cert-manager-webhook-selfsign
Namespace:    ingress
Labels:       app=webhook
              chart=webhook-v0.6.4
              heritage=Tiller
              release=cert-manager
Annotations:
API Version:  certmanager.k8s.io/v1alpha1
Kind:         Issuer
Metadata:
  Creation Timestamp:  2019-03-29T10:43:06Z
  Generation:          2
  Resource Version:    3977935
  Self Link:           /apis/certmanager.k8s.io/v1alpha1/namespaces/ingress/issuers/cert-manager-webhook-selfsign
  UID:                 6ff2aefc-520f-11e9-9078-001a4a16021e
Spec:
  Self Signed:
Status:
  Conditions:
    Last Transition Time:  2019-03-29T10:44:23Z
    Message:
    Reason:                IsReady
    Status:                True
    Type:                  Ready
Events:

@async-dna

Following along with interest in your progress @tsuna, as my team is seeing similar issues in a kops-created cluster running on EC2.

@async-dna

@tsuna Further analysis showed that for us it was only a single master node that couldn't reach the cert-manager-webhook service. Restarting that master node magically fixed everything. ¯\_(ツ)_/¯
