e2e flake: webhook context deadline exceeded #5180

Closed

irbekrm opened this issue Jun 6, 2022 · 8 comments
Labels
flake/test-logic: Indicates that the issue relates to test flakes caused by incorrect logic inside a test
kind/flake: Categorizes issue or PR as related to a flaky test.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


irbekrm commented Jun 6, 2022

We occasionally see 'context deadline exceeded' errors in our e2e tests when creating cert-manager resources from tests, e.g. https://prow.build-infra.jetstack.net/view/gs/jetstack-logs/logs/ci-cert-manager-e2e-v1-23/1533685236070617088 (a retry sketch follows the error trace below).

Unexpected error:
    <*errors.StatusError | 0xc001c33180>: {
        ErrStatus: {
            TypeMeta: {Kind: "", APIVersion: ""},
            ListMeta: {
                SelfLink: "",
                ResourceVersion: "",
                Continue: "",
                RemainingItemCount: nil,
            },
            Status: "Failure",
            Message: "Internal error occurred: failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s\": context deadline exceeded",
            Reason: "InternalError",
            Details: {
                Name: "",
                Group: "",
                Kind: "",
                UID: "",
                Causes: [
                    {
                        Type: "",
                        Message: "failed calling webhook \"webhook.cert-manager.io\": failed to call webhook: Post \"https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s\": context deadline exceeded",
                        Field: "",
                    },
                ],
                RetryAfterSeconds: 0,
            },
            Code: 500,
        },
    }
    Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
occurred
/home/prow/go/src/github.com/cert-manager/cert-manager/test/e2e/suite/conformance/certificates/acme/acme.go:261
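
If this turns out to be purely transient, one option on the test side would be to retry creation while the apiserver reports an InternalError, which is how a failed webhook call surfaces to clients. A minimal sketch, assuming a made-up helper rather than the actual code at acme.go:261:

```go
// Hypothetical retry helper for the e2e tests, not the existing code:
// keep retrying the create call while the apiserver reports an
// InternalError (the status a failed webhook call is returned as).
package main

import (
	"log"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/util/wait"
)

// createWithRetry polls fn every 5s for up to 2 minutes, treating
// InternalError responses as retryable and failing fast on anything else.
func createWithRetry(fn func() error) error {
	return wait.PollImmediate(5*time.Second, 2*time.Minute, func() (bool, error) {
		err := fn()
		if err == nil {
			return true, nil
		}
		if apierrors.IsInternalError(err) {
			log.Printf("transient webhook error, retrying: %v", err)
			return false, nil // keep polling
		}
		return false, err // any other error fails the test immediately
	})
}

func main() {
	// Usage sketch with a dummy creation function; in a real test this would
	// be the client-go call that creates the Issuer or Certificate.
	if err := createWithRetry(func() error { return nil }); err != nil {
		log.Fatalf("create failed: %v", err)
	}
	log.Println("created")
}
```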

The webhook logs from that test run have EOF and broken pipe errors, but those may be related to #4594 (see the illustration after the excerpt):

...
2022-06-06T06:01:48.019977103Z stderr F I0606 06:01:48.015364       1 logs.go:59] http: TLS handshake error from 10.244.0.1:61022: EOF
2022-06-06T06:01:48.110820701Z stderr F E0606 06:01:48.015692       1 server.go:329] cert-manager/webhook "msg"="failed to encode response body" "error"="write tcp 10.244.0.9:10250->10.244.0.1:25889: write: broken pipe"  
2022-06-06T06:01:48.121549398Z stderr F I0606 06:01:48.121296       1 logs.go:59] http: superfluous response.WriteHeader call from github.com/cert-manager/cert-manager/pkg/webhook/server.(*Server).handle.func1 (server.go:330)
2022-06-06T06:01:48.125867909Z stderr F E0606 06:01:48.071148       1 server.go:329] cert-manager/webhook "msg"="failed to encode response body" "error"="write tcp 10.244.0.9:10250->10.244.0.1:17932: write: broken pipe"  
2022-06-06T06:01:48.127004356Z stderr F I0606 06:01:48.126420       1 logs.go:59] http: superfluous response.WriteHeader call from github.com/cert-manager/cert-manager/pkg/webhook/server.(*Server).handle.func1 (server.go:330)
2022-06-06T06:01:48.501797416Z stderr F E0606 06:01:48.501584       1 server.go:329] cert-manager/webhook "msg"="failed to encode response body" "error"="write tcp 10.244.0.9:10250->10.244.0.1:41779: write: broken pipe"  
2022-06-06T06:01:48.50183594Z stderr F I0606 06:01:48.501654       1 logs.go:59] http: superfluous response.WriteHeader call from github.com/cert-manager/cert-manager/pkg/webhook/server.(*Server).handle.func1 (server.go:330)
2022-06-06T06:01:48.588277436Z stderr F I0606 06:01:48.587808       1 logs.go:59] http: TLS handshake error from 10.244.0.1:32167: EOF
2022-06-06T06:01:48.590765815Z stderr F E0606 06:01:48.589725       1 server.go:329] cert-manager/webhook "msg"="failed to encode response body" "error"="write tcp 10.244.0.9:10250->10.244.0.1:37760: write: broken pipe"  
2022-06-06T06:01:48.590783038Z stderr F I0606 06:01:48.589777       1 logs.go:59] http: superfluous response.WriteHeader call from github.com/cert-manager/cert-manager/pkg/webhook/server.(*Server).handle.func1 (server.go:330)
2022-06-06T06:01:53.002032628Z stderr F E0606 06:01:53.001867       1 server.go:329] cert-manager/webhook "msg"="failed to encode response body" "error"="write tcp 10.244.0.9:10250->10.244.0.1:5630: write: broken pipe"  
2022-06-06T06:01:55.73233795Z stderr F I0606 06:01:53.562086       1 logs.go:59] http: TLS handshake error from 10.244.0.1:3672: EOF
2022-06-06T06:01:55.758806156Z stderr F I0606 06:01:55.758338       1 logs.go:59] http: superfluous response.WriteHeader call from github.com/cert-manager/cert-manager/pkg/webhook/server.(*Server).handle.func1 (server.go:330)
2022-06-06T06:01:56.909861791Z stderr F I0606 06:01:56.907310       1 logs.go:59] http: TLS handshake error from 10.244.0.1:6921: EOF
2022-06-06T06:01:56.916836431Z stderr F E0606 06:01:56.913972       1 server.go:329] cert-manager/webhook "msg"="failed to encode response body" "error"="write tcp 10.244.0.9:10250->10.244.0.1:61581: write: broken pipe"  
2022-06-06T06:01:56.91684869Z stderr F I0606 06:01:56.914013       1 logs.go:59] http: superfluous response.WriteHeader call from github.com/cert-manager/cert-manager/pkg/webhook/server.(*Server).handle.func1 (server.go:330)
2022-06-06T06:01:56.945784762Z stderr F E0606 06:01:56.929048       1 server.go:329] cert-manager/webhook "msg"="failed to encode response body" "error"="write tcp 10.244.0.9:10250->10.244.0.1:17735: write: broken pipe"  
2022-06-06T06:01:56.945811362Z stderr F I0606 06:01:56.929108       1 logs.go:59] http: superfluous response.WriteHeader call from github.com/cert-manager/cert-manager/pkg/webhook/server.(*Server).handle.func1 (server.go:330)
2022-06-06T06:01:57.426687481Z stderr F I0606 06:01:57.426414       1 logs.go:59] http: TLS handshake error from 10.244.0.1:53010: EOF
...
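
The broken pipe and superfluous WriteHeader messages are consistent with the apiserver cancelling the request once its 10s webhook timeout expires, so the webhook ends up writing to a connection that is already closed. A generic illustration (not cert-manager's actual server.go) of a handler that checks for this before writing:

```go
// Generic admission-handler sketch, not cert-manager's server code: if the
// caller has already gone away (the apiserver cancels the request after its
// 10s webhook timeout), skip the write instead of logging a broken pipe.
package main

import (
	"log"
	"net/http"
)

func handleMutate(w http.ResponseWriter, r *http.Request) {
	resp := []byte(`{"response":{"allowed":true}}`) // placeholder admission response

	// Non-blocking check of the request context: if the client disconnected,
	// writing below would fail with "broken pipe", so bail out early.
	select {
	case <-r.Context().Done():
		log.Printf("client gone before response was written: %v", r.Context().Err())
		return
	default:
	}

	w.Header().Set("Content-Type", "application/json")
	if _, err := w.Write(resp); err != nil {
		log.Printf("failed to write admission response: %v", err)
	}
}

func main() {
	http.HandleFunc("/mutate", handleMutate)
	log.Fatal(http.ListenAndServe(":8443", nil)) // TLS omitted for brevity
}
```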

The kube-apiserver logs from that run do not give any additional information:

...
022-06-06T06:01:46.769244525Z stderr F Trace[1744729817]: [1.410369738s] [1.410369738s] END
2022-06-06T06:01:46.817224179Z stderr F E0606 06:01:46.816806       1 available_controller.go:524] v1alpha1.acme.testing.cert-manager.io failed with: failing or missing response from https://10.96.2.15:443/apis/acme.testing.cert-manager.io/v1alpha1: Get "https://10.96.2.15:443/apis/acme.testing.cert-manager.io/v1alpha1": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2022-06-06T06:01:46.844803134Z stderr F I0606 06:01:46.844316       1 trace.go:205] Trace[1658207756]: "Delete" url:/apis/kyverno.io/v1alpha1/namespaces/kyverno/reportchangerequests/rcr-mzrm5,user-agent:kyverno/v0.0.0 (linux/amd64) kubernetes/$Format,audit-id:799cccba-249c-4b0d-965f-b4762c5c089c,client:10.244.0.16,accept:application/json,protocol:HTTP/2.0 (06-Jun-2022 06:01:45.914) (total time: 929ms):
2022-06-06T06:01:46.844828459Z stderr F Trace[1658207756]: ---"Object deleted from database" 903ms (06:01:46.844)
2022-06-06T06:01:46.844834815Z stderr F Trace[1658207756]: [929.726503ms] [929.726503ms] END
2022-06-06T06:01:46.883764765Z stderr F I0606 06:01:46.883589       1 trace.go:205] Trace[308899535]: "Call validating webhook" configuration:cert-manager-webhook,webhook:webhook.cert-manager.io,resource:cert-manager.io/v1, Resource=certificates,subresource:status,operation:UPDATE,UID:c0eecb90-2168-4f42-ba83-faac7c483259 (06-Jun-2022 06:01:37.710) (total time: 9172ms):
2022-06-06T06:01:46.883817231Z stderr F Trace[308899535]: ---"Request completed" 9172ms (06:01:46.883)
2022-06-06T06:01:46.883823137Z stderr F Trace[308899535]: [9.172744533s] [9.172744533s] END
2022-06-06T06:01:46.886266696Z stderr F I0606 06:01:46.886132       1 trace.go:205] Trace[1601827349]: "Call mutating webhook" configuration:cert-manager-webhook,webhook:webhook.cert-manager.io,resource:cert-manager.io/v1, Resource=issuers,subresource:,operation:CREATE,UID:164f1fdf-5c59-4fd2-a932-e35a1561beff (06-Jun-2022 06:01:36.874) (total time: 10011ms):
2022-06-06T06:01:46.886295342Z stderr F Trace[1601827349]: [10.011745795s] [10.011745795s] END
2022-06-06T06:01:46.886316385Z stderr F W0606 06:01:46.886213       1 dispatcher.go:195] Failed calling webhook, failing closed webhook.cert-manager.io: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
2022-06-06T06:01:46.890592642Z stderr F I0606 06:01:46.890455       1 trace.go:205] Trace[1654099403]: "Create" url:/apis/cert-manager.io/v1/namespaces/e2e-tests-certificates-tm75v/issuers,user-agent:Go-http-client/2.0,audit-id:de2c9de4-8b76-42a5-9b69-90b7876a8931,client:192.168.0.1,accept:application/json, */*,protocol:HTTP/2.0 (06-Jun-2022 06:01:36.873) (total time: 10017ms):
2022-06-06T06:01:46.890622555Z stderr F Trace[1654099403]: [10.017056613s] [10.017056613s] END
2022-06-06T06:01:46.898551639Z stderr F I0606 06:01:46.898402       1 trace.go:205] Trace[1330309164]: "Call mutating webhook" configuration:cert-manager-webhook,webhook:webhook.cert-manager.io,resource:cert-manager.io/v1, Resource=certificates,subresource:,operation:CREATE,UID:6501038f-a3cd-404e-98c0-4e986803368e (06-Jun-2022 06:01:36.898) (total time: 10000ms):
2022-06-06T06:01:46.898618177Z stderr F Trace[1330309164]: [10.000290089s] [10.000290089s] END
2022-06-06T06:01:46.898948365Z stderr F W0606 06:01:46.898799       1 dispatcher.go:195] Failed calling webhook, failing closed webhook.cert-manager.io: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
2022-06-06T06:01:46.899569554Z stderr F I0606 06:01:46.899463       1 trace.go:205] Trace[56390371]: "Create" url:/apis/cert-manager.io/v1/namespaces/e2e-tests-certificates-x5jkk/certificates,user-agent:e2e.test/v0.0.0 (linux/amd64) kubernetes/$Format,audit-id:a8661e06-8aa0-459a-9209-c1b194ac1963,client:192.168.0.1,accept:application/json, */*,protocol:HTTP/2.0 (06-Jun-2022 06:01:36.897) (total time: 10002ms):
2022-06-06T06:01:46.899585161Z stderr F Trace[56390371]: [10.002056485s] [10.002056485s] END
2022-06-06T06:01:46.980170937Z stderr F I0606 06:01:46.980008       1 trace.go:205] Trace[1025096483]: "GuaranteedUpdate etcd3" type:*unstructured.Unstructured (06-Jun-2022 06:01:34.868) (total time: 12111ms):
2022-06-06T06:01:46.980208284Z stderr F Trace[1025096483]: ---"Transaction prepared" 12015ms (06:01:46.883)
2022-06-06T06:01:46.980214438Z stderr F Trace[1025096483]: [12.111819534s] [12.111819534s] END
2022-06-06T06:01:46.98209891Z stderr F I0606 06:01:46.981975       1 trace.go:205] Trace[191612969]: "Patch" url:/apis/cert-manager.io/v1/namespaces/e2e-tests-certificates-rvfn4/certificates/testcert/status,user-agent:cert-manager-certificates-trigger/v1.8.0-109-gf1de1d1f0945ae (linux/amd64) cert-manager/f1de1d1f0945ae6aff7705a76fc6d6fcd8628e17,audit-id:8f7c076a-71ac-4a18-822d-ccffdc4a3963,client:10.244.0.11,accept:application/json, */*,protocol:HTTP/2.0 (06-Jun-2022 06:01:34.867) (total time: 12114ms):
2022-06-06T06:01:46.982118941Z stderr F Trace[191612969]: ---"Object stored in database" 12105ms (06:01:46.981)
2022-06-06T06:01:46.982134732Z stderr F Trace[191612969]: [12.114025884s] [12.114025884s] END
...

/kind flake

jetstack-bot added the kind/flake label on Jun 6, 2022

irbekrm commented Jun 6, 2022

I am mostly interested in whether this is an actual issue with the webhook, or whether it is just a temporary networking problem in the kind cluster (or something along those lines) that makes the webhook unreachable.
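
One way to narrow that down would be a quick probe from inside the cluster that separates raw TCP reachability from the webhook's response time; a rough sketch (the address is the Service name from the error above, everything else is illustrative):

```go
// Illustrative in-cluster probe: distinguish "webhook unreachable"
// (network/kind problem) from "webhook slow" (webhook problem).
package main

import (
	"crypto/tls"
	"log"
	"net"
	"net/http"
	"strings"
	"time"
)

func main() {
	const addr = "cert-manager-webhook.cert-manager.svc:443"

	// 1. Raw TCP connect: a CNI or Service/endpoint problem shows up here
	//    as "connection refused" or a timeout.
	start := time.Now()
	conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
	if err != nil {
		log.Fatalf("TCP connect failed after %s: %v", time.Since(start), err)
	}
	conn.Close()
	log.Printf("TCP connect OK in %s", time.Since(start))

	// 2. HTTPS POST to /mutate: if this is slow, the webhook itself is the
	//    suspect. Certificate verification is skipped because this is only
	//    a reachability probe.
	client := &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	start = time.Now()
	resp, err := client.Post("https://"+addr+"/mutate", "application/json", strings.NewReader("{}"))
	if err != nil {
		log.Fatalf("HTTPS POST failed after %s: %v", time.Since(start), err)
	}
	defer resp.Body.Close()
	log.Printf("HTTPS POST returned %d in %s", resp.StatusCode, time.Since(start))
}
```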


irbekrm commented Jun 6, 2022

The same failure here: https://prow.build-infra.jetstack.net/view/gs/jetstack-logs/logs/ci-cert-manager-e2e-v1-23/1533413442235404288

Both the validating and mutating webhooks fail with 'context deadline exceeded' and 'connection refused'. The 'connection refused' errors suggest that this might be an issue with the network setup in the e2e test rather than with the webhook itself (a small classification sketch follows below).

Also https://prow.build-infra.jetstack.net/view/gs/jetstack-logs/logs/ci-cert-manager-e2e-v1-23/1528698848166285312
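
For completeness, the two failure modes can also be told apart programmatically once the raw error is in hand; a small, purely illustrative classification sketch:

```go
// Illustrative only: classify the two failure modes seen in these runs.
// "connection refused" means nothing accepted the TCP connection (broken
// Service routing or no listener), while "context deadline exceeded" means
// the call was still pending when the apiserver's 10s budget ran out.
package main

import (
	"context"
	"errors"
	"fmt"
	"syscall"
)

func classify(err error) string {
	switch {
	case errors.Is(err, syscall.ECONNREFUSED):
		return "network/setup problem: connection actively refused"
	case errors.Is(err, context.DeadlineExceeded):
		return "timeout: webhook unreachable or too slow to answer within the deadline"
	default:
		return fmt.Sprintf("other error: %v", err)
	}
}

func main() {
	fmt.Println(classify(syscall.ECONNREFUSED))
	fmt.Println(classify(context.DeadlineExceeded))
}
```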

@sedefsavas

I can confirm that we hit this issue with Kubernetes v1.21.2.


irbekrm commented Jun 8, 2022

@sedefsavas have you verified that this is not an issue with network connectivity in your cluster? See https://cert-manager.io/docs/concepts/webhook/#known-problems-and-solutions

At the moment I am not aware of any cases where this happens for reasons other than the Kubernetes apiserver being unable to reach the webhook, and I assume that is probably also what is happening in our tests.

SgtCoDFish added the flake/test-logic label on Jul 7, 2022
@jetstack-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle stale

jetstack-bot added the lifecycle/stale label on Oct 5, 2022
@jetstack-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to jetstack.
/lifecycle rotten
/remove-lifecycle stale

jetstack-bot added the lifecycle/rotten label and removed the lifecycle/stale label on Nov 4, 2022
@jetstack-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to jetstack.
/close

@jetstack-bot

@jetstack-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.
Send feedback to jetstack.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
