DNS Resolution Fails Sporadically - Crashes Server #10093

Closed
1 task done
NickTheSecurityDude opened this issue Jan 11, 2023 · 4 comments
Labels
area/kubernetes Issues where Kong is running on top of Kubernetes core/proxy dns pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... stale

Comments

@NickTheSecurityDude

NickTheSecurityDude commented Jan 11, 2023

Is there an existing issue for this?

  • I have searched the existing issues

Kong version ($ kong version)

Kong Proxy 3.1.0, Kong Ingress 2.8.0

Current Behavior

Kong keeps crashing with DNS resolution errors.

If I restart Kong, it works for about 15 minutes; then all requests fail and the website goes down.

If I exec into the Kong pod directly, I can run nslookup commands without any problem.

Does anyone know why this might be happening? Environment: EKS 1.24, Kong Proxy 3.1.0, Kong Ingress 2.8.0.

[notice] 1132#0: *436443 [kong] handler.lua:181 [mysite-auth] :mys_lua: resty - validation api call encountered error [cosocket] DNS resolution failed: dns lookup pool exceeded retries (1): timeout. Tried: ["(short)auth.web.svc.mysite.com:(na) - cache-miss","auth.web.svc.mysite.com.kong.svc.mysite.com:33
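One way to read the `Tried:` list above: the first real query is `auth.web.svc.mysite.com.kong.svc.mysite.com`, so a pod search domain was appended before the name was tried as written. The Python sketch below reproduces the standard resolv.conf rule: a name with fewer dots than `ndots` goes through the search list first, and a name ending in `.` is never expanded. The `search` and `ndots` values are hypothetical, modelled on a typical Kubernetes pod in the `kong` namespace with `mysite.com` as the cluster domain; the real values are in the pod's `/etc/resolv.conf`. Whether Kong's resolver follows this rule exactly is an assumption, although the log is consistent with it. `parse_resolv_conf` and `candidate_names` are illustrative helpers, not Kong functions.

```python
from typing import List, Tuple

# Hypothetical resolv.conf for a pod in the "kong" namespace; the nameserver
# address and search domains are assumptions, not taken from the issue.
RESOLV_CONF = """\
nameserver 10.100.0.10
search kong.svc.mysite.com svc.mysite.com mysite.com
options ndots:5
"""


def parse_resolv_conf(text: str) -> Tuple[List[str], int]:
    """Return (search domains, ndots) from resolv.conf-style text."""
    search: List[str] = []
    ndots = 1  # resolv.conf default when no "options ndots:N" is present
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        fields = line.split()
        if fields[0] in ("search", "domain"):
            search = fields[1:]  # the last search/domain line wins
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


def candidate_names(name: str, search: List[str], ndots: int) -> List[str]:
    """List the names a resolv.conf-style resolver would try, in order."""
    if name.endswith("."):
        return [name.rstrip(".")]  # absolute name: no search expansion
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as written first
    return expanded + [name]  # too few dots: search domains first


search, ndots = parse_resolv_conf(RESOLV_CONF)
for candidate in candidate_names("auth.web.svc.mysite.com", search, ndots):
    print(candidate)
```

Under these assumptions the 4-dot name with `ndots:5` is tried against all three search domains (per record type) before the literal name, which multiplies the number of upstream queries. Writing the host with a trailing dot (`auth.web.svc.mysite.com.`) skips that expansion for resolvers that honour the trailing-dot rule.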

The other error I’m seeing frequently is:
[warn] 1132#0: * [lua] batch_queue.lua:183: failed to process entries: nil, context: ngx.timer

Expected Behavior

No DNS resolution errors.

Steps To Reproduce

It happens sporadically. If I delete the Kong pod and let it be recreated, it works for a while before those errors appear and take the website offline.

Anything else?

EKS 1.24

@chronolaw chronolaw added area/kubernetes Issues where Kong is running on top of Kubernetes core/proxy labels Jan 11, 2023
@J1a-wei

J1a-wei commented Jan 13, 2023

Same issue here. In our case, it shows up as the gateway service becoming unhealthy and unable to connect to the internal Redis cluster.

[cosocket] DNS resolution failed: dns client error: 101 empty record received. Tried: ["(short):(na) - cache-miss",".kong.svc.cluster.local:33 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".svc.cluster.local:33 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".cluster.local:33 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".ap-northeast-1.compute.internal:33 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",":33 - cache-hit/dns client error: 101 empty record received",".kong.svc.cluster.local:1 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".svc.cluster.local:1 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".cluster.local:1 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".ap-northeast-1.compute.internal:1 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",":1 - cache-hit/dns client error: 101 empty record received",".kong.svc.cluster.local:5 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".svc.cluster.local:5 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".cluster.local:5 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".ap-northeast-1.compute.internal:5 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",":5 - cache-hit/dns client error: 101 empty record received"], client: 100.168.33.146, server: kong_admin, request: "POST /web3-gateway/caches HTTP/1.1", host: "xxxx"
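A `Tried:` list this long is easier to scan when grouped by record type. The number after each name is the DNS query type (33 = SRV, 1 = A, 5 = CNAME), and the SRV, A, CNAME order seen here matches the `LAST,SRV,A,CNAME` default of Kong's `dns_order` setting. The Python sketch below assumes every entry has the form `<name>:<qtype> - <step>/<step>/...`, a format inferred from this log rather than from Kong documentation; `Attempt`, `parse_tried` and `summarize` are hypothetical helpers. The hostname part of each entry is blank in the pasted text, so the sketch keeps whatever name it finds, including an empty one.

```python
import re
from collections import Counter
from typing import Dict, List, NamedTuple

# DNS record-type numbers (IANA registry); "(na)" appears where no record
# type applies, e.g. on the "(short)" entry.
QTYPE_NAMES: Dict[str, str] = {"1": "A", "5": "CNAME", "28": "AAAA", "33": "SRV", "(na)": "n/a"}

# Assumed entry shape: "<name>:<qtype> - <trail>"; the name may be empty.
ENTRY = re.compile(r"^(?P<name>.*?):(?P<qtype>\(na\)|\d+) - (?P<trail>.*)$")


class Attempt(NamedTuple):
    name: str
    qtype: str
    outcome: str  # last step of the slash-separated trail


def parse_tried(entries: List[str]) -> List[Attempt]:
    attempts: List[Attempt] = []
    for entry in entries:
        m = ENTRY.match(entry)
        if m is None:
            continue  # skip entries not in the assumed format (e.g. truncated ones)
        qtype = QTYPE_NAMES.get(m["qtype"], m["qtype"])
        attempts.append(Attempt(m["name"], qtype, m["trail"].split("/")[-1]))
    return attempts


def summarize(attempts: List[Attempt]) -> Counter:
    """Count attempts per (record type, final outcome)."""
    return Counter((a.qtype, a.outcome) for a in attempts)


# A few entries copied from the log above.
SAMPLE = [
    "(short):(na) - cache-miss",
    ".kong.svc.cluster.local:33 - cache-miss/scheduled/querying/try 1 error: bad name"
    "/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",
    ":33 - cache-hit/dns client error: 101 empty record received",
    ".cluster.local:1 - cache-miss/scheduled/querying/try 1 error: bad name"
    "/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",
    ":5 - cache-hit/dns client error: 101 empty record received",
]

for (qtype, outcome), n in summarize(parse_tried(SAMPLE)).items():
    print(f"{qtype:5} x{n}  {outcome}")
```

Fed the full list, the summary makes it visible that every search-domain query ended in `bad name` and every bare-name query ended in `101 empty record received`, for all three record types.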

#10107

@fffonion
Contributor

fffonion commented Feb 7, 2023

Hi @NickTheSecurityDude @0xRook1e, could you attach a tcpdump captured while the error is happening (when Kong gets an error/timeout but a normal nslookup works)?
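Lining a packet capture up with a failure is easier with exact timestamps of when lookups start failing. The Python sketch below polls a name at a fixed interval and records the UTC time of each failed lookup; it could run next to a capture such as `tcpdump -i any -n -w dns.pcap port 53`, assuming a Python interpreter is available in the Kong pod or in a debug container sharing its network namespace. It goes through the C library resolver via `socket.getaddrinfo`, not Kong's own resolver, so a clean run here while Kong logs errors would itself be a useful data point. `probe` and `system_resolve` are hypothetical helpers.

```python
import socket
import time
from datetime import datetime, timezone
from typing import Callable, List, Tuple


def system_resolve(host: str) -> None:
    # Uses the C library resolver (resolv.conf, search domains), not Kong's resolver.
    socket.getaddrinfo(host, None)


def probe(host: str, count: int, interval: float,
          resolve: Callable[[str], None] = system_resolve) -> List[Tuple[str, str]]:
    """Resolve `host` `count` times; return (UTC timestamp, error) for each failure."""
    failures: List[Tuple[str, str]] = []
    for _ in range(count):
        stamp = datetime.now(timezone.utc).isoformat()
        try:
            resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures.append((stamp, str(exc)))
        time.sleep(interval)
    return failures
```

For example, `probe("auth.web.svc.mysite.com", count=1800, interval=1.0)` would cover roughly the 15-minute window reported above; the `resolve` parameter exists so the loop can be exercised with a fake resolver.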

@fffonion fffonion added the pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... label Feb 14, 2023
@surenraju-careem

Related issue #9959

@stale

stale bot commented Mar 11, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 11, 2023
@stale stale bot closed this as completed Mar 18, 2023

6 participants