DNS Resolution Fails Sporadically - Crashes Server #10093

NickTheSecurityDude · 2023-01-11T02:28:47Z

Is there an existing issue for this?

I have searched the existing issues

Kong version (`$ kong version`)

Kong Proxy 3.1.0, Kong Ingress 2.8.0

Current Behavior

Kong keeps crashing with DNS resolution errors.

If I restart Kong, it works for 15 minutes or so, then all requests fail and the website goes down.

If I exec into the Kong pod directly, I’m able to run nslookup commands without problems.

Does anyone know why this may be happening? Environment: EKS 1.24, Kong Proxy 3.1.0, Kong Ingress 2.8.0.

[notice] 1132#0: *436443 [kong] handler.lua:181 [mysite-auth] :mys_lua: resty - validation api call encountered error [cosocket] DNS resolution failed: dns lookup pool exceeded retries (1): timeout. Tried: ["(short)auth.web.svc.mysite.com:(na) - cache-miss","auth.web.svc.mysite.com.kong.svc.mysite.com:33

The other error I’m seeing frequently is:
[warn] 1132#0: * [lua] batch_queue.lua:183: failed to process entries: nil, context: ngx.timer

Expected Behavior

No DNS Resolution Errors.

Steps To Reproduce

It happens sporadically. If I delete the Kong pod and let it be recreated, it works for some time before showing those errors and taking the website offline.

Anything else?

EKS 1.24


J1a-wei · 2023-01-13T07:49:42Z

Same issue here. In our case, the gateway service consistently misbehaves and cannot connect to the internal Redis cluster.

[cosocket] DNS resolution failed: dns client error: 101 empty record received. Tried: ["(short):(na) - cache-miss",".kong.svc.cluster.local:33 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".svc.cluster.local:33 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".cluster.local:33 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".ap-northeast-1.compute.internal:33 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",":33 - cache-hit/dns client error: 101 empty record received",".kong.svc.cluster.local:1 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".svc.cluster.local:1 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".cluster.local:1 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".ap-northeast-1.compute.internal:1 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",":1 - cache-hit/dns client error: 101 empty record received",".kong.svc.cluster.local:5 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".svc.cluster.local:5 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".cluster.local:5 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",".ap-northeast-1.compute.internal:5 - cache-miss/scheduled/querying/try 1 error: bad name/scheduled/querying/try 2 error: bad name/dns lookup pool exceeded retries (1): bad name",":5 - cache-hit/dns client error: 101 empty record received"], client: 100.168.33.146, server: kong_admin, request: "POST /web3-gateway/caches HTTP/1.1", host: "xxxx"
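For anyone reading a `Tried:` list like the one above: the number after each name is the DNS record type that was queried (33 is SRV, 1 is A, 5 is CNAME), and the text after ` - ` is the slash-separated sequence of steps the lookup went through. A minimal Python sketch for splitting such entries into readable parts might look like this. `parse_tried` and its output shape are hypothetical helpers for illustration, not part of Kong, and the entry format is inferred only from the log lines in this thread.

```python
import re

# Standard DNS record type numbers.
QTYPES = {1: "A", 5: "CNAME", 28: "AAAA", 33: "SRV"}

# Assumed entry shape, inferred from the logs above: "<name>:<type> - <trace>"
ENTRY_RE = re.compile(r"^(?P<name>.*?):(?P<qtype>\d+|\(na\)) - (?P<trace>.*)$")


def parse_tried(entries):
    """Split each 'name:type - trace' entry into name, record type, and steps."""
    parsed = []
    for entry in entries:
        match = ENTRY_RE.match(entry)
        if match is None:
            continue  # skip anything that does not fit the assumed shape
        qtype = match.group("qtype")
        parsed.append({
            "name": match.group("name"),
            # "(na)" means no record type applies to that entry
            "type": QTYPES.get(int(qtype), qtype) if qtype.isdigit() else None,
            "steps": match.group("trace").split("/"),
        })
    return parsed


if __name__ == "__main__":
    sample = [
        "(short):(na) - cache-miss",
        ".kong.svc.cluster.local:33 - cache-miss/scheduled/querying/try 1 error: bad name",
        ":1 - cache-hit/dns client error: 101 empty record received",
    ]
    for row in parse_tried(sample):
        print(row["type"], repr(row["name"]), row["steps"][-1])
```

Grouping the result by `type` makes it easier to see that every SRV, A, and CNAME attempt in the log above ended in `bad name` or `empty record received`.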

#10107

fffonion · 2023-02-07T10:04:34Z

Hi @NickTheSecurityDude @0xRook1e, could you attach a tcpdump captured while the error is happening (when Kong receives the error/timeout, but a normal nslookup is fine)?
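One possible way to find the time window for such a capture is to run a small resolution probe alongside it and note when lookups fail. The sketch below is only an illustration under assumptions: it uses Python's system resolver (`socket.getaddrinfo`), not Kong's own DNS client, so like `nslookup` it may keep succeeding while Kong fails. The `probe` function and its parameters are hypothetical names, and it assumes a Python interpreter is available in a pod in the same namespace.

```python
import socket
import time
from datetime import datetime, timezone


def probe(hostname, attempts=5, interval=1.0,
          resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve hostname repeatedly and return a record for each failed attempt."""
    failures = []
    for i in range(attempts):
        started = time.monotonic()
        try:
            resolver(hostname, None)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            failures.append({
                "at": datetime.now(timezone.utc).isoformat(),
                "elapsed_s": round(time.monotonic() - started, 3),
                "error": str(exc),
            })
        if i < attempts - 1:
            sleep(interval)
    return failures


if __name__ == "__main__":
    # Hostname taken from the log in the original report.
    for failure in probe("auth.web.svc.mysite.com", attempts=10):
        print(failure)
```

The UTC timestamps of any failures can then be lined up against the packet capture and the Kong error log.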

surenraju-careem · 2023-02-19T17:27:39Z

Related issue #9959

stale · 2023-03-11T11:34:48Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

chronolaw added the area/kubernetes (Issues where Kong is running on top of Kubernetes) and core/proxy labels Jan 11, 2023

hanshuebner added the dns label Jan 24, 2023

fffonion added the pending author feedback label (Waiting for the issue author to get back to a maintainer with findings, more details, etc...) Feb 14, 2023

stale bot added the stale label Mar 11, 2023

stale bot closed this as completed Mar 18, 2023
