
Kong 2.3.0, cpu 100%, nginx no response #6739

Closed
RyouZhang opened this issue Jan 15, 2021 · 11 comments · Fixed by #6760

Comments

@RyouZhang

Summary

SUMMARY_GOES_HERE

Steps To Reproduce

Additional Details & Logs

@bungle
Member

bungle commented Jan 15, 2021

@RyouZhang, please give us more information. Did you do an in-place update? If so, you might have conflicting rockspecs installed.
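
A quick way to check for that (a sketch; run wherever Kong is installed, e.g. inside the container):

# List the installed versions of the DNS client rock; more than one version
# listed would point to a conflicting install
luarocks list lua-resty-dns-client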

@bungle bungle added the pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... label Jan 15, 2021
@RyouZhang
Author

RyouZhang commented Jan 15, 2021

We deploy with Docker. Yesterday we upgraded to Kong 2.3.0 and it worked fine at first, but then Kong's CPU suddenly went to 100% and it stopped responding.

After a docker restart it ran for about 24 hours, then hit 100% CPU and stopped responding again; it's very interesting.

I ran a test with two nodes: one Kong instance receives normal traffic, the other only serves metrics; after 24 hours both were at 100% CPU and not responding, so I suspect the metrics path may be involved.

After another 24 hours the issue reappeared; after a kong reload it responds again, but CPU stays at 100% and memory keeps leaking.

Proc info
/proc/{pid}/wchan = 0
/proc/{pid}/syscall = running
/proc/{pid}/stack = [<0000000000000000>] 0xffffffffffffffff
/proc/{pid}/stat = 24 (nginx) R 1 1 1 0 -1 4202816 2247197 0 0 0 85695 7564 0 0 20 0 1 0 3437536495 657178624 38614 18446744073709551615 94407332544512 94407334300730 140726817353536 140726817351104 139739583651365 0 65536 1073745920 402745863 0 0 0 17 16 0 0 0 0 0 94407334675680 94407334830717 94407338496000 140726817355784 140726817355855 140726817355855 140726817357778 0

strace -d output:
strace: new tcb for pid 39217, active tcbs:1
strace: ptrace_setoptions = 0x11
strace: attach to pid 39217 (main) succeeded
strace: Process 39217 attached
strace: [wait(0x80057f) = 39217] WIFSTOPPED,sig=SIGTRAP,EVENT_STOP (128)
strace: pid 39217 has TCB_STARTUP, initializing it
strace: pid 39217 invalid syscall -1
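
For reference, a sketch of how these per-process diagnostics can be collected for a spinning worker (replace 39217 with the PID of the busy nginx worker, e.g. taken from top):

# Kernel-side view of what the process is doing
cat /proc/39217/wchan; echo
cat /proc/39217/syscall
cat /proc/39217/stack
cat /proc/39217/stat

# Attach strace with its own debug output enabled
strace -d -p 39217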

@ghost

ghost commented Jan 19, 2021

Having the same issue. Reproduction steps below:


Install gojira

Follow the below execution flow:

export GOJIRA_IMAGE=kong:2.3.0
gojira hybrid up

# Start the CP
gojira hybrid run@kong-cp kong migrations bootstrap
gojira hybrid run@kong-cp kong start

# Start the DP
gojira hybrid run@kong-dp kong start

# Install deck on the CP and sync the kong.yaml below
gojira hybrid shell@kong-cp
kong-cp:/# ./install_deck.sh
kong-cp:/# deck sync -s kong.yaml

# Start monitoring the CPU of both the CP and the DP (see the sketch after this block).
# Wait a few minutes. I had to wait 1 to 2 minutes.
# Now all workers from both the DP and the CP are each consuming 50% of CPU.
# This issue does not happen in traditional mode deployment. Also does not happen on 2.2.0 hybrid mode following the same steps.
# Proxy requests just hang and time out after 30 seconds. Nothing in the notice error logs (attached).
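
To make the CPU-monitoring step concrete, a small sketch (the exact container names depend on what gojira created; adjust as needed):

# On the host: per-container CPU usage for the CP and DP containers
docker stats --no-stream

# Inside a node (e.g. via gojira hybrid shell@kong-cp): per-worker CPU usage
ps -eo pid,pcpu,args | grep '[n]ginx: worker'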


kong.yaml:

_format_version: "1.1"
services:
- connect_timeout: 60000
  host: testUps6066_01
  name: httpbin
  port: 80
  protocol: http
  read_timeout: 60000
  retries: 5
  write_timeout: 60000
  routes:
  - id: c62fbb13-d29a-4bac-9c2a-92610ac7b6e6
    paths:
    - /test
    path_handling: v0
    preserve_host: false
    protocols:
    - http
    regex_priority: 0
    strip_path: true
    https_redirect_status_code: 426
upstreams:
- name: testUps6066_01
  algorithm: round-robin
  slots: 10000
  healthchecks:
    active:
      concurrency: 10
      healthy:
        http_statuses:
        - 200
        - 302
        interval: 1
        successes: 1
      http_path: /status/404
      https_verify_certificate: true
      type: http
      timeout: 1
      unhealthy:
        http_failures: 1
        http_statuses:
        - 429
        - 404
        - 500
        - 501
        - 502
        - 503
        - 504
        - 505
        tcp_failures: 1
        timeouts: 1
        interval: 1
    passive:
      healthy:
        http_statuses:
        - 200
        - 201
        - 202
        - 203
        - 204
        - 205
        - 206
        - 207
        - 208
        - 226
        - 300
        - 301
        - 302
        - 303
        - 304
        - 305
        - 306
        - 307
        - 308
        successes: 0
      type: http
      unhealthy:
        http_failures: 0
        http_statuses:
        - 429
        - 500
        - 503
        tcp_failures: 0
        timeouts: 0
    threshold: 0
  hash_on: none
  hash_fallback: none
  hash_on_cookie_path: /
  targets:
  - target: httpbin.org:80
    weight: 100
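
For completeness, a sketch of confirming on the CP that the upstream and its health checks were created after the deck sync (the default Admin API port 8001 is assumed):

# From inside the CP node: inspect the upstream and its target health
curl -s http://localhost:8001/upstreams/testUps6066_01
curl -s http://localhost:8001/upstreams/testUps6066_01/health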

error logs:

cp_logs.txt
dp_logs.txt

@ghost

ghost commented Jan 19, 2021

Both DP and CP logs at debug level start filling up extremely quickly with:

2021/01/19 14:39:24 [debug] 70#0: *733 [lua] base.lua:493: f(): [upstream:testUps6066_01 2] no dns changes detected for httpbin.org
2021/01/19 14:39:25 [debug] 72#0: *803 [lua] base.lua:448: [upstream:testUps6066_01 2] executing requery for: httpbin.org
2021/01/19 14:39:25 [debug] 71#0: *805 [lua] base.lua:448: [upstream:testUps6066_01 2] executing requery for: httpbin.org
2021/01/19 14:39:24 [debug] 70#0: *733 [lua] base.lua:448: [upstream:testUps6066_01 2] executing requery for: httpbin.org
2021/01/19 14:39:25 [debug] 72#0: *803 [lua] base.lua:640: queryDns(): [upstream:testUps6066_01 2] querying dns for httpbin.org
2021/01/19 14:39:25 [debug] 71#0: *805 [lua] base.lua:640: queryDns(): [upstream:testUps6066_01 2] querying dns for httpbin.org
2021/01/19 14:39:24 [debug] 70#0: *733 [lua] base.lua:640: queryDns(): [upstream:testUps6066_01 2] querying dns for httpbin.org
2021/01/19 14:39:25 [debug] 72#0: *803 [lua] base.lua:493: f(): [upstream:testUps6066_01 2] no dns changes detected for httpbin.org
2021/01/19 14:39:25 [debug] 71#0: *805 [lua] base.lua:493: f(): [upstream:testUps6066_01 2] no dns changes detected for httpbin.org
2021/01/19 14:39:25 [debug] 72#0: *803 [lua] base.lua:448: [upstream:testUps6066_01 2] executing requery for: httpbin.org
2021/01/19 14:39:24 [debug] 70#0: *733 [lua] base.lua:493: f(): [upstream:testUps6066_01 2] no dns changes detected for httpbin.org
2021/01/19 14:39:25 [debug] 71#0: *805 [lua] base.lua:448: [upstream:testUps6066_01 1] executing requery for: httpbin.org
2021/01/19 14:39:25 [debug] 72#0: *803 [lua] base.lua:640: queryDns(): [upstream:testUps6066_01 2] querying dns for httpbin.org
2021/01/19 14:39:24 [debug] 70#0: *733 [lua] base.lua:448: [upstream:testUps6066_01 2] executing requery for: httpbin.org
2021/01/19 14:39:25 [debug] 71#0: *805 [lua] base.lua:640: queryDns(): [upstream:testUps6066_01 1] querying dns for httpbin.org

To the point that I can't even attach the debug-level error log from either the DP or the CP, because of how big they get after just a few seconds.
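
A rough way to quantify the flood from the debug log (a sketch; the path assumes Kong's default prefix of /usr/local/kong):

# Count requery events per second in the debug-level error log
grep 'executing requery for' /usr/local/kong/logs/error.log \
  | awk '{print $1, $2}' | sort | uniq -c | tail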

@bungle bungle added task/bug and removed pending author feedback Waiting for the issue author to get back to a maintainer with findings, more details, etc... labels Jan 19, 2021
@bungle
Member

bungle commented Jan 19, 2021

@murillopaula thank you for the pointers. It looks like a bug in health checks. @RyouZhang, I will look into what is happening and report back. It seems to be health-check related and happens mostly in hybrid mode. @RyouZhang, which mode are you running?

@RyouZhang
Author

RyouZhang commented Jan 19, 2021

I'll try to get a debug log tomorrow.
Today I got a new case: using the kong:2.3.0-ubuntu image with 4 workers, after 24 hours 3 workers were at 100% CPU while 1 worker was still fine.

Memory monitor:
微信图片_20210120003535
微信图片_20210120003543

@ghost

ghost commented Jan 19, 2021

@RyouZhang do you use hybrid mode? Do you use upstreams/targets with health checks (passive/active)?

@RyouZhang
Author

RyouZhang commented Jan 20, 2021

OK, here are my kong.yaml and docker command:
Archive.zip
CPU usage over 2 days:
91b96aaf112458679cc527602649ec27

@ghost

ghost commented Jan 20, 2021

@RyouZhang ok, it seems like you are also using health checks. We've found a temporary workaround to mitigate this issue:

luarocks remove lua-resty-dns-client --force
luarocks install lua-resty-dns-client 5.1.1 --force

i.e., downgrade lua-resty-dns-client to 5.1.1.

Can you let us know if that works for you? Note this is a temporary hack and should not be considered a definitive solution.
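
For a Docker-based deployment like the one reported above, a sketch of applying this workaround inside a running container (the container name "kong" is an assumption; the change does not survive recreating the container):

# Downgrade the rock inside the running container, then reload Kong
docker exec -u root kong luarocks remove lua-resty-dns-client --force
docker exec -u root kong luarocks install lua-resty-dns-client 5.1.1 --force
docker exec kong kong reload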

@RyouZhang
Author

With lua-resty-dns-client 5.1.1 it has been working for 24 hours; it's fine.

bungle added a commit that referenced this issue Jan 21, 2021
### Summary

@RyouZhang reported with #6739 that Kong starts to spin CPUs at 100% with `2.3.0` when healthchecks are
enabled. @murillopaula confirmed it and gave additional information, and we could reproduce it easily.

We identified an issue in the lua-resty-dns-client library together with @Tieske and @locao and fixed it.

This PR bumps the Kong dependency to the fixed version of the library.

### Issues Resolved

Fix #6739
@bungle
Member

bungle commented Jan 21, 2021

@RyouZhang we released lua-resty-dns-client 5.2.1, which also fixes the issue, and it will be included in Kong 2.3.1, which we are releasing within the next few days. Thank you for your cooperation and for letting us know about it!

locao pushed a commit that referenced this issue Jan 21, 2021
### Summary

@RyouZhang reported with #6739 that Kong starts to spin CPUs at 100% with `2.3.0` when healthchecks are
enabled. @murillopaula confirmed it and gave additional information, and we could reproduce it easily.

We identified an issue in the lua-resty-dns-client library together with @Tieske and @locao and fixed it.

This PR bumps the Kong dependency to the fixed version of the library.

### Issues Resolved

Fix #6739