100% CPU usage of nginx worker process after custom plugin crash #9301
Comments
Recommended capturing a flame graph.
@fairyqb Attaching a CPU flame graph of the process.
Looks like epoll looping, maybe because a socket is reported as ready (readable/writable) but is then not handled (read/written), causing the next epoll call to return immediately again. Given that this is about the Golang plugin, what does it mean that the plugin "crashed"? Can we even expect to handle this situation gracefully?
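For illustration only (not from the thread): a minimal, Linux-only Go sketch of the suspected failure mode, where a level-triggered epoll fd is reported ready but never drained, so every epoll_wait call returns immediately and the loop burns CPU. The pipe below stands in for the plugin-server socket; all names are illustrative.

```go
// Minimal sketch (Linux only): an fd registered with epoll becomes readable,
// but the data is never read, so epoll_wait keeps returning immediately.
package main

import (
	"fmt"
	"syscall"
)

func main() {
	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		panic(err)
	}
	defer syscall.Close(epfd)

	// A pipe stands in for the plugin-server socket.
	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		panic(err)
	}
	syscall.Write(p[1], []byte("x")) // make the read end readable

	ev := syscall.EpollEvent{Events: syscall.EPOLLIN, Fd: int32(p[0])}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, p[0], &ev); err != nil {
		panic(err)
	}

	events := make([]syscall.EpollEvent, 1)
	for i := 0; i < 5; i++ {
		// Level-triggered epoll keeps reporting the fd as ready because
		// nothing ever reads from it.
		n, err := syscall.EpollWait(epfd, events, -1)
		fmt.Println("epoll_wait returned", n, err)
	}
}
```

In the real case such a loop would never exit, which would show up as 100% CPU usage on the nginx worker process.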
Hi Samuele,
We are not using go-pluginserver to run custom plugins, we are running a
plugin in embedded server mode.
On Thu, Sep 1, 2022 at 12:57 PM Samuele Illuminati wrote:
> Hello @dhrumil29699, I was trying to replicate this but couldn't: for me, when a simple go plugin crashes, the go-pluginserver is quickly restarted without affecting the CPU utilization noticeably.
> Could you share some additional details of how exactly your plugin is crashing, what is causing the failure, and whether Kong is logging something similar to: external pluginserver 'go-pluginserver' terminated?
> Also, could you let us know if this can be replicated on the latest version of Kong, 2.8.1.4?
Hi @dhrumil29699, yes, I meant to review my answer after noticing that detail in your issue's description, to verify whether it made a difference. After testing this in embedded server mode I can confirm that I still could not reproduce this behaviour: after the embedded server terminates and restarts due to the plugin's crash, resource utilization remains stable, even after running a heavy load of requests on the gateway. Based on that, I believe this problem might be due to the particular nature of your plugin's crash, the environment configuration, or the plugin implementation. The following additional details could help identify the root cause: can this be replicated with the following conditions?
@samugi
Hello @dhrumil29699, thanks for getting back to us. As mentioned, I couldn't replicate this in a test environment based on your description. I would advise going through each bullet point from my previous answer, thank you.
@samugi I have tried to replicate the issue under the conditions mentioned above.
go-pluginserver mode logs
embedded server mode logs
Embedded server mode has one more issue: earlier it was not able to call the plugin after a crash; that requires a few changes in the pb_rpc.lua file, as mentioned in #8293.
Hi @dhrumil29699, thank you for the additional details. Can you confirm that you can only replicate this by applying the patch on pb_rpc.lua from #8293?
This is a fix for #9301: when a request made the plugin fail, the creation of a new instance did not handle a failed connection properly. Upon failure to connect to the socket, it was possible to end up in a state where an instance existed without an instance id, causing an infinite loop and high CPU utilization.
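As a rough illustration of the idea behind this fix (not Kong's actual code, which is in Lua): instance creation should fail fast when the socket connection fails, instead of leaving behind a half-initialized instance without an instance id that the caller keeps retrying. A Go sketch under that assumption; pluginInstance, newInstance and instanceIDUnset are hypothetical names.

```go
// Hedged sketch: either return a fully initialized instance or an error,
// never an instance whose id is still unset.
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

const instanceIDUnset = -1 // hypothetical sentinel for "no instance id yet"

type pluginInstance struct {
	id   int
	conn net.Conn
}

// newInstance connects to the plugin server socket and only returns an
// instance once it has a real id; a connection failure is propagated
// instead of leaving an id-less instance behind.
func newInstance(socketPath string) (*pluginInstance, error) {
	conn, err := net.DialTimeout("unix", socketPath, time.Second)
	if err != nil {
		// Propagate the failure rather than keeping a half-initialized instance.
		return nil, fmt.Errorf("connect to plugin server: %w", err)
	}
	inst := &pluginInstance{id: instanceIDUnset, conn: conn}
	// ... handshake with the plugin server that assigns a real instance id ...
	inst.id = 1 // placeholder for the id returned by the plugin server
	return inst, nil
}

func main() {
	if _, err := newInstance("/tmp/does-not-exist.sock"); err != nil {
		// The caller can log the error and retry with a backoff instead of spinning.
		fmt.Println("instance creation failed:", errors.Unwrap(err))
	}
}
```

With this shape, a failed connection surfaces as an error the caller can back off on, rather than a state that triggers a tight retry loop.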
Is there an existing issue for this?
Kong version ($ kong version): 2.7.1
Current Behavior
When the custom plugin crashes for some reason, the nginx worker process gets stuck in some kind of loop and its CPU usage reaches 100%. After some time the container starts throttling, and half of the requests going to this pod result in 502s. This remains the same for almost 10 minutes. Eventually the Kong main process exits and the container restarts. The custom plugins are written in Golang and run in embedded server mode.
Attaching a screenshot of the pod where CPU is high after restart.
Memory usage of the same container also increases steadily.
Output of the top command within the container whose CPU is high
GDB output of the worker process using bt
strace output of the process
Expected Behavior
If the custom plugin crashes, it should not have any impact on the nginx worker process.
Steps To Reproduce
Anything else?
We are running Kong in hybrid mode. Plugins are running in embedded server mode. Apart from the custom plugin, we are also using the correlation-id, request-transformer, jwt, and response-transformer plugins.