OOM when stress test the Seldon model, which may be caused by the logging of request and response payloads #3726

chi2liu · 2021-11-08T08:51:40Z

Describe the bug

When we stress test the Seldon model, we see OOM errors after sending a large number of requests. We monitored the containers in the pod and found that the model container reaches its memory limit and is eventually OOM-killed. When we disable the logging of request and response payloads in the Seldon Deployment (by commenting out the logger), the OOM no longer occurs. What is the actual cause of the OOM? Is it because too many log entries accumulate in memory?

graph:
  children: []
  endpoint:
    type: "REST"
  #logger:
  #  mode: "all"

yaliqin · 2021-11-11T18:42:37Z

@chi2liu This is caused by the request/response logging queue in seldon-container-engine. There are ways to work around it.
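
One general way to keep a logging queue like this from growing without limit is to cap its size and drop payloads once it is full. The following is a minimal Python sketch of that idea only. It is not the seldon-container-engine code and not necessarily what the linked fix does; `BoundedPayloadLogger`, `max_pending`, `submit`, and `drain` are hypothetical names chosen for illustration.

```python
import queue


class BoundedPayloadLogger:
    """Hypothetical illustration of a capped logging queue (not Seldon's code)."""

    def __init__(self, max_pending=1000):
        # The queue holds at most max_pending payloads, so the memory used by
        # pending log entries is bounded no matter how fast requests arrive.
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = 0

    def submit(self, payload):
        """Enqueue a payload without blocking; drop it if the queue is full."""
        try:
            self._pending.put_nowait(payload)
            return True
        except queue.Full:
            # Dropping trades log completeness for bounded memory.
            self.dropped += 1
            return False

    def drain(self, sink):
        """Pass every queued payload to sink and return how many were sent."""
        sent = 0
        while True:
            try:
                payload = self._pending.get_nowait()
            except queue.Empty:
                return sent
            sink(payload)
            sent += 1

    def pending(self):
        """Return the number of payloads currently waiting in the queue."""
        return self._pending.qsize()
```

With `max_pending=3`, five calls to `submit` accept the first three payloads and drop the last two, so the backlog never exceeds three entries. An unbounded queue under the same load would keep every payload in memory until the log sink caught up.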

chi2liu added the bug and triage labels Nov 8, 2021

ivan-valkov mentioned this issue Nov 12, 2021

Performance fix for the logger in the executor #3734

Merged

ukclivecox closed this as completed in #3734 Nov 13, 2021
