uWSGI OOMKilled on Kubernetes #9562

hoeg · 2024-02-16T09:00:17Z

Bug description
Deploying DefectDojo to a Kubernetes cluster causes the uWSGI container to consume so much memory that the node kills the pod. The cause is the effectively unbounded file descriptor limit on the node. See unbit/uwsgi#2299 for a description of the underlying uWSGI issue.

Steps to reproduce

Deploy the Helm chart to a Kubernetes cluster whose nodes run Flatcar Container Linux by Kinvolk 3602.2.1 (Oklo).
Watch the pod get deployed and, in under 15 seconds, killed by the node due to OOM.

Expected behavior
Expected the pod to start up and not get OOMKilled by the node.

I built my own container image locally, adding the --max-fd argument to docker/entrypoint-uwsgi.sh, and used that image in my cluster. This resolved the issue.

Deployment method (select with an X)

Docker Compose
Kubernetes
GoDojo

Environment information

Kubernetes nodes running:

Kernel Version:             5.15.136-flatcar
OS Image:                   Flatcar Container Linux by Kinvolk 3602.2.1 (Oklo)
Operating System:           linux
Architecture:               amd64
Container Runtime Version:  containerd://1.6.21
Kubelet Version:            v1.28.3
Kube-Proxy Version:         v1.28.3

DefectDojo version: 2.30.4

Logs
Logs from the defectdojo-django pod

$ k logs defect-dojo-defectdojo-django
Defaulted container "uwsgi" out of: uwsgi, nginx
[13/Feb/2024 08:50:57] INFO [dojo.models:4295] enabling audit logging
/usr/local/lib/python3.11/site-packages/coreapi/codecs/download.py:5: DeprecationWarning: 'cgi' is deprecated and slated for removal in Python 3.13
  import cgi
System check identified no issues (0 silenced).
*** Starting uWSGI 2.0.23 (64bit) on [Tue Feb 13 08:50:58 2024] ***
compiled with version: 10.2.1 20210110 on 29 January 2024 15:50:06
os: Linux-5.15.136-flatcar #1 SMP Mon Oct 23 16:44:45 -00 2023
nodename: defect-dojo-defectdojo-django
machine: x86_64
clock source: unix
detected number of CPU cores: 4
current working directory: /app
detected binary path: /usr/local/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
*** WARNING: you are running uWSGI without its master process manager ***
your memory page size is 4096 bytes
detected max file descriptor number: 1073741816
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 0.0.0.0:8081 fd 3
spawned uWSGI http 1 (pid: 13)
uwsgi socket 0 bound to UNIX address /run/defectdojo/uwsgi.sock fd 6
Python version: 3.11.4 (main, Aug 16 2023, 05:31:52) [GCC 10.2.1 20210110]
Python main interpreter initialized at 0x7fb82cac7558
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 405672 bytes (396 KB) for 15 cores
*** Operational MODE: preforking+threaded ***

Note that uWSGI logs detected max file descriptor number: 1073741816. A limit this high causes the container to use a lot of memory.

Running the same deployment locally on my kind cluster, I get:

Defaulted container "uwsgi" out of: uwsgi, nginx
[16/Feb/2024 08:57:04] INFO [dojo.models:4295] enabling audit logging
System check identified no issues (0 silenced).
*** Starting uWSGI 2.0.23 (64bit) on [Fri Feb 16 08:57:05 2024] ***
compiled with version: 11.2.1 20220219 on 05 February 2024 16:57:27
os: Linux-6.5.11-linuxkit #1 SMP PREEMPT Wed Dec  6 17:08:31 UTC 2023
nodename: defect-dojo-defectdojo-django-7774dcb687-gn5wn
machine: aarch64
clock source: unix
detected number of CPU cores: 10
current working directory: /app
detected binary path: /usr/local/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
*** WARNING: you are running uWSGI without its master process manager ***
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 0.0.0.0:8081 fd 3
spawned uWSGI http 1 (pid: 17)
uwsgi socket 0 bound to UNIX address /run/defectdojo/uwsgi.sock fd 6
Python version: 3.11.3 (main, May  3 2023, 08:27:37) [GCC 11.2.1 20220219]
Python main interpreter initialized at 0xffffa64d55c0
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 183136 bytes (178 KB) for 4 cores
*** Operational MODE: preforking+threaded ***
[16/Feb/2024 08:57:05] INFO [dojo.models:4295] enabling audit logging
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0xffffa64d55c0 pid: 1 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI worker 1 (pid: 1, cores: 2)
spawned uWSGI worker 2 (pid: 18, cores: 2)

Here we see detected max file descriptor number: 1048576. This is much lower and does not result in an OOMKilled event.
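
One way to compare the two environments is to read the open-file limit from inside each container. Below is a minimal Python sketch, assuming the number uWSGI reports as its detected max file descriptor number comes from the process's open-file limit (RLIMIT_NOFILE). The helper name nofile_limits is made up for this illustration.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE values for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def show(value):
    """Render a limit, spelling out the 'no limit' sentinel."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # Assumption: the soft limit is the value uWSGI picks up at startup.
    print(f"soft={show(soft)} hard={show(hard)}")
```

Running this in the uwsgi container on both clusters should show whether the difference comes from the node or runtime configuration rather than from DefectDojo itself.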

Suggestion

We add an option to pass the --max-fd argument with a configurable value in the docker/entrypoint-uwsgi.sh script, so that the limit can be set to a lower value, e.g. 1048576.
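
As a rough illustration of such a configurable option, here is a Python sketch of the logic. The real entrypoint is a shell script, so this only mirrors the idea. The environment variable name DD_UWSGI_MAX_FD and the function max_fd_args are hypothetical and not part of DefectDojo; --max-fd is the uWSGI option mentioned above. When the variable is unset, no argument is added and the current default is kept.

```python
import os

# Hypothetical variable name, used for illustration only.
MAX_FD_ENV = "DD_UWSGI_MAX_FD"


def max_fd_args(environ):
    """Return ["--max-fd", "<n>"] if the variable holds a positive integer, else []."""
    raw = environ.get(MAX_FD_ENV, "").strip()
    if not raw:
        # Unset or empty: add nothing, so uWSGI keeps its current behaviour.
        return []
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{MAX_FD_ENV} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError(f"{MAX_FD_ENV} must be positive, got {value}")
    return ["--max-fd", str(value)]


if __name__ == "__main__":
    # Show the extra arguments that would be appended to the uwsgi command line.
    print(["uwsgi", *max_fd_args(os.environ)])
```

Rejecting invalid values early, instead of passing them through, keeps a typo in the deployment values from turning into a confusing uWSGI startup failure.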

kiblik · 2024-02-16T10:47:29Z

Can you please open it as a PR? It looks like you already know the solution :)

hoeg · 2024-02-16T10:58:28Z

Sure! Just wanted to make sure you were willing to accept it 👍

kiblik · 2024-02-16T11:02:22Z

I'm not a moderator (just a regular member of the community), but from what I see, in-depth discussion usually happens on an open PR.
Based on your description, I suppose your fix is quite small (easy to implement), so feel free to do it this way.

hoeg · 2024-02-16T11:58:06Z

Thanks for the insights, I will open a PR 🚀

mtesauro · 2024-02-16T21:24:48Z

@hoeg To add on to what @kiblik said: for Helm in particular, we're trying to keep it at the level of a 'generic framework for deploying DefectDojo', not too opinionated in any particular direction.

I know we've pushed back on very specific k8s/Helm changes that pushed the Helm chart towards only working on a specific vendor's cloud or a specific tech choice (like HA vs. non-HA DB).

So, please keep this in mind when creating that PR. We're a project with a very broad community who deploy DefectDojo on everything from a laptop running Kali Linux to auto-scaling k8s and we try to keep a balance between those deployment choices in what we accept into the main repo.

For corner cases or very vendor-specific things, we'd prefer the ability to opt in to that choice while keeping the current default.

Anyway, that's how we try to balance a specific community member need vs the broader community. HTH.

sebglon · 2024-04-12T07:25:44Z

After removing the CPU limit on the uwsgi container and increasing the memory limit to 3Gi, I no longer get OOMKilled.

Now my memory usage does not go above 1Gi.

mtesauro · 2024-04-13T19:33:24Z

Seems like this OOMKill issue is a k8s config issue. Closing this.

For future readers of this thread: the best place to get advice on running DefectDojo is the OWASP Slack, which has a broad and active community. Info on Slack is at https://github.com/DefectDojo/django-DefectDojo?tab=readme-ov-file#community-getting-involved-and-updates

hoeg added the bug label Feb 16, 2024

hoeg mentioned this issue Feb 16, 2024

Add --max-fd argument to uwsgi to stop it from getting OOMKilled in Kubernetes #9564

Closed

jackson-van-hyning mentioned this issue Apr 1, 2024

uWSGI OOMKilled after certain interactions (Kubernetes) #9849

Closed

mtesauro closed this as completed Apr 13, 2024
