uWSGI OOMKilled on Kubernetes #9562

Closed
1 of 3 tasks
hoeg opened this issue Feb 16, 2024 · 7 comments

@hoeg

hoeg commented Feb 16, 2024

Bug description
Deploying DefectDojo to a Kubernetes cluster causes the uWSGI container to consume so much memory that the node kills the pod. The cause is the effectively unbounded file descriptor limit on the node. See unbit/uwsgi#2299 for a description of the underlying uWSGI issue.

Steps to reproduce

  • Deploy the Helm chart to a Kubernetes cluster with nodes running Flatcar Container Linux by Kinvolk 3602.2.1 (Oklo).
  • Watch the pod get deployed and, after less than 15 seconds, killed by the node due to OOM.

Expected behavior
Expected the pod to start up and not get OOMKilled by the node.

I built my own container image locally, adding the --max-fd argument to docker/entrypoint-uwsgi.sh, and used that image in my cluster. This resolved the issue.

Deployment method (select with an X)

  • Docker Compose
  • Kubernetes
  • GoDojo

Environment information

  • Kubernetes nodes running:
Kernel Version:             5.15.136-flatcar
OS Image:                   Flatcar Container Linux by Kinvolk 3602.2.1 (Oklo)
Operating System:           linux
Architecture:               amd64
Container Runtime Version:  containerd://1.6.21
Kubelet Version:            v1.28.3
Kube-Proxy Version:         v1.28.3
  • DefectDojo version: 2.30.4

Logs
Logs from the defectdojo-django pod

$ k logs defect-dojo-defectdojo-django
Defaulted container "uwsgi" out of: uwsgi, nginx
[13/Feb/2024 08:50:57] INFO [dojo.models:4295] enabling audit logging
/usr/local/lib/python3.11/site-packages/coreapi/codecs/download.py:5: DeprecationWarning: 'cgi' is deprecated and slated for removal in Python 3.13
  import cgi
System check identified no issues (0 silenced).
*** Starting uWSGI 2.0.23 (64bit) on [Tue Feb 13 08:50:58 2024] ***
compiled with version: 10.2.1 20210110 on 29 January 2024 15:50:06
os: Linux-5.15.136-flatcar #1 SMP Mon Oct 23 16:44:45 -00 2023
nodename: defect-dojo-defectdojo-django
machine: x86_64
clock source: unix
detected number of CPU cores: 4
current working directory: /app
detected binary path: /usr/local/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
*** WARNING: you are running uWSGI without its master process manager ***
your memory page size is 4096 bytes
detected max file descriptor number: 1073741816
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 0.0.0.0:8081 fd 3
spawned uWSGI http 1 (pid: 13)
uwsgi socket 0 bound to UNIX address /run/defectdojo/uwsgi.sock fd 6
Python version: 3.11.4 (main, Aug 16 2023, 05:31:52) [GCC 10.2.1 20210110]
Python main interpreter initialized at 0x7fb82cac7558
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 405672 bytes (396 KB) for 15 cores
*** Operational MODE: preforking+threaded ***

Note that uWSGI logs detected max file descriptor number: 1073741816, which causes the container to use a lot of memory.
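To check which limit a container inherits before uWSGI starts, the open-file limits can be read from a shell inside the container. This is a minimal sketch, assuming bash is available in the image (the issue does not say so). The value uWSGI logs most likely corresponds to the soft limit, but that is an assumption as well.

```shell
#!/usr/bin/env bash
# Read the soft and hard limits on open file descriptors for this shell.
# ASSUMPTION: uWSGI's "detected max file descriptor number" reflects the soft limit.
soft_limit="$(ulimit -Sn)"
hard_limit="$(ulimit -Hn)"

echo "soft nofile limit: ${soft_limit}"
echo "hard nofile limit: ${hard_limit}"
```

Either value may print as a number or as the word unlimited, depending on how the container runtime is configured.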

Running the same deployment locally on my kind cluster, I get:

Defaulted container "uwsgi" out of: uwsgi, nginx
[16/Feb/2024 08:57:04] INFO [dojo.models:4295] enabling audit logging
System check identified no issues (0 silenced).
*** Starting uWSGI 2.0.23 (64bit) on [Fri Feb 16 08:57:05 2024] ***
compiled with version: 11.2.1 20220219 on 05 February 2024 16:57:27
os: Linux-6.5.11-linuxkit #1 SMP PREEMPT Wed Dec  6 17:08:31 UTC 2023
nodename: defect-dojo-defectdojo-django-7774dcb687-gn5wn
machine: aarch64
clock source: unix
detected number of CPU cores: 10
current working directory: /app
detected binary path: /usr/local/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
*** WARNING: you are running uWSGI without its master process manager ***
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 0.0.0.0:8081 fd 3
spawned uWSGI http 1 (pid: 17)
uwsgi socket 0 bound to UNIX address /run/defectdojo/uwsgi.sock fd 6
Python version: 3.11.3 (main, May  3 2023, 08:27:37) [GCC 11.2.1 20220219]
Python main interpreter initialized at 0xffffa64d55c0
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 183136 bytes (178 KB) for 4 cores
*** Operational MODE: preforking+threaded ***
[16/Feb/2024 08:57:05] INFO [dojo.models:4295] enabling audit logging
WSGI app 0 (mountpoint='') ready in 0 seconds on interpreter 0xffffa64d55c0 pid: 1 (default app)
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI worker 1 (pid: 1, cores: 2)
spawned uWSGI worker 2 (pid: 18, cores: 2)

Here we see detected max file descriptor number: 1048576. This is much lower and does not result in an OOMKilled event.
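A rough calculation shows why the two limits behave so differently. The sketch below assumes that one pointer-sized slot (8 bytes on a 64-bit system) is allocated per possible file descriptor. That per-descriptor cost is an assumption made for illustration, not a figure taken from this issue, and the helper name fd_table_bytes is hypothetical.

```shell
#!/usr/bin/env bash
# Estimate the size of a table holding one slot per possible file descriptor.
# ASSUMPTION: 8 bytes per slot (a 64-bit pointer); uWSGI's real per-descriptor cost may differ.
fd_table_bytes() {
  local max_fd="$1"
  local slot_bytes="${2:-8}"
  echo $(( max_fd * slot_bytes ))
}

# The two limits reported in the logs: the Flatcar node and the local kind cluster.
for max_fd in 1073741816 1048576; do
  bytes="$(fd_table_bytes "$max_fd")"
  echo "max_fd=${max_fd} -> ${bytes} bytes ($(( bytes / 1024 / 1024 )) MiB)"
done
# prints:
# max_fd=1073741816 -> 8589934528 bytes (8191 MiB)
# max_fd=1048576 -> 8388608 bytes (8 MiB)
```

Under that assumption the Flatcar limit implies roughly 8 GiB for a single table, while the kind limit implies about 8 MiB.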

Suggestion

Add an option to pass the --max-fd argument with a configurable value in the docker/entrypoint-uwsgi.sh script, so that the limit can be set to a lower value, e.g. 1048576.
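One way this could look is sketched below. The environment variable name DD_UWSGI_MAX_FD and the helper max_fd_args are hypothetical names chosen for illustration. The actual contents of docker/entrypoint-uwsgi.sh are not shown in this issue, so this only demonstrates the opt-in pattern rather than the real script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "--max-fd N" only when DD_UWSGI_MAX_FD is a positive integer.
# ASSUMPTION: DD_UWSGI_MAX_FD is an illustrative variable name, not taken from the issue.
max_fd_args() {
  local value="${DD_UWSGI_MAX_FD:-}"
  if [[ "$value" =~ ^[1-9][0-9]*$ ]]; then
    echo "--max-fd ${value}"
  fi
}

# Example: show the extra arguments that would be appended to the uwsgi command line.
DD_UWSGI_MAX_FD=1048576
echo "extra uwsgi args: $(max_fd_args)"
```

When the variable is unset or not a positive integer, the helper prints nothing, so the existing default behaviour stays unchanged unless someone opts in.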

@hoeg hoeg added the bug label Feb 16, 2024
@kiblik
Contributor

kiblik commented Feb 16, 2024

Can you please open it as a PR? It looks like you already know the solution :)

@hoeg
Author

hoeg commented Feb 16, 2024

Sure! Just wanted to make sure you were willing to accept it 👍

@kiblik
Contributor

kiblik commented Feb 16, 2024

I'm not a moderator (just a regular member of the community), but from what I see, in-depth discussion usually happens on an open PR.
Based on your description, I suppose your fix is quite small (easy to implement), so feel free to do it this way.

@hoeg
Author

hoeg commented Feb 16, 2024

Thanks for the insights, I will open a PR 🚀

@mtesauro
Contributor

@hoeg To add on to what @kiblik said: for Helm in particular, we're trying to keep it at a 'generic framework for deploying DefectDojo' level, not too opinionated in any particular direction.

I know we've pushed back on very specific k8s/Helm changes that pushed the Helm chart towards only working on a specific vendor's cloud or a specific tech choice (like HA vs. non-HA DB).

So, please keep this in mind when creating that PR. We're a project with a very broad community who deploy DefectDojo on everything from a laptop running Kali Linux to auto-scaling k8s and we try to keep a balance between those deployment choices in what we accept into the main repo.

For corner cases or very vendor-specific things, we'd prefer the ability to opt in to that choice while keeping the current default.

Anyway, that's how we try to balance a specific community member's need against the broader community. HTH.

@sebglon

sebglon commented Apr 12, 2024

After removing the CPU limit on the uwsgi container and increasing the memory limit to 3Gi, I no longer get OOMKilled.

Now my memory usage does not go above 1Gi.

@mtesauro
Contributor

Seems like this OOMKill issue is a k8s config issue. Closing this.

For future readers of this thread: the best place to get advice on running DefectDojo is the OWASP Slack, which has a broad and active community. Info on Slack is at https://github.com/DefectDojo/django-DefectDojo?tab=readme-ov-file#community-getting-involved-and-updates
