python: properly propagate shutdown signal in weblogs#6374
Conversation
|
|
|
✅ Tests 🎉 All green!❄️ No new flaky tests detected 🔗 Commit SHA: c102ca1 | Docs | Datadog PR Page | Was this helpful? Give us feedback! |
edc43ef to
e599acc
Compare
4b1e3f2 to
aec0b37
Compare
aec0b37 to
a5517db
Compare
a5517db to
8e28b38
Compare
25b6b3d to
c070b72
Compare
c070b72 to
fda6c57
Compare
VianneyRuhlmann
left a comment
There was a problem hiding this comment.
Nice, thx for fixing it
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c102ca18a4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
d0f8b2d
into
main
|
|
||
| if [[ $(python -c 'import sys; print(sys.version_info >= (3,12))') == "True" ]]; then | ||
| ddtrace-run gunicorn -w 1 -b 0.0.0.0:7777 --header-map dangerous --access-logfile - django_app.wsgi -k gevent | ||
| exec ddtrace-run gunicorn -w 1 -b 0.0.0.0:7777 --header-map dangerous --access-logfile - django_app.wsgi -k gevent |
There was a problem hiding this comment.
For my knowledge, what are the diff between using exec and not, and why it makes the app more stable ?
There was a problem hiding this comment.
In the previous setup: the main process (pid 1) was bash (defined in the ENTRYPOINT of the upstream python images) and at startup bash runs our app.sh scripts. When docker wants to stop the container, it sends a SIGTERM (graceful shutdown) to the main process (bash) that does not forward signals to child processes and the app never has the occasion to run the flushing logic on graceful shutdown. docker then sends after a grace period a SIGKILL that kills the app at the kernel level.
When using exec, the launched process replaces the current process so ddtrace-run becomes the main process (pid 1) and docker sends its SIGTERM directly to ddtrace-run and we have the possibility to properly flush everything on shutdown.
If the flushing logic on shutdown works correctly (and it seems to be the case), that would mean that we can now set the library interface timeout back to 0 (from 5s currently)
There was a problem hiding this comment.
PHP images use dumb-init for the same purpose if I understand correctly, this is just a leaner version for when you have a single process
There was a problem hiding this comment.
Thank you very much !
Motivation
Re-enable this test:
tests/test_resource_renaming.py::Test_Resource_Renaming_Stats_Aggregation_Keys::test_stats_aggregation_with_method_and_endpoint.The main problem is that the weblog never receive the SIGTERM signal from Docker and get killed right away without having the possibility to flush their buffers on shutdown.
This PR ensures that the SIGTERM signal properly propagates to the python interpreters running the app and and can be caught by
dd-trace-py.Changes
execto replace the shell process with the actualddtrace-runcommand.Workflow
🚀 Once your PR is reviewed and the CI green, you can merge it!
🛟 #apm-shared-testing 🛟
Reviewer checklist
tests/ormanifests/is modified ? I have the approval from R&P teambuild-XXX-imagelabel is present