Segmentation Fault in dd-trace-py on Python 3.12 #9205

Closed
sampritipanda opened this issue May 8, 2024 · 5 comments
Labels: Profiling (Continuous Profiling), stale

@sampritipanda

Summary of problem

I'm getting frequent segmentation faults in my application since upgrading to Python 3.12.

Which version of dd-trace-py are you using?

ddtrace = "^2.8.1"

How can we reproduce your problem?

Not very reproducible so far.

What is the result that you get?

The Docker container we use is thehale/python-poetry:1.8.2-py3.12-slim.

Here's a stack trace of the segfault from one of the core dumps generated:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007f2d1c47ae8f in __pthread_kill_internal (signo=11, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007f2d1c42bfb2 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#3  <signal handler called>
#4  0x00007f2d1c7f656c in ?? () from /usr/local/bin/../lib/libpython3.12.so.1.0
#5  0x00007f2ceccfec71 in memalloc_malloc () from /.venv/lib/python3.12/site-packages/ddtrace/profiling/collector/_memalloc.cpython-312-x86_64-linux-gnu.so
#6  0x00007f2d1c7fdfd1 in PyUnicode_New () from /usr/local/bin/../lib/libpython3.12.so.1.0
#7  0x00007f2d1c7fd907 in ?? () from /usr/local/bin/../lib/libpython3.12.so.1.0
#8  0x00007f2d1c8900a2 in _PyErr_SetString () from /usr/local/bin/../lib/libpython3.12.so.1.0
#9  0x00007f2d1c82f53b in PyLong_AsLong () from /usr/local/bin/../lib/libpython3.12.so.1.0
#10 0x00007f2cec02c457 in ?? () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#11 0x00007f2cec02ca67 in Frame::Frame(PyCodeObject*, int) () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#12 0x00007f2cec02cb6a in Frame::get(PyCodeObject*, int) () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#13 0x00007f2cec02cd6b in Frame::read(_object*, _object**) () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#14 0x00007f2cec02ce2c in ?? () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#15 0x00007f2cec02cf1a in ?? () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#16 0x00007f2cec02f37a in ThreadInfo::unwind(_ts*) () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#17 0x00007f2cec02fcc1 in ThreadInfo::sample(long, _ts*, unsigned long) () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#18 0x00007f2cec02e186 in ?? () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#19 0x00007f2cec02e2bc in ?? () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#20 0x00007f2cec02ba3d in ?? () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#21 0x00007f2cec02bac0 in Datadog::Sampler::sampling_thread(unsigned long) () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#22 0x00007f2cec034ac0 in ?? () from /.venv/lib/python3.12/site-packages/ddtrace/internal/datadog/profiling/stack_v2/_stack_v2.cpython-312-x86_64-linux-gnu.so
#23 0x00007f2d1c479134 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#24 0x00007f2d1c4f97dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
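
(For reference, a backtrace like this can be pulled from the core dump with gdb; the binary and core paths below are illustrative, not the exact ones from our container:)

gdb -batch -ex "bt" /usr/local/bin/python3.12 /path/to/core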

It's somewhat unclear what causes this segfault, but all of our segfaults have this exact same stack trace. Here are the code locations that some of the stack frames above refer to, which I found while trying to root-cause this:

#4 - (I believe this is the Python allocator's malloc implementation)
#5 - https://github.com/DataDog/dd-trace-py/blob/main/ddtrace/profiling/collector/_memalloc.c#L116
#9 - https://github.com/python/cpython/blob/3.12/Objects/longobject.c#L542-L543
#10 - https://github.com/P403n1x87/echion/blob/main/echion/strings.h#L104

If I'm reading the trace correctly: the stack-v2 sampler thread calls PyLong_AsLong (#9), which fails and sets an error string (#8); building that string allocates a new unicode object (#6), the allocation is routed through the memalloc profiler's hook (#5), and the crash happens inside libpython's allocator (#4).

What is the result that you expected?

No segfaults pls 😄 Really like the product otherwise.

sanchda self-assigned this May 9, 2024
sanchda added the Profiling (Continuous Profiling) label May 9, 2024
@sanchda (Contributor) commented May 9, 2024

👋 Thank you for the report! Unfortunately, this is a known problem in the "stack v2" implementation of the profiler on Python 3.12 (it does not occur on 3.11 or earlier). If you haven't tried the "legacy" stack collector (just omit the DD_PROFILING_STACK_V2_ENABLED environment variable, or set DD_PROFILING_STACK_V2_ENABLED=false), please give it a shot and see if it offers you some relief.
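
For example (a sketch; python app.py stands in for however you actually launch your service):

# keep profiling enabled, but fall back to the legacy stack collector
DD_PROFILING_ENABLED=true DD_PROFILING_STACK_V2_ENABLED=false ddtrace-run python app.py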

If you're using "stack v2" for a reason (such as avoiding the even greater number of segfaults originating from the cpython runtime under the legacy stack collector), then please ignore that advice. 😄

This may actually have been fixed in ddtrace 3.8.4. I'll be testing and working on a fix this upcoming week, and I'll check back in around May 15th to confirm whether that version actually has the fix. If you'd like to try the new release in the meantime, please let me know how it goes!

@sampritipanda (Author)

Thanks, I indeed had DD_PROFILING_STACK_V2_ENABLED and DD_PROFILING_EXPORT_LIBDD_ENABLED enabled to mitigate segfaults from Python 3.11. Should I disable both of them or just STACK_V2?

@sanchda (Contributor) commented May 23, 2024

👋 Sorry, I'm not sure how I lost track of this thread.

If you're using stack v2 in order to mitigate segfaults, then unfortunately there's not much relief I can offer yet. I'm working on a fix.

@sanchda (Contributor) commented Aug 16, 2024

Looking at stale issues: this should have been remediated in 2.10. Closing the issue, but please reopen if the problem persists.

sanchda closed this as completed Aug 16, 2024
@leofukui commented Sep 9, 2024

Hi @sanchda,

You mentioned that "it does not occur on 3.11 or earlier", but we are experiencing similar issues with Python 3.11. We are using dd-trace version 2.11.3 with both DD_PROFILING_STACK_V2_ENABLED and DD_PROFILING_EXPORT_LIBDD_ENABLED enabled, and that hasn't resolved the problem.

The application still intermittently exits with a segfault. This has been happening for us since at least dd-trace version 2.5.1. Unfortunately, we've had to disable profiling and remove ddtrace-run from our stack entirely.
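
(For reference, "disabling profiling" here means running with the profiler turned off, which we tried before removing ddtrace-run altogether; a sketch, with python app.py standing in for our actual entrypoint:)

DD_PROFILING_ENABLED=false ddtrace-run python app.py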

Additionally, it's been difficult to find official documentation on why these variables (DD_PROFILING_STACK_V2_ENABLED and DD_PROFILING_EXPORT_LIBDD_ENABLED) should be enabled and how they are supposed to prevent crashes. Is there a resource we can refer to for more details?
