Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Possible SIGSEGV from ddtrace or ddappsec #2030

Closed
NickStallman opened this issue Apr 28, 2023 · 26 comments
Closed

[BUG] Possible SIGSEGV from ddtrace or ddappsec #2030

NickStallman opened this issue Apr 28, 2023 · 26 comments
Labels
🐛 bug Something isn't working cat:app-crash

Comments

@NickStallman
Copy link

Bug description

We've had a bunch of SIGSEV errors which have been getting worse over the last few days.
Yesterday it started happening within minutes or seconds of a PHP-FPM 7.4.33 restart, and once it started then basically all PHP requests to that PHP-FPM pool failed.

PHP version

PHP 7.4.33 (cli) (built: Feb 14 2023 18:31:54) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with ddtrace v0.86.3, Copyright Datadog, by Datadog
    with ddappsec v0.7.2, Copyright Datadog, by Datadog

Tracer version

0.86.3

Installed extensions

[PHP Modules]
bcmath
calendar
Core
ctype
curl
date
ddappsec
ddtrace
dom
exif
FFI
fileinfo
filter
ftp
gd
gettext
gmp
hash
iconv
igbinary
imap
intl
json
libxml
mbstring
msgpack
mysqli
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
posix
readline
redis
Reflection
session
shmop
SimpleXML
soap
sockets
sodium
SPL
standard
sysvmsg
sysvsem
sysvshm
tokenizer
xml
xmlreader
xmlwriter
xsl
zip
zlib

[Zend Modules]
ddappsec
ddtrace

OS info

PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy

Diagnostics and configuration

I have not been able to directly pin it to ddtrace or ddappsec, however after disabling both the issue has immediately vanished.
So that's about as close to as a smoking gun I seem to be able to get, which is unfortunate as ddappsec is doing a fantastic job.

I've managed to get a coredump from a crash, and it appears to be related to shutdown functions.

Stacktrace:
 #0  0x0000560956d95901 in zend_hash_add_new ()
 No symbol table info available.
 #1  0x0000560956db185f in ?? ()
 No symbol table info available.
 #2  0x0000560956d645a9 in zend_set_compiled_filename ()
 No symbol table info available.
 #3  0x0000560956d4e01a in open_file_for_scanning ()
 No symbol table info available.
 #4  0x0000560956d4e26d in compile_file ()
 No symbol table info available.
 #5  0x00007fe7aba5bdfd in ?? () from /usr/lib/php/20190902/phar.so
 No symbol table info available.
 #6  0x0000560956d4e357 in compile_filename ()
 No symbol table info available.
 #7  0x0000560956dd42d7 in ?? ()
 No symbol table info available.
 #8  0x0000560956df07ac in ?? ()
 No symbol table info available.
 #9  0x0000560956e0988c in execute_ex ()
 No symbol table info available.
 #10 0x0000560956d79884 in zend_call_function ()
 No symbol table info available.
 #11 0x0000560956c85794 in ?? ()
 No symbol table info available.
 #12 0x0000560956d7976d in zend_call_function ()
 No symbol table info available.
 #13 0x0000560956d79e52 in zend_lookup_class_ex ()
 No symbol table info available.
 #14 0x0000560956d7a773 in zend_fetch_class_by_name ()
 No symbol table info available.
 #15 0x0000560956dee5b7 in ?? ()
 No symbol table info available.
 #16 0x0000560956e07ef4 in execute_ex ()
 No symbol table info available.
 #17 0x0000560956d79884 in zend_call_function ()
 No symbol table info available.
 #18 0x0000560956d79b6c in _call_user_function_ex ()
 No symbol table info available.
 #19 0x0000560956cbe89d in ?? ()
 No symbol table info available.
 #20 0x0000560956d991da in zend_hash_apply ()
 No symbol table info available.
 #21 0x0000560956cc227e in php_call_shutdown_functions ()
 No symbol table info available.
 #22 0x0000560956d258a5 in php_request_shutdown ()
 No symbol table info available.
 #23 0x0000560956bf0e83 in ?? ()
 No symbol table info available.
 #24 0x00007fe7b1a06d90 in __libc_start_call_main (main=main@entry=0x560956befec0, argc=argc@entry=4, argv=argv@entry=0x7ffcaef1cd28) at ../sysdeps/nptl/libc_start_call_main.h:58

ddtrace and ddappsec have been installed for months now, but this issue has only popped up since the latest version was deployed. I have no explanation as to why it's slowly gotten worse over time however.
It's occurring over many different codebases (mostly Wordpress based, but totally different sites with very little in common).
And it's occurring over multiple different identical servers.
No other changes have been made to these servers for months, beyond keeping ddtrace and ddappsec up to date.

@NickStallman NickStallman added the 🐛 bug Something isn't working label Apr 28, 2023
@morrisonlevi
Copy link
Collaborator

Nick, thanks for the report. I tried to do a simple reproduction with no luck (shutdown callback auto-loads a class). Could you grab the .gdbinit file corresponding to your PHP version (e.g. https://github.com/php/php-src/blob/PHP-7.4/.gdbinit), and then when you have a core dump loaded, source /path/to/wherever/you/put/.gdbinit, and then run zbacktrace? This will report the PHP call stack, which is sometimes insightful.

@NickStallman
Copy link
Author

Thanks @morrisonlevi. Most of the crashes were occurring with Wordpress sites if that helps.

No luck with zbacktrace unfortunately. I'm not sure if it's because I'm not attached to a live gdb process or not? I'm using apport-retrace to run gdb over a crash dump. Below is what I'm doing with the noise trimmed out.
A little googling didn't seem to show anything.

# apport-retrace --gdb /var/crash/_usr_sbin_php-fpm7.4.0.crash  --rebuild-package-info
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Core was generated by `php-fpm: pool leytonre-3009                                                  '.
Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) source .gdbinit
(Lots of this) Redefine command "set_ts"? (y or n) [answered Y; input not from terminal]
(gdb) zbacktrace
No symbol "basic_functions_module" in current context.
(gdb) 

@morrisonlevi
Copy link
Collaborator

Hmm, I haven't encountered that one before. Thanks for trying!

@bwoebi
Copy link
Collaborator

bwoebi commented May 1, 2023

@NickStallman Hey,

Your issue with zbacktrace is that you are missing debug symbols for your PHP installation.
Is it possible for you to share your coredump with our support?

It seems to us that your interned strings table is somehow broken.

@NickStallman
Copy link
Author

Thanks! I thought apport-retrace would fetch all the debug symbols but it looks like an extra step was needed.

This backtrace looks quite normal to me, and it's hitting the issue in the common Yoast SEO plugin.

(gdb) zbacktrace
[0x7fe7af41c7b0] Composer\Autoload\{closure}("/home/hillvite/public_html/wp-content/plugins/wordpress-seo/vendor/composer/../../lib/model.php") 
/home/hillvite/public_html/wp-content/plugins/meta-box/vendor/composer/ClassLoader.php:582 
[0x7fe7af41c6e0] Composer\Autoload\ClassLoader->loadClass("Yoast\WP\Lib\Model") /home/hillvite/public_html/wp-content/plugins/meta-box/vendor/composer/ClassLoader.php:433 
[0x7fe7af41c680] spl_autoload_call("Yoast\WP\Lib\Model") [internal function]
[0x7ffcaef1b8b0] ??? 
[0x7fe7af41c620] Yoast\WP\SEO\Repositories\Indexable_Repository->query() 
/home/hillvite/public_html/wp-content/plugins/wordpress-seo/src/repositories/indexable-repository.php:102 
[0x7fe7af41c290] Yoast\WP\SEO\Repositories\Indexable_Repository->for_current_page() 
/home/hillvite/public_html/wp-content/plugins/wordpress-seo/src/repositories/indexable-repository.php:146 
[0x7fe7af41c010] Yoast\WP\SEO\Memoizers\Meta_Tags_Context_Memoizer->for_current_page() 
/home/hillvite/public_html/wp-content/plugins/wordpress-seo/src/memoizers/meta-tags-context-memoizer.php:97 
[0x7fe7af41bf00] Yoast\WP\SEO\Integrations\Front_End\WP_Robots_Integration->get_robots_value() 
/home/hillvite/public_html/wp-content/plugins/wordpress-seo/src/integrations/front-end/wp-robots-integration.php:87 
[0x7fe7af41bd40] Yoast\WP\SEO\Integrations\Front_End\WP_Robots_Integration->add_robots(array(3)[0x7fe7af41bd90]) 
/home/hillvite/public_html/wp-content/plugins/wordpress-seo/src/integrations/front-end/wp-robots-integration.php:73 
[0x7fe7af41b9b0] WP_Hook->apply_filters(array(3)[0x7fe7af41ba00], array(1)[0x7fe7af41ba10]) /home/hillvite/public_html/wp-includes/class-wp-hook.php:308 
[0x7fe7af41b780] apply_filters("wp_robots", array(0)[0x7fe7af41b7e0]) /home/hillvite/public_html/wp-includes/plugin.php:205 
[0x7fe7af41b5e0] wp_robots() /home/hillvite/public_html/wp-includes/robots-template.php:32 
[0x7fe7af41b010] _default_wp_die_handler("<div class="wp-die-message"><p>There has been a critical error on this website.</p><p><a href="https://wordpress.org/documentation/art
icle/faq-troubleshooting/">Learn more about troubleshooting WordPress.</a></p></div>", "WordPress &rsaquo; Error", array(2)[0x7fe7af41b080]) 
/home/hillvite/public_html/wp-includes/functions.php:3798 
[0x7fe7af41ad00] wp_die(object[0x7fe7af41ad50], "", array(2)[0x7fe7af41ad70]) /home/hillvite/public_html/wp-includes/functions.php:3722 
[0x7fe7af41aa00] WP_Fatal_Error_Handler->display_default_error_template(array(4)[0x7fe7af41aa50], object[0x7fe7af41aa60]) 
/home/hillvite/public_html/wp-includes/class-wp-fatal-error-handler.php:239 
[0x7fe7af41a910] WP_Fatal_Error_Handler->display_error_template(array(4)[0x7fe7af41a960], object[0x7fe7af41a970]) 
/home/hillvite/public_html/wp-includes/class-wp-fatal-error-handler.php:154 
[0x7fe7af41a710] WP_Fatal_Error_Handler->handle() /home/hillvite/public_html/wp-includes/class-wp-fatal-error-handler.php:58 
[0x7ffcaef1bb70] ???

Or a different site, with the error occurring in a different plugin.

(gdb) zbacktrace
[0x7f89e4615550] analyst_resolve_sdk("/home/laingcom/public_html/wp-content/plugins/ultimate-social-media-icons/ultimate_social_media_icons.php") 
/home/laingcom/public_html/wp-content/plugins/ultimate-social-media-icons/analyst/sdk_resolver.php:78 
[0x7f89e46154a0] analyst_init(array(3)[0x7f89e46154f0]) /home/laingcom/public_html/wp-content/plugins/ultimate-social-media-icons/analyst/main.php:16 
[0x7f89e46153e0] (main) /home/laingcom/public_html/wp-content/plugins/ultimate-social-media-icons/ultimate_social_media_icons.php:16 
[0x7f89e4615220] (main) /home/laingcom/public_html/wp-settings.php:453 
[0x7f89e46151a0] (main) /home/laingcom/public_html/wp-config.php:95 
[0x7f89e4615100] (main) /home/laingcom/public_html/wp-load.php:50 
[0x7f89e4615070] (main) /home/laingcom/public_html/wp-blog-header.php:13 
[0x7f89e4615020] (main) /home/laingcom/public_html/index.php:17 

If it's the interned strings causing the issue, that would explain why once it started to occur, it would keep occurring non-stop. It would never fix the interened strings thus hit the same issue every single time.

I can confirm that after disabling ddtrace and ddappsec on Friday that there have been no further crashes so something is going on here. I may try getting a few more crash dumps out of hours now that I can get good backtraces.

I'll email a few dumps through to support.

@NickStallman
Copy link
Author

Ok now that I can get good backtraces I'm cautiously re-enabling it to see if I can get more data.

This backtrace looks different to the above backtrace but mentions ddtrace specifically.

(gdb) bt
#0  zend_mm_alloc_small (bin_num=0, heap=0x7f7b67200040) at ./Zend/zend_alloc.c:1255
#1  _emalloc_8 () at ./Zend/zend_alloc.c:2466
#2  0x0000565202c5c312 in init_op_array (op_array=op_array@entry=0x7f7b67203d78, type=type@entry=2 '\002', initial_ops_size=initial_ops_size@entry=64)
    at ./Zend/zend_opcode.c:53
#3  0x0000565202c541c7 in zend_compile_func_decl (result=0x0, ast=0x7f7b67290658, toplevel=<optimized out>) at ./Zend/zend_compile.c:5990
#4  0x0000565202c5350c in zend_compile_stmt (ast=0x7f7b67290658) at ./Zend/zend_compile.c:8548
#5  0x0000565202c53611 in zend_compile_stmt_list (ast=<optimized out>) at ./Zend/zend_compile.c:5269
#6  zend_compile_stmt (ast=ast@entry=0x7f7b67290400) at ./Zend/zend_compile.c:8492
#7  0x0000565202c54ea1 in zend_compile_class_decl (ast=0x7f7b672908a0, toplevel=<optimized out>) at ./Zend/zend_compile.c:6480
#8  0x0000565202c56877 in zend_compile_top_stmt (ast=0x7f7b672908a0) at ./Zend/zend_compile.c:8467
#9  0x0000565202c568a0 in zend_compile_top_stmt (ast=0x7f7b672903b8) at ./Zend/zend_compile.c:8456
#10 0x0000565202c5697e in zend_compile_namespace (ast=<optimized out>) at ./Zend/zend_compile.c:6882
#11 0x0000565202c53688 in zend_compile_stmt (ast=ast@entry=0x7f7b672908e8) at ./Zend/zend_compile.c:8572
#12 0x0000565202c56824 in zend_compile_top_stmt (ast=0x7f7b672908e8) at ./Zend/zend_compile.c:8470
#13 0x0000565202c568a0 in zend_compile_top_stmt (ast=0x7f7b67294a70) at ./Zend/zend_compile.c:8456
#14 0x0000565202c2cc04 in zend_compile (type=type@entry=2) at Zend/zend_language_scanner.l:613
#15 0x0000565202c2e2da in compile_file (file_handle=0x7ffd735db0d0, type=8) at Zend/zend_language_scanner.l:649
#16 0x00007f7b63030f90 in _dd_compile_file (file_handle=0x7ffd735db0d0, type=8) at /home/circleci/datadog/tmp/build_extension/ext/engine_hooks.c:90
#17 0x00007f7b674502a7 in persistent_compile_file (file_handle=0x7ffd735db0d0, type=8) at ./ext/opcache/ZendAccelerator.c:2125
#18 0x00007f7b63010480 in zai_interceptor_compile_file (file_handle=0x7ffd735db0d0, type=8)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/resolver.c:35
#19 0x0000565202cb44d1 in zend_include_or_eval (inc_filename=0x7f7b53493ac8, type=16) at ./Zend/zend_execute.c:4285
#20 0x0000565202cd0082 in ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER () at ./Zend/zend_vm_execute.h:4066
#21 0x0000565202ce7eb8 in execute_ex (ex=0x7f7b67200040) at ./Zend/zend_vm_execute.h:53811
#22 0x0000565202cf173f in zend_execute (op_array=0x7f7b672880e0, return_value=0x7ffd735db290) at ./Zend/zend_vm_execute.h:57617
#23 0x00007f7b63036b8a in dd_execute_php_file (filename=filename@entry=0x7f7b6726ec80 "/opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/dd_init.php")
    at /home/circleci/datadog/tmp/build_extension/ext/request_hooks.c:82
#24 0x00007f7b63028b9a in zif_ddtrace_init (execute_data=<optimized out>, return_value=0x7ffd735db490) at /home/circleci/datadog/tmp/build_extension/ext/ddtrace.c:1335
#25 0x00007f7b6300f207 in zai_interceptor_execute_internal_impl (prev=false, return_value=0x7ffd735db490, execute_data=0x7f7b67213080)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/interceptor.c:418
#26 zai_interceptor_execute_internal_no_prev (execute_data=0x7f7b67213080, return_value=0x7ffd735db490)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/interceptor.c:424
#27 0x0000565202cedd3c in ZEND_DO_FCALL_SPEC_RETVAL_UNUSED_HANDLER () at ./Zend/zend_vm_execute.h:1620
#28 execute_ex (ex=0x7f7b67200040) at ./Zend/zend_vm_execute.h:53521
#29 0x0000565202cf173f in zend_execute (op_array=0x7f7b67288000, return_value=0x0) at ./Zend/zend_vm_execute.h:57617
#30 0x0000565202c687bc in zend_execute_scripts (type=type@entry=8, retval=0x7f7b53492420, retval@entry=0x0, file_count=1730228256, file_count@entry=3) at ./Zend/zend.c:1679
#31 0x0000565202c06ad8 in php_execute_script (primary_file=primary_file@entry=0x7ffd735dda90) at ./main/main.c:2675
#32 0x0000565202ad0e16 in main (argc=<optimized out>, argv=<optimized out>) at ./sapi/fpm/fpm/fpm_main.c:1940
(gdb) zbacktrace
[0x7f7b67213130] (main) /opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/autoload.php:77 
[0x7f7b672130e0] (main) /opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/dd_init.php:12 
[0x7f7b67213080] ddtrace_init("/opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge") [internal function]
[0x7f7b67213020] (main) /opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/dd_wrap_autoloader.php:3

And another backtrace. This one has no zbacktrace at all.

(gdb) bt
#0  gc_mark_grey (stack=0x7fffd3376bb0, ref=0x1ef00000004) at ./Zend/zend_types.h:1041
#1  gc_mark_roots (stack=0x7fffd3376bb0) at ./Zend/zend_gc.c:977
#2  zend_gc_collect_cycles () at ./Zend/zend_gc.c:1452
#3  0x00007f63dee3d514 in ddtrace_serialize_closed_spans (serialized=serialized@entry=0x7fffd3377c60) at /home/circleci/datadog/tmp/build_extension/ext/span.c:707
#4  0x00007f63dee2d569 in ddtrace_flush_tracer (force_on_startup=force_on_startup@entry=false) at /home/circleci/datadog/tmp/build_extension/ext/auto_flush.c:14
#5  0x00007f63dee2bb5c in dd_force_shutdown_tracing () at /home/circleci/datadog/tmp/build_extension/ext/ddtrace.c:759
#6  0x00007f63dee2bc45 in zm_deactivate_ddtrace (type=<optimized out>, module_number=<optimized out>) at /home/circleci/datadog/tmp/build_extension/ext/ddtrace.c:778
#7  0x0000561aaacef9b4 in zend_deactivate_modules () at ./Zend/zend_API.c:2636
#8  0x0000561aaac87885 in php_request_shutdown (dummy=dummy@entry=0x0) at ./main/main.c:1945
#9  0x0000561aaab52e83 in main (argc=<optimized out>, argv=<optimized out>) at ./sapi/fpm/fpm/fpm_main.c:1968

I collected these with:

datadog.trace.enabled = On
datadog.appsec.enabled = Off

There appears to be no crashes experienced across any of our servers if ddappsec and ddtrace aren't loaded, or with this configuration.

datadog.trace.enabled = Off
datadog.appsec.enabled = Off

This is repeatable across many servers running Ubuntu 22.04.2 with the Ondrej PHP PPA.
It doesn't happen instantly, but it can occur as quickly as within a few minutes.

@bwoebi
Copy link
Collaborator

bwoebi commented May 3, 2023

Thanks for the backtraces.

Unfortunately the first backtrace of your last reply indicates that we have to deal with a memory corruption - and at that a corruption only happening after a few minutes. These are generally not debuggable with core dumps / backtraces.

There are a few possibilities:

  • We find a reproducer I can run locally to debug.
  • You manually compile php and ddtrace with asan (and use USE_ZEND_ALLOC=0).
  • You could try running php-fpm with USE_ZEND_ALLOC=0 (make sure to have clear_env=no in your fpm config) and hope getting more meaningful backtraces. (unlikely)
  • I'm also willing to jump on a (lengthy) call with you to get an asan build ready and try it out.

Just to make sure though, if you comment the extension = ddappsec.so out, you still get the issues?

@bwoebi
Copy link
Collaborator

bwoebi commented May 3, 2023

I also wonder, is the issue related to opcache? I.e., if you disable it, do you still get crashes?

@NickStallman
Copy link
Author

Ah I shouldn't have any problem with compiling a copy of PHP with ASAN, the only tricky bit is if I do it locally or on a scratch VM but in that case I might not be able to reproduce the issue, or do I do it on a production server.
I'll see if I can get any more clues by tinkering with configuration before looking at that next week maybe.

I did just realise that I might not be able to rule out ddappsec after all. I did have it disabled in the php.ini but may have still had it enabled in each FPM pool's config.
I've just been running a few tests in production and tweaking settings.
opcache on, ddtrace on, ddappsec off (verified properly this time), it ran for 20 minutes with no SIGSEGV.
opcache on, ddtrace on, ddappsec on, can have a SIGSEGV within 5 minutes.
Last Friday I was reproducing this issue really reliably but now it's still ocurring but less reliably so it's a little tricky for me to identify patterns.

Could this issue be related to either the opcache or interned strings cache being full, with ddappsec enabled by any chance?
We host a lot of PHP so I can run with a 4gig opcache and still have it full 100% of the time for example.
Or could it be related to specific hits reaching our server affecting ddappsec/opcache in a funny way, which could explain why the crash frequency seems to vary a bit?

It very likely is some interaction with ddtrace/ddappsec and opcache. I ran with just opcache over the weekend for safety with no datadog extensions and it's rock solid on it's own.
I could run some tests without opcache but it does quadruple the load on our servers so I don't want to be doing that too much.

I'm going to run with ddtrace on, ddappsec off, opcache on for a extended period of time today to verify if that is indeed stable.

@NickStallman
Copy link
Author

I have no clue why I keep getting different stack traces but this one looks quite interesting.

(gdb) bt
#0  zai_interceptor_pop_opline_before_binding () at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/resolver.c:118
#1  0x00007fbc666105ea in zai_interceptor_post_declare_handler (execute_data=0x7fbc6a813200)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/resolver.c:171
#2  0x00005600809748cd in ZEND_USER_OPCODE_SPEC_HANDLER () at ./Zend/zend_vm_execute.h:2651
#3  0x0000560080974cf7 in execute_ex (ex=0x7fbc6a873070) at ./Zend/zend_vm_execute.h:53597
#4  0x000056008097e73f in zend_execute (op_array=0x7fbc6a8841c0, return_value=0x7ffd8acac840) at ./Zend/zend_vm_execute.h:57617
#5  0x00007fbc66636b8a in dd_execute_php_file (filename=filename@entry=0x7fbc6a86ec30 "/opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/dd_init.php")
    at /home/circleci/datadog/tmp/build_extension/ext/request_hooks.c:82
#6  0x00007fbc66628b9a in zif_ddtrace_init (execute_data=<optimized out>, return_value=0x7ffd8acaca40) at /home/circleci/datadog/tmp/build_extension/ext/ddtrace.c:1335
#7  0x00007fbc6660f207 in zai_interceptor_execute_internal_impl (prev=false, return_value=0x7ffd8acaca40, execute_data=0x7fbc6a813080)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/interceptor.c:418
#8  zai_interceptor_execute_internal_no_prev (execute_data=0x7fbc6a813080, return_value=0x7ffd8acaca40)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/interceptor.c:424
#9  0x000056008097ad3c in ZEND_DO_FCALL_SPEC_RETVAL_UNUSED_HANDLER () at ./Zend/zend_vm_execute.h:1620
#10 execute_ex (ex=0x7fbc6a873070) at ./Zend/zend_vm_execute.h:53521
#11 0x000056008097e73f in zend_execute (op_array=0x7fbc6a8840e0, return_value=0x0) at ./Zend/zend_vm_execute.h:57617
#12 0x00005600808f57bc in zend_execute_scripts (type=type@entry=8, retval=0x7fbc4735e420, retval@entry=0x0, file_count=1786851360, file_count@entry=3) at ./Zend/zend.c:1679
#13 0x0000560080893ad8 in php_execute_script (primary_file=primary_file@entry=0x7ffd8acaf040) at ./main/main.c:2675
#14 0x000056008075de16 in main (argc=<optimized out>, argv=<optimized out>) at ./sapi/fpm/fpm/fpm_main.c:1940
(gdb) zbacktrace
[0x7fbc6a813200] (main) /opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/_generated_api.php:161 
[0x7fbc6a813130] (main) /opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/autoload.php:77 
[0x7fbc6a8130e0] (main) /opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/dd_init.php:12 
[0x7fbc6a813080] ddtrace_init("/opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge") [internal function]
[0x7fbc6a813020] (main) /opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/dd_wrap_autoloader.php:3

@bwoebi
Copy link
Collaborator

bwoebi commented May 4, 2023

This last one again, is definitely
a) either an unrelated memory corruption
or maybe ...
b) do you have fatal errors in unrelated requests? (or, seeing fatal errors with DD_TRACE_DEBUG=1 (in your PHP log)?) - At least the first wordpress trace is after a fatal error.
c) ... or maybe, does it only ever happen after you had an appsec induced abort before?

With all these traces you've been showing me, I'm thinking we might be missing some critical cleanup handling after bailouts. Let me verify.

@NickStallman
Copy link
Author

NickStallman commented May 4, 2023

Oh well this isn't good, I just noticed I also had a PHP 8.1 crash dump file this morning.
We are only starting to migrate sites over to PHP 8.1 so the sample size is far smaller, but also the PHP-FPM 8.1 server is rather lightly loaded in comparison.
I was kind of hoping that 8.1 was unaffected as this would be a good excuse to push the upgrade. ;)

(gdb) bt
#0  _zend_hash_add_or_update_i (flag=8, pData=0x7ffe5dd0e5c0, key=0x7f6d0a7ad208, ht=0x55ec4cd7d110 <compiler_globals+336>) at ./Zend/zend_hash.c:803
#1  zend_hash_add_new (ht=ht@entry=0x55ec4cd7d110 <compiler_globals+336>, key=key@entry=0x7f6d0a7ad208, pData=pData@entry=0x7ffe5dd0e5c0) at ./Zend/zend_hash.c:923
#2  0x000055ec4cb7750f in zend_add_interned_string (flags=0, interned_strings=<optimized out>, str=0x7f6d0a7ad208) at ./Zend/zend_string.c:180
#3  zend_new_interned_string_request (str=0x7f6d0a7ad208) at ./Zend/zend_string.c:255
#4  zend_new_interned_string_request (str=0x7f6d0a7ad208) at ./Zend/zend_string.c:217
#5  0x000055ec4cad220e in zval_make_interned_string (zv=0x7f6d0c85a260) at ./Zend/zend_compile.c:514
#6  zend_compile_prop_decl (attr_ast=0x0, flags=17, type_ast=0x0, ast=<optimized out>) at ./Zend/zend_compile.c:7329
#7  zend_compile_prop_group (ast=0x7f6d0c85a2f0) at ./Zend/zend_compile.c:7420
#8  zend_compile_stmt (ast=0x7f6d0c85a2f0) at ./Zend/zend_compile.c:10034
#9  0x000055ec4cad1e4f in zend_compile_stmt_list (ast=<optimized out>) at ./Zend/zend_compile.c:6160
#10 zend_compile_stmt (ast=0x7f6d0c8a8cb0) at ./Zend/zend_compile.c:9975
#11 0x000055ec4cad6432 in zend_compile_class_decl (result=0x0, ast=<optimized out>, toplevel=<optimized out>) at ./Zend/zend_compile.c:7754
#12 0x000055ec4cad1c2c in zend_compile_top_stmt (ast=0x7f6d0c8fdea8) at ./Zend/zend_compile.c:9950
#13 0x000055ec4cad1c58 in zend_compile_top_stmt (ast=0x7f6d0c85a018) at ./Zend/zend_compile.c:9939
#14 0x000055ec4cab3f6e in zend_compile (type=type@entry=2) at ./Zend/zend_language_scanner.c:620
#15 0x000055ec4cab571a in compile_file (file_handle=0x7ffe5dd0ec10, type=8) at ./Zend/zend_language_scanner.c:656
#16 0x00007f6d0f632a60 in _dd_compile_file (file_handle=0x7ffe5dd0ec10, type=8) at /home/circleci/datadog/tmp/build_extension/ext/engine_hooks.c:90
#17 0x00007f6d0f610420 in zai_interceptor_compile_file (file_handle=0x7ffe5dd0ec10, type=8)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php8/resolver_pre-8_2.c:36
#18 0x000055ec4cb266ed in zend_include_or_eval (inc_filename_zv=<optimized out>, type=16) at ./Zend/zend_execute.c:4816
#19 0x000055ec4cb33299 in ZEND_INCLUDE_OR_EVAL_SPEC_OBSERVER_HANDLER () at ./Zend/zend_vm_execute.h:4939
#20 0x000055ec4cb5a145 in execute_ex (ex=0x55ec4cd7d110 <compiler_globals+336>) at ./Zend/zend_vm_execute.h:56192
#21 0x000055ec4cae50d4 in zend_call_function (fci=fci@entry=0x7f6d0e6b2690, fci_cache=<optimized out>, fci_cache@entry=0x7f6d0e6b26d0) at ./Zend/zend_execute_API.c:929
#22 0x000055ec4cafda5b in zend_fcall_info_call (fci=0x7f6d0e6b2690, fcc=0x7f6d0e6b26d0, retval_ptr=retval_ptr@entry=0x7ffe5dd0eea0, args=args@entry=0x0)
    at ./Zend/zend_API.c:4058
#23 0x000055ec4ca9fce7 in php_output_handler_op (handler=0x7f6d0e62ef50, context=context@entry=0x7ffe5dd0ef30) at ./main/output.c:970
#24 0x000055ec4caa06da in php_output_stack_pop (flags=17) at ./main/output.c:1231
#25 0x000055ec4caa0822 in php_output_discard_all () at ./main/output.c:365
#26 0x000055ec4c91230a in php_error_cb (orig_type=1, error_filename=0x7f6d083d7a00, error_lineno=580, message=0x7f6d0839ab00) at ./main/main.c:1264
#27 0x00007f6d0f63df2d in ddtrace_error_cb (orig_type=1, error_filename=0x7f6d083d7a00, error_lineno=580, message=<optimized out>)
    at /home/circleci/datadog/tmp/build_extension/ext/serializer.c:1298
#28 0x000055ec4c917766 in zend_error_zstr_at (orig_type=orig_type@entry=1, error_filename=error_filename@entry=0x7f6d083d7a00, error_lineno=error_lineno@entry=580, 
    message=message@entry=0x7f6d0839ab00) at ./Zend/zend.c:1443
#29 0x000055ec4c917b12 in zend_error_va_list (orig_type=1, error_filename=0x7f6d083d7a00, error_lineno=580, format=<optimized out>, args=<optimized out>) at ./Zend/zend.c:1536
#30 0x000055ec4c917fb9 in zend_error_noreturn (type=type@entry=1, format=format@entry=0x55ec4cc44680 "Allowed memory size of %zu bytes exhausted (tried to allocate %zu bytes)")
    at ./Zend/zend.c:1643
#31 0x000055ec4c914d32 in zend_mm_safe_error (heap=heap@entry=0x7f6d13a00040, 
    format=format@entry=0x55ec4cc44680 "Allowed memory size of %zu bytes exhausted (tried to allocate %zu bytes)", limit=83886080, size=size@entry=5242880)
    at ./Zend/zend_alloc.c:382
#32 0x000055ec4c914e38 in zend_mm_alloc_huge (heap=0x7f6d13a00040, size=5242880) at ./Zend/zend_alloc.c:1814
#33 0x000055ec4cb01d67 in zend_hash_do_resize (ht=0x55ec4cd7d110 <compiler_globals+336>) at ./Zend/zend_hash.c:1221
#34 0x000055ec4cb02126 in zend_hash_do_resize (ht=0x55ec4cd7d110 <compiler_globals+336>) at ./Zend/zend_hash.c:1211
#35 _zend_hash_add_or_update_i (flag=8, pData=0x7ffe5dd0f4a0, key=0x7f6d0a7ad3e8, ht=0x55ec4cd7d110 <compiler_globals+336>) at ./Zend/zend_hash.c:792
#36 zend_hash_add_new (ht=ht@entry=0x55ec4cd7d110 <compiler_globals+336>, key=key@entry=0x7f6d0a7ad3e8, pData=pData@entry=0x7ffe5dd0f4a0) at ./Zend/zend_hash.c:923
#37 0x000055ec4cb7750f in zend_add_interned_string (flags=0, interned_strings=<optimized out>, str=0x7f6d0a7ad3e8) at ./Zend/zend_string.c:180
#38 zend_new_interned_string_request (str=0x7f6d0a7ad3e8) at ./Zend/zend_string.c:255
#39 zend_new_interned_string_request (str=0x7f6d0a7ad3e8) at ./Zend/zend_string.c:217
#40 0x000055ec4cacf53a in zval_make_interned_string (zv=0x7f6d0c8161f8) at ./Zend/zend_compile.c:514
#41 zend_try_compile_cv (result=result@entry=0x7ffe5dd0f610, ast=<optimized out>, ast=<optimized out>) at ./Zend/zend_compile.c:2668
#42 0x000055ec4cada109 in zend_compile_simple_var (result=0x7ffe5dd0f610, ast=0x7f6d0c816208, type=1, delayed=<optimized out>) at ./Zend/zend_compile.c:2774
#43 0x000055ec4cad9146 in zend_compile_assign (ast=<optimized out>, ast=<optimized out>, result=0x7ffe5dd0f710) at ./Zend/zend_compile.c:3212
#44 zend_compile_expr_inner (result=0x7ffe5dd0f710, ast=<optimized out>) at ./Zend/zend_compile.c:10108
#45 0x000055ec4cad2272 in zend_compile_expr (ast=0x7f6d0c8162d8, result=0x7ffe5dd0f710) at ./Zend/zend_compile.c:10225
#46 zend_compile_stmt (ast=0x7f6d0c8162d8) at ./Zend/zend_compile.c:10067
#47 0x000055ec4cad1e4f in zend_compile_stmt_list (ast=<optimized out>) at ./Zend/zend_compile.c:6160
#48 zend_compile_stmt (ast=0x7f6d0c816860) at ./Zend/zend_compile.c:9975
#49 0x000055ec4cae0974 in zend_compile_if (ast=0x7f6d0c816138) at ./Zend/zend_compile.c:5405
#50 0x000055ec4cad3598 in zend_compile_stmt (ast=0x7f6d0c816138) at ./Zend/zend_compile.c:10015
#51 0x000055ec4cad1e4f in zend_compile_stmt_list (ast=<optimized out>) at ./Zend/zend_compile.c:6160
#52 zend_compile_stmt (ast=0x7f6d0c8179f0) at ./Zend/zend_compile.c:9975
#53 0x000055ec4cad5879 in zend_compile_func_decl (result=<optimized out>, ast=<optimized out>, toplevel=<optimized out>) at ./Zend/zend_compile.c:7278
#54 0x000055ec4cad1c84 in zend_compile_top_stmt (ast=0x7f6d0c817b70) at ./Zend/zend_compile.c:9946
--Type <RET> for more, q to quit, c to continue without paging--
#55 0x000055ec4cad1c58 in zend_compile_top_stmt (ast=0x7f6d0c818e10) at ./Zend/zend_compile.c:9939
#56 0x000055ec4cab3f6e in zend_compile (type=type@entry=2) at ./Zend/zend_language_scanner.c:620
#57 0x000055ec4cab571a in compile_file (file_handle=0x7ffe5dd10040, type=8) at ./Zend/zend_language_scanner.c:656
#58 0x00007f6d0f632a60 in _dd_compile_file (file_handle=0x7ffe5dd10040, type=8) at /home/circleci/datadog/tmp/build_extension/ext/engine_hooks.c:90
#59 0x00007f6d0f610420 in zai_interceptor_compile_file (file_handle=0x7ffe5dd10040, type=8)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php8/resolver_pre-8_2.c:36
#60 0x000055ec4cb266ed in zend_include_or_eval (inc_filename_zv=<optimized out>, type=16) at ./Zend/zend_execute.c:4816
#61 0x000055ec4cb33299 in ZEND_INCLUDE_OR_EVAL_SPEC_OBSERVER_HANDLER () at ./Zend/zend_vm_execute.h:4939
#62 0x000055ec4cb5a145 in execute_ex (ex=0x55ec4cd7d110 <compiler_globals+336>) at ./Zend/zend_vm_execute.h:56192
#63 0x000055ec4cb63d89 in zend_execute (op_array=0x7f6d13a62200, return_value=0x0) at ./Zend/zend_vm_execute.h:60151
#64 0x000055ec4caf3ad0 in zend_execute_scripts (type=type@entry=8, retval=retval@entry=0x0, file_count=file_count@entry=3) at ./Zend/zend.c:1845
#65 0x000055ec4ca8e9fa in php_execute_script (primary_file=primary_file@entry=0x7ffe5dd127a0) at ./main/main.c:2542
#66 0x000055ec4c935b2f in main (argc=<optimized out>, argv=<optimized out>) at ./sapi/fpm/fpm/fpm_main.c:1939
(gdb) zbacktrace
[0x7f6d13a14740] Gravity_Forms\Gravity_Forms\Libraries\Dom_Parser->get_injected_html() 
/home/perthpp/public_html/wp-content/plugins/gravityforms/includes/libraries/class-dom-parser.php:93 
[0x7f6d13a14650] GFForms->ensure_hook_js_output("", object[0x7f6d13a146b0]) /home/perthpp/public_html/wp-content/plugins/gravityforms/gravityforms.php:6721 
[0x7f6d13a143a0] (main) /home/perthpp/public_html/wp-admin/includes/admin.php:34 
[0x7f6d13a13020] (main) /home/perthpp/public_html/wp-admin/admin.php:97

This is the same server as the previous backtraces, same ddtrace/ddappsec install and a very similar configuration.

PHP 8.1.18 (cli) (built: Apr 14 2023 04:39:44) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.18, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.18, Copyright (c), by Zend Technologies
    with ddtrace v0.86.3, Copyright Datadog, by Datadog
    with ddappsec v0.7.2, Copyright Datadog, by Datadog

This one does nicely highlight a fatal out of memory error at line 31.
This site had 53 appsec traces in the last 24 hours, one trace was approximately around the crash timestamp, but it was not blocked. The closest blocked trace was about 45 minutes prior.

@bwoebi
Copy link
Collaborator

bwoebi commented May 5, 2023

If my hypothesis about the issue is right, PHP 8.2 actually should not be affected by the other crashes. I'll keep you updated.

This last backtrace is a bug in PHP itself though - it's not related to ddtrace at all. I've reported and fixed it: php/php-src#11189

@bwoebi
Copy link
Collaborator

bwoebi commented May 5, 2023

A self-contained reproducer exhibiting the problem on PHP 8.1 and older:

circleci(buster):~/app$ cat /tmp/t.php
<?php

ddtrace\trace_function("x", function() { class C extends D {} print "Continuing from traced function here...\n"; });

function x() {}

spl_autoload_register(function() use (&$s) {
	if ($s) {
		trigger_error("No D", E_USER_ERROR);
	} else {
		$s = true;
		x();
		class B {}
		print "Leaving Autoloader\n";
	}
});

class A extends B {}

circleci(buster):~/app$ DD_TRACE_DEBUG=1 DD_TRACE_CLI_ENABLED=1 php /tmp/t.php
Error raised in ddtrace's closure defined at /tmp/t.php:3 for x(): No D in /tmp/t.php on line 9
Leaving Autoloader
Continuing from traced function here...

A lot of different operations will exhibit this behavior: most importantly, a fatal error (bailout) needs to happen while a class is being autoloaded.

bwoebi added a commit that referenced this issue May 5, 2023
We fix this issue by ensuring that the opline is always connected to its proper execute_data and popping newer entries if they aren't matching.

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
bwoebi added a commit that referenced this issue May 5, 2023
We fix this issue by ensuring that the opline is always connected to its proper execute_data and popping newer entries if they aren't matching.

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
bwoebi added a commit that referenced this issue May 5, 2023
We fix this issue by ensuring that the opline is always connected to its proper execute_data and popping newer entries if they aren't matching.

Signed-off-by: Bob Weinand <bob.weinand@datadoghq.com>
@NickStallman
Copy link
Author

Awesome! That sounds like you are on to it. :)
That theory about a fatal error in an autoloader does sound plausible.

I did run that test case in our PHP 7.4 environment and it didn't seem to do anything unusual by the way. No crash.

Could it be a crash inside autoload AND shutdown functions? Quite a few of the crashes were shutdown function related.
That might have just been the shutdown function autoloading something.

@bwoebi
Copy link
Collaborator

bwoebi commented May 5, 2023

If you want to, you could try with the artifact from CI: https://output.circle-artifacts.com/output/job/850ba405-6510-419f-9fde-b5acdb9ad726/artifacts/0/datadog-setup.php

This reproducer does not crash (variations of it do). But the order of operations is very weird. (executing part of the x hook after it leaves the autoloader)

Could it be a crash inside autoload AND shutdown functions

Yes. Well, the autoloader corrupts the state, and the shutdown functions will read the corrupted state and explode.

@NickStallman
Copy link
Author

Thanks, I've deployed the latest build to one of my servers.

Your test script has gone from:

Leaving Autoloader
Continuing from traced function here...

to just

Leaving Autoloader

I'll report back with how we go today.

@NickStallman
Copy link
Author

Hmm still similar symptoms. This seems vaguely repeatable, I've gotten this exact same backtrace 3 times with 3 different php-fpm restarts in the same dd-trace autoload.php file.
It isn't occurring immediately, maybe lasting ~15-20 mins before starting to throw lots of SIGSEGV.
Looks like I'm not using the git version of the php files, but I assume that shouldn't be too relevant. I don't see much in the diff.

ddtrace = on, appsec = on produced this.
ddtrace = on, appsec = off seems somewhat more stable.
Maybe there's two issues, and one is fixed and one remaining is predominatently appsec related?
Once it starts crashing it keeps crashing, and across multiple separate php-fpm pools.

[08-May-2023 06:34:28] NOTICE: fpm is running, pid 771779
[08-May-2023 06:34:28] NOTICE: ready to handle connections
[08-May-2023 06:34:28] NOTICE: systemd monitor interval set to 10000ms
[08-May-2023 06:54:43] WARNING: [pool fullarto-3771] child 786095 exited with code 1 after 2.464070 seconds from start
[08-May-2023 06:54:43] NOTICE: [pool fullarto-3771] child 786118 started
[08-May-2023 06:54:44] WARNING: [pool fullarto-3771] child 786099 exited with code 1 after 3.207105 seconds from start
[08-May-2023 06:54:44] NOTICE: [pool fullarto-3771] child 786121 started
[08-May-2023 06:54:47] WARNING: [pool chasepg-3966] child 786130 exited with code 1 after 1.229913 seconds from start
[08-May-2023 06:54:47] NOTICE: [pool chasepg-3966] child 786142 started
[08-May-2023 06:54:49] WARNING: [pool fullarto-3771] child 786121 exited with code 1 after 4.521985 seconds from start
[08-May-2023 06:54:49] NOTICE: [pool fullarto-3771] child 786150 started
[08-May-2023 06:54:50] WARNING: [pool fullarto-3771] child 786118 exited with code 1 after 6.234323 seconds from start
[08-May-2023 06:54:50] NOTICE: [pool fullarto-3771] child 786158 started
[08-May-2023 06:54:50] WARNING: [pool chasepg-3966] child 786134 exited with code 1 after 4.463342 seconds from start
[08-May-2023 06:54:50] NOTICE: [pool chasepg-3966] child 786159 started
[08-May-2023 06:54:50] WARNING: [pool astras-4164] child 786151 exited with code 1 after 1.250458 seconds from start
[08-May-2023 06:54:50] NOTICE: [pool astras-4164] child 786162 started
(lots more after this)
(gdb) bt
#0  zend_mm_alloc_small (bin_num=0, heap=0x7f2ca2000040) at ./Zend/zend_alloc.c:1255
#1  zend_mm_alloc_heap (size=<optimized out>, heap=0x7f2ca2000040) at ./Zend/zend_alloc.c:1326
#2  _emalloc (size=<optimized out>) at ./Zend/zend_alloc.c:2539
#3  0x0000563863cfbcc3 in zend_compile_if (ast=0x7f2ca208ddd8) at ./Zend/zend_compile.c:4726
#4  0x0000563863cf95e8 in zend_compile_stmt (ast=0x7f2ca208ddd8) at ./Zend/zend_compile.c:8535
#5  0x0000563863cf9611 in zend_compile_stmt_list (ast=<optimized out>) at ./Zend/zend_compile.c:5269
#6  zend_compile_stmt (ast=ast@entry=0x7f2ca208dbb0) at ./Zend/zend_compile.c:8492
#7  0x0000563863cf8f8a in zend_compile_foreach (ast=0x7f2ca208e1d0) at ./Zend/zend_compile.c:4699
#8  0x0000563863cf93ac in zend_compile_stmt (ast=0x7f2ca208e1d0) at ./Zend/zend_compile.c:8532
#9  0x0000563863cf9611 in zend_compile_stmt_list (ast=<optimized out>) at ./Zend/zend_compile.c:5269
#10 zend_compile_stmt (ast=ast@entry=0x7f2ca208da98) at ./Zend/zend_compile.c:8492
#11 0x0000563863cfa759 in zend_compile_func_decl (result=<optimized out>, ast=<optimized out>, toplevel=<optimized out>) at ./Zend/zend_compile.c:6064
#12 0x0000563863cf950c in zend_compile_stmt (ast=0x7f2ca208e2b8) at ./Zend/zend_compile.c:8548
#13 0x0000563863cf9611 in zend_compile_stmt_list (ast=<optimized out>) at ./Zend/zend_compile.c:5269
#14 zend_compile_stmt (ast=ast@entry=0x7f2ca208d978) at ./Zend/zend_compile.c:8492
#15 0x0000563863cfaea1 in zend_compile_class_decl (ast=0x7f2ca208e300, toplevel=<optimized out>) at ./Zend/zend_compile.c:6480
#16 0x0000563863cfc877 in zend_compile_top_stmt (ast=0x7f2ca208e300) at ./Zend/zend_compile.c:8467
#17 0x0000563863cfc8a0 in zend_compile_top_stmt (ast=0x7f2ca208d930) at ./Zend/zend_compile.c:8456
#18 0x0000563863cfc97e in zend_compile_namespace (ast=<optimized out>) at ./Zend/zend_compile.c:6882
#19 0x0000563863cf9688 in zend_compile_stmt (ast=ast@entry=0x7f2ca208e348) at ./Zend/zend_compile.c:8572
#20 0x0000563863cfc824 in zend_compile_top_stmt (ast=0x7f2ca208e348) at ./Zend/zend_compile.c:8470
#21 0x0000563863cfc8a0 in zend_compile_top_stmt (ast=0x7f2ca2091a70) at ./Zend/zend_compile.c:8456
#22 0x0000563863cd2c04 in zend_compile (type=type@entry=2) at Zend/zend_language_scanner.l:613
#23 0x0000563863cd42da in compile_file (file_handle=0x7ffe648ec720, type=8) at Zend/zend_language_scanner.l:649
#24 0x00007f2c9dc32400 in _dd_compile_file (file_handle=0x7ffe648ec720, type=8) at /home/circleci/datadog/tmp/build_extension/ext/engine_hooks.c:90
#25 0x00007f2ca222b2a7 in persistent_compile_file (file_handle=0x7ffe648ec720, type=8) at ./ext/opcache/ZendAccelerator.c:2125
#26 0x00007f2c9dc11291 in zai_interceptor_compile_file (file_handle=0x7ffe648ec720, type=8)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/resolver.c:35
#27 0x0000563863d5a4d1 in zend_include_or_eval (inc_filename=0x7f2bae200ac8, type=16) at ./Zend/zend_execute.c:4285
#28 0x0000563863d76082 in ZEND_INCLUDE_OR_EVAL_SPEC_CONST_HANDLER () at ./Zend/zend_vm_execute.h:4066
#29 0x0000563863d8deb8 in execute_ex (ex=0x7f2ca2000040) at ./Zend/zend_vm_execute.h:53811
#30 0x0000563863d9773f in zend_execute (op_array=0x7f2ca20860e0, return_value=0x7ffe648ec8e0) at ./Zend/zend_vm_execute.h:57617
#31 0x00007f2c9dc37e6a in dd_execute_php_file (filename=filename@entry=0x7f2ca206fd20 "/opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/dd_init.php")
    at /home/circleci/datadog/tmp/build_extension/ext/request_hooks.c:82
#32 0x00007f2c9dc29c1a in zif_ddtrace_init (execute_data=<optimized out>, return_value=0x7ffe648ecae0) at /home/circleci/datadog/tmp/build_extension/ext/ddtrace.c:1346
#33 0x00007f2c9dc0f847 in zai_interceptor_execute_internal_impl (prev=false, return_value=0x7ffe648ecae0, execute_data=0x7f2ca2013080)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/interceptor.c:481
#34 zai_interceptor_execute_internal_no_prev (execute_data=0x7f2ca2013080, return_value=0x7ffe648ecae0)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/interceptor.c:487
#35 0x0000563863d93d3c in ZEND_DO_FCALL_SPEC_RETVAL_UNUSED_HANDLER () at ./Zend/zend_vm_execute.h:1620
#36 execute_ex (ex=0x7f2ca2000040) at ./Zend/zend_vm_execute.h:53521
#37 0x0000563863d9773f in zend_execute (op_array=0x7f2ca2086000, return_value=0x0) at ./Zend/zend_vm_execute.h:57617
#38 0x0000563863d0e7bc in zend_execute_scripts (type=type@entry=8, retval=0x7f2bae1ff420, retval@entry=0x0, file_count=-1576980448, file_count@entry=3) at ./Zend/zend.c:1679
#39 0x0000563863cacad8 in php_execute_script (primary_file=primary_file@entry=0x7ffe648ef0e0) at ./main/main.c:2675
#40 0x0000563863b76e16 in main (argc=<optimized out>, argv=<optimized out>) at ./sapi/fpm/fpm/fpm_main.c:1940
(gdb) zbacktrace
[0x7f2ca2013140] (main) /opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/autoload.php:77 
[0x7f2ca20130e0] (main) /opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/dd_init.php:12 
[0x7f2ca2013080] ddtrace_init("/opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge") [internal function]
[0x7f2ca2013020] (main) /opt/datadog/dd-library/0.86.3/dd-trace-sources/bridge/dd_wrap_autoloader.php:3
PHP 7.4.33 (cli) (built: Feb 14 2023 18:31:54) ( NTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with Zend OPcache v7.4.33, Copyright (c), by Zend Technologies
    with ddtrace v0.86.3+a0a6ef9ae4e73b10ac10d483cb6ea7df45b2aa36, Copyright Datadog, by Datadog
    with ddappsec v0.7.2, Copyright Datadog, by Datadog

@bwoebi
Copy link
Collaborator

bwoebi commented May 8, 2023

If you can provide the core dump for that trace, it might be helpful? Otherwise, as mentioned before, you might try:

  • You manually compile php and ddtrace with asan (and use USE_ZEND_ALLOC=0).
  • You could try running php-fpm with USE_ZEND_ALLOC=0 (make sure to have clear_env=no in your fpm config) and hope getting more meaningful backtraces. (unlikely)

@NickStallman
Copy link
Author

Thanks @bwoebi I've sent this particular crash dump via the ticket.

After running v0.86.3+a0a6ef9ae4e73b10ac10d483cb6ea7df45b2aa36 for a few days, I defininately think that's one bug gone, with one bug remaining.
I am not seeing any kind of issues with:

datadog.trace.enabled = On
datadog.appsec.enabled = Off

So I'm happy to say that configuration is stable.

But turning appsec.enabled to On brings back crashes, and at least this time they are all the same.
It's always crashing in the dd-library PHP code. I can make the exact same backtrace occur repeatedly.
It's probably a fair bet that it's related to how appsec terminates a script or something along those lines since I'd expect the dd-library code to be opcached basically immediately and it runs for several minutes before the issue pops up.

Let me see what I can do about an ASAN build on the same server. It might take a few days and I'll probably have to do the testing out of hours with the build since it'll probably involve some diisruption.

@bwoebi bwoebi closed this as completed in 6f6ff33 May 11, 2023
@bwoebi
Copy link
Collaborator

bwoebi commented May 12, 2023

Hey @NickStallman, so we just released 0.87.0, which includes all the fixes, including a new appsec release.

I don't know whether this changes anything, but appsec has reworked a lot of the shutdown sequence on their side. Could you please try it out with their new version?

At least from the crash dump investigation I have absolutely no idea where the issue could come from.

@NickStallman
Copy link
Author

Thanks, I'll go some testing this weekend when it's a little quieter and fingers crossed the other issue also vanishes.

@NickStallman
Copy link
Author

I've been running 0.87.0 over the weekend and have some results.

I did get 4,000 SIGSEGV's in the last 2 days, so there's still a bug somewhere.
However looking closely at them they seem to always occur when the request was already going to have a fatal error. I don't think that any SIGSEGV's occurred on what should have been a successful request. Not all fatal errors are doing it, I tried doing a few little test cases and couldn't get them to crash.
Importantly it doesn't seem to be snowballing anymore - one SIGSEGV isn't causing other requests to also crash.

Just for perspective, those 4,000 segfaults were out of 12.8 million requests and 100,000 appsec suspicious traces so quite a small portion of our traffic is crashing.

I think we can leave this ticket closed, and if I find any further information about this remaining issue then I'll make a separate ticket.
Thanks for your help!

@bwoebi
Copy link
Collaborator

bwoebi commented May 14, 2023

@NickStallman Do you have stacktraces for these? Is it still the same stacktrace we had for appsec starting with zend_mm_alloc_small?

By the way, something much less invasive you could try out: running fpm with rr record (see https://rr-project.org), then send us the rr pack'ed recording. (If performance counters are available on your target...).

I would definitely love to fix all these bugs, even if it's fataling anyway.

@NickStallman
Copy link
Author

Would rr make some very very big recordings? We deal with 40+ hits/second on a single server so if it records everything that wouldn't be good.

I do have a new observation, I got an alert that a specific account was crashing repeatedly and this account gets very low traffic.

[18-May-2023 09:21:48] WARNING: [pool ljcdix-3510] child 1823564 exited on signal 11 (SIGSEGV - core dumped) after 3.944620 seconds from start
[18-May-2023 09:21:48] NOTICE: [pool ljcdix-3510] child 1823597 started
[18-May-2023 09:21:49] WARNING: [pool ljcdix-3510] child 1823573 exited on signal 11 (SIGSEGV - core dumped) after 3.932763 seconds from start
[18-May-2023 09:21:49] NOTICE: [pool ljcdix-3510] child 1823601 started
[18-May-2023 09:21:51] WARNING: [pool ljcdix-3510] child 1823597 exited on signal 11 (SIGSEGV - core dumped) after 3.336217 seconds from start
[18-May-2023 09:21:51] NOTICE: [pool ljcdix-3510] child 1823620 started
[18-May-2023 09:21:53] WARNING: [pool ljcdix-3510] child 1823601 exited on signal 11 (SIGSEGV - core dumped) after 4.521710 seconds from start
[18-May-2023 09:21:53] NOTICE: [pool ljcdix-3510] child 1823724 started
[18-May-2023 09:21:54] WARNING: [pool ljcdix-3510] child 1823620 exited on signal 11 (SIGSEGV - core dumped) after 3.012951 seconds from start
[18-May-2023 09:21:54] NOTICE: [pool ljcdix-3510] child 1823738 started
[18-May-2023 09:21:57] WARNING: [pool ljcdix-3510] child 1823724 exited on signal 11 (SIGSEGV - core dumped) after 3.933260 seconds from start
[18-May-2023 09:21:57] NOTICE: [pool ljcdix-3510] child 1823788 started
[18-May-2023 09:21:58] WARNING: [pool ljcdix-3510] child 1823738 exited on signal 11 (SIGSEGV - core dumped) after 3.888516 seconds from start
[18-May-2023 09:21:58] NOTICE: [pool ljcdix-3510] child 1823793 started

The matching web server log was all a single IP, detected by appsec as a security scanner. (useragent trimmed)

95.142.x.x - - [18/May/2023:09:21:47 +1000] "GET /wp/wp-config.phpc HTTP/1.1" 404 48253 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:48 +1000] "GET /wp/wp-config.phpd HTTP/1.1" 503 299 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:49 +1000] "GET /wp/wp-config.phpn HTTP/1.1" 503 299 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:49 +1000] "GET /wp/wp-config.phpnew HTTP/1.1" 404 48269 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:51 +1000] "GET /wp/wp-config.phpold HTTP/1.1" 404 48269 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:52 +1000] "GET /wp/wp-config.phporiginal HTTP/1.1" 404 48309 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:53 +1000] "GET /wp/wp-config.phptmp HTTP/1.1" 503 299 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:54 +1000] "GET /wp/wp-config.prod.php.txt HTTP/1.1" 503 299 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:55 +1000] "GET /wp/wp-config.save HTTP/1.1" 404 48253 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:56 +1000] "GET /wp/wp-config.tar HTTP/1.1" 404 48245 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:57 +1000] "GET /wp/wp-config.temp HTTP/1.1" 503 299 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:58 +1000] "GET /wp/wp-config.txt HTTP/1.1" 503 299 "-" "Mozilla/5.0"
95.142.x.x - - [18/May/2023:09:21:59 +1000] "GET /wp/wp-config.zip HTTP/1.1" 404 48245 "-" "Mozilla/5.0"

You can see how it keeps flip flopping between 404 and 503 errors. There was absolutely no other traffic of any kind at all.
I then added that IP to the Denylist in Datadog, and the crashes immediately ceased. All requests were returned as 403 perfectly. After the deny list all normal browser traffic was working correctly.

Unfortunately I could not get a usable backtrace as this occurred on a different server to the one I have all the debugging set up on.
However hopefully this reveals what code path in appsec is causing this crash. There's something specifically about appsec detecting a request but not blocking it which corrupts the internal state in some rare cases.

I have managed to get this backtrace a couple of times on the server set up for debugging which may be similar. It was again caused by a security scanner but the logs aren't quite as clear cut as the above example.

3.75.170.x - - [18/May/2023:10:53:33 +1000] "GET /property/19-glassop-street-balmain-nsw-2041/.git/HEAD HTTP/1.1" 503 299 "" "RepoLookoutBot/v1.1.0-94-ga3e2edc (abuse reports to abuse@repo-lookout.org)"
3.75.170.x - - [18/May/2023:10:53:47 +1000] "GET /property/19-glassop-balmain-nsw-2041/.git/HEAD HTTP/1.1" 404 46345 "" "RepoLookoutBot/v1.1.0-94-ga3e2edc (abuse reports to abuse@repo-lookout.org)"
(gdb) bt
#0  zend_mm_alloc_small (bin_num=0, heap=0x7fa9e4c00040) at ./Zend/zend_alloc.c:1255
#1  zend_mm_realloc_heap (copy_size=8, use_copy_size=0 '\000', size=<optimized out>, ptr=0x7fa9e4c59a00, heap=0x7fa9e4c00040) at ./Zend/zend_alloc.c:1552
#2  _erealloc (ptr=0x7fa9e4c59a00, size=8) at ./Zend/zend_alloc.c:2560
#3  0x000055e69845e611 in pass_two (op_array=0x7fa9e4c08700) at ./Zend/zend_opcode.c:896
#4  0x000055e69845578f in zend_compile_func_decl (result=<optimized out>, ast=<optimized out>, toplevel=<optimized out>) at ./Zend/zend_compile.c:6077
#5  0x000055e6984578cc in zend_compile_top_stmt (ast=0x7fa9e4ccd4b0) at ./Zend/zend_compile.c:8463
#6  0x000055e6984578a0 in zend_compile_top_stmt (ast=0x7fa9e4ce16b8) at ./Zend/zend_compile.c:8456
#7  0x000055e69842dc04 in zend_compile (type=type@entry=2) at Zend/zend_language_scanner.l:613
#8  0x000055e69842f2da in compile_file (file_handle=0x7ffd9c1740a0, type=8) at Zend/zend_language_scanner.l:649
#9  0x00007fa9e0a331e0 in _dd_compile_file (file_handle=0x7ffd9c1740a0, type=8) at /home/circleci/datadog/tmp/build_extension/ext/engine_hooks.c:90
#10 0x00007fa9e4ed32a7 in persistent_compile_file (file_handle=0x7ffd9c1740a0, type=8) at ./ext/opcache/ZendAccelerator.c:2125
#11 0x00007fa9e0a117a3 in zai_interceptor_compile_file (file_handle=0x7ffd9c1740a0, type=8)
    at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php7/resolver.c:63
#12 0x000055e69842f357 in compile_filename (type=type@entry=8, filename=filename@entry=0x7fa9e4c13ae0) at Zend/zend_language_scanner.l:670
#13 0x000055e6984b52d7 in zend_include_or_eval (inc_filename=inc_filename@entry=0x7fa9e4c13ae0, type=8) at ./Zend/zend_execute.c:4315
#14 0x000055e6984d1405 in ZEND_INCLUDE_OR_EVAL_SPEC_TMPVAR_HANDLER () at ./Zend/zend_vm_execute.h:13715
#15 0x000055e6984e948c in execute_ex (ex=0x55e698557be0 <bin_data_size>) at ./Zend/zend_vm_execute.h:54871
#16 0x000055e6984f273f in zend_execute (op_array=0x7fa9e4c840e0, return_value=0x0) at ./Zend/zend_vm_execute.h:57617
#17 0x000055e6984697bc in zend_execute_scripts (type=type@entry=8, retval=0x7fa9e4ce8240, retval@entry=0x0, file_count=-457099008, file_count@entry=3) at ./Zend/zend.c:1679
#18 0x000055e698407ad8 in php_execute_script (primary_file=primary_file@entry=0x7ffd9c176860) at ./main/main.c:2675
#19 0x000055e6982d1e16 in main (argc=<optimized out>, argv=<optimized out>) at ./sapi/fpm/fpm/fpm_main.c:1940
(gdb) zbacktrace
[0x7fa9e4c13900] (main) /home/lifestyl/public_html/wp-settings.php:34 
[0x7fa9e4c13750] (main) /home/lifestyl/public_html/wp-config.php:78 
[0x7fa9e4c131b0] (main) /home/lifestyl/public_html/wp-load.php:50 
[0x7fa9e4c130a0] (main) /home/lifestyl/public_html/wp-blog-header.php:13 
[0x7fa9e4c13020] (main) /home/lifestyl/public_html/index.php:17 

This is a crash very early in the WP boot sequence and each time I've seen a backtrace that looks like this it is always on this line:

require ABSPATH . WPINC . '/load.php';

@bwoebi
Copy link
Collaborator

bwoebi commented May 18, 2023

rr does relatively moderately sized recordings, it mostly scales with the amount of I/O traffic your application does.

Likely it's always at that line, because this is the first time during your script execution that a second allocation of size 8 is done within your application. (This information I could extract from an earlier core dump.) Which seems to be the corrupted pointer. The question is just why.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Something isn't working cat:app-crash
Projects
None yet
Development

No branches or pull requests

3 participants