Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] segfault in php-fpm after several page hits with ddtrace various versions #1797

Open
gjcarrette opened this issue Nov 16, 2022 · 6 comments
Labels
🐛 bug Something isn't working

Comments

@gjcarrette
Copy link

gjcarrette commented Nov 16, 2022

Bug description

In a production environment with nginx using fastcgi to talk to php-fpm there are different segfault SEGV manifestations depending on the version of ddtrace-trace-php. In all cases the SEGV as recorded by systemd coredumpctl is not the first page hit, but a later page hit. The page hit can be determined because the auto_prepend_file uses file_get_contents and file_put_contents on /proc/self/comm in order to keep a page hit counter that will be recorded by the kernel segfault handler.

Output of coredumpctl info will be attached.

There probably won't be sufficient information in this report to reproduce or fix a bug, but further investigation using valgrind and gdb will be forthcoming and the issue will be updated.
coredumpctl-info-ddtrace-080.txt
coredumpctl-info-ddtrace-071.txt

The SEGV with ddtrace 0.71.0 is pretty consistent inside zm_activate_ddtrace
But with 0.80.0 it appears that the stack has been corrupted with the SEGV inside some memory management subroutine.

Tests will be run with ddtrace 0.81 and from a source build, to be determined.

PHP version

PHP 8.1.10 (cli) (built: Aug 30 2022 16:09:36) (NTS gcc x86_64)
Copyright (c) The PHP Group
Zend Engine v4.1.10, Copyright (c) Zend Technologies
with Zend OPcache v8.1.10, Copyright (c), by Zend Technologies

The PHP itself and all the extensions are from the Remi build, except for
datadog-php-tracer-0.71.0-1.x86_64
php81-pdo-sqlrelay-1.8.1-4.x86_64

The pdo_sqlrelay is from https://github.com/wayfair-incubator/pdo_sqlrelay_builder

Tracer version

0.71.0

Installed extensions

[PHP Modules]
amqp
apcu
bcmath
bz2
calendar
Core
ctype
curl
date
dba
ddtrace
dom
exif
fileinfo
filter
ftp
gd
gettext
gmp
gnupg
hash
http
iconv
igbinary
imagick
imap
intl
json
ldap
libxml
lz4
mbstring
memcache
memcached
msgpack
mustache
mysqli
mysqlnd
odbc
openssl
pcntl
pcre
PDO
pdo_dblib
pdo_mysql
PDO_ODBC
pdo_sqlite
pdo_sqlrelay
pdo_sqlsrv
Phar
posix
raphf
rdkafka
readline
redis
Reflection
session
shmop
SimpleXML
soap
sockets
sodium
SPL
sqlite3
sqlsrv
standard
sysvmsg
sysvsem
sysvshm
tokenizer
xml
xmlreader
xmlrpc
xmlwriter
xsl
yaml
Zend OPcache
zip
zlib

[Zend Modules]
Zend OPcache
ddtrace

OS info

NAME="Rocky Linux"
VERSION="8.6 (Green Obsidian)"
VERSION_ID="8.6"
PRETTY_NAME="Rocky Linux 8.6 (Green Obsidian)"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
ROCKY_SUPPORT_PRODUCT_VERSION="8"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

Diagnostics and configuration

Output of phpinfo() (ddtrace >= 0.47.0)

{"date":"2022-11-16T00:34:23Z","os_name":"Linux admin8-reserve-iad1-txp4.c.wf-gcp-us-adminhome-prod.internal 4.18.0-372.19.1.el8_6.cloud.x86_64 #1 SMP Tue Aug 9 18:27:58 UTC 2022 x86_64","os_version":"4.18.0-372.19.1.el8_6.cloud.x86_64","version":"0.71.0","lang":"php","lang_version":"8.1.0RC1","env":"prod","enabled":true,"service":"admin8","enabled_cli":false,"agent_url":"http://localhost:8126","debug":false,"analytics_enabled":false,"sample_rate":1,"sampling_rules":[],"tags":{"datacenter":"iad1","wf_hostname":"admin8-reserve-iad1-txp4.c.wf-gcp-us-adminhome-prod.internal"},"service_mapping":[],"distributed_tracing_enabled":true,"priority_sampling_enabled":true,"dd_version":"94037fe84549072c865","architecture":"x86_64","sapi":"fpm-fcgi","datadog.trace.request_init_hook":"/opt/datadog-php/dd-trace-sources/bridge/dd_wrap_autoloader.php","open_basedir_configured":false,"uri_fragment_regex":null,"uri_mapping_incoming":null,"uri_mapping_outgoing":null,"auto_flush_enabled":false,"generate_root_span":true,"http_client_split_by_domain":false,"measure_compile_time":true,"report_hostname_on_root_span":false,"traced_internal_functions":null,"auto_prepend_file_configured":true,"integrations_disabled":"default","enabled_from_env":true,"opcache.file_cache":null}

@gjcarrette gjcarrette added the 🐛 bug Something isn't working label Nov 16, 2022
@bwoebi
Copy link
Collaborator

bwoebi commented Nov 16, 2022

Hey @gjcarrette,

thanks for the report!
A gdb/valgrind trace would indeed be very appreciated. It would be great, if the crashes can be reproduced on the latest version of dd-trace-php. (0.71.0 is quite old and core structures have been changed since then)

@bwoebi
Copy link
Collaborator

bwoebi commented Nov 16, 2022

Ah, just seeing the crash dump for 0.80.0 has been added. Given the source function (_emalloc and zend_mm_gc mostly), this is most likely some use-after-free issue, for which a valgrind report would be most useful (with USE_ZEND_ALLOC=0 set).

@gjcarrette
Copy link
Author

The Foundational Platform Operations team wanted to try out the 0.81 release, and it exhibited a slightly different pattern of segfault revealed by coredumpctl info.

Getting some valgrind and other tests from the Software Development Environment (SDE) is still on the todo list.

@bwoebi
Copy link
Collaborator

bwoebi commented Nov 17, 2022

Hey, thanks for the update!

I wonder whether you could open a dump (of the 0.81 crashes) with gdb and get exact line numbers (gdb php-fpm -c path/to/corefile, then bt for backtrace)? It seems to me like coredumpctl is not able to.

Given the location of that crash, are you using preloading?

@gjcarrette
Copy link
Author

gjcarrette commented Nov 26, 2022

This being in the middle of CYBER-5, I wanted to show that there was some activity to get a valgrind going, in our Software Development Environment (SDE) because I'm not touching the production environment right now. Running valgrind with php-fpm is a bit uglier (having done it) so I am starting out by capturing the fastcgi protocol and using my decoder to turn it into the set of environment variables and stdin content for running tests using the php-cgi command line, that being a close easy test plan for a single page hit. The page hit under test is fairly vanilla idempotent GET request returning html in an internally facing customer
support web application.

The only ddtrace interesting thing I found so far is this, using --tool=memcheck with --leak-check=full
and just about this only thing this shows is that datadog-php-tracer-0.71.0-1.x86_64
had a leak. I will have more next week, with more symbols loaded etc.

416 bytes in 13 blocks are definitely lost in loss record 99 of 130
at 0x4C378C3: operator new(unsigned long) (vg_replace_malloc.c:422)
by 0x22949DF2: ???
by 0x177C226F: ???
by 0x227106DD: zif_ddtrace_internal_function_handler (engine_hooks.c:815)
by 0x96929A4: ???
by 0x20C273: ??? (in /usr/bin/php-cgi)
by 0x43FAC3: execute_ex (in /usr/bin/php-cgi)
by 0x96920D9: ???
by 0x20C1DE: ??? (in /usr/bin/php-cgi)
by 0x43FAC3: execute_ex (in /usr/bin/php-cgi)
by 0x96920D9: ???
by 0x20C1DE: ??? (in /usr/bin/php-cgi)

856 bytes in 101 blocks are definitely lost in loss record 105 of 130
at 0x4C37135: malloc (vg_replace_malloc.c:381)
by 0x3A2D7C: __zend_malloc (in /usr/bin/php-cgi)
by 0x3A7DE2: _estrndup (in /usr/bin/php-cgi)
by 0x177BF823: ???
by 0x22949C7F: ???
by 0x177C226F: ???
by 0x227106DD: zif_ddtrace_internal_function_handler (engine_hooks.c:815)
by 0x96929A4: ???
by 0x20C273: ??? (in /usr/bin/php-cgi)
by 0x43FAC3: execute_ex (in /usr/bin/php-cgi)
by 0x96920D9: ???
by 0x20C1DE: ??? (in /usr/bin/php-cgi)

@bwoebi
Copy link
Collaborator

bwoebi commented Jan 10, 2023

@gjcarrette Did you have a chance to look further into it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants