Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests segfault when run manually, but not when run via run-tests.php #138

Closed
nacc opened this issue Mar 1, 2016 · 12 comments
Closed

Tests segfault when run manually, but not when run via run-tests.php #138

nacc opened this issue Mar 1, 2016 · 12 comments

Comments

@nacc
Copy link

nacc commented Mar 1, 2016

Feel free to ignore this report, but I would appreciate any insight you can provide.

Running make test from the php-imagick git tree (HEAD @ 15527aa), all tests pass:

=====================================================================
TEST RESULT SUMMARY
---------------------------------------------------------------------
Exts skipped    :    0
Exts tested     :   13
---------------------------------------------------------------------

Number of tests :  251               247
Tests skipped   :    4 (  1.6%) --------
Tests warned    :    0 (  0.0%) (  0.0%)
Tests failed    :    0 (  0.0%) (  0.0%)
Expected fail   :    0 (  0.0%) (  0.0%)
Tests passed    :  247 ( 98.4%) (100.0%)
---------------------------------------------------------------------
Time taken      :   97 seconds

For context, the Debian packaging of php-imagick is trying to move to using phpunit to run the tests, though, and I have it mostly working (using the phpt testrunner). But I see a few segfaults, specifically with tests 116, 154 and 161.

I went back to the git tree, and did the following:

TEST_PHP_EXECUTABLE=`which php` php run-tests.php --verbose --keep-all tests/116_Imagick_separateImageChannel_basic.phpt

which gave

=====================================================================
PHP         : /usr/bin/php 
PHP_SAPI    : cli
PHP_VERSION : 7.0.3-7ubuntu1
ZEND_VERSION: 3.0.0
PHP_OS      : Linux - Linux xenial 4.5.0-rc4+ #10 SMP Tue Feb 16 18:38:17 PST 2016 x86_64
INI actual  : /etc/php/7.0/cli/php.ini
More .INIs  : /etc/php/7.0/cli/conf.d/10-opcache.ini,/etc/php/7.0/cli/conf.d/10-pdo.ini,/etc/php/7.0/cli/conf.d/20-calendar.ini,/etc/php/7.0/cli/conf.d/20-ctype.ini,/etc/php/7.0/cli/conf.d/20-exif.ini,/etc/php/7.0/cli/conf.d/20-fileinfo.ini,/etc/php/7.0/cli/conf.d/20-ftp.ini,/etc/php/7.0/cli/conf.d/20-gettext.ini,/etc/php/7.0/cli/conf.d/20-iconv.ini,/etc/php/7.0/cli/conf.d/20-imagick.ini,/etc/php/7.0/cli/conf.d/20-json.ini,/etc/php/7.0/cli/conf.d/20-phar.ini,/etc/php/7.0/cli/conf.d/20-posix.ini,/etc/php/7.0/cli/conf.d/20-readline.ini,/etc/php/7.0/cli/conf.d/20-shmop.ini,/etc/php/7.0/cli/conf.d/20-sockets.ini,/etc/php/7.0/cli/conf.d/20-sysvmsg.ini,/etc/php/7.0/cli/conf.d/20-sysvsem.ini,/etc/php/7.0/cli/conf.d/20-sysvshm.ini,/etc/php/7.0/cli/conf.d/20-tokenizer.ini  
CWD         : /root/mkoppanen/imagick
Extra dirs  : 
VALGRIND    : Not used
=====================================================================
Running selected tests.

=================
TEST /root/mkoppanen/imagick/tests/116_Imagick_separateImageChannel_basic.phpt
TEST 1/1 [tests/116_Imagick_separateImageChannel_basic.phpt]
CONTENT_LENGTH  = 
CONTENT_TYPE    = 
PATH_TRANSLATED = /root/mkoppanen/imagick/tests/116_Imagick_separateImageChannel_basic.php
QUERY_STRING    = 
REDIRECT_STATUS = 1
REQUEST_METHOD  = GET
SCRIPT_FILENAME = /root/mkoppanen/imagick/tests/116_Imagick_separateImageChannel_basic.php
HTTP_COOKIE     = 
COMMAND /usr/bin/php   -d "output_handler=" -d "open_basedir=" -d "safe_mode=0" -d "disable_functions=" -d "output_buffering=Off" -d "error_reporting=32767" -d "display_errors=1" -d "display_startup_errors=1" -d "log_errors=0" -d "html_errors=0" -d "track_errors=1" -d "report_memleaks=1" -d "report_zend_debug=0" -d "docref_root=" -d "docref_ext=.html" -d "error_prepend_string=" -d "error_append_string=" -d "auto_prepend_file=" -d "auto_append_file=" -d "ignore_repeated_errors=0" -d "precision=14" -d "memory_limit=128M" -d "log_errors_max_len=0" -d "opcache.fast_shutdown=0" -d "opcache.file_update_protection=0" -d "session.auto_start=0" -f "/root/mkoppanen/imagick/tests/116_Imagick_separateImageChannel_basic.php"  2>&1
PASS Test Imagick, separateImageChannel [tests/116_Imagick_separateImageChannel_basic.phpt] 
=====================================================================
Number of tests :    1                 1
Tests skipped   :    0 (  0.0%) --------
Tests warned    :    0 (  0.0%) (  0.0%)
Tests failed    :    0 (  0.0%) (  0.0%)
Expected fail   :    0 (  0.0%) (  0.0%)
Tests passed    :    1 (100.0%) (100.0%)
---------------------------------------------------------------------
Time taken      :    0 seconds
=====================================================================

But when I run the command as provided, I get:

usr/bin/php   -d "output_handler=" -d "open_basedir=" -d "safe_mode=0" -d "disable_functions=" -d "output_buffering=Off" -d "error_reporting=32767" -d "display_errors=1" -d "display_startup_errors=1" -d "log_errors=0" -d "html_errors=0" -d "track_errors=1" -d "report_memleaks=1" -d "report_zend_debug=0" -d "docref_root=" -d "docref_ext=.html" -d "error_prepend_string=" -d "error_append_string=" -d "auto_prepend_file=" -d "auto_append_file=" -d "ignore_repeated_errors=0" -d "precision=14" -d "memory_limit=128M" -d "log_errors_max_len=0" -d "opcache.fast_shutdown=0" -d "opcache.file_update_protection=0" -d "session.auto_start=0" -f "/root/mkoppanen/imagick/tests/116_Imagick_separateImageChannel_basic.php"  2>&1
OkSegmentation fault

As I believe this is basically what phpunit runs (if I add all the ini modifications to the invocation) I think the root cause of this segfault and the segfaults under phpunit are the same. Am I missing some environment variable that needs to be set when running php directly? I've read through run-tests.php but nothing stood out immediately.

@nacc
Copy link
Author

nacc commented Mar 1, 2016

Also for reference, if I remove the calls to separateImageChannel() and textureImage() from the relevant testscases, they do not segfault.

@Danack
Copy link
Collaborator

Danack commented Mar 1, 2016

For context, the Debian packaging of php-imagick is trying to move to using phpunit to run the tests,

......................O-kaaaaay.

Do you happen to have a link to where that decision was made, or justified? Adding an extra piece of software to the mix, just seems to make it harder for people to report issues - like this.

Am I missing some environment variable that needs to be set when running php directly?

If the extension can't be loaded, all of the tests would fail. If the extension is loaded all of them should pass. There are no subtle environment settings.

But I see a few segfaults, specifically with tests 116, 154 and 161.

If you have valgrind installed running the tests with the -m flag for memory checks will use valgrind - which just so happens to also give better information about segfaults.

TEST_PHP_EXECUTABLE=which phpphp run-tests.php -m --verbose --keep-all tests/116_Imagick_separateImageChannel_basic.phpt

Or just to use it directly: valgrind php foo.php

If valgrind doesn't show anything, running the a test script through gdb almost certainly should:

gdb -batch -ex "run" -ex "bt" --args /home/travis/.phpenv/versions/5.6/bin/php -d extension=imagick.so -d extension_dir=modules ./debug.php 2>&1

Those will almost certainly show exactly where the problem is occurring.

@Danack
Copy link
Collaborator

Danack commented Mar 1, 2016

btw - I normally test using Centos, in particular the very stable 6.4.

I have seen errors when running all the tests in Ubuntu, apparently due to problems in underlying libraries.

@nacc
Copy link
Author

nacc commented Mar 1, 2016

On Tue, Mar 1, 2016 at 1:31 PM, Danack notifications@github.com wrote:

For context, the Debian packaging of php-imagick is trying to move to using phpunit to run the tests,

......................O-kaaaaay.

Do you happen to have a link to where that decision was made, or justified?
Adding an extra piece of software to the mix, just seems to make it harder
for people to report issues - like this.

http://anonscm.debian.org/cgit/pkg-php/php-imagick.git/commit/?h=master-7.0&id=505ceab300d43b268ad44cd35195eb174de14e4b

was the commit where it was done. It wasn't very well thought out,
IMO, as the tests haven't worked since they did that.

I think the reasoning is that Debian/Ubuntu don't have the
run-tests.php script in the binary package. So to do autopkgtest style
testing we'd either have to package that script, or do a configure to
generate it. I think to minimize the difference to the tests run
upstream, I might just go ahead and revert that change.

Am I missing some environment variable that needs to be set when running php directly?

If the extension can't be loaded, all of the tests would fail. If the extension is
loaded all of them should pass. There are no subtle environment settings.

Ok, well, the extension is installed at this point, in /usr/share/php.

But I see a few segfaults, specifically with tests 116, 154 and 161.

If you have valgrind installed running the tests with the -m flag for memory checks will use valgrind - which just so happens to also give better information about segfaults.

TEST_PHP_EXECUTABLE=which phpphp run-tests.php -m --verbose --keep-all tests/116_Imagick_separateImageChannel_basic.phpt

Or just to use it directly: valgrind php foo.php

If valgrind doesn't show anything, running the a test script through gdb almost certainly should:

gdb -batch -ex "run" -ex "bt" --args /home/travis/.phpenv/versions/5.6/bin/php -d extension=imagick.so -d extension_dir=modules ./debug.php 2>&1

Those will almost certainly show exactly where the problem is occurring.

I will try that. I believe when I did the valgrind runs before, it
just reported that something was writing at random addresses, without
any resolution as to where.

@Danack
Copy link
Collaborator

Danack commented Mar 1, 2016

I believe when I did the valgrind runs before, it
just reported that something was writing at random addresses,

Feel free to post that info here. Sometimes the meaning of the output is not as clear as one might hope for.

In particular, if you have OpenMP on the system and ImageMagick has decided to use more than one thread, errors can occur in things like the shut down function.

@Danack
Copy link
Collaborator

Danack commented Mar 2, 2016

IMO, as the tests haven't worked since they did that.

btw - the switching tests will almost certainly make no difference. The errors are incredibly likely to be due to invalid memory accesses in ImageMagick, which "just happen" to work when the tests are run in one environment, but "just happen" to fail in another.

@nacc
Copy link
Author

nacc commented Mar 2, 2016

# valgrind php 116_Imagick_separateImageChannel_basic.phpt
==20833== Memcheck, a memory error detector
==20833== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==20833== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==20833== Command: php 116_Imagick_separateImageChannel_basic.phpt
==20833== 
--TEST--
Test Imagick, separateImageChannel
--SKIPIF--
--FILE--
Ok--EXPECTF--
Ok==20833== Thread 4:
==20833== Jump to the invalid address stated on the next line
==20833==    at 0xC63BB29: ???
==20833==    by 0xC639417: ???
==20833==    by 0x3: ???
==20833==  Address 0xc63bb29 is not stack'd, malloc'd or (recently) free'd
==20833== 
==20833== 
==20833== Process terminating with default action of signal 11 (SIGSEGV)
==20833==  Access not within mapped region at address 0xC63BB29
==20833==    at 0xC63BB29: ???
==20833==    by 0xC639417: ???
==20833==    by 0x3: ???
==20833==  If you believe this happened as a result of a stack
==20833==  overflow in your program's main thread (unlikely but
==20833==  possible), you can try to increase the size of the
==20833==  main thread stack using the --main-stacksize= flag.
==20833==  The main thread stack size used in this run was 8388608.
==20833== 
==20833== HEAP SUMMARY:
==20833==     in use at exit: 1,753,597 bytes in 13,543 blocks
==20833==   total heap usage: 41,285 allocs, 27,742 frees, 20,790,490 bytes allocated
==20833== 
==20833== LEAK SUMMARY:
==20833==    definitely lost: 2,192 bytes in 5 blocks
==20833==    indirectly lost: 10,496 bytes in 10 blocks
==20833==      possibly lost: 1,127,784 bytes in 10,110 blocks
==20833==    still reachable: 613,125 bytes in 3,418 blocks
==20833==                       of which reachable via heuristic:
==20833==                         newarray           : 1,536 bytes in 16 blocks
==20833==         suppressed: 0 bytes in 0 blocks
==20833== Rerun with --leak-check=full to see details of leaked memory
==20833== 
==20833== For counts of detected and suppressed errors, rerun with: -v
==20833== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Killed

@Danack
Copy link
Collaborator

Danack commented Mar 2, 2016

Use the Force, Luke.

Ok==20833== Thread 4:
==20833== Jump to the invalid address stated on the next line
==20833==    at 0xC63BB29: ???
==20833==    by 0xC639417: ???
==20833==    by 0x3: ???

So. That doesn't tell us what is the cause but it tells us:

  • The error probably isn't occurring inside PHP. The stack trace is far too small for that, and most PHP functions would usually have 'some' information, as well as be a far larger stack.
  • It appears to be happening in a thread that isn't number 1.....which implies there are more that one thread active.

You could try editing the policy.xml file for ImageMagick which should be somewhere in /usr or /etc and set the maximum number of threads to be 1 with:

<policy domain="resource" name="thread" value="1"/> 

Or recompiling ImageMagick with --disable-openmp to disable using threads - although it might work for some people it seems to not be entirely stable, which is why I recommend disabling it https://github.com/mkoppanen/imagick#openmp

Also, are you testing against a PHP version that has ZTS (aka ZendThreadSafe aka support for threads) enabled?

If that doesn't provide any information....I apparently have a test program at https://github.com/Danack/imagemagicktest/blob/master/channel.c

I'm not sure the makefile for that is going to work on anyone elses system, but it might! doing make channel should produce a test program that can be run through ./channel if you tweak the program to point it to an image.....

@nacc
Copy link
Author

nacc commented Mar 2, 2016

On Wed, Mar 2, 2016 at 3:21 PM, Danack notifications@github.com wrote:

Use the Force, Luke.

Ok==20833== Thread 4:
==20833== Jump to the invalid address stated on the next line
==20833== at 0xC63BB29: ???
==20833== by 0xC639417: ???
==20833== by 0x3: ???

So. That doesn't tell us what is the cause but it tells us:

The error probably isn't occurring inside PHP. The stack trace is far too
small for that, and most PHP functions would usually have 'some'
information, as well as be a far larger stack.
It appears to be happening in a thread that isn't number 1.....which implies
there are more that one thread active.

Yeah, but I set/export MAGICK_THREAD_LIMIT=1 (without that, most of
the tests segfault :) I am reading some bugs that imply the OpenMP
that comes with Ubuntu sometimes has issues still even so. I noticed,
that in one of the valgrind verbose logs, the address implied libgomp
was unmapped and where the error happened.

You could try editing the policy.xml file for ImageMagick which should be
somewhere in /usr or /etc and set the maximum number of threads to be 1
with:

Yep, has no impact on these three failing cases.

Or recompiling ImageMagick with --disable-openmp to disable using threads -
although it might work for some people it seems to not be entirely stable,
which is why I recommend disabling it
https://github.com/mkoppanen/imagick#openmp

Yep, that's where I learned about policy.xml originally and then the
env variable. I am going to try recompiling imagemagick now just to
see.

Also, are you testing against a PHP version that has ZTS (aka ZendThreadSafe
aka support for threads) enabled?

7.0.3-9 but not ZTS (I don't believe it has been built yet for Debian/Ubuntu).

If that doesn't provide any information....I apparently have a test program
at https://github.com/Danack/imagemagicktest/blob/master/channel.c

I'm not sure the makefile for that is going to work on anyone elses system,
but it might! doing make channel should produce a test program that can be
run through ./channel if you tweak the program to point it to an image.....

Ok, I'll try that too. Thanks for all your help!

-Nish

@nacc
Copy link
Author

nacc commented Mar 3, 2016

Just as an FYI -- I found that upstream ImageMagick (ImageMagick-6 branch) didn't segfault, so I'm now bisecting from roughly the point where Debian has forked to see if a single commit fixed the issue. I know it's not ideal to backport it given the version string dependence, but would like to narrow it down.

@nacc
Copy link
Author

nacc commented Mar 4, 2016

ImageMagick/ImageMagick@a54fe0e was the missing commit in Ubuntu's imagemagick (and Debian's). Test pass now. Thanks for your help & patience!

@nacc nacc closed this as completed Mar 4, 2016
@Danack
Copy link
Collaborator

Danack commented Mar 4, 2016

No problem.

FYI since I took over maintaining Imagick, a significant portion of my time on it is just chasing memory issues down and persuading the ImageMagick guys to fix them...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants