Lock ordering issue; deadlock in malloc()/Perl_atfork_lock() #13687

Open
p5pRT opened this issue Mar 22, 2014 · 21 comments


@p5pRT p5pRT commented Mar 22, 2014

Migrated from rt.perl.org#121490 (status was 'open')

Searchable as RT121490$


@p5pRT p5pRT commented Mar 22, 2014

From prumpf@gmail.com

Created by prumpf@gmail.com

This is a bug report for perl from prumpf@gmail.com,
generated with the help of perlbug 1.40 running under perl 5.19.11.

-----------------------------------------------------------------

Hi! I've run into a deadlock situation with the current git versions
of perl (5.19.11) and glibc (2.19), on x86_64-pc-linux-gnu with
ithreads and MY_MALLOC, though I've run into it with other setups
(recent Debian versions of Perl and glibc, no MY_MALLOC) as well. I
believe I've been able to track down the issue and come up with a
workaround, although I've not yet found the time to come up with a small
reproducible test case. Please feel free to ask me for one if it's
absolutely required, though, or ask for other information, and I'll do my
best.

In summary, the problem is inconsistent lock ordering between Perl's
PL_malloc_mutex and glibc's malloc/arena.c's list_lock. The situation
arises when one thread tries to fork() at the same time that another
thread calls malloc().

Perl runs pthread_atfork before the first malloc() makes glibc install
its atfork handlers, so fork() calls ptmalloc_lock_all() first, then
Perl_atfork_lock(). That means locking glibc's list_lock first, then
PL_malloc_mutex. (pthread_atfork() has LIFO semantics)
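
The LIFO behaviour of prepare handlers mentioned above can be checked with a minimal sketch. It is not taken from Perl or glibc: `prepare_a`, `prepare_b` and `atfork_prepare_order` are hypothetical names, and the function is meant to be called once per process because handlers registered with pthread_atfork() cannot be removed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Tags of the prepare handlers, in the order fork() called them. */
static char prepare_order[8];
static size_t prepare_count;

static void record(char tag)
{
    /* Leave room for the terminating NUL. */
    assert(prepare_count < sizeof(prepare_order) - 1);
    prepare_order[prepare_count++] = tag;
}

/* Hypothetical handlers: A is registered first, B second. */
static void prepare_a(void) { record('A'); }
static void prepare_b(void) { record('B'); }

/* Registers two prepare handlers, forks once, and returns the order in
 * which the prepare handlers ran in the parent (NULL on failure). */
const char *atfork_prepare_order(void)
{
    pid_t pid;

    if (pthread_atfork(prepare_a, NULL, NULL) != 0)
        return NULL;
    if (pthread_atfork(prepare_b, NULL, NULL) != 0)
        return NULL;

    prepare_count = 0;
    pid = fork();               /* prepare handlers run here, before the fork */
    if (pid < 0)
        return NULL;
    if (pid == 0)
        _exit(0);               /* child: nothing to do */
    (void)waitpid(pid, NULL, 0);

    prepare_order[prepare_count] = '\0';
    return prepare_order;
}
```

The handler registered last runs first, so whichever of Perl and glibc registers its handlers later has its lock taken earlier during fork().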

However, Perl's malloc implementation locks PL_malloc_mutex first,
then (sometimes) runs out of memory and calls the real malloc(), which
tries to lock list_lock. We thus have a race condition and a deadlock,
which I've seen in practice.
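
The inversion can be modelled without actually hanging. The following is only a sketch under stated assumptions: `list_lock_model` and `malloc_mutex_model` are stand-ins for glibc's list_lock and Perl's PL_malloc_mutex, not the real objects, and pthread_mutex_trylock() is used in place of a blocking lock so that the conflict is reported instead of deadlocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Stand-ins for glibc's list_lock and Perl's PL_malloc_mutex. */
static pthread_mutex_t list_lock_model = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t malloc_mutex_model = PTHREAD_MUTEX_INITIALIZER;

static pthread_barrier_t both_hold_first;   /* each thread holds its first lock */
static pthread_barrier_t both_tried_second; /* each thread has tried its second */

/* Models the allocating thread: Perl's mutex first, then glibc's lock. */
static void *allocating_thread(void *arg)
{
    int *result = arg;

    pthread_mutex_lock(&malloc_mutex_model);
    pthread_barrier_wait(&both_hold_first);
    *result = pthread_mutex_trylock(&list_lock_model);
    pthread_barrier_wait(&both_tried_second);
    if (*result == 0)
        pthread_mutex_unlock(&list_lock_model);
    pthread_mutex_unlock(&malloc_mutex_model);
    return NULL;
}

/* Returns 1 if both sides found their second lock busy (the state that
 * would deadlock with blocking locks), 0 if not, -1 on setup failure. */
int lock_order_inversion_detected(void)
{
    pthread_t tid;
    int alloc_result = 0;
    int fork_result;

    if (pthread_barrier_init(&both_hold_first, NULL, 2) != 0)
        return -1;
    if (pthread_barrier_init(&both_tried_second, NULL, 2) != 0)
        return -1;
    if (pthread_create(&tid, NULL, allocating_thread, &alloc_result) != 0)
        return -1;

    /* Models the forking thread: glibc's lock first, then Perl's mutex. */
    pthread_mutex_lock(&list_lock_model);
    pthread_barrier_wait(&both_hold_first);
    fork_result = pthread_mutex_trylock(&malloc_mutex_model);
    pthread_barrier_wait(&both_tried_second);
    if (fork_result == 0)
        pthread_mutex_unlock(&malloc_mutex_model);
    pthread_mutex_unlock(&list_lock_model);

    pthread_join(tid, NULL);
    pthread_barrier_destroy(&both_hold_first);
    pthread_barrier_destroy(&both_tried_second);

    return fork_result == EBUSY && alloc_result == EBUSY;
}
```

The barriers only make the bad interleaving deterministic for the demonstration; in the real bug it depends on timing, which is why it shows up as a race.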

I believe this is fundamentally a glibc bug: its implementation of
pthread_atfork() behaves erratically depending on whether malloc() is
first called before or after pthread_atfork(). However, since the
broken versions of glibc are out there and multiplying, we should also
work around the issue in Perl itself.

The workaround should be as easy as including an extra
PerlMem_free(PerlMem_malloc(1024)) call before calling PTHREAD_ATFORK,
but gcc has started "optimizing" such (otherwise) useless calls. I've
found a deliberately duplicate call to perl_alloc() works, but that's
both a one-time memory leak and horribly ugly, and most likely breaks
whatever code uses PL_do_undump.
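
One way to keep the compiler from discarding such a warm-up allocation, sketched here as an assumption rather than as the fix adopted in Perl, is to route the pointer through a volatile-qualified object. `force_allocator_init` is a hypothetical name, and plain malloc()/free() stand in for PerlMem_malloc()/PerlMem_free().

```c
#include <stdlib.h>

/* Performs one allocation and release so that the C library's allocator
 * initialises itself. Returns 1 if the allocation succeeded, 0 otherwise. */
int force_allocator_init(void)
{
    /* Accesses to a volatile object count as observable behaviour, so the
     * compiler cannot treat the malloc() result as unused and drop the
     * malloc()/free() pair. */
    void *volatile warmup = malloc(1024);
    int ok = (warmup != NULL);

    free(warmup);
    return ok;
}
```

Unlike the duplicate perl_alloc() call, this should not leak, since the block is released immediately.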

Nevertheless, I'll include it here, because most of the work was
probably in tracking down the bug, and fixing it should be easier,
even if I cannot presently think of a good fix.

diff --git a/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm b/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
index 730c565..a8092bf 100644
--- a/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
+++ b/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
@@ -129,6 +129,19 @@ main(int argc, char **argv, char **env)
      * call PTHREAD_ATFORK() explicitly, but if and only if it hasn't
      * been called at least once before in the current process.
      * --GSAR 2001-07-20 */
+    /* There's a nasty race condition with the current versions of Perl and
+     * glibc: the call to PTHREAD_ATFORK in Perl's main() might be reached
+     * before the first malloc happens, in which
+     * case fork() locks malloc/arena.c's list_lock first, then tries to lock
+     * PL_malloc_lock; another thread might have locked PL_malloc_lock first,
+     * then tries to lock list_lock, resulting in a deadlock.
+     *
+     * A proper fix would be in glibc, ensuring that ptmalloc_init() is called
+     * earlier, but a workaround is to make a malloc call ourselves. */
+    /* This leaks memory, but works. */
+    (void)perl_alloc();
+    /* This doesn't leak memory, but is optimized away by gcc */
+    PerlMem_free(PerlMem_malloc(1024));
     PTHREAD_ATFORK(Perl_atfork_lock,
                    Perl_atfork_unlock,
                    Perl_atfork_unlock);

Perl Info

Flags:
    category=core
    severity=medium

Site configuration information for perl 5.19.11:

Configured by pip at Sat Mar 22 10:40:51 UTC 2014.

Summary of my perl5 (revision 5 version 19 subversion 11) configuration:
  Derived from: b51c3e77dbb7e510319342a73163b3fbb59baf5a
  Platform:
    osname=linux, osvers=3.12-1-amd64, archname=x86_64-linux-thread-multi
    uname='linux philadelphia 3.12-1-amd64 #1 smp debian 3.12.8-1
(2014-01-19) x86_64 gnulinux '
    config_args='-er'
    hint=previous, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing
-pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe
-fstack-protector -I/usr/local/include -D_REENTRANT -D_GNU_SOURCE
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_REENTRANT -D_GNU_SOURCE
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DPERL_POISON -D_REENTRANT
-D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    ccversion='', gccversion='4.8.2', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
/usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/local/lib
/usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
/usr/include/x86_64-linux-gnu /usr/lib
    libs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=libc-2.17.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.18'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib
-fstack-protector'

Locally applied patches:
    uncommitted-changes


@INC for perl 5.19.11:
    /usr/local/lib/perl5/site_perl/5.19.10/x86_64-linux-thread-multi
    /usr/local/lib/perl5/site_perl/5.19.10
    /usr/local/lib/perl5/5.19.10/x86_64-linux-thread-multi
    /usr/local/lib/perl5/5.19.10
    .


Environment for perl 5.19.11:
    HOME=/home/pip
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LC_ALL=en_US.utf8
    LC_CTYPE=
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin
    PERL_BADLANG (unset)
    SHELL=/bin/zsh-beta


@p5pRT p5pRT commented Mar 22, 2014

From prumpf@gmail.com

perl-deadlock-workaround.diff
diff --git a/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm b/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
index 730c565..a8092bf 100644
--- a/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
+++ b/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
@@ -129,6 +129,19 @@ main(int argc, char **argv, char **env)
      * call PTHREAD_ATFORK() explicitly, but if and only if it hasn't
      * been called at least once before in the current process.
      * --GSAR 2001-07-20 */
+    /* There's a nasty race condition with the current versions of Perl and
+     * glibc: the call to PTHREAD_ATFORK in Perl's main() might be reached
+     * before the first malloc happens, in which
+     * case fork() locks malloc/arena.c's list_lock first, then tries to lock
+     * PL_malloc_lock; another thread might have locked PL_malloc_lock first,
+     * then tries to lock list_lock, resulting in a deadlock.
+     *
+     * A proper fix would be in glibc, ensuring that ptmalloc_init() is called
+     * earlier, but a workaround is to make a malloc call ourselves. */
+    /* This leaks memory, but works. */
+    (void)perl_alloc();
+    /* This doesn't leak memory, but is optimized away by gcc */
+    PerlMem_free(PerlMem_malloc(1024));
     PTHREAD_ATFORK(Perl_atfork_lock,
                    Perl_atfork_unlock,
                    Perl_atfork_unlock);

@p5pRT p5pRT commented Mar 26, 2014

From @tonycoz

On Sat Mar 22 09:53:21 2014, prumpf@gmail.com wrote:

Hi! I've run into a deadlock situation with the current git versions
of perl (5.19.11) and glibc (2.19), on x86_64-pc-linux-gnu with
ithreads and MY_MALLOC, though I've run into it with other setups
(recent Debian versions of Perl and glibc, no MY_MALLOC) as well. I
believe I've been able to track down the issue and come up with a
workaround, although I've not yet found the time to come up with a small
reproducible test case. Please feel free to ask me for one if it's
absolutely required, though, or ask for other information, and I'll do my
best.

Have you reported the glibc part of the problem to your vendor (Debian?)

Since this seems to be a glibc specific issue, I wonder if there's a glibc specific way of forcing initialization.

In any case, the workaround would need to be protected by #ifdef __GLIBC__
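
A guarded version might look roughly like the sketch below. Everything in it is illustrative: `app_mutex`, `app_atfork_lock`, `app_atfork_unlock` and `install_atfork_handlers` are hypothetical stand-ins for Perl's mutexes, Perl_atfork_lock(), Perl_atfork_unlock() and the PTHREAD_ATFORK call site, and the volatile pointer is just one assumed way to stop the compiler from eliding the allocation.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* Stand-in for the mutexes that Perl_atfork_lock() takes. */
static pthread_mutex_t app_mutex = PTHREAD_MUTEX_INITIALIZER;

static void app_atfork_lock(void)   { pthread_mutex_lock(&app_mutex); }
static void app_atfork_unlock(void) { pthread_mutex_unlock(&app_mutex); }

/* Registers the fork handlers; returns the pthread_atfork() result
 * (0 on success). */
int install_atfork_handlers(void)
{
#ifdef __GLIBC__
    /* glibc only: touch the allocator before registering our handlers,
     * so that any handlers it installs on first use are already in place.
     * The volatile qualifier keeps the compiler from removing the pair. */
    void *volatile warmup = malloc(1024);
    free(warmup);
#endif
    /* Same shape as the existing call: lock in prepare, unlock in both
     * the parent and the child. */
    return pthread_atfork(app_atfork_lock,
                          app_atfork_unlock,
                          app_atfork_unlock);
}
```

On other C libraries the preprocessor removes the extra allocation, so only glibc builds pay for the workaround.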

https://bugzilla.redhat.com/show_bug.cgi?id=906468

seems like a different but related issue. Unfortunately, the reporter's post to the glibc mailing list:

https://sourceware.org/ml/libc-alpha/2013-01/msg01051.html

seems to have been ignored.

Tony


@p5pRT p5pRT commented Mar 26, 2014

The RT System itself - Status changed from 'new' to 'open'


@p5pRT p5pRT commented Mar 26, 2014

From @Leont

On Sat, Mar 22, 2014 at 5:53 PM, Philipp Rumpf <perlbug-followup@perl.org> wrote:

Hi! I've run into a deadlock situation with the current git versions
of perl (5.19.11) and glibc (2.19), on x86_64-pc-linux-gnu with
ithreads and MY_MALLOC, though I've run into it with other setups
(recent Debian versions of Perl and glibc, no MY_MALLOC) as well. I
believe I've been able to track down the issue and come up with a
workaround, although I've not yet found the time to come up with a small
reproducible test case. Please feel free to ask me for one if it's
absolutely required, though, or ask for other information, and I'll do my
best.

In summary, the problem is inconsistent lock ordering between Perl's
PL_malloc_mutex and glibc's malloc/arena.c's list_lock. The situation
arises when one thread tries to fork() at the same time that another
thread calls malloc().

Perl runs pthread_atfork before the first malloc() makes glibc install
its atfork handlers, so fork() calls ptmalloc_lock_all() first, then
Perl_atfork_lock(). That means locking glibc's list_lock first, then
PL_malloc_mutex. (pthread_atfork() has LIFO semantics)

However, Perl's malloc implementation locks PL_malloc_mutex first,
then (sometimes) runs out of memory and calls the real malloc(), which
tries to lock list_lock. We thus have a race condition and a deadlock,
which I've seen in practice.

I believe this is fundamentally a glibc bug: its implementation of
pthread_atfork() behaves erratically depending on whether malloc() is
first called before or after pthread_atfork(). However, since the
broken versions of glibc are out there and multiplying, we should also
work around the issue in Perl itself.

The workaround should be as easy as including an extra
PerlMem_free(PerlMem_malloc(1024)) call before calling PTHREAD_ATFORK,
but gcc has started "optimizing" such (otherwise) useless calls. I've
found a deliberately duplicate call to perl_alloc() works, but that's
both a one-time memory leak and horribly ugly, and most likely breaks
whatever code uses PL_do_undump.

Nevertheless, I'll include it here, because most of the work was
probably in tracking down the bug, and fixing it should be easier,
even if I cannot presently think of a good fix.

This doesn't make sense. Perl's malloc should only use the system's malloc
if both USE_PERL_SBRK and PERL_SBRK_VIA_MALLOC are set, which is not that
likely. I'm not sure what's going on here exactly.

Leon


@p5pRT p5pRT commented Mar 29, 2014

From @Leont

On Sat, Mar 29, 2014 at 3:46 PM, Philipp Rumpf <prumpf@gmail.com> wrote:

Hello,
I tried responding via the perlbug system, but that appears to be broken.
Thank you for your responses so far!

As a reminder, the bug is specific to glibc/nptl-based systems with
ithreads, such as x86_64-pc-linux-gnu.

I've reported the issue on the glibc bugzilla after verifying it's not
Debian-specific.

Here's a much simpler fix/workaround, to metaconfig, that we can use until
fixed glibcs start appearing:

---------------------------------------
diff --git a/U/threads/d_pthread_atfork.U b/U/threads/d_pthread_atfork.U
index 77a8b43..9f0332a 100644
--- a/U/threads/d_pthread_atfork.U
+++ b/U/threads/d_pthread_atfork.U
@@ -5,7 +5,7 @@
 ?RCS: You may distribute under the terms of either the GNU General Public
 ?RCS: License or the Artistic License, as specified in the README file.
 ?RCS:
-?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar
+?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar d_gnulibc
 ?MAKE:    -pick add $@ %<
 ?S:d_pthread_atfork:
 ?S:    This variable conditionally defines the HAS_PTHREAD_ATFORK symbol,
@@ -37,6 +37,12 @@ if eval $compile; then
 else
     val="$undef"
 fi
+case "$d_gnulibc" in
+*)
+    echo "Assuming pthread_atfork is broken, since this is glibc."
+    val="$undef"
+    ;;
+esac
 case "$usethreads" in
 $define)
         case "$val" in
-------------------------------------------

And here's a test case for reproducing the bug (Leon was right to point
out that without -DPURIFY, which I had set but forgotten about, it's not
Perl's malloc that calls the real malloc(), but S_more_refcounted_fds.
However, it's the same bug).

Yet I don't think pretending that pthread_atfork is unavailable is helpful
at all. That will only create new deadlocks.

This program should terminate (and would probably exhaust file descriptors
without a breakpoint), but by merely setting the right breakpoint and
attempting to continue once it's hit, we can get it to deadlock (after
opening a mere 16 file descriptors).

------------------------------------------
#!/usr/bin/perl
# set a breakpoint in S_more_refcounted_fds before running this

use threads;

async {
    my @fh;

    for (my $i = 0; ; $i++) {
        open($fh[$i], "</dev/zero");
    }
};

sleep(1);
fork();
--------------------------------------

To force the deadlock, set a breakpoint in S_more_refcounted_fds, then
wait for a while (for the sleep(1) to finish) before continuing after the
breakpoint is hit for the second time (the first time will be before the
second thread is spawned).

As you can see in this rather long GDB transcript, the bug is what I
described: thread 2 is trying to malloc() with perlio_mutex held, thread 1
is trying to fork, is already holding glibc's malloc mutex, and is waiting
on perlio_mutex.

Yes that makes sense. I guess that means your original proposed solution
(calling malloc early if necessary) is warranted.

Leon


@p5pRT p5pRT commented Mar 30, 2014

From prumpf@gmail.com

Thanks for your response!

On Wed, Mar 26, 2014 at 6:47 AM, Tony Cook via RT <perlbug-followup@perl.org> wrote:

Have you reported the glibc part of the problem to your vendor (Debian?)

I confirmed the problem is present in the git version of glibc and reported
it there (https://sourceware.org/bugzilla/show_bug.cgi?id=16742). I'll file
a bug against the Debian package if I don't hear from them.

Since this seems to be a glibc specific issue, I wonder if there's a glibc
specific way of forcing initialization.

In any case, the workaround would need to be protected by #ifdef __GLIBC__

How about simply forcing HAS_PTHREAD_ATFORK to undef if __GLIBC__ is
defined? That should be a little cleaner than the malloc workaround, at
least.

Ideally, there would be a test case to determine at configuration time
whether our pthread_atfork() is broken. However, that's a little
unpredictable, even with appropriate sleep() statements, since our system
might be too busy.

Here's what I've come up with, as a patch against metaconfig​:

Inline Patch
diff --git a/U/threads/d_pthread_atfork.U b/U/threads/d_pthread_atfork.U
index 77a8b43..9f0332a 100644
--- a/U/threads/d_pthread_atfork.U
+++ b/U/threads/d_pthread_atfork.U
@@ -5,7 +5,7 @@
 ?RCS: You may distribute under the terms of either the GNU General Public
 ?RCS: License or the Artistic License, as specified in the README file.
 ?RCS:
-?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar
+?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar d_gnulibc
 ?MAKE: -pick add $@ %<
 ?S:d_pthread_atfork:
 ?S:    This variable conditionally defines the HAS_PTHREAD_ATFORK symbol,
@@ -37,6 +37,12 @@ if eval $compile; then
 else
     val="$undef"
 fi
+case "$d_gnulibc" in
+*)
+       echo "Assuming pthread_atfork is broken, since this is glibc."
+       val="$undef"
+       ;;
+esac
 case "$usethreads" in
 $define)
         case "$val" in

https://bugzilla.redhat.com/show_bug.cgi?id=906468

seems like a different but related issue, unfortunately his post to the
glibc mailing list:

https://sourceware.org/ml/libc-alpha/2013-01/msg01051.html

seems to have been ignored.

I don't fully understand that report; it sounds like malloc_atfork()
shouldn't be performing I/O, but looking at the source it appears not to
be. I suspect that the original bug might have involved pthread_atfork
handlers running in the wrong order, though; maybe fork() should call
_IO_list_lock() before calling ptmalloc_lock_all()?

Anyway, I think that's a different issue, though it's a pity if it hasn't
been fixed.


@p5pRT p5pRT commented Mar 30, 2014

From prumpf@gmail.com

metaconfig-broken-pthread_atfork.diff
diff --git a/U/threads/d_pthread_atfork.U b/U/threads/d_pthread_atfork.U
index 77a8b43..9f0332a 100644
--- a/U/threads/d_pthread_atfork.U
+++ b/U/threads/d_pthread_atfork.U
@@ -5,7 +5,7 @@
 ?RCS: You may distribute under the terms of either the GNU General Public
 ?RCS: License or the Artistic License, as specified in the README file.
 ?RCS:
-?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar
+?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar d_gnulibc
 ?MAKE:	-pick add $@ %<
 ?S:d_pthread_atfork:
 ?S:	This variable conditionally defines the HAS_PTHREAD_ATFORK symbol,
@@ -37,6 +37,12 @@ if eval $compile; then
 else
     val="$undef"
 fi
+case "$d_gnulibc" in
+*)
+	echo "Assuming pthread_atfork is broken, since this is glibc."
+	val="$undef"
+	;;
+esac
 case "$usethreads" in
 $define)
         case "$val" in

@p5pRT p5pRT commented Mar 30, 2014

From prumpf@gmail.com

Sorry, I hadn't noticed that PURIFY was still set in my configuration,
which does indeed set USE_PERL_SBRK and PERL_SBRK_VIA_MALLOC. My guess is the
issue doesn't appear for MYMALLOC && !PURIFY, but it's still valid (and I'm
pretty sure "what's going on here" is what I've described) for !MYMALLOC
and MYMALLOC && PURIFY.

Hope that helps you make sense of it, and sorry for the confusion.

On Wed, Mar 26, 2014 at 10:51 AM, Leon Timmermans via RT <perlbug-followup@perl.org> wrote:

On Sat, Mar 22, 2014 at 5:53 PM, Philipp Rumpf <perlbug-followup@perl.org> wrote:

Hi! I've run into a deadlock situation with the current git versions
of perl (5.19.11) and glibc (2.19), on x86_64-pc-linux-gnu with
ithreads and MY_MALLOC, though I've run into it with other setups
(recent Debian versions of Perl and glibc, no MY_MALLOC) as well. I
believe I've been able to track down the issue and come up with a
workaround, although I've not yet found the time to come up with a small
reproducible test case. Please feel free to ask me for one if it's
absolutely required, though, or ask for other information, and I'll do my
best.

In summary, the problem is inconsistent lock ordering between Perl's
PL_malloc_mutex and glibc's malloc/arena.c's list_lock. The situation
arises when one thread tries to fork() at the same time that another
thread calls malloc().

Perl runs pthread_atfork before the first malloc() makes glibc install
its atfork handlers, so fork() calls ptmalloc_lock_all() first, then
Perl_atfork_lock(). That means locking glibc's list_lock first, then
PL_malloc_mutex. (pthread_atfork() has LIFO semantics)

However, Perl's malloc implementation locks PL_malloc_mutex first,
then (sometimes) runs out of memory and calls the real malloc(), which
tries to lock list_lock. We thus have a race condition and a deadlock,
which I've seen in practice.

I believe this is fundamentally a glibc bug: its implementation of
pthread_atfork() behaves erratically depending on whether malloc() is
first called before or after pthread_atfork(). However, since the
broken versions of glibc are out there and multiplying, we should also
work around the issue in Perl itself.

The workaround should be as easy as including an extra
PerlMem_free(PerlMem_malloc(1024)) call before calling PTHREAD_ATFORK,
but gcc has started "optimizing" such (otherwise) useless calls. I've
found a deliberately duplicate call to perl_alloc() works, but that's
both a one-time memory leak and horribly ugly, and most likely breaks
whatever code uses PL_do_undump.

Nevertheless, I'll include it here, because most of the work was
probably in tracking down the bug, and fixing it should be easier,
even if I cannot presently think of a good fix.

This doesn't make sense. Perl's malloc should only use the system's malloc
if both USE_PERL_SBRK and PERL_SBRK_VIA_MALLOC are set, which is not that
likely. I'm not sure what's going on here exactly.

Leon


@p5pRT p5pRT commented Mar 31, 2014

From prumpf@gmail.com

Hello,
I tried responding via the perlbug system, but that appears to be broken.
Thank you for your responses so far!

As a reminder, the bug is specific to glibc/nptl-based systems with
ithreads, such as x86_64-pc-linux-gnu.

I've reported the issue on the glibc bugzilla after verifying it's not
Debian-specific.

Here's a much simpler fix/workaround, to metaconfig, that we can use until
fixed glibcs start appearing:


Inline Patch
diff --git a/U/threads/d_pthread_atfork.U b/U/threads/d_pthread_atfork.U
index 77a8b43..9f0332a 100644
--- a/U/threads/d_pthread_atfork.U
+++ b/U/threads/d_pthread_atfork.U
@@ -5,7 +5,7 @@
 ?RCS: You may distribute under the terms of either the GNU General Public
 ?RCS: License or the Artistic License, as specified in the README file.
 ?RCS:
-?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar
+?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar d_gnulibc
 ?MAKE:    -pick add $@ %<
 ?S:d_pthread_atfork:
 ?S:    This variable conditionally defines the HAS_PTHREAD_ATFORK symbol,
@@ -37,6 +37,12 @@ if eval $compile; then
 else
     val="$undef"
 fi
+case "$d_gnulibc" in
+*)
+    echo "Assuming pthread_atfork is broken, since this is glibc."
+    val="$undef"
+    ;;
+esac
 case "$usethreads" in
 $define)
         case "$val" in
-------------------------------------------

And here's a test case for reproducing the bug (Leon was right to point out
that without -DPURIFY, which I had set but forgotten about, it's not Perl's
malloc that calls the real malloc(), but S_more_refcounted_fds. However, it's
the same bug).

This program should terminate (and would probably exhaust file descriptors
without a breakpoint), but by merely setting the right breakpoint and
attempting to continue once it's hit, we can get it to deadlock (after
opening a mere 16 file descriptors).

#!/usr/bin/perl
# set a breakpoint in S_more_refcounted_fds before running this

use threads;

async {
    my @fh;

    for (my $i = 0; ; $i++) {
        open($fh[$i], "</dev/zero");
    }
};

sleep(1);
fork();


To force the deadlock, set a breakpoint in S_more_refcounted_fds, then wait
for a while (for the sleep(1) to finish) before continuing after the
breakpoint is hit for the second time (the first time will be before the
second thread is spawned).

As you can see in this rather long GDB transcript, the bug is what I
described: thread 2 is trying to malloc() with perlio_mutex held, thread 1
is trying to fork, is already holding glibc's malloc mutex, and is waiting
on perlio_mutex.

Sorry again for the -DPURIFY confusion.

Philipp Rumpf


GDB transcript​:
% gdb --args perl glibc-bug.pl
gdb --args perl glibc-bug.pl
GNU gdb (GDB) 7.6.2 (Debian 7.6.2-1)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+​: GNU GPL version 3 or later <http​://gnu.org/licenses/gpl.html

This is free software​: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see​:
<http​://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/perl...Reading symbols from
/usr/lib/debug/usr/bin/perl...done.
done.
(gdb) r
r
Starting program​: /usr/bin/perl glibc-bug.pl
warning​: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6b3c700 (LWP 19617)]
Perl exited with active threads​:
  1 running and unjoined
  0 finished and unjoined
  0 running and detached
Perl exited with active threads​:
  1 running and unjoined
  0 finished and unjoined
  0 running and detached
[Thread 0x7ffff6b3c700 (LWP 19617) exited]
[Inferior 1 (process 19613) exited normally]
(gdb) b S_more_refcounted_fds
b S_more_refcounted_fds
Breakpoint 1 at 0x7ffff7b83060​: file perlio.c, line 2320.
(gdb) set target-async 1
set target-async 1
(gdb) set non-stop on
set non-stop on
(gdb) r
r
Starting program​: /usr/bin/perl glibc-bug.pl
warning​: no loadable sections found in added symbol-file system-supplied
DSO at 0x7ffff7ffa000
warning​: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, PerlIOUnix_refcnt_inc (fd=0) at perlio.c​:2372
2372 perlio.c​: No such file or directory.
(gdb) shell sleep 5
shell sleep 5
(gdb) c
c
Continuing.
[New Thread 0x7ffff6b3c700 (LWP 19621)]

Breakpoint 1, PerlIOUnix_refcnt_inc (fd=16) at perlio.c​:2372
2372 in perlio.c
(gdb) shell sleep 5
shell sleep 5
(gdb) c
c
Continuing.
Cannot execute this command while the selected thread is running.
(gdb) i thr
i thr
  Id Target Id Frame
  2 Thread 0x7ffff6b3c700 (LWP 19621) "perl" PerlIOUnix_refcnt_inc
(fd=16)
  at perlio.c​:2372
* 1 Thread 0x7ffff7fd3700 (LWP 19619) "perl" (running)
(gdb) thr 2
thr 2
[Switching to thread 2 (Thread 0x7ffff6b3c700 (LWP 19621))]
#0 PerlIOUnix_refcnt_inc (fd=16) at perlio.c​:2372
2372 in perlio.c
(gdb) c
c
Continuing.
  C-c C-c^C
Program received signal SIGINT, Interrupt.
__lll_lock_wait_private ()
  at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S​:95
95 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S​: No such file
or directory.
(gdb) interrupt -a
interrupt -a
(gdb)
[Thread 0x7ffff7fd3700 (LWP 19619)] #1 stopped.
__lll_lock_wait ()
  at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S​:135
135 ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S​: No such file
or directory.

(gdb) thr app all bt
thr app all bt

Thread 2 (Thread 0x7ffff6b3c700 (LWP 19621))​:
#0 __lll_lock_wait_private ()
  at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S​:95
#1 0x00007ffff6ffc527 in _L_lock_10982 () at malloc.c​:5154
#2 0x00007ffff6ffa198 in __GI___libc_realloc (
  oldmem=0x7ffff7321620 <main_arena>, bytes=128) at malloc.c​:2975
#3 0x00007ffff7b83098 in S_more_refcounted_fds (my_perl=0x6bfef0,
new_fd=16)
  at perlio.c​:2334
#4 PerlIOUnix_refcnt_inc (fd=16) at perlio.c​:2372
#5 0x00007ffff7b839c4 in PerlIOUnix_setfd (my_perl=0x6bfef0, f=0x6d8710,
  imode=0, fd=<optimized out>) at perlio.c​:2655
#6 PerlIOUnix_open (my_perl=0x6bfef0, self=0x7ffff7ddc820 <PerlIO_unix>,
  layers=0x6d84b0, n=0, mode=0x7ffff6b3ba70 "r", fd=<optimized out>,
  imode=0, perm=438, f=0x6d8710, narg=1, args=0x7ffff6b3ba68)
  at perlio.c​:2736
#7 0x00007ffff7b82c06 in PerlIOBuf_open (my_perl=0x6bfef0,
  self=0x7ffff7ddc660 <PerlIO_perlio>, layers=0x6d84b0, n=1,
  mode=0x7ffff6b3ba70 "r", fd=-1, imode=0, perm=0, f=0x0, narg=1,
  args=0x7ffff6b3ba68) at perlio.c​:3862
#8 0x00007ffff7b84b2b in PerlIO_openn (my_perl=my_perl@​entry=0x6bfef0,
  layers=layers@entry=0x0, mode=mode@entry=0x7ffff6b3ba70 "r",
  fd=fd@entry=-1, imode=imode@entry=0, perm=perm@entry=0, f=f@entry=0x0,
  narg=narg@entry=1, args=args@entry=0x7ffff6b3ba68) at perlio.c:1648
#9 0x00007ffff7b5d83e in Perl_do_openn (my_perl=my_perl@entry=0x6bfef0,
  gv=gv@entry=0x7362f8, oname=0x724830 "</dev/zero", len=<optimized out>,
  as_raw=as_raw@entry=0, rawmode=rawmode@entry=0, rawperm=rawperm@entry=0,
  supplied_fp=supplied_fp@entry=0x0, svp=0x7ffff6b3ba68, num_svs=1,
  num_svs@entry=0) at doio.c:453
#10 0x00007ffff7b4c36e in Perl_pp_open (my_perl=0x6bfef0) at pp_sys.c:640
#11 0x00007ffff7b05326 in Perl_runops_standard (my_perl=0x6bfef0) at run.c:42
#12 0x00007ffff7a96930 in Perl_call_sv (my_perl=my_perl@entry=0x6bfef0,
  sv=0x736058, flags=<optimized out>) at perl.c:2766
#13 0x00007ffff6b43589 in S_ithread_run (arg=0x630020) at threads.xs:517
#14 0x00007ffff732f062 in start_thread (arg=0x7ffff6b3c700)
  at pthread_create.c:312
#15 0x00007ffff7063a3d in clone ()
  at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7ffff7fd3700 (LWP 19619)):
#0 __lll_lock_wait ()
  at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007ffff7331467 in _L_lock_913 ()
  from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007ffff7331290 in __GI___pthread_mutex_lock (
  mutex=0x7ffff7ddce20 <PL_perlio_mutex>) at ../nptl/pthread_mutex_lock.c:79
#3 0x00007ffff7ae9c70 in Perl_atfork_lock () at util.c:2811
#4 0x00007ffff7035122 in __libc_fork ()
  at ../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:95
#5 0x00007ffff7338305 in __fork ()
  at ../nptl/sysdeps/unix/sysv/linux/pt-fork.c:25
#6 0x00007ffff7ae9d05 in Perl_my_fork () at util.c:2849
#7 0x00007ffff7b556bc in Perl_pp_fork (my_perl=0x603010) at pp_sys.c:4022
#8 0x00007ffff7b05326 in Perl_runops_standard (my_perl=0x603010) at run.c:42
#9 0x00007ffff7a9dce4 in S_run_body (oldscope=1, my_perl=0x603010)
  at perl.c:2467
#10 perl_run (my_perl=0x603010) at perl.c:2383
#11 0x0000000000400e19 in main (argc=2, argv=0x7fffffffeaf8,
  env=0x7fffffffeb10) at perlmain.c:114
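
The two backtraces above show a lock-order inversion: each thread holds the lock the other one needs. As an illustration only, here is a minimal C sketch that replays that interleaving on a single thread. `io_mutex` and `arena_mutex` are hypothetical stand-ins for `PL_perlio_mutex` and glibc's internal malloc lock, and `pthread_mutex_trylock()` replaces the blocking calls so that the function reports the cycle rather than hanging.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins: io_mutex plays PL_perlio_mutex, arena_mutex plays
 * glibc's internal malloc lock. Neither is the real object. */
static pthread_mutex_t io_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t arena_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Replays the interleaving from the two backtraces on one thread.
 * Returns 1 if both second acquisitions would block (a cycle), else 0. */
int replay_lock_inversion(void)
{
    int open_side, fork_side;

    /* "Thread 2": PerlIOUnix_refcnt_inc takes the PerlIO mutex first. */
    pthread_mutex_lock(&io_mutex);
    /* "Thread 1": fork() takes the allocator lock first. */
    pthread_mutex_lock(&arena_mutex);

    /* "Thread 2" now calls realloc() and needs the allocator lock. */
    open_side = pthread_mutex_trylock(&arena_mutex);
    /* "Thread 1" now runs Perl_atfork_lock() and needs the PerlIO mutex. */
    fork_side = pthread_mutex_trylock(&io_mutex);

    /* Both mutexes were already locked above, so neither trylock succeeded. */
    assert(open_side != 0 && fork_side != 0);

    pthread_mutex_unlock(&arena_mutex);
    pthread_mutex_unlock(&io_mutex);
    return open_side == EBUSY && fork_side == EBUSY;
}
```

With blocking `pthread_mutex_lock()` calls in place of the two `trylock` calls, and two real threads, neither side could make progress.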



@p5pRT p5pRT commented Mar 31, 2014

From prumpf@gmail.com

metaconfig-broken-pthread_atfork.diff
diff --git a/U/threads/d_pthread_atfork.U b/U/threads/d_pthread_atfork.U
index 77a8b43..9f0332a 100644
--- a/U/threads/d_pthread_atfork.U
+++ b/U/threads/d_pthread_atfork.U
@@ -5,7 +5,7 @@
 ?RCS: You may distribute under the terms of either the GNU General Public
 ?RCS: License or the Artistic License, as specified in the README file.
 ?RCS:
-?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar
+?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar d_gnulibc
 ?MAKE:	-pick add $@ %<
 ?S:d_pthread_atfork:
 ?S:	This variable conditionally defines the HAS_PTHREAD_ATFORK symbol,
@@ -37,6 +37,12 @@ if eval $compile; then
 else
     val="$undef"
 fi
+case "$d_gnulibc" in
+*)
+	echo "Assuming pthread_atfork is broken, since this is glibc."
+	val="$undef"
+	;;
+esac
 case "$usethreads" in
 $define)
         case "$val" in

@p5pRT p5pRT commented Mar 31, 2014

From @Tux

On Sat, 29 Mar 2014 14:46:08 +0000, Philipp Rumpf <prumpf@gmail.com>
wrote:

> Hello,
> I tried responding via the perlbug system, but that appears to be broken.
> Thank you for your responses so far!
>
> As a reminder, the bug is specific to glibc/nptl-based systems with
> ithreads, such as x86_64-pc-linux-gnu.

I admire the fact that this is a genuine patch to the meta-system, but
looking at the scope, I wonder if it better is located in hints/linux.sh

--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.19 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/


@p5pRT p5pRT commented Apr 3, 2014

From prumpf@gmail.com

On Mon, Mar 31, 2014 at 6:28 AM, H. Merijn Brand via RT <perlbug-followup@perl.org> wrote:

> I admire the fact that this is a genuine patch to the meta-system, but
> looking at the scope, I wonder if it better is located in hints/linux.sh

I don't know. The build system is a bit of a mystery to me (I'm not sure,
but I think the first patch was broken in the non-glibc case).

There are four options here: put the test in metaconfig or in the hints file,
and use either version number testing or a test program. Testing by version
numbers seems to be discouraged, and while I have a test program, the only
easy way to tell whether it deadlocked is to wait for a timeout. I'm
worried that this could report false failures on very busy systems with
fixed glibcs. In the failure case, it also delays the build while it waits
for the timeout; I chose two seconds, but we could probably get away with
one second.

I'd argue that the code with the test program might well go into
metaconfig: pthread_atfork() is broken for all users, not just Perl. The
test isn't specific to glibc or Linux; it should work on all POSIX systems,
and if it fails on a non-glibc system we definitely don't want to use
pthread_atfork() there.

So I've attached the two test-program-based versions, as patches to
metaconfig and perl. Either one appears to work, and installing both also
appears to work.

Philipp


@p5pRT p5pRT commented Apr 3, 2014

From prumpf@gmail.com

perl-hints-002.diff
diff --git a/hints/linux.sh b/hints/linux.sh
index 956adfc..d1e2737 100644
--- a/hints/linux.sh
+++ b/hints/linux.sh
@@ -516,3 +516,104 @@ case "$libdb_needs_pthread" in
     libswanted="$libswanted pthread"
     ;;
 esac
+
+cat >try.c <<'EOM'
+#include <stdio.h>
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+
+pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+
+int pipe_fd[4];
+
+void lock(void)
+{
+  if (write(pipe_fd[3], "\n", 1) <= 0) {
+    _exit(1);
+  }
+  pthread_mutex_lock(&mutex);
+}
+
+void *lock_then_malloc(void *dummy)
+{
+  char c;
+
+  pthread_mutex_lock(&mutex);
+  if (write(pipe_fd[1], "\n", 1) <= 0) {
+    _exit(1);
+  }
+
+  if (read(pipe_fd[2], &c, 1) <= 0) {
+    _exit(1);
+  }
+  volatile void *throwaway = malloc(1024);
+  pthread_mutex_unlock(&mutex);
+
+  return NULL;
+}
+
+void alarm_handler(int dummy)
+{
+  _exit(1);
+}
+
+struct sigaction sa;
+
+int main(int argc, char **argv)
+{
+  pthread_attr_t attr;
+  pthread_t tid;
+
+  if (pthread_atfork(lock, NULL, NULL)) {
+    return 1;
+  }
+  volatile void *throwaway = malloc(1024);
+
+  if (pipe(pipe_fd)) {
+    return 1;
+  }
+
+  if (pipe(pipe_fd+2)) {
+    return 1;
+  }
+
+  if (pthread_attr_init(&attr)) {
+    return 1;
+  }
+  if (pthread_create(&tid, &attr, lock_then_malloc, NULL)) {
+    return 1;
+  }
+
+  char c;
+  if (read(pipe_fd[0], &c, 1) <= 0) {
+    return 1;
+  }
+
+  sa.sa_handler = alarm_handler;
+  sigemptyset(&sa.sa_mask);
+  if (sigaction(SIGALRM, &sa, NULL)) {
+    return 1;
+  }
+  alarm(2);
+
+  if (fork() < 0)
+    return 1;
+
+  return 0;
+}
+EOM
+
+if ${cc:-gcc} $ccflags $ldflags try.c -lpthread >/dev/null 2>&1 && $run ./a.out; then
+    cat <<'EOM' >&4
+
+You appear to have a working pthread_atfork().
+EOM
+else
+    cat <<'EOM' >&4
+
+Your pthread_atfork() might be broken, not using it.
+EOM
+    d_pthread_atfork='undef'
+fi

@p5pRT p5pRT commented Apr 3, 2014

From prumpf@gmail.com

metaconfig-pthread-002.diff
diff --git a/U/threads/d_pthread_atfork.U b/U/threads/d_pthread_atfork.U
index 77a8b43..2a84eac 100644
--- a/U/threads/d_pthread_atfork.U
+++ b/U/threads/d_pthread_atfork.U
@@ -1,11 +1,13 @@
 ?RCS: $Id$
 ?RCS:
 ?RCS: Copyright (c) 2001 Jarkko Hietaniemi
+?RCS: Parts taken from d_pthreadj.U, which is:
+?RCS:   Copyright (c) 1998 Andy Dougherty
 ?RCS:
 ?RCS: You may distribute under the terms of either the GNU General Public
 ?RCS: License or the Artistic License, as specified in the README file.
 ?RCS:
-?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar
+?MAKE:d_pthread_atfork: Inlibc cat Compile Setvar run rm
 ?MAKE:	-pick add $@ %<
 ?S:d_pthread_atfork:
 ?S:	This variable conditionally defines the HAS_PTHREAD_ATFORK symbol,
@@ -19,30 +21,112 @@
 ?H:#$d_pthread_atfork HAS_PTHREAD_ATFORK		/**/
 ?H:.
 ?LINT:set d_pthread_atfork
-: see whether the pthread_atfork exists
-$cat >try.c <<EOP
-#include <pthread.h>
+?T:yyy
+?F:!try
+: see whether pthread_atfork exists and works
+echo "Checking whether pthread_atfork is usable..." >&4
+$cat >try.c <<'EOP'
 #include <stdio.h>
-int main() {
-#ifdef  PTHREAD_ATFORK
-        pthread_atfork(NULL,NULL,NULL);
-#endif
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+
+pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+
+int pipe_fd[4];
+
+void lock(void)
+{
+  if (write(pipe_fd[3], "\n", 1) <= 0) {
+    _exit(1);
+  }
+  pthread_mutex_lock(&mutex);
+}
+
+void *lock_then_malloc(void *dummy)
+{
+  char c;
+
+  pthread_mutex_lock(&mutex);
+  if (write(pipe_fd[1], "\n", 1) <= 0) {
+    _exit(1);
+  }
+
+  if (read(pipe_fd[2], &c, 1) <= 0) {
+    _exit(1);
+  }
+  volatile void *throwaway = malloc(1024);
+  pthread_mutex_unlock(&mutex);
+
+  return NULL;
+}
+
+void alarm_handler(int dummy)
+{
+  _exit(1);
+}
+
+struct sigaction sa;
+
+int main(int argc, char **argv)
+{
+  pthread_attr_t attr;
+  pthread_t tid;
+
+  if (pthread_atfork(lock, NULL, NULL)) {
+    return 1;
+  }
+  volatile void *throwaway = malloc(1024);
+
+  if (pipe(pipe_fd)) {
+    return 1;
+  }
+
+  if (pipe(pipe_fd+2)) {
+    return 1;
+  }
+
+  if (pthread_attr_init(&attr)) {
+    return 1;
+  }
+  if (pthread_create(&tid, &attr, lock_then_malloc, NULL)) {
+    return 1;
+  }
+
+  char c;
+  if (read(pipe_fd[0], &c, 1) <= 0) {
+    return 1;
+  }
+
+  sa.sa_handler = alarm_handler;
+  sigemptyset(&sa.sa_mask);
+  if (sigaction(SIGALRM, &sa, NULL)) {
+    return 1;
+  }
+  alarm(2);
+
+  int ret = fork();
+  if (ret < 0)
+    return 1;
+
+  if (ret == 0)
+    printf("success\n");
+  return 0;
 }
 EOP
 
-: see if pthread_atfork exists
-set try -DPTHREAD_ATFORK
+: see if pthread_atfork exists and works
+set try
 if eval $compile; then
-    val="$define"
+    yyy=`$run ./try`
 else
     val="$undef"
 fi
-case "$usethreads" in
-$define)
-        case "$val" in
-        $define) echo 'pthread_atfork found.' >&4        ;;
-        *)       echo 'pthread_atfork NOT found.' >&4    ;;
-        esac
+$rm -f try try.*
+case "$yyy" in
+     success) echo "It does work." >&4; val="$define" ;;
+     *) echo "Doesn't work." >&4; val="$undef" ;;
 esac
 set d_pthread_atfork
 eval $setvar

@p5pRT p5pRT commented Apr 3, 2014

From @Leont

On Tue, Apr 1, 2014 at 5:12 PM, Philipp Rumpf <prumpf@gmail.com> wrote:

> So I've attached the two test-program-based versions, as patches to
> metaconfig and perl. Either one appears to work, and installing both also
> appears to work.

But you have now introduced exactly the deadlock that the use of pthread_atfork
was supposed to fix: if thread 1 forks while thread 2 holds a perl mutex,
the new process will deadlock as soon as it tries to acquire that mutex.

This is not a solution in any way.
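
The hazard can be shown with a small C sketch, assuming a plain pthread mutex (`demo_mutex`, hypothetical) in place of a perl mutex. A helper thread takes the mutex, the main thread forks, and the child checks with `pthread_mutex_trylock()` whether its copy of the mutex is still locked. Strictly speaking, POSIX only promises async-signal-safe calls in the child of a multithreaded process, so the `trylock` here is for demonstration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a perl mutex such as PL_perlio_mutex. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t barrier;   /* keeps helper and main in step */

static void *hold_mutex(void *unused)
{
    int rc = pthread_mutex_lock(&demo_mutex);

    (void)unused;
    assert(rc == 0);
    (void)rc;
    pthread_barrier_wait(&barrier);   /* 1: the mutex is now held    */
    pthread_barrier_wait(&barrier);   /* 2: main has finished forking */
    pthread_mutex_unlock(&demo_mutex);
    return NULL;
}

/* Returns 1 if the forked child found demo_mutex locked, 0 if it did not,
 * and -1 if the experiment could not be set up. */
int child_sees_locked_mutex(void)
{
    pthread_t tid;
    pid_t pid;
    int status, result = -1;

    if (pthread_barrier_init(&barrier, NULL, 2) != 0)
        return -1;
    if (pthread_create(&tid, NULL, hold_mutex, NULL) != 0) {
        pthread_barrier_destroy(&barrier);
        return -1;
    }
    pthread_barrier_wait(&barrier);   /* 1: wait until the helper holds it */

    pid = fork();
    if (pid == 0) {
        /* Only the forking thread exists in the child, so the helper that
         * owns demo_mutex is gone and nothing here will ever unlock it. */
        _exit(pthread_mutex_trylock(&demo_mutex) == EBUSY ? 0 : 1);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (WEXITSTATUS(status) == 0);

    pthread_barrier_wait(&barrier);   /* 2: let the helper unlock */
    pthread_join(tid, NULL);
    pthread_barrier_destroy(&barrier);
    return result;
}
```

A blocking `pthread_mutex_lock()` in the child would wait forever at that point, which is the situation the atfork handlers exist to prevent.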

Leon


@p5pRT p5pRT commented Apr 4, 2014

From prumpf@gmail.com

If HAS_PTHREAD_ATFORK is undefined, Perl_my_fork() calls the same handlers
that would otherwise have been installed by pthread_atfork(). So the
deadlock you describe would only happen if someone called fork() (the C
function, not the Perl function) directly, rather than going through
Perl_my_fork(). Is that the case you're worrying about?

If the malloc hack is considered the better workaround, we can do that, of
course.
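
A sketch of the wrapper pattern described here, assuming a single hypothetical mutex `guard`. It is modelled on the description above and is not a copy of `Perl_my_fork()`. The forking thread takes the lock before `fork()` and releases it afterwards in both parent and child, so the child never inherits a mutex held by a thread that no longer exists.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical mutex standing in for the locks a "prepare" handler takes. */
static pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;

/* fork() wrapper that does by hand what the atfork handlers would do. */
pid_t guarded_fork(void)
{
    pid_t pid;
    int rc = pthread_mutex_lock(&guard);   /* role of the prepare handler */

    assert(rc == 0);
    (void)rc;
    pid = fork();
    /* Runs in both parent and child: the forking thread holds the mutex in
     * each process, so each can release its own copy. */
    pthread_mutex_unlock(&guard);
    return pid;
}

/* Returns 1 if a child created by guarded_fork() can take the mutex again,
 * 0 if it cannot, and -1 on error. */
int child_can_relock_after_guarded_fork(void)
{
    int status;
    pid_t pid = guarded_fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(pthread_mutex_trylock(&guard) == 0 ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

The protection only covers callers that go through the wrapper; code that calls `fork()` directly bypasses it, which is the case asked about above.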


@p5pRT p5pRT commented Apr 6, 2014

From prumpf@gmail.com

If it's possible to add a configuration variable in hints/linux.sh, I
haven't figured out how. So here's the version that changes metaconfig, but
uses the malloc() hack. Applications that embed Perl are likely to
copy-and-paste the code that calls PTHREAD_ATFORK, so I've exported the
Perl_atfork_fix symbol; they're very likely not to need the workaround,
anyway.
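
For context on why registration order matters: POSIX runs `prepare` handlers in reverse order of registration and `parent`/`child` handlers in registration order. The sketch below records that order with two hypothetical handler sets, `A` and `B`. It illustrates the rule only and is not the code from the attached patch.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char order_log[16];   /* filled in by the handlers below */

static void note(char c)
{
    size_t n = strlen(order_log);

    assert(n + 1 < sizeof order_log);   /* keep room for the terminator */
    order_log[n] = c;
    order_log[n + 1] = '\0';
}

static void prepare_a(void) { note('A'); }
static void parent_a(void)  { note('a'); }
static void prepare_b(void) { note('B'); }
static void parent_b(void)  { note('b'); }

/* Registers set A, then set B, forks once, and returns the order in which
 * the prepare (upper case) and parent (lower case) handlers ran.
 * Handlers cannot be unregistered, so call this once per process. */
const char *record_atfork_order(void)
{
    pid_t pid;

    order_log[0] = '\0';
    if (pthread_atfork(prepare_a, parent_a, NULL) != 0 ||
        pthread_atfork(prepare_b, parent_b, NULL) != 0)
        return NULL;

    pid = fork();
    if (pid == 0)
        _exit(0);                 /* child: nothing to do */
    if (pid < 0 || waitpid(pid, NULL, 0) != pid)
        return NULL;
    return order_log;
}
```

With set A registered before set B, the function returns `"BAab"`: the handler registered later takes its lock first. That matches the ordering described at the top of this ticket, where the handlers registered after Perl's run before `Perl_atfork_lock()`.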

On Thu, Apr 3, 2014 at 10​:15 PM, Philipp Rumpf <prumpf@​gmail.com> wrote​:

If HAS_PTHREAD_ATFORK is undefined, Perl_my_fork() calls the same handlers
that would otherwise have been installed by pthread_atfork(). So the
deadlock you describe would only happen if someone called fork() (the C
function, not the Perl function) directly, rather than going through
Perl_my_fork(). Is that the case you're worrying about?

If the malloc hack is considered the better workaround, we can do that, of
course.

On Thu, Apr 3, 2014 at 5​:48 PM, Leon Timmermans via RT <
perlbug-followup@​perl.org> wrote​:

On Tue, Apr 1, 2014 at 5​:12 PM, Philipp Rumpf <prumpf@​gmail.com> wrote​:

On Mon, Mar 31, 2014 at 6​:28 AM, H. Merijn Brand via RT <
perlbug-followup@​perl.org> wrote​:

I admire the fact that this is a genuine patch to the meta-system, but
looking at the scope, I wonder whether it would be better located in
hints/linux.sh

I don't know. The build system is a bit of a mystery to me (I'm not sure,
but I think the first patch was broken in the non-glibc case).

There are four options here: put the test in metaconfig or the hints file,
and use version number testing or a test program. Testing by version
numbers seems to be discouraged, and while I have a test program, the only
easy way to tell whether it deadlocked is to wait for a timeout. I'm
paranoid about that reporting false failures on very busy systems with
fixed glibcs. In the failure case, it also incurs a delay on the build
system while it waits for the timeout; I chose two seconds, but we could
probably get away with one second.

I'd argue that the code with the test program might well go into
metaconfig: pthread_atfork() is broken for all users, not just Perl. The
test isn't specific to glibc or Linux; it should work on all POSIX systems,
and if it fails on a non-glibc system we definitely don't want to use
pthread_atfork() there.

So I've attached the two test-program-based versions, as patches to
metaconfig and perl. Either one appears to work, and installing both also
appears to work.

But you have now introduced exactly the deadlock that the use of pthread_atfork
was supposed to fix: if thread 1 forks while thread 2 holds a Perl mutex,
the new process will deadlock as soon as it tries to acquire that mutex.

This is not a solution in any way.

Leon

@p5pRT p5pRT commented Apr 6, 2014

From prumpf@gmail.com

perl-deadlock-workaround-004.diff
diff --git a/cpan/Devel-PPPort/parts/embed.fnc b/cpan/Devel-PPPort/parts/embed.fnc
index e076893..b8ba5c0 100644
--- a/cpan/Devel-PPPort/parts/embed.fnc
+++ b/cpan/Devel-PPPort/parts/embed.fnc
@@ -877,6 +877,7 @@ Apr	|void	|my_exit	|U32 status
 Apr	|void	|my_failure_exit
 Ap	|I32	|my_fflush_all
 Anp	|Pid_t	|my_fork
+np	|void   |atfork_fix
 Anp	|void	|atfork_lock
 Anp	|void	|atfork_unlock
 Apmb	|I32	|my_lstat
diff --git a/embed.fnc b/embed.fnc
index 567e587..16615e8 100644
--- a/embed.fnc
+++ b/embed.fnc
@@ -898,6 +898,7 @@ Apr	|void	|my_exit	|U32 status
 Apr	|void	|my_failure_exit
 Ap	|I32	|my_fflush_all
 Anp	|Pid_t	|my_fork
+np	|void   |atfork_fix
 Anp	|void	|atfork_lock
 Anp	|void	|atfork_unlock
 Apmb	|I32	|my_lstat
diff --git a/embed.h b/embed.h
index 0ddaca7..18c02f1 100644
--- a/embed.h
+++ b/embed.h
@@ -1027,6 +1027,7 @@
 #define allocmy(a,b,c)		Perl_allocmy(aTHX_ a,b,c)
 #define amagic_is_enabled(a)	Perl_amagic_is_enabled(aTHX_ a)
 #define apply(a,b,c)		Perl_apply(aTHX_ a,b,c)
+#define atfork_fix		Perl_atfork_fix
 #define av_extend_guts(a,b,c,d,e)	Perl_av_extend_guts(aTHX_ a,b,c,d,e)
 #define bind_match(a,b,c)	Perl_bind_match(aTHX_ a,b,c)
 #define block_end(a,b)		Perl_block_end(aTHX_ a,b)
diff --git a/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm b/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
index 730c565..b486e20 100644
--- a/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
+++ b/ext/ExtUtils-Miniperl/lib/ExtUtils/Miniperl.pm
@@ -129,6 +129,9 @@ main(int argc, char **argv, char **env)
      * call PTHREAD_ATFORK() explicitly, but if and only if it hasn't
      * been called at least once before in the current process.
      * --GSAR 2001-07-20 */
+#ifdef USE_PTHREAD_ATFORK_MALLOC_HACK
+    Perl_atfork_fix();
+#endif
     PTHREAD_ATFORK(Perl_atfork_lock,
                    Perl_atfork_unlock,
                    Perl_atfork_unlock);
diff --git a/proto.h b/proto.h
index dd5edde..bbba40a 100644
--- a/proto.h
+++ b/proto.h
@@ -140,6 +140,7 @@ PERL_CALLCONV void	Perl_apply_attrs_string(pTHX_ const char *stashpv, CV *cv, co
 #define PERL_ARGS_ASSERT_APPLY_ATTRS_STRING	\
 	assert(stashpv); assert(cv); assert(attrstr)
 
+PERL_CALLCONV void	Perl_atfork_fix(void);
 PERL_CALLCONV void	Perl_atfork_lock(void);
 PERL_CALLCONV void	Perl_atfork_unlock(void);
 PERL_CALLCONV SV**	Perl_av_arylen_p(pTHX_ AV *av)
diff --git a/util.c b/util.c
index a5451c1..df26259 100644
--- a/util.c
+++ b/util.c
@@ -2569,6 +2569,19 @@ Perl_my_popen(pTHX_ const char *cmd, const char *mode)
 
 #endif /* !DOSISH */
 
+#ifdef USE_PTHREAD_ATFORK_MALLOC_HACK
+/* needs to be global so GCC doesn't optimize away the malloc() */
+void *pthread_atfork_fix_pointer;
+
+void Perl_atfork_fix(void)
+{
+    /* To avoid a deadlock situation, glibc's malloc must be initialized
+     * before we call pthread_atfork. We can't just use (void)malloc(0)
+     * because GCC removes such calls. */
+    pthread_atfork_fix_pointer = malloc(0);
+}
+#endif
+
 /* this is called in parent before the fork() */
 void
 Perl_atfork_lock(void)

@p5pRT p5pRT commented Apr 6, 2014

From prumpf@gmail.com

metaconfig-pthread-004.diff
diff --git a/U/threads/d_pthread_atfork.U b/U/threads/d_pthread_atfork.U
index 77a8b43..2cc9eca 100644
--- a/U/threads/d_pthread_atfork.U
+++ b/U/threads/d_pthread_atfork.U
@@ -1,24 +1,38 @@
 ?RCS: $Id$
 ?RCS:
 ?RCS: Copyright (c) 2001 Jarkko Hietaniemi
+?RCS: Parts taken from d_pthreadj.U, which is:
+?RCS:   Copyright (c) 1998 Andy Dougherty
 ?RCS:
 ?RCS: You may distribute under the terms of either the GNU General Public
 ?RCS: License or the Artistic License, as specified in the README file.
 ?RCS:
-?MAKE:d_pthread_atfork: Inlibc cat Compile usethreads Setvar
+?MAKE:d_pthread_atfork d_pthread_atfork_malloc_hack: Inlibc cat Compile Setvar run rm usethreads
 ?MAKE:	-pick add $@ %<
 ?S:d_pthread_atfork:
 ?S:	This variable conditionally defines the HAS_PTHREAD_ATFORK symbol,
 ?S:	which indicates to the C program that the pthread_atfork()
 ?S:	routine is available.
 ?S:.
+?S:d_pthread_atfork_malloc_hack:
+?S:	This variable conditionally defines the USE_PTHREAD_ATFORK_MALLOC_HACK
+?S:	symbol, which indicates to the C program that malloc() needs to be
+?S:	called before pthread_atfork() is.
+?S:.
 ?C:HAS_PTHREAD_ATFORK:
 ?C:	This symbol, if defined, indicates that the pthread_atfork routine
 ?C:	is available to setup fork handlers.
 ?C:.
+?C:USE_PTHREAD_ATFORK_MALLOC_HACK:
+?C:	This symbol, if defined, indicates that pthread_atfork is broken
+?C:	unless malloc is called before it.
+?C:.
 ?H:#$d_pthread_atfork HAS_PTHREAD_ATFORK		/**/
+?H:#$d_pthread_atfork_malloc_hack USE_PTHREAD_ATFORK_MALLOC_HACK /**/
 ?H:.
-?LINT:set d_pthread_atfork
+?LINT:set d_pthread_atfork d_pthread_atfork_malloc_hack
+?T:yyy
+?F:!try
 : see whether the pthread_atfork exists
 $cat >try.c <<EOP
 #include <pthread.h>
@@ -47,3 +61,113 @@ esac
 set d_pthread_atfork
 eval $setvar
 
+: see whether pthread_atfork exists and works
+echo "Checking whether pthread_atfork requires a workaround..." >&4
+$cat >try.c <<'EOP'
+#include <stdio.h>
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+
+pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
+
+int pipe_fd[4];
+
+void lock(void)
+{
+  if (write(pipe_fd[3], "\n", 1) <= 0) {
+    _exit(1);
+  }
+  pthread_mutex_lock(&mutex);
+}
+
+/* This needs to be a global variable, or GCC gets clever on us
+ * and throws out the malloc() call. */
+volatile void *throwaway;
+
+void *lock_then_malloc(void *dummy)
+{
+  char c;
+
+  pthread_mutex_lock(&mutex);
+  if (write(pipe_fd[1], "\n", 1) <= 0) {
+    _exit(1);
+  }
+
+  if (read(pipe_fd[2], &c, 1) <= 0) {
+    _exit(1);
+  }
+  throwaway = malloc(1024);
+  pthread_mutex_unlock(&mutex);
+
+  return NULL;
+}
+
+void alarm_handler(int dummy)
+{
+  _exit(1);
+}
+
+struct sigaction sa;
+
+int main(int argc, char **argv)
+{
+  pthread_attr_t attr;
+  pthread_t tid;
+  char c;
+  int ret;
+
+  if (pthread_atfork(lock, NULL, NULL)) {
+    return 1;
+  }
+
+  if (pipe(pipe_fd)) {
+    return 1;
+  }
+
+  if (pipe(pipe_fd+2)) {
+    return 1;
+  }
+
+  if (pthread_attr_init(&attr)) {
+    return 1;
+  }
+  if (pthread_create(&tid, &attr, lock_then_malloc, NULL)) {
+    return 1;
+  }
+
+  if (read(pipe_fd[0], &c, 1) <= 0) {
+    return 1;
+  }
+
+  sa.sa_handler = alarm_handler;
+  sigemptyset(&sa.sa_mask);
+  if (sigaction(SIGALRM, &sa, NULL)) {
+    return 1;
+  }
+  alarm(2);
+
+  ret = fork();
+  if (ret < 0)
+    return 1;
+
+  if (ret == 0)
+    printf("success\n");
+  return 0;
+}
+EOP
+
+: see if pthread_atfork exists and works
+set try
+if eval $compile; then
+    yyy=`$run ./try`
+fi
+$rm -f try try.*
+case "$yyy" in
+     success) echo "It does work without a workaround." >&4; val="$undef" ;;
+     *) echo "Workaround required." >&4; val="$define" ;;
+esac
+set d_pthread_atfork_malloc_hack
+eval $setvar
+