Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark.pm uses excessive CPU on OpenBSD #13891

Closed
p5pRT opened this issue Jun 1, 2014 · 9 comments
Closed

Benchmark.pm uses excessive CPU on OpenBSD #13891

p5pRT opened this issue Jun 1, 2014 · 9 comments

Comments

@p5pRT
Copy link

@p5pRT p5pRT commented Jun 1, 2014

Migrated from rt.perl.org#122003 (status was 'resolved')

Searchable as RT122003$

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 1, 2014

From jeremy.devenport@gmail.com

Created by jeremy.devenport@gmail.com

The tests for perl 5.20.0 sometimes take a very long time to run on
OpenBSD. The problem appears to be with lib/Benchmark.t. I see there was
an earlier report
http​://www.nntp.perl.org/group/perl.perl5.porters/2013/09/msg208168.html
but that doesn't appear to be sufficient.

I can reliable reproduce the failure by running lib/Benchmark.t but I
can also reproduce it with pretty much any use of Benchmark​:

  perl -MBenchmark -e 'timethis(1, sub { })'

OpenBSD5.5$ time ktrace -tc /tmp/perl-v5.20.0/bin/perl5.20.0
-MBenchmark -e 'timethis(1, sub { })'
timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
  (warning​: too few iterations for a reliable count)
  0m57.36s real 0m0.02s user 0m55.09s system

$ kdump | grep 'CALL gettimeofday' | wc -l
2073647
$ kdump | grep 'CALL getrusage' | wc -l
2073646

Compare that to a linux box (also 5.20.0, configured with defaults).

DebianJessie$ time strace -c perl -MBenchmark -e 'timethis(1, sub { })'
timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
  (warning​: too few iterations for a reliable count)
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000001 0 10051 times
...snip, nothing else of consequence...

real 0m0.095s
user 0m0.036s
sys 0m0.052s

I tried to track down where the time was going using some debug prints
and it looks like the culprit is the busy wait in runloops ("Wait for
the user timer to tick..."). The combination of the while loop condition
and empty loop body uses almost no user cpu time relative to the system
cpu time used for the getrusage/gettimeofday calls so this can end up
using quite a bit of wall clock before a single user clock tick.

This doesn't show up on linux because gettimeofday is implemented in
user space via vdso (see man vdso) so the while loop uses more user
cpu time per iteration there.

It looks like if that while loop in Benchmark.pm needs to stay then it
should do something non-trivial in the loop body similar to the fix
made in "5900049 avoid ultra-lightweight code".

Perl Info

Flags:
    category=library
    severity=medium
    module=Benchmark

Site configuration information for perl 5.20.0:

Configured by jeremy at Sat May 31 20:26:45 EDT 2014.

Summary of my perl5 (revision 5 version 20 subversion 0) configuration:
  Commit id: d7d4eceb184d8fc54797d661a6d5ff9a4a7048c3
  Platform:
    osname=openbsd, osvers=5.5, archname=OpenBSD.amd64-openbsd
    uname='openbsd openbsd55.local 5.5 generic#0 amd64 '
    config_args='-de -Dusedevel -Dprefix=/tmp/perl-v5.20.0'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector
-I/usr/local/include',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.2.1 20070719 ', gccosandvers='openbsd5.5'
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-E  -fstack-protector -L/usr/local/lib'
    libpth=/usr/lib /usr/local/lib
    libs=-lm -lutil -lc
    perllibs=-lm -lutil -lc
    libc=/usr/lib/libc.so.73.1, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC
-L/usr/local/lib -fstack-protector'



@INC for perl 5.20.0:
    /tmp/perl-v5.20.0/lib/site_perl/5.20.0/OpenBSD.amd64-openbsd
    /tmp/perl-v5.20.0/lib/site_perl/5.20.0
    /tmp/perl-v5.20.0/lib/5.20.0/OpenBSD.amd64-openbsd
    /tmp/perl-v5.20.0/lib/5.20.0
    .


Environment for perl 5.20.0:
    HOME=/home/jeremy
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/jeremy/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/X11R6/bin:/usr/local/bin:/usr/local/sbin:/usr/games:.
    PERL_BADLANG (unset)
    SHELL=/bin/ksh

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 1, 2014

From @jkeenan

On Sat May 31 18​:13​:26 2014, jeremy.devenport@​gmail.com wrote​:

This is a bug report for perl from jeremy.devenport@​gmail.com,
generated with the help of perlbug 1.40 running under perl 5.20.0.

-----------------------------------------------------------------
[Please describe your issue here]

The tests for perl 5.20.0 sometimes take a very long time to run on
OpenBSD. The problem appears to be with lib/Benchmark.t. I see there
was
an earlier report
http​://www.nntp.perl.org/group/perl.perl5.porters/2013/09/msg208168.html
but that doesn't appear to be sufficient.

I can reliable reproduce the failure by running lib/Benchmark.t but I
can also reproduce it with pretty much any use of Benchmark​:

perl -MBenchmark -e 'timethis(1, sub { })'

OpenBSD5.5$ time ktrace -tc /tmp/perl-v5.20.0/bin/perl5.20.0
-MBenchmark -e 'timethis(1, sub { })'
timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
(warning​: too few iterations for a reliable count)
0m57.36s real 0m0.02s user 0m55.09s system

$ kdump | grep 'CALL gettimeofday' | wc -l
2073647
$ kdump | grep 'CALL getrusage' | wc -l
2073646

Compare that to a linux box (also 5.20.0, configured with defaults).

DebianJessie$ time strace -c perl -MBenchmark -e 'timethis(1, sub {
})'
timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
(warning​: too few iterations for a reliable count)
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000001 0 10051 times
...snip, nothing else of consequence...

real 0m0.095s
user 0m0.036s
sys 0m0.052s

I tried to track down where the time was going using some debug prints
and it looks like the culprit is the busy wait in runloops ("Wait for
the user timer to tick..."). The combination of the while loop
condition
and empty loop body uses almost no user cpu time relative to the
system
cpu time used for the getrusage/gettimeofday calls so this can end up
using quite a bit of wall clock before a single user clock tick.

This doesn't show up on linux because gettimeofday is implemented in
user space via vdso (see man vdso) so the while loop uses more user
cpu time per iteration there.

It looks like if that while loop in Benchmark.pm needs to stay then it
should do something non-trivial in the loop body similar to the fix
made in "5900049 avoid ultra-lightweight code".

Porters​: Investigation and resolution of this ticket would benefit from having smoke reports run on OpenBSD. Can anyone configure this and send results to perl.develop-help.org?

Thank you very much.
Jim Keenan

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 1, 2014

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 7, 2014

From @iabyn

On Sat, May 31, 2014 at 06​:13​:26PM -0700, Jeremy Devenport wrote​:

I tried to track down where the time was going using some debug prints
and it looks like the culprit is the busy wait in runloops ("Wait for
the user timer to tick..."). The combination of the while loop condition
and empty loop body uses almost no user cpu time relative to the system
cpu time used for the getrusage/gettimeofday calls so this can end up
using quite a bit of wall clock before a single user clock tick.

This doesn't show up on linux because gettimeofday is implemented in
user space via vdso (see man vdso) so the while loop uses more user
cpu time per iteration there.

can you confirm that on your system, the following one-liner takes
many seconds to execute?​:

  time perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'

And if so, can you find a minimum bit of 'make work' to add in the {}
block that reduces the wall time to something sensible?

--
Monto Blanco... scorchio!

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 7, 2014

From andrew@afresh1.com

On Sat, Jun 07, 2014 at 03​:52​:52PM +0100, Dave Mitchell wrote​:

On Sat, May 31, 2014 at 06​:13​:26PM -0700, Jeremy Devenport wrote​:

I tried to track down where the time was going using some debug prints
and it looks like the culprit is the busy wait in runloops ("Wait for
the user timer to tick..."). The combination of the while loop condition
and empty loop body uses almost no user cpu time relative to the system
cpu time used for the getrusage/gettimeofday calls so this can end up
using quite a bit of wall clock before a single user clock tick.

This doesn't show up on linux because gettimeofday is implemented in
user space via vdso (see man vdso) so the while loop uses more user
cpu time per iteration there.

can you confirm that on your system, the following one-liner takes
many seconds to execute?​:

time perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'

And if so, can you find a minimum bit of 'make work' to add in the {}
block that reduces the wall time to something sensible?

I can't seem to reproduce this with stock perl-5.20.0, I tried on a couple
architectures.

$ sysctl kern.version hw.model
kern.version=OpenBSD 5.5-current (GENERIC.MP) #122​: Sat May 10 14​:16​:11 MDT 2014
  deraadt@​amd64.openbsd.org​:/usr/src/sys/arch/amd64/compile/GENERIC.MP

hw.model=Intel(R) Core(TM) i7-2620M CPU @​ 2.70GHz
$ LD_LIBRARY_PATH=./ ./perl -Ilib -v | head -2

This is perl 5, version 20, subversion 0 (v5.20.0) built for OpenBSD.amd64-openbsd
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -e1
  0.00 real 0.00 user 0.01 sys
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e1
  0.06 real 0.01 user 0.06 sys
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e 'timethis(1, sub { })'
timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
  (warning​: too few iterations for a reliable count)
  0.16 real 0.02 user 0.14 sys

kern.version=OpenBSD 5.5-current (GENERIC) #122​: Fri May 23 21​:23​:59 MDT 2014
  deraadt@​i386.openbsd.org​:/usr/src/sys/arch/i386/compile/GENERIC

hw.model=Intel(R) Pentium(R) M processor 1.30GHz ("GenuineIntel" 686-class)
$ LD_LIBRARY_PATH=./ ./perl -Ilib -v | head -2

This is perl 5, version 20, subversion 0 (v5.20.0) built for OpenBSD.i386-openbsd
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -e1
  0.00 real 0.00 user 0.00 sys
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e1
  0.04 real 0.03 user 0.02 sys
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -e'$t = (times)[0]; while ($t == (times)[0]) {}'
  0.01 real 0.01 user 0.01 sys
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e 'timethis(1, sub { })'
timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
  (warning​: too few iterations for a reliable count)
  0.06 real 0.06 user 0.00 sys

I also tried on a much slower machine, but I only have 5.20.0 plus OpenBSD's
local patches available and didn't have a couple days to compile a stock
version of perl. It didn't manifest here either, although loading
Benchmark is obviously more noticeably slower.

$ sysctl kern.version hw.model
kern.version=OpenBSD 5.5-current (GENERIC) #39​: Thu May 8 02​:11​:39 MDT 2014
  deraadt@​alpha.openbsd.org​:/usr/src/sys/arch/alpha/compile/GENERIC

hw.model=AlphaStation 200 4/166
$ LD_LIBRARY_PATH=./ ./perl -Ilib -v | head -2

This is perl 5, version 20, subversion 0 (v5.20.0) built for alpha-openbsd
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -e1
  0.52 real 0.13 user 0.32 sys
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e1
  3.08 real 1.97 user 0.95 sys
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -e'$t = (times)[0]; while ($t == (times)[0]) {}'
  0.54 real 0.13 user 0.34 sys
$ LD_LIBRARY_PATH=./ time ./perl -Ilib -MBenchmark -e 'timethis(1, sub { })'
timethis 1​: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
  (warning​: too few iterations for a reliable count)
  3.15 real 2.00 user 0.99 sys

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 8, 2014

From jeremy.devenport@gmail.com

can you confirm that on your system, the following one-liner takes
many seconds to execute?​:

time perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'

Yes, that one-liner demonstrates the same problem of high system/user cpu usage​:
  0m6.14s real 0m0.01s user 0m6.10s system

It also seems to be relevant that I'm running OpenBSD under
VirtualBox. I did a few more experiments with other OSs under
VirtualBox and the same OpenBSD version under VMware Player with both
a simple use of timethis() and with your one-liner. (I've reformatted
the timing output to be consistent across platforms)

$ time ./perl -Ilib -MBenchmark -e 'timethis(1, sub { }) for 1..10'
VirtualBox
OpenBSD 5.5
  1m59.00s real 0m0.21s user 1m58.19s system
FreeBSD 10
  0m4.80s real 0m0.16s user 0m4.64s system
Ubuntu Jessie
  0m0.26s real 0m0.20s user 0m0.04s system
VMware Player
OpenBSD 5.5
  0m0.39s real 0m0.20s user 0m0.20s system

$ time ./perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'
VirtualBox
OpenBSD 5.5
  0m6.14s real 0m0.01s user 0m6.10s system
FreeBSD 10
  0m1.18s real 0m0.00s user 0m1.18s system
Ubuntu Jessie
  0m0.03s real 0m0.01s user 0m0.02s system

VMware Player
OpenBSD 5.5
  0m0.00s real 0m0.01s user 0m0.00s system

It looks like FreeBSD under VirtualBox also takes longer than it
should (4 seconds to do 10 no-op benchmarks) but is probably fast
enough to be acceptable. And the same version of OpenBSD using VMware
works just fine. So this seems to be a problem specifically when
running under VirtualBox.

Adding somewhere between 1000 and 10,000 floating point divisions into
the body of the while loop in Benchmark.pm seems to be enough to get
it to behave.

With this change​:

- while ( ( $t0 = Benchmark->new(0) )->[1] == $tbase ) {} ;
+ while ( ( $t0 = Benchmark->new(0) )->[1] == $tbase ) {
+ for (my $i=0; $i < 10_000; $i++) { $i / 1.5 }
+ }

$ time ./perl -Ilib -MBenchmark -e 'timethis(1, sub { }) for 1..10'
VirtualBox
OpenBSD 5.5
  0m0.23s real 0m0.21s user 0m0.03s system

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 9, 2014

From @iabyn

On Sun, Jun 08, 2014 at 08​:55​:56AM -0400, Jeremy Devenport wrote​:

can you confirm that on your system, the following one-liner takes
many seconds to execute?​:

time perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'

Yes, that one-liner demonstrates the same problem of high system/user cpu usage​:
0m6.14s real 0m0.01s user 0m6.10s system

I've pushed the following fix to blead​:

commit 78462dd
Author​: David Mitchell <davem@​iabyn.com>
AuthorDate​: Mon Jun 9 11​:41​:22 2014 +0100
Commit​: David Mitchell <davem@​iabyn.com>
CommitDate​: Mon Jun 9 11​:51​:18 2014 +0100

  Benchmark.pm​: avoid long walltime on calibration
 
  RT #122003
 
  runloop() in Benchmark.pm does the equivalent of the following​:
 
  $t = (times)[0];
  while ($t == (times)[0]) {}
 
  so that it finishes in a position where the user CPU counter has just
  incremented.
 
  This is all well and good, but on some platforms (specifically OpenBSD
  running under VirtualBox), the system CPU burned each time round the loop
  to get times() is far greater than the user CPU burned. This can cause the
  loop to run for minutes consuming system CPU until enough user CPU has
  been burned to tick the user CPU counter.
 
  The fix in this commit is to replace the empty body of the while loop with
  something that does a gradually increasing amount of busy work.

Affected files ...
 
  M lib/Benchmark.pm

Differences ...

Inline Patch
diff --git a/lib/Benchmark.pm b/lib/Benchmark.pm
index 9a43a2b..73b3211 100644
--- a/lib/Benchmark.pm
+++ b/lib/Benchmark.pm
@@ -700,8 +700,18 @@ sub runloop {
     # getting a too low initial $n in the initial, 'find the minimum' loop
     # in &countit.  This, in turn, can reduce the number of calls to
     # &runloop a lot, and thus reduce additive errors.
+    #
+    # Note that its possible for the act of reading the system clock to
+    # burn lots of system CPU while we burn very little user clock in the
+    # busy loop, which can cause the loop to run for a very long wall time.
+    # So gradually ramp up the duration of the loop. See RT #122003
+    #
     my $tbase = Benchmark->new(0)->[1];
-    while ( ( $t0 = Benchmark->new(0) )->[1] == $tbase ) {} ;
+    my $limit = 1;
+    while ( ( $t0 = Benchmark->new(0) )->[1] == $tbase ) {
+        for (my $i=0; $i < $limit; $i++) { my $x = $i / 1.5 } # burn user CPU
+        $limit *= 1.1;
+    }
     $subref->();
     $t1 = Benchmark->new($n);
     $td = &timediff($t1, $t0);


-- 

A walk of a thousand miles begins with a single step...
then continues for another 1,999,999 or so.

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 9, 2014

From @bingos

On Mon, Jun 09, 2014 at 11​:53​:59AM +0100, Dave Mitchell wrote​:

On Sun, Jun 08, 2014 at 08​:55​:56AM -0400, Jeremy Devenport wrote​:

can you confirm that on your system, the following one-liner takes
many seconds to execute?​:

time perl -e'$t = (times)[0]; while ($t == (times)[0]) {}'

Yes, that one-liner demonstrates the same problem of high system/user cpu usage​:
0m6.14s real 0m0.01s user 0m6.10s system

I've pushed the following fix to blead​:

commit 78462dd
Author​: David Mitchell <davem@​iabyn.com>
AuthorDate​: Mon Jun 9 11​:41​:22 2014 +0100
Commit​: David Mitchell <davem@​iabyn.com>
CommitDate​: Mon Jun 9 11​:51​:18 2014 +0100

Benchmark\.pm&#8203;: avoid long walltime on calibration

RT \#122003

runloop\(\) in Benchmark\.pm does the equivalent of the following&#8203;:

    $t = \(times\)\[0\];
    while \($t == \(times\)\[0\]\) \{\}

so that it finishes in a position where the user CPU counter has just
incremented\.

This is all well and good\, but on some platforms \(specifically OpenBSD
running under VirtualBox\)\, the system CPU burned each time round the loop
to get times\(\) is far greater than the user CPU burned\. This can cause the
loop to run for minutes consuming system CPU until enough user CPU has
been burned to tick the user CPU counter\.

The fix in this commit is to replace the empty body of the while loop with
something that does a gradually increasing amount of busy work\.

This has fixed the tardiness of this particular test on OpenBSD, NetBSD,
MidnightBSD and DragonflyBSD that I noticed when running on VirtualBox.

Many thanks!

--
Chris Williams
aka BinGOs
PGP ID 0x4658671F
http​://www.gumbynet.org.uk

@p5pRT p5pRT closed this Jun 10, 2014
@p5pRT
Copy link
Author

@p5pRT p5pRT commented Jun 10, 2014

@iabyn - Status changed from 'open' to 'resolved'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
1 participant