Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/unicore/mktables takes too long #16401

Open
p5pRT opened this issue Feb 1, 2018 · 17 comments
Open

lib/unicore/mktables takes too long #16401

p5pRT opened this issue Feb 1, 2018 · 17 comments

Comments

@p5pRT
Copy link

@p5pRT p5pRT commented Feb 1, 2018

Migrated from rt.perl.org#132800 (status was 'open')

Searchable as RT132800$

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 1, 2018

From @dur-randir

Created by @dur-randir

High-parallel builds of perl are currently stranded by the
lib/unicore/mktables step. It takes ~60% of the total build time​:

git clean -dfx && ./Configure -de -Dusedevel && time make -j20 test_prep
make -j20 test_prep 162.93s user 9.08s system 670% cpu 25.648 total

./miniperl -Ilib lib/unicore/mktables -C lib/unicore -P pod -maketest
-p -w 15.82s user 0.23s system 99% cpu 16.057 total

If I include configure step in the total time measurement, it's still ~27%

bash -c './Configure -de -Dusedevel && make -j20 test_prep' 185.76s
user 14.36s system 340% cpu 58.733 total

This happened only during this development cycle, this step has been
taking around 12-16 seconds on 5.20-5.26 with ~second growth each
release. Can this be addressed in some way? This makes bisecting
through 5.27.x and, later, all future perls, much slower.

Perl Info

Flags:
    category=core
    severity=low

Site configuration information for perl 5.27.9:

Configured by dur-randir at Wed Jan 31 10:14:37 MSK 2018.

Summary of my perl5 (revision 5 version 27 subversion 9) configuration:
  Commit id: 577d3e04be845580196418dd9df1575e2cb4c0b6
  Platform:
    osname=darwin
    osvers=13.4.0
    archname=darwin-2level
    uname='darwin isengard.local 13.4.0 darwin kernel version 13.4.0:
mon jan 11 18:17:34 pst 2016; root:xnu-2422.115.15~1release_x86_64
x86_64 '
    config_args='-de -Dusedevel'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=undef
    usemultiplicity=undef
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='cc'
    ccflags ='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.9
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-DPERL_USE_SAFE_PUTENV'
    optimize='-O3'
    cppflags='-fno-common -DPERL_DARWIN -mmacosx-version-min=10.9
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion=''
    gccversion='4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='cc'
    ldflags =' -mmacosx-version-min=10.9 -fstack-protector -L/usr/local/lib'
    libpth=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.0/lib
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk/usr/lib
/usr/local/lib /usr/lib
    libs=-lpthread -lgdbm -ldbm -ldl -lm -lutil -lc
    perllibs=-lpthread -ldl -lm -lutil -lc
    libc=
    so=dylib
    useshrplib=false
    libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=bundle
    d_dlsymun=undef
    ccdlflags=' '
    cccdlflags=' '
    lddlflags=' -mmacosx-version-min=10.9 -bundle -undefined
dynamic_lookup -L/usr/local/lib -fstack-protector'



@INC for perl 5.27.9:
    lib
    /usr/local/lib/perl5/site_perl/5.27.9/darwin-2level
    /usr/local/lib/perl5/site_perl/5.27.9
    /usr/local/lib/perl5/5.27.9/darwin-2level
    /usr/local/lib/perl5/5.27.9


Environment for perl 5.27.9:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/dur-randir
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/Users/dur-randir/perlbrew/bin:/Users/dur-randir/perlbrew/perls/perl-5.22.1/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin
    PERLBREW_BASHRC_VERSION=0.80
    PERLBREW_HOME=/Users/dur-randir/.perlbrew
    PERLBREW_MANPATH=/Users/dur-randir/perlbrew/perls/perl-5.22.1/man
    PERLBREW_PATH=/Users/dur-randir/perlbrew/bin:/Users/dur-randir/perlbrew/perls/perl-5.22.1/bin
    PERLBREW_PERL=perl-5.22.1
    PERLBREW_ROOT=/Users/dur-randir/perlbrew
    PERLBREW_VERSION=0.80
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/zsh

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 2, 2018

From @jkeenan

On Thu, 01 Feb 2018 21​:21​:59 GMT, randir wrote​:

This is a bug report for perl from sergey.aleynikov@​gmail.com,
generated with the help of perlbug 1.41 running under perl 5.27.9.

-----------------------------------------------------------------
[Please describe your issue here]

High-parallel builds of perl are currently stranded by the
lib/unicore/mktables step. It takes ~60% of the total build time​:

git clean -dfx && ./Configure -de -Dusedevel && time make -j20
test_prep
make -j20 test_prep 162.93s user 9.08s system 670% cpu 25.648 total

./miniperl -Ilib lib/unicore/mktables -C lib/unicore -P pod -maketest
-p -w 15.82s user 0.23s system 99% cpu 16.057 total

If I include configure step in the total time measurement, it's still
~27%

bash -c './Configure -de -Dusedevel && make -j20 test_prep' 185.76s
user 14.36s system 340% cpu 58.733 total

This happened only during this development cycle, this step has been
taking around 12-16 seconds on 5.20-5.26 with ~second growth each
release. Can this be addressed in some way? This makes bisecting
through 5.27.x and, later, all future perls, much slower.

I don't find compelling evidence of this trend. I built perl at v5.24.0, v5.26.0 and HEAD twice in each of three environments. See the attachment for results. I focus on 'real' time because with bisection the clock time measures the time a human is waiting for results. The data is obviously not statistically significant, but my impressions are that​:

* there is more variation in timings between different 'make' runs for the same Perl version than there is between different Perl versions;

* if one machine is inherently faster than another (typically, more cores), then the percentage of total clock time taken up by ./Configure is greater on that machine than the other; ('make' flies like a rocket on dromedary where 'nproc -all' returns 24);

* if you're running Porting/bisect.pl with the '--module' switch, the total time spent in building and testing prerequisites swamps that taken during 'make'.

--
James E Keenan (jkeenan@​cpan.org)

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 2, 2018

From @jkeenan

$ git clean -dfx && regular_configure && /usr/bin/time -f "\t%e real,\t%U user,\t%S sys" make -j${TEST_JOBS} test_prep

# zareason
# v5.24.0 (be2c0c6)
  67.48 real, 219.17 user, 8.35 sys
  71.22 real, 226.06 user, 8.65 sys

# v5.26.0 (95388f2)
  72.21 real, 232.49 user, 8.97 sys
  92.11 real, 249.32 user, 10.05 sys

# HEAD (1654584)
  72.38 real, 237.66 user, 9.01 sys
  69.66 real, 232.01 user, 8.06 sys

#####

# Linode
# v5.24.0 (be2c0c6)
  116.54 real, 212.20 user, 11.78 sys
  112.83 real, 203.69 user, 11.34 sys

# v5.26.0 (95388f2)
  123.03 real, 222.28 user, 12.84 sys
  128.87 real, 234.61 user, 13.41 sys

# HEAD (1654584)
  124.46 real, 227.50 user, 12.93 sys
  120.40 real, 218.99 user, 11.85 sys

#####

# dromedary
# v5.24.0 (be2c0c6)
  40.89 real, 182.25 user, 13.89 sys
  32.69 real, 90.58 user, 9.01 sys

# v5.26.0 (95388f2)
  34.62 real, 90.69 user, 8.91 sys
  34.43 real, 92.46 user, 9.18 sys

# HEAD (1654584)
  44.22 real, 186.02 user, 13.55 sys
  35.09 real, 91.11 user, 8.91 sys

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 2, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 2, 2018

From @khwilliamson

On 02/01/2018 06​:20 PM, James E Keenan via RT wrote​:

On Thu, 01 Feb 2018 21​:21​:59 GMT, randir wrote​:

This is a bug report for perl from sergey.aleynikov@​gmail.com,
generated with the help of perlbug 1.41 running under perl 5.27.9.

-----------------------------------------------------------------
[Please describe your issue here]

High-parallel builds of perl are currently stranded by the
lib/unicore/mktables step. It takes ~60% of the total build time​:

git clean -dfx && ./Configure -de -Dusedevel && time make -j20
test_prep
make -j20 test_prep 162.93s user 9.08s system 670% cpu 25.648 total

./miniperl -Ilib lib/unicore/mktables -C lib/unicore -P pod -maketest
-p -w 15.82s user 0.23s system 99% cpu 16.057 total

If I include configure step in the total time measurement, it's still
~27%

bash -c './Configure -de -Dusedevel && make -j20 test_prep' 185.76s
user 14.36s system 340% cpu 58.733 total

This happened only during this development cycle, this step has been
taking around 12-16 seconds on 5.20-5.26 with ~second growth each
release. Can this be addressed in some way? This makes bisecting
through 5.27.x and, later, all future perls, much slower.

I don't find compelling evidence of this trend. I built perl at v5.24.0, v5.26.0 and HEAD twice in each of three environments. See the attachment for results. I focus on 'real' time because with bisection the clock time measures the time a human is waiting for results. The data is obviously not statistically significant, but my impressions are that​:

* there is more variation in timings between different 'make' runs for the same Perl version than there is between different Perl versions;

* if one machine is inherently faster than another (typically, more cores), then the percentage of total clock time taken up by ./Configure is greater on that machine than the other; ('make' flies like a rocket on dromedary where 'nproc -all' returns 24);

* if you're running Porting/bisect.pl with the '--module' switch, the total time spent in building and testing prerequisites swamps that taken during 'make'.

I also haven't anecdotally noticed any marked decrease in speed. I
looked through the list of commits in 5.27 to see what might be causing
it, and nothing stands out. Each release of Unicode brings more data
that has to be processed, including whole new files and properties.
There was some of that in Unicode 10.

Note that my goal is to make mktables not run very frequently. It
really only needs to be run when a yearly new Unicode release comes
along, or it itself changes. Although, suppose there is an underlying
bug in the core that causes it to generate defective tables, and that
gets fixed. Then we would need to run it, but we wouldn't know that we
do. So it does get complicated. There in fact may still be a bug
which affects the outputs, but it generally only causes comment lines to
get garbaged, so there's no real harm. I actually haven't noticed this
bug in a long time, and I don't know if it still happens. (Once Tux and
I were trying to debug a mktables problem. It was slow going, and then
he rebased and the problem was suddenly gone. Independently Dave
Mitchell had fixed a problem in blead, and that was our underlying
cause. So this scenario has happened.)

And I have a question for you git people. I was looking at the blame
history

https://perl5.git.perl.org/perl.git/history/HEAD:/lib/unicore/mktables

It includes

https://perl5.git.perl.org/perl.git/commit/da4e040f42421764ef069371d77c008e6b801f45

in the list of recent changes. But I don't see that that commit
actually modified mktables. What's going on?

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 2, 2018

From @demerphq

On 2 February 2018 at 06​:36, Karl Williamson <public@​khwilliamson.com> wrote​:

On 02/01/2018 06​:20 PM, James E Keenan via RT wrote​:

On Thu, 01 Feb 2018 21​:21​:59 GMT, randir wrote​:

This is a bug report for perl from sergey.aleynikov@​gmail.com,
generated with the help of perlbug 1.41 running under perl 5.27.9.

-----------------------------------------------------------------
[Please describe your issue here]

High-parallel builds of perl are currently stranded by the
lib/unicore/mktables step. It takes ~60% of the total build time​:

git clean -dfx && ./Configure -de -Dusedevel && time make -j20
test_prep
make -j20 test_prep 162.93s user 9.08s system 670% cpu 25.648 total

./miniperl -Ilib lib/unicore/mktables -C lib/unicore -P pod -maketest
-p -w 15.82s user 0.23s system 99% cpu 16.057 total

If I include configure step in the total time measurement, it's still
~27%

bash -c './Configure -de -Dusedevel && make -j20 test_prep' 185.76s
user 14.36s system 340% cpu 58.733 total

This happened only during this development cycle, this step has been
taking around 12-16 seconds on 5.20-5.26 with ~second growth each
release. Can this be addressed in some way? This makes bisecting
through 5.27.x and, later, all future perls, much slower.

I don't find compelling evidence of this trend. I built perl at v5.24.0,
v5.26.0 and HEAD twice in each of three environments. See the attachment
for results. I focus on 'real' time because with bisection the clock time
measures the time a human is waiting for results. The data is obviously not
statistically significant, but my impressions are that​:

* there is more variation in timings between different 'make' runs for the
same Perl version than there is between different Perl versions;

* if one machine is inherently faster than another (typically, more
cores), then the percentage of total clock time taken up by ./Configure is
greater on that machine than the other; ('make' flies like a rocket on
dromedary where 'nproc -all' returns 24);

* if you're running Porting/bisect.pl with the '--module' switch, the
total time spent in building and testing prerequisites swamps that taken
during 'make'.

I also haven't anecdotally noticed any marked decrease in speed. I looked
through the list of commits in 5.27 to see what might be causing it, and
nothing stands out. Each release of Unicode brings more data that has to be
processed, including whole new files and properties. There was some of that
in Unicode 10.

Note that my goal is to make mktables not run very frequently. It really
only needs to be run when a yearly new Unicode release comes along, or it
itself changes. Although, suppose there is an underlying bug in the core
that causes it to generate defective tables, and that gets fixed. Then we
would need to run it, but we wouldn't know that we do. So it does get
complicated. There in fact may still be a bug which affects the outputs,
but it generally only causes comment lines to get garbaged, so there's no
real harm. I actually haven't noticed this bug in a long time, and I don't
know if it still happens. (Once Tux and I were trying to debug a mktables
problem. It was slow going, and then he rebased and the problem was
suddenly gone. Independently Dave Mitchell had fixed a problem in blead,
and that was our underlying cause. So this scenario has happened.)

And I have a question for you git people. I was looking at the blame
history

https://perl5.git.perl.org/perl.git/history/HEAD:/lib/unicore/mktables

It includes

https://perl5.git.perl.org/perl.git/commit/da4e040f42421764ef069371d77c008e6b801f45

in the list of recent changes. But I don't see that that commit actually
modified mktables. What's going on?

Personally I have never understood why we treat the output of mktables
any differently than we treat the output of say,
Porting/regcharclass.pl or Porting/regen_headers.pl

IMO we should just check the output files into the repo, and only run
mktables when the input data is updated. I see no point in
regenerating the tables every time someone does a clean build.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 2, 2018

From @dur-randir

On Thu, 01 Feb 2018 17​:20​:00 -0800, jkeenan wrote​:

* there is more variation in timings between different 'make' runs for
the same Perl version than there is between different Perl versions;

Yes, this depends on a machine load. My measurements have been taken during low-load times, so the were repeatable. What you've measured is only total build time, not what it constitutes of, but I'm focusing on the latter here.

I focus on 'real' time because with bisection the clock time measures the time? I focus on 'real' time because with bisection the clock time measures the time a human is waiting for results.

For me, as a human, only real-world waiting time matters - as it's about how fast can I move on with doing different things.

* if one machine is inherently faster than another (typically, more
cores), then the percentage of total clock time taken up by

But why should we make building perl artificially slower? As of now, mktables step already taken half as much as Configure step, - I've shown this in my timings.

* if you're running Porting/bisect.pl with the '--module' switch, the
total time spent in building and testing prerequisites swamps that
taken during 'make'.

That's not about --module, that's for example about bisecting fuzzer findings. And, again, not making perl build artificially slower.

2018-02-02 8​:36 GMT+03​:00 Karl Williamson <public@​khwilliamson.com>

There was some of that in Unicode 10.

I don't want to be in a situation that, after some years and Unicode 20, mktables takes ages to complete due to standard being bloated. So I've raised this question sooner.

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Feb 2, 2018

From @dur-randir

On Thu, 01 Feb 2018 21​:37​:27 -0800, public@​khwilliamson.com wrote​:

And I have a question for you git people. I was looking at the blame
history

It includes
https://perl5.git.perl.org/perl.git/commit/da4e040f42421764ef069371d77c008e6b801f45

in the list of recent changes. But I don't see that that commit
actually modified mktables. What's going on?

Each merge-commit in git (and mercurial) have at least two parents. So, for da4e040, they are

commit da4e040
Merge​: b2cd5cb d6374f3

and now we can look at separate diffs between merge-commit and each of it's parents​:

git diff b2cd5cb da4e040
git diff d6374f3 da4e040

And the second diff does touch mktables.

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Mar 2, 2018

From @khwilliamson

I ran across this thread which happened in a previous incarnation of this issue. It may be useful
http​://nntp.perl.org/group/perl.perl5.porters/168880
--
Karl Williamson

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Mar 2, 2018

From @craigberry

On Thu, Feb 1, 2018 at 11​:36 PM, Karl Williamson
<public@​khwilliamson.com> wrote​:

And I have a question for you git people. I was looking at the blame
history

https://perl5.git.perl.org/perl.git/history/HEAD:/lib/unicore/mktables

It includes

https://perl5.git.perl.org/perl.git/commit/da4e040f42421764ef069371d77c008e6b801f45

in the list of recent changes. But I don't see that that commit actually
modified mktables. What's going on?

I believe it's because that merge was not a fast-forward merge so it
altered the state of the entire repository and now the history of
every file includes that commit. Someone with more git fu can
probably explain it better.

This IMO is reason enough not to do merges that way and there was some
discussion at the time about that. IIRC there is some way to exclude
such commits with git log when working from the command line, but that
doesn't do you any good with gitweb.

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Mar 8, 2018

From @tonycoz

On Thu, 01 Feb 2018 21​:37​:27 -0800, public@​khwilliamson.com wrote​:

I also haven't anecdotally noticed any marked decrease in speed.

I tried running it under NYTProf.

I managed to find a roughly 10% speed up by re-implementing get_invalid_code_point() to avoid inverting the range list each time.

You can see the profile results after that change at​:

http​://files.develop-help.com/nytprof/index.html

Tony

@p5pRT
Copy link
Author

@p5pRT p5pRT commented Mar 8, 2018

From @tonycoz

0001-perl-132800-slightly-speed-up-mktables.patch
From 1eac41dd9489898aadbc753de0bb63b4c6621ac7 Mon Sep 17 00:00:00 2001
From: Tony Cook <tony@develop-help.com>
Date: Thu, 8 Mar 2018 10:42:59 +1100
Subject: [PATCH] (perl #132800) slightly speed up mktables

get_invalid_code_point() inverted the supplied range list and then
called get_valid_code_point() on the result.

Inverting is cheap for simple range lists, but it's more expensive
for more complex lists, and in most cases only first few entries (or
only the first) ranges need to be checked for a code point.
---
 lib/unicore/mktables | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/lib/unicore/mktables b/lib/unicore/mktables
index 88d9c036a3..81914f0600 100644
--- a/lib/unicore/mktables
+++ b/lib/unicore/mktables
@@ -5297,8 +5297,31 @@ sub trace { return main::trace(@_); }
         my $self = shift;
         Carp::carp_extra_args(\@_) if main::DEBUG && @_;
 
-        # Just find a valid code point of the inverse, if any.
-        return Range_List->new(Initialize => ~ $self)->get_valid_code_point;
+        # On first pass, don't choose less desirable code points; if no good
+        # one is found, repeat, allowing a less desirable one to be selected.
+        for my $try_hard (0, 1) {
+            my $end = $try_hard ? $MAX_WORKING_CODEPOINT : $MAX_UNICODE_CODEPOINT;
+
+            # Look through all the gaps between ranges for a usable code point.
+            for my $set (reverse $self->ranges) {
+                # Try the edge cases first, starting with the end point of the
+                # gap.
+                my $start = $set->end+1;
+                return $end if $end >= $start && is_code_point_usable($end, $try_hard);
+
+                # End point didn't, work.  Start at the beginning and try
+                # every one until find one that does work.
+                for my $trial ($start .. $end - 1) {
+                    return $trial if is_code_point_usable($trial, $try_hard);
+                }
+                $end = $set->start-1;
+            }
+            my $start = 0;
+            for my $trial ($start .. $end) {
+                return $trial if is_code_point_usable($trial, $try_hard);
+            }
+        }
+        return ();  # If none found, give up.
     }
 } # end closure for Range_List
 
-- 
2.11.0

@jkeenan
Copy link
Contributor

@jkeenan jkeenan commented Feb 3, 2020

It appears that this past November @khwilliamson applied the patch originally submitted by @tonycoz in 2018.

What are the criteria which have to be met in order to close this ticket?

Thank you very much.
Jim Keenan

@jkeenan jkeenan added the Closable? label Feb 3, 2020
@khwilliamson
Copy link
Contributor

@khwilliamson khwilliamson commented Feb 3, 2020

@demerphq
Copy link
Contributor

@demerphq demerphq commented Feb 3, 2020

@dur-randir
Copy link
Member

@dur-randir dur-randir commented Feb 3, 2020

It appears that this past November @khwilliamson applied the patch originally submitted by @tonycoz in 2018.

This step takes now 28 seconds to complete on that machine, which is slower than in the report. So, while that optimization might've improved it, it's still regressing, as Unicode standard grows.

@khwilliamson
Copy link
Contributor

@khwilliamson khwilliamson commented Feb 3, 2020

@jkeenan jkeenan removed the Closable? label Feb 4, 2020
@xenu xenu removed the affects-5.27 label Nov 19, 2021
@xenu xenu removed the Severity Low label Dec 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants