Make newSV_type() an inline function #19414

richardleach · 2022-02-13T22:54:37Z

Note: This is an alternative to #19381, following feedback from @xenu.

When a new SV is created and upgraded to a type known at compile time,
using the general-purpose upgrade function (sv_upgrade) is clunky.
Specifically, while uprooting a SV head is lightweight (assuming there are
unused SVs), sv_upgrade is too big to be inlined, contains many branches
that can logically be resolved at compile time for known start & end types,
and the lookup of the correct body_details struct may add CPU cycles.

This PR:

- Adds a new file, sv_inline.h, into which many of the definitions and
  structures from sv.c are moved. This seemed necessary because of the spread
  of type definitions across existing header files.
- Converts newSV_type into an inline function and adds to it the logic from
  sv_upgrade necessary to upgrade a SVt_NULL.

Building on that, the commits in this PR:

- Modify existing calls to new_SV(sv) followed by sv_upgrade(sv, type) to
  just use newSV_type, so that they also benefit.
- Replace calls to newSV(0) with newSV_type(SVt_NULL).
- Add a new inline function, newSV_type_mortal, to address the absence of
  an efficient way to make a new non-SVt_NULL mortal SV.
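
To illustrate why inlining pays off when the type is a compile-time constant, here is a minimal standalone sketch. It is not the code from sv_inline.h: toy_type, toy_sv, toy_body_size, toy_new_type and toy_free are hypothetical names, and the body sizes are made up. The point is only that, once toy_new_type(TOY_PV) is inlined, an optimising compiler can fold the switch and the size test down to the single path that applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for svtype and an SV head; not Perl's definitions. */
typedef enum { TOY_NULL, TOY_IV, TOY_PV } toy_type;

typedef struct {
    toy_type type;
    void    *body;       /* NULL when the type needs no separate body */
    size_t   body_size;
} toy_sv;

/* Made-up per-type body sizes. With a constant argument the compiler
 * can reduce this switch to a single constant. */
static inline size_t
toy_body_size(toy_type type)
{
    switch (type) {
    case TOY_NULL: return 0;
    case TOY_IV:   return 0;    /* pretend the value lives in the head */
    case TOY_PV:   return 24;
    }
    return 0;
}

/* Create a value of the requested type in one step, instead of creating
 * a TOY_NULL and passing it to a large general-purpose upgrade routine. */
static inline toy_sv *
toy_new_type(toy_type type)
{
    toy_sv *sv;

    assert(type <= TOY_PV);
    sv = malloc(sizeof *sv);
    if (!sv)
        return NULL;
    sv->type = type;
    sv->body = NULL;
    sv->body_size = toy_body_size(type);
    if (sv->body_size) {        /* resolvable at compile time for constants */
        sv->body = calloc(1, sv->body_size);
        if (!sv->body) {
            free(sv);
            return NULL;
        }
    }
    return sv;
}

static inline void
toy_free(toy_sv *sv)
{
    if (sv) {
        free(sv->body);
        free(sv);
    }
}
```

A call such as toy_new_type(TOY_NULL) should then compile to little more than the head allocation, while toy_new_type(TOY_PV) keeps only the body allocation for that one size.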

With gcc version 10.2.1 on Debian Linux, the resulting perl binary was 25k
larger than blead. (The main commit accounts for almost all of this.)

I used the following trivial benchmark as a gauge of the performance
difference, finding the patched version to be about 30% faster:
perl -e '$str="A"x64; for (0 .. 1_000_000) { @svs = split //, $str }'

perf showed numbers in this region for blead:

          2,509.68 msec task-clock                #    1.000 CPUs utilized
                 5      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               205      page-faults               #    0.082 K/sec
    10,344,797,368      cycles                    #    4.122 GHz                      (62.29%)
        23,884,608      stalled-cycles-frontend   #    0.23% frontend cycles idle     (62.45%)
     2,800,367,628      stalled-cycles-backend    #   27.07% backend cycles idle      (62.61%)
    32,567,445,991      instructions              #    3.15  insn per cycle
                                                  #    0.09  stalled cycles per insn  (62.70%)
     7,666,288,647      branches                  # 3054.684 M/sec                    (62.70%)
        11,063,728      branch-misses             #    0.14% of all branches          (62.57%)
    13,941,078,593      L1-dcache-loads           # 5554.916 M/sec                    (62.41%)
       149,071,315      L1-dcache-load-misses     #    1.07% of all L1-dcache accesses  (62.25%)

with sv_upgrade taking about 28% of the run time:

  27.82%  perlblead  perlblead           [.] Perl_sv_upgrade
  13.75%  perlblead  perlblead           [.] Perl_sv_clear
  11.20%  perlblead  libc-2.31.so        [.] _int_free
  10.08%  perlblead  libc-2.31.so        [.] malloc
   6.30%  perlblead  libc-2.31.so        [.] _int_malloc
   6.01%  perlblead  perlblead           [.] Perl_newSVpvn_flags
   5.49%  perlblead  perlblead           [.] Perl_sv_setpvn_fresh.part.0
   4.35%  perlblead  perlblead           [.] Perl_safesysmalloc
   4.17%  perlblead  perlblead           [.] Perl_av_clear
   3.28%  perlblead  perlblead           [.] Perl_sv_free2
   1.97%  perlblead  perlblead           [.] Perl_pp_split

perf showed numbers in this region for patched:

          1,780.95 msec task-clock                #    1.000 CPUs utilized
                 9      context-switches          #    0.005 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               205      page-faults               #    0.115 K/sec
     7,361,942,324      cycles                    #    4.134 GHz                      (83.22%)
        15,476,039      stalled-cycles-frontend   #    0.21% frontend cycles idle     (83.38%)
     1,205,408,051      stalled-cycles-backend    #   16.37% backend cycles idle      (83.38%)
    27,395,234,336      instructions              #    3.72  insn per cycle
                                                  #    0.04  stalled cycles per insn  (83.38%)
     6,510,317,247      branches                  # 3655.522 M/sec                    (83.38%)
        11,044,449      branch-misses             #    0.17% of all branches          (83.26%)

and other functions coming to the fore:

  16.61%  perl     perl                [.] Perl_sv_clear
  14.34%  perl     libc-2.31.so        [.] malloc
  13.69%  perl     libc-2.31.so        [.] _int_free
  10.74%  perl     perl                [.] Perl_newSVpvn_flags
   8.48%  perl     libc-2.31.so        [.] _int_malloc
   7.20%  perl     perl                [.] Perl_sv_setpvn_fresh.part.0
   7.02%  perl     perl                [.] Perl_safesysmalloc
   5.39%  perl     perl                [.] Perl_sv_free2
   5.35%  perl     perl                [.] Perl_av_clear
   3.54%  perl     libc-2.31.so        [.] cfree@GLIBC_2.2.5
   2.56%  perl     perl                [.] Perl_pp_split

Note: This might be the best-case benchmark, as it is pretty much all about SV creation
and destruction, with little overhead from op dispatch or other functions.
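
As a cross-check on the "about 30% faster" figure, the arithmetic on the perf stat numbers above can be sketched as follows. The constants are copied from the two listings; the function names are hypothetical. On these figures the task-clock drops by roughly 29% (blead takes about 1.41 times as long as the patched build), and the instruction count drops by roughly 16%.

```c
#include <assert.h>

/* Figures copied from the two perf stat listings above. */
static const double BLEAD_MSEC    = 2509.68;
static const double PATCHED_MSEC  = 1780.95;
static const double BLEAD_INSNS   = 32567445991.0;
static const double PATCHED_INSNS = 27395234336.0;

/* Fraction of the baseline that was removed: (before - after) / before. */
static double
fraction_saved(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before;
}

/* How many times larger the baseline is than the new value. */
static double
ratio(double before, double after)
{
    assert(after > 0.0);
    return before / after;
}
```

For example, fraction_saved(BLEAD_MSEC, PATCHED_MSEC) gives the share of run time saved, and ratio(BLEAD_MSEC, PATCHED_MSEC) gives the speed-up factor.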

This commit has left the big chunk of body commentary in sv.c somewhat adrift of
e.g. the bodies_by_type table. I don't know how best to tidy that up. Hoping for some
feedback and suggestions!

richardleach · 2022-02-13T22:55:38Z

Note: xenu looked at preventing inlining when the desired type is not known
at compile time - see patch below - but said that the difference in perl binary
size was only 24 bytes.

> git diff
diff --git a/embed.h b/embed.h
index 94a51192ed..eeb9ca7e6a 100644
--- a/embed.h
+++ b/embed.h
@@ -385,7 +385,7 @@
 #define newSV(a)               Perl_newSV(aTHX_ a)
 #define newSVOP(a,b,c)         Perl_newSVOP(aTHX_ a,b,c)
 #define newSVREF(a)            Perl_newSVREF(aTHX_ a)
-#define newSV_type(a)          Perl_newSV_type(aTHX_ a)
+#define newSV_type(a)          (__builtin_constant_p(a) ? Perl_newSV_type(aTHX_ a) : Perl_newSV_type_noinline(aTHX_ a))
 #define newSVhek(a)            Perl_newSVhek(aTHX_ a)
 #define newSViv(a)             Perl_newSViv(aTHX_ a)
 #define newSVnv(a)             Perl_newSVnv(aTHX_ a)
diff --git a/proto.h b/proto.h
index 7c7dd528ba..19ff6a067d 100644
--- a/proto.h
+++ b/proto.h
@@ -7062,6 +7062,9 @@ PERL_CALLCONV int Perl_magic_regdatum_set(pTHX_ SV* sv, MAGIC* mg);
 #define PERL_ARGS_ASSERT_MAGIC_REGDATUM_SET    \
        assert(sv); assert(mg)
 #endif
+
+SV * Perl_newSV_type_noinline(pTHX_ const svtype type);
+
 #ifdef PERL_CORE
 #  include "pp_proto.h"
 #endif
diff --git a/sv.c b/sv.c
index 742fb9c332..f918cbbe33 100644
--- a/sv.c
+++ b/sv.c
@@ -17094,6 +17094,12 @@ Perl_report_uninit(pTHX_ const SV *uninit_sv)
     GCC_DIAG_RESTORE_STMT;
 }
 
+SV *
+Perl_newSV_type_noinline(pTHX_ const svtype type)
+{
+    return Perl_newSV_type(aTHX_ type);
+}
+
 /*
  * ex: set ts=8 sts=4 sw=4 et:
  */
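
The dispatch idea in the embed.h hunk above can be tried in isolation with the following sketch. It assumes GCC or Clang, since __builtin_constant_p and __attribute__((noinline)) are compiler extensions. toy_make, toy_make_noinline and TOY_MAKE are hypothetical names standing in for Perl_newSV_type, Perl_newSV_type_noinline and the newSV_type macro. In the patch the out-of-line copy lives in sv.c; here everything is in one file, so the noinline attribute is used to keep the wrapper out of line.

```c
#include <assert.h>

/* Hypothetical stand-in for the inline function; the arithmetic is
 * arbitrary and only gives the test something to check. */
static inline int
toy_make(int type)
{
    return type * 2 + 1;
}

/* One shared out-of-line copy for call sites whose argument is not a
 * compile-time constant, so the inline body is not duplicated there. */
__attribute__((noinline)) static int
toy_make_noinline(int type)
{
    return toy_make(type);
}

/* Constant argument: use the inline version, so the compiler can
 * specialise it. Otherwise: call the shared out-of-line version. */
#define TOY_MAKE(t) \
    (__builtin_constant_p(t) ? toy_make(t) : toy_make_noinline(t))
```

Both branches return the same value; the macro only changes where the code ends up, which is why the effect shows up in binary size rather than behaviour.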

demerphq · 2022-02-14T01:49:02Z

On Sun, 13 Feb 2022 at 23:54, Richard Leach wrote:

> I used the following trivial benchmark as a gauge of the performance difference, finding the patched version to be about 30% faster:
> perl -e '$str="A"x64; for (0 .. 1_000_000) { @svs = split //, $str }'

dumbbench concurs:

sv_upgrade_fresh1:~/git_tree/perl$ dumbbench -- ./perl -Ilib -e '$str="A"x64; for (0 .. 1_000_000) { @svs = split //, $str }'
cmd: Ran 21 iterations (1 outliers).
cmd: Rounded run time per iteration: 3.0000e+00 +/- 5.6e-03 (0.2%)

blead:~/git_tree/perl2$ dumbbench -- ./perl -Ilib -e '$str="A"x64; for (0 .. 1_000_000) { @svs = split //, $str }'
cmd: Ran 24 iterations (4 outliers).
cmd: Rounded run time per iteration: 4.0168e+00 +/- 2.9e-03 (0.1%)

Nice.

cheers,
Yves

…

-- perl -Mre=debug -e "/just|another|perl|hacker/"

xenu · 2022-03-04T19:19:25Z

Two small issues remaining:

Typo in the commit message: "unprooting".
Leftover commented-out code in sv.h (I think it should be uncommented?)

Otherwise LGTM.

When a new SV is created and upgraded to a type known at compile time, uprooting a SV head and then using the general-purpose upgrade function (sv_upgrade) is clunky. Specifically, while uprooting a SV head is lightweight (assuming there are unused SVs), sv_upgrade is too big to be inlined, contains many branches that can logically be resolved at compile time for known start & end types, and the lookup of the correct body_details struct may add CPU cycles.

This commit tries to address that by making newSV_type an inline function and including only the parts of sv_upgrade needed to upgrade a SVt_NULL. When the destination type is known at compile time, a decent compiler will inline a call to newSV_type and use the type information to throw away all the irrelevant parts of the sv_upgrade logic.

Because of the spread of type definitions across header files, it did not seem possible to make the necessary changes inside sv.h, and so a new header file (sv_inline.h) was created. For the inlined function to work outside of sv.c, many definitions from that file were moved to sv_inline.h.

Finally, in order to also benefit from this change, existing code in sv.c that does things like this:

    SV* sv;
    new_SV(sv);
    sv_upgrade(sv, SVt_PV);

has been modified to read something like:

    SV* sv = newSV_type(SVt_PV);

When a function outside of sv.c creates a SV via newSV(0):
* There is a call to Perl_newSV
* A SV head is uprooted and its flags set
* A runtime check is made to effectively see if 0 > 0
* The new SV* is returned

Replacing newSV(0) with newSV_type(SVt_NULL) should be more efficient, because (assuming there are SV heads to uproot), the only step is:
* A SV head is uprooted and its flags set

There's no efficient way to create a mortal SV of any type other than SVt_NULL (via sv_newmortal). The options are either to do:
* SV* sv = sv_newmortal(); sv_upgrade(sv, SVt_SOMETYPE);
  but sv_upgrade is needlessly inefficient on new SVs.
* SV* sv = sv_2mortal(newSV_type(SVt_SOMETYPE));
  but this will perform runtime checks to see if (sv) and if (SvIMMORTAL(sv)), and for a new SV we know that those answers will always be yes and no respectively.

This commit adds a new inline function which is basically a mortalizing wrapper around the now-inlined newSV_type.
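
The trade-off described above can be modelled with a small standalone sketch. This is a toy, not Perl's implementation: toy_sv, toy_tmps, toy_2mortal and toy_new_mortal are hypothetical names, and the fixed-size array is only a stand-in for a temporaries stack. It shows a general-purpose mortalizer that must test for NULL and for immortal values, next to a wrapper for freshly created values where neither test can ever fire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature value: just the two flags the checks look at. */
typedef struct {
    bool immortal;
    bool temp;
} toy_sv;

/* Stand-in for a temporaries stack, fixed-size for simplicity. */
#define TOY_TMPS_MAX 16
static toy_sv *toy_tmps[TOY_TMPS_MAX];
static size_t  toy_tmps_ix = 0;

/* General-purpose mortalizer: has to cope with NULL and with immortals,
 * so it pays for two runtime checks on every call. */
static toy_sv *
toy_2mortal(toy_sv *sv)
{
    if (!sv)
        return sv;
    if (sv->immortal)
        return sv;
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = sv;
    sv->temp = true;
    return sv;
}

/* Wrapper for a brand-new value: it is known to be non-NULL and not
 * immortal, so both checks are skipped. The caller supplies storage
 * to keep the sketch free of allocation. */
static inline toy_sv *
toy_new_mortal(toy_sv *storage)
{
    storage->immortal = false;
    storage->temp = true;
    assert(toy_tmps_ix < TOY_TMPS_MAX);
    toy_tmps[toy_tmps_ix++] = storage;
    return storage;
}
```

Both routes leave the value registered as a temporary; the wrapper simply avoids asking questions whose answers are already known at the call site.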

richardleach · 2022-03-07T00:05:51Z

Thanks, I've addressed those two points. (Odd how the previously commented-out code didn't have an apparent effect, but I can always look at why separately.)

richardleach force-pushed the hydahy/inline-newSV_type branch from cd77597 to 6f86b0e Compare February 13, 2022 23:34

richardleach added the needs-work The pull request needs changes still label Feb 14, 2022

richardleach force-pushed the hydahy/inline-newSV_type branch from 6f86b0e to 562f242 Compare February 14, 2022 00:24

richardleach force-pushed the hydahy/inline-newSV_type branch from 1b996aa to 83babae Compare February 14, 2022 23:31

richardleach removed the needs-work The pull request needs changes still label Feb 15, 2022

richardleach requested review from nwc10, tonycoz and xenu February 15, 2022 03:12

xenu reviewed Feb 16, 2022

richardleach force-pushed the hydahy/inline-newSV_type branch from b33fd2d to 8a82591 Compare March 2, 2022 21:30

richardleach force-pushed the hydahy/inline-newSV_type branch from 8a82591 to d310d10 Compare March 6, 2022 00:40

richardleach added 3 commits March 6, 2022 22:37

richardleach force-pushed the hydahy/inline-newSV_type branch from d310d10 to f327c5a Compare March 6, 2022 22:39

xenu merged commit 7ea8b04 into Perl:blead Mar 7, 2022

richardleach deleted the hydahy/inline-newSV_type branch March 7, 2022 00:13

xenu mentioned this pull request Mar 7, 2022

Add Perl_sv_upgrade_fresh for more efficient upgrades of SVt_NULL SVs #19381

Closed

rjbs added a commit to rjbs/perl5 that referenced this pull request May 21, 2022

perldoc: add performance enhancement note about Perl#19414

610e100

scottchiefbaker pushed a commit to scottchiefbaker/perl5 that referenced this pull request Nov 3, 2022

perldoc: add performance enhancement note about Perl#19414

cccd2c9
