Conversation

@book
Contributor

@book book commented Nov 11, 2022

No description provided.

@book book requested a review from leonerd November 11, 2022 11:01
@demerphq
Collaborator

demerphq commented Nov 13, 2022

General comments: even though these are static functions, they really should be registered in embed.fnc, and you should use the generated wrapper macros for accessing the subs, not explicit calls to the fully prefixed functions. This will make people's lives easier if the subs need to be changed to be exported as the code gets refactored, and it saves the need to add the annoying aTHX_ bumf at the start of the function calls. Also, you should be providing pod for the new functions, and using the generated PERL_ARGS_ macros to validate that your arguments are correct. (For instance, it will generate the appropriate code to validate that your arguments are non-null if so specified, etc.)

Also you should use the standard symbols perl uses internally for specifying static, and in modern code we normally don't use 'int' where 'bool' will do.

So basically, the code looks ok, aside from some minor nits and the fact it is not using the proper build machinery for declaring subs.
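For concreteness, registering a static helper in embed.fnc looks roughly like the following. This is a hypothetical sketch, not a line from the PR; the exact flags and prototype spelling should be checked against the conventions documented at the top of embed.fnc:

```
# Hypothetical embed.fnc entry for a file-static helper.
# 'S' marks it as static; 'NN' on an argument makes the generated
# PERL_ARGS_ASSERT_* macro assert that the argument is non-NULL.
S	|SV *	|do_join_inner	|NN SV *lhs	|NN SV *rhs
```

After the regen scripts run, embed.h should provide a `do_join_inner(lhs, rhs)` wrapper macro that supplies the `aTHX_` argument automatically, and proto.h should provide a `PERL_ARGS_ASSERT_DO_JOIN_INNER` macro for the function body to invoke.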

@iabyn
Contributor

iabyn commented Nov 13, 2022

General comments from me too (I haven't reviewed the correctness of the code). I see no point in splitting most of this into separate commits - for example, in commit N, adding a function S_foo(), then in commit N+1 making use of S_foo() in one place. Commits N and N+1 are semantically part of the same commit - neither is any use without the other. Splitting it just makes it harder to understand what's being changed.
Also, the new functions do not have any comments or pod describing what they are for. And in particular in the case of S_sv_has_magic_method(), it's badly named. Magic encompasses a lot more things than just overloading - e.g. tie magic has methods too (like FETCH). Perhaps it should be called S_sv_has_overload_method() or similar instead? And the docs should make clear what the method arg is.
I'm surprised that there aren't already API functions to do this sort of thing?

Contributor

@leonerd leonerd left a comment

Overall this appears to be heading in the right direction, but a few more changes are needed yet.

Also it would be useful to use the perf infrastructure to get some before/after comparison benchmarks on a variety of join test cases, to get a feel for what kind of impact this will have on performance.

@tonycoz
Contributor

tonycoz commented Nov 28, 2022

Also it would be useful to use the perf infrastructure to get some before/after comparison benchmarks on a variety of join test cases, to get a feel for what kind of impact this will have on performance.

It's much slower.

My test code:

./perl -e 'my @x = ("a" x 100) x 1000; for my $i (0 .. 10000) { my $z = join(",", @x); }'

Summary, minimums of three runs:

blead     #20503   Measure
148.00    234.29   ms "task-clock"
2.031     3.314    G instructions
472.4     843.1    M branches

Making S_concat_maybe_overloaded() inline and applying the suggested flags optimization in that function improved these to 189.09 ms / 2.794 G instructions / 663.1 M branches.

Raw results from perf stat
#20503 (rebased on blead @99d24a03f4f65d155416e04274b943ae7245dc21):

            238.42 msec task-clock                #    0.999 CPUs utilized          
                 1      context-switches          #    0.004 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               291      page-faults               #    0.001 M/sec                  
     1,101,882,222      cycles                    #    4.622 GHz                    
     3,314,272,902      instructions              #    3.01  insn per cycle         
       843,052,837      branches                  # 3536.025 M/sec                  
            75,232      branch-misses             #    0.01% of all branches        

       0.238678574 seconds time elapsed

       0.238690000 seconds user
       0.000000000 seconds sys

            239.34 msec task-clock                #    0.999 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               294      page-faults               #    0.001 M/sec                  
     1,110,324,651      cycles                    #    4.639 GHz                    
     3,314,323,674      instructions              #    2.99  insn per cycle         
       843,062,902      branches                  # 3522.382 M/sec                  
            92,861      branch-misses             #    0.01% of all branches        

       0.239640817 seconds time elapsed

       0.235731000 seconds user
       0.003995000 seconds sys

            234.29 msec task-clock                #    0.999 CPUs utilized          
                 2      context-switches          #    0.009 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               291      page-faults               #    0.001 M/sec                  
     1,098,648,396      cycles                    #    4.689 GHz                    
     3,314,298,427      instructions              #    3.02  insn per cycle         
       843,058,695      branches                  # 3598.403 M/sec                  
            90,207      branch-misses             #    0.01% of all branches        

       0.234566521 seconds time elapsed

       0.234606000 seconds user
       0.000000000 seconds sys

20503 (-g -O2): (-O2 -g was slower)

            237.04 msec task-clock                #    0.996 CPUs utilized          
                 1      context-switches          #    0.004 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               291      page-faults               #    0.001 M/sec                  
     1,095,326,580      cycles                    #    4.621 GHz                    
     3,314,319,311      instructions              #    3.03  insn per cycle         
       843,061,149      branches                  # 3556.615 M/sec                  
            90,923      branch-misses             #    0.01% of all branches        

       0.238089997 seconds time elapsed

       0.238116000 seconds user
       0.000000000 seconds sys


blead@99d24a03f4f65d155416e04274b943ae7245dc21:

            153.64 msec task-clock                #    0.998 CPUs utilized          
                 1      context-switches          #    0.007 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               294      page-faults               #    0.002 M/sec                  
       694,223,108      cycles                    #    4.518 GHz                    
     2,031,365,472      instructions              #    2.93  insn per cycle         
       472,374,561      branches                  # 3074.462 M/sec                  
            93,170      branch-misses             #    0.02% of all branches        

       0.153877924 seconds time elapsed

       0.153928000 seconds user
       0.000000000 seconds sys

            156.10 msec task-clock                #    0.998 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               291      page-faults               #    0.002 M/sec                  
       729,871,803      cycles                    #    4.676 GHz                    
     2,031,331,946      instructions              #    2.78  insn per cycle         
       472,367,612      branches                  # 3026.138 M/sec                  
            93,624      branch-misses             #    0.02% of all branches        

       0.156334566 seconds time elapsed

       0.156392000 seconds user
       0.000000000 seconds sys

            148.00 msec task-clock                #    0.998 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               291      page-faults               #    0.002 M/sec                  
       692,560,476      cycles                    #    4.680 GHz                    
     2,031,339,986      instructions              #    2.93  insn per cycle         
       472,369,311      branches                  # 3191.737 M/sec                  
           126,137      branch-misses             #    0.03% of all branches        

       0.148277307 seconds time elapsed

       0.148306000 seconds user
       0.000000000 seconds sys

With inline and flags optimization:
            189.09 msec task-clock                #    0.999 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               294      page-faults               #    0.002 M/sec                  
       880,888,042      cycles                    #    4.659 GHz                    
     2,794,500,090      instructions              #    3.17  insn per cycle         
       663,114,388      branches                  # 3506.911 M/sec                  
            80,173      branch-misses             #    0.01% of all branches        

       0.189343391 seconds time elapsed

       0.189393000 seconds user
       0.000000000 seconds sys

@demerphq
Collaborator

demerphq commented Nov 28, 2022 via email

@demerphq
Collaborator

demerphq commented Nov 28, 2022 via email

@ericherman
Contributor

Thank you @tonycoz for suggesting perf and an example exercise script; that helped us take a more informed approach.

@demerphq , @leonerd - I believe we've taken on-board the observations and suggestions; we'd be appreciative if you could take another look.

ericherman referenced this pull request Dec 20, 2022
We want amagic_find() to simulate amagic_call(); to do that we need
to support fallback. This copies the fallback logic from amagic_call()
into amagic_find(), without the extra stuff required to actually execute
the call.

See GH Issue #20627.
@book book marked this pull request as ready for review January 13, 2023 11:00
@book book requested a review from leonerd January 13, 2023 11:01
Contributor

@leonerd leonerd left a comment

Don't forget to write a perldelta.pod entry

@book book requested a review from demerphq January 13, 2023 14:18
Contributor

@leonerd leonerd left a comment

LGTM

Will you write a perldelta.pod entry in this commit, or shortly after? Either seems fine.

doop.c Outdated
PERL_ARGS_ASSERT_DO_JOIN;
/* stringify once and use that unless the delim has concat_amg */
if (!delim_has_concat) {
delim = newSVpvn_utf8(delims, delimlen, SvUTF8(delim));
Collaborator

@demerphq demerphq Jan 13, 2023

I think you want

delim = newSVpvn_flags(delims, delimlen, SvUTF8(delim) ? SVf_UTF8 | SVs_TEMP : SVs_TEMP);

Collaborator

I'd say that @leonerd's advice was incomplete, or perhaps a little overly focused on the point about utf8. newSVpvn_utf8() is a macro around newSVpvn_flags() that prevents you from passing in the SVs_TEMP flag, which would make your code more efficient. You want to create a new statement-scoped mortal temporary, and the code I showed is the optimal way to do that. You shouldn't be creating a block-scoped temporary variable; you should be creating a statement-scoped one. With your current code, this:

{
   $str = join (",", $str, $_) for 1..1_000_000;
}

will create 1_000_000 temporary values that will not be freed until block exit. With the code I showed, each iteration of join should free up the variables right after the statement completes.

Note

#define newSVpvn_utf8(s, len, u) newSVpvn_flags((s), (len), (u) ? SVf_UTF8 : 0)

If you read the docs for newSVpvn_flags, the SVs_TEMP flag results in the low-level machinery setting up the returned value as a mortal, which avoids an inefficient call to sv_2mortal() (which you should be using instead of SAVEFREESV() anyway). You also need to pass in the UTF8 flag to ensure the semantics are correct, so call newSVpvn_flags() directly.

Note the docs for SAVEFREESV:

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> will be decremented at the end of
I<pseudo-block>.  This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>.  However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope.  These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

Note the comparison with sv_2mortal(). SAVEFREESV is one of those things that you should only use if you have exceptional scoping requirements compared to the normal case - like declaring a new variable that needs to survive the full scope of the block. Temporary SVs, on the other hand, are pretty much universally set up with sv_2mortal(), or even better through a newSV.._flags() call with the SVs_TEMP flag.

Also be aware, I typoed originally and misspelled SVs_TEMP as SVf_TEMP; it is an SV state flag rather than an SVf_ flag, so it has a different prefix. Sorry about that.

BTW, these stats should give you some insight on why SAVEFREESV is a red flag:

$ git grep SAVEFREESV | wc -l
75
$ git grep SVs_TEMP | wc -l
230
$ git grep sv_2mortal | wc -l
765

We mortalize about 10 times for every SAVEFREESV.

Collaborator

BTW, it's maybe worth knowing that a lot of the SV-related functions are wrappers passing specific flags to lower-level _flags() functions. You will find that a lot of our API functions can be decomposed. newSV is the prefix for stuff that creates a new SV; the suffix tells you about the arguments. Another example might be sv_cat, which has a similar set of suffixes and wrappers to newSV. So newSVpvf creates a new SV from a format string, and sv_catpvf() concatenates a string created from a format. In a lot, maybe a majority, of these cases the functions are wrappers around a core function with a _flags suffix. It is often useful when optimizing (and learning) to unroll the chain and call the core function directly with the desired flags and arguments. Once you become familiar with the patterns you can often guess the names of functions and be correct; they are relatively consistent.
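A few concrete instances of the pattern (these particular mappings are from memory; verify them against perlapi and embed.fnc before relying on them):

```
newSVpvn(s, len)          =>  newSVpvn_flags(s, len, 0)
newSVpvn_utf8(s, len, u)  =>  newSVpvn_flags(s, len, u ? SVf_UTF8 : 0)
sv_catpvn(dsv, s, len)    =>  sv_catpvn_flags(dsv, s, len, SV_GMAGIC)
sv_catsv(dsv, ssv)        =>  sv_catsv_flags(dsv, ssv, SV_GMAGIC)
```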

doop.c Outdated
/* stringify once and use that unless the delim has concat_amg */
if (!delim_has_concat) {
delim = newSVpvn_utf8(delims, delimlen, SvUTF8(delim));
SAVEFREESV(delim);
Collaborator

Is there a reason you are using this? I would expect this to use sv_2mortal(), and if it should, then if the creation uses newSVpvn_flags() like I suggest above, you don't need to do anything at all. If there is a good reason to use this you should comment why. I suspect you really wanted sv_2mortal().

*/

PERL_STATIC_INLINE SV *
S_do_join_inner(pTHX_ SV *lhs, SV *rhs)
Collaborator

I recommend adding a flag parameter, lhs_has_mg, which is preinitialized before calling do_join_inner() with:

lhs_has_mg = amagic_applies(lhs, concat_amg, applies_flags);

* because those must be set if an SV has any overload. */
U32 flags = SvFLAGS(lhs)|SvFLAGS(rhs);
if (UNLIKELY((flags & (SVf_ROK|SVs_GMG))) &&
UNLIKELY(amagic_applies(lhs, concat_amg, applies_flags)
Collaborator

and then replace this call to amagic_applies() with lhs_has_mg.

This will halve the number of times this code calls amagic_applies() for a given string.
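To illustrate the shape of this suggestion, here is a toy Python sketch; the names `amagic_applies`, `do_join_inner`, and `lhs_has_mg` are borrowed purely for illustration, this is not the C implementation, and it only models the unoverloaded fast path. A counter shows how caching the lhs check cuts the number of predicate calls:

```python
checks = 0

def amagic_applies(sv):
    """Stand-in for the real overload check; counts invocations."""
    global checks
    checks += 1
    return False  # plain strings carry no concat overload

def do_join_inner(lhs, rhs, lhs_has_mg=None):
    # If the caller already knows the lhs state, skip re-checking it.
    has_mg = amagic_applies(lhs) if lhs_has_mg is None else lhs_has_mg
    if has_mg or amagic_applies(rhs):
        raise NotImplementedError("overloaded path not modelled here")
    return lhs + rhs

delim, items = ",", ["a", "b", "c"]

# Naive: the accumulator is re-checked as lhs on every call.
sv = items[0]
for it in items[1:]:
    sv = do_join_inner(do_join_inner(sv, delim), it)
naive_checks = checks

# Suggested: check the accumulator once up front, pass the result in.
checks, sv = 0, items[0]
lhs_has_mg = amagic_applies(sv)
for it in items[1:]:
    sv = do_join_inner(do_join_inner(sv, delim, lhs_has_mg), it, lhs_has_mg)

print(sv, naive_checks, checks)  # a,b,c 8 5
```

With two concatenations per element, the naive version pays four checks per element while the cached version pays two plus a single up-front check.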

s = SvPV_const(*mark,len);
sv_catpvn_flags(sv,s,len,
DO_UTF8(*mark) ? SV_CATUTF8 : SV_CATBYTES);
sv = do_join_inner(sv, delim);
Collaborator

Notice that here...

sv_catpvn_flags(sv,s,len,
DO_UTF8(*mark) ? SV_CATUTF8 : SV_CATBYTES);
sv = do_join_inner(sv, delim);
sv = do_join_inner(sv, *mark);
Collaborator

... here ...

Collaborator

@demerphq demerphq Jan 13, 2023

it would also be interesting to see if the compiler did anything different with

sv = do_join_inner(do_join_inner(sv, delim),*mark);

or

sv = do_join_inner(do_join_inner(sv, delim,lhs_has_mg),*mark,lhs_has_mg);

sv_catpvn_flags(sv,s,len,
DO_UTF8(*mark) ? SV_CATUTF8 : SV_CATBYTES);
for (; items > 0; items--,mark++) {
sv = do_join_inner(sv, *mark);
Collaborator

and here we call do_join_inner() with the same SV as the lhs. So we can avoid the need to check it for magic each time we call by passing that state in as a flag.

Contributor

Isn't the reason we assign the sv each time that the return value may not be the same sv? Especially in the case where the first element didn't have overloaded concat magic, but the second element does, after which the lhs gets the concat magic. Checking once would break that, right?

Collaborator

No, magic isn't viral like that. See comment below.

@ericherman
Contributor

Will you write a perldelta.pod entry in this commit, or shortly after? Either seems fine.

Done, as part of this commit.

@demerphq
Collaborator

demerphq commented Jan 14, 2023 via email

@book
Contributor Author

book commented Jan 16, 2023

The lhs doesn't become magic by concatenation of something that is. That would be very weird. Strings don't turn into objects by concatenation, nor do they somehow become tied without calling the tie operator. :-) Magic isn't viral like that.

I might be misunderstanding your point. The lhs in the S_do_join_inner function will be the result of the overall join(). It starts as the empty string, and we keep concatenating to it from left to right.

In RFC0013, we want join( $delim, @list ) to behave exactly like reduce { ( $a . $delim ) . $b } @list. If one of the arguments to join() is an object with a . overload that returns another object with an overloaded ., then the overload will propagate, and the result of join() will be an overloaded object.

This is the behavior we've put in the BoldStars class in our test script (using . on a BoldStars object produces a BoldStars object). If one of the arguments to join() is one such object, then it will "assimilate" the other arguments with each concatenation, and the result of join() will be an overloaded object.
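For what it's worth, the propagation described here can be mimicked in Python, with `+` standing in for Perl's overloaded `.`. The class below is an invented analogue of the BoldStars test class, not the actual test code:

```python
from functools import reduce

class BoldStars:
    """String-like object whose concat overload returns another BoldStars."""
    def __init__(self, text):
        self.text = text
    def __add__(self, other):        # BoldStars . plain-string
        return BoldStars(self.text + str(other))
    def __radd__(self, other):       # plain-string . BoldStars
        return BoldStars(str(other) + self.text)
    def __str__(self):
        return "*" + self.text + "*"

def join_like(delim, items):
    # RFC0013: join($delim, @list) behaves like
    # reduce { ($a . $delim) . $b } @list
    return reduce(lambda a, b: (a + delim) + b, items)

result = join_like(",", ["x", BoldStars("y"), "z"])
print(type(result).__name__, result)  # BoldStars *x,y,z*
```

One overloaded element "assimilates" the plain strings on either side, so the overall result carries the overload, exactly the propagation being discussed.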

@demerphq
Collaborator

demerphq commented Jan 16, 2023 via email

@leonerd
Contributor

leonerd commented Jan 25, 2023

If it would be considered too surprising for join to suddenly be implemented in terms of an overloaded concat operator, what about adding a new overload key whereby classes can opt in to such behaviour?

class My::Stringlike::Class;
use overload
  '.' => "concat",
  join_uses_concat => 1;

and then pp_join would only follow the concat overloading on any object that additionally sets that key, indicating it is opting in to such a mechanism.

@demerphq
Collaborator

and then pp_join would only follow the concat overloading on any object that additionally sets that key, indicating it is opting in to such a mechanism.

That feels backwards, or maybe I don't understand the suggestion properly. If I call join with a list of items, and only one of them has this new type of overloading, what would happen?

FWIW: I have no problem with this RFC if it is lexically scoped behavior.

{
   use feature "join_with_concat";
   my $joined = join $sep, @things;
}

But it has fairly deep implications, some of which IMO are negative in terms of performance, and it would definitely change which existing overload methods are used, so I think it needs to be guarded. I can totally understand that two reasonable people might see what join does in different ways, and I think we can offer users the ability to choose which they get, but I do think it has to be opt-in.

@leonerd
Contributor

leonerd commented Feb 18, 2023

That feels backwards, or maybe I don't understand the suggestion properly. If I call join with a list of items, and only one of them has this new type of overloading, what would happen?

Per instance. The pp_join function has to individually query every SV to ask "hey, do you have concat magic? If so, I'll use it". The suggestion is that the question becomes "hey, do you have concat magic, and is join permitted to use it?". That way, any class that doesn't specify the flag will continue in its current behaviour and not cause any surprises; meanwhile, any class that does want to be join-aware can set the flag and it all works nicely.

Perhaps I need to remind people of my use-case here. I have an entire ecosystem of object classes ("String::Tagged") that are objects that behave like strings. Or at least, they try to. But currently they can't do a very good job of it, because many core operators (such as join() and substr()) don't permit overloading, so I lose my "being a special object" status going via those operators. It means I have to write an entire second set of object methods - like the String::Tagged->join($sep, @values) constructor, or the $str->substr(...) accessor - which code should call instead. But then only code that knows about them actually works; I can't pass these strings into the huge amount of existing code already on CPAN and have it work. One random example module would be Text::Wrap. I tried years ago to use that with String::Tagged and was very disappointed at how badly we handle that kind of thing. It seems I'm not the only person - https://rt.cpan.org/Ticket/Display.html?id=145864

My eventual goal here is to end up in a situation where an object class such as String::Tagged gets as fully supported by all the core ops (and any CPAN module that uses them) as a number class like Math::BigRat. Observe that nobody has to call $x->add($y) to make those behave - you can just do $x + $y. I'd like our stringy ops to do the same.

@demerphq
Collaborator

Per instance. The pp_join function has to individually query every SV to ask "hey, do you have concat magic? If so, I'll use it". The suggestion is that the question becomes "hey, do you have concat magic, and is join permitted to use it?". That way, any class that doesn't specify the flag will continue in its current behaviour and not cause any surprises; meanwhile, any class that does want to be join-aware can set the flag and it all works nicely.

So in the following code what would happen?

my $s1 = Object::With::Normal::Concat::Overload->new(..);
my $s2 = Object::With::Join::Concat::Overload->new(...);

my $joined = join("", qw(a b c d e), $s1, qw( p q r s t u ), $s2, qw( w x y z));

Would $s2 being in the arg list cause $s1 to also use concat overloads? Would it be acceptable if the internals turned this into:

my $joined = join("", join("",qw(a b c d e), "$s1", qw( p q r s t u )), $s2, join("",qw( w x y z)));

for example?

I'd feel more comfortable reasoning about this if we weren't discussing the "concat" overload. Let's say we had a "join" overload.

Would it be acceptable that a join with K items, only one of which had the join overload, might call that overload at most twice, and perhaps only once (if the item was at the head or tail of the list), regardless of what K was?

If that were acceptable then I would be a lot more comfortable about this. My concern is that adding this has a deep impact on the expected order of the joining; if we eliminate that, then most of my concern gets resolved.

@demerphq
Collaborator

@book: I will resolve the embed.fnc conflict in this PR for you unless you leave a note requesting that I do not. Sorry for the inconvenience.

@demerphq
Collaborator

Rebased as promised.

book and others added 2 commits February 21, 2023 05:00
Add static helper functions:

* S_do_join_inner(lhs, rhs)
  concat function which may call overload if either
  the left or right sv has concat overload magic.

Test for overload support in join:

* t/lib/overload_join.t

Co-authored-by: Eric Herman <eric@freesa.org>
@jkeenan
Contributor

jkeenan commented Aug 27, 2024

There was extensive discussion in this PR, but it petered out a year and a half ago. Consequently, the code has acquired merge conflicts. @book et al., can we get an update on the status of this ticket? Thanks.

@book book closed this by deleting the head repository Mar 12, 2025

7 participants