Conversation

@book
Contributor

@book book commented Nov 11, 2022

No description provided.

@book book requested a review from leonerd November 11, 2022 11:01
@demerphq
Collaborator

demerphq commented Nov 13, 2022

General comments: even though these are static functions, they really should be registered in embed.fnc, and you should use the generated wrapper macros for accessing the subs, not explicit calls to the fully prefixed functions. This will make people's lives easier if the subs need to be changed to be exported as the code gets refactored, and it saves the need to add the annoying aTHX_ bumf at the start of the function calls. Also, you should be providing pod for the new functions, and using the generated PERL_ARGS_ macros to validate that your arguments are correct. (For instance, it will generate the appropriate code to validate that your arguments are non-null if so specified, etc.)

Also you should use the standard symbols perl uses internally for specifying static, and in modern code we normally don't use 'int' where 'bool' will do.

So basically, the code looks ok, aside from some minor nits and the fact it is not using the proper build machinery for declaring subs.
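For concreteness, registering a static helper in embed.fnc looks roughly like the following. This is a hypothetical sketch, not a line from the PR; the exact flags and prototype spelling should be checked against the conventions documented at the top of embed.fnc:

```
# Hypothetical embed.fnc entry for a file-static helper.
# 'S' marks it as static; 'NN' on an argument makes the generated
# PERL_ARGS_ASSERT_* macro assert that the argument is non-NULL.
S	|SV *	|do_join_inner	|NN SV *lhs	|NN SV *rhs
```

After the regen scripts run, embed.h should provide a `do_join_inner(lhs, rhs)` wrapper macro that supplies the `aTHX_` argument automatically, and proto.h should provide a `PERL_ARGS_ASSERT_DO_JOIN_INNER` macro for the function body to invoke.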

@iabyn
Contributor

iabyn commented Nov 13, 2022

General comments from me too (I haven't reviewed the correctness of the code). I see no point in splitting most of this into separate commits - for example, in commit N, adding a function S_foo(), then in commit N+1 making use of S_foo() in one place. Commits N and N+1 are semantically part of the same commit - neither is any use without the other. Splitting it just makes it harder to understand what's being changed.
Also, the new functions do not have any comments or pod describing what they are for. And in particular in the case of S_sv_has_magic_method(), it's badly named. Magic encompasses a lot more things than just overloading - e.g. tie magic has methods too (like FETCH). Perhaps it should be called S_sv_has_overload_method() or similar instead? And the docs should make clear what the method arg is.
I'm surprised that there aren't already API functions to do this sort of thing?

Contributor

@leonerd leonerd left a comment

Overall this appears to be heading in the right direction, but a few more changes are needed yet.

Also it would be useful to use the perf infrastructure to get some before/after comparison benchmarks on a variety of join test cases, to get a feel for what kind of impact this will have on performance.

@tonycoz
Contributor

tonycoz commented Nov 28, 2022

Also it would be useful to use the perf infrastructure to get some before/after comparison benchmarks on a variety of join test cases, to get a feel for what kind of impact this will have on performance.

It's much slower.

My test code:

./perl -e 'my @x = ("a" x 100) x 1000; for my $i (0 .. 10000) { my $z = join(",", @x); }'

Summary, minimums of three runs:

blead     #20503   Measure
148.00    234.29   ms "task-clock"
2.031     3.314    G instructions
472.4     843.1    M branches

Making S_concat_maybe_overloaded() inline and applying the suggested flags optimization in that function improved these to 189.09 ms / 2.794 G instructions / 663.1 M branches.

Raw results from perf stat
#20503 (rebased on blead @99d24a03f4f65d155416e04274b943ae7245dc21):

            238.42 msec task-clock                #    0.999 CPUs utilized          
                 1      context-switches          #    0.004 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               291      page-faults               #    0.001 M/sec                  
     1,101,882,222      cycles                    #    4.622 GHz                    
     3,314,272,902      instructions              #    3.01  insn per cycle         
       843,052,837      branches                  # 3536.025 M/sec                  
            75,232      branch-misses             #    0.01% of all branches        

       0.238678574 seconds time elapsed

       0.238690000 seconds user
       0.000000000 seconds sys

            239.34 msec task-clock                #    0.999 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               294      page-faults               #    0.001 M/sec                  
     1,110,324,651      cycles                    #    4.639 GHz                    
     3,314,323,674      instructions              #    2.99  insn per cycle         
       843,062,902      branches                  # 3522.382 M/sec                  
            92,861      branch-misses             #    0.01% of all branches        

       0.239640817 seconds time elapsed

       0.235731000 seconds user
       0.003995000 seconds sys

            234.29 msec task-clock                #    0.999 CPUs utilized          
                 2      context-switches          #    0.009 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               291      page-faults               #    0.001 M/sec                  
     1,098,648,396      cycles                    #    4.689 GHz                    
     3,314,298,427      instructions              #    3.02  insn per cycle         
       843,058,695      branches                  # 3598.403 M/sec                  
            90,207      branch-misses             #    0.01% of all branches        

       0.234566521 seconds time elapsed

       0.234606000 seconds user
       0.000000000 seconds sys

20503 (-g -O2): (-O2 -g was slower)

            237.04 msec task-clock                #    0.996 CPUs utilized          
                 1      context-switches          #    0.004 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               291      page-faults               #    0.001 M/sec                  
     1,095,326,580      cycles                    #    4.621 GHz                    
     3,314,319,311      instructions              #    3.03  insn per cycle         
       843,061,149      branches                  # 3556.615 M/sec                  
            90,923      branch-misses             #    0.01% of all branches        

       0.238089997 seconds time elapsed

       0.238116000 seconds user
       0.000000000 seconds sys


blead@99d24a03f4f65d155416e04274b943ae7245dc21:

            153.64 msec task-clock                #    0.998 CPUs utilized          
                 1      context-switches          #    0.007 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               294      page-faults               #    0.002 M/sec                  
       694,223,108      cycles                    #    4.518 GHz                    
     2,031,365,472      instructions              #    2.93  insn per cycle         
       472,374,561      branches                  # 3074.462 M/sec                  
            93,170      branch-misses             #    0.02% of all branches        

       0.153877924 seconds time elapsed

       0.153928000 seconds user
       0.000000000 seconds sys

            156.10 msec task-clock                #    0.998 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               291      page-faults               #    0.002 M/sec                  
       729,871,803      cycles                    #    4.676 GHz                    
     2,031,331,946      instructions              #    2.78  insn per cycle         
       472,367,612      branches                  # 3026.138 M/sec                  
            93,624      branch-misses             #    0.02% of all branches        

       0.156334566 seconds time elapsed

       0.156392000 seconds user
       0.000000000 seconds sys

            148.00 msec task-clock                #    0.998 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               291      page-faults               #    0.002 M/sec                  
       692,560,476      cycles                    #    4.680 GHz                    
     2,031,339,986      instructions              #    2.93  insn per cycle         
       472,369,311      branches                  # 3191.737 M/sec                  
           126,137      branch-misses             #    0.03% of all branches        

       0.148277307 seconds time elapsed

       0.148306000 seconds user
       0.000000000 seconds sys

With inline and flags optimization:
            189.09 msec task-clock                #    0.999 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               294      page-faults               #    0.002 M/sec                  
       880,888,042      cycles                    #    4.659 GHz                    
     2,794,500,090      instructions              #    3.17  insn per cycle         
       663,114,388      branches                  # 3506.911 M/sec                  
            80,173      branch-misses             #    0.01% of all branches        

       0.189343391 seconds time elapsed

       0.189393000 seconds user
       0.000000000 seconds sys

@demerphq
Collaborator

demerphq commented Nov 28, 2022 via email

@demerphq
Collaborator

demerphq commented Nov 28, 2022 via email

@ericherman
Contributor

Thank you @tonycoz for suggesting perf and an example exercise script; that helped us take a more informed approach.

@demerphq , @leonerd - I believe we've taken on-board the observations and suggestions; we'd be appreciative if you could take another look.

ericherman referenced this pull request Dec 20, 2022
We want amagic_find() to simulate amagic_call(); to do that we need
to support fallback. This copies the fallback logic from amagic_call()
into amagic_find(), without the extra stuff required to actually execute
the call.

See GH Issue #20627.
@book book marked this pull request as ready for review January 13, 2023 11:00
@book book requested a review from leonerd January 13, 2023 11:01
Contributor

@leonerd leonerd left a comment

Don't forget to write a perldelta.pod entry

@book book requested a review from demerphq January 13, 2023 14:18
Contributor

@leonerd leonerd left a comment

LGTM

Will you write a perldelta.pod entry in this commit, or shortly after? Either seems fine.

doop.c Outdated
PERL_ARGS_ASSERT_DO_JOIN;
/* stringify once and use that unless the delim has concat_amg */
if (!delim_has_concat) {
delim = newSVpvn_utf8(delims, delimlen, SvUTF8(delim));
Collaborator

@demerphq demerphq Jan 13, 2023

I think you want

delim = newSVpvn_flags(delims, delimlen, SvUTF8(delim) ? SVf_UTF8 | SVs_TEMP : SVs_TEMP);

Collaborator

I'd say that @leonerd's advice was incomplete, or perhaps a little overly focused on the point about utf8. newSVpvn_utf8() is a macro around newSVpvn_flags() that prevents you from passing in the SVs_TEMP flag, which would make your code more efficient. You want to create a new statement-scoped mortal temporary, and the code I showed is the optimal way to do that. You shouldn't be creating a block-scoped temporary variable; you should be creating a statement-scoped one. With your current code, this:

{
   $str = join (",", $str, $_) for 1..1_000_000;
}

will create 1_000_000 temporary values that will not be freed until block exit. With the code I showed, each iteration of join should free up the variables right after the statement completes.

Note

#define newSVpvn_utf8(s, len, u) newSVpvn_flags((s), (len), (u) ? SVf_UTF8 : 0)

If you read the docs for newSVpvn_flags, the SVs_TEMP flag results in the low-level machinery setting up the returned value as a mortal, which avoids an inefficient call to sv_2mortal() (which you should be using instead of SAVEFREESV() anyway). You also need to pass in the UTF8 flag to ensure the semantics are correct, so call newSVpvn_flags() directly.

Note the docs for SAVEFREESV:

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> will be decremented at the end of
I<pseudo-block>.  This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>.  However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope.  These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

Note the comparison with sv_2mortal(). SAVEFREESV is one of those things that you should only use if you have exceptional scoping requirements compared to the normal case - like declaring a new variable that needs to survive the full scope of the block. Temporary SVs, on the other hand, are pretty much universally set up with sv_2mortal(), or even better through a newSV.._flags() call with the SVs_TEMP flag.

Also be aware, I typoed originally and misspelled SVs_TEMP as SVf_TEMP; it is an SV state flag rather than an SVf_ flag, so it has a different prefix. Sorry about that.

BTW, these stats should give you some insight on why SAVEFREESV is a red flag:

$ git grep SAVEFREESV | wc -l
75
$ git grep SVs_TEMP | wc -l
230
$ git grep sv_2mortal | wc -l
765

We mortalize about 10 times for every SAVEFREESV.

Collaborator

BTW, it's maybe worth knowing that a lot of the SV-related functions are wrappers passing specific flags to lower-level _flags() functions. You will find that a lot of our API functions can be decomposed. newSV is the prefix for stuff that creates a new SV; the suffix tells you about the arguments. Another example might be sv_cat, which has a similar set of suffixes and wrappers to newSV. So newSVpvf creates a new SV from a format string, and sv_catpvf() concatenates a string created from a format. In a lot, maybe a majority, of these cases the functions are wrappers around a core function with a _flags suffix. It is often useful when optimizing (and learning) to unroll the chain and call the core function directly with the desired flags and arguments. Once you become familiar with the patterns you can often guess the names of functions and be correct; they are relatively consistent.
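A few concrete instances of the pattern (these particular mappings are from memory; verify them against perlapi and embed.fnc before relying on them):

```
newSVpvn(s, len)          =>  newSVpvn_flags(s, len, 0)
newSVpvn_utf8(s, len, u)  =>  newSVpvn_flags(s, len, u ? SVf_UTF8 : 0)
sv_catpvn(dsv, s, len)    =>  sv_catpvn_flags(dsv, s, len, SV_GMAGIC)
sv_catsv(dsv, ssv)        =>  sv_catsv_flags(dsv, ssv, SV_GMAGIC)
```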

doop.c Outdated
/* stringify once and use that unless the delim has concat_amg */
if (!delim_has_concat) {
delim = newSVpvn_utf8(delims, delimlen, SvUTF8(delim));
SAVEFREESV(delim);
Collaborator

Is there a reason you are using this? I would expect this to use sv_2mortal(), and if it should, then if the creation uses newSVpvn_flags() like I suggest above, you don't need to do anything at all. If there is a good reason to use this you should comment why. I suspect you really wanted sv_2mortal().

*/

PERL_STATIC_INLINE SV *
S_do_join_inner(pTHX_ SV *lhs, SV *rhs)
Collaborator

I recommend adding a flag parameter, lhs_has_mg, which is preinitialized before calling do_join_inner() with:

lhs_has_mg = amagic_applies(lhs, concat_amg, applies_flags);

* because those must be set if an SV has any overload. */
U32 flags = SvFLAGS(lhs)|SvFLAGS(rhs);
if (UNLIKELY((flags & (SVf_ROK|SVs_GMG))) &&
UNLIKELY(amagic_applies(lhs, concat_amg, applies_flags)
Collaborator

and then replace this call to amagic_applies() with lhs_has_mg.

This will halve the number of times this code calls amagic_applies() for a given string.
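To illustrate the shape of this suggestion, here is a toy Python sketch; the names `amagic_applies`, `do_join_inner`, and `lhs_has_mg` are borrowed purely for illustration, this is not the C implementation, and it only models the unoverloaded fast path. A counter shows how caching the lhs check cuts the number of predicate calls:

```python
checks = 0

def amagic_applies(sv):
    """Stand-in for the real overload check; counts invocations."""
    global checks
    checks += 1
    return False  # plain strings carry no concat overload

def do_join_inner(lhs, rhs, lhs_has_mg=None):
    # If the caller already knows the lhs state, skip re-checking it.
    has_mg = amagic_applies(lhs) if lhs_has_mg is None else lhs_has_mg
    if has_mg or amagic_applies(rhs):
        raise NotImplementedError("overloaded path not modelled here")
    return lhs + rhs

delim, items = ",", ["a", "b", "c"]

# Naive: the accumulator is re-checked as lhs on every call.
sv = items[0]
for it in items[1:]:
    sv = do_join_inner(do_join_inner(sv, delim), it)
naive_checks = checks

# Suggested: check the accumulator once up front, pass the result in.
checks, sv = 0, items[0]
lhs_has_mg = amagic_applies(sv)
for it in items[1:]:
    sv = do_join_inner(do_join_inner(sv, delim, lhs_has_mg), it, lhs_has_mg)

print(sv, naive_checks, checks)  # a,b,c 8 5
```

With two concatenations per element, the naive version pays four checks per element while the cached version pays two plus a single up-front check.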

s = SvPV_const(*mark,len);
sv_catpvn_flags(sv,s,len,
DO_UTF8(*mark) ? SV_CATUTF8 : SV_CATBYTES);
sv = do_join_inner(sv, delim);
Collaborator

Notice that here...

sv_catpvn_flags(sv,s,len,
DO_UTF8(*mark) ? SV_CATUTF8 : SV_CATBYTES);
sv = do_join_inner(sv, delim);
sv = do_join_inner(sv, *mark);
Collaborator

... here ...

Collaborator

@demerphq demerphq Jan 13, 2023

it would also be interesting to see if the compiler did anything different with

sv = do_join_inner(do_join_inner(sv, delim),*mark);

or

sv = do_join_inner(do_join_inner(sv, delim,lhs_has_mg),*mark,lhs_has_mg);

sv_catpvn_flags(sv,s,len,
DO_UTF8(*mark) ? SV_CATUTF8 : SV_CATBYTES);
for (; items > 0; items--,mark++) {
sv = do_join_inner(sv, *mark);
Collaborator

and here we call do_join_inner() with the same SV as the lhs. So we can avoid the need to check it for magic each time we call by passing that state in as a flag.

Contributor

Isn't the reason we assign the sv each time that the return value may not be the same sv? Especially in the case where the first element didn't have overloaded concat magic, but the second element does, after which the lhs gets the concat magic. Checking once would break that, right?

Collaborator

No, magic isn't viral like that. See comment below.

@ericherman
Contributor

Will you write a perldelta.pod entry in this commit, or shortly after? Either seems fine.

Done, as part of this commit.

@demerphq
Collaborator

demerphq commented Jan 14, 2023 via email

@book
Contributor Author

book commented Jan 16, 2023

The lhs doesn't become magic by concatenation of something that is. That would be very weird. Strings don't turn into objects by concatenation, nor do they somehow become tied without calling the tie operator. :-) Magic isn't viral like that.

I might be misunderstanding your point. The lhs in the S_do_join_inner function will be the result of the overall join(). It starts as the empty string, and we keep concatenating to it from left to right.

In RFC0013, we want join( $delim, @list ) to behave exactly like reduce { ( $a . $delim ) . $b } @list. If one of the arguments to join() is an object with a . overload that returns another object with an overloaded ., then the overload will propagate, and the result of join() will be an overloaded object.

This is the behavior we've put in the BoldStars class in our test script (using . on a BoldStars object produces a BoldStars object). If one of the arguments to join() is one such object, then it will "assimilate" the other arguments with each concatenation, and the result of join() will be an overloaded object.
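For what it's worth, the propagation described here can be mimicked in Python, with `+` standing in for Perl's overloaded `.`. The class below is an invented analogue of the BoldStars test class, not the actual test code:

```python
from functools import reduce

class BoldStars:
    """String-like object whose concat overload returns another BoldStars."""
    def __init__(self, text):
        self.text = text
    def __add__(self, other):        # BoldStars . plain-string
        return BoldStars(self.text + str(other))
    def __radd__(self, other):       # plain-string . BoldStars
        return BoldStars(str(other) + self.text)
    def __str__(self):
        return "*" + self.text + "*"

def join_like(delim, items):
    # RFC0013: join($delim, @list) behaves like
    # reduce { ($a . $delim) . $b } @list
    return reduce(lambda a, b: (a + delim) + b, items)

result = join_like(",", ["x", BoldStars("y"), "z"])
print(type(result).__name__, result)  # BoldStars *x,y,z*
```

One overloaded element "assimilates" the plain strings on either side, so the overall result carries the overload, exactly the propagation being discussed.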

@demerphq
Collaborator

demerphq commented Jan 16, 2023 via email

@leonerd
Contributor

leonerd commented Jan 25, 2023

If it would be considered too surprising for join to suddenly be implemented in terms of an overloaded concat operator, what about adding a new overload key whereby classes can opt in to such behaviour?

class My::Stringlike::Class;
use overload
  '.' => "concat",
  join_uses_concat => 1;

and then pp_join would only follow the concat overloading on any object that additionally sets that key, indicating it is opting in to such a mechanism.

@demerphq
Collaborator

and then pp_join would only follow the concat overloading on any object that additionally sets that key, indicating it is opting in to such a mechanism.

That feels backwards, or maybe I don't understand the suggestion properly. If I call join with a list of items, and only one of them has this new type of overloading, what would happen?

FWIW: I have no problem with this RFC if it is lexically scoped behavior.

{
   use feature "join_with_concat";
   my $joined = join $sep, @things;
}

But it has fairly deep implications, some of which IMO are negative in terms of performance, and it would definitely change which existing overload methods are used, so I think it needs to be guarded. I can totally understand that two reasonable people might see what join does in different ways, and I think we can offer users the ability to choose which they get, but I do think it has to be opt-in.

@leonerd
Contributor

leonerd commented Feb 18, 2023

That feels backwards, or maybe I don't understand the suggestion properly. If I call join with a list of items, and only one of them has this new type of overloading, what would happen?

Per instance. The pp_join function has to individually query every SV to ask "hey, do you have concat magic? If so, I'll use it". The suggestion is that the question becomes "hey, do you have concat magic, and is join permitted to use it?". That way, any class that doesn't specify the flag will continue in its current behaviour and not cause any surprises; meanwhile, any class that does want to be join-aware can set the flag and it all works nicely.

Perhaps I need to remind people of my use-case here. I have an entire ecosystem of object classes ("String::Tagged") that are objects that behave like strings. Or at least, they try to. But currently they can't do a very good job of it, because many core operators (such as join() and substr()) don't permit overloading, so I lose my "being a special object" status going via those operators. It means I have to write an entire second set of object methods - like the String::Tagged->join($sep, @values) constructor, or the $str->substr(...) accessor - which code should call instead. But then only code that knows about them actually works; I can't pass these strings into the huge amount of existing code already on CPAN and have it work. One random example module would be Text::Wrap. I tried years ago to use that with String::Tagged and was very disappointed at how badly we handle that kind of thing. It seems I'm not the only person - https://rt.cpan.org/Ticket/Display.html?id=145864

My eventual goal here is to end up in a situation where an object class such as String::Tagged gets as fully supported by all the core ops (and any CPAN module that uses them) as a number class like Math::BigRat. Observe that nobody has to call $x->add($y) to make those behave - you can just do $x + $y. I'd like our stringy ops to do the same.

@demerphq
Collaborator

Per instance. The pp_join function has to individually query every SV to ask "hey, do you have concat magic? If so, I'll use it". The suggestion is that the question becomes "hey, do you have concat magic, and is join permitted to use it?". That way, any class that doesn't specify the flag will continue in its current behaviour and not cause any surprises; meanwhile, any class that does want to be join-aware can set the flag and it all works nicely.

So in the following code what would happen?

my $s1 = Object::With::Normal::Concat::Overload->new(..);
my $s2 = Object::With::Join::Concat::Overload->new(...);

my $joined = join("", qw(a b c d e), $s1, qw( p q r s t u ), $s2, qw( w x y z));

Would $s2 being in the arg list cause $s1 to also use concat overloads? Would it be acceptable if the internals turned this into:

my $joined = join("", join("",qw(a b c d e), "$s1", qw( p q r s t u )), $s2, join("",qw( w x y z)));

for example?

I'd feel more comfortable reasoning about this if we weren't discussing the "concat" overload. Let's say we had a "join" overload.

Would it be acceptable that a join with K items, only one of which had the join overload, might call that overload at most twice, and perhaps only once (if the item was at the head or tail of the list), regardless of what K was?

If that were acceptable then I would be a lot more comfortable about this. My concern is that adding this has a deep impact on the expected order of the joining; if we eliminate that, then most of my concern gets resolved.

@demerphq
Collaborator

@book: I will resolve the embed.fnc conflict in this PR for you unless you leave a note requesting that I do not. Sorry for the inconvenience.

@demerphq
Collaborator

Rebased as promised.

book and others added 2 commits February 21, 2023 05:00
Add static helper functions:

* S_do_join_inner(lhs, rhs)
  concat function which may call overload if either
  the left or right sv has concat overload magic.

Test for overload support in join:

* t/lib/overload_join.t

Co-authored-by: Eric Herman <eric@freesa.org>
@jkeenan
Contributor

jkeenan commented Aug 27, 2024

There was extensive discussion in this PR, but it petered out a year and a half ago. Consequently, the code has acquired merge conflicts. @book et al., can we get an update on the status of this ticket? Thanks.

@book book closed this by deleting the head repository Mar 12, 2025

7 participants