Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization for put/doPut templates that use outputRanges that should be pass by value #2655

Closed
wants to merge 1 commit into from

Conversation

marler8997
Copy link
Contributor

The original put/doPut functions pass in the OutputRange by ref, i.e.

package void doPut(R, E)(ref R r, auto ref E e)
void put(R, E)(ref R r, E e)

However, some types do not need to be passed by reference such as pointers, dynamic arrays, classes, delegates...

To better optimize these output ranges I added overloads for the put/doPut functions that pass the OutputRange by value instead of by reference (See code).

Now that each overload handles a different set of types, some of the logic can be removed from each function, but I just duplicated the logic twice to minimize the potential amount of turmoil (I don't want to mess anything up). If anyone knows "for sure" what logic can be removed I'm willing to do that, but, most of the logic is compile time so the only gain will be slightly faster compile times (not necessarily smaller/leaner functions). Also feel free to make any modifications to my code:)

@marler8997 marler8997 force-pushed the OptimizePut branch 3 times, most recently from 8cd7d25 to 0a0656e Compare November 4, 2014 19:43
@DmitryOlshansky
Copy link
Member

dynamic arrays actually must be passed by reference as they do change in size after each put.

@marler8997
Copy link
Contributor Author

Ah yes good catch, fixed and pushed.

@marler8997
Copy link
Contributor Author

Hey just wanted to know what the status on this is. Is everyone too busy to look at it right now or is there something I can do to make this PR better? Thanks.

@quickfur
Copy link
Member

quickfur commented Nov 7, 2014

I haven't gotten around to looking over the code yet. I'll see if I can do that sometime today.

{
import std.string;
static assert (false,
format("Cannot nativaly put a %s into a %s.", E.stringof, R.stringof));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this wasn't your doing, but can you fix nativaly? I'm assuming it should be natively?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure thing

@schveiguy
Copy link
Member

Can you give a hint as to what optimizations we can have by specializing on certain types? This seems like we are singling out certain types, but there are others that could be "pass by value".

@marler8997
Copy link
Contributor Author

This optimization removes a level of indirection for types that already have indirection like pointers/classes/etc. If you pass a pointer by reference, you can modify what the pointer is pointing to and the pointer itself. The put/doPut functions will be modifying what the pointer is pointing to but will never be modifying the pointer itself. Therefore, passing a pointer by reference in this case is unnecessary.

There may be more types that could be pass by value but these were all the ones I could think of (if you have any more let me know). However, the default is to use pass by reference, which at worst creates more indirection then needed. If any more types are discovered, they just need to by added to the isPassByValue template.

It is unfortunate that there's no way to combine these functions into a single template. I posted an idea to do this by adding a new function attribute pref here:

http://forum.dlang.org/thread/dhqrbvslzzoxmhzisnga@forum.dlang.org

I didn't get any reponses though. Basically, a type that is "pointing" to data should be passed by value, whereas a type that is data ( like a struct ) should be passed by reference. The perfect example is a struct (pass by reference) or a pointer to a struct (pass by value).

If you read the post I made and understand what pref means then this might help:

ref = modify callers variable
pref = modify content of callers variable

So if the callers variable is a pointer, the caller needs only to pass the pointer, if the callers variable is not a pointer, like a raw struct or a value type, then the caller should pass the variable by reference.

@schveiguy
Copy link
Member

This optimization removes a level of indirection for types that already have indirection like pointers/classes/etc.

OK, so it's simply an avoidance of indirection. The way you phrased the original post, it sounded like there were some other optimizations that can be made, but you were holding off to avoid conflating the two.

There may be more types that could be pass by value but these were all the ones I could think of (if you have any more let me know). However, the default is to use pass by reference, which at worst creates more indirection then needed. If any more types are discovered, they just need to by added to the isPassByValue template.

For instance a user-created smart pointer. Basically, any struct that is made up of references that have nothing to do with the range apparatus, because what the references point to is actually the range.

@marler8997
Copy link
Contributor Author

There may be more types that could be pass by value but these were all the ones I could think of (if you have any more let me know). However, the default is to use pass by reference, which at worst creates more indirection then needed. If any more types are discovered, they just need to by added to the isPassByValue template.

For instance a user-created smart pointer. Basically, any struct that is made up of references that have nothing to do with the range apparatus, because what the references point to is actually the range.

Yes it's true that if the struct is simply wrapping the real output range then it could be passed by value and it would still work. However, I'm not sure it's possible to be able to determine in the general case if a struct is doing this. Furthermore, even if the struct is wrapping an output range, it may also contain a bunch of other fields which may make the user want to pass it by reference to avoid copying the entire struct to the stack just to call the put function. I think its reasonable to say that structs will always be passed by reference. If the struct is wrapping some type that is actually facilitating the output range, then the user can pass that type to the put function instead.

@schveiguy
Copy link
Member

I suppose it's just an optimization that we can't give users the ability to hook. Probably fine the way it is then.

@9il 9il mentioned this pull request Nov 8, 2014
@9il
Copy link
Member

9il commented Nov 8, 2014

Please block this PR until #2661 became closed (merged/rejected).

@marler8997
Copy link
Contributor Author

It looks like the test infrastructure keeps trying to re-validate this PR. Is there a way to prevent this while we are waiting for #2661? I don't want to take away build cycles from that PR.

@schveiguy
Copy link
Member

You can close the PR, and then open it later.

@marler8997
Copy link
Contributor Author

Modified to accomodate rebasing from #2661.

@9il
Copy link
Member

9il commented Nov 11, 2014

Thanks!

@marler8997 marler8997 force-pushed the OptimizePut branch 2 times, most recently from 1322cc8 to db19a83 Compare November 11, 2014 20:29
@DmitryOlshansky
Copy link
Member

Awesome, so all my comments are somewhere on Jonathon's branch...

Anyhow some proof that this actually does add speed is needed. Maybe inliner just unrolls it all on its own and we just duplicate code (and significant amounts).

@DmitryOlshansky
Copy link
Member

@9il Moscow

@9il
Copy link
Member

9il commented Nov 12, 2014

@DmitryOlshansky Moscow too. I will write my comment in Russian tomorrow. My English is very bad to explain all details I want.

@marler8997
Copy link
Contributor Author

Oh! Thanks for clarifying. If the compiler is able to optimize a ref class/ref delegate/ref function/ref (pointer) by removing the level of indirection then this PR is useless. I'm not convinced that the compiler will do this in all cases, I'll have to do some "thinking". When you get a chance, what tool did you use to get the assembly code? Did you disassemble the binary or did you have DMD generate the assembly code? I would also love to get someone like @WalterBright to give his take on this. He might be able to immediately say whether or not this optimization is useful.

@WalterBright If you have a minute, is there any use to remove the "ref" attribute on the put function for class/function/delegate/pointer output ranges? In this PR, I basically copied the put/doPut functions and added template guards that choose either the ref version or the non-ref version depending on the output range type (see isPassByValue template). However, we are seeing that in some (maybe even all) cases the compiler is able to optimize a ref class/ref struct pointer into a regular class/struct pointer. Do you think this PR is useless or are there some cases where the compiler may not be able to optimize? Thanks.

@IgorStepanov
Copy link
Contributor

@DmitryOlshansky @9il FYI: Moscow too;)

@9il
Copy link
Member

9il commented Nov 12, 2014

@IgorStepanov #dconf in Moscow, 2018? :-)

@IgorStepanov
Copy link
Contributor

@9il Great idea!

@DmitryOlshansky
Copy link
Member

I'm not convinced that the compiler will do this in all cases, I'll have to do some "thinking".

I was afraid it can since put is a template so by definition all of source code is just there. With that being said new code may be faster for non-inlined builds, but who cares about performance and yet doesn't inline...

When you get a chance, what tool did you use to get the assembly code?

I disassembled the binary, the tool is IDA Pro: https://www.hex-rays.com/products/ida/
There is a freeware for x86 I think.

On linux one can just use objdump, it;s good enough.

Did you disassemble the binary or did you have DMD generate the assembly code?

It doesn't produce ASM listing. To the best of my knowledge DMD outputs object/machine code directly, there is no intermediate asm format anywhere.

@DmitryOlshansky
Copy link
Member

@9il @IgorStepanov I recall folks hoped for the next DConf to be staged in Europe. Well, Moscow is Europe ... ;)

@9il About x64, here are 2 dumps on Linux before/after this pull:
https://gist.github.com/anonymous/75bff56566c2dce761fb
https://gist.github.com/anonymous/89577cfd486efd0976a1

As you can observe with you favorite diff tool, the only difference is addresses of calls.

I tricked optimizer by using separate compilation, now it finally makes some sense:

OutputRange (struct pointer) run 1 (loopcount 100000000)
  nothing      : 149614 microseconds
  range.put    : 80888 microseconds
  put(range)   : 81764 microseconds
OutputRange (struct pointer) run 2 (loopcount 100000000)
  nothing      : 147139 microseconds
  range.put    : 81201 microseconds
  put(range)   : 80440 microseconds

Variation is about +- 4ms.

The latest code:

import std.stdio;
import std.datetime;
import std.range;

import submod; // defines size_t global and is separately compiled

class MyOutputRange
{
  size_t x, y;
  final void put(int x)
  {
    // Do something to demonstrate that indirection causes slower performance
    //asm{ nop; }
    global = x;
    y = global;
    this.x = y;
  }
}

void foo(){
  asm { nop; }
}

void main(string[] args)
{
  size_t runCount = 2;
  size_t loopCount = 100_000_000;

  // Use struct pointer to show that new put function operates quicker on pointers
  MyOutputRange outputRange = new MyOutputRange();
  StopWatch sw;
  for(auto runIndex = 0; runIndex < runCount; runIndex++) {

    writefln("OutputRange (struct pointer) run %s (loopcount %s)", runIndex + 1, loopCount);
    sw.reset();
    sw.start();
    for(auto i = 0; i < loopCount; i++) {
      foo();
    }
    sw.stop();
    writefln("  nothing      : %s microseconds", sw.peek.usecs);

    sw.reset();
    sw.start();
    for(auto i = 0; i < loopCount; i++) {
      outputRange.put(3);
    }
    sw.stop();
    writefln("  range.put    : %s microseconds", sw.peek.usecs);

    sw.reset();
    sw.start();
    for(auto i = 0; i < loopCount; i++) {
      put(outputRange, 3);
    }
    sw.stop();
    writefln("  put(range)   : %s microseconds", sw.peek.usecs);

    version(PutRef) {
      sw.reset();
      sw.start();
      for(auto i = 0; i < loopCount; i++) {
        putRef(outputRange, 3);
      }
      sw.stop();
      writefln("  putRef(range): %s microseconds", sw.peek.usecs);
    }
  }
}

@marler8997
Copy link
Contributor Author

This may be getting a little "theoretical" here, but I would think that if a compiler could optimize a "ref" function parameter to a "pass-by-value" argument, then it could handle this:

__gshared int global;
void func(ref int x) { global = x; }
void main() { int a; func(a); }

Since the function parameter x is not getting modified, the compiler can optimize that argument to a pass-by-value. I haven't mentally decompiled the assembly but when I compile this code with and without the "ref", here's what I get.

Non-ref version:

SECTION .text$_D4test4funcFiZv align=4 execute          ; section number 8, code
;  Communal section not supported by YASM

_D4test4funcFiZv:; Function begin, communal
        push    eax                                     ; 0000 _ 50
        mov     dword [_D4test6globali], eax            ; 0001 _ A3, 00000000(segrel)
        pop     eax                                     ; 0006 _ 58
        ret                                             ; 0007 _ C3
; _D4test4funcFiZv End of function

Ref version:

SECTION .text$_D4test4funcFKiZv align=4 execute         ; section number 8, code
;  Communal section not supported by YASM

_D4test4funcFKiZv:; Function begin, communal
        push    eax                                     ; 0000 _ 50
        mov     ecx, dword [eax]                        ; 0001 _ 8B. 08
        mov     dword [_D4test6globali], ecx            ; 0003 _ 89. 0D, 00000000(segrel)
        pop     eax                                     ; 0009 _ 58
        ret                                             ; 000A _ C3
; _D4test4funcFKiZv End of function

It appears that the compiler isn't able to optimize the indirection and still passes the integer argument by reference. I'm going to try this again with the pointer types but this is interesting.

@marler8997
Copy link
Contributor Author

Looks like the same thing happens with classes, I'm going to disassemble the old put vs new put and see if I see the same thing as you.

@marler8997
Copy link
Contributor Author

Ok, I'm definitely seeing differences in the assembly code. I'm not sure why you were not seeing differences. I've simplified the code since now were looking at assembly, no more need for all those write statements and extra imports:

import std.range;

size_t global;

class MyOutputRange
{
  size_t x;
  void put(int x)
  {
    asm { nop; }
    global = x;
  }
}

void main(string[] args)
{
  MyOutputRange outputRange;
  put(outputRange, 3);
}

Old Put

SECTION .text$__Dmain align=4 execute                   ; section number 12, code
;  Communal section not supported by YASM

__Dmain:; Function begin, communal
        push    eax                                     ; 0000 _ 50
        lea     eax, [esp]                              ; 0001 _ 8D. 04 24
        mov     dword [esp], 0                          ; 0004 _ C7. 04 24, 00000000
        push    eax                                     ; 000B _ 50
        mov     eax, 3                                  ; 000C _ B8, 00000003
        call    _D3std5range35__T3putTC7testput13MyOutputRangeTiZ3putFKC7testput13MyOutputRangeiZv; 0011 _ E8, 00000000(rel)
        xor     eax, eax                                ; 0016 _ 31. C0
        pop     ecx                                     ; 0018 _ 59
        ret                                             ; 0019 _ C3
; __Dmain End of function


SECTION .text$_D3std5range35__T3putTC7testput13MyOutputRangeTiZ3putFKC7testput13MyOutputRangeiZv align=4 execute ; section number 13, code
;  Communal section not supported by YASM

_D3std5range35__T3putTC7testput13MyOutputRangeTiZ3putFKC7testput13MyOutputRangeiZv:; Function begin, communal
        push    eax                                     ; 0000 _ 50
        mov     ecx, dword [esp+8H]                     ; 0001 _ 8B. 4C 24, 08
        push    eax                                     ; 0005 _ 50
        mov     eax, dword [ecx]                        ; 0006 _ 8B. 01
        mov     edx, dword [eax]                        ; 0008 _ 8B. 10
        call    near [edx+14H]                          ; 000A _ FF. 52, 14
        pop     eax                                     ; 000D _ 58
        ret     4                                       ; 000E _ C2, 0004
; _D3std5range35__T3putTC7testput13MyOutputRangeTiZ3putFKC7testput13MyOutputRangeiZv End of function


SECTION .text$_D3std5range37__T5doPutTC7testput13MyOutputRangeTiZ5doPutFKC7testput13MyOutputRangeKiZv align=4 execute ; section number 14, code
;  Communal section not supported by YASM

_D3std5range37__T5doPutTC7testput13MyOutputRangeTiZ5doPutFKC7testput13MyOutputRangeKiZv:; Function begin, communal
        push    eax                                     ; 0000 _ 50
        push    dword [eax]                             ; 0001 _ FF. 30
        mov     ecx, dword [esp+0CH]                    ; 0003 _ 8B. 4C 24, 0C
        mov     eax, dword [ecx]                        ; 0007 _ 8B. 01
        mov     edx, dword [eax]                        ; 0009 _ 8B. 10
        call    near [edx+14H]                          ; 000B _ FF. 52, 14
        pop     eax                                     ; 000E _ 58
        ret     4                                       ; 000F _ C2, 0004
; _D3std5range37__T5doPutTC7testput13MyOutputRangeTiZ5doPutFKC7testput13MyOutputRangeKiZv End of function

New Put

SECTION .text$__Dmain align=4 execute                   ; section number 12, code
;  Communal section not supported by YASM

__Dmain:; Function begin, communal
        push    eax                                     ; 0000 _ 50
        mov     eax, 3                                  ; 0001 _ B8, 00000003
        push    0                                       ; 0006 _ 6A, 00
        call    _D3std5range35__T3putTC7testput13MyOutputRangeTiZ3putFC7testput13MyOutputRangeiZv; 0008 _ E8, 00000000(rel)
        xor     eax, eax                                ; 000D _ 31. C0
        pop     ecx                                     ; 000F _ 59
        ret                                             ; 0010 _ C3
; __Dmain End of function


SECTION .text$_D3std5range35__T3putTC7testput13MyOutputRangeTiZ3putFC7testput13MyOutputRangeiZv align=4 execute ; section number 13, code
;  Communal section not supported by YASM

_D3std5range35__T3putTC7testput13MyOutputRangeTiZ3putFC7testput13MyOutputRangeiZv:; Function begin, communal
        push    eax                                     ; 0000 _ 50
        push    eax                                     ; 0001 _ 50
        mov     eax, dword [esp+0CH]                    ; 0002 _ 8B. 44 24, 0C
        mov     ecx, dword [eax]                        ; 0006 _ 8B. 08
        call    near [ecx+14H]                          ; 0008 _ FF. 51, 14
        add     esp, 4                                  ; 000B _ 83. C4, 04
        ret     4                                       ; 000E _ C2, 0004
; _D3std5range35__T3putTC7testput13MyOutputRangeTiZ3putFC7testput13MyOutputRangeiZv End of function


SECTION .text$_D3std5range37__T5doPutTC7testput13MyOutputRangeTiZ5doPutFC7testput13MyOutputRangeKiZv align=4 execute ; section number 14, code
;  Communal section not supported by YASM

_D3std5range37__T5doPutTC7testput13MyOutputRangeTiZ5doPutFC7testput13MyOutputRangeKiZv:; Function begin, communal
        push    eax                                     ; 0000 _ 50
        push    dword [eax]                             ; 0001 _ FF. 30
        mov     eax, dword [esp+0CH]                    ; 0003 _ 8B. 44 24, 0C
        mov     ecx, dword [eax]                        ; 0007 _ 8B. 08
        call    near [ecx+14H]                          ; 0009 _ FF. 51, 14
        add     esp, 4                                  ; 000C _ 83. C4, 04
        ret     4                                       ; 000F _ C2, 0004
; _D3std5range37__T5doPutTC7testput13MyOutputRangeTiZ5doPutFC7testput13MyOutputRangeKiZv End of function

@DmitryOlshansky
Copy link
Member

It doesn't even compile for me with full optimizations turned on: there is a null pointer.
std/range/constraints.d(386): Error: null dereference in function _Dmain

I'll try with MyOutputRange outputRange = new MyOutputRange(); though

@DmitryOlshansky
Copy link
Member

This is how it looks for me:

0000000000000e30 <_Dmain>:
 e30:   55                      push   %rbp
 e31:   48 8b ec                mov    %rsp,%rbp
 e34:   48 83 ec 20             sub    $0x20,%rsp
 e38:   be 03 00 00 00          mov    $0x3,%esi
 e3d:   48 89 75 e8             mov    %rsi,-0x18(%rbp)
 e41:   48 bf 00 00 00 00 00    movabs $0x0,%rdi
 e48:   00 00 00 
 e4b:   e8 00 00 00 00          callq  e50 <_Dmain+0x20>
 e50:   48 89 c7                mov    %rax,%rdi
 e53:   48 8b 75 e8             mov    -0x18(%rbp),%rsi
 e57:   48 8b 07                mov    (%rdi),%rax
 e5a:   48 ff 50 28             rex.W callq *0x28(%rax)
 e5e:   31 c0                   xor    %eax,%eax
 e60:   48 8b e5                mov    %rbp,%rsp
 e63:   5d                      pop    %rbp
 e64:   c3                      retq   
 e65:   0f 1f 00                nopl   (%rax)

vs

0000000000000e30 <_Dmain>:
 e30:   55                      push   %rbp
 e31:   48 8b ec                mov    %rsp,%rbp
 e34:   48 83 ec 20             sub    $0x20,%rsp
 e38:   be 03 00 00 00          mov    $0x3,%esi
 e3d:   48 89 75 e8             mov    %rsi,-0x18(%rbp)
 e41:   48 bf 00 00 00 00 00    movabs $0x0,%rdi
 e48:   00 00 00 
 e4b:   e8 00 00 00 00          callq  e50 <_Dmain+0x20>
 e50:   48 89 c7                mov    %rax,%rdi
 e53:   48 8b 75 e8             mov    -0x18(%rbp),%rsi
 e57:   48 8b 07                mov    (%rdi),%rax
 e5a:   48 ff 50 28             rex.W callq *0x28(%rax)
 e5e:   31 c0                   xor    %eax,%eax
 e60:   48 8b e5                mov    %rbp,%rsp
 e63:   5d                      pop    %rbp
 e64:   c3                      retq   
 e65:   0f 1f 00                nopl   (%rax)

Completely inlined. There are also present the bodies of put that look different, and in fact your pull makes version that looks better. The catch is it's NOT used anywhere after inlining.

Anyway here they are:

0000000000001c98 <_D3std5range11constraints31__T3putTC3new13MyOutputRangeTiZ3putFKC3new13MyOutputRangeiZv>:
    1c98:   55                      push   %rbp
    1c99:   48 8b ec                mov    %rsp,%rbp
    1c9c:   48 83 ec 10             sub    $0x10,%rsp
    1ca0:   48 89 75 f8             mov    %rsi,-0x8(%rbp)
    1ca4:   48 89 fe                mov    %rdi,%rsi
    1ca7:   48 8b 45 f8             mov    -0x8(%rbp),%rax
    1cab:   48 8b 38                mov    (%rax),%rdi
    1cae:   48 8b 0f                mov    (%rdi),%rcx
    1cb1:   48 ff 51 28             rex.W callq *0x28(%rcx)
    1cb5:   48 8b e5                mov    %rbp,%rsp
    1cb8:   5d                      pop    %rbp
    1cb9:   c3                      retq   
    1cba:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

vs

0000000000001c98 <_D3std5range11constraints31__T3putTC3new13MyOutputRangeTiZ3putFC3new13MyOutputRangeiZv>:
    1c98:   55                      push   %rbp
    1c99:   48 8b ec                mov    %rsp,%rbp
    1c9c:   48 83 ec 10             sub    $0x10,%rsp
    1ca0:   48 89 75 f8             mov    %rsi,-0x8(%rbp)
    1ca4:   48 89 fe                mov    %rdi,%rsi
    1ca7:   48 8b 7d f8             mov    -0x8(%rbp),%rdi
    1cab:   48 8b 07                mov    (%rdi),%rax
    1cae:   48 ff 50 28             rex.W callq *0x28(%rax)
    1cb2:   48 8b e5                mov    %rbp,%rsp
    1cb5:   5d                      pop    %rbp
    1cb6:   c3                      retq   
    1cb7:   90                      nop

@marler8997
Copy link
Contributor Author

So you're saying that the doPut/put functions get generated but no one calls them because the call to put gets inlined? So let's say that's true, the times that the function can't get inlined, the new put is going to still be better correct?

@DmitryOlshansky
Copy link
Member

@marler8997 Yeah, that pretty much nails it. If compiler inlines it then ref doesn't matter, if not then passing classes and pointers by copy is going to be faster.

@marler8997
Copy link
Contributor Author

Cool so where does that leave us. Is this something we want to integrate. I can understand the argument that the duplicate code may not be worth having such a small optimization. There also may a feature added to D in the future that would solve this problem (maybe adding a 'cref' attribute instead of 'ref' that means the parameter is referencing the "contents" of the caller's variable instead of the variable itself). What do you think?

@DmitryOlshansky
Copy link
Member

I think the problem is more general then put/doPut that is any generic function that takes a range and wants to modify it has to use ref, however since a range may already be a reference it mean extra indirection just like this case.

Now if there was some cleaner idiom I'd be all for it, but solving it with duplication like this for a single case is too much. Conditional ref would be awesome...

I have an idea - maybe trumpoline function that is so simple to always be inlined could help:

auto someFuncTrumpl(ref T arg) if(ValueType!T){
   return someFunc(arg);
}
auto someFuncTrumpl(ref T arg) if(!ValueType!T){ 
  auto ptr= *&arg;
  return someFunc(ptr);  // get simple value
}

The catch is without -inline it will be even slower. Otherwise it seems quite what the doctor ordered.

Need to test assembly output though for as many cases as possible.

@DmitryOlshansky
Copy link
Member

Even better - the above could be generalized to something like:

// something like:
alias put(T) = forwardRef!(doPut, T);

Where forwardRef would alias itself away to raw doPut if T is not a class or pointer , and be a trumpoline template otherwise. Best of both worlds it seems.

@marler8997
Copy link
Contributor Author

Interesting...could you provide the code for forwardRef ? Since it's your idea it will help me understand the details quicker.

@DmitryOlshansky
Copy link
Member

I was wrong on the case that was aliased away and interestingly I've found it reduces template bloat: pointers and references use the same instance(!).

Somewhat raw but a good start:

import std.traits;

private enum bool isPassByValue(T) = isPointer!T || is( T == class ) || 
    is (T == delegate) || is( T == function );


//same doPut but  _stripped_ of `ref` storage class
void doPut(T)(T arg){
    pragma(msg, "Instantiated " ~ T.stringof);
    //...
    static if(isPointer!T)
        *arg = 1;
    else
        *arg = 1;
}

template forwardRef(alias Fn, T)
{
    static if(isPassByValue!T)
    {
        pragma(msg, T.stringof ~" by value");
        alias forwardRef = Fn!T; 
    }
    else
    {
        pragma(msg, T.stringof ~" by ref");
        auto forwardRef(ref T arg)
        {
            //convert to pointer explicitly
            return Fn(&arg);
        }
    }
}

void main()
{
    alias a = forwardRef!(doPut, int);
    alias b = forwardRef!(doPut, int*);
    int k;
    a(k);
    assert(k == 1);
    k = 0;
    b(&k);
    assert(k == 1);
}

UPDATE: fixed main to be more insteresting.

@marler8997
Copy link
Contributor Author

I started out writing the new put function this way. The problem is that I couldn't figure out how to write the put template in a way that the template parameters could be deduced. If the template parameters cannot be deduced then everything that uses an output range will need to explicitly write the template parameter for every put call. That's why I proposed cref. But no one has looked at my proposal. The only way I know how to make a proposal for the language is to post on the forums but alot of times I get very little response. I think the put function demonstrates a deficiency that D has no solution to right now aside from writing the same function twice as I have done here. Is there a better way to make a proposal like cref and actually get people to see it?

@marler8997
Copy link
Contributor Author

I'd like to get some closure on this PR. To summarize, this PR does make the generated put/doPut functions slightly more efficient by removing a level of indirection for "pointer" types, but at the cost of writing the functions source code twice. However, many times these functions get inlined which also removes the indirection anyway. @DmitryOlshansky feels like the duplicated source code is not a good solution but is open to exploring other solutions. I've proposed a new (conditional reference) attribute I made a post here: http://forum.dlang.org/thread/dhqrbvslzzoxmhzisnga@forum.dlang.org

There are likely other solutions as well. If we could solve the template parameter type deduction problem we could use the solution that @DmitryOlshansky suggested. I would like to get some more opinions on this. Maybe we should close this PR and create an issue instead? Not sure how to move forward.

@DmitryOlshansky
Copy link
Member

Well a change to the language is not likely to be accepted for a corner case. Problem of optimal parameter passing is not new, and e.g. C++ guys "solved" in Boost library. It is however rather involved.

If C++ can do it in library we must be able to. I'll take a shot at deduction problem in a day or two.

@marler8997
Copy link
Contributor Author

Sounds good. If we find a solution there are more functions in phobos that could benefit as well.

@DmitryOlshansky
Copy link
Member

It's been a while but haven't had the time to try it out. Stay tuned.

@DmitryOlshansky
Copy link
Member

This is stuck and the general idea of optimal forwarding for classes/pointers should be just converted to bugzilla enhancement. The plus side is handling ref and pointer in one template instance.

Last code:

import std.traits;

private enum bool isPassByValue(T) = isPointer!T || is( T == class ) || 
    is (T == delegate) || is( T == function );


//same doPut but  _stripped_ of `ref` storage class
void doPut(T)(T arg){
    pragma(msg, "Instantiated " ~ T.stringof);
    //...
    static if(isPointer!T)
        *arg = 1; // pointer branch
    else{
        *arg = 1; // delelgate class branch
    }
}

template forwardRef(alias Fn, T)
{
    static if(isPassByValue!T)
    {
        pragma(msg, T.stringof ~" by value");
        alias forwardRef = Fn!T; 
    }
    else
    {
        pragma(msg, T.stringof ~" by ref");
        auto forwardRef(ref T arg)
        {
            //convert to pointer explicitly
            return Fn(&arg);
        }
    }
}

void main()
{
    alias a = forwardRef!(doPut, int);
    alias b = forwardRef!(doPut, int*);
    int k;
    a(k);
    assert(k == 1);
    k = 0;
    b(&k);
    assert(k == 1);
}

@DmitryOlshansky
Copy link
Member

I turned this into an ehancement request:
https://issues.dlang.org/show_bug.cgi?id=14761

With that I'm closing it - any effort on this front should be generic and address this problem w/o duplicating code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants