Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

static-foreach uses huge stack for no reason #18976

Open
dlangBugzillaToGithub opened this issue Apr 16, 2015 · 3 comments
Open

static-foreach uses huge stack for no reason #18976

dlangBugzillaToGithub opened this issue Apr 16, 2015 · 3 comments

Comments

@dlangBugzillaToGithub
Copy link

Tomer Filiba (weka) reported this on 2015-04-16T10:22:02Z

Transferred from https://issues.dlang.org/show_bug.cgi?id=14451

CC List

  • Ali Cehreli
  • Tomer Filiba (weka)
  • yebblies

Description

Using static foreach (on a compile-time known range) basically expands the code block so many times. That's excellent, except that variables declared inside the foreach-body are not scoped properly, which causes needless, huge stack allocation.

Consider the following snippet:

================================================
import std.traits;

struct S {ubyte a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z;}

string[] staticForeach1() {
    string[] arr;

    foreach(string name; __traits(allMembers, S)) {
        ulong[100] x;
        arr ~= name;
    }
    return arr;
}

string[] staticForeach2() {
    string[] arr;

    foreach(string name; __traits(allMembers, S)) {
        (){
            ulong[100] x;
            arr ~= name;
        }();
    }
    return arr;
}
================================================

The first version compiles to 

Dump of assembler code for function staticForeach1:
   0x0000000000426170 <+0>:     push   %rbp
   0x0000000000426171 <+1>:     mov    %rsp,%rbp
   0x0000000000426174 <+4>:     sub    $0x52f8,%rsp   <<<<<
   0x000000000042617b <+11>:    push   %rbx
   0x000000000042617c <+12>:    movq   $0x0,-0x52f0(%rbp)
   0x0000000000426187 <+23>:    movq   $0x0,-0x52e8(%rbp)

requiring over 20K of stack space. The second version compiles to

Dump of assembler code for function staticForeach2:
   0x0000000000426c48 <+0>:     push   %rbp
   0x0000000000426c49 <+1>:     mov    %rsp,%rbp
   0x0000000000426c4c <+4>:     sub    $0x10,%rsp    <<<<<
   0x0000000000426c50 <+8>:     movq   $0x0,-0x10(%rbp)
   0x0000000000426c58 <+16>:    movq   $0x0,-0x8(%rbp)
   0x0000000000426c60 <+24>:    mov    %rbp,%rdi
   0x0000000000426c63 <+27>:    callq  0x426d40 <staticForeach2FZ9__lambda1MFNbNfZv>
   0x0000000000426c68 <+32>:    mov    %rbp,%rdi
   0x0000000000426c6b <+35>:    callq  0x426db8 <staticForeach2FZ9__lambda2MFNbNfZv>
   0x0000000000426c70 <+40>:    mov    %rbp,%rdi

requiring only 16 bytes. I tried adding more scoping braces inside the foreach, e.g.,

    foreach(string name; __traits(allMembers, S)) {
        {
            ulong[100] x;
            arr ~= name;
        }
    }

but anything other than actually invoking a separate function does not work. Since I'm using fibers with small stacks, this is a real issue for me. I can usually find workarounds (as described here), but there's no reason why so much stack be wasted here.
@dlangBugzillaToGithub
Copy link
Author

yebblies commented on 2015-04-16T16:01:44Z

Scoped properly or not, each variable is going to take up stack space.  Space for all variables is always allocated at the start of the function, inner scopes just control when destructors are run.  I can't really see this changing.

This could be fixed by allowing the backend to re-use stack space for variables whose lifetimes do not overlap.

@dlangBugzillaToGithub
Copy link
Author

tomer commented on 2015-04-16T16:29:15Z

you are correct:

===================
void foo() {
    {
        ulong[32] x;
        writeln(x.ptr);
    }
    {
        ulong[32] x;
        writeln(x.ptr);
    }
    {
        ulong[32] x;
        writeln(x.ptr);
    }
    {
        ulong[32] x;
        writeln(x.ptr);
    }
    {
        ulong[32] x;
        writeln(x.ptr);
    }
}
===================

yields

7FFFB9A1E8A0
7FFFB9A1E9A0
7FFFB9A1EAA0
7FFFB9A1EBA0
7FFFB9A1ECA0

it seems a pity to waste so much stack... having the backend reuse dead parts of the stack is crucial for embedded applications or heavily-parallel ones that rely to thousands of small stacks.

of course it could be re-written

====================
void foo() {
    (){
        ulong[32] x;
        writeln(x.ptr);
    }();
    (){
        ulong[32] x;
        writeln(x.ptr);
    }();
    (){
        ulong[32] x;
        writeln(x.ptr);
    }();
    (){
        ulong[32] x;
        writeln(x.ptr);
    }();
    (){
        ulong[32] x;
        writeln(x.ptr);
    }();
}
===================

which then works just fine:

7FFF4101C4A0
7FFF4101C4A0
7FFF4101C4A0
7FFF4101C4A0
7FFF4101C4A0

couldn't inner-scopes be rewritten like that, only making sure the lambda is fully-scoped so that external variables will be heap-allocated

@dlangBugzillaToGithub
Copy link
Author

yebblies commented on 2015-04-17T01:01:48Z

(In reply to Tomer Filiba from comment #2)
> couldn't inner-scopes be rewritten like that, only making sure the lambda is
> fully-scoped so that external variables will be heap-allocated

That is automatic outlining, and while possible, is something we're unlikely to see any time soon in dmd.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant