Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve copy propagation optimization #4906

Merged
merged 1 commit into from
Aug 19, 2015

Conversation

WalterBright
Copy link
Member

Turns out H. S. Teoh was right. The copy propagation optimization was not working as well as it could for data that was larger than one register in size. Formerly, the sizes had to match exactly. Now the new size only has to be a subset of the original to trigger the copy propagation.

Some small informal tests showed significant gains from this.

@deadalnix
Copy link
Contributor

This probably should come with a test attached. This is scary that codegen is not more tested. This is a deterministic process and pretty damn hard to debug.

@WalterBright
Copy link
Member Author

It's already tested by the test suite, as an earlier version of my fix failed the test suite in multiple places. It is commonplace.

There is a more general problem with the optimizer and code gen testing. It often passes unnoticed when an optimization doesn't happen - the tests don't prove negatives, only that the code works. One the reasons I want to switch it to D is to get some unit testing in there.

@deadalnix
Copy link
Contributor

LLVM test this kind of thing but running the output assembly (in textual form) through a checker.

The checker can check that this are present or that thing are not present (as to verify optimizations). That is something worth considering. Anyway, i have no power over here, so what i'm saying is nothing more than a suggestion.

@WalterBright
Copy link
Member Author

Generating a text file and then comparing does sound like a convenient way to test. It's a good idea.

@WalterBright
Copy link
Member Author

For an idea of the effect of this, consider the program:

void foo() {
    __gshared uint[2]  a1 = [0x1001_1100, 0x0220_0012];
    uint[] a = a1;
    foreach (e; a)
        bar(e);
}
void bar(uint);

Before:

_D3foo3fooFZv:
                sub     ESP,010h
                mov     EAX,2
                mov     ECX,offset FLAT:_D3foo3fooFZ2a1G2k
                push    EBX
                xor     EBX,EBX
                mov     0Ch[ESP],EAX
                mov     010h[ESP],ECX
                mov     4[ESP],EAX
                mov     8[ESP],ECX
                cmp     0Ch[ESP],EBX
                je      L39
L26:            mov     EDX,8[ESP]
                mov     EAX,[EBX*4][EDX]
                call    near ptr _D3foo3barFkZv
                inc     EBX
                cmp     EBX,0Ch[ESP]
                jb      L26
L39:            pop     EBX
                add     ESP,010h
                ret

After:

_D3foo3fooFZv:
                push    EAX
                mov     ECX,offset FLAT:_D3foo3fooFZ2a1G2k
                push    EAX
                mov     EAX,2
                push    EBX
                xor     EBX,EBX
                mov     4[ESP],EAX
                mov     8[ESP],ECX
                cmp     4[ESP],EBX
                je      L30
L1D:            mov     EDX,8[ESP]
                mov     EAX,[EBX*4][EDX]
                call    near ptr _D3foo3barFkZv
                inc     EBX
                cmp     EBX,4[ESP]
                jb      L1D
L30:            pop     EBX
                add     ESP,8
                ret

@deadalnix
Copy link
Contributor

You could check that these extra mov are not present. For reference, here is how LLVM is doing it:

https://github.com/llvm-mirror/llvm/blob/master/test/Transforms/SROA/basictest.ll#L594

See that they check that this optimization pass remove loads present in the IR.

@DmitryOlshansky
Copy link
Member

For an idea of the effect of this, consider the program:

Looks like something I've encountered in the past. Great that you caught it.

@quickfur
Copy link
Member

Awesome. Good to know I'm not just spouting nonsense. ;-)

@WalterBright
Copy link
Member Author

Awesome. Good to know I'm not just spouting nonsense. ;-)

This kind of thing is exactly why I created that thread. It's been paying off nicely. Low hanging fruit FTW.

@dnadlinger
Copy link
Member

Meanwhile, using a modern compiler: ;)

__D4test3fooFZv:
    sub esp, 12
    mov eax, dword ptr [_D4test3fooFZ2a1G2k]
    call    _D4test3barFkZv
    mov eax, dword ptr [_D4test3fooFZ2a1G2k+4]
    call    _D4test3barFkZv
    add esp, 12
    ret

SCNR (Yes, I know that you know that I know that loop unrolling is not implemented in DMD at all.)

@WalterBright
Copy link
Member Author

@klickverbot see this one: #4909

@dnadlinger
Copy link
Member

@WalterBright: Looks better. ;) Can't really comment on the implementation, though.

@WalterBright
Copy link
Member Author

yah, the only difference is the loop isn't unrolled

MartinNowak added a commit that referenced this pull request Aug 19, 2015
improve copy propagation optimization
@MartinNowak MartinNowak merged commit f37e246 into dlang:master Aug 19, 2015
@WalterBright WalterBright deleted the betterCopyProp branch August 20, 2015 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants