Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

%affine.For breaks without scalerize pass #244

Closed
y-exp opened this issue Jul 13, 2023 · 2 comments · Fixed by #245
Closed

%affine.For breaks without scalerize pass #244

y-exp opened this issue Jul 13, 2023 · 2 comments · Fixed by #245
Labels
bug Something isn't working

Comments

@y-exp
Copy link
Collaborator

y-exp commented Jul 13, 2023

Example: bug.thorin

The Thorin function foo computes the sequence 0, 1, ..., 99 as a pack «100; %core.I32» via a simple %affine.For counting loop. Initially, the pack ‹100; 0:%core.I32› is passed to this loop and updated as an accumulator acc between iterations. Inside the function bar, said pack is finally stored at some ptr: %mem.Ptr0.

.plugin core;
.plugin affine;

.let Arr = «100; %core.I32»;

.fun foo []: Arr =
    .con body [i: %core.I32, acc: Arr, yield: .Cn Arr] =
        .let `acc = .ins (acc, %core.conv.u 100 i, i);
        yield acc;

    .let loop = %affine.For (%core.i32, 1, Arr);
    loop (0:%core.I32, 100:%core.I32, 1:%core.I32,100; 0:%core.I32, body, return);

.fun .extern bar [mem: %mem.M, ptr: %mem.Ptr0 Arr]: %mem.M =
    .ret arr  = foo $ ();
    .let `mem = %mem.store (mem, ptr, arr);
    return mem;

Example: bug.c

In addition to that, the following C program calls bar with a pointer to an array and prints its contents to stdout.

#include <stdio.h>
#include <stdint.h>

extern void bar(uint32_t* arr);

int main() {
    uint32_t arr[100] = {0};
    bar(arr);

    for (size_t i = 0; i < 100; i++)
        printf("%i ", arr[i]);
    printf("\n");

    return 0;
}

Commands

thorin bug.thorin --output-ll bug.ll
clang -o test bug.c bug.ll -Wno-override-module
./test

However, this only produces the correct output when the scalerize pass is run which flattens the pack inside the loop. In case it is skipped (e.g. %compile.scalerize_threshold is exceeded), the output contains garbage instead.

Looking at the generated LLVM IR code, it seems acc is not updated correctly when jumping between BBs that form the loop.

@y-exp y-exp added the bug Something isn't working label Jul 13, 2023
@leissa
Copy link
Member

leissa commented Jul 18, 2023

Problem is that the extractvalues are placed in a wrong block, thereby violating a basic SSA property. I'm really astonished that clang silently accepts this garbage and removes it entirely. Maybe a debug build has more sanity checks.

@leissa
Copy link
Member

leissa commented Jul 18, 2023

So the problem seems to be that %affine.lower_for_pass creates a couple of things, that the LLVM backend doesn't like. I think the best solution is to rewrite the pass to create a more straightforward loop. I'm on it. Could also speed up a couple of things if you use lots of nested %affine.Fors.

leissa added a commit that referenced this issue Jul 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants