Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU: "Malformed PRIM_GET_MEMBER" error for programs with forall over multi-dimensional domain #19981

Closed
stonea opened this issue Jun 13, 2022 · 4 comments
Assignees

Comments

@stonea
Copy link
Contributor

stonea commented Jun 13, 2022

If I set CHPL_LOCALE_MODEL=gpu and compile a program that has any of the following forall loops it will produce this compile-time error:

internal error: Malformed PRIM_GET_MEMBER_* [optimizations/gpuTransforms.cpp:527]
var A: [1..10, 1..10] real;
forall (i,j) in {1..10, 1..10} do
  A(i,j) = i + j;

forall a in A do
  a += 1;

forall (i,j) in A.domain do
  A(i,j) -= 1;
@stonea stonea self-assigned this Jun 13, 2022
@bradcray
Copy link
Member

I think @ShreyasKhandekar and I were seeing a version of this last week (?).

@stonea
Copy link
Contributor Author

stonea commented Jun 13, 2022

I think @ShreyasKhandekar and I were seeing a version of this last week (?).

I wouldn't be surprised. I think this will happen for any program that uses a forall or foreach iterating over a multidimensional domain or array. I was seeing this pop up as a common failure when I tried to compile our primers with CHPL_LOCALE_MODEL=gpu as well.

Anyway, I'd consider this and the linker issues as our "high priority" GPU related bugs.

@stonea
Copy link
Contributor Author

stonea commented Jun 16, 2022

This is erroring out at this INT_FATAL in gputransforms.cpp:

INT_FATAL("Malformed PRIM_GET_MEMBER_*");

This occurs in a pass where we're trying to determine what variables in a loop we are "outlining" need to be passed in as parameters to the outlined function.

Running it through my sample I see that when we hit that assertion parent is:

(1984979 CallExpr .=
  (1984977 SymExpr 'const-val ret_tmp[1764569]:2*int(64)[1084566]')
  (1984980 SymExpr 'val x0[1084562]:int(64)[13]')
  (1984978 SymExpr 'const-val i[1763537]:int(64)[13]'))

And we're not expecting to see a PRIM_SET_MEMBER with a third argument (we're processing the symbol i).

I'm not sure exactly how to read this IR. I'm guessing something like this?:

ret_tmp.x0 = i

In which case we would want to add i to the kernel.

Assuming this is the case I think the fix is to change:

else if (symExpr == parent->get(1)) {
  addKernelArgument(sym);
}

to:

else if (symExpr == parent->get(1) ||
  (parent->numActuals() >= 3 && symExpr == parent->get(3))) {
  addKernelArgument(sym);
}

If I do this the next issue I encounter is a segfault later on with the following Stack trace:

llvm::Value::getType Value.h:246
codegenCallExprInner cg-expr.cpp:2558
codegenCallExprWithArgs cg-expr.cpp:2768
codegenCallExprWithArgs cg-expr.cpp:2788
codegenGPUKernelLaunch cg-expr.cpp:4928
CallExpr::codegenGPU_KERNEL_LAUNCH_FLAT cg-expr.cpp:4932
CallExpr::codegenPrimitive cg-expr.cpp:5859
CallExpr::codegen cg-expr.cpp:3723
BlockStmt::codegen cg-stmt.cpp:184
CondStmt::codegen cg-stmt.cpp:296
BlockStmt::codegen cg-stmt.cpp:184
FnSymbol::codegenDef cg-symbol.cpp:2793
ModuleSymbol::codegenDef cg-symbol.cpp:3292
codegenPartTwo codegen.cpp:2657
codegen codegen.cpp:2726
runPass runpasses.cpp:208
runPasses runpasses.cpp:170
main driver.cpp:1876
__libc_start_main 0x00007fffef23a083
_start 0x0000000000422bae

Looking at codegenCallExprInner cg-expr.cpp:2558 I see that it's trying to get the type for args[2]. Moving up the stack to codegenGPUKernelLaunch what I see is that args[2] should be a pointer to chpl_gpuBinary but for some reason it's not finding it in the lookup table. I can see that gCodegenGPU == true so I think what's happening is we're getting nested kernel launches.

I think this brings up broader questions about how should we handle the following (and indeed I'll get the same segfault and stack trace compiling this program):

var A : [0..10] int;
foreach i = 0..10 {
  foreach j = 0..10 {
    A(i) = i * 100 + j;
  }
}

@stonea
Copy link
Contributor Author

stonea commented Sep 6, 2022

With the submission of this PR: #20561

This is now resolved. We have have the new behavior locked down by this test:

https://github.com/chapel-lang/chapel/blob/main/test/gpu/native/nestedForall.chpl

Note, we don't currently support nested foreach (as described in the aforementioned PR).

@stonea stonea closed this as completed Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants