Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using c_ptrTo makes loop not GPUizable: calls out to extern function 'chpl_rmem_consist_release' #22151

Open
stonea opened this issue Apr 25, 2023 · 7 comments

Comments

@stonea
Copy link
Contributor

stonea commented Apr 25, 2023

I have the following program that calls out to an extern function from within what should be a GPUizable loop:

extern {
  #include <cuda.h>
  #include <cuda_runtime.h>
  #include <cuda_runtime_api.h>

  __device__ static inline void foo(float *buf, int idx) {
    printf("In foo with %f\n", buf[idx]);
  }
  __host__ static inline void foo(float *buf, int idx) {}
}

config param N = 5;

proc main() {
  pragma "codegen for GPU"
  extern proc foo(buf : c_ptr(c_float), idx : c_int);

  on here.gpus[0] {
    var A : [0..<N] real(32);
    
    foreach i in 0..<N {
      assertOnGpu();
      foo(c_ptrTo(A), i : c_int);
    }
  }
}

Unfortunately, when I try to compile this I get the following error:

foo.chpl:47: In function 'main':
foo.chpl:55: error: Loop containing assertOnGpu() is not eligible for execution on a GPU
$CHPL_HOME/modules/internal/MemConsistency.chpl:112: note: function calls out to extern function (chpl_rmem_consist_release), which is not marked as GPU eligible

This seems to occur when using c_ptrTo from within the loop. I can workaround this fairly easily by adding var cPtrToA = c_ptrTo(A); outside the loop and then using cPtrToA in place of c_ptrTo(A) but it would be preferable if we didn't have to do that.

@mppf
Copy link
Member

mppf commented May 1, 2023

I wonder what is causing that? Do you know why c_ptrTo is calling chpl_rmem_consist_release ? That seems surprising to me & might indicate we are doing reference counting on the domain when we should not.

@stonea
Copy link
Contributor Author

stonea commented May 5, 2023

Not that this explains why we're getting a chpl_rmem_consist_release but I can simplify the example down to:

use GPU;

proc main() {
  on here.gpus[0] {
    var A : [0..0] real(32);
    foreach i in 0..0 {
      assertOnGpu();
      A.domain;
    }
  }
}

And I still get the error about calling out to the chpl_rmem_consist_release extern.

@e-kayrakli
Copy link
Contributor

While this is not perfectly satisfying, the issue is how c_ptrTo(x: []) is implemented. It requires doing a bunch of checks and ultimately needs A.domain at the end. @stonea is that how you figured out the reproducer above?

FWIW, c_ptrTo(A[0]) instead of c_ptrTo(A) works fine.

@stonea
Copy link
Contributor Author

stonea commented May 5, 2023

@stonea is that how you figured out the reproducer above?

Yes exactly. I have a suspicion this may be due to the use of on in the deinit function for domain but I'm not sure.

@e-kayrakli
Copy link
Contributor

Yes exactly. I have a suspicion this may be due to the use of on in the deinit function for domain but I'm not sure.

I haven't dug that deep, but that definitely makes sense. A question that's not directly related to this issue in particular is: why do A.domain in cases like that end up creating a new domain value that needs to be destroyed afterwards? Isn't that costly?

@mppf
Copy link
Member

mppf commented May 6, 2023

why do A.domain in cases like that end up creating a new domain value that needs to be destroyed afterwards? Isn't that costly?

_array._dom calls _getDomain which calls _domain.init(_pid: int, _instance, _unowned: bool) with _unowned=true which just copies the pointer into a new record. Then _domain._do_destroy has a condition if ! _unowned, but I take it that the GPU support isn't able to figure out that it is an unowned domain (since that is a runtime condition).

I do not remember why it is that A.domain returns a shallow copy rather than a const ref. I am sure there is something that we were facing at the time but I do not remember what. It might have to do with managing array and domain memory properly or with privatization. Perhaps looking at the git history would tell us more about the history.

@stonea
Copy link
Contributor Author

stonea commented May 31, 2023

I've forked a new issue to follow up on this that's more focused on using Array.domain in an otherwise GPU eligible loop:

#22433

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants