Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some issues with the GPU support to enable simple Arkouda builds #20239

Merged
merged 11 commits into from
Jul 28, 2022

Conversation

e-kayrakli
Copy link
Contributor

@e-kayrakli e-kayrakli commented Jul 18, 2022

This branch makes bunch of small-ish changes in an attempt to get Arkouda to
compile with the GPU locale model. I can separate this into multiple PRs, but I
want to test it some more as a whole. Fixes with this PR:

  • Resolves Clang error compiling included C++ headers with GPU locale model #19754
    • Stops adding extern "C" in the C headers included at the command line.
      This was proposed by @milthorhpe.
    • Fixes an interop test broken by this.
  • To be able to do that makes runtime's generated code interface for complex
    numbers C-based even if we're compiling with C++. (A prior PR that made some
    improvements in that direction is
    Adjust runtime headers to better handle C++ compilation #16847)
  • This enables defining c_string_to_complex* functions to be included in the
    generated executable, because their return types are no longer
    std::complex<> which you can't link with C linkage.
  • Adds a no gpu codegen pragma to thwart gpuization within some internal
    functions. Currently this only applies to:
    • chpl__initCopy_shapeHelp: Has PRIM_ASSIGN that gets normalized after
      gpuization, but not resolved. We probably need this function, and the lack
      might be preventing us from using loop expressions to initialize arrays on
      gpus.
  • Fixes .name implementation of the GPULocale type.
    • Adds a relevant test.
  • Adds support for BitOps module.
    • Adds a relevant test.
  • While there, cleans up gpuTransforms.cpp a bit.

Test:

  • gpu/native
  • gpu/interop
  • make check + spot checks in types/complex with intel as the host+target compiler
  • make check + spot checks in types/complex with cray as the host+target compiler

@e-kayrakli
Copy link
Contributor Author

@stonea -- I did the refactoring we discussed online and some more. This commit has them (with a new function this PR adds). Let me know if you think I went too far and want me to split that commit into its own PR.

@mppf -- This commit has the core of the complex-related change in the runtime. I'll do some portability testing there. Do you have any specific concerns as to the configs where what I am doing can be problematic? In your #16847, you just did a full local testing, it looks like.

@mppf
Copy link
Member

mppf commented Jul 18, 2022

@e-kayrakli - I don't have any concerns or specific test configurations to bring up.

@e-kayrakli
Copy link
Contributor Author

@milthorpe -- This still needs some work, but we'll merge it soon. I'm wondering if you can give it a try to see if you can get a local, no-module Arkouda server built with it?

compiler/optimizations/gpuTransforms.cpp Outdated Show resolved Hide resolved
@stonea
Copy link
Contributor

stonea commented Jul 20, 2022

@stonea -- I did the refactoring we discussed online and some more. This commit has them (with a new function this PR adds). Let me know if you think I went too far and want me to split that commit into its own PR.

I don't think we need to be sticklers for separating refactoring changes out into a separate PR. Having it as a separate commit is nice for software archeology and reviewing purposes so I appreciate that.

@milthorpe
Copy link
Contributor

Today I re-tried compiling Arkouda and got this error, which I believe is a known problem:

In file included from <built-in>:3:
In file included from /tmp/chpl-milthorpe.deleteme-jUa4RL/command-line-includes.h:7:
In file included from /noback/milthorpe/chapel-1.27.0/modules/packages/ZMQHelper/zmq_helper.h:26:
In file included from /noback/milthorpe/chapel-1.27.0/runtime/include/qio/qio.h:26:
In file included from /noback/milthorpe/chapel-1.27.0/runtime/include/qio/qbuffer.h:39:
In file included from /noback/milthorpe/chapel-1.27.0/runtime/include/qio/deque.h:44:
In file included from /noback/milthorpe/chapel-1.27.0/runtime/include/chpl-mem.h:30:
In file included from /noback/milthorpe/chapel-1.27.0/runtime/include/arg.h:24:
In file included from /noback/milthorpe/chapel-1.27.0/runtime/include/chpltypes.h:39:
In file included from /noback/milthorpe/chapel-1.27.0/third-party/llvm/install/linux64-x86_64/lib/clang/14.0.0/include/cuda_wrappers/complex:33:
/auto/software/swtree/ubuntu20.04/x86_64/gcc/11.3.0/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../include/c++/11.3.0/type_traits:44:3: error: templates must have C++ linkage
  template<typename... _Elements>
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/chpl-milthorpe.deleteme-jUa4RL/command-line-includes.h:2:1: note: extern "C" language linkage specification begins here
extern "C" {
^

@e-kayrakli
Copy link
Contributor Author

@milthorpe did you get that with this PR applied on main? It is not really a must that you do it for this PR to proceed, but if you are getting this error with this PR applied, that's surprising.

@milthorpe
Copy link
Contributor

@milthorpe did you get that with this PR applied on main? It is not really a must that you do it for this PR to proceed, but if you are getting this error with this PR applied, that's surprising.

Oops, I didn't correctly apply the PR. With the PR, I get:

$CHPL_HOME/modules/internal/DefaultRectangular.chpl:1494: error: GPU support does not currently allow nested kernel launches. Do you have
nested forall/foreach loops or looping over a multidimensional domain?

@e-kayrakli
Copy link
Contributor Author

e-kayrakli commented Jul 22, 2022

Ah, OK. That's good enough for this PR, I think/hope.

FWIW, the error you are getting is also a bit surprising but probably not from the GPU support perspective. Do you have a module in your ServerModules.cfg that supports multidimensional arrays in Arkouda? (which I know is something @bmcdonald3 has recently worked on). In any case, this PR causes issues with bunch of other Arkouda modules, too. But you should be able to compile a local, no-module Arkouda server.

@bmcdonald3
Copy link
Member

Had a chance to look into this a bit and I confirmed that the error you are running into is from the HDF5MultiDim module:

$CHPL_HOME/modules/internal/DefaultRectangular.chpl:1494: error: GPU support does not currently allow nested kernel launches. Do you have
nested forall/foreach loops or looping over a multidimensional domain?

So that can be removed by commenting out/deleting that module from the ServerModules.cfg file in the Arkouda repo.

I was able to get Arkouda to compile with GPU support using this patch with all the modules commented out from the build, but it does seem that some of the Arkouda modules still don't build with GPU support unfortunately. There is some documentation about the modular building of Arkouda located at https://github.com/Bears-R-Us/arkouda/blob/master/MODULAR.md, but I am happy to offer more assistance there if it would help.

@e-kayrakli
Copy link
Contributor Author

Thanks for checking @bmcdonald3!

but it does seem that some of the Arkouda modules still don't build with GPU support unfortunately

That's right. I looked at only a few of those modules. It was relatively easy to find the "offending" foralls. And I believe trying to get all the modules to compile is a good stress test for our GPU support down the road.

Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
- Add a new function to check for the new "no gpu codegen" pragma
- Store the parent function of the loop in a private field
- Refactor the recursive function into a non-recursive wrapper and a
  recursive, static helper
- Remove `allowFnCalls` from the GpuizableLoop interface

Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
@e-kayrakli e-kayrakli marked this pull request as ready for review July 27, 2022 23:31
Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
Signed-off-by: Engin Kayraklioglu <e-kayrakli@users.noreply.github.com>
@e-kayrakli e-kayrakli merged commit 7e6992d into chapel-lang:main Jul 28, 2022
@e-kayrakli e-kayrakli deleted the gpu-zmq2 branch July 28, 2022 01:00
e-kayrakli added a commit that referenced this pull request Jul 28, 2022
Add the explicit include to runtime's bitops header

This was an obvious missing piece from #20239.

[Trivial, not reviewed]

Test:
- [x] standard
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clang error compiling included C++ headers with GPU locale model
5 participants