CUDA Ico Backend Refactoring (Persistence, Encapsulation) #1038

Open
mroethlin opened this issue Oct 13, 2020 · 3 comments

Comments

@mroethlin
Contributor

The encapsulation in the generated code for CUDA Ico is poor. The outer class does not properly wrap the inner class, so the driver has to access the generated name of the stencil directly. This is cumbersome while a stencil is evolving, since the generated id changes with the stencil structure. For this issue: generate code closer to the unstructured naive interface, with proper forwarding to the inner stencil classes.
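To make the requested forwarding concrete, here is a minimal sketch in plain C++ (no CUDA). The names `stencil_42` and `diffusion` are hypothetical and only illustrate the shape; they are not what the code generator emits. The point is that the driver only names the outer class, while the inner class can keep an id that changes with the stencil structure.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generated inner stencil. The numeric suffix
// mimics an id that changes whenever the stencil structure changes.
class stencil_42 {
  std::vector<double> out_;

public:
  explicit stencil_42(std::size_t numCells) : out_(numCells, 0.0) {}
  void run() {
    for (std::size_t i = 0; i < out_.size(); ++i) {
      out_[i] = 2.0 * static_cast<double>(i);
    }
  }
  const std::vector<double>& out() const { return out_; }
};

// Outer class with a stable name (assumed here to be the stencil name).
// Every call is forwarded, so the driver never mentions stencil_42.
class diffusion {
  stencil_42 inner_;

public:
  explicit diffusion(std::size_t numCells) : inner_(numCells) {}
  void run() { inner_.run(); }
  const std::vector<double>& out() const { return inner_.out(); }
};
```

With this shape, a driver would write `diffusion d(numCells); d.run();` and stay unchanged when the inner id is regenerated.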

mroethlin added a commit that referenced this issue Oct 16, 2020
## Technical Description

This PR enables globals for the unstructured backends. Furthermore, an unreported issue where globals were not propagated from the wrapper class to the stencil class was fixed. Additionally, a bug in the unstructured cuda codegen was fixed when translating stencils that only use dense dimensions.

### Resolves / Enhances

Fixes #1030
Fixes #1028

### Notes

The methods to set and get globals in the cuda backend are on the inner stencils. This will be addressed in [this issue](#1038). Also, a method to communicate globals from FORTRAN will need to be devised (not addressed yet). 
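As an illustration of what moving the globals accessors to the wrapper could look like, the following plain C++ sketch forwards a setter and getter from an outer class to the inner stencil. The `globals` struct, the `dt` member, and the class names `stencil_7` and `advection` are assumptions made for this example, not generated code.

```cpp
// Hypothetical globals struct; the member `dt` is an assumption.
struct globals {
  double dt = 0.0;
};

// Hypothetical inner stencil that owns the globals, as described above.
class stencil_7 {
  globals globals_;

public:
  void set_dt(double dt) { globals_.dt = dt; }
  double get_dt() const { return globals_.dt; }
  // Uses the global so that a missing propagation would be observable.
  double scale(double x) const { return globals_.dt * x; }
};

// Outer wrapper exposing the same accessors by forwarding to the inner stencil.
class advection {
  stencil_7 inner_;

public:
  void set_dt(double dt) { inner_.set_dt(dt); }
  double get_dt() const { return inner_.get_dt(); }
  double scale(double x) const { return inner_.scale(x); }
};
```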

### Testing

New tests in dawn4py and a new unstructured integration test to test the correct operation of the `CXXNaiveIco` backend. `CudaIco` backend tested manually. 

### Dependencies

This PR is independent.
@mroethlin
Contributor Author

This is less pressing now that we offer various run_STENCILNAME wrappers, which let the driver run the stencil without holding a stencil object itself. However, one problem with this approach is that the stencil object is not persistent across calls to these run functions. This may impose severe performance penalties when the stencil needs to allocate fields in its constructor.
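The persistence problem can be reproduced with a small plain C++ sketch. The class `laplace_stencil`, the wrapper `run_laplace`, and the counter `allocationCount` are hypothetical names chosen for illustration; the counter only exists to make the repeated allocation visible.

```cpp
#include <cstddef>

// Counts how often the temporary field is allocated (illustration only).
static int allocationCount = 0;

// Hypothetical stencil that allocates a temporary field in its constructor.
class laplace_stencil {
  double* tmp_;
  std::size_t size_;

public:
  explicit laplace_stencil(std::size_t size)
      : tmp_(new double[size]), size_(size) {
    ++allocationCount;
  }
  ~laplace_stencil() { delete[] tmp_; }
  laplace_stencil(const laplace_stencil&) = delete;
  laplace_stencil& operator=(const laplace_stencil&) = delete;

  void run() {
    for (std::size_t i = 0; i < size_; ++i) {
      tmp_[i] = static_cast<double>(i);
    }
  }
};

// Shaped like a run_STENCILNAME wrapper: the stencil lives on the stack,
// so every call pays for one allocation and one deallocation.
void run_laplace(std::size_t size) {
  laplace_stencil s(size);
  s.run();
}
```

Calling `run_laplace` three times allocates three times, whereas a driver that keeps one `laplace_stencil` alive allocates once regardless of how often it calls `run()`.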

@mroethlin
Contributor Author

This may require more discussion after all. In our current scope, each stencil instantiation may only hold a single stencil. From a conceptual standpoint, wrapping the stencil is thus not necessary.

@mroethlin changed the title from "Proper Encapsulation for CUDA Ico Backend" to "CUDA Ico Backend Refactoring (Persistence, Encapsulation)" on Dec 7, 2020
@mroethlin
Contributor Author

Further discussions revealed that we may want to introduce distinct setup and run functions. This would enable us to do clean timings further down the road, and ensure proper separation of concerns (the first call to run would otherwise need to allocate temp fields). It would be preferable if both the setup and run functions return void, so that we don't need to return an opaque c_ptr to FORTRAN, which would then have to be managed on the FORTRAN end. A rough sketch of what this might look like:

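#include <cstddef>

class stencil {
  // Static storage outlives any single stencil object, so run() can be
  // called on a temporary without reallocating the field.
  static double *tmpField;
  static std::size_t size_;

public:
  stencil() {}
  // Allocates the temporary field once; call before the first run().
  static void setup(std::size_t size) {
    size_ = size;
    tmpField = new double[size];
  }
  void run() {
    for (std::size_t i = 0; i < size_; i++) {
      tmpField[i] = static_cast<double>(i);
    }
  }
};
// Static data members need out-of-class definitions, otherwise linking fails.
double *stencil::tmpField = nullptr;
std::size_t stencil::size_ = 0;

// Free functions returning void, so no object handle crosses the API.
void setup(std::size_t size) { stencil::setup(size); }
void run() { stencil().run(); }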
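The "clean timings" argument can be illustrated as follows. This is a sketch under assumptions: it repeats the class from the sketch above in compact form, adds a hypothetical `teardown` function so the example does not leak memory, and adds a hypothetical accessor `at` for inspection. The helper `timeRunSeconds` is likewise invented for this example. Only the calls to `run()` sit between the two clock readings, so allocation cost is excluded from the measurement.

```cpp
#include <chrono>
#include <cstddef>

// Compact copy of the stencil sketched above; teardown() and at() are
// hypothetical additions for this example.
class stencil {
  static double* tmpField;
  static std::size_t size_;

public:
  static void setup(std::size_t size) {
    size_ = size;
    tmpField = new double[size];
  }
  static void teardown() {
    delete[] tmpField;
    tmpField = nullptr;
    size_ = 0;
  }
  void run() {
    for (std::size_t i = 0; i < size_; ++i) {
      tmpField[i] = static_cast<double>(i);
    }
  }
  static double at(std::size_t i) { return tmpField[i]; }
};
double* stencil::tmpField = nullptr;
std::size_t stencil::size_ = 0;

// Returns the wall-clock seconds spent in `repetitions` calls to run().
// setup() and teardown() happen outside the timed region.
double timeRunSeconds(std::size_t size, int repetitions) {
  stencil::setup(size);
  const auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < repetitions; ++r) {
    stencil().run();
  }
  const auto stop = std::chrono::steady_clock::now();
  stencil::teardown();
  return std::chrono::duration<double>(stop - start).count();
}
```

If allocation happened lazily in the first `run()`, the first repetition would be systematically slower and the measurement would mix two concerns.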

mroethlin added a commit that referenced this issue Jan 18, 2021
…1089)

## Technical Description

Currently, temporary fields are allocated in the constructor of the generated class. Since the API functions to the FORTRAN and cpp drivers hold the stencil on the stack, this leads to memory (de-)allocation on each call. This is fine for debugging, but not for production runs. Thus, this PR keeps that behavior for the convenience wrappers starting from host memory, but introduces static `setup` and `free` functions which have to be called by the host when using the production interface which assumes device pointers. 

Additionally, since the APIs are touched either way, globals can now be communicated from FORTRAN to the CUDA backend. 

Furthermore, this PR contains a small refactoring and removes the now superfluous template parameter from the generated stencil class.
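The split between the two interfaces described above might look roughly like this. It is a plain C++ sketch, not the generated code: host memory stands in for device memory, and the names `copy_stencil` and `run_copy_stencil_from_host` as well as the arithmetic in `run` are assumptions. Only the existence of static `setup` and `free` functions is taken from the description.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical generated class; plain host memory stands in for device memory.
class copy_stencil {
  static double* tmp_;
  static std::size_t size_;

public:
  // Production interface: the host calls setup() once before the first run.
  static void setup(std::size_t size) {
    size_ = size;
    tmp_ = new double[size];
  }
  // ...and free() once after the last run.
  static void free() {
    delete[] tmp_;
    tmp_ = nullptr;
    size_ = 0;
  }
  // Works on caller-owned pointers and performs no allocation.
  static void run(const double* in, double* out) {
    for (std::size_t i = 0; i < size_; ++i) {
      tmp_[i] = 2.0 * in[i];
      out[i] = tmp_[i] + 1.0;
    }
  }
};
double* copy_stencil::tmp_ = nullptr;
std::size_t copy_stencil::size_ = 0;

// Convenience wrapper starting from host containers: it keeps the old
// behavior of allocating and releasing the temporary on every call.
void run_copy_stencil_from_host(const std::vector<double>& in,
                                std::vector<double>& out) {
  out.resize(in.size());
  copy_stencil::setup(in.size());
  copy_stencil::run(in.data(), out.data());
  copy_stencil::free();
}
```

A production driver would call `copy_stencil::setup(n)` once, then `copy_stencil::run(...)` as often as needed, then `copy_stencil::free()`.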

### Resolves / Enhances

Addresses part of #1038 
Fixes #1042

### Testing

Since this affects the CUDA-ico backend, it is tested by `icondusk-e2e`.