Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CeedQFunctionContext synchronisation logic #183

Closed
YohannDudouit opened this issue Dec 17, 2018 · 11 comments · Fixed by #596
Closed

CeedQFunctionContext synchronisation logic #183

YohannDudouit opened this issue Dec 17, 2018 · 11 comments · Fixed by #596

Comments

@YohannDudouit
Copy link
Contributor

I have issues with the Fortran QFunction/Operator tests because a context magically appears in the QFunction. I think that the CeedQFunctionSetContext function should call a backend function that sets/updates the ctx, otherwise it's difficult to avoid unnecessary systematic memCpy for a GPU backend. This drives us to the question of knowing if we would want to have the same memory access logic for the ctx as for the Vectors, with GetCtx(), GetCtxRead(), RestoreCtx() and RestoreCtxRead().
Besides, I feel it's wrong that the Fortran interface is using the ctx as it's meant to be used by the backend (If I understood correctly). If a backend wants to use the context for some reason, it is currently impossible as the Fortran interface is already using it.
In my case the tests fail because a context has been created by the Fortran interface after the creation of the QFunction, I imagine I could memCpy the data on the GPU, but this data is actually useless for my backend, which makes me really think that there is a design flaw here.

@jeremylt
Copy link
Member

jeremylt commented Dec 21, 2018

There is no Fortran test that uses context for anything other than holding the Fortran function pointer, so it shouldn't be causing any problems for you as your .cu shouldn't use any context data. At worst it induces an unused data copy.

This will cause a problem in the future when we have a Fortran test that uses context in the QFunction, but I have put together a fix in #185.

To clarify, the ctx field is not for backend use, it is for user data that gets used at all quadrature points, such as parameters. A backend should not modify the ctx except inside of the user QFunction.

@YohannDudouit
Copy link
Contributor Author

My concern is exactly the unused data copies. (The bug has been corrected)

@jeremylt
Copy link
Member

My fix returns a ctxsize of 0 when only Fortran ctx is set and should prevent those copies.

@YohannDudouit
Copy link
Contributor Author

Copying static ctx every time is also a concern I think.

@jeremylt
Copy link
Member

Does the CUDA backend do a copy when ctxsize = 0? OCCA only copies when ctxsize > 0 so I am preventing that unneeded copying in that backend with my fix.

@YohannDudouit
Copy link
Contributor Author

YohannDudouit commented Dec 21, 2018

Having ctxsize > 0 does not imply that the context is dynamic. (ceed/ex1 for instance)

@jeremylt
Copy link
Member

jeremylt commented Dec 21, 2018

I think you are melding this issue with #86. They are separate issues.

Issue #86: Static vs Dynamic ctx
Issue #183: Fortran ctx bug

Or was this issue meant to be a redo of issue #86?

@jeremylt
Copy link
Member

Hey, @jedbrown and @YohannDudouit, what do you think about adding a state counter to ctx data, like we have for vectors? If a user function changes the ctx, CeedQFunctionSetContext should be set again, to be safe, as the context may be set on the device or host. This wouldn't make a big difference for CPU only code, but I think this would address our concern with extra data copies of static data to the GPU. I figure this should be a small change, we would just need to decide on the symantics for the interface.

@jedbrown
Copy link
Member

jedbrown commented Sep 16, 2019 via email

@jeremylt
Copy link
Member

I like that idea - I could work on it after MC9.

@jeremylt
Copy link
Member

I plan on picking this up on Monday (provided that #579 doesn't need any more major work).

My plan is to make a new ctx object. It will include

  • Interface
    • state counter with only single access allowed at at time
    • set method
    • get method (with memtype)
  • Backends
    • host/device pointers as needed
    • sync status (as with vector)

It should be sufficient to have a single read/write access interface for the data, vs separate write and read only access.

This should make a minor performance improvement. (not included in the BPs, but will be in other examples)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants