Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vec/qf - initial valid/borrowed/owned split for data #853

Merged
merged 36 commits into from
Dec 17, 2021

Conversation

jeremylt
Copy link
Member

@jeremylt jeremylt commented Dec 7, 2021

As we talked about on the telecon. I still want to do a pass to simplify the logic, but this runs and passes the tests.

@jeremylt
Copy link
Member Author

jeremylt commented Dec 7, 2021

I think we should make sure this design lines ups with @nbeams's path ahead for CeedVector precision, but I think the owned vs borrowed changes in here are important to get into main as they represent a current [minor] flaw in our interface.

Copy link
Contributor

@nbeams nbeams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jeremylt for working this up so quickly. I think this will be helpful to have in place before adding multiprecision storage to CeedVector.

I assume we also want to use void * instead of CeedScalar * in the CeedQFunctionContext_[Cuda/Hip] backend data structs? And in the parameter for the data variable in CeedQFunctionContextTakeData_Cuda? (The type in the HIP version got changed.)

backends/cuda/ceed-cuda-qfunctioncontext.c Show resolved Hide resolved
backends/cuda/ceed-cuda-qfunctioncontext.c Outdated Show resolved Hide resolved
backends/cuda/ceed-cuda-qfunctioncontext.c Outdated Show resolved Hide resolved
backends/cuda/ceed-cuda-vector.c Outdated Show resolved Hide resolved
backends/cuda/ceed-cuda-vector.c Outdated Show resolved Hide resolved
backends/cuda/ceed-cuda-vector.c Outdated Show resolved Hide resolved
backends/cuda/ceed-cuda-vector.c Outdated Show resolved Hide resolved
backends/cuda/ceed-cuda-vector.c Outdated Show resolved Hide resolved
@jeremylt
Copy link
Member Author

jeremylt commented Dec 9, 2021

I assume we also want to use void * instead of CeedScalar * in the CeedQFunctionContext_[Cuda/Hip] backend data structs? And in the parameter for the data variable in CeedQFunctionContextTakeData_Cuda? (The type in the HIP version got changed.)

Thanks for catching that! Updated.

@jeremylt
Copy link
Member Author

I rearranged and clarified the logic. I think this should be easier to understand/remember and modify in future work.

@jeremylt
Copy link
Member Author

With this latest pass, I pulled the data validity checks up into the interface, simplifying the logic in the backends. I think this should be pretty close to what we need.

@jeremylt jeremylt force-pushed the jeremy/vec-borrow-own branch 2 times, most recently from 8b51f80 to 7b1c546 Compare December 15, 2021 00:06
@jeremylt jeremylt requested a review from nbeams December 15, 2021 00:08
@jeremylt
Copy link
Member Author

Everything passes the test suite. This PR got bigger than I wanted, but I think we should be good for review + squash + merge, modulo anything else that we want to fixup.

@nbeams
Copy link
Contributor

nbeams commented Dec 15, 2021

A quick test tells me we've broken something in the MFEM integration for MFEM's algebraic CEED solver -- from MFEM's ex1 with -d ceed-cuda -a -pa options. That part seems to be fixed by swapping a GetArray for GetArrayWrite, but then I'm getting a segfault from CeedOperatorGetActiveElemRestriction, which is most likely not related to this PR, I'd assume... (I cherry-picked the commits from mfem/mfem#2569 since I know the MFEM integration was already broken.)

At any rate, this may be a breaking change for some people using libCEED, depending on how it's integrated (where they should use GetArrayWrite now instead of GetArray), in case we want to be more explicit about a warning.

@jeremylt
Copy link
Member Author

Since this is a breaking change, it might be a fair point to cut a new release. We haven't done a fall release yet.

Copy link
Contributor

@nbeams nbeams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still have a few minor questions, but I'm going ahead and marking as "approve" for this version of the interface. Thanks for all the work @jeremylt

backends/blocked/ceed-blocked-operator.c Outdated Show resolved Hide resolved
backends/cuda/ceed-cuda-operator.c Outdated Show resolved Hide resolved
python/tests/test-1-vector.py Show resolved Hide resolved
@nbeams
Copy link
Contributor

nbeams commented Dec 15, 2021

Though I guess, do we need to investigate the GitLab failure on Noether?

@nbeams
Copy link
Contributor

nbeams commented Dec 15, 2021

Oh wait, I have another question about GetArrayWrite, prompted by the issue with the MFEM integration. It seems I may have run into a situation where calling GetArrayRead after GetArrayWrite throws the "no valid data to read" error, though GetArrayWrite should leave the vector in a valid state. Hmmm... will need to investigate further to make sure the issue is on the MFEM side rather than libCEED.

@jeremylt
Copy link
Member Author

ICC/IFort failure is Intel's flakey installer, unrelated to this PR

@jeremylt
Copy link
Member Author

It seems I may have run into a situation where calling GetArrayRead after GetArrayWrite throws the "no valid data to read" error, though GetArrayWrite should leave the vector in a valid state. Hmmm... will need to investigate further to make sure the issue is on the MFEM side rather than libCEED.

I think it's on the MFEM side of the integration? t124 tests this specific use case and all backends pass.

@nbeams
Copy link
Contributor

nbeams commented Dec 15, 2021

It seems I may have run into a situation where calling GetArrayRead after GetArrayWrite throws the "no valid data to read" error, though GetArrayWrite should leave the vector in a valid state. Hmmm... will need to investigate further to make sure the issue is on the MFEM side rather than libCEED.

I think it's on the MFEM side of the integration? t124 tests this specific use case and all backends pass.

Yes, after looking into it some more, I do think it's just that we broke the MFEM integration (but it was already broken).

@jeremylt jeremylt force-pushed the jeremy/vec-borrow-own branch 2 times, most recently from 3e8c176 to 9c7553f Compare December 15, 2021 21:18
@jeremylt jeremylt merged commit 9c774ed into main Dec 17, 2021
@jeremylt jeremylt deleted the jeremy/vec-borrow-own branch December 17, 2021 18:04
rezgarshakeri pushed a commit that referenced this pull request Jan 12, 2022
* vec/qf - initial valid/borrowed/owned split for data

* vec/qf - tidy logic for checking active/stale data

* minor - add missing NULL

* doc - explain VectorTakeArray update

* minor - update error messages

* test - update error message in junit/tap

* gpu - fix stray CeedScalar vs void for QFunctionContext

* vec/qf - clarify/simplify access logic

* vec - calloc host arrays when no value set to make empty

* style - minor

* style - minor

* minor - fix error messages

* vec/qf - move data validity checking to backend interface

* gpu - add missing sync error checking for qfcontext

* gpu - homogonize use of impl for backend data to reduce confusion

* vec - clarify access conditions

* python - update test for stricter vector access

* vec - minor fixes

* minor - fix ipython change

* vec - add missing declarations in ceed/backend.h

* ctx - mirror vector borrowed data check in ctx interface

* vec - add CeedVectorGetArrayWrite

* vec - consistent use of CeedVectorGetArray vs CeedVectorGetArrayWrite

* python - small vec fixes

* doc - describe vector data semantics

* magma - update restriction

* gpu - fix restr bug I added, need to sum into target

* magma - fix restriction bug

* cpu - fix restriction bug here too

* op - fix evec allocations

* julia - fix ElemRestriction for new vector access rules

* op - double check GetArray vs Read vs Write usage

* doc - small fix

* restr - clean up read/write logic for restr

* python - add vec.array_write

* magma - typo fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants