Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recovery kernel loop #122

Merged
merged 28 commits into from
Dec 2, 2016
Merged

Recovery kernel loop #122

merged 28 commits into from
Dec 2, 2016

Conversation

mlange05
Copy link

This new feature enables user-level customisation of the error handling procedures in the main kernel loop through JIT and SciPy mode. The core idea is that kernels can now throw different types of errors, eg. ErrorCode.OutOfBounds and that the user can specify custom kernels to override the default behaviour to create model-specific recovery behaviour. The recovery kernels themselves are always run in Python and can be injected via the recovery={Error.MyErrorCode: MyRecoveryKernel} argument.

Caveats:

  • Minor: OutOfBounds errors do not propagate the exact sampling location that created the error as SciPy mode does, but the location of the particle at the time. Adding that would require dynamically allocating memory for meta-information, which is left for another PR.
  • Major: Kernel errors do not trigger rollback! That entails that a kernel that updates particle information before throwing an error results in inconsistent particle data, since time is not incremented the kernel will be applied again once error recovery succeeded. This can be prevented by performing all field sampling before updating particle data, as most built-in kernels do.

Michael Lange added 25 commits November 10, 2016 11:56
All kernel loops are adaptive theses days.
The whole thing is now moved into it's own sub-module `kernels.error`.
Also slightly re-writes the special-casing in the Python kernel loop and
enables the timed failure test for JIT.
This now handles explicit out-of-bounds errors thrown by the field
sampling routines and sets the according error code. In the recovery
part we then propagate the caught exceptions to the user via the default
recovery kernels.
For this we separate the low-level execution routines for JIT and Python,
check if we have failing kernels after the first execution and repeatedly
apply recovery kernels before attempting the main timestepping loop again
in a while loop. Note, that this can cause infinite looping!
We also now explicitly test the size of the final ParticleSet and that
the right error has been thrown during the execution loop.
This is necessary to propagate the error to the Python recovery loop.
@erikvansebille
Copy link
Member

Code looks great @mlange05, glad you got it all to work after what looks like Herculean coding!

I'm really glad we now have this error framework sorted, and I like the syntax of recovery={ErrorCode.Error: CustomKernel}, it is very intuitive and still quite powerful.

A few general comments/caveats that we may (or may not) want to think about right now, or defer for later versions

  1. It is very easy for the code to get into an infinite loop, now. For example, if the recovery function for ErrorOutOfBounds results in the particle still being out of bounds, then the recovery will never exit.
  2. Is there a way to tell a particle to simply be deleted when reaching the boundary? I tried in test_execution_recover_out_of_bounds, but def MoveLeft(particle): particle.delete() made the code hang (see point above). Having a way to delete a particle when it reaches a boundary is important for non-global domains.
  3. For periodic boundary conditions (global domains), it will be important to have separate ErrorOutOfBounds for the different sides of the domain. Thinking about periodic boundary conditions, we want particles which exit on the right (east) to enter on the left (west). But particles that exit on the left have to enter on the right. In fact, ideally we want to simply use the modulo (%) operator to sample the fields, if the domain is with periodic boundary conditions?
  4. We will also need to provide an 'OnLand' error, possibly via a grid.landmask field that the user can set. I'm happy to have a go at that, probably in a new branch once this Pull Request lands.

@mlange05
Copy link
Author

mlange05 commented Dec 1, 2016

Ok, thanks for the review. Regarding the comments:

  1. That is a general problem and I'm not sure there is an easy answer for that. We can probably add some kind of fail-safe mechanism at some point, but I'd like to wait and see how often people get stuck on this before we spend time building this.
  2. That was a real bug, and I believe is now fixed in the latest commit.
  3. Yes, and I think will be covered in a follow-on PR. My intuition is that is requires the additional meta-data mentioned under caveats in the PR description, since we can determine the direction from the coordinate of the OutOfBoundsError, if that is provided.
  4. Agreed. But like the above a follow-on feature that should have it's own PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants