
Add guard for missing additional_store before executing job #76

Closed
ml-evs opened this issue Feb 19, 2024 · 2 comments · Fixed by #59
Comments

@ml-evs
Member

ml-evs commented Feb 19, 2024

As tested in #59, if a job requires an additional store for its output, and that store is not defined in the JFR config, then the job will only fail at the point of trying to access that store.

I think we should add a guard in the runner to prevent such a job from being submitted. I'm not sure what JobState it should be left with (perhaps a new one, HELD or some such, so it can easily be rerun once the runner config is updated?). Thoughts, @gpetretto?
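For context, a minimal sketch of the failure mode being described; the store name ("data") and output field ("trajectory") are illustrative, not taken from #59:

```python
from jobflow import job


# Route the (potentially large) "trajectory" key of the output to an
# additional store named "data". If the jobflow-remote project config does
# not define a "data" store, nothing currently complains at submission time;
# the job only fails remotely when it tries to write to that store.
@job(data="trajectory")
def compute(structure):
    return {"energy": -1.0, "trajectory": [structure] * 100}
```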

@gpetretto
Contributor

Thanks for raising this point. It would indeed be better to stop it before the execution of the Job, also because I am afraid that one could not just rerun the last "remote" step, but would need to rerun the whole Job.

I would consider a few points:

  • I am open to discussing this, but I would not introduce a new state, as there are already quite a lot. I think the error state is fine. I would raise a RemoteError with no_retry=True here (a rough sketch follows this list). This prevents the runner from retrying multiple times.
  • I think the check in Add testing and runtime checks for additional stores #59 is good; we can refine it if needed. I still need to go through the additional store logic again.
  • Maybe the same check could already be performed when the jobs are inserted, directly refusing to add them to the DB? This would inform the user of the issue immediately, instead of waiting for the job to be submitted. The other check needs to stay in place as well, since a Job with a specific additional store may be added dynamically, or the user could change the config file, and it could thus reach submission anyway.
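A rough sketch of the guard suggested in the first bullet above, raising `RemoteError` with `no_retry=True` when a required additional store is missing. The import path, the `RemoteError` constructor signature, the `_kwargs` introspection, and the `additional_stores` attribute are assumptions for illustration, not a description of the actual jobflow-remote/jobflow internals:

```python
# NOTE: the import path below is a guess; use jobflow-remote's actual RemoteError.
from jobflow_remote.remote.errors import RemoteError


def get_additional_store_names(job) -> set[str]:
    """Collect the additional store names a Job was decorated with,
    e.g. {"data"} for @job(data="trajectory").

    Assumption: the decorator kwargs are kept on ``Job._kwargs``; the real
    check from #59 may extract this information differently.
    """
    return set(getattr(job, "_kwargs", {}) or {})


def check_additional_stores(job, jobstore) -> None:
    """Fail fast if the Job references additional stores that the configured
    JobStore does not provide."""
    requested = get_additional_store_names(job)
    # Assumption: JobStore exposes its extra stores as a name -> Store mapping.
    available = set(jobstore.additional_stores or {})
    missing = requested - available
    if missing:
        raise RemoteError(
            f"Job {job.uuid} requires additional store(s) {sorted(missing)} "
            "that are not defined in the project configuration.",
            no_retry=True,  # a configuration problem; retrying will not help
        )
```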

@ml-evs
Member Author

ml-evs commented Feb 20, 2024

* I am open to discussing this, but I would not introduce a new state, as there are already quite a lot. I think the error state is fine. I would raise a `RemoteError` with `no_retry=True` here. This prevents the runner from retrying multiple times.

I agree a new state is only worth introducing if we then act on it in a specific way; the remote error state would have the correct behaviour (in that the job can be rerun after a config update without issue), but it feels like odd naming for a problem with the "local" config (or the runner config, at least). I will implement it with `RemoteError` for now (and thanks for the `no_retry` tip).

* Maybe the same check could already be performed when the jobs are inserted, directly refusing to add them to the DB? This would inform the user of the issue immediately, instead of waiting for the job to be submitted. The other check needs to stay in place as well, since a Job with a specific additional store may be added dynamically, or the user could change the config file, and it could thus reach submission anyway.

Yes, I thought about this; it would need to be run immediately after the call to `job.to_db_dict()` (as far as I can recall). If I can extract a simple function from the existing check, I will try calling it in both places.
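A sketch of reusing that same helper at insertion time; the surrounding function, its arguments, and the way documents are written here are purely illustrative, with only `job.to_db_dict()` taken from the comment above:

```python
def insert_flow(flow, jobstore, queue_store):
    """Illustrative insertion path: run the same guard before writing each
    Job document, refusing the whole submission on a misconfiguration."""
    docs = []
    for job, _parents in flow.iterflow():
        # Same check as in the runner, surfaced at insertion time so the user
        # learns about the missing store immediately.
        check_additional_stores(job, jobstore)
        docs.append(job.to_db_dict())
    queue_store.update(docs, key="uuid")
```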
