Store job process and queue id #86

Closed
benfitzpatrick opened this issue Aug 13, 2012 · 8 comments

@benfitzpatrick
Contributor

It would be useful to store a job's process and/or queue id in order to query the status of a running job or kill it.

@matthewrmshin
Contributor

With this information in place, it should be possible for Cylc to query the status of a task in a queueing system, either manually or automatically. If a running task is not found in the queueing system where it should be, it should be marked as failed.
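
A rough sketch of that check, assuming the stored job IDs are available as a simple mapping. The function name `find_lost_tasks`, the state strings, and the data layout are all hypothetical and not part of Cylc; how the set of queued IDs is obtained from a given queueing system is deliberately left out.

```python
def find_lost_tasks(recorded_jobs, queued_ids):
    """Return names of tasks believed running whose job ID is not in the queue.

    recorded_jobs: dict mapping task name -> (state, job_id)  (hypothetical layout)
    queued_ids: set of job IDs currently reported by the queueing system
    """
    lost = []
    for task, (state, job_id) in sorted(recorded_jobs.items()):
        # Only tasks the suite still thinks are running are candidates.
        if state == "running" and job_id not in queued_ids:
            lost.append(task)
    return lost


recorded = {
    "task_a": ("running", "1234567"),
    "task_b": ("running", "1234568"),
    "task_c": ("succeeded", "1234560"),
}
lost = find_lost_tasks(recorded, {"1234567"})
print(lost)
```

Here `task_b` would be the candidate for being marked as failed, since the suite believes it is running but its ID is no longer reported by the queue.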

@cylc
Collaborator

cylc commented Aug 16, 2012

Yes, this would be useful. It would handle nasty external hard-kills that we can currently only handle with timeouts.

@dpmatthews
Contributor

In addition to the queue id and process id we also need to record what system the task was submitted to (noting that the host can be a command so you can't just rely on the suite config) - this info will need to be available after a restart.
We should also record any task messages and the task exit code. Upon restart or task submission/execution timeout (or on demand), cylc can then query all submitted or running tasks to update their status.
See also issue #115.

@hjoliver
Member

hjoliver commented Dec 4, 2012

I thought I'd already noted these points:

  • the kind of ID stored, and how to query it, will presumably need to be associated with the job submission method
  • detaching tasks (internal job submissions), such as pre-Rose UM tasks, will not fit into this framework.

@matthewrmshin
Contributor

The following ideas were sent to @hjoliver via email originally:

Introduce the task information/event log. E.g. $HOME/cylc-run/$SUITE/log/job/$TASK_LOG_ROOT.log.

For each task job, create an event log file to store common information. E.g.:

  • Submit-time=(ISO8601 date-time)
  • Submit-method=(llsubmit, qsub, etc.)
  • Submit-method-id=(Job ID returned by llsubmit, qsub, etc.)
  • Run-init-time=(ISO8601 date-time)
  • (Other runtime messages.)
  • Run-exit-time=(ISO8601 date-time)
  • Run-status=(pass, fail, return code?)

It will also have markers to indicate whether a message has been sent to the suite or not.

The file has these uses:

  • cylc restart can use its information to know what state a task is in.
  • If a task times out, the suite can inspect the file to see what state it is in.
  • cylc task message commands will write their event messages to the file.
    On a successful call to notify the suite of the event,
    it will insert a marker at the end of the file.
    A subsequent cylc task message command will also write its message to the file.
    It will then look for the marker in the file to determine what message(s) to send to the suite.
    If, for example, the previous task message command has failed to contact the suite,
    the current command will send both the previous message and the current message to the suite.

E.g.:

submit-time=20121225T010203
submit-method=llsubmit
submit-method-id=1234567
>
run-init-time=20121225T020406
>
message-1=merry christmas
run-exit-time=20121225T030609
run-status=0

A line containing only > means that the events before it have been sent to the suite. In this case, message-1 and the events after it were not sent to the suite.
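
As a rough illustration of the marker logic, here is a minimal Python sketch. The function name `split_events` is hypothetical and not part of Cylc; it assumes one key=value event per line and a bare > line as the marker, as in the example above.

```python
def split_events(log_text):
    """Split event-log lines into (sent, pending) lists of (key, value) pairs.

    Events before the last '>' marker line are treated as already sent to
    the suite; events after it are still pending.
    """
    sent, pending = [], []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == ">":
            # Marker: every event seen so far has been delivered.
            sent.extend(pending)
            pending = []
            continue
        key, _, value = line.partition("=")
        pending.append((key, value))
    return sent, pending


EXAMPLE = """\
submit-time=20121225T010203
submit-method=llsubmit
submit-method-id=1234567
>
run-init-time=20121225T020406
>
message-1=merry christmas
run-exit-time=20121225T030609
run-status=0
"""

sent, pending = split_events(EXAMPLE)
print([key for key, _ in pending])
```

A task message command following this scheme would send everything in `pending` to the suite and, on success, append a new > line to the file.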

@matthewrmshin
Contributor

#282: Job process ID and submit method ID (at, background, loadleveler) are now stored.

@hjoliver
Member

hjoliver commented Feb 8, 2013

@matthewrmshin - can this issue be closed now? I can open new smaller issues for extending this functionality to other job submission methods.

@matthewrmshin
Contributor

@hjoliver If you are happy for this to close, please feel free.


4 participants