Feature Request: 'wait' subcommand #117

bheisler · 2020-10-13T14:55:54Z

Is your feature request related to a problem? Please describe.
When writing shell scripts that submit a number of pueue jobs (either to the main queue or to a specific group) it would be nice if pueue had some way to wait until all of the jobs are finished. Instead the shell script terminates and the user must periodically check the status to find out the progress.

Describe the solution you'd like
pueue should provide a subcommand which blocks until all jobs in the main queue or a specified group are complete. Ideally, it should also display some basic progress information, such as how many jobs are complete and how many are remaining.

Describe alternatives you've considered
I suppose one could run pueue follow on the last job, but that gives no information about the overall progress and I'm not even sure if follow works before the last job starts running.


Nukesor · 2020-10-13T16:15:12Z

Hmm. I see how this can be handy, when writing scripts.

I'm curious though: why do you use pueue in combination with scripts? Do you use it as a "load balancer" to limit the number of tasks running in parallel?

If so, you could try to simply add another task to the queue, which should run after the previously added commands.
Dependencies can be added via the --after flag when using pueue add. You can specify multiple tasks at once (e.g. all the task IDs you just added).

I just want to check, if your use-case can be covered by already existing options :)
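A minimal sketch of how a script might build such a follow-up task is shown below. The helper name `build_followup_command` is hypothetical, and passing several IDs to `--after` followed by a `--` separator is an assumption about how the CLI parses its arguments, so check `pueue add --help` for your version before relying on it.

```python
import shlex


def build_followup_command(task_ids, command):
    """Build the argv for a pueue task that depends on the given task IDs."""
    # `pueue add --after` is the flag named in the comment above. Passing
    # several IDs at once and the `--` separator are assumptions about the CLI.
    argv = ["pueue", "add", "--after"]
    argv += [str(task_id) for task_id in task_ids]
    argv += ["--", *command]
    return argv


# Hypothetical IDs of the tasks the script submitted earlier.
argv = build_followup_command([0, 1, 2], ["echo", "batch finished"])
print(shlex.join(argv))
```

The resulting list could be handed to `subprocess.run(argv, check=True)`. As noted later in this thread, a dependent task only runs if the tasks it depends on finish successfully.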

Nukesor · 2020-10-13T16:24:29Z

Another possibility would be to use pueue status -j and parse the json output yourself.

Waiting for specific tasks to finish in a script is not an operation which pueue's CLI has been designed for :D
And I'm not sure if it's something that belongs in the command-line interface.

IMO, CLIs should be primarily designed for human usage. pueue wait will probably only be used for notifications and scripting. And notifications can be done via the callback option.

However, that's what the -j flag is for. An easily accessible interface to get data out of pueue.
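As a rough illustration of the JSON approach, the sketch below counts finished and remaining tasks from the text that `pueue status -j` prints. The JSON layout (a top-level `tasks` object keyed by task ID, each entry carrying a `status` string) and the status name `Done` are assumptions; the real output may differ between pueue versions, so inspect it before using this.

```python
import json

# Statuses treated as "finished". The name "Done" is an assumption.
FINISHED = {"Done"}


def progress(status_json):
    """Return (finished, remaining) task counts from `pueue status -j` text."""
    state = json.loads(status_json)
    # Assumed layout: {"tasks": {"<id>": {"status": "<name>", ...}, ...}}
    tasks = state["tasks"].values()
    finished = sum(1 for task in tasks if task["status"] in FINISHED)
    return finished, len(tasks) - finished


# Hand-written sample in the assumed layout, not real pueue output.
sample = '{"tasks": {"0": {"status": "Done"}, "1": {"status": "Running"}}}'
print(progress(sample))
```

A script could obtain the text with `subprocess.run(["pueue", "status", "-j"], capture_output=True, text=True).stdout` and call `progress` in a loop with a short `time.sleep` until the remaining count reaches zero.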

Nukesor · 2020-11-30T16:38:46Z

@bheisler Hey :)

Just in case the previous comments appeared a bit harsh: I wanted to discuss the issue and state my thoughts without ignoring any of your arguments. I'm still open to suggestions and curious whether your use-case can be covered by the existing tools.

Should I close the issue or do you still think this should be added?

bheisler · 2020-11-30T17:07:37Z

Well, my basic use case here is that I have a script that runs a bunch of experimental runs of our software (this is at work) on various datasets, with various options, etc. My goal is to gather information about each run to analyze later to make decisions about how my changes affected the results. Does this change to the algorithm give more or less accurate results, does it run faster or slower, does it generalize well to other data sets, that sort of thing. This is entirely a batch process - submit a bunch of tasks and wait for them to all be done. Sometimes it takes days for this whole process to be completed.

Each of these runs uses some number of CPUs in our computing cluster for an irregular amount of time. I use Pueue to control the parallelism of these test runs, so that my testing doesn't use too many CPUs and inconvenience my coworkers who are trying to use the same cluster. I also appreciate how it keeps the logs and tracks whether each task ran successfully - before I started using Pueue, I would often waste a lot of time repeating the whole batch process because I had no easy way to re-run only the tasks that failed.

But the abstraction leaks a bit because Pueue's model - submitting tasks to a queue to be run asynchronously - doesn't match that of a normal shell, where the shell is blocked until the task is completed. Shells can submit asynchronous background tasks and use the wait command to block until they're all completed, but this mechanism gives no concurrency control and doesn't track the success or failure or log output of each task. The abstraction is still leaky, especially in the presence of Unix Signals, but it mostly works. Pueue has no equivalent to this command.

It's not critically important, just a conspicuous gap in the user interface. I don't think I'm trying to do any sort of complex scripting or automated interop with Pueue, just submit a batch of tasks to run in parallel and wait until they're all done. I certainly could rig up something for myself that would do the same job, but the fact that I have to do that for something this basic seems like something that Pueue might want to fix. If that's hard to do, though, maybe it's not worth the effort.

Nukesor · 2020-11-30T18:31:30Z

Ah ok.

Just to clarify: you want to execute something after all tasks in the current queue are finished, regardless of the result/exit_code of those tasks?

That explains why dependencies won't work, since they require all previous tasks to finish successfully. I didn't think about this requirement yet!

Getting the JSON output and checking the current status of the current queue would still work, but I can see how this is pretty inconvenient for such a simple use-case. A dedicated wait command is probably not a bad idea for simple scripting purposes.

There're a few things that need to be clarified, though.

Pueue allows you to create groups, which are basically multiple concurrently running queues.
What do you think about these defaults/options for handling the various queues:

pueue wait: waits until the default queue is done
pueue wait -g $GROUP: waits for this specific group
pueue wait -a: waits for all groups + default queue

Further questions:

How should the wait command behave if the daemon gets paused? There's the pause_on_failure option, which prevents any new tasks from being spawned if any task fails. This could result in the daemon being paused with wait running and no one noticing it... Should we care about this?
What about tasks with other statuses (Paused, Stashed or Locked)? Should wait only trigger if every task in the queue is in some state of Done? Or should some states be actively ignored?
This could otherwise result in wait blocking indefinitely, until somebody changes the state manually.
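To make the proposal above concrete, here is a small sketch of the selection rules and of the "everything must be Done" condition. It is only an illustration of the semantics under discussion, not the eventual implementation; the task shape (dicts with `id`, `group` and `status`), the use of `None` for the default queue, and the group name `gpu` are all hypothetical.

```python
def tasks_to_wait_for(tasks, group=None, all_groups=False):
    """Pick the tasks a hypothetical `wait` would watch.

    group=None stands for the default queue, all_groups=True mirrors `-a`.
    """
    if all_groups:
        return list(tasks)
    return [task for task in tasks if task.get("group") == group]


def is_finished(tasks, group=None, all_groups=False):
    """True once every selected task has the (assumed) status "Done"."""
    selected = tasks_to_wait_for(tasks, group, all_groups)
    return all(task["status"] == "Done" for task in selected)


tasks = [
    {"id": 0, "group": None, "status": "Done"},
    {"id": 1, "group": "gpu", "status": "Done"},
    {"id": 2, "group": "gpu", "status": "Stashed"},
]
print(is_finished(tasks))                   # default queue only
print(is_finished(tasks, group="gpu"))      # one specific group
print(is_finished(tasks, all_groups=True))  # all groups + default queue
```

With the strict rule used here, the Stashed task keeps the `gpu` group and the all-groups case unfinished until someone changes its state, which is exactly the indefinite-blocking concern raised above.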

bheisler · 2020-12-01T17:36:57Z

Those options make sense to me for handling different queues.

With regard to the paused state, I think it would be mostly fine to just block until the jobs are manually unpaused and completed.

I could see someone mistakenly wasting time waiting for jobs that aren't running through. I don't think it's required, but the wait command could print a message when one of the jobs it's waiting for changes state. If it printed "Job 123 has changed state to PAUSED" that would at least alert the user that something happened and they should investigate further.

Nukesor · 2020-12-01T18:12:49Z

I like the idea of logging the current progress.
We could even go so far as to log all changes.
E.g.

...
Task 2 changed to "running"
Task 1 finished with exit code 128
Task 2 has been paused
...
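One way such a change log could be produced is by comparing two status snapshots taken a moment apart. The sketch below assumes each snapshot is a plain mapping from task ID to a status string; the function name `describe_changes` and the lowercase status names are hypothetical, and the message wording simply follows the first example line above. Exit codes are left out for brevity.

```python
def describe_changes(previous, current):
    """Compare two {task_id: status} snapshots and return log lines."""
    lines = []
    for task_id in sorted(current):
        old = previous.get(task_id)
        new = current[task_id]
        # Report tasks that are new or whose status differs from last time.
        if old != new:
            lines.append(f'Task {task_id} changed to "{new}"')
    return lines


before = {1: "running", 2: "queued"}
after = {1: "done", 2: "running"}
for line in describe_changes(before, after):
    print(line)
```

A wait loop would keep the last snapshot, fetch a new one on each poll, print whatever `describe_changes` returns, and then replace the stored snapshot.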

Nukesor · 2020-12-02T21:55:28Z

I drafted a solution for this feature in the linked PR :)

Feel free to check it out. I would be interested in your opinion.

Nukesor · 2020-12-09T11:42:36Z

Ping @bheisler
I would like to release a new version soonish and would love to see this included :)
Since you requested this feature, it would be nice to know if the PR lives up to your expectations.

bheisler · 2020-12-09T16:38:58Z

Yeah, that seems to do what I need it to.

bheisler · 2020-12-09T23:35:10Z

Thanks for the help!

Nukesor · 2020-12-09T23:37:49Z

You're welcome 🙂

bheisler added the t: Feature A new feature that needs implementation label Oct 13, 2020

Nukesor added the t: Discussion label Oct 19, 2020

Nukesor mentioned this issue Dec 2, 2020

'wait' subcommand #145

Merged

Nukesor removed the t: Discussion label Dec 4, 2020

Nukesor closed this as completed in #145 Dec 9, 2020
