
Torque qsub equivalent to SGE qsub -sync or LSF bsub -K #268

Open
gbeane opened this issue Oct 2, 2014 · 11 comments

gbeane commented Oct 2, 2014

Both SGE and LSF have job-submission options that make the submit command wait until the job has completed before returning. Some third-party pipeline applications make use of this feature, which makes porting them to Torque more difficult: it requires a wrapper script that runs qsub and then polls the job with qstat in a loop until the job finishes.
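
For reference, here is a minimal sketch of the kind of wrapper described above, written as a POSIX shell function. It assumes that Torque's `qsub` prints the job ID on stdout, that `qstat -f` reports `job_state` and, for completed jobs, `exit_status` as `name = value` lines, and that the server's `keep_completed` is long enough for the finished job to still be visible. The function name `qsub_sync` and the `QSUB_SYNC_POLL` variable are hypothetical, not Torque features.

```shell
# qsub_sync: hypothetical wrapper emulating SGE "qsub -sync y" / LSF "bsub -K"
# on Torque by submitting normally and polling "qstat -f" until the job completes.
# Assumes keep_completed is set so the finished job remains visible to qstat.
qsub_sync() {
  local poll jobid out state code
  poll=${QSUB_SYNC_POLL:-30}  # seconds between polls (hypothetical setting)

  # Submit as usual; all arguments are passed straight through to qsub.
  jobid=$(qsub "$@") || return 1
  echo "job $jobid waiting to start" >&2

  while :; do
    # If qstat no longer knows the job, its exit status cannot be recovered.
    if ! out=$(qstat -f "$jobid" 2>/dev/null); then
      echo "job $jobid is no longer known to qstat (is keep_completed set?)" >&2
      return 1
    fi

    # "qstat -f" prints attributes as indented "name = value" lines.
    state=$(printf '%s\n' "$out" | awk -F' = ' '$1 ~ /^[ \t]*job_state$/ {print $2}')
    if [ "$state" = "C" ]; then
      code=$(printf '%s\n' "$out" | awk -F' = ' '$1 ~ /^[ \t]*exit_status$/ {print $2}')
      case $code in
        ''|*[!0-9]*) return 1 ;;  # missing, or negative (Torque-internal failure)
      esac
      # Values above 255 (e.g. Torque's codes for signal-killed jobs) cannot be
      # a shell exit status, so this sketch collapses them to 1.
      if [ "$code" -le 255 ]; then
        return "$code"
      fi
      return 1
    fi
    sleep "$poll"
  done
}
```

Source the file and call `qsub_sync` with the same arguments you would give qsub (for example `qsub_sync -l walltime=1:00:00 job.sh; echo $?`). A generous poll interval keeps the load on pbs_server low, at the cost of noticing completion a little later.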

adeslatt commented Apr 7, 2016

Isn't this possible using the qsub -W depend feature on PBS/Torque?
qsub -Wdepend=afterok:$newjobid testing.sh

knielson (contributor) commented Apr 7, 2016

That should work. Make sure keep_completed is set in qmgr. Also, $newjobid must already be queued.
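
A sketch of the two prerequisites mentioned above, for the dependency approach. The 300-second `keep_completed` value and the script name `first.sh` are placeholders; `keep_completed` is a Torque server attribute measured in seconds, and changing it with qmgr requires manager privileges. Note that this only orders the two jobs; it does not make the submitting qsub block.

```shell
# Keep completed jobs visible to qstat for 300 seconds (placeholder value;
# requires Torque manager privileges).
qmgr -c "set server keep_completed = 300"

# Submit the first job and capture the job ID that qsub prints.
newjobid=$(qsub first.sh)

# testing.sh is held until $newjobid exits with status 0.
qsub -W depend=afterok:"$newjobid" testing.sh
```
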

gbeane (author) commented Apr 28, 2016

No, job dependencies would not work for this. I'm talking about having the qsub command block until the job you are submitting completes: instead of returning immediately with a job ID, qsub would block until the job has run, and qsub's exit status would be the job's exit status. Both SGE and LSF have this feature, and some workflow tools make use of it. Switching to dependencies would require altering the workflow tool. (My own workflow tool uses dependencies, but some tools rely on this blocking submission behavior.)

knielson (contributor) commented:

Glen,

So, make a qsub that queues and runs the job in one step, as far as the user is concerned. Right?

Ken

Ken Nielson Sr. Software Engineer
+1 801.717.3700 office +1 801.717.3738 fax
1712 S. East Bay Blvd, Suite 300 Provo, UT 84606
www.adaptivecomputing.com

gbeane (author) commented Apr 28, 2016

Yes. qsub -sync y (SGE) or bsub -K (LSF) will block until the job finishes, and their exit status will be the same as the job's.

Here is the LSF documentation for bsub -K:

-K

Submits a job and waits for the job to complete. Sends the message "Waiting for dispatch" to the terminal when you submit the job. Sends the message "Job is finished" to the terminal when the job is done. If LSB_SUBK_SHOW_EXEC_HOST is enabled in lsf.conf, also sends the message "Starting on execution_host" when the job starts running on the execution host.

You are not able to submit another job until the job is completed. This is useful when completion of the job is required to proceed, such as a job script. If the job needs to be rerun due to transient failures, bsub returns after the job finishes successfully. bsub exits with the same exit code as the job so that job scripts can take appropriate actions based on the exit codes. bsub exits with value 126 if the job was terminated while pending.

You cannot use the -K option with the -I, -Ip, or -Is options.

knielson (contributor) commented Jun 8, 2016

We could make this work pretty easily. I will bring it up with the Torque team.

knielson (contributor) commented Jun 8, 2016

Using this functionality with Torque would mean you would bypass any scheduling benefits. It would run on a first fit basis.

pipitone commented Jun 9, 2016

Just to point out that currently you can get this to work with PBS using this hack:

qsub -W depend=after:$jobid -I -x true
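
A sketch of that hack wrapped as a reusable function, under stated assumptions. In Torque, an `after:jobid` dependency is satisfied as soon as the job starts running, while `afterany:jobid` waits until it terminates with any exit status, so this sketch uses `afterany` to wait for completion. Unlike SGE `-sync y` or LSF `bsub -K`, it returns the status of the placeholder interactive job rather than the real job's exit status. The function name `wait_via_interactive` is hypothetical.

```shell
# wait_via_interactive: hypothetical wrapper around the dependency hack above.
# Submits the real job, then blocks in a trivial interactive job that Torque
# will not start until the real job has terminated (afterany).
wait_via_interactive() {
  local jobid
  jobid=$(qsub "$@") || return 1
  # afterany: released when $jobid terminates with any exit status;
  # "after" would release it as soon as $jobid starts running.
  qsub -W depend=afterany:"$jobid" -I -x true
  # The function's status is the interactive job's, not the real job's.
}
```

To recover the real job's exit status you would still need something like a `qstat -f` check afterwards, which is why a native blocking option is being requested here.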

gbeane (author) commented Jun 9, 2016

> Using this functionality with Torque would mean you would bypass any scheduling benefits. It would run on a first fit basis.

No, Ken.

The job should be scheduled the same as any other job. That is, Moab (if you use it) would apply the same prioritization and policies to this job as if it had been submitted without this qsub option. Please read the SGE qsub -sync and LSF bsub -K documentation to see the behavior we are suggesting. If it short-circuited scheduling and just took the next available slot, that would not be the correct implementation.

The only difference is the qsub command blocks waiting for the job to be scheduled and run. Then once the job is finished, qsub will exit with the return value of the job (so if the job script exited with a non-zero value, then qsub will return that same value). It could print a message, similar to qsub -I, saying that "job XXXX waiting to start".

knielson (contributor) commented Jun 9, 2016

Glen,

Thanks for the clarification. That requires more work. It's still doable, but we won't be able to do it like a Moab backfill job.

Ken


gdevenyi commented:

Wondering how adding this feature is going? My tool for abstracting different cluster systems has a blocking-submission feature stuck on this: CoBrALab/qbatch#103
