PB-Assembly #12

Closed
yingzhang121 opened this issue Sep 14, 2018 · 1 comment

@yingzhang121

Operating system
Linux, CentOS7

Package name
falcon-kit 1.2.3
pypeflow 2.1.0

Describe the issue
I am running the falcon job on our Moab PBS system, so the -W block=T option is not available to me. Given this post: adaptivecomputing/torque#268, I don’t think Moab will implement a blocking option in the near future.

So instead of using “pwatcher_type=blocking”, I used “pwatcher_type=fs_based”.

This option turned out to be unreliable: it worked for some tasks but failed for others.
When it fails, the error message is (always) something like:
[INFO]CALL:
qdel Pedec92ba939bd0
qdel: illegally formed job identifier: Pedec92ba939bd0

I suspect this means falcon tries to kill an already-killed job after it detects the "run.sh.done" file. Normally, simply re-submitting the same job script resumes the whole pipeline. But I am wondering whether this issue could be fixed so that people don't need to re-submit the same job over and over.
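
For readers unfamiliar with the mechanism mentioned above: a file-system-based watcher decides that a task has finished by looking for a sentinel file such as run.sh.done. The following is a minimal sketch of that idea only, not pypeFLOW's actual implementation; the function name `wait_for_sentinel` and its parameters are hypothetical.

```python
import os
import time


def wait_for_sentinel(path, timeout_s=60.0, poll_s=1.0):
    """Poll until `path` exists.

    Returns True if the file appears before the timeout, False otherwise.
    Hypothetical helper for illustration; not part of pypeFLOW.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

A caller would pass the task's sentinel path, for example `wait_for_sentinel("run.sh.done", timeout_s=3600)`, and treat a False result as "not finished yet" rather than as a failure.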

Then I tried a combination of “-I -x”, hoping to get an equivalent of -W block=T (see this post: https://stackoverflow.com/questions/5982857/making-qsub-block-until-job-is-done). This time the job keeps running; however, even after the PBS job is “done” (status C in the queue), no “run.sh.done” file is generated in the designated directory. I am not sure whether this is the default behavior or whether I hit another bug.

Error message
When using "pwatcher_type=fs_based":
[INFO]CALL:
qdel Pedec92ba939bd0
qdel: illegally formed job identifier: Pedec92ba939bd0

@pb-cdunn

> Then I tried a combination of “-I -x”, hoping to get an equivalent of -W block=T (see this post: https://stackoverflow.com/questions/5982857/making-qsub-block-until-job-is-done). This time the job keeps running; however, even after the PBS job is “done” (status C in the queue), no “run.sh.done” file is generated in the designated directory. I am not sure whether this is the default behavior or whether I hit another bug.

If you really have a blocking qsub call, you will definitely get run.sh.done upon successful completion.
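
Where the scheduler offers no blocking flag, one way to approximate a blocking qsub call is a small wrapper that submits the script and then polls qstat until the job leaves the queue. The sketch below is an illustration under assumptions, not something taken from pypeFLOW: the names `run_cmd`, `job_is_active` and `blocking_qsub` are hypothetical, and it assumes that `qsub` prints the job identifier on stdout and that `qstat -f <job_id>` prints a `job_state = X` line, with `C` meaning completed. Check both against your own Torque/Moab installation.

```python
import subprocess
import time


def run_cmd(args):
    """Run a command and return (exit_code, stdout_text)."""
    proc = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def job_is_active(job_id, run=run_cmd):
    """Return True while the scheduler still reports the job as unfinished."""
    code, out = run(["qstat", "-f", job_id])
    if code != 0:
        # Assumption: a non-zero exit means the job is no longer known.
        return False
    for line in out.splitlines():
        if line.strip().startswith("job_state"):
            state = line.split("=", 1)[1].strip()
            return state != "C"
    # No state line found: assume the job is gone rather than loop forever.
    return False


def blocking_qsub(script, run=run_cmd, poll_s=30.0, sleep=time.sleep):
    """Submit `script` with qsub and return its job id once it has finished."""
    code, out = run(["qsub", script])
    if code != 0:
        raise RuntimeError("qsub failed with exit code %d" % code)
    job_id = out.strip()
    while job_is_active(job_id, run):
        sleep(poll_s)
    return job_id
```

This wrapper only waits; it does not report whether the job itself succeeded, so the caller would still need to check the job's own outputs.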

The fs_based process-watcher is very difficult to maintain. Please file an issue at

And try to create a simpler example, so we can focus solely on pypeFLOW.
