fatal: hook Fail Dir:node_fail #1872
Comments
The error message you got is most likely the result of changing the Makeflow and rerunning it without cleaning up the previous run. After changing the Makeflow, you may have gotten a message that the makeflow was corrupted, referring to the makeflowlog. If you then deleted the makeflowlog, the fail directories from the earlier run were left in place: Makeflow is conservative about files it did not create, so once the makeflowlog that recorded creating these files is deleted, Makeflow treats them as pre-existing and will not remove them. Unfortunately, this will happen for each existing fail dir whenever the same node fails on a later run. There may be other reasons why it persisted, and I will look into those as well.
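To illustrate the conservative behaviour described above, here is a minimal sketch in Python. It is not Makeflow's actual code: the function name `removable_fail_dirs` and the directory names used with it are hypothetical, and the "log" is modelled as a plain list of paths. It only shows the idea that a directory is eligible for cleanup when the log records it, and is otherwise treated as pre-existing.

```python
from pathlib import Path


def removable_fail_dirs(workdir, logged_paths):
    """Return the names of directories in workdir that the log says we created.

    Hypothetical model of the behaviour described above: a directory that is
    not recorded in the log is assumed to pre-date the run and is left alone.
    """
    logged = {Path(p).name for p in logged_paths}
    return sorted(
        entry.name
        for entry in Path(workdir).iterdir()
        if entry.is_dir() and entry.name in logged
    )
```

With the log intact, a fail directory created by an earlier run is listed as removable. With the log deleted (an empty `logged_paths`), the same directory is no longer listed, which matches the situation where stale fail directories persist across runs.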
Thanks, that is indeed what happened. Just a further question: when I get the error "corrupted makeflowlog", maybe because I changed the Makeflow file, how can I preserve the already executed commands? I have 5000+ jobs, and starting again each time is a risk I cannot take. Bw.
This does not directly answer your question, but is there a reason you need all 5000+ jobs in a single Makeflow? Are they structured as a set of pipelines or as an intricate tree of tasks?
My pipeline is just 3 independent benchmarks, each with many combinations of parameters, whose results all go to one script that computes descriptive statistics.
@nhazekam Can this be closed?
@stemangiola Please let us know if this is still an issue for you.
Hello,
I got this error and cannot understand the cause.
2018/02/22 20:03:13.66 makeflow[15674] fatal: hook Fail Dir:node_fail returned 1
received signal 15 (Terminated), cleaning up remote jobs and files...
Killed
Any suggestions are appreciated.
Here is the debug file:
makeflow.zip