-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exponential backoff implementation #105
Conversation
JTSIV1
commented
Nov 27, 2023
- Should prevent short server blips from crashing job/core
Hey @JTSIV1 , great job so far. We discussed the issue being that the job doesn't emit the job end event. From looking over the code, I think maybe what we need to do is in the portion of the code that says "we waited too long give up" (this if: https://github.com/cybergis/cybergis-compute-core/pull/105/files#diff-5483e642678bd804cbce6f7ada5dbc64abb41ec9df389d9986471455ba9ac1fcR147) is to emit the JOB_FAILED event ourselves with a message like "Failed to connect to HPC". Not sure which of these two syntaxes used elsewhere in the supervisor should be used: (1) cybergis-compute-core/src/Supervisor.ts Line 191 in 3247c1c
If this works you should also be able to remove the Something interesting to check with the current code: does your current "job fail" show a job fail event in the database? I'm guessing not because it seems that the Emitter is the one updating that in the DB. So one way to check if this change is working is to pull up the database and go to the events table to look for these JOB_FAILED/JOB_ENDED events. |
} | ||
try { | ||
console.log("jdebug ok"); | ||
await sleep(wait * 1000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
await sleep
will block the execution of the job as they are executed sequentially. A more appropriate method might be to wrap the createMaintianerWorker
at line 113 this.createMaintainerWorker(job);
within a Promise, use setTimeout
as the timer, and interrupt the execution using reject
:
new Promise((resolve, reject) => {
console.log("Function starts");
setTimeout(() => {
console.log("setTimeout callback runs");
reject('Interrupted by setTimeout');
}, 1000);
// Simulate some work
for (let i = 0; i < 5; i++) {
console.log(`Processing ${i}`);
}
resolve('Completed successfully');
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you can use this to emit a failure event:
self.emitter.registerEvents(
job,
"JOB_REGISTERED",
`job [${job.id}] is registered with the supervisor, waiting for initialization`
);
Abandoning this effort in favor of: #108 |