In Genie, the Janitor Thread runs periodically to check for jobs whose update time has not changed within the configured interval. At the same time, when a new job is submitted, Genie first records it in the db in INIT state and then, after launching the job, runs an update query that changes its status to RUNNING.
These two threads can end up in a deadlock, which leaves the job in INIT state in the db even though the process was launched successfully.
We should retry this update call to see if that fixes the issue.
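A minimal sketch of what such a retry could look like, not the actual Genie change. The names `RetryOnDeadlock`, `DeadlockException`, and `withRetries` are hypothetical; real code would catch whatever exception the persistence layer raises when the database picks the update as the deadlock victim, and the attempt count and backoff are assumptions.

```java
import java.util.function.Supplier;

public class RetryOnDeadlock {

    /** Hypothetical stand-in for the exception the persistence layer throws on a deadlock. */
    public static class DeadlockException extends RuntimeException {
        public DeadlockException(String message) {
            super(message);
        }
    }

    /**
     * Runs the given update and retries it when it fails with a deadlock.
     * Waits a little longer before each new attempt (linear backoff).
     * Rethrows the last deadlock exception once all attempts are used up.
     */
    public static <T> T withRetries(Supplier<T> update, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        DeadlockException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return update.get();
            } catch (DeadlockException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis * attempt);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while retrying update", ie);
                    }
                }
            }
        }
        throw last;
    }
}
```

Only the status update is wrapped, so the job itself is launched once and a retry never relaunches it. Other exceptions are not retried and propagate to the caller immediately.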
… 2*.
* #67 - Added retries while updating the job to status running after successful launch.
* Changed scripts to create a sample cluster with version 2.4.0.
Deployed and verified the fix in our production environment. The job was submitted successfully despite hitting a deadlock in production, and there have been no 5xx errors since deployment last week. Will reopen if the problem happens again.