Possible race condition on Job activation between Gateway and Broker #4239
Labels
kind/bug
Categorizes an issue or PR as a bug
kind/toil
Categorizes an issue or PR as general maintenance, i.e. cleanup, refactoring, etc.
scope/broker
Marks an issue or PR to appear in the broker section of the changelog
Milestone
Description
During our engine latency investigation we sometimes observed high latency on job activation. We found that this latency is related to the `LongPollingTimeout`. Thinking about this further, we concluded that it might be a race condition between the gateway and the broker.

Once the gateway has requested jobs a certain number of times (I think after three retries), it switches to long polling. Between switching to long polling and establishing the job subscription, a new job might be created on the broker side, and the notification that is sent might get lost. After the long poll times out, the client retries and the job is activated successfully.
This can also be avoided by using more than one worker.
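
To make the suspected interleaving easier to discuss, here is a minimal toy model of it in Python. It is not Zeebe code: `ToyBroker`, `simulate`, and the `recheck_after_subscribe` flag are hypothetical names invented for this sketch, and the re-check step is only one possible way to close the window, not a confirmed fix. The steps run sequentially so the ordering described above is reproduced deterministically.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyBroker:
    """Hypothetical stand-in for the broker: stores jobs, notifies subscribers."""

    jobs: List[str] = field(default_factory=list)
    subscribers: List[Callable[[], None]] = field(default_factory=list)

    def create_job(self, job: str) -> None:
        self.jobs.append(job)
        # The notification only reaches subscribers that exist at this moment.
        for notify in self.subscribers:
            notify()

    def activate_job(self) -> Optional[str]:
        return self.jobs.pop(0) if self.jobs else None


def simulate(recheck_after_subscribe: bool) -> str:
    broker = ToyBroker()
    notified: List[bool] = []

    # 1. The gateway polls, finds no job, and decides to switch to long polling.
    assert broker.activate_job() is None

    # 2. Race window: a job is created before the subscription exists,
    #    so the notification has no receiver and is lost.
    broker.create_job("job-1")

    # 3. The gateway establishes its job subscription.
    broker.subscribers.append(lambda: notified.append(True))

    # 4. Assumed mitigation (not a confirmed fix): poll once more
    #    right after subscribing.
    if recheck_after_subscribe:
        job = broker.activate_job()
        if job is not None:
            return f"activated {job} immediately"

    # 5. Without a notification the request waits until the long poll
    #    times out; the client's retry then finds the job.
    if not notified:
        return f"activated {broker.activate_job()} after long-poll timeout"
    return f"activated {broker.activate_job()} on notification"


if __name__ == "__main__":
    print(simulate(recheck_after_subscribe=False))
    print(simulate(recheck_after_subscribe=True))
```

In this model the job is never lost, only delayed by the timeout, which matches the observed latency. A second worker helps for a similar reason: its polls or its already established subscription can pick up the job while the first worker is still inside the window.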