Description
Describe the bug
This probably only appears in very large setups, where recovery takes longer than one polling cycle.
When connectivity returns after an outage in a remote poller setup, the poller recovery does not work properly.
It spawns multiple processes and causes a lot of DB stress, because it does not take into account that a process is already running.
I also found the actual flaw in the code, and I think I was able to fix it.
To Reproduce
Let a really large (in terms of data sources) remote poller go into "Heartbeat" mode and let it recover again.
After a while you will see multiple processes running, all trying to do the same thing: starting the data sync from scratch over and over again.
Expected behavior
Poller recovery should run only once.
Every subsequent call should detect that a recovery process is already running.
Additional context
The problem is that the "recovery_pid" is never inserted into the corresponding table.
It is only deleted at one point in the code.
This way the subsequent processes always see an empty value (because there is no "recovery_pid" entry in the "settings" table) and will start the process all over again.
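To illustrate the missing step, here is a minimal sketch of the guard as I understand it should work. This is not the actual poller code: it is a Python stand-in using SQLite, and the function names (`try_acquire_recovery`, `release_recovery`, `pid_alive`) and the two-column `settings` layout are assumptions made up for the example. The point is only that the PID has to be written before the sync starts, so that later calls find it.

```python
import os
import sqlite3


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX semantics assumed)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, it sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def try_acquire_recovery(db, my_pid, is_alive=pid_alive) -> bool:
    """Register my_pid as the recovery process unless a live one is recorded."""
    row = db.execute(
        "SELECT value FROM settings WHERE name = 'recovery_pid'"
    ).fetchone()
    if row is not None and is_alive(int(row[0])):
        return False  # a recovery is already running, do not start another
    # This write is the step the report says is missing: without it,
    # every later call sees an empty value and starts from scratch.
    db.execute(
        "REPLACE INTO settings (name, value) VALUES ('recovery_pid', ?)",
        (str(my_pid),),
    )
    db.commit()
    return True


def release_recovery(db) -> None:
    """Remove the marker once recovery has finished."""
    db.execute("DELETE FROM settings WHERE name = 'recovery_pid'")
    db.commit()
```

A recovery run would call `try_acquire_recovery(db, os.getpid())` first, exit immediately if it returns `False`, and call `release_recovery(db)` when the sync is done. Note that the check and the write in this sketch are not atomic, so two processes starting at the same instant could still both pass; a real fix may want to do both inside one transaction or lock.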