[BUG] Unifi job does not try again after failure #107
Final fixes for this pushed in a4910e8.
The jobs should not cancel. The only one we missed was unifi. Can you test develop and verify?
I'm already on develop, so I'm assuming my unRAID will update it this weekend when it runs through restarts. I don't think it had an issue this weekend when it failed to connect. Could have just been luck of the draw this week. 😄
We just pushed to master, so you should be gtg there too.
@1activegeek We had another user with a similar problem. Make sure Influx has started and is running before Varken. Without that we will exit, since it is the basis for everything.
I don't think I had that issue. If that were the case, I'd imagine they would all fail as well, correct? In my scenario, it was ONLY the unifi job that was canceled. We'll see when this weekend's updates come around.
Is your feature request related to a problem? Please describe.
Frequently I'm hitting a timeout when Varken connects to a data source while certain types of maintenance are performed, or when there is a temporary outage of an application or data source. This can happen when the Unifi Controller is inaccessible for a period during network maintenance (such as software upgrades on network components or the controller itself). It also happens regularly, on a weekly basis, when my unRAID system runs its scheduled backup. On restart, Varken frequently starts faster than the Arrs do, so it can't retrieve their data and cancels the job.
Describe the solution you'd like
I'd like the option to enable longer timeouts, or an exponentially growing retry interval. Something so the system doesn't decide that, because it can't reach a data source for a short period, it should cancel the job and stop collecting data points.
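To illustrate, a retry loop with exponential backoff could look like the sketch below. This is not Varken's actual code; `fetch`, the delay values, and the attempt cap are all illustrative assumptions. The idea is simply that a transient connection failure triggers a growing wait rather than cancelling the job outright:

```python
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, cap=300.0):
    """Call `fetch()` and retry on ConnectionError with exponential backoff.

    Hypothetical helper for illustration only: delays grow as
    base_delay * 2**attempt, capped at `cap` seconds, so a briefly
    unreachable data source does not permanently cancel the job.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller decide what to do
            delay = min(base_delay * 2 ** attempt, cap)
            time.sleep(delay)
```

A scheduler built this way would ride out a short controller outage (delays of 1 s, 2 s, 4 s, ...) and only give up after the configured number of attempts, instead of cancelling on the first failed connection.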
Describe alternatives you've considered
My alternative at this point is a script that restarts the Varken container on a more frequent basis, to reset any jobs that were canceled after hitting the timeout.