[BUG] Unifi job does not try again after failure #107

Closed
1activegeek opened this issue Feb 21, 2019 · 6 comments
Labels
approved: Enhancement or request that was approved to be worked on
bug: Something isn't working
in-next-release: In the next release

Comments

@1activegeek

Is your feature request related to a problem? Please describe.
I frequently hit timeouts when Varken connects to a data source while certain types of maintenance are performed, or when an application or data source has a temporary outage. For example, the Unifi Controller can be inaccessible for a period while network maintenance happens (such as software upgrades on network components or the controller itself). At this point it also happens regularly, on a weekly basis, when my unRAID system runs its scheduled backup. On restart, Varken frequently seems to start faster than the Arrs do, so it can't retrieve their data and cancels the job.

Describe the solution you'd like
I'd like to see an option to enable longer timeouts, or an exponentially growing check timeout. The goal is that the system should not cancel a job and stop collecting data points just because it can't reach a data source for a short period.
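
To make the request concrete, here is a minimal sketch of an exponentially growing retry delay. It is not Varken's actual code: `backoff_delays`, `fetch_with_backoff`, and all parameter names and defaults are hypothetical, and the caught exception types are an assumption (a real HTTP client may raise its own exception classes instead).

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 5.0, factor: float = 2.0,
                   cap: float = 300.0) -> Iterator[float]:
    """Yield an endless series of delays: base, base*factor, ... capped at cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def fetch_with_backoff(fetch: Callable[[], T],
                       max_attempts: Optional[int] = None,
                       sleep: Callable[[float], None] = time.sleep,
                       base: float = 5.0, factor: float = 2.0,
                       cap: float = 300.0) -> T:
    """Call fetch() until it succeeds, waiting longer after each failure.

    max_attempts=None retries forever, so a short outage never cancels the job.
    """
    attempt = 0
    for delay in backoff_delays(base, factor, cap):
        attempt += 1
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            # Assumption: the data source signals an outage with these
            # built-in exceptions; swap in the client library's own types.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
    raise AssertionError("unreachable: backoff_delays() never ends")
```

With the defaults above, a source that stays down is retried after 5, 10, 20, 40 seconds and so on, up to one attempt every 5 minutes, instead of the job being dropped.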

Describe alternatives you've considered
My alternative at this point is creating a script to restart the Varken container on a more frequent basis to reset any jobs that may have been canceled due to reaching the timeout.

@1activegeek added the awaiting-triage (Request awaiting triage) label Feb 21, 2019
@dirtycajunrice
Member

Final fixes for this were pushed in a4910e8.

@dirtycajunrice added the bug, approved, and in-next-release labels and removed the awaiting-triage label Mar 12, 2019
@samwiseg0
Member

The jobs should not cancel. The only one we missed was unifi. Can you test develop and verify?

@samwiseg0 changed the title from "[Feature Request] Expand timeouts before canceling a job" to "[BUG] Expand timeouts before canceling a job" Mar 12, 2019
@samwiseg0 changed the title from "[BUG] Expand timeouts before canceling a job" to "[BUG] Unifi job does not try again after failure" Mar 12, 2019
@1activegeek
Author

I'm already on develop, so I'm assuming my unRAID will update it this weekend when it runs through restarts. I don't think it had an issue this weekend when it failed to connect. Could have just been luck of the draw this week. 😄
Thanks for the update and the work, appreciate it!!

@samwiseg0
Member

We just pushed to master so you should be gtg there too.

@samwiseg0
Member

@1activegeek We had another user with a similar problem. Make sure Influx has started and is running before Varken. Without that we will exit, since it is the basis for everything.
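
One way to enforce that start order from a wrapper script is to block until the InfluxDB port accepts TCP connections before launching Varken. This is only a sketch under stated assumptions: `wait_for_port` is a hypothetical helper, not part of Varken, and the host name in the usage comment is a placeholder (8086 is InfluxDB's default HTTP port). An open port also does not guarantee the database is fully ready to serve queries.

```python
import socket
import time
from typing import Callable


def wait_for_port(host: str, port: int, timeout: float = 120.0,
                  interval: float = 2.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = clock() + timeout
    while True:
        try:
            # create_connection raises OSError (refused, unreachable, timed out)
            # while the service is still down.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if clock() >= deadline:
                return False
            sleep(interval)


# Hypothetical usage before starting Varken:
# if not wait_for_port("influxdb.local", 8086):
#     raise SystemExit("InfluxDB did not come up in time")
```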

@1activegeek
Author

I don't think I had that issue. If that were the case, I'd imagine they would all fail as well, correct? In my scenario, it was ONLY the unifi job that was canceled. We'll see when this weekend's updates come around.

@Boerderij locked and limited conversation to collaborators Mar 19, 2019