[BUG] Unifi job does not try again after failure #107

1activegeek · 2019-02-21T03:24:52Z

Is your feature request related to a problem? Please describe.
I frequently hit timeouts when Varken connects to a data source while certain types of maintenance are performed, or when there are temporary outages on an application or data source. This can happen when, for example, the Unifi Controller is inaccessible for a period while network maintenance happens (such as software upgrades on network components or the controller itself). It also currently happens every week when my unRAID system runs its scheduled backup. On restart, Varken frequently seems to start faster than the Arrs do, so it can't retrieve their data and cancels the job.

Describe the solution you'd like
I'd like to see an option to enable longer timeouts, or an exponentially growing check timeout. Something so the system doesn't simply decide that, because it can't reach a data source for a short period, it should cancel the job and stop collecting data points.
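To make the request concrete, here is a rough sketch of the kind of retry I have in mind. It is not Varken's code: `retry_with_backoff`, `fetch`, and every parameter name and default below are hypothetical, and it assumes a failed poll surfaces as a Python exception.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    base_delay: float = 5.0,
    max_delay: float = 300.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds, doubling the wait after each failure.

    Hypothetical helper for illustration. The wait starts at base_delay
    seconds and is capped at max_delay. With max_attempts=None it retries
    forever instead of canceling the job.
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except Exception:
            attempt += 1
            # Give up only if the caller asked for a bounded number of tries.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            # 1st failure waits base_delay, 2nd waits 2x, 3rd 4x, up to the cap.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay)
```

With the defaults above, a data source that stays down is polled after 5, 10, 20, 40 seconds and so on, up to one attempt every 5 minutes, so the job picks up again on its own once maintenance finishes.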

Describe alternatives you've considered
My alternative at this point is a script that restarts the Varken container more frequently, to reset any jobs that were canceled after reaching the timeout.

dirtycajunrice · 2019-03-12T00:38:42Z

Final fixes for this pushed in a4910e8.

samwiseg0 · 2019-03-12T00:39:27Z

The jobs should not cancel. The only one we missed was unifi. Can you test develop and verify?

1activegeek · 2019-03-12T02:47:20Z

I'm already on develop, so I'm assuming my unRAID will update it this weekend when it runs through restarts. I don't think it had an issue this weekend when it failed to connect. Could have just been luck of the draw this week. 😄
Thanks for the update and the work, appreciate it!!

samwiseg0 · 2019-03-12T02:48:05Z

We just pushed to master so you should be gtg there too.

samwiseg0 · 2019-03-12T18:24:20Z

@1activegeek We had another user with a similar problem. Make sure Influx has started and is running before Varken. Without that we will exit, since it is the basis for everything.
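If the container runtime cannot order startup for you, one option is a small wait script that runs before Varken. The following is only a sketch: `wait_for_port` is a hypothetical helper, the host name `influxdb` in the usage note is an assumption about your setup, and an open TCP port shows that something is listening, not that Influx is fully ready to accept writes.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Hypothetical helper for illustration. Retries every `interval` seconds
    and returns False if no connection succeeds before `timeout` seconds
    have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("influxdb", 8086)` would block until something answers on InfluxDB's default HTTP port, or return False after a minute.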

1activegeek · 2019-03-12T23:21:57Z

I don't think I had that issue. If that were the case, I'd imagine all the jobs would fail as well, correct? In my scenario, ONLY the unifi job was canceled. We'll see when this weekend's updates come around.

1activegeek added the awaiting-triage label Feb 21, 2019

samwiseg0 added a commit that referenced this issue Mar 12, 2019

Make changes to USG to prevent canceling job. Should fix #107

859a167

dirtycajunrice added the bug, approved, and in-next-release labels and removed the awaiting-triage label Mar 12, 2019

samwiseg0 changed the title ~~[Feature Request] Expand timeouts before canceling a job~~ [BUG] Expand timeouts before canceling a job Mar 12, 2019

samwiseg0 changed the title ~~[BUG] Expand timeouts before canceling a job~~ [BUG] Unifi job does not try again after failure Mar 12, 2019

samwiseg0 closed this as completed Mar 12, 2019

Boerderij locked and limited conversation to collaborators Mar 19, 2019

dirtycajunrice pushed a commit that referenced this issue Jun 24, 2019

Make changes to USG to prevent canceling job. Should fix #107

d710d2c
