Adding retries for failed REST calls to the ISY #46

Closed
wants to merge 7 commits

Conversation

rccoleman
Contributor

As per Michel @ UDI, the ISY will return 404 to an otherwise valid REST call if it thinks that the Insteon network is overwhelmed. That command may or may not actually succeed, but this patch adds a retry mechanism with a 1-second delay between attempts, repeating until either the ISY acknowledges that the command succeeded or we run out of retries.
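
A minimal sketch of the retry loop described above (the function name and retry cap are illustrative, not the exact patch; the real change lives in Connection.py):

import time
import requests

MAX_RETRIES = 5    # illustrative cap on attempts
RETRY_DELAY = 1.0  # 1-second delay between attempts, as described above

def get_with_retries(url, auth):
    """Retry a REST call that the ISY rejected with a 404 due to congestion."""
    response = None
    for _ in range(MAX_RETRIES):
        response = requests.get(url, auth=auth, timeout=10, verify=False)
        if response.status_code != 404:
            break              # success, or a different error the caller handles
        time.sleep(RETRY_DELAY)
    return response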

@rccoleman
Contributor Author

How can I move this along? I would expect that anyone sending commands to multiple devices on the ISY at once would run into the same failures that I have, so I think this would be a valuable change.

@rmkraus
Member

rmkraus commented Mar 25, 2019

My apologies, I meant to reply to this a few weeks ago and then it slipped my mind.

So I'm back and forth on this one for a few reasons. Primarily, it is a hack for dealing with a bad response from the controller. The controller is sending the wrong error code back. The (most) correct answer to this would be to get the issue fixed with the ISY firmware.

If this code is used, then a legitimately bad request will block the thread for at least 5 seconds before failing out. That is also less than ideal.

Then, possibly the worst of all: if the ISY becomes overwhelmed with requests, it will likely return a percentage of requests back with 404. Based on Michel's response, a percentage of those 404's will actually send a command over the Insteon network. Then we'd repeat the requests that returned 404, a percentage of those may 404 again, a percentage of those will send a request over the network, and we will repeat for up to 5 seconds. I would be concerned that this would flood the Insteon network for a large enough set of requests. I think this may do more harm than good.

@rccoleman
Contributor Author

rccoleman commented Mar 25, 2019

No problem on the delay. You're right that I'm trying to work around issues in the ISY, but it's really two-fold: first, it periodically rejects requests that it may actually still send, and second, it returns the wrong error code (in my humble opinion). It would be great if we could get one or both of those resolved on the ISY side, but it seems way, way down the priority list for them, and the release schedule is glacial. If it's important to a PyISY user to fix this, I fear that the only option is through PyISY or the application that uses it.

My goal with this change was to make it as simple as possible, but I take your point about blocking the main thread. Perhaps a better solution would be to spawn a worker thread to handle the retries, but we're also limited in worker threads on the ISY (I think there are three, and they also handle node servers), so I suspect that the communication will be throttled either way. As it is, failed commands usually succeed after 1 or 2 retries, so the likelihood that we'll go through the full five is low.

Fixing the "wrong" error code is really just cosmetic for my use case, but I can see that it's problematic for a general-purpose solution. I don't know how to improve that without waiting for UDI to change their implementation.

I was also worried about flooding the network, and I think I was making it worse when I only had a 0.5s delay between retries. After extending it to 1s, it seems to back off enough to avoid flooding the Insteon network, even if a large number of devices are being turned on. Best practice for that is really to create a scene on the ISY and just turn the scene on, and performance will be better with that, too.

For me, it's all about fine tuning the backoff delay to balance Insteon network traffic with eventually having commands succeed. As it is, commands just fail sometimes, and that's a terrible user experience and essentially unusable for automation (at least for me).

I'm an embedded C dev who's just getting started with Python, and I'm open to alternative implementations. I do strongly feel that some solution is required so that users of the REST interface can expect predictable results.

Edit: It occurred to me that we should be able to differentiate "real" 404s from these failures by checking whether the node is in the list of valid nodes before retrying.
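
A rough sketch of that check (purely illustrative; how the list of valid node addresses is obtained is an assumption, not PyISY's actual API):

def should_retry(status_code, node_address, valid_addresses):
    # Only retry a 404 when the address is known to the ISY; a 404 for an
    # unknown address is a genuinely bad request and should fail immediately.
    return status_code == 404 and node_address in valid_addresses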

@rmkraus
Member

rmkraus commented Mar 25, 2019 via email

@rccoleman
Contributor Author

I originally used these changes for a few days with Home Assistant while I refined the code, but got frustrated with the difficulty of maintaining custom dependencies on the Hass.io variant. I made equivalent changes with the same algorithm in the isy994 Hass component and I’ve been using it for several weeks. Before submitting this PR, I again put these changes back into my system and observed that the retries were happening as expected down in PyISY.

Without these changes, I was strongly considering just ditching the ISY and moving to native Insteon PLM support in Hass. Now that I can actually rely on the results, these changes have bought the ISY more time.

@jimboca
Collaborator

jimboca commented Mar 25, 2019

I wanted to review this as well but haven't had any time. Ideally it should be using a requests session when talking to the ISY so it's not opening and closing the connection with each request.

@rccoleman
Contributor Author

Do you have an example of that? I poked around in the ISY developer’s handbook and it wasn’t clear to me which direction to go. I don’t think it will improve the situation that this patch is targeting, though, because it seems to be related to temporary congestion on the Insteon network.

@jimboca
Collaborator

jimboca commented Mar 27, 2019

My reference is Polyglot V2. When it tried to send many requests to the ISY, the ISY would get flooded and we got 404's; after switching to keeping the socket open with keep-alive and reusing it, we could push many more changes to the ISY, faster. Polyglot V2 uses node.js so I can't supply a direct example, but converting PyISY to use a Python requests session would allow a socket to be reused instead of opening and closing a new one for each command: http://docs.python-requests.org/en/master/user/advanced/ It's possible you can still get a 404 when opening the initial session, so you will still need retries, but once it's open you can pass many more commands and you are less likely to flood the ISY.
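
For illustration, a minimal sketch of that session reuse with the standard requests API (host, credentials, and endpoints are placeholders; PyISY's TLS adapter setup is omitted):

import requests

session = requests.Session()          # keep-alive: the TCP socket is reused
session.auth = ('admin', 'password')  # placeholder ISY credentials

# Subsequent calls reuse the same connection instead of opening a new one each time.
nodes = session.get('http://isy.local/rest/nodes', timeout=10)
status = session.get('http://isy.local/rest/status', timeout=10)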

@rccoleman
Contributor Author

Makes sense. It sounds like a separate effort from this one, considering that we still need a backup plan for failed requests. Aren't we also competing with local traffic on the ISY, making it possible that we'll still get 404s beyond the initial connection?

@jimboca
Collaborator

jimboca commented Mar 27, 2019

IMO, it's the same effort. When PyISY is sending many commands, it's the one creating the problem, so by switching to a session you will have much more reliable (and much faster) responses to commands.

@rccoleman
Contributor Author

From this thread, it sounds like the most likely explanations for 404 are that the ISY is unable to communicate with the device or the ISY determines that the Insteon network is "busy". As I read it, it doesn't sound like it's related to the REST commands themselves.

@jimboca
Collaborator

jimboca commented Mar 27, 2019

My opinion is based on direct experience we had with this exact same issue while developing Polyglot V2 and working with Michel and e42 to resolve it.

@rccoleman rccoleman closed this Mar 27, 2019
@rccoleman
Contributor Author

rccoleman commented Mar 27, 2019

@rmkraus Is it really as simple as this?

diff --git a/PyISY/Connection.py b/PyISY/Connection.py
index 6d267ce..c1b68fb 100755
--- a/PyISY/Connection.py
+++ b/PyISY/Connection.py
@@ -23,6 +23,8 @@ class Connection(object):
         self._username = username
         self._password = password
 
+        self.req_session = requests.Session()
+
         # setup proper HTTPS handling for the ISY
         if use_https and can_https(self.parent.log, tls_ver):
             self._use_https = True
@@ -31,7 +33,6 @@ class Connection(object):
             requests.packages.urllib3.disable_warnings()
 
             # ISY uses TLS1 and not SSL
-            req_session = requests.Session()
-            req_session.mount(self.compileURL(None), TLSHttpAdapter(tls_ver))
+            self.req_session.mount(self.compileURL(None), TLSHttpAdapter(tls_ver))
         else:
             self._use_https = False
@@ -69,7 +70,7 @@ class Connection(object):
             self.parent.log.info('ISY Request: ' + url)
 
         try:
-            r = requests.get(url, auth=(self._username, self._password),
+            r = self.req_session.get(url, auth=(self._username, self._password),
                     timeout=10, verify=False)
 
         except requests.ConnectionError as err:

I'm just messing around with a little test program, but I can still successfully get the list of nodes and turn them on and off.

@rccoleman rccoleman reopened this Mar 30, 2019
@rccoleman
Contributor Author

rccoleman commented Apr 2, 2019

Like Don Quixote, I continue :). I made the very simple changes above that I think will establish and use a persistent session, and it's working as expected in my home system. I do see some retries kicking in if I try to turn lots of individual lights or scenes on at once, but it maxed out at 3 retries even with a large amount of immediate traffic. I don't know how to prove that the persistent session is doing the right thing other than to follow the guidance and examples online.

It turns out that it's really very easy to use custom dependencies in Hass.io by following the instructions here to specify my GitHub fork as a requirement, rather than the released version of PyISY. They get pulled into a directory structure under config/deps, which gets added to the search path by "bootstrap" and is persistent across restarts.
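
For reference, one common way to point a requirement at a fork rather than the PyPI release is a pip-style VCS requirement like the line below (repository URL and branch name are placeholders, and this may not be the exact mechanism the Hass.io instructions describe):

# requirements entry pointing at a fork instead of the released PyISY package
git+https://github.com/rccoleman/PyISY.git@my-branch#egg=PyISY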

@rccoleman
Contributor Author

Is there any interest in this change? I'm using it and it makes it possible for me to reliably integrate ISY devices into my Hass setup, so I'm just trying to share the wealth.

@jimboca
Collaborator

jimboca commented Apr 8, 2019

I am very interested, but haven't had a chance to test in my environment.

@rccoleman
Contributor Author

Ok, great! No hurry.

@shbatm
Collaborator

shbatm commented May 5, 2019

@rccoleman @jimboca: I'm working on a separate issue with Hass under #44 related to fixing bad Climate/Thermostat calls. In my changes, the isy994_control event has been updated to also expose value. This is expected to be a non-breaking change in PyISY (PyISY's event is now a dict instead of a string, but will represent itself the same as the command string it used to pass).
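
For example, an event payload can stay compatible by rendering as the old command string (a hypothetical sketch; the class and key names are assumptions, not the actual PyISY implementation):

class ControlEvent(dict):
    """Behaves like a dict but prints as the old command string."""

    def __str__(self):
        # Consumers that formatted the event, or compared str(event) to a
        # command like 'DON', keep working unchanged.
        return self.get('control', '')

event = ControlEvent(control='DON', value=255)
print(str(event))      # DON -> same representation as before
print(event['value'])  # 255 -> the newly exposed value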

This change is vital to the climate integration, as some values are only made available from the event stream, not the node definition, but I am looking for someone else to test it to confirm it is not a breaking change.

I was going to pull your changes into my PyISY_beta branch, and am currently testing in HASS using a temporary PyISY_beta PyPi package... just wondering if either of you would be willing to help me test?

@rccoleman
Contributor Author

Sure, I can pull your branch into my production system. I'm not controlling any thermostats from my ISY, though. I'm really only using it as a hub for my Insteon switches and leak detectors.

@shbatm
Collaborator

shbatm commented May 5, 2019

Thanks, that's OK if you don't have any thermostats. I really just want to make sure the events changes aren't going to break anything.

@rccoleman
Contributor Author

I've been using your branch for an hour or so and it seems to be working for me. I also see my retries kicking in when I turn a bunch of lights/scenes on the ISY on or off. I'll continue to monitor.

@rccoleman
Contributor Author

So Insteon events aren't working or aren't working reliably. Specifically, I have motion sensors and wall switches that send events and they no longer trigger my Hass automations with your branch, while they work immediately when I switch back to my branch. I haven't spent time digging into what's wrong, but it's definitely not working.

@shbatm
Collaborator

shbatm commented May 6, 2019

Thanks for the feedback. I'll look into it. I opened an issue at shbatm/PyISY#3 to not clutter this thread.
