Danfoss thermostat is sleepy so use sendWhenActive #3240

Closed


kennylevinsen
Contributor

Since the Danfoss thermostat is a Sleepy End Device, we should use sendWhenActive for all requests to it.

However, rather than manually setting this flag whenever someone discovers issues with a sleepy device, as done here, maybe we should enable sendWhenActive by default for sleepy end devices, possibly using "battery-powered end device" as a heuristic. Open to suggestions.

Note: Built upon #3239

@Koenkk
Owner

Koenkk commented Oct 28, 2021

The problem is that we don't know when an end device is sleepy; some poll quite often so they can accept commands, some don't.

@kennylevinsen
Contributor Author

kennylevinsen commented Oct 28, 2021

It wouldn't be perfect, but I imagine the vast majority of battery-powered end-devices are sleepy, even if they differ in poll rate and sometimes seem to work well enough without sendWhenActive.

Letting them default to sendWhenActive: true (with a means to override) seems significantly better than the opposite, even if the heuristic is imperfect.

Alternatively, maybe we could add an easy way to mark a device as sleepy, instead of having to augment every request with sendWhenActive.
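
A minimal TypeScript sketch of the heuristic proposed above, for illustration only. The `DeviceInfo` and `SendOptions` shapes and both function names are assumptions made for this example, not the zigbee-herdsman types.

```typescript
// Hypothetical shapes for illustration; not the zigbee-herdsman types.
type DeviceType = 'Coordinator' | 'Router' | 'EndDevice';

interface DeviceInfo {
    type: DeviceType;
    powerSource?: string; // assumed values such as 'Battery' or 'Mains'
}

interface SendOptions {
    sendWhenActive?: boolean; // explicit override, if the caller sets one
}

// Heuristic from the discussion: a battery-powered end device is assumed sleepy.
function isProbablySleepy(device: DeviceInfo): boolean {
    return device.type === 'EndDevice' && device.powerSource === 'Battery';
}

// An explicit setting always wins; otherwise fall back to the heuristic.
function resolveSendWhenActive(device: DeviceInfo, options: SendOptions = {}): boolean {
    return options.sendWhenActive ?? isProbablySleepy(device);
}
```

With this shape, a converter would only need to pass `sendWhenActive: false` for the minority of battery devices that poll fast enough to be reached directly.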

@Koenkk
Owner

Koenkk commented Oct 28, 2021

The problem is that send when active doesn't send when a device polls, it only triggers when the end device sends a message to the coordinator.

@kennylevinsen
Contributor Author

kennylevinsen commented Oct 28, 2021

Hmm, that's an issue, but on the other hand we also cannot send blindly to a device that polls less often than the router cache period (7 seconds?) - which I imagine would be most devices with smaller battery capacity.

Do we have a means to detect when a device wakes and requests its message queue? Or are we stuck between a rock and a hard place of defective request transmit behaviors? :/

@kennylevinsen
Contributor Author

An idea for a middle-ground solution: retryWhenActive for sleepy devices, possibly limited to MAC_TRANSACTION_EXPIRED and other errors that may suggest a sleeping SED. This would allow us to try immediately but fall back to waiting for signs of life.
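
To make the retryWhenActive idea concrete, here is a rough sketch under stated assumptions: the `RetryWhenActive` class, the `Transmit` callback and the result strings are hypothetical, and only the `MAC_TRANSACTION_EXPIRED` name comes from this thread. It tries immediately and parks a request only when the error suggests a sleeping SED.

```typescript
// Hypothetical names throughout; only MAC_TRANSACTION_EXPIRED comes from the thread.
type SendResult = 'OK' | 'MAC_TRANSACTION_EXPIRED' | 'OTHER_ERROR';
type Transmit = (payload: string) => SendResult;

class RetryWhenActive {
    private pending: string[] = [];
    private readonly transmit: Transmit;

    constructor(transmit: Transmit) {
        this.transmit = transmit;
    }

    // Try immediately; park the request only if the error hints at a sleeping SED.
    send(payload: string): SendResult | 'QUEUED' {
        const result = this.transmit(payload);
        if (result === 'MAC_TRANSACTION_EXPIRED') {
            this.pending.push(payload);
            return 'QUEUED';
        }
        return result;
    }

    // Call when any message arrives from the device (a sign of life).
    onDeviceActivity(): void {
        const toRetry = this.pending;
        this.pending = [];
        for (const payload of toRetry) {
            if (this.transmit(payload) === 'MAC_TRANSACTION_EXPIRED') {
                this.pending.push(payload); // still asleep; wait for the next sign of life
            }
        }
    }

    get pendingCount(): number {
        return this.pending.length;
    }
}
```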

There's of course also the brute-force solution of allowing many retries over longer periods of time, but doing that for possibly minutes seems... inelegant and noisy.

(I'm not deep enough into zigbee to know if we have other options, but it does seem silly that the rest of the network would not be made aware of a chance to talk to SEDs when it is valid for them to sleep for longer than the parent caching period...)

@MattWestb

A Zigbee 3 sleepy end device that has a Zigbee 3 parent, and has Poll Control (cluster 0x0020) configured with it, sends a check-in (command 0x0000) to its parent, which is forwarded to the coordinator. The device then stays in fast poll mode until it times out or the coordinator sends a fast poll stop (0x0001), either after sending all the data it wants to the SED or because there is no data to send.

It's made for "caching" data that is to be delivered to the device when it wakes up, even after a very long sleep.

@pklokke
Contributor

pklokke commented Oct 29, 2021

> A Zigbee 3 sleepy end device that has a Zigbee 3 parent, and has Poll Control (cluster 0x0020) configured with it, sends a check-in (command 0x0000) to its parent, which is forwarded to the coordinator. The device then stays in fast poll mode until it times out or the coordinator sends a fast poll stop (0x0001), either after sending all the data it wants to the SED or because there is no data to send.
>
> It's made for "caching" data that is to be delivered to the device when it wakes up, even after a very long sleep.

This behaviour isn't supported on existing ZHA 1.2 devices, however, and some coordinator handling for legacy devices would be appreciated. @Koenkk was kind enough to implement sendWhenActive to help with some ugly kludges I had previously proposed for sending messages when reports came in. Having something a bit more streamlined for all SEDs would also help.

Poll Control Cluster support would also be "nice to have" in Z2M, and would help a lot with remote-control-style devices, which normally talk to bulbs rather than the coordinator, but this is complementary to this proposal rather than a replacement for it.

@kennylevinsen
Contributor Author

Okay, so it seems like we have a path for a good solution for Zigbee 3.0, and a "good enough" solution for other devices:

  1. If Zigbee 3.0 Poll Control Clusters are available, we can respond on check-in and keep sending within the fast-poll period. We both know exactly when the device is awake and have the ability to control it.
  2. Otherwise, it should be safe to assume that a device will poll for a bit after a message, and it should be safe to assume that most SEDs will wake up when they want to report something, making sendWhenActive our most reliable way to communicate with them. It should also be perfectly valid to extend sendWhenActive with a window, so that we don't just empty a queue once on data, but keep sending directly for a short while after that.
  3. Lastly, some devices might need quirks due to not fitting in 1 or 2 (e.g. older devices that wake without anything to report, or SEDs that always poll faster than 7.68 seconds), but that'll always be true, and this should turn the tables from needing quirks for the majority of SEDs to needing them for a minority instead.

A basic version of 2 would just be to enable sendWhenActive for all assumed sleepy end devices using the heuristic mentioned earlier. Koenkk/zigbee-herdsman#440 extends sendWhenActive to have a window instead of simply emptying the queue once, improving latency when ping-ponging a bit with the device after wake-up.
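
As an illustration of the windowed behavior described above, a small sketch follows. It is not the code from Koenkk/zigbee-herdsman#440; the `ActiveWindowQueue` class, its method names and the injected `nowMs` timestamps are assumptions chosen to keep the example self-contained.

```typescript
// Hypothetical sketch; not the zigbee-herdsman implementation.
class ActiveWindowQueue {
    private queue: string[] = [];
    private activeUntil = 0; // timestamp in ms; 0 means the device was never seen active
    private readonly windowMs: number;
    private readonly transmit: (payload: string) => void;

    constructor(windowMs: number, transmit: (payload: string) => void) {
        this.windowMs = windowMs;
        this.transmit = transmit;
    }

    // Called for every message received from the device.
    onDeviceMessage(nowMs: number): void {
        this.activeUntil = nowMs + this.windowMs;
        // Flush everything that was waiting for the device to wake up.
        const waiting = this.queue;
        this.queue = [];
        for (const payload of waiting) {
            this.transmit(payload);
        }
    }

    // Send directly while the window is open; otherwise queue for the next wake-up.
    send(payload: string, nowMs: number): 'sent' | 'queued' {
        if (nowMs < this.activeUntil) {
            this.transmit(payload);
            return 'sent';
        }
        this.queue.push(payload);
        return 'queued';
    }
}
```

The window is what saves latency when ping-ponging: a follow-up request issued shortly after the flush goes out directly instead of waiting for the next report.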

1 will first require adding handling of poll control clusters, but would basically just wire into sendWhenActive, replacing the time of flush with check-in, and active duration with the configured fast-poll period.

@MattWestb

I read all the Silabs documents on Poll Control while hunting extreme battery drain in controller devices like IKEA and Philips Hue. If their parent does not support it, or the setup is not working, the coordinator does not get the check-in (0x0000) and never sends a check-in response (0x0000) to the device (through its parent, which can be the coordinator).
So one "safe" way is to start with alternative 1, and if the coordinator is sending a check-in response, then it is safe to use it for sending queued data to the device.

But it can happen that a device (SED) changes its parent to one that does not set up Poll Control correctly, though I think the device broadcasts a parent announcement to inform all routers that it has a new parent.
Perhaps that should also be part of the logic, so that we not only look at whether the cluster is present on the device but also check whether it is actively used.
Perhaps on a parent announcement, go back to alternative 1 until getting a check-in from the device, then alternative 2.

@kennylevinsen
Contributor Author

I have a branch on zigbee-herdsman where I respond to checkin commands and use them to control sendWhenActive behavior when present. It requests fast poll only if there are pending messages, marks the device as "active" for direct sending while we're emptying the queue, and stops fast poll when the queue has been emptied.
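
The flow described above can be sketched roughly as follows. This is not the branch itself: `PollControlSession` and its methods are hypothetical, and the `CheckinResponse` fields are named after the Poll Control check-in response payload (start fast polling flag and fast poll timeout in quarter-seconds) as an assumption for this example.

```typescript
// Hypothetical sketch; names are illustrative, not the zigbee-herdsman API.
interface CheckinResponse {
    startFastPolling: boolean;
    fastPollTimeout: number; // quarter-seconds
}

type Transmit = (payload: string) => void;

class PollControlSession {
    private pending: string[] = [];
    private active = false;
    private readonly fastPollTimeoutQs: number;

    constructor(fastPollTimeoutQs: number) {
        this.fastPollTimeoutQs = fastPollTimeoutQs;
    }

    // Queue while the device is asleep; send directly while it is fast polling.
    send(payload: string, transmit: Transmit): 'sent' | 'queued' {
        if (this.active) {
            transmit(payload);
            return 'sent';
        }
        this.pending.push(payload);
        return 'queued';
    }

    // Build the reply to a check-in: request fast poll only if something is waiting.
    onCheckin(): CheckinResponse {
        this.active = this.pending.length > 0;
        return {
            startFastPolling: this.active,
            fastPollTimeout: this.active ? this.fastPollTimeoutQs : 0,
        };
    }

    // Empty the queue; returns true when a fast poll stop should follow.
    drain(transmit: Transmit): boolean {
        if (!this.active) {
            return false;
        }
        const waiting = this.pending;
        this.pending = [];
        for (const payload of waiting) {
            transmit(payload);
        }
        this.active = false;
        return true;
    }
}
```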

I unfortunately didn't seem to be getting any checkin commands on my network during my testing. I'll have to sniff some traffic, but I wonder if the commands will only be sent whenever the device has been sleeping for longer than LONG_POLL_INTERVAL. If that's the case (conclusion pending some sniffing), we may still need more than just checkin. I suspect the proper way to handle even zigbee 3.0 sleeping devices may then be to immediately send, and then only queue and rely on checkin when the request failed to be delivered.

This discussion is more about the general Zigbee stack handling than cluster converters at this point (apart from possibly enabling/disabling this behavior from here). @Koenkk do you want me to create an issue on zigbee-herdsman for this discussion?

@MattWestb

You can read the checkin_interval attribute (id: 0x0000) on the SED (its unit is quarter-seconds). If it is set up OK with its parent, the device sends the check-in once that time has run out since it was last active (that is, asleep with the radio off, not counting its polls of the parent).

If not configured, IKEA controllers use 13200 (= 55 minutes) as the default in the firmware.
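
For reference, the conversion behind that figure, as a tiny sketch (the function names are made up for this example): the interval is expressed in quarter-seconds, so 13200 / 4 = 3300 seconds, and 3300 / 60 = 55 minutes.

```typescript
// Poll Control intervals in this thread are expressed in quarter-seconds.
const QUARTER_SECONDS_PER_SECOND = 4;

function quarterSecondsToSeconds(quarterSeconds: number): number {
    return quarterSeconds / QUARTER_SECONDS_PER_SECOND;
}

function quarterSecondsToMinutes(quarterSeconds: number): number {
    return quarterSecondsToSeconds(quarterSeconds) / 60;
}

// The IKEA default mentioned above.
const ikeaDefaultMinutes = quarterSecondsToMinutes(13200);
```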

For testing you can set the check-in interval shorter so the test is easier to run, but it will then eat your batteries.

When you sniff the pairing of a Zigbee 3 SED, you will see the End Device Timeout Request, in which the device tells its parent how long it can sleep without being deleted from the parent's child table; it needs to rejoin if it exceeds this time. The long poll interval must be shorter than that timeout, or you get a rejoin every time the device tries to send something to its parent.
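
That constraint can be expressed as a simple check. This is only a sketch: the function name is hypothetical, and it assumes the long poll interval is given in quarter-seconds and the end device timeout has already been converted to seconds.

```typescript
// Hypothetical helper: true when the device polls often enough to stay in
// its parent's child table, per the constraint described above.
function longPollIntervalIsSafe(longPollIntervalQs: number, endDeviceTimeoutSeconds: number): boolean {
    const longPollSeconds = longPollIntervalQs / 4; // quarter-seconds to seconds
    return longPollSeconds < endDeviceTimeoutSeconds;
}
```

For example, a long poll interval of 1200 quarter-seconds (300 seconds) would not be safe against a 120-second timeout, since the parent could drop the child between polls.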

@Koenkk
Owner

Koenkk commented Nov 1, 2021

@kennylevinsen yes, sounds good. Please create an issue for this so we can track it properly.

@kennylevinsen
Contributor Author

Koenkk/zigbee-herdsman#445 opened with a summary of the information I have gathered so far.

@kennylevinsen
Contributor Author

This will no longer be necessary once Koenkk/zigbee-herdsman#453 is merged.
