Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HAP over CoAP #52

Closed
wants to merge 1 commit into from
Closed

HAP over CoAP #52

wants to merge 1 commit into from

Conversation

roysjosh
Copy link
Contributor

WIP, looking for initial feedback

@roysjosh
Copy link
Contributor Author

Branch where HA work is being done: https://github.com/roysjosh/core/tree/hap-over-coap

@Jc2k
Copy link
Owner

Jc2k commented Jan 15, 2022

This is looking good. I'll try and find time to give it a quick review this weekend. The first thing that catches my eye is, can we share code for the pair-setup and pair-verify stuff with the existing pairing code? The existing code is already decoupled from TCP ("sans io") and has been used with BLE successfully, but can be refactored further if needed.

@roysjosh
Copy link
Contributor Author

roysjosh commented Jan 15, 2022

It is very likely they can be combined. One of the few differences is the creation of a 3rd encryption context for events at the end of pair-verify. I'll find a way to make it work.

Copy link
Owner

@Jc2k Jc2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the direction this is going in. I'll probably wait for you to de-dupe the crypto bits before I have another look.

In general try to follow Home Assistant's coding style and naming conventions and best practices (https://developers.home-assistant.io/docs/development_guidelines). Old code from homekit_python doesn't, though thats only until I have time to refactor it.

Go wild with dataclasses, enums and typing generally. A lot of the ones in aiohomekit were generated automatically and are rubbish. Feel free to fix as needed.

There is a lot of legacy cleanup in here to do still, so if something is annoying feel free to refactor it - especially if its an internal bit not exposed to Home Assistant (if it does impact Home Assistant, we'll need a PR to both). Any cleanups you want, can you put them in seperate PR's to keep this one clean.

pyproject.toml Outdated Show resolved Hide resolved
aiohomekit/controller/coap/connection.py Outdated Show resolved Hide resolved
aiohomekit/controller/coap/connection.py Outdated Show resolved Hide resolved
aiohomekit/controller/coap/connection.py Outdated Show resolved Hide resolved
@roysjosh
Copy link
Contributor Author

Hmm I need to find a way to mark the bulb as unavailable when the CoAP comms time out + invalidate the subscriptions whenever the CoAP session needs to be recreated (pair-verify re-executed) as that generates new push event keys.

@Jc2k
Copy link
Owner

Jc2k commented Jan 17, 2022

I've never done anything with CoAP. Is there a maximum age for a CoAP session? Or is it based on the time since the last exchange?

The super simple way would just be to use a time.monotonic() (so its DST safe) and have ensure_connected() sort out the session if the timer has expired. HA polls every 30s, so if the session drops out and we don't notice, then HA will bring it back to life in at most 30s. Ultimately this is what we are doing for TCP when the networking stack doesn't notice the connection drop (like a device crashes and reboots it won't RST existing connections).

That said, I would like to eventually be smarter about polling (it's pretty hard on the battery of my Eve Degree 😢). So alternatively you could start an asyncio task that will close the session after a timeout, and cancel it and reschedule it every time there is a succesful communication with the device? This is kind of annoying in asyncio because you'd need to spawn a coroutine, but then something needs to collect the results (or exceptions) of the coroutine, otherwise you leak expections and things can end up silently broken. I don't think we've got anything like that in aiohomekit atm that you can reuse, though.

Or did you have something nicer in mind?

@roysjosh
Copy link
Contributor Author

I'll squash all of this at the end. For now I'll try to commit chunks of enhancements under brief but descriptive commits to help you see what has changed. I'll also comment on your open review items. Big item in this batch was the crypto de-dupe which somehow went without incident. Yay for good specs.

@roysjosh
Copy link
Contributor Author

I'm guessing the CoAP behavior is going to be implementation-specific where the spec isn't clear. Sessions are likely tied to IP & port. I haven't left my HA dev instance up more than a half of a day yet, plus the NL Thread network occasionally has a hiccup which tends to reset the comms so I'm not sure if these bulbs have a maximum session lifetime yet. I'm aiming to make the crypto bits robust to desynchronization or full context resets. It's a lot better than the first code but likely still more to go.

@Jc2k
Copy link
Owner

Jc2k commented Jan 19, 2022

. I'll also comment on your open review items. Big item in this batch was the crypto de-dupe which somehow went without incident. Yay for good specs.

Sounds good!

@codecov
Copy link

codecov bot commented Jan 26, 2022

Codecov Report

Merging #52 (cc102f1) into main (845502b) will decrease coverage by 8.49%.
The diff coverage is 44.99%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #52      +/-   ##
==========================================
- Coverage   86.48%   77.98%   -8.50%     
==========================================
  Files          48       54       +6     
  Lines        3470     4302     +832     
==========================================
+ Hits         3001     3355     +354     
- Misses        469      947     +478     
Flag Coverage Δ
unittests 77.98% <44.99%> (-8.50%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiohomekit/controller/coap/connection.py 14.87% <14.87%> (ø)
aiohomekit/controller/coap/pairing.py 28.16% <28.16%> (ø)
aiohomekit/controller/coap/discovery.py 28.20% <28.20%> (ø)
aiohomekit/controller/controller.py 57.02% <37.83%> (-11.11%) ⬇️
aiohomekit/const.py 66.66% <60.00%> (-3.34%) ⬇️
aiohomekit/controller/coap/structs.py 82.05% <82.05%> (ø)
aiohomekit/controller/coap/__init__.py 100.00% <100.00%> (ø)
aiohomekit/controller/coap/pdu.py 100.00% <100.00%> (ø)
aiohomekit/controller/ip/connection.py 85.71% <100.00%> (-0.98%) ⬇️
aiohomekit/controller/pairing.py 81.57% <100.00%> (+1.02%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 845502b...cc102f1. Read the comment docs.

@roysjosh
Copy link
Contributor Author

@Jc2k okay, I think this is ready for another round of reviews, please. I've had my dev HA up for most of 3 days without any major hiccups.

@roysjosh
Copy link
Contributor Author

RE: polling, do you have any feel for if devices generally support events properly? iOS very rarely polled my fake bulb app's lightbulb characteristics & seemed to largely rely on events. It did poll the ping characteristic under the accessory runtime information service on a regular but infrequent basis.

@Jc2k
Copy link
Owner

Jc2k commented Jan 29, 2022

RE: Polling and events. Good question. We don't have any telemetry so I don't have any solid answers, here are some observations.

Regardless of reliability, I know some characteristics are poll only. These days this is less of a problem but I think in R1 of the HAP spec maybe temperature was poll only (it has events these days for sure).

I know we have some TCP HAP devices that refuse subscription requests, so we actually have a path that disables events if it detects this has happened. (It toasts the secure session, so we have to stop trying to even send the subscription request). This is probably a bug on our end but it's not been a big enough problem to look into further yet.

I know we have some devices that either have really flaky WiFi or they are just on the edge of their owners range. So we'd get packet loss related failures often. This has caused lots of tickets which are mostly just more errors to silence.

Connections dropping without a TCP RST was common. This includes from devices crashing.

IIRC one of my devices seems to have a maximum TCP uptime of about 17 hours before the connection needs rotating.

While not about events reliability exactly we have observed that the implementations vary wildly in quality. Some favourite bugs:

  • A device could parse {"foo":"bar"} but not {"foo": "bar} (yes, the space)
  • A device could generate invalid JSON that python can't parse (["foo",])
  • A device could handle 1, 0 and true for a BOOL field, but not false.
  • A device was really picky about the structure of its Host header
  • A device was sensitive about how many packets were used to send HTTP requests
  • A device randomly returns a 500 or 400 with HTML instead of HAP response for some requests
  • Devices returning e.g. 70402 instead of -70402

Which has made me very mistrustful of them.

All that said, the events implementation on Hue has been really solid. I have a Hue Remote that i use in my office (i.e. multiple times daily for a year) and i've never noticed it drop an event. Just got an outdoor Friends of Hue switch that so far that is also pretty solid.

I would like to rely on events more in the future. I have an Eve Degree and it eats batteries because of the polling. But I probably won't be just switching off polling completely given the vast range of devices we are dealing with. It would be good to slowly back off the poll rate if events are seen to be working for a connection maybe. But with some sort of ping to stop the TCP connection timing out. I don't think the ping characteristic is widely available, although it could be i just didn't notice it before. So i'd probably just poll something else like name.

@Jc2k
Copy link
Owner

Jc2k commented Jan 29, 2022

RE: Review - i'll make it my next priority, though might be next week now.

How easy would it be for you to pull some bits out into seperate PR's? E.g. the changes to tlv8./tlv.py would be a good candidate, they are quite self contained and we can easily add tests for them. (It's quite large, so if we can get the easy bits landed then there is less noise when im re-reviewing). This is what I do to help the Home Assistant reviewers with my larger changes upstream, its a bit of a PITA with all the rebasing but a big help. And i reckon things like the tlv changes on their own I can review sooner.

@Jc2k
Copy link
Owner

Jc2k commented Feb 1, 2022

I'd like to look at pulling out the TLV structs into aiohomekit/controller/coap/structs.py and add some tests for them (ideally decoding actually in-the-wild PDU data as the next thing when the outstanding PR's are merged.

I'd also like to see if we can reuse .model.Accessories, .model.Service and .model.Characteristic. They already have code to synthesize the IP-style JSON responses and they have a layer to do lookups by iid and type. I'm currently trying to make aiohomekit nicer to use from home assistant and i'm thinking of having list_accessories_and_characteristics return an Accessories as I just load it into one by hand on the other side anyway. So it would be good to use them from the start.

I'm sort of imagining Pdu09Characteristic gains a to_characteristic, Pdu09Service gains a to_service, Pdu09Accessory gains a to_accessory and Pdu09Database gains to_accessories.

Copy link
Owner

@Jc2k Jc2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in parallel, you could look at getting the parsed_address change made in home assistant zeroconf. I'm not going to be the one reviewing that and you might as well get thee ball rolling.

aiohomekit/controller/coap/pairing.py Outdated Show resolved Hide resolved
aiohomekit/controller/coap/pairing.py Outdated Show resolved Hide resolved
if response.code != Code.CHANGED:
logger.warn(f"CoAP POST returned unexpected code {response}")

try:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theres no i/o in the decrypt retry code. So can we pull it out of post so thats its more testable, and add some tests for it.

@Jc2k
Copy link
Owner

Jc2k commented Feb 1, 2022

RE: Tests, IP currently has a mock accessory that it uses. I hate this approach. It's slow and fails at random. Open to ideas on how you'd like to proceed there. I don't want to drown you in test writing, but it would be good to have some coverage of the happy path as I am in the midst of a (admittedly quite slow) cleanup exercise and could very easily break this.

We could break the i/o out, so for read_characteristics we might have:

def serialize_read_characteristics_request(ids: list[tuple[int, int]]) -> bytes:
    pdu = b"".join(
        [
            hap_pdu(
                HAP_PDU_OPCODES.HAP_CHARACTERISTIC_READ, idx, int(aid_iid[1]), b""
            )
            for (idx, aid_iid) in enumerate(ids)
        ]
    )

def deserialize_read_characteristics_response(ids: list[tuple[int, int]], payload: bytes) -> dict[str, Any]:
        offset = 0
        results = dict()
        while True:
            pduControl, pduTid, pduStatus, pduBodyLen = struct.unpack(
                "<BBBH", payload[offset : offset + 5]
            )
            aid_iid = ids[pduTid]
            logger.debug(
                "Read value for aid %d iid %d: %s, len %d"
                % (
                    int(aid_iid[0]),
                    int(aid_iid[1]),
                    pduStatusMap[pduStatus],
                    pduBodyLen,
                )
            )

            if pduStatus == 0:
                # decode TLV to get byte value
                value = (
                    decode_pdu_03(payload[offset + 5 : offset + 5 + pduBodyLen])
                    if pduBodyLen > 0
                    else b""
                )
                # find characteristic so we can get the data type
                characteristic = self.info.find_characteristic_by_iid(int(aid_iid[1]))
                # if we found it & have a non-empty value...
                if characteristic is not None and pduBodyLen > 0:
                    # set the raw bytes
                    characteristic.raw_value = value
                    # and get the decoded value
                    value = characteristic.value
                # add result to dict
                results[aid_iid] = {
                    "value": value,
                }
            else:
                results[aid_iid] = {
                    "description": pduStatusMap[pduStatus],
                    "status": pduStatus,
                }

            offset += 5 + pduBodyLen
            if offset >= len(payload):
                break

    async def read_characteristics(self, ids):
        pdu = serialize_read_characteristics_request(ids)
        payload = await self.encCtx.post(pdu)
        results = deserialize_read_characteristics_response(payload)
        logger.debug(f"Read characteristics: {results!r}")
        return results

Both of the 2 helpers are "pure", they don't need to know about the connection classes or any of the state on them. They could even live in a different module. So they are super easy to test.

We could also choose to mock self.encCtx.post. My only thought there is I'd like to not have too many mocks, as they can be quite brittle when refactoring. So i'd probably lean towards the first approach.

@roysjosh
Copy link
Contributor Author

roysjosh commented Feb 3, 2022

Rebased w/o changes, more commits on the way to address comments

@Jc2k
Copy link
Owner

Jc2k commented Feb 3, 2022

Awesome. Your work has inspired me to sort BLE out (had it working in homekit_controller aaages ago, but ripped it out when I did the asyncio port). So i'm expecting to be able to reuse bits of this there.

@Jc2k Jc2k closed this Feb 3, 2022
@Jc2k Jc2k reopened this Feb 3, 2022
@Jc2k
Copy link
Owner

Jc2k commented Feb 3, 2022

(Sorry for close, child helping 😵‍💫 )

@roysjosh
Copy link
Contributor Author

roysjosh commented Feb 4, 2022

BLE would be great! One day we could set up devices completely in HA.

@Jc2k
Copy link
Owner

Jc2k commented Feb 4, 2022

I've got pairing and get/put/identify/list-pairings working already via the CLI. Just cleaning it up a bit and adding some tests. bleak is so much nicer to use than when we did this with dbus-python for homekit_python!

Not sure i'll add events straight away, there are 2 different event systems (connected and disconnected) and i need to have a background task constantly monitoring the advertisement data to pick up disconnected events. I think I want to clean up aiohomekit a lot more first.

Hopefully we can find something in the Thread Control Point for managing the thread network.

Draft is here: #60.

@Jc2k
Copy link
Owner

Jc2k commented Feb 5, 2022

Would it be OK with you if we made a dev branch and we put both BLE and CoAP in there, and both work against the same branch for a bit? We'd both raise PR's against it till we were happy with it.

I want to rework the discovery interface a bit, and I need to change all 3 backends to do it, so will be easier this way and you can still work on the other bits in parallel.

@roysjosh
Copy link
Contributor Author

roysjosh commented Feb 5, 2022

I think that makes sense. I'm refactoring a few things still & then I'll work on some tests.

@Jc2k
Copy link
Owner

Jc2k commented Feb 5, 2022

Good one. It'll do it tomorrow then (hopefully).

@roysjosh
Copy link
Contributor Author

roysjosh commented Feb 6, 2022

Squashed this PR

Jc2k added a commit that referenced this pull request Feb 6, 2022
@Jc2k
Copy link
Owner

Jc2k commented Feb 6, 2022

dev branch is open with this and BLE support is merged. Let's work there, with PR's per cleanup/fix/feature. When ready this will become 1.0.0 because its such a huge new set of features.

I'll open up tickets with a coap label in the 1.0.0 milestone and probably tag you as well where there is something I want to change or clarify. Feel free to raise things in 1.0.0 either to ask me or just to keep track of our progress.

I'll open PR's and assign them to you where I touch coap stuff, at least while we are both working on this.

My area of focus is going to be around zeroconf and BLE discovery for the time being. One thing I want to change is the weird lifecycle zeroconf has:

  • In HA, an existing zeroconf instance is passed to us. It's cache has already made a bunch of discoveries.
  • In the CLI, the Controller has to make its own zeroconf instance. It's completely cold.

It's making that code a bit harder to reason about.

Cheers for all the work you've done on this. Hopefully nearly there with it..

@Jc2k Jc2k closed this Feb 6, 2022
@roysjosh
Copy link
Contributor Author

roysjosh commented Feb 6, 2022

Awesome, thanks! It's been a fun project to work on. Looking forward to getting it over the line!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants