Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up context ID generation. #30

Merged
merged 2 commits into from
Aug 29, 2022

Conversation

blueshiftlabs
Copy link
Contributor

Using a SHA-256 cryptographic hash is major overkill for the purposes of finding a short representation of a string. This change switches _short_hash to use Python's built-in hash() function (used for hashing in dicts and sets), which works just fine for this, and is much faster.

In addition, since context IDs are length-limited, use base-85 encoding to pack as many bits of the hash and index into the context as possible. This slightly changes the context format, but the new format should be a bit easier to read.

Using a sha256 hash is major overkill for the purposes of finding
a short hash of a string. This change switches _short_hash to use
Python's built-in hash() function (used for hashing in dicts and sets).
which works just fine for this, and is much faster.

In addition, since context IDs are length-limited, use base-85 encoding
to pack as many bits of the hash and index into the context as possible.
@blueshiftlabs
Copy link
Contributor Author

Hey @basnijholt, could you please take a look at this?

@basnijholt
Copy link
Owner

Hi @blueshiftlabs, thanks a lot for taking the time! I am still thinking about whether I want to change the numbering to base85. The numbering has helped me a lot when debugging, in base85 it's not really readable anymore.

@blueshiftlabs
Copy link
Contributor Author

I can limit the base85 to just the entity ID hash without affecting anything. My real concern there had to do with repeating context IDs, but since the counter gets reset to 0 at every HA restart, 10 million should be enough - even at one call per second, that's still 115 days of HA uptime before we run out of IDs.

One note - since context IDs are persisted to the database, it might be worth saving the counter value across Home Assistant restarts to avoid collisions on every restart. Not sure if it matters in this case, but if it does, base85 might be a lot more useful. I could also do base36 or something similar that might be a bit more human-readable, in exchange for giving up some keyspace.

@RubenKelevra RubenKelevra added this to To do in PRs Aug 1, 2021
@basnijholt
Copy link
Owner

Thanks a lot @blueshiftlabs! This has taken me 2 years to merge, but now that we have tests, it makes me more confident that it works.

@basnijholt basnijholt merged commit 707f955 into basnijholt:master Aug 29, 2022
PRs automation moved this from To do to Done Aug 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
PRs
Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants