Use SHA256 for cache invalidation #1985
Conversation
So it’s considered broken then, since our equality on terms is alpha-equality, and our Binary instance doesn’t store terms in a nameless form.
Yes, that's correct. Cache invalidation will throw out more caches than strictly necessary; however, it will never keep a cache that's stale. I'd like to point out that this was true before this patch too - I'm working on another proposal (two, actually) to fix - or rather work around - the broken
As an addendum, I don't think our caching logic is completely broken. As long as GHC keeps its generated names stable (either by being deterministic, or because entities were loaded from hi-files), caching should still work. Only in cases where GHC generates the same Core modulo alpha equivalence does a cache get pessimistically thrown out. This is annoying, but not nearly as bad as using a stale cache - which can happen in the current state.
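To make the alpha-equivalence point concrete, here is a minimal sketch (toy types, not Clash's actual `Term` representation) showing that structural equality on named terms distinguishes alpha-equivalent terms, while converting to a nameless (de Bruijn) form identifies them:

```haskell
import Data.List (elemIndex)

-- Toy named lambda terms; equality here is purely structural.
data Term = Var String | Lam String Term | App Term Term
  deriving (Eq, Show)

-- Nameless (de Bruijn) form: variables are indices into the binder stack.
data DbTerm = DbVar Int | DbLam DbTerm | DbApp DbTerm DbTerm
  deriving (Eq, Show)

-- Convert a closed term to de Bruijn form, tracking bound names in `env`.
toDb :: [String] -> Term -> DbTerm
toDb env (Var x)   = DbVar (maybe (error "free variable") id (elemIndex x env))
toDb env (Lam x b) = DbLam (toDb (x : env) b)
toDb env (App f a) = DbApp (toDb env f) (toDb env a)

main :: IO ()
main = do
  let idA = Lam "a" (Var "a")
      idB = Lam "b" (Var "b")
  print (idA == idB)                  -- False: names differ structurally
  print (toDb [] idA == toDb [] idB)  -- True: alpha-equivalent
```

Any hash derived from the named form (as via a `Binary`-based serialisation) inherits the first behaviour, which is why alpha-equivalent Core can hash differently and pessimistically invalidate a cache.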
After discussion on Slack, I'm suggesting the following PR cover letter and commit message:
I don't agree with the reasoning that we "need" a cryptographic hash function because only that would "guarantee" that we won't have collisions.
We shouldn't have to worry about someone malicious trying to create hash collisions here. If someone were to do that, they'd only hurt themselves.
The stuff we're hashing is essentially random data as far as the hashing algorithm is concerned, and any competent hashing algorithm should be able to handle that.
Especially because we're not hashing thousands of things, as, for example, a hashmap might.
In fact, we're only comparing two hashes: the current compile target and one cached compile target.
As I see it, the most meaningful effect of this change is a big increase in the hash digest size, from 64 to 256 bits.
That does drop the collision chance astronomically, but I intuitively feel the collision chance was quite small already.
The added work of the more expensive hash function is unlikely to be noticeable compared to the rest Clash has to do.
And if this allows us to sidestep the issues with the #1916 Hashable instances 👍
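Some back-of-the-envelope numbers for the comment above: when only a single pair of digests is compared, the collision chance under a uniformity assumption is simply 2^-bits, so the move from 64 to 256 bits is a jump of roughly 58 orders of magnitude:

```haskell
-- Chance that two independent, uniformly distributed digests of the
-- given width collide. This assumes the hash behaves like a uniform
-- random function, which is an idealisation for both hashable and SHA-256.
collisionChance :: Int -> Double
collisionChance bits = 2 ** negate (fromIntegral bits)

main :: IO ()
main = do
  print (collisionChance 64)   -- ~5.4e-20 for a 64-bit digest
  print (collisionChance 256)  -- ~8.6e-78 for SHA-256
```

This supports both halves of the argument: 2^-64 is already tiny for honest inputs, but the real concern in this thread is not digest width, it is whether `hashable` distributes these particular inputs well at all.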
changelog/2021-11-06T20_01_38+01_00_use_sha256_for_cache_invalidation
Right, that's largely why I suggested the new commit message and PR cover letter: to better cover the reasoning for the change. Given the tradeoffs with speed versus uniformity that `hashable` makes, I think it's a good idea to switch to SHA-256 and not use `hashable` here. We really don't want the cache invalidation to pick up a stale entity in practice; using SHA-256 makes that failure mode vanishingly unlikely.
Good comments, thanks Leon and Peter. The cover letter can indeed be a lot better. I also should have included the reason why I'm suspicious of `hashable`.
Additionally, things like this do not sit well with me:
I realize these are somewhat cherry-picked examples, and that they are by no means proof that we can't use it. But given the seriousness of a stale cache hit and the tiny change needed, I'd like to go through with this PR. I will fix the cover letter.
Before this commit, we used `hashable`'s `hash`. For the purposes of cache invalidation, it is very important that if `hash a == hash b`, the chance that `a /= b` is vanishingly small. It is not clear that `hashable` satisfies this requirement, whereas a cryptographically secure hash such as SHA-256 clearly does.
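The invalidation rule the commit message relies on can be sketched as follows (toy `Digest` type and illustrative names, not Clash's real API or a real SHA-256 binding): a cache entry may be reused only when the stored digest matches the digest of the current compile target, so a collision is the only way a stale entry survives.

```haskell
-- Toy stand-in for a SHA-256 digest; in practice this would wrap the
-- raw hash bytes of the serialised compile target.
newtype Digest = Digest String
  deriving (Eq, Show)

data CacheDecision = UseCache | Recompile
  deriving (Eq, Show)

-- Reuse the cache only on an exact digest match; any mismatch, or a
-- missing stored digest, forces recompilation.
decide :: Digest -> Maybe Digest -> CacheDecision
decide current (Just stored)
  | stored == current = UseCache
decide _ _            = Recompile

main :: IO ()
main = do
  print (decide (Digest "abc") (Just (Digest "abc")))  -- UseCache
  print (decide (Digest "abc") (Just (Digest "xyz")))  -- Recompile
  print (decide (Digest "abc") Nothing)                -- Recompile
```

Under this rule, a false positive (reusing a stale cache) requires `hash a == hash b` with `a /= b`, which is exactly the event the switch to SHA-256 is meant to make negligible.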
Force-pushed from 708fe1a to 11fab18.
I thought about backporting it but... #2008.