Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove caching from Prelude while cached imports are α-normalized #733

Open
sjakobi opened this issue Sep 7, 2019 · 8 comments
Open

Remove caching from Prelude while cached imports are α-normalized #733

sjakobi opened this issue Sep 7, 2019 · 8 comments

Comments

@sjakobi
Copy link
Collaborator

@sjakobi sjakobi commented Sep 7, 2019

The Prelude currently uses hashes to cache local imports, for example here.

The downside of these hashes is that the protected imports are α-normalized, making them harder to read.

I suggest that until we have standardized a caching system that doesn't cause α-normalization, we remove these hashes.

@sjakobi sjakobi changed the title Remove caching from Prelude while caches are α-normalized Remove caching from Prelude while cached imports are α-normalized Sep 7, 2019
@Gabriel439

This comment has been minimized.

Copy link
Contributor

@Gabriel439 Gabriel439 commented Sep 8, 2019

So I don't think we should remove the internal hashes. They currently provide a lot of value by improving the performance of unprotected imports of the Prelude

However, I am open to the idea of just not α-normalizing imports when caching them or computing the integrity check. It would be a breaking change, but I think we can figure out a way to ease the migration (like the Haskell implementation providing backwards-compatible support for the old hashes).

@singpolyma

This comment has been minimized.

Copy link
Collaborator

@singpolyma singpolyma commented Sep 9, 2019

I would be strongly opposed to reducing the amount of alpha-normalization we do. In dhall-ruby I actually alpha-normalize every time I beta-normalize already. Removing this for computing the integrity check would be a big step backwards, in my opinion, since just changing a variable name would change the hash, thus significantly breaking the benefit of the "semantic" in semantic integrity check.

@Gabriel439

This comment has been minimized.

Copy link
Contributor

@Gabriel439 Gabriel439 commented Sep 9, 2019

@singpolyma: I think my comment in dhall-lang/dhall-haskell#1212 (comment) addresses half of your concern, by proposing a separate type of hash (i.e. cache:…) that behaves differently from the current semantic integrity check.

The idea is that if you tag an import with cache:… then it does not α-normalize the expression but it also does not reject a hash mismatch. In other words, the hash is used entirely for caching purposes and not security reasons.

@singpolyma

This comment has been minimized.

Copy link
Collaborator

@singpolyma singpolyma commented Sep 9, 2019

@Gabriel439

This comment has been minimized.

Copy link
Contributor

@Gabriel439 Gabriel439 commented Sep 9, 2019

@singpolyma: Regular hashes would still be cached, too. Both types of hashes would still be protected against cache poisoning because they would still be content-addressable. Specifically, the cache:… would both hash and cache the non-α-normalized form to avoid cache poisoning.

However, the more general reasoning behind the cache:… idea is that we currently don't capture the user's intent when we add integrity checks: sometimes they add them for security reasons and sometimes they add them for caching reasons, and by preserving that high-level intent in the code we can design around it (both in the language and in supporting tooling).

The main reason I haven't thought through a Haskell-implementation-specific fix is that it leads to an observable deviation from the standard, which users might interpret as bugs in other implementations. One of the things I'm actively avoiding is the Haskell implementation becoming the de facto Dhall standard in the same way that GHC became the de facto standard for Haskell.

@singpolyma

This comment has been minimized.

Copy link
Collaborator

@singpolyma singpolyma commented Sep 10, 2019

@sjakobi

This comment has been minimized.

Copy link
Collaborator Author

@sjakobi sjakobi commented Sep 10, 2019

I don't think an implementation choosing to preserve names would be a deviation from the standard

I don't see how the standard could be interpreted this way. It says

Cache the fully resolved, αβ-normalized expression, and encoded expression

… and later on

If the file exists and is readable, verify the file's byte contents match the hash and then decode the expression from the bytes using the decode judgment instead of importing the expression

In fact, I think it would be desirable to have a "transparent" cache, where a protected import is resolved to the exactly same expression, no matter whether it is in the cache or not.

@singpolyma

This comment has been minimized.

Copy link
Collaborator

@singpolyma singpolyma commented Sep 12, 2019

@sjakobi indeed, the cache should be as transparent as possible (which is one reason why in my implementation I always alpha-normalize even when there is no cache) however this is just as easily an argument in favour of name-preservation, since implementations are already allowed to return non-alpha-normalized from a protected expression that is not cached, also doing that when it is cached would not break the standard in any way.

Gabriel439 added a commit to Gabriel439/dhall-manual that referenced this issue Dec 1, 2019
Once an import with a semantic integrity check succeeds, it is a cache hit
afterwards.  This should perhaps change at some point as I've mentioned here:

* dhall-lang/dhall-lang#733 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.