Should all inferred types be β-normalized? #724

Closed
sjakobi opened this issue Sep 2, 2019 · 13 comments · Fixed by #745

@sjakobi
Collaborator

sjakobi commented Sep 2, 2019

I've made a small experiment in the Haskell implementation where I removed most of the normalization steps in type inference. To keep the acceptance tests green, and to satisfy a property test that checks that all inferred types are normalized, it was then sufficient to add a handful of normalize calls in a few spots.

I wonder if it's worth upstreaming these changes into the standard.

For example, the inferred type of type annotations would need to change:

The type of

True : { x = Bool }.x

would then be

Bool

instead of

{ x = Bool }.x

(The Haskell implementation currently gets this wrong.)

The biggest benefit would be that we could get rid of the :⇥ helper and type inference implementations would no longer have to normalize the types of subexpressions.
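
As a rough sketch of what this could look like, with hypothetical, heavily simplified types (the real implementations are much richer): the :⇥ helper stands for "infer a type, then β-normalize it", so if type inference always returns β-normal types, rules that inspect the type of a subexpression no longer need that extra normalization step.

```haskell
-- Hypothetical toy language; only the shape of the field-selection rule matters.
data Expr
  = BoolLit Bool
  | BoolType
  | RecordLit [(String, Expr)]        -- { x = True }
  | RecordType [(String, Expr)]       -- { x : Bool }
  | Field Expr String                 -- r.x
  deriving (Eq, Show)

infer :: Expr -> Either String Expr
infer (BoolLit _)     = Right BoolType
infer (RecordLit kvs) = RecordType <$> traverse (traverse infer) kvs
infer (Field r k)     = do
  -- Old standard: this rule would use `:⇥`, i.e. `normalize <$> infer r`.
  -- Proposal: `infer r` is already β-normal, so its result can be
  -- inspected directly without re-normalizing.
  inferredR <- infer r
  case inferredR of
    RecordType kvs | Just ty <- lookup k kvs -> Right ty
    _ -> Left "not a record, or missing field"
infer e               = Left ("not handled in this sketch: " ++ show e)
```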

@sjakobi
Collaborator Author

sjakobi commented Sep 2, 2019

@Gabriel439 also pointed out that this would be a change to the soundness properties defined at

https://github.com/dhall-lang/dhall-lang/blob/5fea4544f6fe57acc034828a0a6ccebfd52aad15/.github/CONTRIBUTING.md#soundness-of-proposed-changes

@philandstuff
Collaborator

I've been having similar thoughts but from a different angle: I've been rewriting dhall-golang to use a normalisation-by-evaluation algorithm, and the paper I'm working from has a typechecking algorithm that always returns types in normal form. So it would be very convenient for my rewrite if I could always normalize the output of the type inference function.
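
For illustration, here is a minimal sketch of that shape with hypothetical names (dhall-golang itself is written in Go; lambdas, closures and quoting back to syntax are elided). Because inference returns a semantic value rather than a syntactic expression, every inferred type is in normal form by construction.

```haskell
data Expr
  = BoolLit Bool
  | BoolType                 -- the type `Bool`
  | Annot Expr Expr          -- `t : T`
  deriving Show

-- Semantic values: β-reduction has already happened here.
data Val
  = VBoolLit Bool
  | VBoolType
  | VTypeConst               -- the constant `Type`
  deriving (Eq, Show)

eval :: Expr -> Val
eval (BoolLit b) = VBoolLit b
eval BoolType    = VBoolType
eval (Annot t _) = eval t

-- Returning `Val` means the inferred type is a normal form by construction.
infer :: Expr -> Either String Val
infer (BoolLit _)  = Right VBoolType
infer BoolType     = Right VTypeConst
infer (Annot t ty) = do
  inferred <- infer t
  if inferred == eval ty      -- equivalence check on semantic values
    then Right inferred
    else Left "annotation mismatch"
```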

@MonoidMusician
Collaborator

It's interesting because I am doing the opposite in my implementation: I am picking (extremely) unnormalized inferred types, for two reasons: I need to make sure each node stays in the same context, and I hope it will be better for the user to keep names (especially let-bound names) visible for longer.

So this means that let Foo = {} in {=} : Foo has inferred type let Foo = {} in Foo and, perhaps most strangely, (\(A : Type) -> \(a : A) -> a) Natural has inferred type (\(A : Type) -> forall (a : A) -> A) Natural (which is judgmentally equivalent to Natural -> Natural, but note that each node in the output occurs in the same context as the corresponding node in the input).

However, if the bound name is unused, then I do simplify it: let foo = 1 in {=} just has inferred type {} (and any variables with the same name are shifted by -1, of course). (I think it's easier to implement shifting, rather than substitution, because it commutes with typechecking and normalization, but I might want to revisit that decision ...)

@philandstuff
Collaborator

@MonoidMusician I can kind of see what you’re doing but I’m not sure about it. Why do you want to keep the bound names around? Is it for error reporting? Your let Foo = {} in Foo example is cute but definitely not conformant with the standard.

Dhall-Haskell solves a similar problem by ensuring the context of bound variables and their types is part of the type error value, so that it can output types as the standard says to infer them but can give extra contextual information to help the user if there’s a type error.
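
The shape of that idea, with illustrative names rather than the exact dhall-haskell API, is roughly:

```haskell
-- Illustrative only; the real dhall-haskell error type is richer.
data Expr = ExprPlaceholder            -- stands in for the real expression type
  deriving Show

data TypeError = TypeError
  { errorContext :: [(String, Expr)]   -- bound variables and their types, innermost first
  , currentExpr  :: Expr               -- the subexpression being checked when the error arose
  , message      :: String             -- what went wrong
  } deriving Show
```

An error renderer can then print the inferred types exactly as the standard specifies them while still listing the names that were in scope.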

@MonoidMusician
Collaborator

Imagine some large type MyLargeType, maybe one that represents an HTML tree. Would you rather see a type error that looks like Could not match let MyLargeType = ... in MyLargeType with let MyLargeType = ... in Text -> MyLargeType or the expanded version of that? I think it will be much, much better for the user to see the names they used to describe things, rather than the expanded type synonyms, which are very very hard to parse/analyze to the inexperienced eye.

Having the context available is actually very hard to use if the type is normalized, because then the user has to guess where each (arbitrarily large) part of the normalized type came from in the context. They can always normalize the result of type inference themselves, of course (and I will most likely display the individual unnormalized inferred types along with a diff of their normalized types).

I'm essentially saying that the result of inferring the type of subnodes should be used verbatim (i.e. without substitution, or with a variable shifted out) in the larger inferred type when possible, because I believe it will be better for users to see the provenance of each part of the output. (And I have various other mechanisms for keeping track of provenance, too.)

Furthermore, regarding the compliance with the standard, I would argue that any reliance on the output of type inference should reference its normalized result, as is already recommended.

@ocharles
Member

ocharles commented Sep 4, 2019

I think this is another point in favour of @AndrasKovacs's point that you very rarely want to see normal forms.

@Gabriella439
Contributor

I think we need to first state the requirement here (i.e. what problem we are trying to solve), otherwise we don't have a way to evaluate proposed solutions.

My understanding is that the original problem @sjakobi was trying to solve was to reduce the incidence of duplicate normalizations (i.e. the same expression being normalized twice in a row) for performance reasons.

There's also a countervailing requirement that @MonoidMusician added that inferred types remain readable.

Now there are roughly three proposed solutions that I see going forward:

  • Do nothing (i.e. leave the standard as is)
  • Do what @sjakobi proposes and rework the standard to always infer normalized types
  • Standardize @MonoidMusician's solution to defer normalization of inferred types

I'll give my own thoughts in a follow-up comment.

@Gabriella439
Contributor

I personally don't think we should go with the first option ("Do nothing") if we have people available to standardize one of the latter two options. The reason is that the standard is currently not consistent with regard to satisfying either requirement: in some places it infers types with normal forms and in other places it infers types with non-normal forms. We shouldn't rely on the historical accident of how things were standardized to decide whether certain inferred types are normalized; that's a decision we should make intentionally.

Given the choice between the latter two options, I weakly prefer to standardize what @sjakobi proposes (update the standard to infer normalized types), mainly because it is easier to objectively evaluate whether or not an inferred type is in normal form than to evaluate whether an inferred type is readable. I foresee that the requirement of preserving readability would bog down any proposal that touches type inference, due to the lack of an objective criterion for resolving disagreements.

That said, I don't think we should necessarily constrain @MonoidMusician's implementation if they want to experiment in this regard. In particular, I think we should make this requirement non-binding (i.e. "MAY" instead of "MUST") so that implementations have the option to infer judgmentally equivalent types.

@MonoidMusician
Collaborator

Yes, that sounds reasonable to me :)

@singpolyma
Collaborator

Reducing normalization calls in the type inference logic would seem like a big win if possible.

@Profpatsch
Member

Profpatsch commented Sep 9, 2019

After some heavy end-user dhall programming over the last few days, @MonoidMusician’s idea of keeping around assigned names sounds magnificent to me.
Reading error messages requires the user to constantly map from expanded forms to their internal understanding of the code, so it should be paramount to preserve their names in error messages as much as possible.

I think this really only refers to let-bindings in practice, so maybe a compromise can be found.

@Gabriella439
Contributor

@Profpatsch: I think you might have meant to comment in #733 or dhall-lang/dhall-haskell#1212. This issue is strictly about β-normalization.

@Gabriella439
Contributor

@Profpatsch: Oops, never mind. You're commenting in the correct issue. I thought you were talking about α-normalization, but you're actually talking about preserving type synonyms.

sjakobi added a commit that referenced this issue Sep 23, 2019
Quoting from `type-inference.md`:

> Types inferred according to the following rules will be in β-normal form, provided that
> all types in the input context `Γ` are in β-normal form.
> The advantage of β-normalizing all inferred types is that repeated normalization of the
> same types can largely be avoided.
>
> However implementations MAY choose to infer types that are not β-normalized as long
> as they are equivalent to the types specified here.

For the `merge`, `toMap` and type annotation rules, this patch changes the result to be
inferred from the term to the left of the colon, instead of returning the annotation on the
right. This has two advantages:

1. With the inferred type of the term we already have a β-normalized type, so we can save
ourselves the hassle of (explicitly) normalizing the annotation.
2. The inferred type is now stable with regard to normalization. This removes
[a little gotcha](dhall-lang/dhall-haskell#1310) for implementations that normalize
unprotected imports, which could otherwise infer different types depending on how the
expression is accessed.

Closes #724.
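
To illustrate point 2 with a hypothetical toy model (not the real implementation): under the old annotation rule the inferred type of `t : T` is `T` as written, so normalizing the expression before type-checking yields a different inferred type; under the new rule both paths agree.

```haskell
data Expr
  = BoolLit Bool
  | BoolType
  | Field [(String, Expr)] String    -- `{ x = Bool }.x`, with the record kept inline
  | Annot Expr Expr                  -- `t : T`
  deriving (Eq, Show)

normalize :: Expr -> Expr
normalize (Field kvs k) = maybe (Field kvs k) normalize (lookup k kvs)
normalize (Annot t _)   = normalize t
normalize e             = e

-- Stand-in for full type inference of the annotated term.
inferTerm :: Expr -> Expr
inferTerm (BoolLit _) = BoolType
inferTerm e           = normalize e

oldRule, newRule :: Expr -> Expr
oldRule (Annot _ annotation) = annotation        -- return the annotation verbatim
oldRule e                    = inferTerm e
newRule (Annot t _)          = inferTerm t       -- return the β-normal type of the term
newRule e                    = inferTerm e

main :: IO ()
main = do
  let e = Annot (BoolLit True) (Field [("x", BoolType)] "x")  -- True : { x = Bool }.x
  print (oldRule e)              -- Field [("x",BoolType)] "x"
  print (oldRule (normalize e))  -- BoolType: the old rule is not stable under normalization
  print (newRule e)              -- BoolType
  print (newRule (normalize e))  -- BoolType: the new rule is
```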