Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative Imports #473

Merged
merged 22 commits into from
Jun 28, 2018
Merged

Alternative Imports #473

merged 22 commits into from
Jun 28, 2018

Conversation

f-f
Copy link
Member

@f-f f-f commented Jun 14, 2018

This PR implements dhall-lang/dhall-lang#172

Does not work yet, pushing this work in progress here for feedback, suggestions, etc.

So far:

  • the missing keyword works, but we might want to change the standard (as now it gets parsed at the level of all the other imports, and I'm not quite sure how to implement in another place)
  • the ImportAlt constructor for Expr is there, but does not work yet - currently stuck at parsing, not quite sure how to include the Import parameter in there yet, here.
    In general I don't think that constructor is right, so I'll think about this a bit harder and I welcome suggestions.

Copy link
Collaborator

@Gabriella439 Gabriella439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to change the original grammar. What you have is equivalent to the grammar even if implemented in a different way

@@ -473,6 +478,7 @@ instance Monad (Expr s) where
Field a b >>= k = Field (a >>= k) b
Project a b >>= k = Project (a >>= k) b
Note a b >>= k = Note a (b >>= k)
ImportAlt a _b _c >>= k = k a -- TODO: does this make sense?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this makes sense

Copy link
Member Author

@f-f f-f Jun 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part I'm not sure about is that the contained expressions (_b, _c) will not bind to k (as it happens with the other operators)

@@ -1889,6 +1912,7 @@ isNormalized e = case denote e of
else True
_ -> True
Note _ e' -> isNormalized e'
ImportAlt _ _ _ -> True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be:

ImportAlt _ b c -> isNormalized b && isNormalized c

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, missed that, thanks 👍

importAltExpression embedded =
noted (do
l <- orExpression embedded
a <- embedded -- FIXME: broken
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just do:

let a = Missing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite, since we need to keep the parsers polymorphic (as it gets concrete only in exprA = expr import_).
I think the problem here is that the constructor is ImportAlt a (Expr s a) (Expr s a), and I'm not sure how to get an a from the embedded without running it (which we cannot, as we have to continue the parsing of the current tree), so that I can run the fold two lines later with (ImportAlt a). That's why I'm concerned that the constructor might not be right.

@Gabriella439
Copy link
Collaborator

@f-f: Another option is to remove the first field from ImportAlt and actually allow that constructor to be present in import-free expressions

Then what you could do is:

  • for type-checking, type-check the first branch
  • for normalization, normalize the first branch and return that

@f-f
Copy link
Member Author

f-f commented Jun 16, 2018

@Gabriel439: yeah this approach is much simpler, I have an almost-working implementation up in the latest commit.
Now I typecheck and normalize only the left branch, and the ? operator seems to work fine, e.g.:

❯ .stack-work/install/x86_64-osx/lts-11.12/8.2.2/bin/dhall <<< "1 ? 2"
Natural

1

However in case an import fails to resolve, the exception is thrown before the ImportAlt is removed:

❯ .stack-work/install/x86_64-osx/lts-11.12/8.2.2/bin/dhall <<< "1 ? missing"

↳ missing

Error: Missing keyword found

I'd like to catch it here, but I just realized that branch is never called and I should instead catch it and remove the ImportAlt somewhere around here

Copy link
Collaborator

@Gabriella439 Gabriella439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@f-f: The second catch you linked to here:

https://github.com/dhall-lang/dhall-haskell/pull/473/files#diff-af3f1fd4c90ee30ad5bc7e05d5bbb9d0R707

... rethrows the exception, so it shouldn't be interfering with the outward propagation of the exception

loadStaticWith from_import ctx n (ImportAlt a b) = do
resolveExpr a `catch` handler
where
resolveExpr expr = fmap join (traverse process expr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify this to:

resolveExpr = loadStaticWith from_import ctx n

Copy link
Member Author

@f-f f-f Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, thanks!

However, that new branch that pattern-matches on ImportAlt never gets called, because the last one is called on the complete expression, and that traverses the tree calling always the first branch on every Import.

This happens because the process function is something like Import -> IO (Expr Src X), and because the (auto-derived) implementation of traverse for the ImportAlt constructor is:

traverse f (ImportAlt a b) = ImportAlt <$> traverse f a <*> traverse f b

Since instead of keeping the structure here we would need to get rid of this constructor and keep the first branch that doesn't fail, we should have instead:

traverse f (ImportAlt a b) = 
  catch (traverse f a)
        (\e -> let _ = (e :: SomeException) in traverse f b)

So basically I think that the simplest way to implement this is to replace that call to traverse with a call to a mapM' (maybe called importsMapM) that will reimplement traverse in all other cases except for ImportAlt.
(I'll probably pull in uniplate to do this)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@f-f: Have you tested the change I suggested? The reason I suggested it was because it should resolve the problem you are describing. By calling loadStaticWith on the two sub-expressions it should correctly try the branches for Embed/ImportAlt before doing the fall-back case for all other constructors. I think the reason your code has the problem was because you inlined the logic for the last branch for all other constructors

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, never mind. Now I understand the problem you are describing. The issue is an expression like this:

App (ImportAlt a b) c

Once it pattern matches on App and doesn't find ImportAlt or Embed then it skips straight down to the Embed constructors at the leaves of the tree

I think the correct thing to do here is to just spell out the other cases instead of relying on traverse + join

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of which, I should probably simplify the AST a little bit to group operators and built-ins under one constructor each so that functions like these are easier to write

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, +1 on simplifying the AST with groupings

@quasicomputational
Copy link
Collaborator

quasicomputational commented Jun 18, 2018

I've just realised I've got a query about the semantics here. Which sorts of failures allow the fallback imports, and which should bubble up? Suppose I've got an import like ./a.dhall ? ./b.dhall, and a.dhall has a syntax error. Will that error be masked if b.dhall works? What if a.dhall is ill-typed?

So this suggests one more thing to note down in the spec (whether all errors should be candidates for fallback, or only, e.g., HTTP errors), and also that it would be nice to have another mode of operation that will unmask all masked errors (i.e., chase all import branches and collect errors from all of them, and also make sure that the ones that don't fail have the same type).

@Gabriella439
Copy link
Collaborator

@quasicomputational: Good point. I think the kind of errors that should trigger the fallback import are only if the import is missing (i.e. the file does not exist, the URL is a 404, or the environment variable is unset)

@quasicomputational
Copy link
Collaborator

We want to be more aggressive in falling back for HTTP errors than just 404s, since a primary use-case here is to swap out failing services for mirrors and failing services can break in a variety of ways. Probably anything except an (eventual, via redirects) 200 should fall back.

@f-f
Copy link
Member Author

f-f commented Jun 19, 2018

@quasicomputational I also have a concern about import hashes: e.g. should two alternative expressions have the same hash? If not, does it mean that when an import fails, we might fetch something else with a different hash?

(My take is that we should not enforce this "consistent hashing" on expressions unless the user requests it explicitly by putting an hash for the whole expression)

@Gabriella439
Copy link
Collaborator

@f-f: I don't think we should require the same hash or the same value for alternatives. A simple example for why you would the two alternatives to differ would be optionally retrieving an environment variable:

[ env:HOME as Text ] : Optional Text ? [] : Optional Text

@Gabriella439
Copy link
Collaborator

@quasicomputational: Yeah, I agree that fallback on non-200 response code should be fine

for @Expr@, with the exception for the @ImportAlt@ constructor, that
gets eliminated while we search for imports that fail to resolve.
-}
importMapM
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this function seems specialized to loadStaticWith, I'd suggest inlining it in loadStaticWith, like this:

loadStaticWith from_import ctx n expr = case expr of
    ImportAlt a b -> ...
    Const a -> ...
    Var a -> ...

@f-f
Copy link
Member Author

f-f commented Jun 20, 2018

@Gabriel439 so I reimplemented traverse with the ImportAlt removal (from my tests everything looks fine, but I have some doubts on the correctness of cases like let and string interpolation, but I'll add tests to verify my doubts).
Also I think the style/formatting might not be super in line with the rest of the code, so feel free to add suggestions on that.

So, now we can have alternative expressions:

❯ ./dhall <<< "1 ? missing"
Natural

1

And the first expression to resolve is returned:

❯ ./dhall <<< "missing ? 1 ? 2"
Natural

1

However, this behaviour was a bit of a surprise:

❯ ./dhall <<< "env:UNSET_VARIABLE as Text ? missing" 

↳ missing

Error: Missing keyword found

Basically if all the expressions fail to resolve, only the last error is returned.
From the UX point of view it kind of makes sense, since you are supposed to fix that expression anyways.
However, if we want to do something about it we could provide some context to the user (or just return the list of previous failures in case all expressions fail to resolve, which is probably better).

@Gabriella439
Copy link
Collaborator

@f-f: The user experience should probably be to return all the errors if every branch fails, but with one caveat: any missing-related errors should not show up in that list. This is necessary so that the law missing ? e = e is true.

The way I would do this is to wrap any exception related to imports being missing into a new exception storing a list of exceptions. In other words:

newtype MissingImports = MissingImports [SomeException]

... and then the catch logic for ImportAlt would look something like:

loadStaticWith from_import ctx n (ImportAlt a b) =
    loadStaticWith from_import ctx n a `catch` handler₀
  where
    handler₀ (MissingImports es₀) =
        loadStaticWith from_import ctx n b `catch` handler₁
      where
        handler₁ (MissingImports es₁) =
            throwIO (MissingImports (es₀ ++ es₁))

... and then instead of:

throwIO MissingKeyword

... you do:

throwIO (MissingImports [])

... and then the pretty-printing logic for MissingImports displays all the import-resolution failures if the list is non-empty and displays something like "No valid imports" if the list is empty

@Gabriella439
Copy link
Collaborator

Another thing I realized is that we should specify that if a transitive import fails that should count as the parent import failing

@f-f
Copy link
Member Author

f-f commented Jun 24, 2018

@Gabriel439
Updates:

  • inlined importMapM into loadWithStatic
  • we now return the list of errors, I had to wrap the exceptions in a MissingImport container type in order to pass lists around, and this leads to some awkward cases (e.g. the two handlers here), but it's better than all the other alternatives I could think about
  • about the above point, not that I'm not wrapping some cases, e.g. InternalError, or errors from typechecking, which will except right away instead of getting into the list (I think)
  • the missing now correctly "disappears", but sometimes a bit too much (see last example)
  • I'm not sure how/where I should do the plumbing for your last comment about transitive imports failing

Examples of the current behaviour:

❯ ./dhall <<< "env:UNSET1 as Text ? env:UNSET2 ? missing ? env:UNSET3"

Error: Failed to resolve imports. Error list:

↳ env:UNSET1as Text

Error: Missing environment variable

↳ UNSET1

↳ env:UNSET2

Error: Missing environment variable

↳ UNSET2

↳ env:UNSET3

Error: Missing environment variable

↳ UNSET3
❯ ./dhall <<< "missing ? env:UNSET"

↳ env:UNSET

Error: Missing environment variable

↳ UNSET

However, I don't quite like this, because it doesn't display anymore the location of the error (I think the problem is here):

 ❯ ./dhall <<< "missing"

Error: No valid imports

Copy link
Collaborator

@Gabriella439 Gabriella439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great to merge as is. I had just had some minor suggestions

show (MissingImport e) = show e

-- | List of Exceptions we encounter while resolving Import Alternatives
newtype MissingImports = MissingImports [MissingImport]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a simpler way to do this is to define MissingImports as:

newtype MissingImports = MissingImports [SomeException]

... and then replace all occurrences of MissingImport with Control.Exception.toException

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I'll try that 👍


-- | List of Exceptions we encounter while resolving Import Alternatives
newtype MissingImports = MissingImports [MissingImport]
deriving (Typeable)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to derive Typeable any longer. I recently learned that it's automatically derived for all Haskell types without having to specify this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. IIRC in our Haskell projects at work we have to derive it, but it probably depends on GHC version and/or extensions, I'll read up on this.

else do
m <- zoom cache State.get
case Map.lookup here m of
Just expr -> return expr
Nothing -> do
let handler
:: MonadCatch m
let handler₀
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to change the logic for this handler? My understanding is that this mainly affects how errors are rendered. In other words, if we didn't change the handler the error might look like:

↳ ./foo
  ↳ Failed to resolve imports.  Error list:

      ↳ ./bar

      ↳ ./baz

... whereas your change causes it to render like this:

Failed to resolve imports. Error list:

↳ ./foo

  ↳ ./bar

↳ ./foo

  ↳ ./baz

... but maybe the original way might be more clear

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is necessary (with this implementation atleast):

The reason why we need to match & unwrap the MissingImports here is that if we don't we will have things like MissingImports (Imported ... (MissingImports [])) when we parse a missing.
This means that we will render an additional "No imports found" in place of the missing keyword, while we wanted to avoid that.
And we do need to wrap every import error in a MissingImports, so that the ImportAlt case works (so we have to do the unwrapping either here or there)

(I think my explanation might not be that clear, so I can re-explain more extensively if it doesn't make sense)

Not sure if there is a nicer way to do this whole exception piping 🤔

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. Yeah, in this case keep what you have

@f-f
Copy link
Member Author

f-f commented Jun 25, 2018

Awesome, thanks! I'd like to add some tests before we merge.

@Gabriella439
Copy link
Collaborator

Alright! Sounds good

@Gabriella439
Copy link
Collaborator

Also, in terms of how to do this more nicely ...

One way to think about this is in terms of the following algebraic interpretation of import resolution:

  • x, y, and z are semantically Set [Import]
    • Each [Import] represents an import chains
    • The Set represents the set of alternative import chains that we can take
  • x * y means "x depends directly on y" (i.e. "x imported y" in the simple case)
    • i.e. x * y = liftA2 (++) x y if there were a Set Applicative
  • x + y is an alternative import (i.e. + is synonymous with our ?)
    • i.e. x + y = Data.Set.union
  • 0 is missing
    • i.e. 0 = Data.Set.empty
  • There is no Dhall analog of 1, yet (maybe there should be?)
    • i.e. 1 = Data.Set.singleton []

Anyway, the idea is that we can use the usual algebraic rules to reason about how imports work:

-- e ? missing = e
x + 0 = x

-- missing ? e = e
0 + x = x

-- (e₀ ? e₁) ? e₂ = e₀ ? (e₁ ? e₂)
(x + y) + z = x + (y + z)

-- If `x` imports `missing` that is the same as `missing`
x * 0 = 0 

-- If `x` depends on either `y` or `z` then that is the same as either
-- `x` depending on `y` or `x` depending on `z`
x * (y + z) = (x * y) + (x * z)

-- (Reading form right to left) If we depend on either `x` and `y` and
-- they each depend on `z` then we can guarantee that we always depend
-- on `z`
(x + y) * z = (x * z) + (y * z)

The other benefit of thinking about things this way is that it suggests that the way you are doing things right now is probably the right way because it is equivalent to reducing the import chain to an arithmetic normal form (i.e. a sum of products). I think the main thing we can do to make things nicer is just change the exception types to enforce the normal form (i.e. use the types to prevent nesting MissingImports inside of an Imported)

@Gabriella439
Copy link
Collaborator

Actually, the analogy is even more exact if you just replace + with ||, replace * with &&, replace 0 with False (for a missing import) and replace 1 with True (for a successful import). Then if the expression evaluates to True the import chain has a solution and the normal form is disjunctive normal form.

@f-f
Copy link
Member Author

f-f commented Jun 25, 2018

@Gabriel439 great analysis, it helps 👍 (I think I was already subconsciously convinced about all of this, thanks for putting it into words)

I already thought about some typed way to prevent nested MissingImports, but couldn't come up with anything meaningful, so I went with the manual unwrapping.

The closest thing to a solution I could come up with was making a Typeclass such that:

  • our MissingImports will contain only exceptions that implement it
  • all the exceptions we define will have an instance

But it wouldn't work, because I think our "exception set" is not closed, and we can have other exceptions around (e.g. I think HttpException), so MissingImports should really have a list of SomeException.
And OTOH MissingImports needs to implement Exception, as we need to be able to throw it, so if we want to prevent the nesting at the type level we should probably change approach.
Maybe Maybes instead of Exceptions?

@Gabriella439
Copy link
Collaborator

@f-f: Yeah, I don't think that properly fixing Dhall's import exception hierarchy is a requirement to merge this. I think you can merge this already as is

@f-f
Copy link
Member Author

f-f commented Jun 25, 2018

Great, thanks. I'll clean this up and try to get it ready to merge before the release (probably tomorrow).

@f-f
Copy link
Member Author

f-f commented Jun 27, 2018

@Gabriel439
I added some tests for the imports, and realized that tests have been broken for quite some commits, but Hydra didn't notice (to be fair Hydra does notice, the problem is that the build is marked green here on GitHub): the last commit passing tests was 2a2a136, and the first commit that builds but doesn't pass tests is d4dcf44. There are two commits between these two (bdeebfa and 8051981) that don't even build. However, Hydra marks all of them with the green mark, and that's because of the pwd build step that comes after the build always passes.
Opened #487 to track this.

Anyways, I'll look into the problem: regression for issue151 is not passing anymore so I think I did something wrong in the "inline traverse" commit, 8051981

@f-f
Copy link
Member Author

f-f commented Jun 27, 2018

@Gabriel439 fixed the failing test by throwing an exception when encountering a typecheck error, instead of adding it to the queue (which I think is the right behaviour).

So if we have an import that resolves successfully but fails to typecheck, Dhall won't try other alternatives. E.g.:

❯ ./dhall <<< "./tests/regression/issue151a.dhall ? 1"

Use "dhall --explain" for detailed errors

↳ ./tests/regression/issue151a.dhall

Error: Unbound variable

A

If that's fine (together with the new changes) we might be done here 🎉

@f-f f-f changed the title WIP: Alternative Imports Alternative Imports Jun 27, 2018
Copy link
Collaborator

@Gabriella439 Gabriella439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two small comments

@@ -1207,6 +1222,8 @@ alphaNormalize (Note s e₀) =
Note s e₁
where
e₁ = alphaNormalize e₀
alphaNormalize (ImportAlt l₀ _r₀) =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be:

alphaNormalize (ImportAlt l₀ r₀) =
    ImportAlt l₁ r₁
  where
    l₁ = alphaNormalize l₀
    r₁ = alphaNormalize r₀

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You commented here:

for normalization, normalize the first branch and return that

The point is that by the time we run alphaNormalize there should be no ImportAlt constructors anymore.
I can add a comment on that line so it looks less surprising.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment

tests/Import.hs Outdated
catch
(do
_ <- Dhall.Import.load actualExpr
return ())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this line should be a fail, too, otherwise if the import succeeds it will pass this test

@Gabriella439
Copy link
Collaborator

@f-f: Also, to clarify, I agree that the behavior should be to not fallback to other imports if type-checking an import fails.

@Gabriella439
Copy link
Collaborator

@f-f: I still think that alphaNormalize should preserve the ImportAlt constructor because we don't guarantee or require that it will always be used after betaNormalize

@f-f
Copy link
Member Author

f-f commented Jun 28, 2018

@Gabriel439 fixed that 👍

@Gabriella439
Copy link
Collaborator

@f-f: Alright, looks great! Feel free to merge

@f-f f-f merged commit 5416269 into master Jun 28, 2018
@f-f
Copy link
Member Author

f-f commented Jun 28, 2018

@Gabriel439 many thanks for all the feedback & support :)

@f-f f-f deleted the f-f/alternative-imports branch June 28, 2018 16:01
@Gabriella439
Copy link
Collaborator

@f-f: You're welcome! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants