Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for headers.dhall configuration #2236

Merged
merged 25 commits into from
Sep 25, 2021

Conversation

timbertson
Copy link
Collaborator

@timbertson timbertson commented Jun 29, 2021

This PR implements support for per-host header configuration, as standardised in dhall-lang/dhall-lang#1192.

The basic idea is that instead of headers being inline with an import (URL with toMap { Authorization = "TOKEN" }), you will be able to specify per-origin headers in a config file, e.g:

-- ~/.config/dhall/headers.dhall
toMap {
  `raw.githubusercontent.com:443` = toMap { Authorization = "TOKEN" }
}

TODO:

@Gabriella439
Copy link
Collaborator

@timbertson: Sorry for the delayed review!

My first general comment is that I usually do the standardization in dhall-lang first before I begin work on the Haskell implementation, for two main reasons:

  • Forcing things to go through the standard acts like a simplicity filter

    The natural deduction pseudo-code we use usually keeps things from growing too complex, because it is so restricted in what it can express

  • Working on the standard first helps me to design the feature and solicit feedback at a higher level

That said, I don't foresee any issue standardizing this (it seems pretty easy to specify using our existing notation)

@timbertson
Copy link
Collaborator Author

I think this is getting close to ready. I'm pretty happy with how everything hangs together now.

The one thing I'm not quite satisfied with is the error reporting. When loading the headers, we need a fresh Status (because e.g. it shouldn't attempt remote imports, or loading headers recursively). I copy the _stack from the real import though in order to represent a useful call stack (i.e. "why was the header file being loaded").

But it doesn't seem to work quite right. In some cases it reports a good looking stack, but then also reports an additional error:

$ env DHALL_HEADERS=https://example.com ./dist/build/Dhall/dhall --file dhall-lang/tests/import/success/userHeadersA.dhall
dhall:
↳ https://httpbin.org/user-agent as Text
  ↳ env:DHALL_HEADERS
    ↳ https://example.com/

Error: Cannot import a remote URL from the headers configuration expression.


1│ https://example.com

1:1

The first half is good, but I think it's trying to display some dhall code after 1:1 and struggling.

In other cases, it doesn't show a stack at all:

$ env 'DHALL_HEADERS={ foo }' ./dist/build/Dhall/dhall --file dhall-lang/tests/import/success/userHeadersA.dhall

Use "dhall --explain" for detailed errors

Error: Unbound variable: foo

1│ { foo }

1:1

Is this because of the type of the error?

A parse error seems to be reported well, at least:

$ env 'DHALL_HEADERS={ foo' ./dist/build/Dhall/dhall --file dhall-lang/tests/import/success/userHeadersA.dhall
dhall:
↳ https://httpbin.org/user-agent as Text
  ↳ env:DHALL_HEADERS

Error: Invalid input

1:6:
  |
1 | { foo
  |      ^
unexpected end of input
expecting ',', =, operator, whitespace, or }

@timbertson
Copy link
Collaborator Author

(the error handling code in question is in Import.hs siteHeadersLoader)

@timbertson timbertson marked this pull request as ready for review August 3, 2021 10:56
@timbertson timbertson changed the title [WIP] support for headers.dhall configuration Support for headers.dhall configuration Aug 3, 2021
@Gabriella439
Copy link
Collaborator

@timbertson: So I think the solution to this problem is to stick to dhall-lang/dhall-lang#1192 as written. In particular, this section of the import resolution logic:

Γ(env:DHALL_HEADERS ? "${XDG_CONFIG_HOME}/dhall/headers.dhall" ? ~/.config/dhall/headers.dhall ? []) = userHeaders
getKey(userHeaders, origin, []) = headers  ; Extract the first `mapValue` from `userHeaders`

The precise interpretation of that is that the userHeaders fetched from one of the standard headers locations is not further transitively resolved (or even interpreted at all). In other words, it needs to be a record. So based on that interpretation then you don't need to use the _stack at all to transitively resolve the headers expression; just require the imported expression to be a record. Specifically, the siteHeadersLoader function would use Dhall.Import.assertNoImports instead of loadWithStatus

@timbertson
Copy link
Collaborator Author

Thanks, I'll give that a go. Strangely, I was suddenly struck by #2024 so I'm a bit stuck for now :/

@timbertson
Copy link
Collaborator Author

Hmm, I like that it's simpler, but it gives worse error messages:

$ env DHALL_HEADERS=https://example.com ./dist/build/Dhall/dhall --file dhall-lang/tests/import/success/userHeadersA.dhall
dhall:
Import resolution is disabled

I'd really like for any errors to be reported with an evaluation stack containing either env:DHALL_HEADERS or $HOME/.config/dhall/headers.dhall, because if I got this error message with no further detail, I'd assume that the original import was inexplicably disallowed, rather than realising that the error is about the headers expression.

Should I give up the notion of merging this with a stack of imports, and just wrap all exceptions from this function with a new SiteHeadersLoadException or something, which explicitly tracks the source (envvar or file)?

@Gabriella439
Copy link
Collaborator

@timbertson: What you could do is create a custom exception just for the headers file. The idea is that instead of using assertNoImports, you write a function which uses the same logic except with a different exception (since it's implementation is very simple). The custom exception could just say something like:

Import resolution is disallowed in the headers file located at:

↳ ${PATH of headers.dhall file}

It still won't have the stack of how it got there, but it's probably enough information for the user to figure out how to fix the immediate problem.

@timbertson
Copy link
Collaborator Author

So it turns out disabling imports is unfortunately not what I want. The primary use case in mind that I have for this feature would use the following headers file:

toMap {
  , `raw.githubusercontent.com:443` = toMap {
    , Authorization = "token ${env:GITHUB_TOKEN as Text}"
    , User-Agent = "dhall"
    , Accept = "application/vnd.github.4.raw"
    }
}

If I'm sharing dhall expressions it's much easier to have folks set GITHUB_TOKEN and use this standard file, instead of making a copy with their own token inlined. But env:GITHUB_TOKEN is an import.

So I definitely would like to support that, does this mean I need to amend the standard? I'd be OK with it supporting arbitrary local imports in the headers expression, although env:* is the only one I'd actually care about.

@Gabriella439
Copy link
Collaborator

Gabriella439 commented Aug 17, 2021

@timbertson: I would be okay with that if the transitive import behavior were behind a flag, mainly because it's technically deviating from the standard until the standard is amended

@timbertson
Copy link
Collaborator Author

timbertson commented Aug 17, 2021 via email

@Gabriella439
Copy link
Collaborator

The main thing I want to avoid is the Haskell implementation making changes ahead of the standard, because I want to ensure that no Dhall implementation is privileged over others.

What makes this particular situation a little complicated is that if we interpret the standard as written as broadly as possible to permit transitive import resolution then it could potentially lead to infinite loops for remote imports. That's one of the reasons I'm going in the other direction and reading the standard as narrowly as possible in order to be conservative and not implement behavior that other implementations haven't agreed to via the standardization process.

That said, there are multiple possible solutions to that which I would be fine with:

  • We guard the transitive import resolution behind a command-line flag (the suggestion I just made)

  • We postpone merging this feature until we've amended the standard to support transitive import resolution in headers.dhall

    That would mean cutting a 1.40 Haskell release that only supports standard version 21.0.0 except for headers.dhall support

  • We merge the headers.dhall support without transitive import resolution before cutting Haskell release 1.40 and then add transitive import resolution in a future release when the standard is amended

  • We merge your suggestion to only transitively resolve non-remote imports in headers.dhall

    … and declare that this was close enough to our original intent in standardizing the change and treat the upstream change to the standard as more of a clarification pull request.

@timbertson
Copy link
Collaborator Author

Now I feel bad for merging the standard PR right before the 21.0.0 cut 😬 .

I'd be happiest with either options 2 or 4. I'd rather not ship this until it does what I intended. For both of those, we probably want to work on this PR and a standard update PR in parallel, and whichever is ready first determines which path we take?

That would mean cutting a 1.40 Haskell release that only supports standard version 21.0.0 except for headers.dhall support

I'd be fine with that, but also: if the standard change is small / uncontroversial, could v1.40 simply support that version of the standard (v21.1.0 or v22.0.0)? Or do you want to release a version of dhall-haskell for each major standard version?

@Gabriella439
Copy link
Collaborator

@timbertson: I think the easiest route would be to cut a 1.40.0 release without merging this and then cut a small 1.40.1 release right after adding support for headers.dhall

@timbertson
Copy link
Collaborator Author

timbertson commented Aug 18, 2021 via email

Copy link
Collaborator

@Gabriella439 Gabriella439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is getting pretty close. Just one more change I can think of

@@ -35,3 +37,6 @@ fetchFromHttpUrl childURL Nothing = do
return body
fetchFromHttpUrl _ _ =
fail "Dhall does not yet support custom headers when built using GHCJS"

originHeadersFileExpr :: IO (Expr Src Import)
originHeadersFileExpr = return (Missing)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
originHeadersFileExpr = return (Missing)
originHeadersFileExpr = return Missing

-- OriginHeaders is identical to Code, except local imports with this mode
-- are allowed from any source (local or remote). This type is only used when
-- loading Origin headers, it can't be set by the user.
data ImportMode = OriginHeaders | Code | RawText | Location
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a new ImportMode if we implement this in the same way as the standard specifies. I'll comment elsewhere on how I think we can make this work without the OriginHeaders mode


fetchFromHttpUrl :: URL -> Maybe [HTTPHeader] -> StateT Status IO Text.Text
fetchFromHttpUrl childURL mheaders = do
Status { _loadOriginHeaders } <- State.get

originHeaders <- _loadOriginHeaders
Copy link
Collaborator

@Gabriella439 Gabriella439 Sep 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made one mistake when I previously suggested using the ambient StateT for _loadOriginHeaders. We do still want _loadOriginHeaders to use StateT, but we want the _stack to be temporarily reset to contain just one import, which is the path to origin headers. This is because of this line from the standard:

(ε, headersPath) × Γ₀ ⊢ userHeadersExpr ⇒ userHeaders ⊢ Γ₁ ; Resolve userHeadersExpr with an empty import context

The key bit is the (ε, headersPath) part, which says that when resolving the userHeadersExpr you want the stack to contain only the headers path and nothing else.

However, we don't have to reset the stack here within the fetchFromHttpUrl function. Instead, I think the best place to do that would be inside of the _loadOriginHeaders function. I'll comment below with some more notes about how to do that.

Comment on lines 1055 to 1065
originHeadersLoader :: IO (Expr Src Import) -> StateT Status IO OriginHeaders
originHeadersLoader headersExpr = do
partialExpr <- liftIO headersExpr

let go key₀ key₁ = do
let expected :: Expr Src Void
expected =
App List
( Record $ Core.makeRecordField <$>
Dhall.Map.fromList
[ (key₀, Text)
, (key₁, Text)
]
)

let suffix_ = Dhall.Pretty.Internal.prettyToStrictText expected
let annot = case loadedExpr of
Note (Src begin end bytes) _ ->
Note (Src begin end bytes') (Annot loadedExpr expected)
where
bytes' = bytes <> " : " <> suffix_
_ ->
Annot loadedExpr expected

_ <- case (Dhall.TypeCheck.typeOf annot) of
Left err -> throwMissingImport (Imported _stack err)
Right _ -> return ()

return (Core.normalize loadedExpr)

let handler₀ (e :: SomeException) = do
{- Try to typecheck using the preferred @mapKey@/@mapValue@ fields
and fall back to @header@/@value@ if that fails. However, if
@header@/@value@ still fails then re-throw the original exception
for @mapKey@ / @mapValue@. -}
let handler₁ (_ :: SomeException) =
throwMissingImport (Imported _stack e)

handle handler₁ (go "header" "value")

headersExpression' <-
handle handler₀ (go "mapKey" "mapValue")

return url { headers = Just (fmap absurd headersExpression') }

normalizeHeaders url = return url
loaded <- loadWith (ImportAlt partialExpr emptyOriginHeaders)
headers <- liftIO (toOriginHeaders loaded)

-- return cached headers next time
_ <- State.modify (\state -> state { _loadOriginHeaders = return headers })

return headers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where you will need to reset the stack to contain just the headers file. However, it will require restructuring things a little bit because the type of originHeadersLoader doesn't permit it to know what path the headersExpr came from. You might need to change it so that the headersExpr returns the original Import that the Expr came from so that originHeadersLoader can set the stack to that Import

Copy link
Collaborator Author

@timbertson timbertson Sep 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to change it so that the headersExpr returns the original Import that the Expr came from so that originHeadersLoader can set the stack to that Import

Hmm, I think I removed my ability to do this 🤦

headersExpr in the normal case is env:DHALL_HEADERS ? ~/.config/dhall/headers.dhall. So you can't know which branch was successful until after loadWith is called, but you need to give loadWith a stack. (Also, I don't see how loadWith could give back the successful Import since it returns an Expr Src Void)

I've pushed an alternative solution: when loading headers, we just start with the root import. I believe this is generally either . (for an expression) or the path to the main file (if using --file). So this should always be local.

The trick is I had to ensure reentrant calls to load the headers didn't also reset the stack, since that's how we'd detect the Cycle error. Unfortunately that didn't work in practice (see this comment), but conceptually that's how it's working.

I think I could remove that hack by also popping the most recent item off the stack in reentrant calls (so that it doesn't look like the headers are being loaded from the remote import), but it didn't seem worth that extra complexity to achieve the same result.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think that indicates that there is a bug in the standardized behavior. The bug is that, as you noted, resetting the stack as the standard specifies interferes with cyclic import detection.

What if we were to change the standard to not reset the stack (like you just implemented in your most recent changes) and instead change the logic to:

(Δ, parent, headersPath) × Γ₀ ⊢ userHeadersExpr ⇒ userHeaders ⊢ Γ₁

In other words, we append the headers path to the stack (not including the current URL import that the headers will be used for) when resolving the headers file.

status <- State.get

let headerLoadStatus = status {
_stack = pure (NonEmpty.last (_stack status)),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change this to _stack = headerImport :| NonEmpty.tail (_stack status) then it will do the right thing and not trigger the referential sanity check. Then I think you wouldn't need the reentrantLoad work-around.

That does, however, mean that you would need to get the headerImport which is currently not in scope, but I'll leave a separate comment for that.

defaultOriginHeaders :: IO (Expr Src Import)
defaultOriginHeaders = do
fromFile <- originHeadersFileExpr
return (Note headersSrc (ImportAlt envOriginHeaders (Note headersSrc fromFile)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we won't be able to use ImportAlt here since, as you noted, it won't let us remember which import succeeded. i think you'd need to change the type from IO (Expr Src Import) to IO [Import] to get this to work. It would also improve the error messages, too, I think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went down this path, but I couldn't see how to make it work. The fallback code is implemented by loadWith. I considered extracting the logic from the ImportAlt branch, as a function with signature StateT Status IO (Expr Src Void) -> StateT Status IO (Expr Src Void) -> StateT Status IO (Expr Src Void). I couldn't just take two Expr Src Void, since we need to provide a different _stack in each branch (based on the header we're importing).

But it seems we need to use the full loadWith, since that's also the code which does the Cycle detection. In particular, if we manually construct _stack = headerImport :| NonEmpty.tail (_stack status) then that would mean we'd need to also implement the cycle detection here too. That seems messy and repetitive.

Am I missing something?

I did try an alternate, which is to just literally use the parent stack. It removed the need to special-case reentrant loads, which is nice. It requires a runtime assertion (because we can't prove the parent stack is nonempty with types). It also shouldn't ever hit a referentially opaque import error, although that's an implementation detail (because we cache on the first load). I've made that change in c77f5a7

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timbertson: Oh, I just figured out a (hacky) way to do this, but at least it doesn't require the partial function: instead of trying to figure out which headers.dhall we imported, just add both of them to the stack

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, would errors then report that e.g. ~/.config/dhall/headers.dhall was imported from env:DHALL_HEADERS (or vice versa)? The partial function shouldn't ever fail (despite being a bit yuck), right? 🤞

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. Then we'll stick with what you have for now

@timbertson
Copy link
Collaborator Author

🥳 hooray, thanks for all the assistance getting this in shape.

I'm fixing CI issues, currently stuck with this hydra error:

haddockPhase
Error: Incomplete haddocks

I'm not sure what to run to reproduce locally. I ran cabal v1-haddock and I get a few warnings, but nothing obvious. I thought it was referring to the <100% coverage in Import.hs, but dd515d9 doesn't seem to have fixed it.

@Gabriella439
Copy link
Collaborator

CI checks for either "Missing documentation for" or "Warning: … is out of scope"

! (./Setup haddock 2>&1 | grep --quiet 'Missing documentation for:\|Warning:.*is out of scope') || (echo "Error: Incomplete haddocks"; exit 1)

@timbertson
Copy link
Collaborator Author

Ah OK. I think I've fixed those, but now one hydra job is dying with:

/nix/store/p61072rqa772qnwqhbn9xy9nbpcxw7cb-stdenv-linux/setup: line 1316: 3437713 Killed                  ./Setup build
builder for '/nix/store/q87zqrqwnizj0zpvr25jd2cjw353613l-dhall-1.40.1.drv' failed with exit code 137

Which I think means out of memory. Is that a known problem? I couldn't see any way to retry the job, nix is too good at caching ;)

@Gabriella439
Copy link
Collaborator

It's a known problem. The machine sometimes runs out of memory. I can retry the job

@Gabriella439 Gabriella439 merged commit 33fc498 into dhall-lang:master Sep 25, 2021
@Gabriella439
Copy link
Collaborator

@timbertson: Thank you for doing this! 🙂

@timbertson
Copy link
Collaborator Author

I'm slightly reluctant to ask 😉 , but does the final approach imply any necessary changes to the standard? IIRC the standard loads headers with an empty context (import chain), but what I ended up doing was to load headers with the regular context. In order to circumvent the referentially opaque check, you need to either:

  • only ever load headers on the first import (guaranteed to be local)
  • disable the referentially opaque import check when loading headers

For efficiency the first is what every implementation will want to do, but I'm not sure which is easier to standardize.

@timbertson timbertson deleted the user-headers branch October 1, 2021 06:43
@Gabriella439
Copy link
Collaborator

Yes, the final approach will entail a change to the standard, but I can take care of that. It should be a small change

@timbertson
Copy link
Collaborator Author

Thanks 🙂

Gabriella439 added a commit to dhall-lang/dhall-lang that referenced this pull request Oct 21, 2021
The motivation for this change is the discussion here:

dhall-lang/dhall-haskell#2236 (comment)

If we correctly set the parent import for the `headers.dhall`
expression then the cyclic import detection will correctly reject
loops induced by `headers.dhall` having transitive remote imports
of its own.
@Gabriella439
Copy link
Collaborator

Gabriella439 added a commit to dhall-lang/dhall-lang that referenced this pull request Oct 29, 2021
The motivation for this change is the discussion here:

dhall-lang/dhall-haskell#2236 (comment)

If we correctly set the parent import for the `headers.dhall`
expression then the cyclic import detection will correctly reject
loops induced by `headers.dhall` having transitive remote imports
of its own.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants