standardise origin-based header configuration #1192

timbertson · 2021-07-09T11:41:32Z

This PR standardises support for per-host header configuration, as discussed in https://discourse.dhall-lang.org/t/supporting-transitive-imports-from-private-repositories/457.

The basic idea is that instead of headers being inline with an import (URL using toMap { Authorization = "TOKEN" }), you will be able to specify per-origin headers in a config file, e.g.:

-- ~/.config/dhall/headers.dhall
toMap {
  `raw.githubusercontent.com:443` = toMap { Authorization = "TOKEN" }
}
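
As a rough illustration of how an implementation might consult such a file, here is a minimal Python sketch. It assumes the configuration has already been evaluated into a plain dictionary keyed by `host:port`, as in the example above; the names `origin_key`, `headers_for` and `DEFAULT_PORTS` are made up for this sketch and do not come from any Dhall implementation.

```python
from urllib.parse import urlsplit

# Assumption: when a URL has no explicit port, use the scheme's usual default.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_key(url):
    """Return a `host:port` key for `url` (key format assumed from the example)."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return f"{parts.hostname}:{port}"


def headers_for(url, config):
    """Look up the headers configured for the origin of `url`, if any."""
    return config.get(origin_key(url), {})


# Hypothetical, already-evaluated equivalent of the headers.dhall shown above.
config = {"raw.githubusercontent.com:443": {"Authorization": "TOKEN"}}

print(headers_for("https://raw.githubusercontent.com/org/repo/main/package.dhall", config))
print(headers_for("https://example.com/other.dhall", config))
```

An import from an origin that has no entry simply gets no extra headers in this sketch.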

There is also a (very early) implementation in a dhall-haskell PR: dhall-lang/dhall-haskell#2236

Unlike with Haskell, it's kinda hard to tell if I'm on the right track. I'm not sure how formally I need to define this (in terms of pseudo-code); feedback welcome :)

timbertson · 2021-07-09T11:45:44Z

standard/imports.md

+1. If the DHALL_HEADERS environment variable is set, interpret it as a dhall expression
+2. Otherwise, load the file at "$XDG_CONFIG_HOME/dhall/headers.dhall"
+2. If the XDG_CONFIG_HOME environment variable is not set, it is assumed to be `~/.cache` (i.e. `.cache` within the user's home directory).
+
+If DHALL_HEADERS is set, no configuration file is loaded. If a configuration file is not
+found at the searched path, it is treated as an empty list.


Should this section be written in pseudo-code instead? As well?

I'd add a fourth item to the list above:

4. Otherwise the host map is assumed to be empty.

Or something like that.

BTW: We talk about two different maps here: the outer one (which I called the "host map") and the inner one containing the actual headers. I think it would be nice if we named those somehow in order to avoid confusion throughout the discussion.

"Otherwise" doesn't feel right to me, since steps 2/3 will always have a result: the file path we're looking for. I almost want pseudo-code here, but I'm not really sure how to write that.

mmhat

Nice work so far! 👍

standard/imports.md

mmhat · 2021-07-09T12:09:13Z

mmhat · 2021-07-09T12:25:58Z

standard/imports.md

+(i.e. "Map Text (Map Text Text)" using the prelude `Map` type constructor)
+
+The key of this expression represents an HTTP(s) origin, including
+port - e.g. "https://github.com:443".


Do we want to support different headers for different endpoints? I.e. https://domain.tld/path/to/endpoint and https://domain.tld/path/to/other/endpoint? 🤔

I've never heard of a case for headers other than authentication (and maybe setting appropriate accept headers for APIs), so can't imagine a path-specific use case.

I guess authentication is a good example here: Suppose you got one host with different applications serving different endpoints (e.g. a simple HTTP server with HTTP Basic Auth). Or a setup which needs to access endpoints of several projects of a GitLab instance (Not a terribly good setup IMHO, but those solutions exist).
At least these were the use cases I was thinking about when I wrote that comment.

> one host with different applications serving different endpoints (e.g. a simple HTTP server with HTTP Basic Auth)

Do you mean two different apps with http basic auth on the same host? It feels pretty niche, assuming they're both hosting nonpublic dhall sources.

> Or a setup which needs to access endpoints of several projects of a GitLab instance

I haven't used self-hosted gitlab, but would you have credentials per-project? I would have thought you'd use the same credentials for multiple projects.

Either way, I'm not convinced we need to support this now. I think it's reasonable to say that per-origin credentials are the overwhelming majority, and it's also how netrc works.

I think it would be possible to support paths in the future by simply including the prefix in the key, i.e. mygitlab.com:443/project1/. Since origins can't have a slash in them, we would be able to distinguish origin-only rules from path-specific rules. So if we do need to support it in the future, the current scheme won't get in the way.
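
To make the idea concrete, here is a small Python sketch of how such a future extension might distinguish the two kinds of key. This is purely hypothetical and not part of this PR: the function name `match_headers`, the longest-prefix rule and the example tokens are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def match_headers(url, config):
    """Pick headers for `url`, preferring the longest matching path-specific key."""
    parts = urlsplit(url)
    origin = f"{parts.hostname}:{parts.port or 443}"  # https assumed for brevity
    target = origin + parts.path
    # Keys containing a slash are path-specific; an origin cannot contain one.
    path_keys = [k for k in config if "/" in k and target.startswith(k)]
    if path_keys:
        return config[max(path_keys, key=len)]
    # No path-specific rule applies, so fall back to the origin-only rule.
    return config.get(origin, {})


# Hypothetical configuration mixing an origin-only key and a path-specific key.
config = {
    "mygitlab.com:443": {"Authorization": "shared-token"},
    "mygitlab.com:443/project1/": {"Authorization": "project1-token"},
}

print(match_headers("https://mygitlab.com/project1/package.dhall", config))
print(match_headers("https://mygitlab.com/project2/package.dhall", config))
```

Existing origin-only keys keep working unchanged under this scheme, which is the property that matters for forward compatibility.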

> Do you mean two different apps with http basic auth on the same host? It feels pretty niche, assuming they're both hosting nonpublic dhall sources.

Yes, I guess that's not the most common use case. Still, I am surprised how often I encounter those, especially as a workaround in organisations where things are kept private by default.

> I haven't used self-hosted gitlab, but would you have credentials per-project? I would have thought you'd use the same credentials for multiple projects.

If you are a human user, yes, you have one account and authenticate with that. But if you are doing deployments within a larger organisation you normally use some token not connected to your identity. And if you need to access multiple projects throughout the deployment process you are likely using more than one. As an example, there is a long-standing bug somewhere in the Docker bugtracker exactly because of this: the Docker daemon can only authenticate per-host, not per-endpoint.

But maybe it is ok to add this in another proposal 🤔

mmhat · 2021-07-09T12:32:21Z

standard/imports.md

+(`using headers`), they are merged. If a header is specified in both locations,
+the user configuration takes precedence over the inline headers. This allows
+inline headers to be used as a fallback for compatibility with previous
+versions of dhall or users without custom configuration.


I think it would be nice if implementations informed the user somehow when there are conflicting headers, maybe even with an option to turn this into an error. I am a bit worried that this might lead to subtle bugs otherwise.
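
A minimal Python sketch of the merge rule quoted above, together with the conflict reporting suggested here. It assumes headers are represented as plain dictionaries with exact-match names (the standard uses a list of key/value records); `merge_headers` is a made-up name, and what a caller does with the reported conflicts (warn or fail) is left open.

```python
def merge_headers(inline, user):
    """Merge header dicts; on a clash the user configuration wins.

    Also returns the names set in both places with different values,
    so that a caller could warn about them or turn them into an error.
    """
    merged = {**inline, **user}  # later entries (user config) take precedence
    conflicts = sorted(k for k in inline if k in user and inline[k] != user[k])
    return merged, conflicts


merged, conflicts = merge_headers(
    {"Authorization": "inline-token", "Accept": "text/plain"},  # `using` headers
    {"Authorization": "TOKEN"},  # headers from the user configuration
)
print(merged)
print(conflicts)
```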

Gabriella439

My understanding is that you'll need to change two of the judgments in order to formalize this.

First, you'd need to change the judgment for resolving ordinary imports. Before the change it is currently this:

parent </> import₀ = import₁
canonicalize(import₁) = child
referentiallySane(parent, child)
Γ(child) = e₀ using responseHeaders  ; Retrieve the expression, possibly
                                     ; binding any response headers to
                                     ; `responseHeaders` if child was a
                                     ; remote import
corsCompliant(parent, child, responseHeaders)  ; If `child` was not a remote
                                               ; import and therefore had no
                                               ; response headers then skip
                                               ; this CORS check
(Δ, parent, child) × Γ₀ ⊢ e₀ ⇒ e₁ ⊢ Γ₁
ε ⊢ e₁ : T
────────────────────────────────────  ; * child ∉ (Δ, parent)
(Δ, parent) × Γ₀ ⊢ import₀ ⇒ e₁ ⊢ Γ₁  ; * import₀ ≠ missing

… and after the change you would need to "fork" it to create an additional rule for the case where we might obtain headers from an external configuration:

Γ(env:DHALL_HEADERS ? "${XDG_CONFIG_HOME}/dhall/headers.dhall" ? ~/.config/dhall/headers.dhall) = headers
parent </> https://authority directory file using headers = import₁
canonicalize(import₁) = child
referentiallySane(parent, child)
Γ(child) = e₀ using responseHeaders  ; Retrieve the expression, possibly
                                     ; binding any response headers to
                                     ; `responseHeaders` if child was a
                                     ; remote import
corsCompliant(parent, child, responseHeaders)  ; If `child` was not a remote
                                               ; import and therefore had no
                                               ; response headers then skip
                                               ; this CORS check
(Δ, parent, child) × Γ₀ ⊢ e₀ ⇒ e₁ ⊢ Γ₁
ε ⊢ e₁ : T
────────────────────────────────────  ; * child ∉ (Δ, parent)
(Δ, parent) × Γ₀ ⊢ https://authority directory file ⇒ e₁ ⊢ Γ₁  ; * import₀ ≠ missing

parent </> import₀ = import₁
canonicalize(import₁) = child
referentiallySane(parent, child)
Γ(child) = e₀ using responseHeaders  ; Retrieve the expression, possibly
                                     ; binding any response headers to
                                     ; `responseHeaders` if child was a
                                     ; remote import
corsCompliant(parent, child, responseHeaders)  ; If `child` was not a remote
                                               ; import and therefore had no
                                               ; response headers then skip
                                               ; this CORS check
(Δ, parent, child) × Γ₀ ⊢ e₀ ⇒ e₁ ⊢ Γ₁
ε ⊢ e₁ : T
────────────────────────────────────  ; * child ∉ (Δ, parent)
(Δ, parent) × Γ₀ ⊢ import₀ ⇒ e₁ ⊢ Γ₁  ; * import₀ ≠ missing

Then you'd need to make a matching change to the judgment for imports that already specify explicit headers with using. Before the change it is currently:

(Δ, parent) × Γ₀ ⊢ requestHeaders ⇒ resolvedRequestHeaders ⊢ Γ₁
ε ⊢ resolvedRequestHeaders : H
H ∈ { List { mapKey : Text, mapValue : Text }, List { header : Text, value : Text } }
resolvedRequestHeaders ⇥ normalizedRequestHeaders
parent </> https://authority directory file using normalizedRequestHeaders = import
canonicalize(import) = child
referentiallySane(parent, child)
Γ₁(child) = e₀ using responseHeaders
  ; Append normalizedRequestHeaders to the above request's headers
corsCompliant(parent, child, responseHeaders)
(Δ, parent, child) × Γ₁ ⊢ e₀ ⇒ e₁ ⊢ Γ₂
ε ⊢ e₁ : T
──────────────────────────────────────────────────────────────────────────  ; * child ∉ Δ
(Δ, parent) × Γ₀ ⊢ https://authority directory file using requestHeaders ⇒ e₁ ⊢ Γ₂

… and after the change it would be:

Γ(env:DHALL_HEADERS ? "${XDG_CONFIG_HOME}/dhall/headers.dhall" ? ~/.config/dhall/headers.dhall) = headers
ε ⊢ headers : List { mapKey : Text, mapValue : Text }
(Δ, parent) × Γ₀ ⊢ requestHeaders ⇒ resolvedRequestHeaders ⊢ Γ₁
ε ⊢ resolvedRequestHeaders : H
H ∈ { List { mapKey : Text, mapValue : Text }, List { header : Text, value : Text } }
headers # resolvedRequestHeaders ⇥ normalizedRequestHeaders
parent </> https://authority directory file using normalizedRequestHeaders = import
canonicalize(import) = child
referentiallySane(parent, child)
Γ₁(child) = e₀ using responseHeaders
  ; Append normalizedRequestHeaders to the above request's headers
corsCompliant(parent, child, responseHeaders)
(Δ, parent, child) × Γ₁ ⊢ e₀ ⇒ e₁ ⊢ Γ₂
ε ⊢ e₁ : T
──────────────────────────────────────────────────────────────────────────  ; * child ∉ Δ
(Δ, parent) × Γ₀ ⊢ https://authority directory file using requestHeaders ⇒ e₁ ⊢ Γ₂

timbertson · 2021-07-10T06:12:05Z

Thanks for the feedback.

@Gabriel439 I'll be honest, I can vaguely read the judgement syntax but had no idea what to change, so thanks for the help :)

timbertson · 2021-07-10T06:19:34Z

standard/imports.md

@@ -617,6 +617,23 @@ then you retrieve the expression from the canonicalized path and transitively
 resolve imports within the retrieved expression:


+    Γ(env:DHALL_HEADERS ? "${XDG_CONFIG_HOME:~/.config}/dhall/headers.dhall") = headers


@Gabriel439 I changed this from your suggestion of:

Γ(env:DHALL_HEADERS ? "${XDG_CONFIG_HOME}/dhall/headers.dhall" ? ~/.config/dhall/headers.dhall) = headers

But it still doesn't quite look right to me. The semantics I want are that:

- if DHALL_HEADERS is set, evaluate it
- otherwise, let BASE be XDG_CONFIG_HOME, or ~/.config if XDG_CONFIG_HOME is not set
- load $BASE/dhall/headers.dhall, if it exists

The main difference from what's written here is that if DHALL_HEADERS contains invalid Dhall, it should fail rather than fall back to the file. And in contrast to your original suggestion, it will only ever attempt to load at most one file: it won't try XDG_CONFIG_HOME and then fall back to ~/.config if that file doesn't typecheck, for example.

Am I being too pedantic? Is there a way to represent what I mean?
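
For illustration only, the lookup order described above could be sketched in Python roughly like this. The function name `locate_headers` and its return convention are invented for this sketch; evaluating and typechecking the Dhall source is deliberately left out, and any failure there would be an error rather than a reason to try another location.

```python
import os
from pathlib import Path


def locate_headers(env, home):
    """Decide where the header configuration comes from.

    Returns ("env", source_text), ("file", path) or ("empty", None).
    At most one file is ever considered.
    """
    if "DHALL_HEADERS" in env:
        # Never fall back to a file, even if this text is invalid Dhall.
        return ("env", env["DHALL_HEADERS"])
    if "XDG_CONFIG_HOME" in env:
        base = Path(env["XDG_CONFIG_HOME"])
    else:
        base = Path(home) / ".config"
    path = base / "dhall" / "headers.dhall"
    if path.is_file():
        return ("file", str(path))
    # A missing file is treated as an empty configuration.
    return ("empty", None)


print(locate_headers(dict(os.environ), Path.home()))
```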

Also, this is binding headers to the whole expression, but we want to extract the origin's key from the headers map, which I don't think this pseudocode is doing

The reason I wrote it that way is to try to piggyback on the semantics of the ? judgment as much as possible, even though it's still playing a bit fast and loose with notation, since we don't actually define the meaning of Γ(a ? b).

However, the reason I like to reuse the same intuition as ? is that in #1181 we standardized that if the first file contains invalid Dhall then it won't fall back to the second file.

Oh awesome, I hadn't heard about #1181 but I love it.

I still don't see where it's dereferencing the specific origin headers within the overall configuration? It looks like this pseudocode is loading a single Map Text Text and using it for every request, but we want it to look for the origin's key in the overall Map Text (Map Text Text).

Oh yeah, you'll need to fix that

Hmm, I'm a little lost at how to do that. I want to introduce a siteHeaders binding for the value in the map (if any), should I add another function defined elsewhere? e.g.

getKey(userHeaders, origin) = siteHeaders

Or should I use some sort of pattern matching, like:

headers = [ ..., { mapKey = origin, siteHeaders }, ... ]

I don't really know what's valid in this pseudocode.

So the real answer is: anything goes for the notation, as long as the reader can figure out what's going on and translate it straightforwardly to code.

I think pattern matching would be fine in this case

Hmm, I tried this but got stuck because I couldn't figure out how to denote that origin is an input value while siteHeaders is an output variable to bind:

[ …, { mapKey = origin, mapValue = siteHeaders }, … ]

So for now I've introduced getKey with an inline description. Improvements welcome :)
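
In ordinary code the intent of getKey is roughly the following Python sketch. It assumes the evaluated configuration is a list of `mapKey`/`mapValue` records (the shape `toMap` produces), and that a missing origin yields an empty list of headers; that last behaviour is an assumption of this sketch, as is the snake-case name `get_key`.

```python
def get_key(user_headers, origin):
    """Return the per-origin headers for `origin`, or an empty list if absent."""
    for entry in user_headers:
        if entry["mapKey"] == origin:
            return entry["mapValue"]
    return []


# Hypothetical evaluated configuration with a single origin entry.
user_headers = [
    {
        "mapKey": "raw.githubusercontent.com:443",
        "mapValue": [{"mapKey": "Authorization", "mapValue": "TOKEN"}],
    }
]

print(get_key(user_headers, "raw.githubusercontent.com:443"))
print(get_key(user_headers, "example.com:443"))
```

Here `origin` is purely an input and the returned value plays the role of `siteHeaders`, which is the distinction the pattern-matching notation struggled to express.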

standard/imports.md

timbertson · 2021-07-11T10:07:19Z

standard/imports.md

+The toplevel map is known as the "origin header configuration", and the individual maps
+which make up the keys of the toplevel map are known as the "per-origin headers".


@mmhat this is an attempt to address your "two different maps" comment. Better name ideas welcome, but I agree it's worth giving them names.

I guess that's ok 👍 IMHO the names are not too important as long as everybody knows what we are talking about.

Gabriella439

I think the main thing this is missing is tests

The main test cases I'd suggest are:

- A simple smoke test against https://httpbin.org/headers as Text where DHALL_HEADERS is set
- A test where DHALL_HEADERS is set and the same headers are also set with the using keyword to test precedence
- A header map with a malformed (but unused) origin in the header map

standard/imports.md

timbertson · 2021-07-22T11:12:21Z

> I think the main thing this is missing is tests

Sounds good to me! I had a look, but I don't understand how they get run, in order to try it out. Specifically:

- What should I run in dhall-haskell to try out my new tests against the current implementation?
- How do I control DHALL_HEADERS in the test suite?

The current instructions include:

> You should make it so that the environment variable DHALL_TEST_VAR is set to the string "6 * 7".

Which implies there's no support for per-test envs.

Also I looked at httpbin, the headers endpoint isn't going to be great because I can't just extract a single header (if only it had a dhall response type 😉 ), and there are nondeterministic headers in the response:

$ dhall repl
⊢ https://httpbin.org/headers as Text

''
{
  "headers": {
    "Accept-Encoding": "gzip",
    "Host": "httpbin.org",
    "X-Amzn-Trace-Id": "Root=1-60f94b7c-3983f920755332d1505fe2fd"
  }
}
''

We can test a single header value by using the user-agent, as long as implementations allow that header to override any internal default they might be using:

⊢ https://httpbin.org/user-agent using (toMap { `user-agent` = "test" }) as Text

''
{
  "user-agent": "test"
}
''

Gabriella439 · 2021-07-23T02:24:46Z

@timbertson: I think just adding a comment in ./tests/README.md explaining what you want to do plus a comment in the relevant test would be enough. It's really an "anything goes" situation with regards to test instructions 🙂

The way that most people do the test suite is they run the tests and then look more closely at whichever test fails, so if the test for this has a comment reminding the user to set the correct environment variable just for that test then it's probably good enough.

timbertson · 2021-07-23T06:23:00Z

Thanks. I've got tests running now, but the import tests aren't what I expected. There are both A.dhall and B.dhall files, but dhall-haskell's Test/Import.hs doesn't seem to be actually checking the expected output (if I change one, it still passes). Should this also be using Tasty.Silver.goldenVsAction like some other test suites?

timbertson · 2021-07-25T11:08:22Z

OK, tests added. Since the environment variable is the test case in these tests, I've introduced the concept of an optional ${testName}ENV.dhall containing the variables you should set, rather than trying to describe it in prose.

I haven't added your suggested case "A header map with a malformed (but unused) origin in the header map", because I wasn't quite sure what you meant. Do you mean a map key which is Text but isn't a hostname:port format? Or do you mean a dhall expression of the wrong type?

timbertson · 2021-07-26T10:54:13Z

(the above commit is a rebase + squash, since the commit history was messy).

What's the protocol for merging? It sounds like I could merge in 24h (if I'm reading CONTRIBUTING right). But should I wait until dhall-lang/dhall-haskell#2236 is nearly ready to merge? I assume it's inconvenient to have dhall-haskell only partially implement the standard if there are other standards changes you want to release, particularly if 21.0.0 is just about to be released.

Gabriella439 · 2021-07-26T23:14:24Z

@timbertson: You're clear to merge if you want, regardless of whether it has been implemented in dhall-haskell. It's up to you if you want to wait longer (e.g. after the dhall-haskell PR is merged or after the 21.0.0 release is cut).

timbertson · 2021-07-28T11:44:44Z

Cool, I'll wait until 21.0.0 since there's no real rush 👍

timbertson · 2021-08-10T11:34:34Z

I got itchy, I thought v21 was going to be somewhat more imminent 😉

The implementation in dhall-haskell is also functionally complete now, though it hasn't seen much feedback so it'll probably still require a bit of rework before it's ready to merge.

Gabriella439 · 2021-08-13T03:54:22Z

@timbertson: No worries! I thought I would have time during my vacation to cut a release, but ended up waiting until I got back

mmhat reviewed Jul 9, 2021

Gabriella439 reviewed Jul 9, 2021

Gabriella439 reviewed Jul 19, 2021

timbertson mentioned this pull request Jul 23, 2021

Test/Import: check equivalence for import test cases dhall-lang/dhall-haskell#2261

Merged

Gabriella439 approved these changes Jul 25, 2021

View reviewed changes

timbertson force-pushed the http-headers branch from 1a4bf0f to b0eee41 Compare July 26, 2021 10:40

Standardise origin-based header configuration

3ffb2af

timbertson force-pushed the http-headers branch from b0eee41 to 3ffb2af Compare July 26, 2021 11:03

timbertson mentioned this pull request Jul 28, 2021

Support reading test-specific environment from *ENV.dhall file dhall-lang/dhall-haskell#2268

Merged

timbertson mentioned this pull request Aug 3, 2021

Support for headers.dhall configuration dhall-lang/dhall-haskell#2236

Merged

timbertson merged commit 481b26d into dhall-lang:master Aug 10, 2021
