Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

standardise origin-based header configuration #1192

Merged
merged 1 commit into from
Aug 10, 2021

Conversation

timbertson
Copy link
Collaborator

This PR standardises support for per-host header configuration, as discussed in https://discourse.dhall-lang.org/t/supporting-transitive-imports-from-private-repositories/457.

The basic idea is that instead of headers being inline with an import (URL with toMap { Authorization = "TOKEN" }), you will be able to specify per-origin headers in a config file, e.g:

-- ~/.config/dhall/headers.dhall
toMap {
  `raw.githubusercontent.com:443` = toMap { Authorization = "TOKEN" }
}

There is also a (very early) implementation in a dhall-haskell PR: dhall-lang/dhall-haskell#2236

Unlike haskell, it's kinda hard to tell if I'm on the right track. I'm not sure how formally I need to define this (in terms of pseudo-code), feedback welcome :)

Comment on lines 744 to 749
1. If the DHALL_HEADERS environment variable is set, interpret it as a dhall expression
2. Otherwise, load the file at "$XDG_CONFIG_HOME/dhall/headers.dhall"
2. If the XDG_CONFIG_HOME environment variable is not set, it is assumed to be `~/.cache` (i.e. `.cache` within the user's home directory).

If DHALL_HEADERS is set, no configuration file is loaded. If a configuration file is not
found at the searched path, it is treated as an empty list.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this section be written in pseudo-code instead? As well?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a fourth item to the list above:

4. Otherwise the is host map is assumed to be empty.

Or something like that.

BTW: We talk about two different maps here: The outer one (which I called "host map") and the inner one containing the actual headers. I think it was nice if we were naming those somehow in order to avoid confusion throughout the discussion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Otherwise" doesn't feel right to me, since steps 2/3 will always have a result, the file path we're looking for. I almost want pseudo-code here, but not really sure how to write that.

Copy link
Collaborator

@mmhat mmhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work so far! 👍

standard/imports.md Outdated Show resolved Hide resolved
Comment on lines 744 to 749
1. If the DHALL_HEADERS environment variable is set, interpret it as a dhall expression
2. Otherwise, load the file at "$XDG_CONFIG_HOME/dhall/headers.dhall"
2. If the XDG_CONFIG_HOME environment variable is not set, it is assumed to be `~/.cache` (i.e. `.cache` within the user's home directory).

If DHALL_HEADERS is set, no configuration file is loaded. If a configuration file is not
found at the searched path, it is treated as an empty list.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a fourth item to the list above:

4. Otherwise the is host map is assumed to be empty.

Or something like that.

BTW: We talk about two different maps here: The outer one (which I called "host map") and the inner one containing the actual headers. I think it was nice if we were naming those somehow in order to avoid confusion throughout the discussion.

(i.e. "Map Text (Map Text Text)" using the prelude `Map` type constructor)

The key of this expression represents an HTTP(s) origin, including
port - e.g. "https://github.com:443".
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to support different headers for different endpoints? I.e https://domain.tld/path/to/endpoint and https://domain.tld/path/to/other/endpoint? 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never heard of a case for headers other than authentication (and maybe setting appropriate accept headers for APIs), so can't imagine a path-specific use case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess authentication is a good example here: Suppose you got one host with different applications serving different endpoints (e.g. a simple HTTP server with HTTP Basic Auth). Or a setup which needs to access endpoints of several projects of a GitLab instance (Not a terribly good setup IMHO, but those solutions exist).
At least these were the use cases I was thinking about when I wrote that comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one host with different applications serving different endpoints (e.g. a simple HTTP server with HTTP Basic Auth)

Do you mean two different apps with http basic auth on the same host? It feels pretty niche, assuming they're both hosting nonpublic dhall sources.

Or a setup which needs to access endpoints of several projects of a GitLab instance

I haven't used self-hosted gitlab, but would you have credentials per-project? I would have thought you'd use the same credentials for multiple projects.

Either way, I'm not convinced we need to support this now. I think it's reasonable to say that per-origin credentials are the overwhelming majority, and it's also how netrc works.

I think it would be possible to support paths in the future by simply including the prefix in the key, i.e. mygitlab.com:443/project1/. Since origins can't have a slash in them, we would be able to distinguish origin-only rules from path-specific rules. So if we do need to support it in the future, the current scheme won't get in the way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean two different apps with http basic auth on the same host? It feels pretty niche, assuming they're both hosting nonpublic dhall sources.

Yes, I guess that's not the most common use case. Still, I am surprised how often I encounter those, especially as a workaround in organisations where keeping things private by default.

I haven't used self-hosted gitlab, but would you have credentials per-project? I would have thought you'd use the same credentials for multiple projects.

If you are a human user, yes, you have one account and authenticate with that. But if you are doing deployments within a larger organisation you normally use some token not connected to your identity. And if you need to access multiple projects throughout the deployment process you are likely using more than one. As an example, there is a long standing bug somewhere in the Docker bugtracker exactly because of this: The Docker daemon can only authenticate per-host, not per endpoint.

But maybe it is ok to add this in another proposal 🤔

(`using headers`), they are merged. If a header is specified in both locations,
the user configuration takes precedence over the inline headers. This allows
inline headers to be used as a fallback for compatibility with previous
versions of dhall or users without custom configuration.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was nice if implementations inform the user somehow if there are conflicting headers; Maybe even with an option to turn this into an error. I am a bit worried that this might lead to subtle bugs otherwise.

Copy link
Contributor

@Gabriella439 Gabriella439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that you'll need to change two of the judgments in order to formalize this.

First, you'd need to change the judgment for resolving ordinary imports. Before the change it is currently this:

parent </> import₀ = import₁
canonicalize(import₁) = child
referentiallySane(parent, child)
Γ(child) = e₀ using responseHeaders  ; Retrieve the expression, possibly
                                     ; binding any response headers to
                                     ; `responseHeaders` if child was a
                                     ; remote import
corsCompliant(parent, child, responseHeaders)  ; If `child` was not a remote
                                               ; import and therefore had no
                                               ; response headers then skip
                                               ; this CORS check
(Δ, parent, child) × Γ₀ ⊢ e₀ ⇒ e₁ ⊢ Γ₁
ε ⊢ e₁ : T
────────────────────────────────────  ; * child ∉ (Δ, parent)
(Δ, parent) × Γ₀ ⊢ import₀ ⇒ e₁ ⊢ Γ₁  ; * import₀ ≠ missing

… and after the change you would need to "fork" it to create an additional rule for the case where we might obtain headers from an external configuration:

Γ(env:DHALL_HEADERS ? "${XDG_CONFIG_HOME}/dhall/headers.dhall" ? ~/.config/dhall/headers.dhall) = headers
parent </> https://authority directory file using headers = import₁
canonicalize(import₁) = child
referentiallySane(parent, child)
Γ(child) = e₀ using responseHeaders  ; Retrieve the expression, possibly
                                     ; binding any response headers to
                                     ; `responseHeaders` if child was a
                                     ; remote import
corsCompliant(parent, child, responseHeaders)  ; If `child` was not a remote
                                               ; import and therefore had no
                                               ; response headers then skip
                                               ; this CORS check
(Δ, parent, child) × Γ₀ ⊢ e₀ ⇒ e₁ ⊢ Γ₁
ε ⊢ e₁ : T
────────────────────────────────────  ; * child ∉ (Δ, parent)
(Δ, parent) × Γ₀ ⊢ https://authority directory file ⇒ e₁ ⊢ Γ₁  ; * import₀ ≠ missing

parent </> import₀ = import₁
canonicalize(import₁) = child
referentiallySane(parent, child)
Γ(child) = e₀ using responseHeaders  ; Retrieve the expression, possibly
                                     ; binding any response headers to
                                     ; `responseHeaders` if child was a
                                     ; remote import
corsCompliant(parent, child, responseHeaders)  ; If `child` was not a remote
                                               ; import and therefore had no
                                               ; response headers then skip
                                               ; this CORS check
(Δ, parent, child) × Γ₀ ⊢ e₀ ⇒ e₁ ⊢ Γ₁
ε ⊢ e₁ : T
────────────────────────────────────  ; * child ∉ (Δ, parent)
(Δ, parent) × Γ₀ ⊢ import₀ ⇒ e₁ ⊢ Γ₁  ; * import₀ ≠ missing

Then you'd need to make a matching change to the judgment for imports that already specify explicit headers with using. Before the change it is currently:

(Δ, parent) × Γ₀ ⊢ requestHeaders ⇒ resolvedRequestHeaders ⊢ Γ₁
ε ⊢ resolvedRequestHeaders : H
H ∈ { List { mapKey : Text, mapValue : Text }, List { header : Text, value : Text } }
resolvedRequestHeaders ⇥ normalizedRequestHeaders
parent </> https://authority directory file using normalizedRequestHeaders = import
canonicalize(import) = child
referentiallySane(parent, child)
Γ₁(child) = e₀ using responseHeaders
  ; Append normalizedRequestHeaders to the above request's headers
corsCompliant(parent, child, responseHeaders)
(Δ, parent, child) × Γ₁ ⊢ e₀ ⇒ e₁ ⊢ Γ₂
ε ⊢ e₁ : T
──────────────────────────────────────────────────────────────────────────  ; * child ∉ Δ
(Δ, parent) × Γ₀ ⊢ https://authority directory file using requestHeaders ⇒ e₁ ⊢ Γ₂

… and after the change it would be:

Γ(env:DHALL_HEADERS ? "${XDG_CONFIG_HOME}/dhall/headers.dhall" ? ~/.config/dhall/headers.dhall) = headers
ε ⊢ headers : List { mapKey : Text, mapValue : Text }
(Δ, parent) × Γ₀ ⊢ requestHeaders ⇒ resolvedRequestHeaders ⊢ Γ₁
ε ⊢ resolvedRequestHeaders : H
H ∈ { List { mapKey : Text, mapValue : Text }, List { header : Text, value : Text } }
headers # resolvedRequestHeaders ⇥ normalizedRequestHeaders
parent </> https://authority directory file using normalizedRequestHeaders = import
canonicalize(import) = child
referentiallySane(parent, child)
Γ₁(child) = e₀ using responseHeaders
  ; Append normalizedRequestHeaders to the above request's headers
corsCompliant(parent, child, responseHeaders)
(Δ, parent, child) × Γ₁ ⊢ e₀ ⇒ e₁ ⊢ Γ₂
ε ⊢ e₁ : T
──────────────────────────────────────────────────────────────────────────  ; * child ∉ Δ
(Δ, parent) × Γ₀ ⊢ https://authority directory file using requestHeaders ⇒ e₁ ⊢ Γ₂

@timbertson
Copy link
Collaborator Author

Thanks for the feedback.

@Gabriel439 I'll be honest, I can vaguely the judgement syntax but had no idea what to change, so thanks for the help :)

@@ -617,6 +617,23 @@ then you retrieve the expression from the canonicalized path and transitively
resolve imports within the retrieved expression:


Γ(env:DHALL_HEADERS ? "${XDG_CONFIG_HOME:~/.config}/dhall/headers.dhall") = headers
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Gabriel439 I changed this from your suggestion of:

Γ(env:DHALL_HEADERS ? "${XDG_CONFIG_HOME}/dhall/headers.dhall" ? ~/.config/dhall/headers.dhall) = headers

But it still doesn't quite look right to me. The semantics I want are that:

  • if DHALL_HEADERS is set, evaluate it
  • otherwise, let BASE be XDG_CONFIG_HOME, or ~/.config if XDG_CONFIG_HOME is not set
  • load $BASE/dhall/headers.dhall, if it exists

The main difference with what's written here is that if DHALL_HEADERS contained invalid dhall, it should fail, rather than falling back to the file. And contrasted with your original suggestion, it will only ever attempt to load at most one file, it won't try XDG_CONFIG_HOME and then fallback to ~/.config if that file doesn't typecheck, for example.

Am I being too pedantic? Is there a way to represent what I mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is binding headers to the whole expression, but we want to extract the origin's key from the headers map, which I don't think this pseudocode is doing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I wrote it that way is to try to piggy back on the semantics of the ? judgment as possible, even though it's still playing a bit fast and loose with notation, since we don't actually define the meaning of Γ(a ? b)

However, the reason why I like to reuse the same intuition as ? is because in #1181 we standardized that if if the first file contains invalid Dhall then it won't fall back to the second file

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh awesome, I hadn't heard about #1181 but I love it.

I still don't see where it's dereferencing the specific origin headers within the overall configuration? It looks like this pseudocode is loading a single Map Text Text and using it for every request, but we want it to look for the origin's key in the overall Map Text (Map Text Text).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, you'll need to fix that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm a little lost at how to do that. I want to introduce a siteHeaders binding for the value in the map (if any), should I add another function defined elsewhere? e.g.

    getKey(userHeaders, origin) = siteHeaders

Or should I use some sort of pattern matching, like:

headers =[ ..., { mapKey = origin, siteHeaders }, ... ]

I don't really know what's valid in this pseudocode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the real answer is: anything goes for the notation, as long as the reader can figure out what's going on and translate it straightforwardly to code.

I think pattern matching would be fine in this case

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I tried this but got stuck because I couldn't figure out how to denote that origin is an input value while siteHeaders is an output variable to bind:

[ …, { mapKey = origin, mapValue = siteHeaders }, … ]

So for now I've introduced getKey with an inline description. Improvements welcome :)

standard/imports.md Outdated Show resolved Hide resolved
Comment on lines +749 to +753
The toplevel map is known as the "origin header configuration", and the individual maps
which make up the keys of the toplevel map are known as the "per-origin headers".
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmhat this is at attempt to address your "two different maps" comment. Better name ideas welcome, but I agree it's worth giving them names

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's ok 👍 IMHO the names are not too important as long as everybody what we are talking about.

Copy link
Contributor

@Gabriella439 Gabriella439 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main thing this is missing is tests

The main test cases I'd suggest are:

  • A simple smoke test against https://httpbin.org/headers as Text where DHALL_HEADERS is set
  • A test where DHALL_HEADERS is set and the same headers are also set with the using keyword to test precedence
  • A header map with a malformed (but unused) origin in the header map

standard/imports.md Outdated Show resolved Hide resolved
@timbertson
Copy link
Collaborator Author

I think the main thing this is missing is tests

Sounds good to me! I had a look, but I don't understand how they get run, in order to try it out. Specifically:

  • what should I run in dhall-haskell to try out my new tests against the current implementation?
  • how do I control DHALL_HEADERS in the test suite?

The current instructions include:

"You should make it so that the environment variable DHALL_TEST_VAR is set to
the string "6 * 7".

Which implies there's no support for per-test envs.

Also I looked at httpbin, the headers endpoint isn't going to be great because I can't just extract a single header (if only it had a dhall response type 😉 ), and there are nondeterministic headers in the response:

$ dhall repl
⊢ https://httpbin.org/headers as Text

''
{
  "headers": {
    "Accept-Encoding": "gzip",
    "Host": "httpbin.org",
    "X-Amzn-Trace-Id": "Root=1-60f94b7c-3983f920755332d1505fe2fd"
  }
}
''

We can test a single header value by using the user-agent, as long as implementations allow that header to override any internal default they might be using:

⊢ https://httpbin.org/user-agent using (toMap { `user-agent` = "test" }) as Text

''
{
  "user-agent": "test"
}
''

@Gabriella439
Copy link
Contributor

@timbertson: I think just adding a comment in ./tests/README.md explaining what you want to do plus a comment in the relevant test would be enough. It's really an "anything goes" situation with regards to test instructions 🙂

The way that most people do the test suite is they run the tests and then look more closely at whichever test fails, so if the test for this has a comment reminding the user to set the correct environment variable just for that test then it's probably good enough.

@timbertson
Copy link
Collaborator Author

Thanks. I've got tests running now, but the import tests aren't what I expected. There are both A.dhall and B.dhall files, but dhall-haskell's Test/Import.hs doesn't seem to be actually checking the expected output (if I change one, it still passes). Should this also be using Tasty.Silver.goldenVsAction like some other test suites?

@timbertson
Copy link
Collaborator Author

OK, tests added. Since the environment variable is the test case in these tests, I've introduced the concept of an optional ${testName}ENV.dhall containing the variables you should set, rather than trying to describe it in prose.

I haven't added your suggested case "A header map with a malformed (but unused) origin in the header map", because I wasn't quite sure what you meant. Do you mean a map key which is Text but isn't a hostname:port format? Or do you mean a dhall expression of the wrong type?

@timbertson
Copy link
Collaborator Author

(the above commit is a rebase + squash, since the commit history was messy).

What's the protocol for merging? It sounds like I could merge in 24h (if I'm reading CONTRIBUTING right). But should I wait until dhall-lang/dhall-haskell#2236 is nearly ready to merge? I assume it's inconvenient to have dhall-haskell only partially implement the standard if there are other standards changes you want to release, particularly if 21.0.0 is just about to be released.

@Gabriella439
Copy link
Contributor

@timbertson: You're clear to merge if you want, regardless of whether it has been implemented in dhall-haskell. It's up to you if you want to wait longer (e.g. after the dhall-haskell PR is merged or after the 21.0.0 release is cut).

@timbertson
Copy link
Collaborator Author

Cool, I'll wait until 21.0.0 since there's no real rush 👍

@timbertson
Copy link
Collaborator Author

I got itchy, I thought v21 was going to somewhat more imminent 😉

The implementation in dhall-haskell is also functionally complete now, though it hasn't seen much feedback so it'll probably still require a bit of rework before it's ready to merge.

@timbertson timbertson merged commit 481b26d into dhall-lang:master Aug 10, 2021
@Gabriella439
Copy link
Contributor

@timbertson: No worries! I thought I would have time during my vacation to cut a release, but ended up waiting until I got back

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants