Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encode: Enhancements to content negotiation, serving precompressed content #4045

Merged
merged 10 commits into from
Mar 30, 2021

Conversation

ueffel
Copy link
Contributor

@ueffel ueffel commented Mar 2, 2021

This pull request partly addresses #2665

It adds:

  • the ability to configure the minimum_length of a response for encoding to happen via caddyfile
  • the ability to configure encoding preference on the server-side (prefer setting for encode)
  • a new interface Precompress, which modules can implement to provide the file extension suffix for the file_server to search for precompressed files

Differences to the previous iteration of the changes (master...ueffel:file_server-precompressed):
ueffel/caddy@file_server-precompressed...ueffel:file_server-precompressed_v2

  • module namespace for precompress modules changed to http.precompressed.*
  • new precompress modules for gzip, zstd, br
  • caddyfile directive and interface name changed to Precompressed

Edit:
Instead of introducing a new dependency to "github.com/gobwas/glob" now the content-types matching uses filepath.Match. Maybe the "validation" by matching with an empty string does not produce a ErrBadPattern for every bad patttern 🤔

@CLAassistant
Copy link

CLAassistant commented Mar 2, 2021

CLA assistant check
All committers have signed the CLA.

@ueffel ueffel force-pushed the file_server-precompressed_v2 branch from 3379bb5 to 7bc0930 Compare March 2, 2021 19:50
Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks so much for working on this.

This is my first pass -- see what you think!

modules/caddyhttp/encode/br/brotli_precompressed.go Outdated Show resolved Hide resolved

// Only encode responses that are at least this many bytes long.
MinLength int `json:"minimum_length,omitempty"`

// Only encode responses that match at least one of the content-types.
Types []string `json:"types,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurred to me, would there be any value in letting the user configure a ResponseMatcher here? This could let them choose based on other headers (and status code) too, but I'm not sure how useful that is.

(To be clear, what you have is fine, but before we introduce a non-breakable change I want to get your thoughts on this. If the ResponseMatcher approach is feasible, that might be better since it gives more flexibility.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know a ResponseMatcher exists. I will have a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4021 is something you could reference re response matchers. Not sure if that's quite the direction we want to go, but it could be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

	encode zstd gzip {
		minimum_length 256
		prefer zstd gzip
		@text {
			status 2xx 4xx
			header Content-Type text/*
			header Content-Type application/json
			header Content-Type application/javascript
			header Content-Type application/xhtml+xml
			header Content-Type application/atom+xml
			header Content-Type application/rss+xml
			header Content-Type image/svg+xml
		}
		match @text
	}

something like this is possible now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I'm thinking whether it would work better like this, in this case, since it only makes sense to define match once for encode, unlike reverse_proxy's handle_response which can appear more than once:

encode [<encoders...>] {
	match [<matcher> [<args...]] {
		<matcher> <args...>
	}
}

Basically skip the named matcher stuff and directly read the args on the same line and/or block for the matcher. This shouldn't change the code much, and should still be reusable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Looks good 😄

Small thing, I think it might be better to make parseNamedResponseMatcher return the response matcher rather than shoving it into responseMatchers and read it back out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't touched parseNamedResponseMatcher intentionally. If it will be reusable in that state from your pull request, the changes required to use the hopefully soon merged implementation will be minimal.
Also it ensures right now, that match is only defined once in a encode block.

if len(enc.Types) == 0 {
enc.Types = []string{"*"} // backwards compatibility

// sane default for text-based content ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We like sane defaults

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the implementation via ResponseMatcher the default is:

header Content-Type text/*
header Content-Type application/json
header Content-Type application/javascript
header Content-Type application/xhtml+xml
header Content-Type application/atom+xml
header Content-Type application/rss+xml
header Content-Type image/svg+xml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the default to:

header Content-Type text/*
header Content-Type application/json*
header Content-Type application/javascript*
header Content-Type application/xhtml+xml*
header Content-Type application/atom+xml*
header Content-Type application/rss+xml*
header Content-Type image/svg+xml*

to account for any ;charset=xxx that may be appended to the header.

Copy link
Contributor

@xdevs23 xdevs23 Mar 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any MIME types that could match these globs except for the ;charset=... addition?
Could we use a regexp instead (or is it too much of a performance hit, maybe?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regexps aren't supported by the ResponseMatcher type, yet. It would be nice the have a header_regexp directive from RequestMatchers as well, but I think this would be out of the scope of this PR. Maybe an additional one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should have a header_regexp matcher. If you want to do a follow-up with that, that'd be great 👍
I am a bit worried that it would be detrimental to performance for the encode defaults though, so a bit of benchmarking may be beneficial to see if it makes any appreciable difference.

modules/caddyhttp/encode/br/brotli_precompressed.go Outdated Show resolved Hide resolved
modules/caddyhttp/fileserver/caddyfile.go Outdated Show resolved Hide resolved
modules/caddyhttp/fileserver/staticfiles.go Outdated Show resolved Hide resolved
PrecompressRaw caddy.ModuleMap `json:"encodings,omitempty" caddy:"namespace=http.precompressed"`

// If the client has no strong preference, choose these encodings in order.
PrecompressOrder []string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain what the default is.

modules/caddyhttp/fileserver/staticfiles.go Outdated Show resolved Hide resolved
modules/caddyhttp/fileserver/staticfiles.go Outdated Show resolved Hide resolved
@mholt mholt added the under review 🧐 Review is pending before merging label Mar 2, 2021
@mholt mholt added this to the 2.x milestone Mar 2, 2021
@ueffel
Copy link
Contributor Author

ueffel commented Mar 2, 2021

Would you like changes to be in additional commit or should I rebase-edit changes into the existing commits?

@francislavoie
Copy link
Member

We squash-merge at the end, so do whatever works best for you for now 👍

@ueffel ueffel force-pushed the file_server-precompressed_v2 branch from 7bc0930 to b1fdbf0 Compare March 3, 2021 19:27
@ueffel ueffel force-pushed the file_server-precompressed_v2 branch from 8673868 to 67ab2d0 Compare March 7, 2021 16:26
@francislavoie
Copy link
Member

francislavoie commented Mar 7, 2021

Could you add an adapt test as well for this? See https://github.com/caddyserver/caddy/tree/master/caddytest/integration/caddyfile_adapt, both for encode and your file_server changes. Just for the coverage.

Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, this is looking so good!! Nice work.

Just a couple nits. But I think we're good to go. Caddyfile tests that Francis mentioned can come either in this PR or a later one. Also, if you want to make sure godoc comments end with periods (or other proper punctuation), that'd be good for legibility when they are read on the docs website.

Almost there!


// Only encode responses that are at least this many bytes long.
MinLength int `json:"minimum_length,omitempty"`

Matcher *caddyhttp.ResponseMatcher
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a JSON tag 😊

@@ -332,6 +392,13 @@ type Encoding interface {
NewEncoder() Encoder
}

// Precompressed is a type which returns filename suffix of precomressed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Precompressed is a type which returns filename suffix of precomressed
// Precompressed is a type which returns filename suffix of precompressed

@ueffel ueffel force-pushed the file_server-precompressed_v2 branch from 9cd73eb to d8154e7 Compare March 13, 2021 14:50
@ueffel
Copy link
Contributor Author

ueffel commented Mar 13, 2021

Could you add an adapt test as well for this? See https://github.com/caddyserver/caddy/tree/master/caddytest/integration/caddyfile_adapt, both for encode and your file_server changes. Just for the coverage.

The file_server adapt test is giving me a headache. It automatically adds the "Caddyfile" with a OS-specific path separator as hidden file. So the test fails either on unix/mac or on windows, because in the adapt-txt file can only be one specific path separator. Should I use a placeholder for the path separator and added a replacement step in the step before the actual comparison is done?

@francislavoie
Copy link
Member

francislavoie commented Mar 13, 2021

Right... Dangit. Yeah, I think a string replacement for ".\\Caddyfile" to "./Caddyfile" should be enough, in caddyfile_adapt_test.go just before calling CompareAdapt. We already replace windows newlines with unix ones, so there's precedent for doing that.

Edit: Sorry - I think the replacement might need to happen inside CompareAdapt cause what you actually need to replace is the string in the adapt output, cause that's the variable. It's a bit less clean to do it in there, but 🤷‍♂️ good enough I think.

* added integration tests for new caddyfile directives
* improved various doc strings (punctuation and typos)
* added json tag for file_server precompress order and encode matcher
@ueffel ueffel force-pushed the file_server-precompressed_v2 branch from d8154e7 to 10a4f2c Compare March 13, 2021 16:11
@ueffel
Copy link
Contributor Author

ueffel commented Mar 13, 2021

Edit: Sorry - I think the replacement might need to happen inside CompareAdapt cause what you actually need to replace is the string in the adapt output, cause that's the variable. It's a bit less clean to do it in there, but 🤷‍♂️ good enough I think.

I just did it here, if that's ok? It seems to work, tests are not failing anymore.

@francislavoie
Copy link
Member

Yeah that seems okay to me 👍

@ueffel
Copy link
Contributor Author

ueffel commented Mar 19, 2021

So, I would consider this PR ready-to-merge or at least ready-for-final-review.

francislavoie
francislavoie previously approved these changes Mar 19, 2021
Copy link
Member

@francislavoie francislavoie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent work! Thank you!

If you find the time, it would be great if you could update the docs for https://github.com/caddyserver/website/blob/master/src/docs/markdown/caddyfile/directives/encode.md and https://github.com/caddyserver/website/blob/master/src/docs/markdown/caddyfile/directives/file_server.md to cover the changes 📝

@francislavoie francislavoie changed the title Enhancements to content negotiation and encoding encode: Enhancements to content negotiation, serving precompressed content Mar 20, 2021
francislavoie added a commit to francislavoie/caddy that referenced this pull request Mar 20, 2021
Copy link
Member

@mholt mholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, this looks good -- let's give it a try. Thanks for your persistence in helping to get this ready! Please contribute again sometime.

@mholt mholt merged commit f35a7fa into caddyserver:master Mar 30, 2021
@mholt mholt removed the under review 🧐 Review is pending before merging label Mar 30, 2021
@mholt mholt modified the milestones: 2.x, v2.4.0 Mar 30, 2021
@mholt mholt mentioned this pull request Jun 7, 2021
@ueffel ueffel deleted the file_server-precompressed_v2 branch March 12, 2022 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants