Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support "precompressed" files without "base" files #5116

Open
TobiX opened this issue Oct 3, 2022 · 21 comments · May be fixed by #6284
Open

Support "precompressed" files without "base" files #5116

TobiX opened this issue Oct 3, 2022 · 21 comments · May be fixed by #6284
Labels
feature ⚙️ New feature or request good first issue 🐤 Good for newcomers help wanted 🆘 Extra attention is needed

Comments

@TobiX
Copy link
Contributor

TobiX commented Oct 3, 2022

As discussed in https://caddy.community/t/precompressed-files-without-base-files/17330, I was thinking about an extension to the precompressed feature: It would be nice to only have compressed files on disk and not have to keep the uncompressed files around...

Pro: Most clients support compression nowadays, so nothing changes for most clients
Contra: If some client not supporting compression comes around, Caddy need to decompress the file before delivering it to the client

@francislavoie suggested in the community thread that supporting this feature might impact performance due to the additional stat calls, so we probably need to keep that aspect in mind, too...

PS: While thinking about it a bit more, I could probably implement it as a custom file system which "synthesizes" the uncompressed files 😆

@francislavoie francislavoie added feature ⚙️ New feature or request good first issue 🐤 Good for newcomers labels Oct 4, 2022
@francislavoie
Copy link
Member

I'm thinking it could be a new boolean option called precompressed_allow_no_base for a lack of a better name.

@mholt
Copy link
Member

mholt commented Oct 4, 2022

How about prefer_precompressed?

@shade34321
Copy link

I'm not really sure what all is needed for this and might need help. But I wouldn't mind taking a stab at this.

@francislavoie
Copy link
Member

@shade34321 take a look at the fileserver code and at the docs. Look for precompressed in the codebase for the key places to look at.

@shade34321
Copy link

I've taken a look and I think this is going to take me a bit longer to understand before making a fix. So if someone else wants to work on it they can but I'll keep chugging along until I see a fix. Just wanted to update everybody.

@mholt mholt added the help wanted 🆘 Extra attention is needed label Oct 28, 2022
@titanventura
Copy link

Would like to take this up !

@mholt
Copy link
Member

mholt commented Nov 29, 2022

Sounds good! Let us know if you have any questions.

@titanventura
Copy link

I have a question. What approach does caddy currently take, for clients that do not support compression ?

@titanventura
Copy link

would be better if i can connect on call with a person who can explain about this. I'm not sure if this is common in open-source communities. But i feel it is essential.

@francislavoie
Copy link
Member

francislavoie commented Dec 26, 2022

We follow the Accept-Encoding header. If a client doesn't declare it can handle compressed content, then Caddy will not compress it.

For file_server's precompressed files, that's why it's useful to have an uncompressed copy on file (most of the time).

This feature request would require decompressing the content before sending it if the client doesn't support compression, if the "no base" option is used.

@ueffel
Copy link
Contributor

ueffel commented Dec 26, 2022

I have a question. What approach does caddy currently take, for clients that do not support compression ?

I suggest you just read the code and maybe run through it with a debugger, starting here:

func (fsrv *FileServer) ServeHTTP(w http.ResponseWriter, r *http.Request, next caddyhttp.Handler) error {

More specifically look at this part:

var file fs.File
var etag string
// check for precompressed files
for _, ae := range encode.AcceptedEncodings(r, fsrv.PrecompressedOrder) {
precompress, ok := fsrv.precompressors[ae]
if !ok {
continue
}
compressedFilename := filename + precompress.Suffix()
compressedInfo, err := fs.Stat(fsrv.fileSystem, compressedFilename)
if err != nil || compressedInfo.IsDir() {
fsrv.logger.Debug("precompressed file not accessible", zap.String("filename", compressedFilename), zap.Error(err))
continue
}
fsrv.logger.Debug("opening compressed sidecar file", zap.String("filename", compressedFilename), zap.Error(err))
file, err = fsrv.openFile(compressedFilename, w)
if err != nil {
fsrv.logger.Warn("opening precompressed file failed", zap.String("filename", compressedFilename), zap.Error(err))
if caddyErr, ok := err.(caddyhttp.HandlerError); ok && caddyErr.StatusCode == http.StatusServiceUnavailable {
return err
}
file = nil
continue
}
defer file.Close()
w.Header().Set("Content-Encoding", ae)
w.Header().Del("Accept-Ranges")
w.Header().Add("Vary", "Accept-Encoding")
// don't assign info = compressedInfo because sidecars are kind
// of transparent; however we do need to set the Etag:
// https://caddy.community/t/gzipped-sidecar-file-wrong-same-etag/16793
etag = calculateEtag(compressedInfo)
break
}

The call to encode.AcceptedEncodings normally returns an ordered list of accepted encodings in the request (Accept-Encoding) and then checks if they are present in the file system. Since you ask for the case of no compression, this loop will be skipped entirely. That means the variable file will still be nil, which makes you enter this case and open the actual file to serve:

// no precompressed file found, use the actual file
if file == nil {
fsrv.logger.Debug("opening file", zap.String("filename", filename))
// open the file
file, err = fsrv.openFile(filename, w)
if err != nil {
if herr, ok := err.(caddyhttp.HandlerError); ok &&
herr.StatusCode == http.StatusNotFound {
return fsrv.notFound(w, r, next)
}
return err // error is already structured
}
defer file.Close()
etag = calculateEtag(info)
}

@titanventura
Copy link

Sure. Thank you all for adding more information ! Will look at it.

@titanventura
Copy link

But i don't understand why we need a new boolean option called prefer_precompressed for this. If caddy finds an sidecar/precompressed file that does not have it's corresponding uncompressed file, we can decompress it on the fly (for clients that do not support compression) without the need for a boolean option right ?

@francislavoie
Copy link
Member

@titanventura did you read the discussion in the community forum thread? The reasoning is covered there. And reading the code, it should be clear why we do it that way currently.

@titanventura
Copy link

I feel this feature can be implemented without prefer_precompressed option. Because, we can implement it in a way so that, if the uncompressed file is NOT present and the client does NOT accept the encoding, we can decide with decompressing the file on the fly since we have no other option to serve the client right ? Please correct me if i'm wrong, im a newbie to the open source realm. I would love to get on call to discuss things in more detail (if that works with you).

@francislavoie
Copy link
Member

francislavoie commented Dec 27, 2022

As I wrote before, currently the code is set up such that if the original file does not exist (checked first), a 404 is immediately returned and no additional system calls are made. This is optimal to "fail fast" and assumes that no compressed version of the file will exist.

If we change the code (without a new option) then additional system calls would be made on all requests to missing files which can be seen as a performance regression, especially if someone is making tons of requests to brute force trying to find some files (but admittedly in that case you should probably have some rate limiting in place to stop that from happening).

My argument is that most users of precompressed will probably have both versions on disk, as is commonly done when using JS build tooling to precompress assets. So I think that performance regression to allow for not having the uncompressed copy should be opt-in.

@titanventura
Copy link

Ahh I get it now. This has made things much more clear. Thank you so much @francislavoie . Will try and work on it now that i've understood the issue !

@singhalkarun
Copy link
Contributor

singhalkarun commented Aug 14, 2023

Hi @francislavoie @mholt I would love to spend some time solving this. Sharing my understanding below of the above discussion before I start to work on it,

Issue: If there's only a compressed copy of file on server and and the client doesn't accept the compressed file, it returns 404.

Solution:

  1. We can allow decompressing of the compressed files on the fly if the precompressed file doesn't exist and the client doesn't accept the compressed files.

    Issue: It can lead to performance regression issues.

  2. We can make it as an opt-in feature so that the server administrator can specify that he wants to decompress and serve files maybe for specific routes (if the precompressed file doesn't exist) and keep returning 404 if not opted for assuming that the prescompressed file must exist.

Please correct me if I have some flaws in understanding.

@francislavoie
Copy link
Member

Sounds correct @singhalkarun.

@singhalkarun
Copy link
Contributor

Sounds correct @singhalkarun.

Thanks for confirming @francislavoie . I am starting up on this. I will keep you posted.

@3v0k4
Copy link

3v0k4 commented Apr 24, 2024

I'd love to get this one done. I already started working on it.

@3v0k4 3v0k4 linked a pull request Apr 30, 2024 that will close this issue
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature ⚙️ New feature or request good first issue 🐤 Good for newcomers help wanted 🆘 Extra attention is needed
Projects
None yet
8 participants