Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ETag #191

Closed
mholt opened this issue Jul 18, 2015 · 11 comments · Fixed by #707
Closed

Implement ETag #191

mholt opened this issue Jul 18, 2015 · 11 comments · Fixed by #707

Comments

@mholt
Copy link
Member

mholt commented Jul 18, 2015

ETags aren't my favorite caching mechanism because all the heavy lifting is done by the server, but it might be worth looking into doing them anyway. Do we do strong ETags or weak ones? Should this be customizable from the Caddyfile? What are the default settings? How do we cache the computed ETags?

The Go standard library supports ETag and If-Modified-Since headers, but we need to do our part to take advantage of them.

@mholt mholt added the discussion 💬 The right solution needs to be found label Jul 18, 2015
@mholt mholt mentioned this issue Jul 18, 2015
@nudzo
Copy link

nudzo commented Jul 26, 2015

Let's get inspired by nginx having its ETags looking like: "554b73dc-6f0d" 1st part is last modification timestamp and 2nd is file size. Both you have just when opening file, no computation needed, so caching this makes no sense.

@mholt
Copy link
Member Author

mholt commented Jul 26, 2015

That's a good idea. I'm down for simple and effective. I need to look into this more, but why is the modified date and file size both used? Are both needed?

@nudzo
Copy link

nudzo commented Jul 26, 2015

In fact last_mod should change anytime you write something... but that's in seconds... in one second you can write few hundreds megs. Found it here: http://serverfault.com/questions/690341/algorithm-behind-nginx-etag-generation

@mholt
Copy link
Member Author

mholt commented Jul 26, 2015

That makes sense. I need to figure out how to best integrate this with a "cache" or "caching" directive (which one?)

@nudzo
Copy link

nudzo commented Jul 26, 2015

As far as I remember ETag is not conditionally send on caching headers send by client. It can be send together with modified headers. Primary idea in RFCs was to send MD5 of content, but that's heavy to compute especially for bigger files. Btw nginx generate this just for static content and when you look at docs, it is just on/off tag in context of http or server or location.

@mholt
Copy link
Member Author

mholt commented Jul 26, 2015

So maybe it's a good idea to just send ETag from within the file server by default. That way, the client may use it but it doesn't have to. I'll need to investigate more if other caching related headers should be used similarly.

@nudzo
Copy link

nudzo commented Jul 26, 2015

It's quite tricky. Client (browser) would work OK, but there would be some Squiq (whatever proxy) in between and there can be confusion. It is worth to be able switch on/off in config and may be best by URI. ETag comes usually with something like expires in nginx, that add/append_to header Cache-Control: max-age=86400 (1d for example). If you are doing site deploy once a day, then tell explicitly you static content can change at most once a day. Then there are public and private modifiers, that affects proxies behaviour. Hence the name private is evaluated in browser only public also on proxies.

@coolaj86
Copy link
Contributor

coolaj86 commented Nov 5, 2015

I would suggest a two-part E-TAG using the stat object and running a fast hash (such as BLAKE2) against it.

Part 1:

  • mtime - modified at
  • ctime - created at
  • mode - permissions
  • size - number of bytes in the file

Part 2 (optional):

Optimistic Approach: If it is suspected that the file has changed (because the values above don't match), read the file disk and hash it. You could use a sparse hash rather than a complete hash to save disk access / cpu usage.

Pessimistic Approach: Hash the whole file every time.

Create the etag

etag := 'W/"' + part1[0:8] + '-' + part2[0:8] + '"'

Here are the docs

Unfortunately go doesn't return ctime ... that's annoying.

I'm optimistic that Name(), Size(), Mode(), and ModTime() are good enough. Since this isn't an issue of cryptographic security (or at least I think if you've got someone on your server fudging bad files to cause cache validation errors, you've got WAY bigger problems), it's a tradeoff between being perfectly accurate and accurate in most cases (reverting a change to a file which causes a different date, and usually a different file size).

@coolaj86
Copy link
Contributor

coolaj86 commented Nov 5, 2015

Point me at the right file and there's a strong likelihood I can get you a pull request.

@mark-kubacki
Copy link

Hashing files will be a no-go on installations with lots of, or some very large files: It will slow down the start considerably.

@weingart
Copy link

I've got the start of something for this. Hope to submit before the weekend is out. :)

@mholt mholt added the in progress 🏃‍♂️ Being actively worked on label Mar 21, 2016
@jupiter jupiter mentioned this issue Apr 2, 2016
@mholt mholt closed this as completed in #707 Apr 3, 2016
@mholt mholt removed in progress 🏃‍♂️ Being actively worked on discussion 💬 The right solution needs to be found labels Apr 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants