Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Last-Modified when ETag is present breaks caching. #122

Closed
raxoft opened this issue May 6, 2022 · 7 comments
Closed

Adding Last-Modified when ETag is present breaks caching. #122

raxoft opened this issue May 6, 2022 · 7 comments

Comments

@raxoft
Copy link
Contributor

raxoft commented May 6, 2022

System Information

  • OS: Ubuntu 18.04.6 LTS
  • Ruby: 2.6.3p62
  • Version: 0.7.45

Description

Adding Last-Modified when ETag is present is a bug which prevents caching from working properly.

Rack App to Reproduce

# config.ru
require 'rack'

App = Proc.new do |env|
   [200,
     {   "Content-Type" => "text/html".freeze,
         "Content-Length" => "16".freeze },
     ['Hello from Rack!'.freeze]  ]
end

use Rack::ConditionalGet
use Rack::ETag, 'public'

run App

Testing code

Simply run the app and point the browser to http://localhost:3000. Open the developer console, make sure "disable cache" is not checked, then start hitting reload and watch the result codes.

Expected behavior

The example application should return 304 every time when reloaded in the browser.

Actual behavior

When run under puma, this works as expected. When run under iodine, it doesn't. The result code is 200 most of the time (the 304 is returned only if you hit the time window with the same time, which IIRC is 2 seconds due to iodine being lazy about updating Last-Modified time).

The reason is that iodine attaches the Date and Last-Modified headers if they are not present in the response headers. But adding the latter is wrong in case the ETag header is already present.

Having both ETag and Last-Modified in the response requires the client to send both If-Modified-Since and If-None-Match in the following request. This in turn requires the Rack::ConditionalGet to validate both validators, and since the Last-Modified is not present at that point, the conditional get must fail.

To quote the relevant sections from RFC2616, section 13.3.4:

If both an entity tag and a Last-Modified value have been provided by the origin server, [HTTP/1.1 clients] SHOULD use both validators in cache-conditional requests. This allows both HTTP/1.0 and HTTP/1.1 caches to respond appropriately.

An HTTP/1.1 origin server, upon receiving a conditional request that includes both a Last-Modified date (e.g., in an If-Modified-Since or If-Unmodified-Since header field) and one or more entity tags (e.g., in an If-Match, If-None-Match, or If-Range header field) as cache validators, MUST NOT return a response status of 304 (Not Modified) unless doing so is consistent with all of the conditional header fields in the request.

The fix is simple, simply don't add the Last-Modified header in case the ETag is already present.

@raxoft
Copy link
Contributor Author

raxoft commented May 6, 2022

Actually, the question is if iodine should add the Last-Modified header to a response it knows nothing about at all, giving the client the option to send the conditional request when the application didn't intend this. But in the ETag case it is definitely wrong.

@boazsegev
Copy link
Owner

Hi @raxoft ,

Thank you for opening this issue.

A quick question about a workaround: will setting the Last-Modified header manually to a correct value not solve the issue? (I will test this later, but Iodine should only add the Last-Modified header if it is missing)...

I will read the RFC again to understand the issue better, as I might not understand the situation as well I would have hoped to (it's been a while since I reviewed the specifications).

You are right to ask if it is helpful for iodine to supply default header values when developers don't supply them. However, changing this will be a breaking change for those who currently rely on the default behavior (unless the default behavior is a bug, in which case change is good)...

...currently the default behavior assumes that dynamic requests shouldn't be cached.

To re-asses the default behavior, may I ask a few questions about your use case?

  1. Why do you use the Rack::ETag and Rack::ConditionalGet middleware? Are you serving static data using Rack (rather than a proxy / using iodine's static server)?

  2. Is a Last-Modified header data unavailable to you (i.e., is the etag header a hash of the resulting dynamic data)?

Thanks again.
Bo.

@boazsegev
Copy link
Owner

I've been reading more and I think you are right about the Last-Modified header... it probably should not be set by default.

I was previously misreading the RFC where it stated that:

HTTP/1.1 origin servers: ... SHOULD send a Last-Modified value if it is feasible to send one...

I was ignoring the rest of the sentence where they write:

unless the risk of a breakdown in semantic transparency that could result from using this date in an If-Modified-Since header would lead to serious problems.

I'd still love to read more about your use-case to better understand how I can improve iodine... but it's probably better to remove the default Last-Modified header anyway.

Thanks again!

@raxoft
Copy link
Contributor Author

raxoft commented May 6, 2022

Hi, thanks for looking at this so quickly.

First to your original question - adding Last-Modified to the response is not practical, because if I would do that, it would disable Rack::ETag from doing anything, since it checks both Last-Modified and ETag headers and doesn't add ETag if one is present. That would essentially mean I would have to implement ETag myself, which kind of defeats the purpose of the middleware.

As for the use case itself - we use the ETag for dynamic responses from our API where the content may remain unchanged, but we can't rely on timestamps which have just one second precision. The data we transfer are large enough that it pays off to provide the ETag to avoid retransmitting the data the client already has in its cache.

Overall, I agree with the conlusion you came to - I think that adding the Last-Modified header to dynamic responses in general can cause troubles. In my case it prevents caching, but without ETag, it could actually cause false positive hits in hierarchy of caches. Let's the application provide this header if it wants (for example, sinatra has nice support for both Last-Modified and ETag headers which exit early IIRC in case of a match) and not mess with it at all at iodine's level.

Thanks.

boazsegev added a commit that referenced this issue May 6, 2022
@boazsegev
Copy link
Owner

Hi @raxoft ,

I pushed a fix and was wondering if you wanted to test it or request any more changes before I publish a release and close the issue...?

Kindly,
Bo.

P.S.

As for the use case itself - we use the ETag for dynamic responses from our API where the content may remain unchanged...

Would it make sense for you if Iodine itself tested the response's etag against the request's if-none-match or implemented some sort of callback to that effect (in effect, pulling the etag middleware logic into the server / C layer)?

I am starting to think that for version 0.8.x automatic recognition of cached responses could improve server performance by minimizing memory usage and avoiding system calls (write) for responses that were cached by the client.

Adding a callback could drop requests before the response is generated, freeing up more user resources.

I didn't have this use case for dynamic responses, but it makes sense to me that others might.

@raxoft
Copy link
Contributor Author

raxoft commented May 7, 2022

Thanks for quick fix. Tested it right now, works great as expected.

BTW, seeing the diff, in the hex_val macro, you can do the (c) | 32 trick already in the condition, saving you two tests and couple jumps.

Regarding moving the conditional get functionality into iodine itself... Hmm, it's kind of a tricky question. If you moved the functionality of both middlewares to iodine, it might help speeding up the etag generation. OTOH, it might complicate things so much that it will just cause extra overhead for those who don't need it.

Other than that, I don't think it's that useful (that is, if you were to move just the conditional get part alone). The biggest savings when using these headers come from not having to generate the response in the first place when possible, and that's already possible to do in the application. For example, sinatra has the last_modified and etag helpers which not only set the response headers, but halt with 304 immediately in case of a match, allowing you to skip bulk of the response generation. Of course, it assumes that you have to have some reliable value available which you can use for these validators. For example, for generated but static pages, you can use the app start time with the last_modified helper. But for truly dynamic responses it is often too complicated to generate something like that ahead of the actual content generation, thus just generating the content and computing the etag from that is usually the way to go.

Also, if you were to move the etag generation to iodine, the configurability of the whole stack would become tricky. I mean, with middleware you can easily add etag generation just for some routes in the stack, but now it would either be all or nothing, or perhaps require some special headers meant only for iodine to control that (on top of the Cache-Control headers), which sounds too complicated.

In either case, in my opinion it would be a completely new feature, so I think you can close this issue for now and add this feature to your 'think-about-it' list instead.

Thanks.

@boazsegev
Copy link
Owner

Hi @raxoft ,

Thanks for the thought out response and the tips.

I just released the update.

As for the new feature, I will think about it. Yes, it's much better if the app performs its own tests in advance, but I think the server could save some bandwidth and memory by testing if response.headers["etag"] == request.headers["etag"] (even if the logic is more than that simple conditional)... so far I was simply saving cached data to disk and allowing the static server to deal with the caching validation (this had the additional benefits related to how much of the data was live in memory).

Anyway, thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants