Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New response status code vs. Cache-Control: no-cache, patch #16

Open
toomim opened this issue Oct 30, 2019 · 15 comments
Open

New response status code vs. Cache-Control: no-cache, patch #16

toomim opened this issue Oct 30, 2019 · 15 comments

Comments

@toomim
Copy link
Member

toomim commented Oct 30, 2019

In order to prevent existing caches from trying to cache a patch, the current spec requires a patch to specify Cache-Control: no-cache, patch.

However, I just discovered that the same problem was solved by HTTP Range Requests by inventing a new status code:

   Partial responses are indicated by a distinct status code to not be
   mistaken for full responses by caches that might not implement the
   feature.

Should we switch to using a new status code for patch responses as well? This seems nicer than saying "Cache-Control: no-cache, patch", which implies, in rather circuitous fashion "Don't cache this. Oh, but you can cache this as a patch."

@mitar
Copy link
Member

mitar commented Oct 30, 2019

Why shouldn't braid patches be cached? They are cached given the version, no? We should just make ETag == Version.

@toomim
Copy link
Member Author

toomim commented Oct 30, 2019

They should only be cached if the proxy understands the Braid protocol.

Imagine a braid-aware client requests a HTML page, and the request goes through a CDN that doesn't support braid. Then the origin server will send a patch, or stream of patches, through the CDN.

But if a non-braid client then requests that page via the CDN, the CDN could respond with a patch instead of the full resource. The non-braid client would then try to display a patch instead of HTML.

I'm not entirely sure that this happens with CDNs, but it seems likely, and mirrors the way the Range Requests spec has been written.

@toomim
Copy link
Member Author

toomim commented Oct 30, 2019

Also, as for ETags, they unfortunately do not == Version, because they depend on Content-Encoding, not just the response body. In other words, if the network channel gzips the response body, it'll have a different ETag, even though it's the same Version.

@mitar
Copy link
Member

mitar commented Oct 30, 2019

Interesting fact about ETags, but not sure if it matters. I am realizing it does not really help with the fact that a new client still gets wrong contents, in your scenario. But I am not so sure if you are right about the scenario:

But if a non-braid client then requests that page via the CDN, the CDN could respond with a patch instead of the full resource. The non-braid client would then try to display a patch instead of HTML.

I do not thin this is possible, because content-type of the patch and non-patch is different.

So if braid-aware client wants a patch at version X, it should issue GET with version header and content type set to some patch format. If it wants state at X, it should issue GET with version header and content type set to the format in which it wants the state (JSON or HTML). And braid-aware server will provide in the list of content types it accepts both the non-patch and patch content types.

And all of those responses can be cached, because they are immutable. The only thing which is not immutable is asking for a resource or a patch without a version (in which case you get the latest state/patch).

So I do not see a problem here?

@toomim
Copy link
Member Author

toomim commented Oct 30, 2019

content-type of the patch and non-patch is different

Actually, in the current spec—as in range requests—the Content-Type remains intact when you switch to a patch or a Range Response.

Example (Range Request):

     HTTP/1.1 206 Partial Content
     Date: Wed, 15 Nov 1995 06:25:24 GMT
     Last-Modified: Wed, 15 Nov 1995 04:58:08 GMT
     Content-Range: bytes 21010-47021/47022
     Content-Length: 26012
     Content-Type: image/gif

     ... 26012 bytes of partial image data ...

Example (current Braid spec):

  Request:
      GET /animated-braid.gif
      Subscribe

  Response:
      OK 200
      Subscribe
      Cache-Control: no-cache, patch
      Patch-Type: braid(bytes, sync9(array))

      Content-Type: image/gif                             -- Frame 1
      Patch-Type: braid(bytes, sync9(array))
      Content-Length: 1239
      
      [100:200] = <binary data>

      Content-Type: image/gif                             -- Frame 2
      Patch-Type: braid(bytes, sync9(array))
      Content-Length: 62638

      [348:887] = <binary data>

I remember thinking about this fairly hard before making this design decision, but don't remember OTOH the specific use-cases that requires it. If I thought for a bit they might come back to mind...

In the current spec, the thing that distinguishes a request for patches vs. full content is the presence of any of these headers:

  • Subscribe
  • Version
  • Parents

I'm not sure if existing CDNs would distinguish requests containing these headers, or not, in caching. I bet this is spelled out somewhere in the HTTP specs...

@mitar
Copy link
Member

mitar commented Oct 30, 2019

Oh, I missed that. That is a different design decision that what I was thinking/expecting, so maybe I skipped over that.

Hm, I would love to understand why you would prefer this. I mean, in my view :

  • I can request a content type to get state.
  • We have default patch format, braid-patch, we can use for all patches as their content type. The braid-patch might then apply or not apply to given content type based on its parser type.

Those two are different to me.

I'm not sure if existing CDNs would distinguish requests containing these headers, or not, in caching. I bet this is spelled out somewhere in the HTTP specs...

There might be some caching option to tell which headers to use for caching?

@toomim
Copy link
Member Author

toomim commented Oct 31, 2019

I'm not sure what you mean by this:

Hm, I would love to understand why you would prefer this. I mean, in my view :

  • I can request a content type to get state.
  • We have default patch format, braid-patch, we can use for all patches as their content type. The braid-patch might then apply or not apply to given content type based on its parser type.

Those two are different to me.

@mitar
Copy link
Member

mitar commented Oct 31, 2019

I am continuing the discussion from above. That we should use two different content types for state vs. patch. And that for patch we have a default content type we can use: the braid-patch one. So not sure why to have same content type and have this issue you are describing here?

@mitar
Copy link
Member

mitar commented Oct 31, 2019

After more investigation here, you are right, when using range-requests you use same content-type as the full content is. And in that case partial content status code should be used to fix the caching problem you are describing.

But when fetching a particular version, you would not be using really range requests. I mean, you can, especially with our new range units, but that would still be just fetching a subset of the state. So GET + Version header + Range header would return you a subset of the state at given version. That is neat. But that is not the patch which made the previous version into this version.

To that get that patch, I would suggest you issue a request where in Accept you ask for patch content type. Then you get patch which brought you to that version. That means you could do:

  • GET + Accept: text/html to get HTML contents of the document.
  • GET + Accept: application/json to get JSON of the state used to render that document.
  • GET + Accept: application/json+patch to get a range patch of how that JSON state came up to be. (I have yet to add +patch suffix to the PR I just made.)

And there will be no caching issues with all of that.

@mitar
Copy link
Member

mitar commented Nov 4, 2019

I think we can use Vary header and require Braid HTTP client to always list Version and Parents there?

@alficles
Copy link

alficles commented Jan 25, 2020

I'm not sure how the cache-control solution solves the problem here. Consider: https://tools.ietf.org/html/rfc7234#section-5.2.3

   The Cache-Control header field can be extended through the use of one
   or more cache-extension tokens, each with an optional value.  A cache
   MUST ignore unrecognized cache directives.

That means any cache that has no idea the patch CC exists MUST ignore it. That will allow the proxy to cache the response and return it to an unsuspecting client.

@toomim
Copy link
Member Author

toomim commented Jan 26, 2020

@alficles, that's why the proposed Cache-Control header value is Cache-Control: no-cache, patch.

A cache that doesn't understand patch, but does understand no-cache, will only obey the no-cache part, which is fine.

A cache that does understand patch will know that it can override the no-cache directive because it can cache it as a patch.

However, we seem to have moved on to simply using a 209 Subscription response status code instead. That's what's in the current spec, although it looks like we haven't updated this Github issue yet with the resolution.

@alficles
Copy link

Ugh. This is why I shouldn't make comments on Friday afternoon. Sorry about that. no-cache, patch works just fine.

I've got some other concerns about the 209 code, though. Adoption is going to be a real bear, because some proxies have a tendency to block what they don't understand. In any case, I think whatever Connection-oriented solution we find for #62 will greatly inform this decision here.

@toomim
Copy link
Member Author

toomim commented Jan 27, 2020

Mmm, interesting point about proxies blocking what they don't understand! Again, I'm glad to have your expertise engaged to point out issues like this.

This sounds like a question we might be able to test empirically. On the one hand, Cache-Control: no-cache might not be supported globally (since it was introduced at some point after HTTP v0.9 if I recall correctly), and on the other, a 209 response code might be dropped entirely.

Running some tests on a variety of real networks should help us learn what to do about this.

@toomim
Copy link
Member Author

toomim commented Dec 31, 2020

Unfortunately, I've found evidence that the status code 209 trick fails to prevent Chrome from caching.

Specifically, I ran into this issue: https://stackoverflow.com/questions/27513994/chrome-stalls-when-making-multiple-requests-to-same-resource

To reproduce this, I did this:

  1. Sent a GET Subscribe request to a server. Server responds with 209.
  2. Sent a PUT request to the server.

The PUT request gets delayed by 20 seconds, because Chrome thinks they are operating on the same cache, and locks its cache for 20 seconds.

This issue goes away if you add "Cache-Control no-cache, no-transform". This suggests that some implementations might need the Cache-Control headers after all. We should test other browsers and proxies before making a final call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants