-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove chunked encoding from patches #75
Conversation
LGTM. Finally, per HTTP/1.1 spec, GET responses that don't include Content-Length (like in this draft) are assumed to be delineated by termination of the stream. Until that happens, proxies are free to withhold the output, buffering it up until TCP EOF that "never" comes here - potentially lagging the subscription stream a lot. Conversely they never do that with TE: Chunked, as they hand off chunk-by-chunk to the client. Meaning top-level chunking should be probably a recommendation. Of note is that the top-level chunking can split the inner version/patches at an arbitrary boundary (contrary to what my mail said) and clients must not depend on chunk boundaries to denote version chunks. This is due the way how it interacts with gzip encoding (gzip first, then chunk the output stream). Meaning a client receiving the feed has to demux the top level chunking first, and then split out version/patches blocks by keeping parser state. |
@@ -302,14 +302,12 @@ Table of Contents | |||
|
|||
|
|||
In order to distinguish each patch within a Version, we need to know | |||
the length of the patch. To know the length of the patch, each patch | |||
must include one of the following headers: | |||
the length of each patch. To know the length of each past, it must |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
past
typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you. Fixed.
@katuma Thanks for this additional insight:
If this is true, we should add this recommendation to the spec, but my instinct is to test it first, so that we know that it is actually necessary. Presumably this issue would also affect SSE, however I just checked the SSE Specification and see no mention of Transfer-Encoding: chunked. Any idea why?
Do you think that this needs to be clarified in the spec? I estimate the spec is already unambiguous when it says:
...and:
Although perhaps we could go ahead and uppercase MUST, to make it clear that it's required. I'll do that now. |
@toomim I'm not really sure spec wise because the issue of unknown-entity length blocking is underspecified in there across the board. SSE itself vaguely discourages Chunked-Encoding, perhaps too due to the way it would split events on unfortunate boundaries potentially (fe packing multiple events in a single chunk): Authors are also cautioned that HTTP chunking can have unexpected negative effects on the reliability of this protocol. Where possible, chunking should be disabled for serving event streams unless the rate of messages is high enough for this not to matter. However practice runs against the spec - once there's a proxy with proxy_buffering/fastcgi_buffering inbetween (enabled by default in nginx and similar software), it will stall, until TE: Chunked is used or all proxies on the way disable buffering. So there's definitelly pros/cons: With chunking you potentially can shoot yourself in the foot via wrong chunk message splitting, but have better chances of traversing proxies - especially the ones you have no control over in PaaS. |
Yea, not sure why would we even suggest chunked to be used with Braid? What is the reason for that? |
When a GET request with |
Why do you say "should"? Chunked encoding is certainly an option, but I think implementers can choose what's best for themselves. |
Should we say then that chunks have to be split by patches? |
Probably because I don't know the full range of possibilities :) "That's what I've seen so far" would have been more accurate. |
The idea of Transfer-Encoding is that it only applies to the information encoded over the network -- it encodes no information the entity itself. For instance, you can use It is nice if the Transfer-Encoding is separate from the representation of the entity, because then you can apply different Transfer-Encodings to the same entity. Any time we force a particular encoding to be used with Braid messages, we are limiting the range of possibilities for our implementors. Do we have a reason to tie their arms in a straight-jacket? |
One reason is efficiency. I was looking at the number of bytes used if patches aligned with chunks, and it could be significant if the amount of data per patch is small (such as a single integer value, for instance):
vs
(Note that these numbers are, as you say, transport-level values visible only via netcat, for instance. The content-length header would be visible by anything able to see the response). |
@canadaduane Don't do this - mixing entity transport layer with entity contents. HTTP/2 for instance is implicitly chunked (frame serves same role as chunks in TE: Chunked), but making chunk boundaries part of entity would make implementations incredibly awkward, if not outright impossible as framing (and chunking) is usually not exposed on higher API levels. There's also complex interaction with compression (you'd have to Z_FLUSH every frame) I'm starting to see the wisdom of why transport encodings are deliberately ambiguous even in places where details may matter (such as buffering gotchas). People might be tempted to bind implementation behavior to a transport, only to be burned by it later when trying to interop (I sure did). |
@canadaduane Ah, so you are concerned about the space that headers take up? First, HTTP/2 has header compression which will handle that. So this would only be an issue with people using HTTP/1 that care about efficiency, but don't want to upgrade. But even for these folks, I'd recommend just altering the protocol for separating patches and versions (e.g. instead of But also, even after specifying the content-length, we still have to specify the |
@toomim I'm not sure if HTTP/2 has compression for headers inside the entity body (of Subscribe: stream). That said, just plain compression will elide most of the protocol verbosity, it's hardly an issue. Performance wise, the biggest bottleneck ATM seems to be opposite direction - streaming versions to the server, due to sheer amount of individual requests. To which extent this is a problem needs to be simulated. Worst case, extend protocol to streaming multipart POST (to which extent that suffers buffering issues I have no idea yet). |
The current spec only describes how to encode Braid on HTTP 1. In a future round, we'll specify how to use HTTP/2 messages natively. When we do so, we will be able to send each version's headers, and each patches headers, using a native H2 "headers" frame: https://tools.ietf.org/html/rfc7540#section-6.2 This will let us benefit from HPACK header compression anywhere we want. :)
This is also solved with HTTP/2, which streams multiple requests over the same TCP connection. An additional request in H2 has no more overhead than any other message. It's just a couple frames in the TCP stream. |
FWIW I was re-reading the spec today and having either one of "Content-Length" or "Transfer-Encoding: chunked" in a patch was a point of confusion. Simplifying to just "Content-Length", if possible, would be an improvement IMO. |
Just to chime in here, I also agree that patches probably shouldn't be (internally) streamed. Or at least, I can't think of a motivating example where the length of a patch won't be known before the patch is sent. There may be cases where we want to start sending a patch before we know its size (eg replace the content of this image file with this new binary content). But my vote is also to leave it out of the spec until someone justifies its inclusion. |
It sounds like we have consensus to merge this PR, and remove chunked encoding from patches. Please speak now if you disagree. Do we also want to remove chunked encoding as a way of distinguishing the end-points of versions? I think it would make sense to do so, following @josephg's principle of "leave it out until someone justifies its inclusion." Consider that if you did want to stream a large version without knowing its length, you could always break it up into a sequence of intermediate versions... or stream a single version with a bunch of patches. |
We've had no objections, so I'm merging this PR now. It also sounds like we should write one up to removed chunked encoding from versions. |
Although
Content-Encoding: chunked
could conceivably be used to distinguish the starts and ends of patches, Karel points out that it's strange to have two levels of chunked encoding, and we don't actually have any motivating examples where one would needchunked
instead of the simplerContent-Length: N
.In HTTP, chunked encoding is useful when you don't know the full length of a message body ahead of time, for instance when you want to stream a large file. However, I'm not aware of any situations in which we need to stream a large patch.
Until we run across such situations, it seems reasonable to just remove
Transfer-Encoding: chunked
from the spec for Patches.(This PR does not remove
Transfer-Encoding: chunked
from Versions; just Patches.)