Remove chunked encoding from patches #75

Merged: 3 commits into master, Feb 26, 2021

Conversation

@toomim (Member) commented Jan 19, 2021

Although Transfer-Encoding: chunked could conceivably be used to distinguish the starts and ends of patches, Karel points out that it's strange to have two levels of chunked encoding, and we don't actually have any motivating examples where one would need chunked instead of the simpler Content-Length: N.

In HTTP, chunked encoding is useful when you don't know the full length of a message body ahead of time, for instance when you want to stream a large file. However, I'm not aware of any situations in which we need to stream a large patch.

Until we run across such situations, it seems reasonable to just remove Transfer-Encoding: chunked from the spec for Patches.

(This PR does not remove Transfer-Encoding: chunked from Versions; just Patches.)
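For illustration, here is a minimal sketch of what an update delimited purely by lengths could look like on the wire. The version and parent IDs, the patch bodies, and the use of the Patches header are invented for this sketch and are not taken from the spec text:

   Version: "ex-2"
   Parents: "ex-1"
   Patches: 2

   Content-Length: 2

   10

   Content-Length: 3

   101

Each patch ends exactly where its Content-Length says it does, so no transfer-level chunking is needed to find the boundary between one patch and the next.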

@katuma commented Jan 19, 2021

LGTM.

Finally, per the HTTP/1.1 spec, GET responses that don't include Content-Length (like in this draft) are assumed to be delineated by termination of the stream. Until that happens, proxies are free to withhold the output, buffering it up until a TCP EOF that "never" comes here - potentially lagging the subscription stream a lot. Conversely, they never do that with TE: chunked, as they hand off chunk-by-chunk to the client. Meaning top-level chunking should probably be a recommendation.

Of note is that top-level chunking can split the inner versions/patches at an arbitrary boundary (contrary to what my mail said), and clients must not depend on chunk boundaries to delimit versions or patches.

This is due to the way it interacts with gzip encoding (gzip first, then chunk the output stream). This means a client receiving the feed has to demux the top-level chunking first, and then split out the version/patch blocks by keeping parser state.
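To make the arbitrary-boundary point concrete, here is a hypothetical wire trace (chunk sizes in hex; CRLFs not shown; the header values and body are invented). A proxy is free to re-chunk the stream so that a sub-response body delimited by Content-Length: 10 arrives split across two transfer chunks:

   27
   Version: "ex-2"
   Content-Length: 10

   5
   hello
   5
   world

After stripping the chunk framing, the receiver counts 10 bytes, as given by Content-Length, to find where the body ends; the chunk boundary between "hello" and "world" carries no meaning at the Braid level.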

@@ -302,14 +302,12 @@ Table of Contents

    In order to distinguish each patch within a Version, we need to know
-   the length of the patch. To know the length of the patch, each patch
-   must include one of the following headers:
+   the length of each patch. To know the length of each past, it must
Member

past typo?

Member Author

Yes, thank you. Fixed.

@toomim (Member Author) commented Jan 21, 2021

@katuma Thanks for this additional insight:

> Finally, per the HTTP/1.1 spec, GET responses that don't include Content-Length (like in this draft) are assumed to be delineated by termination of the stream. Until that happens, proxies are free to withhold the output, buffering it up until a TCP EOF that "never" comes here - potentially lagging the subscription stream a lot. Conversely, they never do that with TE: chunked, as they hand off chunk-by-chunk to the client. Meaning top-level chunking should probably be a recommendation.

If this is true, we should add this recommendation to the spec, but my instinct is to test it first, so that we know that it is actually necessary. Presumably this issue would also affect SSE; however, I just checked the SSE Specification and see no mention of Transfer-Encoding: chunked. Any idea why?

> Of note is that top-level chunking can split the inner versions/patches at an arbitrary boundary (contrary to what my mail said), and clients must not depend on chunk boundaries to delimit versions or patches.

Do you think that this needs to be clarified in the spec? I think the spec is already unambiguous when it says:

   In order to distinguish each patch within a Version, we need to know
   the length of each patch.  To know the length of each patch, it must
   include the following header:

         Content-Length: N

   This length determines where this patch ends, and next begins.

...and:

   To send multiple updates, a server concatenates multiple
   sub-responses into a single response body.  Each sub-response must
   contain its own headers and body.  Each sub-response must have a
   known length, which means it must contain one of the following
   headers:

      - Content-Length: N
      - Transfer-Encoding: chunked
      - Patches: N

   Each sub-response must have both headers and a body.  The body may be
   zero-length.

Although perhaps we could go ahead and uppercase MUST, to make it clear that it's required. I'll do that now.

@katuma commented Jan 22, 2021

@toomim I'm not really sure spec-wise, because the issue of unknown-entity-length blocking is underspecified in there across the board. SSE itself vaguely discourages chunked encoding, perhaps also due to the way it could split events on unfortunate boundaries (e.g. packing multiple events into a single chunk):

> Authors are also cautioned that HTTP chunking can have unexpected negative effects on the reliability of this protocol. Where possible, chunking should be disabled for serving event streams unless the rate of messages is high enough for this not to matter.

However, practice runs against the spec - once there's a proxy with proxy_buffering/fastcgi_buffering in between (enabled by default in nginx and similar software), it will stall until TE: chunked is used or all proxies on the way disable buffering.

So there are definitely pros and cons: with chunking you can potentially shoot yourself in the foot via wrong chunk/message splitting, but you have better chances of traversing proxies - especially the ones you have no control over in a PaaS.

@mitar (Member) commented Jan 22, 2021

Yeah, I'm not sure why we would even suggest that chunked be used with Braid? What is the reason for that?

@canadaduane (Member) commented:

> Yeah, I'm not sure why we would even suggest that chunked be used with Braid? What is the reason for that?

When a GET request with a Subscribe: header is made, the response should include Transfer-Encoding: chunked as part of the protocol for streaming patches.

@toomim (Member Author) commented Jan 22, 2021

Why do you say "should"? Chunked encoding is certainly an option, but I think implementers can choose what's best for themselves.

@mitar (Member) commented Jan 22, 2021

Should we say then that chunks have to be split by patches?

@canadaduane (Member) commented:

> Why do you say "should"?

Probably because I don't know the full range of possibilities :) "That's what I've seen so far" would have been more accurate.

@toomim (Member Author) commented Jan 22, 2021

The idea of Transfer-Encoding is that it only applies to the information encoded over the network -- it encodes no information about the entity itself.

For instance, you can use Transfer-Encoding: gzip for a raw HTML file, which doesn't say that the HTML file, as an HTTP resource, is gzipped or even related to gzip. It just says that the network transport is speaking in the gzip dialect when it sends the resource.

It is nice if the Transfer-Encoding is separate from the representation of the entity, because then you can apply different Transfer-Encodings to the same entity. Any time we force a particular encoding to be used with Braid messages, we are limiting the range of possibilities for our implementers. Do we have a reason to tie their arms in a straitjacket?
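As a sketch of that separation (status line and header values invented), the same HTML entity could be delivered with or without a transfer coding; only the hop-by-hop framing changes, not the resource:

   HTTP/1.1 200 OK
   Content-Type: text/html
   Content-Length: 1024

   <!doctype html> ...

versus

   HTTP/1.1 200 OK
   Content-Type: text/html
   Transfer-Encoding: gzip, chunked

   (the same 1024-byte document, gzipped and chunk-framed in transit)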

@canadaduane (Member) commented Jan 22, 2021

> Do we have a reason to tie their arms in a straitjacket?

One reason is efficiency. I was looking at the number of bytes used if patches aligned with chunks, and the savings could be significant if the amount of data per patch is small (such as a single integer value, for instance):

   2
   10

   3
   101

vs

   15
   content-length: 2


   2
   10

   15
   content-length: 3

   3
   101

(Note that these numbers are, as you say, transport-level values visible only via netcat, for instance. The content-length header would be visible to anything able to see the response.)

@katuma commented Jan 22, 2021

@canadaduane Don't do this - mixing the entity transport layer with the entity contents. HTTP/2, for instance, is implicitly chunked (a frame serves the same role as chunks in TE: chunked), but making chunk boundaries part of the entity would make implementations incredibly awkward, if not outright impossible, as framing (and chunking) is usually not exposed at higher API levels. There's also a complex interaction with compression (you'd have to Z_FLUSH every frame).

I'm starting to see the wisdom of why transport encodings are deliberately ambiguous even in places where details may matter (such as buffering gotchas). People might be tempted to bind implementation behavior to a transport, only to be burned by it later when trying to interop (I sure did).

@toomim (Member Author) commented Jan 22, 2021

@canadaduane Ah, so you are concerned about the space that headers take up?

First, HTTP/2 has header compression which will handle that.

So this would only be an issue for people using HTTP/1 who care about efficiency but don't want to upgrade. But even for these folks, I'd recommend just altering the protocol for separating patches and versions (e.g. instead of Content-Length: 100, just say 100). We can define the ASCII to be whatever we want! We don't need to muck around with the transfer encoding.
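To illustrate that purely hypothetical alternative (it is not part of the spec), a patch framed today as:

   Content-Length: 3

   101

could in principle be shortened to a bare length line followed by the body:

   3
   101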

But also, even after specifying the content-length, we still have to specify the version, parents, merge-type, etc. headers. So any proposal that eliminates content-length as a header still needs a way to express these other headers.

@katuma commented Jan 22, 2021

@toomim I'm not sure if HTTP/2 has compression for headers inside the entity body (of a Subscribe: stream). That said, just plain compression will elide most of the protocol verbosity, so it's hardly an issue.

Performance-wise, the biggest bottleneck at the moment seems to be the opposite direction - streaming versions to the server, due to the sheer number of individual requests. To what extent this is a problem needs to be simulated. Worst case, extend the protocol to a streaming multipart POST (to what extent that suffers from buffering issues, I have no idea yet).

@toomim (Member Author) commented Jan 22, 2021

> @toomim I'm not sure if HTTP/2 has compression for headers inside the entity body (of a Subscribe: stream).

The current spec only describes how to encode Braid on HTTP/1. In a future round, we'll specify how to use HTTP/2 messages natively. When we do so, we will be able to send each version's headers, and each patch's headers, using a native H2 "headers" frame: https://tools.ietf.org/html/rfc7540#section-6.2

This will let us benefit from HPACK header compression anywhere we want. :)

> Performance-wise, the biggest bottleneck at the moment seems to be the opposite direction - streaming versions to the server, due to the sheer number of individual requests. To what extent this is a problem needs to be simulated. Worst case, extend the protocol to a streaming multipart POST (to what extent that suffers from buffering issues, I have no idea yet).

This is also solved with HTTP/2, which streams multiple requests over the same TCP connection. An additional request in H2 has no more overhead than any other message. It's just a couple frames in the TCP stream.

@canadaduane (Member) commented:

FWIW I was re-reading the spec today and having either one of "Content-Length" or "Transfer-Encoding: chunked" in a patch was a point of confusion. Simplifying to just "Content-Length", if possible, would be an improvement IMO.

@toomim added this to the Braid-HTTP 03 milestone on Feb 14, 2021
@josephg (Contributor) commented Feb 16, 2021

Just to chime in here, I also agree that patches probably shouldn't be (internally) streamed. Or at least, I can't think of a motivating example where the length of a patch won't be known before the patch is sent.

There may be cases where we want to start sending a patch before we know its size (e.g. replace the content of this image file with this new binary content). But my vote is also to leave it out of the spec until someone justifies its inclusion.

@toomim (Member Author) commented Feb 16, 2021

It sounds like we have consensus to merge this PR, and remove chunked encoding from patches. Please speak now if you disagree.

Do we also want to remove chunked encoding as a way of distinguishing the end-points of versions? I think it would make sense to do so, following @josephg's principle of "leave it out until someone justifies its inclusion."

Consider that if you did want to stream a large version without knowing its length, you could always break it up into a sequence of intermediate versions... or stream a single version with a bunch of patches.
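For illustration (the version name, patch count, and bodies are invented, and how each patch addresses its target range is omitted), the "single version with a bunch of patches" option could look like one sub-response whose body is a series of length-delimited patches:

   Version: "big-upload"
   Patches: 3

   Content-Length: 6

   part-1

   Content-Length: 6

   part-2

   Content-Length: 6

   part-3

Each patch is fully delimited by its own Content-Length, so nothing here relies on Transfer-Encoding: chunked.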

@toomim (Member Author) commented Feb 26, 2021

We've had no objections, so I'm merging this PR now. It also sounds like we should write one up to remove chunked encoding from versions.

@toomim merged commit d882d35 into master on Feb 26, 2021