
rgw: use chunked encoding to get partial results out faster #23940

Merged
merged 1 commit into ceph:master on Nov 8, 2018

Conversation

robbat2
Contributor

@robbat2 robbat2 commented Sep 5, 2018

Some operations can take a long time to produce their complete result.

If an RGW operation does not set a Content-Length header, the RGW
frontends (CivetWeb, Beast) buffer the entire response so that a
Content-Length header can be sent.

If the RGW operation takes long enough, the buffering time may exceed
keepalive timeouts, and because no bytes have been sent on the
connection, the connection will be reset.

If an HTTP response carries neither a Content-Length header nor chunked
Transfer-Encoding, HTTP keep-alive is not possible.

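As context, chunked framing looks like this on the wire. The sketch below
is illustrative only: a bare POSIX-socket writer with made-up XML payloads,
not the RGW frontend code.

```cpp
// Illustrative sketch of chunked Transfer-Encoding framing (not RGW code).
// Each chunk is "<hex size>\r\n<payload>\r\n"; a zero-length chunk ends the
// body, so no Content-Length is needed before the first byte goes out.
#include <cstdio>
#include <cstring>
#include <unistd.h>

static void send_chunk(int fd, const char* data, size_t len) {
  char head[32];
  int n = std::snprintf(head, sizeof(head), "%zx\r\n", len);
  (void)write(fd, head, n);
  (void)write(fd, data, len);
  (void)write(fd, "\r\n", 2);
}

static void send_chunked_response(int fd) {
  const char* hdr =
      "HTTP/1.1 200 OK\r\n"
      "Transfer-Encoding: chunked\r\n"   // keep-alive remains possible
      "Content-Type: application/xml\r\n"
      "\r\n";
  (void)write(fd, hdr, std::strlen(hdr));
  const char* part = "<Deleted><Key>obj1</Key></Deleted>";  // a partial result
  send_chunk(fd, part, std::strlen(part));  // flushed as soon as it is ready
  (void)write(fd, "0\r\n\r\n", 5);          // terminating zero-length chunk
}
```
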
To fix the issue within these requirements, use chunked
Transfer-Encoding for the following operations:

  • RGWCopyObj_ObjStore_S3 **
  • RGWDeleteMultiObj_ObjStore_S3 **
  • RGWGetUsage_ObjStore_S3
  • RGWListBucketMultiparts_ObjStore_S3
  • RGWListBucket_ObjStore_S3
  • RGWListBuckets_ObjStore_S3
  • RGWListMultipart_ObjStore_S3

RGWCopyObj & RGWDeleteMultiObj specifically use send_partial_response
for long-running operations, and are the most impacted by this issue,
esp. for large inputs. RGWCopyObj attempts to send a Progress header
during the copy, but it's not actually passed on to the client until the
end of the copy, because it's buffered by the RGW frontends!

The HTTP/1.1 specification REQUIRES chunked encoding to be supported,
and the specification does NOT require "chunked" to be included in the
"TE" request header.

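The per-operation change this implies is small: each affected
send_response() asks the frontend for chunked framing instead of leaving
the length unset. Below is a hedged, self-contained model of that frontend
decision; the sentinel names and values are assumptions modeled on
rgw_rest.h, not copied from the merged diff.

```cpp
// Hedged model of the frontend's body-framing decision (names/values are
// assumptions, not the actual RGW code).
#include <cstdint>

constexpr int64_t NO_CONTENT_LENGTH = -1;          // assumed sentinel
constexpr int64_t CHUNKED_TRANSFER_ENCODING = -2;  // assumed sentinel

enum class BodyMode {
  KnownLength,    // emit Content-Length up front, stream the body
  BufferAll,      // buffer everything, then emit Content-Length (old path)
  ChunkedStream   // emit Transfer-Encoding: chunked, flush partial results
};

BodyMode choose_body_mode(int64_t proposed_content_length) {
  if (proposed_content_length == CHUNKED_TRANSFER_ENCODING)
    return BodyMode::ChunkedStream;  // what the listed ops now request
  if (proposed_content_length == NO_CONTENT_LENGTH)
    return BodyMode::BufferAll;      // the behavior that stalled keep-alive
  return BodyMode::KnownLength;
}

int main() {
  // The affected ops effectively moved from BufferAll to ChunkedStream.
  return choose_body_mode(CHUNKED_TRANSFER_ENCODING)
      == BodyMode::ChunkedStream ? 0 : 1;
}
```
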
This patch has one side effect: it causes many more small IP packets.
When combined with high-latency links, this can increase the apparent
deletion time due to round trips and TCP slow start. Future improvements
to the RGW frontends are possible in two separate but related ways (see
the sketch after this list):

  • The FE could continue to push more chunks without waiting for the ACK
    on the previous chunk, especially while under the TCP window size.
  • The FE could be patched for different buffer flushing behaviors, as
    that behavior is presently unclear (packets of 200-500 bytes seen).

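As a hedged illustration of the flushing point above (assuming a plain
Linux socket; a real change would belong in the CivetWeb/Beast frontends),
TCP_CORK can coalesce several queued chunk writes into fewer, larger
segments:

```cpp
// Hedged sketch of one flushing strategy: cork the socket while several
// small chunks are queued, then uncork to flush them as larger segments.
// Linux-specific; not RGW frontend code.
#include <cstring>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

static void write_chunks_corked(int fd, const char* const* chunks, int n) {
  int on = 1, off = 0;
  setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));   // hold partial frames
  for (int i = 0; i < n; ++i)
    (void)write(fd, chunks[i], std::strlen(chunks[i]));
  setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off)); // flush coalesced
}
```
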
Performance results:

  • Bucket with 5M objects, index sharded 32 ways.
  • Index on SSD 3x replicas, Data on spinning disk, 5:2
  • Multi-delete of 1000 keys, with a common prefix.
  • Cache of index primed by listing the common prefix immediately before
    deletion.
  • Timing data captured at the RGW.
  • Timing t0 is the TCP ACK sent by the RGW at the end of the request
    body.
  • Client is ~75ms away from RGW.

BEFORE:
  Time to first byte of response header: 11.3 seconds
  Entire operation: 11.5 seconds
  Response packets: 17

AFTER:
  Time to first byte of response header: 3.5 ms
  Entire operation: 16.36 seconds
  Response packets: 206

Backport: mimic, luminous
Issue: http://tracker.ceph.com/issues/12713
Signed-off-by: Robin H. Johnson <rjohnson@digitalocean.com>

@liewegas liewegas added the rgw label Sep 5, 2018
@mattbenjamin
Contributor

mattbenjamin commented Sep 5, 2018

@robbat2 hey robin, just to clarify

  1. this works with the beast fe as well as civetweb?
  2. do you expect that different flushing behaviors will lead to shorter
     total response times? (i.e., 17->206 pkts is a lot)

Matt

@mattbenjamin mattbenjamin self-assigned this Sep 5, 2018
@robbat2
Contributor Author

robbat2 commented Sep 6, 2018

@mattbenjamin

  1. Beast definitely exhibits the same problem (proven w/ ceph-nano), but I didn't confirm that the fix solves it on Beast: I did development on Luminous and forward-ported the fix, and couldn't spin up a test environment at the scale where the fix would show up in performance tests. The chunked encoding itself should work fine, as a couple of the Swift operations already use it.
  2. Yes, different flushing behavior will definitely shorten the total response time: either by having fewer (but larger) packets, or simply by keeping more packets in flight if the FE doesn't wait for the TCP ACK before sending more (sketched below).

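On point 2, a hedged sketch of that batching idea (plain POSIX sockets,
invented helper names; not frontend code): draining every already-framed
chunk in one write lets the kernel's TCP stack keep multiple segments in
flight up to the congestion/receive window.

```cpp
// Hedged sketch: batch all ready (already-framed) chunks into one write()
// instead of write-then-wait per chunk. Names are invented for illustration.
#include <string>
#include <unistd.h>
#include <vector>

static void flush_ready_chunks(int fd, std::vector<std::string>& ready) {
  std::string batch;
  for (const auto& c : ready)
    batch += c;                                   // coalesce framed chunks
  ready.clear();
  (void)write(fd, batch.data(), batch.size());    // one syscall, many chunks
}
```
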
robbat2 pushed a commit to robbat2/ceph that referenced this pull request Sep 14, 2018
For HTTP responses sent with chunked-encoding, and greater than 1MiB in
size, the chunk-size field was being printed wrong.

Specifically, the chunk-size field was being sent with a mangled or
missing trailer of '\r\n'.

This bug manifested as HTTP clients being unable to read the response:
- Chrome generates ERR_INCOMPLETE_CHUNKED_ENCODING
- Python/boto generates httplib.LineTooLong: got more than 65536 bytes
  when reading chunk size

The wrong variable was being used to determine the size of the buffer
used for the chunk-size field.

Fix it by using the correct variable, and rename the variables to
clearly reflect their purpose.

Prior to PR#23940, this would only have been seen in some Swift
operations. PR#23940 changed some S3 operations to also use chunked
encoding to get responses sent faster, and made the bug easier to
detect. It was initially reported for a ListBucket call with a high
max-keys argument.

Backport: luminous, mimic
Reference: https://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1
Reference: ceph#23940
Fixes: http://tracker.ceph.com/issues/35990
Signed-off-by: Robin H. Johnson <rjohnson@digitalocean.com>
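
To make the bug class in that commit concrete, here is a hedged sketch
with invented names (not the actual RGW code): the chunk-size line must be
"<hex size>\r\n", and measuring the write with the wrong variable clips
the CRLF trailer.

```cpp
// Hedged illustration of the bug class; names are invented.
#include <cstddef>
#include <cstdio>

static int format_chunk_header(char* out, size_t outsz, size_t chunk_len) {
  // std::snprintf returns the would-be length; if it is >= outsz, the
  // "\r\n" (or part of the hex size itself) was truncated.
  int n = std::snprintf(out, outsz, "%zx\r\n", chunk_len);
  return (n < 0 || static_cast<size_t>(n) >= outsz) ? -1 : n;
}

int main() {
  char hdr[16];
  int n = format_chunk_header(hdr, sizeof(hdr), 1u << 20);  // 1 MiB chunk
  // Writing hdr with a length taken from a *different* buffer (the bug
  // described above) instead of n would emit a mangled size line.
  return n < 0;
}
```
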
cbodley pushed a commit to ceph/ceph-ci that referenced this pull request Sep 17, 2018
cbodley pushed a commit to ceph/ceph-ci that referenced this pull request Sep 18, 2018
pdvian pushed a commit to pdvian/ceph that referenced this pull request Oct 2, 2018
batrick pushed a commit to batrick/ceph that referenced this pull request Oct 5, 2018
ivancich pushed a commit to ivancich/ceph-fork that referenced this pull request Oct 29, 2018
@stale

stale bot commented Nov 5, 2018

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
If you are a maintainer or core committer, please follow up on this issue to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@stale stale bot added the stale label Nov 5, 2018
@cbodley
Contributor

cbodley commented Nov 5, 2018

please don't kill this, friendly robot

@stale stale bot removed the stale label Nov 5, 2018
@mattbenjamin mattbenjamin self-requested a review November 8, 2018 03:00

@mattbenjamin
Contributor

@mattbenjamin mattbenjamin left a comment

lgtm, passed rgw suite run

@mattbenjamin mattbenjamin merged commit 92a19f0 into ceph:master Nov 8, 2018
ivancich pushed a commit to ivancich/ceph-fork that referenced this pull request Nov 30, 2018
@cbodley
Contributor

cbodley commented May 7, 2019

this tracker issue never made it to 'pending backport', so this fix fell through the cracks. i updated the tracker, and raised a mimic backport in #28014

thmour pushed a commit to thmour/ceph that referenced this pull request Jun 7, 2019