
perf(ext/fetch): consume body using ops #16038

Merged: 11 commits into denoland:main from perf-body-read-into-buffer, Oct 4, 2022

Conversation

@marcosc90 (Contributor) commented Sep 26, 2022

This PR improves the performance of InnerBody.consume:

  • Added *read_all ops to read the full request/response body
  • Use ops directly instead of a ReadableStream when possible
  • Use the content-length header on HTTP requests to allocate the full buffer up front (a rough sketch of the idea follows this list)
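A minimal sketch of the approach, for illustration only (the function below is a stand-in, not the PR's code): read the whole body on the native side and, when the length is known, allocate the destination buffer once.

// Illustrative only: `chunks` stands in for the sequence of reads the real op
// performs against the resource.
fn read_all(chunks: impl IntoIterator<Item = Vec<u8>>, content_length: Option<u64>) -> Vec<u8> {
    // Pre-size the buffer when content-length is present; otherwise grow incrementally.
    let mut buffer = Vec::with_capacity(content_length.unwrap_or(0) as usize);
    for chunk in chunks {
        buffer.extend_from_slice(&chunk);
    }
    buffer
}

fn main() {
    let body = read_all(vec![vec![1u8; 64 * 1024]; 4], Some(256 * 1024));
    assert_eq!(body.len(), 256 * 1024);
}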

128 KB binary POST (ext/http)

|       | this patch (http/read_all) | Main (http) | Main (flash) | Node     |
| ----- | -------------------------- | ----------- | ------------ | -------- |
| req/s | 20905.82                   | 9363.97     | 10176.78     | 14455.24 |

POST JSON (ext/http)

|       | this patch (http/read_all) | Main (http) | Main (flash) | Node     |
| ----- | -------------------------- | ----------- | ------------ | -------- |
| req/s | 47419.98                   | 30791.74    | -            | 34162.04 |

Fetch

Consuming a 2 GB ArrayBuffer through fetch

Node / Undici

User time (seconds): 1.20
System time (seconds): 1.35
Percent of CPU this job got: 94%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.72
Maximum resident set size (kbytes): 4287288

Main

User time (seconds): 1.35
System time (seconds): 0.90
Percent of CPU this job got: 79%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:02.84
Maximum resident set size (kbytes): 2163648

This patch

User time (seconds): 0.22
System time (seconds): 0.77
Percent of CPU this job got: 55%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.79
Maximum resident set size (kbytes): 2135224

Measured with: /usr/bin/time --verbose {command}

marcosc90 changed the title from "perf(ext/fetch): read directly into finalBuffer when possible" to "perf(ext/fetch): read directly into finalBuffer and use ops" on Sep 26, 2022
@marcosc90 (Contributor, Author) commented:

ci/bench failure is fixed by #16055

marcosc90 changed the title from "perf(ext/fetch): read directly into finalBuffer and use ops" to "perf(ext/fetch): consume body using ops" on Sep 27, 2022
@marcosc90 (Contributor, Author) commented Sep 27, 2022

I did a big refactor; the code is much cleaner now, and it also greatly improves performance for requests where content-length is not present.

fetch

body size: 2 GB

|                | this patch | Main   | Deno 1.25.4 | Node 18.7 |
| -------------- | ---------- | ------ | ----------- | --------- |
| chunked        | 1105ms     | 3850ms | 3865ms      | 3000s     |
| content-length | 895ms      | 2089ms | 3336ms      | 1830ms    |

Measured the time of .arrayBuffer() with console.time:

console.time('body');
await res.arrayBuffer();
console.timeEnd('body');

marcosc90 force-pushed the perf-body-read-into-buffer branch 4 times, most recently from bdc7c9c to e572aae on September 29, 2022 08:16
@marcosc90 (Contributor, Author) commented:

Towards #16046

lucacasonato mentioned this pull request on Sep 29, 2022
@lucacasonato (Member) commented Sep 29, 2022

Thank you for starting work on this. I think this is a good direction, but I would like to see some changes to the implementation to make it align with #16046 more closely.

Namely, I think this optimization should be decoupled from InnerBody. What I mean by this is that the rid is a property of the underlying ReadableStream, not of the InnerBody. If we change these properties to be properties of the ReadableStream, then any stream annotated with this property will be opted into a fast path, rather than just the streams inside an InnerBody created by fetch itself. In addition, the knownExactLength is a property of the underlying Resource, not of the ReadableStream. As such, it should be stored on the resource on the Rust side, not the JS side.

Specifically:

  • All resource-backed ReadableStream objects should be branded with a [_rid] property.
    • The response body for fetch is such an object, so it should be branded.
    • Resource backed ReadableStream objects can be created from a rid using the readableStreamForRid function in ext/web/06_streams.js
  • InnerBody#consume should check if the body is a resource backed ReadableStream and if so call op_read_all(rid) to read the entire stream in one go.
    • The getReadableStreamRid function in ext/web/06_streams.js can perform this brand check.
    • This entire "read all" codepath can be moved into a readableStreamCollectIntoUint8Array function in ext/web/06_streams.js that can handle any ReadableStream, including non resource backed streams. The entire fast & slow path can be contained within this function.
  • Make the knownExactLength a property of the Rust side resource, and don't send it to JS.
    • This way other resources that may have a fixed known size can also make use of this optimization.
    • The exact length only needs to be known Rust side, as the underlying buffer for op_read_all is allocated in Rust.
    • To do this, add a new fn size_hint(&self) -> (u64, Option<u64>); method to the deno_core::Resource trait (a sketch follows this comment). This method returns to the caller a hint about how large the resource is, in the form (minimum, maybe_maximum). The trait method can have a default implementation that returns (0, None), as this bound is valid for any resource.
    • Add a custom Resource::size_hint implementation to FetchResponseBodyResource that returns the exact length if known up front (from the reqwest::Response::content_length() method).
    • Update op_read_all to use the size_hint method on the resource to pre-allocate an underlying buffer with the correct size if possible. If no size hint is available, we can fall back to a slower allocation heavy path in Rust, that will still be faster than a read-all implemented in JS.
  • In a follow-up, make this work with HttpStreamResource.
    • This requires that HttpStreamResource implement the Resource::read/Resource::read_return trait methods, and use the regular op_read instead of a specialized op_http_read.
    • A custom Resource::size_hint implementation for HttpStreamResource is also necessary then.

With these changes the improvements would be noticeable for all resource-backed streams, not just the Request and Response objects returned from fetch or Deno.serveHttp. For example, the following would also get a nice speedup (because file.readable is a resource-backed stream):

const file = await Deno.open("./my_very_large_file.txt");
const resp = new Response(file.readable);
const txt = await resp.text();
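A self-contained sketch of the size_hint design described above. The trait below is a trimmed stand-in for deno_core::Resource and the struct fields are illustrative; only the method shape and the default (0, None) bound come from the comment.

/// Trimmed stand-in for deno_core::Resource, showing only the proposed method.
trait Resource {
    /// Hint about the remaining size of the resource as (minimum, maybe_maximum).
    /// The default bound is valid for any resource.
    fn size_hint(&self) -> (u64, Option<u64>) {
        (0, None)
    }
}

/// Stand-in for FetchResponseBodyResource: it would capture the value of
/// reqwest::Response::content_length() when the resource is created.
struct FetchResponseBodyResource {
    content_length: Option<u64>,
}

impl Resource for FetchResponseBodyResource {
    fn size_hint(&self) -> (u64, Option<u64>) {
        // The exact length is known up front only when content_length was present.
        (self.content_length.unwrap_or(0), self.content_length)
    }
}

fn main() {
    let body = FetchResponseBodyResource { content_length: Some(2 * 1024 * 1024 * 1024) };
    let (min, max) = body.size_hint();
    // op_read_all can use max.unwrap_or(min) to pre-allocate its buffer,
    // falling back to a slower allocation-heavy path when no hint is available.
    println!("size hint: ({min}, {max:?})");
}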

@marcosc90 marcosc90 marked this pull request as draft September 29, 2022 17:31
@marcosc90 marcosc90 marked this pull request as ready for review September 30, 2022 08:49
@marcosc90 (Contributor, Author) commented:

@lucacasonato PTAL

Resolved review threads: core/resources.rs, ext/fetch/lib.rs, and ext/web/06_streams.js (two threads).

@@ -168,6 +169,26 @@ async fn op_read(
resource.read_return(buf).await.map(|(n, _)| n as u32)
}

#[op]
async fn op_read_all(

Review comment (Member):

Nice. #16115 will allow us to get rid of multiple allocations in this op in a follow up.

ext/http/lib.rs (outdated diff):

fn size_hint(&self) -> (u64, Option<u64>) {
  // TLS max packet size is 16kb, so we use that as a default.
  (16 * 1024 + 256, self.size)
}

Review comment (Member):

Use the Request::size_hint() here, as described above.

@lucacasonato (Member) commented:

Very close. Great refactor @marcosc90

Resolved review thread: ext/fetch/22_body.js.

@lucacasonato (Member) left a comment:

I just realized there was one additional comment that I forgot to press "submit" on last time 🤦

This is also what my earlier comment regarding Request::size_hint was referring to. Sorry!

ext/http/lib.rs (outdated diff), comment on lines 189 to 201:
let content_length = if request
  .headers()
  .get(hyper::header::CONTENT_ENCODING)
  .is_none()
{
  request
    .headers()
    .get(hyper::header::CONTENT_LENGTH)
    .and_then(|v| v.to_str().ok())
    .and_then(|v| v.parse::<u64>().ok())
} else {
  None
};

Review comment (Member):

This can be replaced with a call to Request::size_hint(). This will return the lower and upper bound for the body size. You can return this upper and lower bound from HttpStreamResource::size_hint().
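For reference, a rough sketch of what the suggested replacement could look like with hyper 0.14's HttpBody trait (assumption: the code has access to the incoming hyper Request, or at least its Body, when computing the hint; this is not the exact code that landed).

use hyper::body::HttpBody;
use hyper::{Body, Request};

/// Sketch of the bounds HttpStreamResource::size_hint() could return, per the
/// suggestion above.
fn request_size_hint(request: &Request<Body>) -> (u64, Option<u64>) {
    let hint = request.body().size_hint();
    (hint.lower(), hint.upper())
}

fn main() {
    // A fixed in-memory body reports an exact hint (lower == upper).
    let req = Request::new(Body::from("hello world"));
    println!("{:?}", request_size_hint(&req));
}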

marcosc90 (Contributor, Author) replied:

Done @lucacasonato, PTAL


let mut buffer = Vec::with_capacity(size);
loop {
  let tmp = ZeroCopyBuf::new_temp(vec![0u8; 64 * 1024]);

Review comment (Collaborator):

I presume the tmp could be moved out of the loop and only allocated once?

marcosc90 (Contributor, Author) replied:

error[E0382]: use of moved value: `tmp`

Reply (Member):

@aapoalas the whole allocation + copy can go away for known size bodies when #16115 lands
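A self-contained illustration of the E0382 above: the read call takes the buffer by value, so a buffer created outside the loop is moved on the first iteration. The names below are stand-ins for this example, not deno_core APIs.

// Stand-in for a read that consumes the buffer and hands it back, roughly the
// shape of the read_return call in the snippet above.
fn read_chunk(mut buf: Vec<u8>, remaining: &mut usize) -> (usize, Vec<u8>) {
    let n = (*remaining).min(buf.len());
    buf[..n].iter_mut().for_each(|b| *b = 1);
    *remaining -= n;
    (n, buf)
}

fn main() {
    let mut remaining = 200 * 1024; // pretend the resource holds 200 KiB
    let mut buffer = Vec::new();
    loop {
        // Allocated inside the loop: each iteration moves a fresh buffer into
        // read_chunk. Hoisting `tmp` above the loop triggers E0382 unless the
        // returned buffer is re-bound on every iteration.
        let tmp = vec![0u8; 64 * 1024];
        let (n, tmp) = read_chunk(tmp, &mut remaining);
        if n == 0 {
            break;
        }
        buffer.extend_from_slice(&tmp[..n]);
    }
    assert_eq!(buffer.len(), 200 * 1024);
}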

@lucacasonato (Member) left a comment:

LGTM!

@bartlomieju (Member) left a comment:

LGTM to me too, massive effort @marcosc90!

@lucacasonato lucacasonato merged commit 569287b into denoland:main Oct 4, 2022
@marcosc90 marcosc90 deleted the perf-body-read-into-buffer branch October 4, 2022 13:49
@marcosc90 (Contributor, Author) commented:

🚀
