
Simple HTTP cleanups and followup to #72 (#76)

Merged: 10 commits, Nov 28, 2022
Conversation

@apoelstra (Owner)

Tightens the mutex lifetime to improve performance. Basically we read all the data into a buffer and then unlock the mutex ASAP. We could be a bit more memory-efficient by using serde_json::from_reader to parse directly from the socket, but (a) that would make it harder to enforce the Content-Length header; and (b) it'd hold the mutex for longer than necessary.
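
A minimal sketch of the "read first, parse after unlocking" pattern described above; the field and helper names here (read_http_body, send_request) are illustrative, not the crate's actual API:

use std::io::Write;
use std::net::TcpStream;
use std::sync::Mutex;

// Stand-in for the crate's real header / Content-Length handling.
fn read_http_body(_sock: &mut TcpStream) -> std::io::Result<Vec<u8>> {
    unimplemented!("sketch only")
}

fn send_request(sock: &Mutex<TcpStream>, req: &[u8]) -> Result<serde_json::Value, Box<dyn std::error::Error>> {
    let body = {
        let mut guard = sock.lock().unwrap();
        guard.write_all(req)?;
        // Pull the whole response into an owned buffer while holding the lock...
        read_http_body(&mut *guard)?
    }; // ...then drop the guard here, so JSON parsing happens outside the critical section.
    Ok(serde_json::from_slice(&body)?)
}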

This commit also splits out the HttpParseError into several variants, which unfortunately makes this a breaking change. I think I can move that commit into its own PR if we want to get a minor rev out, but I don't think we care too much about that since this crate is pretty far to the edge of most dependency trees.
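
A sketch of what that split could look like; the variant names here are illustrative, not necessarily the ones the PR ends up with:

#[derive(Debug)]
pub enum Error {
    /// Status line was not a well-formed "HTTP/1.1 ..." line.
    HttpResponseBadStatus(String),
    /// Content-Length header was missing or unparseable.
    HttpResponseBadContentLength(String),
    /// Content-Length was larger than we are willing to allocate.
    HttpResponseContentLengthTooLarge(u64),
    // ... plus the crate's pre-existing, non-HTTP variants ...
}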

Adds a fuzz harness but doesn't integrate it into CI. Fixes a couple bugs I found while fuzzing locally.
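
For reference, a fuzz target along these lines might look like the sketch below (cargo-fuzz / libfuzzer-sys shown here; exercise_parser is a hypothetical stand-in for whatever entry point the real harness drives):

#![no_main]
use libfuzzer_sys::fuzz_target;

// Hypothetical stand-in: the real harness feeds the bytes to the simple_http
// response parsing path, e.g. through a #[cfg(fuzzing)] TcpStream shim.
fn exercise_parser(_data: &[u8]) {}

fuzz_target!(|data: &[u8]| {
    // The only property checked is "never panics"; parse errors are fine.
    exercise_parser(data);
});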

@apoelstra (Owner, Author)

cc @raphjaph does this do what you want?
cc @tcharding for review

@apoelstra (Owner, Author)

This PR does not recover the pre-#72 performance. I'm investigating.

@apoelstra (Owner, Author)

Honestly I'm not sure what to make of this. It looks like for some reason Bitcoin Core takes ~10ms to reply to getblock calls prior to the socket reuse patch, and ~60ms after. This is visible with debug=1 and logtimemicros=1 on a Core node, e.g. as

2022-11-19T19:31:06.854923Z [rpc] ThreadRPCServer method=getblock user=__cookie__
2022-11-19T19:31:06.895467Z [libevent:debug] evhttp_add_header: key: Content-Type val: application/json

where that evhttp_add_header call is Core writing the header of its own response.

It has nothing to do with the mutex on our end, or any sort of buffering, as near as I can tell. I have tried it with the tightened mutex scope and with socket pools of various sizes, and it seems like it's quick to reply to the first RPC call on a connection and slow to reply to every one after that. Although there is some noise, so it's hard to be certain.

@apoelstra (Owner, Author)

Ok, I can get most of the performance back by both:

  • Storing the BufReader rather than creating a new one on every request. (Creating a new one each time is actually a bug: the old reader may already have buffered extra data from the stream, which we then simply drop.)
  • Calling serde_json::from_reader rather than reading the data into a buffer then calling serde_json::from_slice

I don't know why the latter is necessary when it wasn't under the old "create a new connection every time" model, and I don't know why I need to make both changes when only the latter should provide any visible change under normal circumstances.
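
Roughly, the two changes together look like the sketch below; field names are illustrative, and a real implementation still needs to bound the read by the Content-Length header, shown here with take:

use std::io::{BufReader, Read};
use std::net::TcpStream;

struct Connection {
    // Stored across requests instead of re-created per request, so bytes the
    // reader has already buffered are never thrown away.
    reader: BufReader<TcpStream>,
}

impl Connection {
    fn read_response(&mut self, content_length: u64) -> Result<serde_json::Value, serde_json::Error> {
        // Parse straight from the buffered stream, limited to the body length,
        // instead of copying into a Vec and calling serde_json::from_slice.
        serde_json::from_reader((&mut self.reader).take(content_length))
    }
}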

@tcharding (Collaborator) left a comment:

Couple minor questions but apart from that looks good to me.

let http_response = get_line(&mut reader, request_deadline)?;
if http_response.len() < 12 || !http_response.starts_with("HTTP/1.1 ") {
return Err(Error::HttpParseError);
let http_response = get_line(&mut reader, Instant::now() + self.timeout)?;
Collaborator:

What's the reason not to use request_deadline here? If there is one, can we have a code comment please?

@apoelstra (Owner, Author):

I think I meant to change request_deadline everywhere, and it's a rebasing mistake that it hasn't been changed.
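
That is, the intent is to compute the deadline once per request and pass it to every get_line call, rather than re-deriving it each time, something like:

let request_deadline = Instant::now() + self.timeout;
let http_response = get_line(&mut reader, request_deadline)?;
// ... all subsequent reads for this request reuse the same request_deadline ...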

Collaborator:

:)

});
},
Some(n) => {
let mut buffer = Vec::with_capacity(INITIAL_RESP_ALLOC as usize);
Collaborator:

Shouldn't the buffer have capacity n?

@apoelstra (Owner, Author):

Yeah, I think so. I went back and forth a couple times on this, but now that we bound n I think we should use it.
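
So the match arm above would become something like:

Some(n) => {
    // n has already been checked against the maximum allowed response size,
    // so reserving exactly n bytes avoids reallocation without risking an
    // unbounded allocation.
    let mut buffer = Vec::with_capacity(n as usize);
    // ... read the body into buffer ...
}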

struct TcpStream;

#[cfg(fuzzing)]
mod impls {
Contributor:

What is the reason to put the implementation of the fuzzed TcpStream into its own module?

I'm new to Rust so forgive me if this is a silly question.

@tcharding (Collaborator), Nov 23, 2022:

So that the #[cfg(fuzzing)] is applied to the whole module scope.

Contributor:

thanks, that makes sense.

The TcpStream is declared outside of the impls mod so it doesn't have to be imported explicitly in the simple_http mod.
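
A minimal sketch of the shape being described; the method bodies here are placeholders, not the crate's actual fuzz shim:

// Dummy stream visible at the top level, so simple_http can use it by name.
#[cfg(fuzzing)]
pub struct TcpStream;

// All of its trait impls live in one module, so a single cfg gate covers them.
#[cfg(fuzzing)]
mod impls {
    use std::io;

    impl io::Read for super::TcpStream {
        fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
            Ok(0) // placeholder: the real shim returns fuzzer-provided bytes
        }
    }

    impl io::Write for super::TcpStream {
        fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
            Ok(buf.len()) // placeholder: pretend the write succeeded
        }
        fn flush(&mut self) -> io::Result<()> {
            Ok(())
        }
    }
}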

@raphjaph (Contributor)

Looks good to me.

I don't 100% understand how the fuzzing works, but I'd never seen that before, so it was cool to see it in the wild.

@apoelstra (Owner, Author)

Thanks @raphjaph !

Unfortunately I am going to rebase this and change a few things (hopefully not a huge code diff). So I may need you to review again.

@apoelstra (Owner, Author)

@tcharding I have pushed 4 new commits which address your nits and get us some further perf improvements.

Weirdly this is still not as fast as the old "new connection for every request" logic, even though it's way more intelligent about buffering. But we're within a factor of 2 now, and we're able to completely pin my CPU (the old code was 50/50 Core vs icboc, this code is 30/70, so the implication is that we're doing more processing and that's why we're slower ... but why?), so I'm okay with it.

@apoelstra (Owner, Author)

apoelstra commented Nov 26, 2022

I think the performance hit is caused by serde-rs/json#160 (comment) ... but I don't have a good fix, because reading from the socket into memory prior to starting parsing is actually slower in this case. Edit: the prior code did this, but line-by-line, doing utf8-decoding on each line before reading more. I suspect that it simply happened to be pumping various buffers at the right resonance (along with the frequent socket connections triggering the kernel to wake up bitcoind at just the right time), and there is no principled or reproducible way to get that performance back.

I think we should just leave this be until somebody complains, then maybe we need to write our own json deserializer or something.

@tcharding (Collaborator) left a comment:

ACK 2380d90

@raphjaph (Contributor)

ACK 2380d90
