
Simple HTTP cleanups and followup to #72 (#76)

Merged: 10 commits, Nov 28, 2022
Conversation

@apoelstra (Owner)

Tightens the mutex lifetime to improve performance. Basically we read all the data into a buffer and then unlock the mutex ASAP. We could be a bit more memory-efficient by using serde_json::from_reader to parse directly from the socket, but (a) that would make it harder to enforce the Content-Length header; and (b) it'd hold the mutex for longer than necessary.
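
A minimal sketch of the "read first, parse after unlocking" pattern described above; the field and helper names here (read_http_body, send_request) are illustrative, not the crate's actual API:

use std::io::Write;
use std::net::TcpStream;
use std::sync::Mutex;

// Stand-in for the crate's real header / Content-Length handling.
fn read_http_body(_sock: &mut TcpStream) -> std::io::Result<Vec<u8>> {
    unimplemented!("sketch only")
}

fn send_request(sock: &Mutex<TcpStream>, req: &[u8]) -> Result<serde_json::Value, Box<dyn std::error::Error>> {
    let body = {
        let mut guard = sock.lock().unwrap();
        guard.write_all(req)?;
        // Pull the whole response into an owned buffer while holding the lock...
        read_http_body(&mut *guard)?
    }; // ...then drop the guard here, so JSON parsing happens outside the critical section.
    Ok(serde_json::from_slice(&body)?)
}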

This commit also splits out the HttpParseError into several variants, which unfortunately makes this a breaking change. I think I can move that commit into its own PR if we want to get a minor rev out, but I don't think we care too much about that since this crate is pretty far to the edge of most dependency trees.
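
A sketch of what that split could look like; the variant names here are illustrative, not necessarily the ones the PR ends up with:

#[derive(Debug)]
pub enum Error {
    /// Status line was not a well-formed "HTTP/1.1 ..." line.
    HttpResponseBadStatus(String),
    /// Content-Length header was missing or unparseable.
    HttpResponseBadContentLength(String),
    /// Content-Length was larger than we are willing to allocate.
    HttpResponseContentLengthTooLarge(u64),
    // ... plus the crate's pre-existing, non-HTTP variants ...
}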

Adds a fuzz harness but doesn't integrate it into CI. Fixes a couple bugs I found while fuzzing locally.
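
For reference, a fuzz target along these lines might look like the sketch below (cargo-fuzz / libfuzzer-sys shown here; exercise_parser is a hypothetical stand-in for whatever entry point the real harness drives):

#![no_main]
use libfuzzer_sys::fuzz_target;

// Hypothetical stand-in: the real harness feeds the bytes to the simple_http
// response parsing path, e.g. through a #[cfg(fuzzing)] TcpStream shim.
fn exercise_parser(_data: &[u8]) {}

fuzz_target!(|data: &[u8]| {
    // The only property checked is "never panics"; parse errors are fine.
    exercise_parser(data);
});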

@apoelstra (Owner, Author)

cc @raphjaph does this do what you want?
cc @tcharding for review

@apoelstra (Owner, Author)

This PR does not recover the pre-#72 performance. I'm investigating.

@apoelstra (Owner, Author)

Honestly I'm not sure what to make of this. It looks like for some reason Bitcoin Core takes ~10ms to reply to getblock calls prior to the socket reuse patch, and ~60ms after. This is visible with debug=1 and logtimemicros=1 on a Core node, e.g. as

2022-11-19T19:31:06.854923Z [rpc] ThreadRPCServer method=getblock user=__cookie__
2022-11-19T19:31:06.895467Z [libevent:debug] evhttp_add_header: key: Content-Type val: application/json

where that evhttp_add_header call is Core writing the header of its own response.

It has nothing to do with the mutex on our end, or any sort of buffering, as near as I can tell. I have tried it with the tightened mutex scope and with socket pools of various sizes, and it seems like it's quick to reply to the first RPC call on a connection and slow to reply to every one after that. Although there is some noise, so it's hard to be certain.

@apoelstra (Owner, Author)

Ok, I can get most of the performance back by both:

  • Storing the BufReader rather than creating a new one on every request. (Creating a new one each time is actually a bug: the old reader may already have buffered extra data from the stream, which we then simply drop.)
  • Calling serde_json::from_reader rather than reading the data into a buffer then calling serde_json::from_slice

I don't know why the latter is necessary when it wasn't under the old "create a new connection every time" model, and I don't know why I need to make both changes when only the latter should provide any visible change under normal circumstances.
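
Roughly, the two changes together look like the sketch below; field names are illustrative, and a real implementation still needs to bound the read by the Content-Length header, shown here with take:

use std::io::{BufReader, Read};
use std::net::TcpStream;

struct Connection {
    // Stored across requests instead of re-created per request, so bytes the
    // reader has already buffered are never thrown away.
    reader: BufReader<TcpStream>,
}

impl Connection {
    fn read_response(&mut self, content_length: u64) -> Result<serde_json::Value, serde_json::Error> {
        // Parse straight from the buffered stream, limited to the body length,
        // instead of copying into a Vec and calling serde_json::from_slice.
        serde_json::from_reader((&mut self.reader).take(content_length))
    }
}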

@tcharding (Collaborator) left a comment:

Couple minor questions but apart from that looks good to me.

let http_response = get_line(&mut reader, request_deadline)?;
if http_response.len() < 12 || !http_response.starts_with("HTTP/1.1 ") {
return Err(Error::HttpParseError);
let http_response = get_line(&mut reader, Instant::now() + self.timeout)?;
Collaborator:

What's the reason not to use request_deadline here? If there is one, can we have a code comment please?

@apoelstra (Owner, Author):

I think I meant to change request_deadline everywhere, and it's a rebasing mistake that it hasn't been changed.
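
That is, the intent is to compute the deadline once per request and pass it to every get_line call, rather than re-deriving it each time, something like:

let request_deadline = Instant::now() + self.timeout;
let http_response = get_line(&mut reader, request_deadline)?;
// ... all subsequent reads for this request reuse the same request_deadline ...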

Collaborator:

:)

});
},
Some(n) => {
let mut buffer = Vec::with_capacity(INITIAL_RESP_ALLOC as usize);
Collaborator:

Shouldn't the buffer have capacity n?

@apoelstra (Owner, Author):

Yeah, I think so. I went back and forth a couple times on this, but now that we bound n I think we should use it.
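
So the match arm above would become something like:

Some(n) => {
    // n has already been checked against the maximum allowed response size,
    // so reserving exactly n bytes avoids reallocation without risking an
    // unbounded allocation.
    let mut buffer = Vec::with_capacity(n as usize);
    // ... read the body into buffer ...
}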

struct TcpStream;

#[cfg(fuzzing)]
mod impls {
Contributor:

What is the reason to put the implementation of the fuzzed TcpStream into its own module?

I'm new to Rust so forgive me if this is a silly question.

@tcharding (Collaborator), Nov 23, 2022:

So that the #[cfg(fuzzing)] is applied to the whole module scope.

Contributor:

thanks, that makes sense.

The TcpStream is declared outside of the impls mod so it doesn't have to be imported explicitly in the simple_http mod.
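
A minimal sketch of the shape being described; the method bodies here are placeholders, not the crate's actual fuzz shim:

// Dummy stream visible at the top level, so simple_http can use it by name.
#[cfg(fuzzing)]
pub struct TcpStream;

// All of its trait impls live in one module, so a single cfg gate covers them.
#[cfg(fuzzing)]
mod impls {
    use std::io;

    impl io::Read for super::TcpStream {
        fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
            Ok(0) // placeholder: the real shim returns fuzzer-provided bytes
        }
    }

    impl io::Write for super::TcpStream {
        fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
            Ok(buf.len()) // placeholder: pretend the write succeeded
        }
        fn flush(&mut self) -> io::Result<()> {
            Ok(())
        }
    }
}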

@raphjaph (Contributor)

Looks good to me.

I don't 100% understand how the fuzzing works, but I'd never seen that before, so it was cool to see it in the wild.

@apoelstra (Owner, Author)

Thanks @raphjaph !

Unfortunately I am going to rebase this and change a few things (hopefully not a huge code diff). So I may need you to review again.

@apoelstra (Owner, Author)

@tcharding I have pushed 4 new commits which address your nits and get us some further perf improvements.

Weirdly this is still not as fast as the old "new connection for every request" logic, even though it's way more intelligent about buffering. But we're within a factor of 2 now, and we're able to completely pin my CPU (the old code was 50/50 Core vs icboc, this code is 30/70, so the implication is that we're doing more processing and that's why we're slower ... but why?), so I'm okay with it.

@apoelstra (Owner, Author)

apoelstra commented Nov 26, 2022

I think the performance hit is caused by serde-rs/json#160 (comment) ... but I don't have a good fix, because reading from the socket into memory prior to starting parsing is actually slower in this case. Edit: the prior code did this, but line-by-line, doing utf8-decoding on each line before reading more. I suspect that it simply happened to be pumping various buffers at the right resonance (along with the frequent socket connections triggering the kernel to wake up bitcoind at just the right time), and there is no principled or reproducible way to get that performance back.

I think we should just leave this be until somebody complains, then maybe we need to write our own json deserializer or something.

@tcharding (Collaborator) left a comment:

ACK 2380d90

@raphjaph (Contributor)

ACK 2380d90
