Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

THRIFT-5324: reset http client buffer after flush #2293

Merged
merged 1 commit into from Dec 15, 2020

Conversation

i512
Copy link
Contributor

@i512 i512 commented Dec 14, 2020

Client: go
THttpClient did not reset it's internal buffer when http client returned
an error, leaving the whole or partially read message in the buffer.
Now we reset the buffer in defer.

Details: https://issues.apache.org/jira/browse/THRIFT-5324

@dcelasun
Copy link
Member

I've just discovered the same bug today and fixed it in my fork :) Your commit includes a test, so I'll merge yours.

@@ -197,6 +197,10 @@ func (p *THttpClient) Flush(ctx context.Context) error {
// Close any previous response body to avoid leaking connections.
p.closeResponse()

// Request might not have been fully read by http client.
// Reset so we don't send the remains on next call.
defer p.requestBuffer.Reset()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a nil check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks!

@i512 i512 force-pushed the THRIFT-5324 branch 3 times, most recently from 97a6725 to 1ad9d79 Compare December 14, 2020 22:00
@i512 i512 requested a review from dcelasun December 14, 2020 22:02
Copy link
Member

@fishy fishy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some nitpicking comments in the test code. None of them is blocking.

}
trans, err := NewTHttpPostClient("http://" + addr.String())
if err != nil {
t.Fatalf("Unable to connect to %s: %s", addr.String(), err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nitpicking and also a problem already exists in the old test code, but using %s on errors is the same as using %v on errors except that %s handles it much more poorer when error is nil (https://play.golang.org/p/yHovIp3m37B). I'd rather just use %v with errors instead.


_, err = trans.Write(write1)
if err != nil {
t.Fatalf("Failed to write to transport: %s", err.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the comment above, instead of using %s with err.Error(), just use %v with err.

cancel()
err = trans.Flush(ctx)
if err == nil {
t.Fatalf("Expected flush error")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no formatting verbs so this can be t.Fatal instead of t.Fatalf.

if !bytes.Equal(data, write2) {
t.Fatalf("Received unexpected data: %s", data)
}
trans.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this to right after you checked for error on Line 121 as defer trans.Close(). You have a lot of t.Fatalf that could exit the test early without closing the opened trans (this is test code so this is nitpicking).


data = data[:n]
if !bytes.Equal(data, write2) {
t.Fatalf("Received unexpected data: %s", data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestions here are:

  1. Use %q instead of %s (sometimes when the test failed it's because you read zero data, with %s the error message will show as Received unexpected data: , while with %q it shows as Received unexpected data: "" instead, which makes it much more obvious that the failure is due to empty string).
  2. Also print out what's the expected data in the failure message (e.g. t.Fatalf("Received data expected %q, got %q", write2, data))

Client: go
THttpClient did not reset it's internal buffer when http client returned
an error, leaving the whole or partially read message in the buffer.
Now we reset the buffer in defer.
@dcelasun dcelasun merged commit 4461728 into apache:master Dec 15, 2020
@dcelasun
Copy link
Member

Merged, thanks for the PR!

@c0va23 c0va23 deleted the THRIFT-5324 branch December 15, 2020 08:39
@fishy
Copy link
Member

fishy commented Dec 15, 2020

@NIHERASE I think the added test is flaky. If you scroll to the bottom of https://travis-ci.org/github/apache/thrift/jobs/749896929, there are a lot of data races caused by this added test, and also the test failed (looks like the failure is caused by the data race, but I'm not 100% sure),

@fishy
Copy link
Member

fishy commented Dec 16, 2020

Saw the same test failure on one of the previous travis run on this PR (not on the latest version) as well: https://travis-ci.org/github/apache/thrift/jobs/749705982

@fishy
Copy link
Member

fishy commented Dec 16, 2020

OK I cannot reproduce this issue with go1.15.6 locally. Looking through travis log we were using go 1.10.8 to run that test, and I do can reproduce this issue with go1.10.8 almost 100% locally.

My current understanding is that in the test we are relying on canceled context to make the http request fail:

This works fine in go1.15.6. In go1.10.8 although it did make the Flush call next line fail, it doesn't really cause the write loop of the http client to stop correctly. So after you reset the requestBuffer inside Flush, the write loop of the http client still tries to read from requestBuffer in order to write the request, and that caused both data race and the test failure.

I think the big, underlying problem is that when making the http request inside Flush, we passed the requestBuffer to the http request (

req, err := http.NewRequest("POST", p.url.String(), p.requestBuffer)
). This doesn't make a copy of the buffer anywhere, so the ownership of the buffer becomes muddy: THttpClient thinks it owns the data and can call its Reset, while the http request also thinks it owns the data and it's the only one reading/writing from it.

I think the proper fix would be that we need to make a copy of the buffer before sending it to http request. It will hurt performance, though. @dcelasun what do you think?

fishy added a commit to fishy/thrift that referenced this pull request Dec 16, 2020
Client: go

The fix in apache#2293 doesn't work for
go1.10.8 due to the possibility of data races. This exposes a bigger,
underlying issue regarding the ownership of the request buffer in
THttpClient between THttpClient itself and the http request it creates.
Instead of reset and reuse the same buffer, always give up the ownership
of it and create a new buffer after each Flush call.
fishy added a commit to fishy/thrift that referenced this pull request Dec 16, 2020
Client: go

The fix in apache#2293 doesn't work for
go1.10.8 due to the possibility of data races. This exposes a bigger,
underlying issue regarding the ownership of the request buffer in
THttpClient between THttpClient itself and the http request it creates.
Instead of reset and reuse the same buffer, always give up the ownership
of it and create a new buffer after each Flush call.
@dcelasun
Copy link
Member

Replied on the other PR.

fishy added a commit that referenced this pull request Dec 16, 2020
Client: go

The fix in #2293 doesn't work for
go1.10.8 due to the possibility of data races. This exposes a bigger,
underlying issue regarding the ownership of the request buffer in
THttpClient between THttpClient itself and the http request it creates.
Instead of reset and reuse the same buffer, always give up the ownership
of it and create a new buffer after each Flush call.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants