Skip to content

Changed type of connCancelMap, removed timeout read conn#40364

Open
Zyqsempai wants to merge 1 commit intocockroachdb:masterfrom
Zyqsempai:25585-use-close-read-instead-of-read-timeout
Open

Changed type of connCancelMap, removed timeout read conn#40364
Zyqsempai wants to merge 1 commit intocockroachdb:masterfrom
Zyqsempai:25585-use-close-read-instead-of-read-timeout

Conversation

@Zyqsempai
Copy link
Copy Markdown

No description provided.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Aug 30, 2019

CLA assistant check
All committers have signed the CLA.

@Zyqsempai
Copy link
Copy Markdown
Author

@asubiotto Hi, can you please check if i am on a right way?

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work getting this done. I definitely think you're on the right track. Since we're in the release stability period, I would prefer to hold off on merging this into master until the release branch is cut.

Reviewed 2 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Zyqsempai)


pkg/sql/pgwire/conn_test.go, line 793 at r1 (raw file):

// TestReadTimeoutConn asserts that a readTimeoutConn performs reads normally
// and exits with an appropriate error when exit conditions are satisfied.
func TestReadTimeoutConnExits(t *testing.T) {

The comment and name of this test probably need some modification.


pkg/sql/pgwire/server.go, line 129 at r1 (raw file):

)

type closeableReadConn struct {

I don't think we need a new type here.


pkg/sql/pgwire/server.go, line 135 at r1 (raw file):

// cancelChanMap keeps track of channels that are closed after the associated
// cancellation function has been called and the cancellation has taken place.
type cancelChanMap map[chan struct{}]*closeableReadConn

Rather than have the value be the actual connection, I would keep it as a function (not necessarily a context.CancelFunc, but could be a context.CancelFunc that calls CloseRead as well as ctxCancel created in ServeConn)


pkg/sql/pgwire/server.go, line 423 at r1 (raw file):

	draining := s.mu.draining
	if !draining {
		ctx, _ = contextutil.WithCancel(ctx)

nit: There's no need to derive a cancellable context if we're not going to use a cancel function.


pkg/sql/pgwire/server.go, line 425 at r1 (raw file):

		ctx, _ = contextutil.WithCancel(ctx)
		done := make(chan struct{})
		tcpConn := conn.(*closeableReadConn)

This will panic if the connection is not a closeableReadConn. I'm also not sure I see where a closeableReadConn is initialized. This might also not work with tls.Conns. I think we're going to have to do some manual introspection and unfortunately keep around the readTimeoutConn in the odd case that we have a connection type that doesn't handle CloseRead (although is that likely? Seems like we just need to care about tcp and tls transitively)

@Zyqsempai
Copy link
Copy Markdown
Author

pkg/sql/pgwire/server.go, line 135 at r1 (raw file):

// cancelChanMap keeps track of channels that are closed after the associated
// cancellation function has been called and the cancellation has taken place.
type cancelChanMap map[chan struct{}]*closeableReadConn

Rather than have the value be the actual connection, I would keep it as a function (not necessarily a context.CancelFunc, but could be a context.CancelFunc that calls CloseRead as well as ctxCancel created in ServeConn)

@asubiotto Thanks for you comment.
Related to this one, I have only one question, to make closeRead, we should operate the instance of the connection, that's why we cannot do that with just an abstract function, how do you see that should be implemented?

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Zyqsempai)


pkg/sql/pgwire/server.go, line 426 at r1 (raw file):

		done := make(chan struct{})
		tcpConn := conn.(*closeableReadConn)
		s.mu.connCancelMap[done] = tcpConn

Regarding your comment, we could keep the context.CancelFunc type and this here would be:

s.mu.connCancelMap[done] = func() {
    cancel()
    tcpConn.closeRead()
}

@Zyqsempai
Copy link
Copy Markdown
Author

@asubiotto Hey, I have one more, and I hope last question.
It's about casting tls.Conn to net.TCPConn or net.Conn, is there some best practices or alredy exists solutions?
Coz, the underlying conn in tls is private, and I see no way to get it.

@asubiotto
Copy link
Copy Markdown
Contributor

asubiotto commented Sep 16, 2019

That's a good point. Looks like CloseRead is explicitly not supported: golang/go#8579. I think we'll have to keep the readTimeoutConn for tls.Conns around except for when we know we have a TCPConn.

@Zyqsempai
Copy link
Copy Markdown
Author

@asubiotto PTAL

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Zyqsempai)


pkg/sql/pgwire/conn_test.go, line 793 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

The comment and name of this test probably need some modification.

I think this still needs to happen, but now we should test both the correct behavior of a readTimeoutConn (as the test used to do) as well as a net.TCPConn. Also, this test should fail (timeout) as is, right?


pkg/sql/pgwire/server.go, line 426 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Regarding your comment, we could keep the context.CancelFunc type and this here would be:

s.mu.connCancelMap[done] = func() {
    cancel()
    tcpConn.closeRead()
}

Once you feel like you've addressed a reviewer's comment, you can comment Done or click the Acknowledge button to mark these comments as resolved.


pkg/sql/pgwire/server.go, line 392 at r2 (raw file):

			return true
		}
		for _, closeFunc := range connCancelMap {

nit: I think it's fine to keep the cancel variable name. We should at least keep the comment


pkg/sql/pgwire/server.go, line 424 at r2 (raw file):

		default:
			var cancel context.CancelFunc
			ctx, cancel = contextutil.WithCancel(ctx)

I think it doesn't hurt to derive this cancel function above the switch statement and also call cancel in the case of *net.TCPConn

@asubiotto
Copy link
Copy Markdown
Contributor

@Zyqsempai when making changes that should be in an existing commit in a PR, please amend the existing commit and force push rather than adding a new one, since this makes it easier for the reviewer (reviewable handles amended commits well).

@Zyqsempai Zyqsempai force-pushed the 25585-use-close-read-instead-of-read-timeout branch from 1e9012c to 94d0df8 Compare September 17, 2019 13:19
Copy link
Copy Markdown
Author

@Zyqsempai Zyqsempai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/pgwire/conn_test.go, line 793 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I think this still needs to happen, but now we should test both the correct behavior of a readTimeoutConn (as the test used to do) as well as a net.TCPConn. Also, this test should fail (timeout) as is, right?

Agree, do you have any idea how to test TCP properly?


pkg/sql/pgwire/server.go, line 129 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I don't think we need a new type here.

Done.


pkg/sql/pgwire/server.go, line 135 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Rather than have the value be the actual connection, I would keep it as a function (not necessarily a context.CancelFunc, but could be a context.CancelFunc that calls CloseRead as well as ctxCancel created in ServeConn)

Done.


pkg/sql/pgwire/server.go, line 425 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

This will panic if the connection is not a closeableReadConn. I'm also not sure I see where a closeableReadConn is initialized. This might also not work with tls.Conns. I think we're going to have to do some manual introspection and unfortunately keep around the readTimeoutConn in the odd case that we have a connection type that doesn't handle CloseRead (although is that likely? Seems like we just need to care about tcp and tls transitively)

Done.


pkg/sql/pgwire/server.go, line 426 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Once you feel like you've addressed a reviewer's comment, you can comment Done or click the Acknowledge button to mark these comments as resolved.

Done.


pkg/sql/pgwire/server.go, line 424 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I think it doesn't hurt to derive this cancel function above the switch statement and also call cancel in the case of *net.TCPConn

Done.

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @Zyqsempai)


pkg/sql/pgwire/conn_test.go, line 793 at r1 (raw file):

Previously, Zyqsempai (Boris Popovschi) wrote…

Agree, do you have any idea how to test TCP properly?

Nothing should change, the difference is that cancel() should not have any effect, so we should CloseRead() instead and make sure that the test completes as expected.


pkg/sql/pgwire/conn_test.go, line 853 at r3 (raw file):

	default:
	}
	cancel()

I would insert a small sleep before canceling here (maybe one millisecond), to be relatively certain the connection is performing the second read.

@Zyqsempai
Copy link
Copy Markdown
Author

@asubiotto Soo, it looks like our approach is not working, it seems that CloseRead works not as expected.
https://gist.github.com/Zyqsempai/999482dbdc70fb66aabd3cb74a835aa4
I created this separated test, i tried a lot of variants, but it's doesn't work anyway, CloseRead doesn't interrupt existing Read i have no idea why, please, check out this gist maybe i miss something.

@asubiotto
Copy link
Copy Markdown
Contributor

@Zyqsempai, I think it's because you're calling CloseRead on the client, rather than the server (i.e. the goroutine). The read timeout conn test works fine because the same ctx is captured by the goroutine, but when you call net.Dial from the main goroutine, that's a different connection instance than the one obtained on the reading side through ln.Accept. I would recommend passing the connection back to the main goroutine through a channel and calling CloseRead on that.

@Zyqsempai Zyqsempai force-pushed the 25585-use-close-read-instead-of-read-timeout branch from 94d0df8 to 70ec504 Compare September 23, 2019 15:15
@Zyqsempai
Copy link
Copy Markdown
Author

@asubiotto Nice catch, thanks, looks like it's finally works.

@Zyqsempai Zyqsempai marked this pull request as ready for review September 23, 2019 15:16
@Zyqsempai Zyqsempai changed the title [WIP] Changed type of connCancelMap, removed timeout read conn Changed type of connCancelMap, removed timeout read conn Sep 25, 2019
@Zyqsempai
Copy link
Copy Markdown
Author

@asubiotto PTAL

@Zyqsempai
Copy link
Copy Markdown
Author

@asubiotto Any updates here?

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay! :lgtm: minus the small testing nit

Reviewed 1 of 2 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @Zyqsempai)


pkg/sql/pgwire/conn_test.go, line 796 at r4 (raw file):

	defer leaktest.AfterTest(t)()
	// Cannot use net.Pipe because deadlines are not supported.
	ln, err := net.Listen("tcp", util.TestAddr.String())

nit: I would keep these as util.TestAddr.Network in this test (but keep "tcp" in the TCP test)

@Zyqsempai Zyqsempai force-pushed the 25585-use-close-read-instead-of-read-timeout branch from 70ec504 to 82d0a6f Compare November 4, 2019 16:59
@Zyqsempai
Copy link
Copy Markdown
Author

@asubiotto Great, thank you. Done

@asubiotto
Copy link
Copy Markdown
Contributor

@Zyqsempai looks like there's a race in TestConnClose (make testrace PKG=./pkg/sql/pgwire TESTS=TestConnClose) and some lint failures (make lint) that need to be fixed. Could you also update the commit message to remove WIP, prefix with the package name and add a brief description?

@Zyqsempai
Copy link
Copy Markdown
Author

@asubiotto Sure

@Zyqsempai Zyqsempai force-pushed the 25585-use-close-read-instead-of-read-timeout branch 2 times, most recently from a1a86d5 to 32b3ebb Compare November 6, 2019 16:33
@Zyqsempai
Copy link
Copy Markdown
Author

@asubiotto Looks like finally Done, PTAL!

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Just some final comments about errors

Reviewed 2 of 2 files at r5.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Zyqsempai)


pkg/sql/pgwire/conn_test.go, line 916 at r5 (raw file):

	case conn := <-connChan:
		tcpConn := conn.(*net.TCPConn)
		_ = tcpConn.CloseRead()

Let's check this error and Fatal if != nil


pkg/sql/pgwire/server.go, line 424 at r5 (raw file):

		case *net.TCPConn:
			s.mu.connCancelMap[done] = func() {
				_ = c.CloseRead()

I think we should at least log this error (as a warning)

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @Zyqsempai)


pkg/sql/pgwire/server.go, line 424 at r5 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I think we should at least log this error (as a warning)

log.Warningf(...)

@Zyqsempai Zyqsempai force-pushed the 25585-use-close-read-instead-of-read-timeout branch from 32b3ebb to 670bc0b Compare November 11, 2019 17:22
Copy link
Copy Markdown
Author

@Zyqsempai Zyqsempai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto)


pkg/sql/pgwire/conn_test.go, line 793 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Nothing should change, the difference is that cancel() should not have any effect, so we should CloseRead() instead and make sure that the test completes as expected.

Done.


pkg/sql/pgwire/conn_test.go, line 853 at r3 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I would insert a small sleep before canceling here (maybe one millisecond), to be relatively certain the connection is performing the second read.

Done.


pkg/sql/pgwire/conn_test.go, line 916 at r5 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Let's check this error and Fatal if != nil

Done


pkg/sql/pgwire/server.go, line 424 at r5 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

log.Warningf(...)

Done

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @Zyqsempai)


pkg/sql/pgwire/conn_test.go, line 853 at r3 (raw file):

Previously, Zyqsempai (Boris Popovschi) wrote…

Done.

I forgot about this, but it doesn't look like the sleep was added to the tcp test. I would time.Sleep(500 * time.Microsecond) before calling CloseRead.


pkg/sql/pgwire/conn_test.go, line 916 at r5 (raw file):

Previously, Zyqsempai (Boris Popovschi) wrote…

Done

I don't think this is right, I meant something like:

if err := tcpConn.CloseRead(); err != nil {
    t.Fatalf(...)
}

We should also keep the read of the errChan and the error check that was removed.


pkg/sql/pgwire/server.go, line 424 at r5 (raw file):

Previously, Zyqsempai (Boris Popovschi) wrote…

Done

nit: you can use a statement like:

if err := c.CloseRead(); err != nil {
    log.Warningf(ctx, "...")
}

@Zyqsempai Zyqsempai force-pushed the 25585-use-close-read-instead-of-read-timeout branch from 670bc0b to 0778abe Compare November 12, 2019 07:34
Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @Zyqsempai)


pkg/sql/pgwire/conn_test.go, line 916 at r5 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I don't think this is right, I meant something like:

if err := tcpConn.CloseRead(); err != nil {
    t.Fatalf(...)
}

We should also keep the read of the errChan and the error check that was removed.

One last thing: we need to add

	if err := <-errChan; err != <expected error> {
		t.Fatalf("unexpected error: %v", err)
	}

Just to assert that the error is what we expect after this switch case

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @Zyqsempai)

Copy link
Copy Markdown
Author

@Zyqsempai Zyqsempai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @Zyqsempai)


pkg/sql/pgwire/conn_test.go, line 916 at r5 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

One last thing: we need to add

	if err := <-errChan; err != <expected error> {
		t.Fatalf("unexpected error: %v", err)
	}

Just to assert that the error is what we expect after this switch case

Sorry, I lost the idea, which error do we expect and where?

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @Zyqsempai)


pkg/sql/pgwire/conn_test.go, line 916 at r5 (raw file):

Previously, Zyqsempai (Boris Popovschi) wrote…

Sorry, I lost the idea, which error do we expect and where?

After the tpcConn is CloseRead, it will send back any err it got from Read on the errChan. It would be good to assert that it's what we expect (not sure what that is yet though, probably io.EOF?)

@Zyqsempai Zyqsempai force-pushed the 25585-use-close-read-instead-of-read-timeout branch from 0778abe to 3ad3bbf Compare November 14, 2019 15:49
Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to make sure not to create a newReadTimeoutConn in serveImpl, and just use the naked TCPConn if that's the underlying type, otherwise we won't benefit from the ability to use CloseRead (we'll still be waking up every now and then).

Reviewed 1 of 1 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto and @Zyqsempai)

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

X-noremind Bots won't notify about PRs with X-noremind

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants