New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hang when Websocket Read is interrupted #108

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
3 participants
@tuxcanfly
Copy link
Contributor

tuxcanfly commented Jul 3, 2014

Updated to catch the quit signal and break out of the loop. Tested that ^C shutsdown btcwallet as expected, with the same env in which it was hanging earlier.

@tuxcanfly

This comment has been minimized.

Copy link
Contributor Author

tuxcanfly commented Jul 3, 2014

This helps when the SIGINT is received between messages but not when a message is already being read.

@jrick

This comment has been minimized.

Copy link
Member

jrick commented Jul 3, 2014

I'll need to take another look at how this is implemented by btcd, but I almost half wonder whether it would makes sense to not increment/decrement the rpcServer WaitGroup for this goroutine to receive the websocket client messages.

If that change is made, then the goroutine that reads from wsc.allRequests (which WebsocketClientRead sends to, and closes when finished) should be switched from a for-range to a for-select to include not just that channel, but also the rpcServer quit chan.

@jrick

This comment has been minimized.

Copy link
Member

jrick commented Jul 3, 2014

btcd does this the same way as in your PR, so this may be ok, but I'll let @davecgh comment on which design would be better. Personally I think the btcd design can still result in hangs if you time it right.

edit: nvm, I'm wrong. The wait group that btcd increments for the "inHandler" is specific for the client, not the whole server.

@davecgh

This comment has been minimized.

Copy link
Member

davecgh commented Jul 3, 2014

This looks correct to me.

@jrick

This comment has been minimized.

Copy link
Member

jrick commented Jul 3, 2014

Not sure how it can be. If the server has not begun shutting down (so the select falls through to the default case) and ReadMessage is called and blocks, this can block the shutdown of the entire server until the client disconnects or sends an invalid websocket frame.

@tuxcanfly

This comment has been minimized.

Copy link
Contributor Author

tuxcanfly commented Jul 3, 2014

Yes, as I mentioned in the earlier comment, there's still an issue when the timing is right and the server gets blocked on ReadMessage.

@tuxcanfly

This comment has been minimized.

Copy link
Contributor Author

tuxcanfly commented Jul 4, 2014

I removed WebsocketClientRead from the waitgroup and it doesn't hang on ReadMessage anymore. But again with the right timing it hangs on RequestHandler, as per the following goroutine stacktrace:

goroutine 50 [chan receive]:
github.com/conformal/btcrpcclient.FutureNotifyReceivedResult.Receive(0x18b122c0, 0x0, 0x0)
        /home/tuxcanfly/Work/golang/src/github.com/conformal/btcrpcclient/notify.go:823 +0x7e
github.com/conformal/btcrpcclient.(*Client).NotifyReceived(0x18cfc5b0, 0xb6291e60, 0x1, 0x1, 0x0, 0x0)
        /home/tuxcanfly/Work/golang/src/github.com/conformal/btcrpcclient/notify.go:901 +0x62
main.(*Account).NewAddress(0x18aee1e0, 0x0, 0x0, 0x0, 0x0)
        /home/tuxcanfly/Work/golang/src/github.com/conformal/btcwallet/account.go:619 +0x3d5
main.GetNewAddress(0xb74eeae0, 0x18e74300, 0x0, 0x0, 0x0, 0x0)
        /home/tuxcanfly/Work/golang/src/github.com/conformal/btcwallet/rpcserver.go:1490 +0x293
main.(*rpcServer).RequestHandler(0x18a7e200)
        /home/tuxcanfly/Work/golang/src/github.com/conformal/btcwallet/rpcserver.go:995 +0xc8
created by main.(*rpcServer).Start
        /home/tuxcanfly/Work/golang/src/github.com/conformal/btcwallet/rpcserver.go:300 +0x84

I tried a similar workaround, i.e. remove RequestHandler from the waitgroup also and it works but not sure if that's the correct way.

@jrick

This comment has been minimized.

Copy link
Member

jrick commented Jul 4, 2014

Two things:

First, that looks like a btcrpcclient hang.

Second, you modified RequestHandler from a for-select, which reads from s.requests and s.quit, to a for-range that only reads from s.requests. Don't do this. It will now never close, since the s.requests channel is not closed anywhere.

@tuxcanfly

This comment has been minimized.

Copy link
Contributor Author

tuxcanfly commented Jul 4, 2014

OK, after removing the RequestHandler from waitgroup I saw that it never received s.quit so I removed it form the for-select. I've reverted the changeset.

I tried that because I was still facing a hang, here's the complete goroutine dump: https://gist.github.com/tuxcanfly/ad54fdab0a90306cf491

@tuxcanfly

This comment has been minimized.

Copy link
Contributor Author

tuxcanfly commented Jul 7, 2014

It's a hack but the last commit seems to help, I didn't find any hang on RequestHandler or FutureNotifyReceivedResult in my tests. Tried guarding the client calls with client.Disconnected in the caller itself, but it didn't help so I had to put it in NewAddress.

@tuxcanfly

This comment has been minimized.

Copy link
Contributor Author

tuxcanfly commented Jul 7, 2014

As discussed on IRC, there's still one failing case here, although its less frequent:

https://gist.github.com/tuxcanfly/ef0bc10ddebacbfff9f9

account.go Outdated
@@ -617,8 +617,10 @@ func (a *Account) NewAddress() (btcutil.Address, error) {
if err != nil {
return nil, err
}
if err := client.NotifyReceived([]btcutil.Address{addr}); err != nil {
return nil, err
if !client.Disconnected() {

This comment has been minimized.

@jrick

jrick Jul 8, 2014

Member

I just pushed a btcrpclient fix that causes requests to error when the client has already shutdown, so this check shouldn't be necessary (and isn't really right to begin with, a disconnected client could reconnect and the request would then be reissued).

This comment has been minimized.

@tuxcanfly

tuxcanfly Jul 8, 2014

Author Contributor

Yup, it was a hack. Removed.

@tuxcanfly

This comment has been minimized.

Copy link
Contributor Author

tuxcanfly commented Jul 8, 2014

With the latest rpcclient, most hangs seem to have been solved. Only issue I'm seeing is:

goroutine 53 [chan send]:
main.(*rpcServer).RequestHandler(0x18b20360)
    /home/tuxcanfly/Work/golang/src/github.com/conformal/btcwallet/rpcserver.go:1055 +0x240
created by main.(*rpcServer).Start
    /home/tuxcanfly/Work/golang/src/github.com/conformal/btcwallet/rpcserver.go:305 +0x87

which seems to be fixed by:

- r.response <- handlerResponse{result, jsonErr}
+ select {
+ case r.response <- handlerResponse{result, jsonErr}:
+ case <-s.quit:
+ break out
+ }

I guess sending to the response chan was blocked because the receiver had already shutdown.

AcctMgr.Release()
case <-s.quit:
break out
default:

This comment has been minimized.

@jrick

jrick Jul 8, 2014

Member

I would just make this an empty default case and let it fall through, then we don't lose another indent.

// WebsocketClientGateway and WebsocketClientRead
// are synced via wsc.allRequests, so just include
// one in the waitgroup
// This is necessary to prevent a hang in ReadMessage

This comment has been minimized.

@jrick

jrick Jul 8, 2014

Member

This comment could use some love. I'd put the explanation about the multi case select in WebsocketClientGateway inside of that method, rather than here, and here just say something to the effect of:

// WebsocketClientRead is intentionally not run with the waitgroup
// so it is ignored during shutdown.  This is to prevent a hang during
// shutdown where the goroutine is blocked on a read of the
// websocket connection if the client is still connected.
go s.WebsocketClientRead(wsc)

s.wg.Add(3)
go s.WebsocketClientGateway(wsc)
go s.WebsocketClientRespond(wsc)
go s.WebsocketClientSend(wsc)
if err := json.Unmarshal(request, &raw); err != nil {
if !wsc.authenticated {
// Disconnect immediately.
for {

This comment has been minimized.

@jrick

jrick Jul 8, 2014

Member

And here I would say:

// A for-select with a read of the quit channel is used instead of a for-range
// to provide clean shutdown.  This is necessary due to WebsocketClientRead
// (which sends to the allRequests chan) not closing allRequests during shutdown
// if the remote websocket client is still connected.
for {
...

@jrick jrick closed this in 9036d36 Jul 8, 2014

@tuxcanfly tuxcanfly deleted the tuxcanfly:fix-ws-read-hang branch Jul 8, 2014

@tuxcanfly

This comment has been minimized.

Copy link
Contributor Author

tuxcanfly commented Jul 9, 2014

Sorry for nagging but I found at least one instance of a possible hangup. In my tests, this was rare so consider it low priority.

https://gist.github.com/tuxcanfly/629e4b9db61441ffebd6

I tried out a fix by using a goroutine to fetch the response and it seems to help:

tuxcanfly@a439399

@tuxcanfly

This comment has been minimized.

Copy link
Contributor Author

tuxcanfly commented Jul 18, 2014

jrick added a commit to jrick/btcwallet that referenced this pull request Nov 15, 2016

Add 33 byte checksum to seed.
This is only being done for compatibility with older dcrwallet
versions that refuse to decode word list seeds that are not 33 words
long when entering them from the console (the gRPC API accepts seeds
as simply a byte array so long as they are within the correct length
range).

Fixes btcsuite#108.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment