
dispatch goroutine leak #45

Closed
immesys opened this issue Aug 7, 2016 · 12 comments

immesys commented Aug 7, 2016

I'm working through the getting-started guide for RPC, and I've managed to answer most of my questions myself, but there's one I can't quite figure out.

Say I have some schema like

interface Database { # the main interface
  dolargequery @0 () -> (cursor :Cursor);
} 
interface Cursor {
  next @0 () -> ( data :Data );
}

Say the client calls dolargequery, and the server creates some Go struct that, for argument's sake, occupies a bunch of memory and implements Cursor. The server sends the cursor capability back to the client.

Now my question is how do I idiomatically control the lifetime of the cursor struct? Presumably the server is holding on to a reference to it so that when it receives calls on the capability it can execute them, but how does the client "free" the capability so that the underlying struct I passed to Cursor_ServerToClient can be freed?
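
For concreteness, the server side I have in mind looks roughly like this, using the code generated from the schema above (the generated names and call-struct fields below are written from memory, so treat them as illustrative rather than exact):

// cursorImpl is the "bunch of memory" struct; it implements the generated
// Cursor_Server interface.
type cursorImpl struct {
    rows [][]byte // large buffer I'd like reclaimed once the client is done
}

func (c *cursorImpl) Next(call Cursor_next) error {
    // ... copy the next chunk of c.rows into call.Results ...
    return nil
}

// databaseImpl implements the generated Database_Server interface.
type databaseImpl struct{}

func (databaseImpl) Dolargequery(call Database_dolargequery) error {
    // runLargeQuery is a stand-in for my real query code.
    cur := &cursorImpl{rows: runLargeQuery()}
    // Wrap the struct in a capability and hand it back to the client.
    return call.Results.SetCursor(Cursor_ServerToClient(cur))
}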

I did some experiments with printing in finalizers (which is always a bit shady), but it doesn't look like the struct I pass to Cursor_ServerToClient ever becomes unreferenced, even after the rpc connection's Wait() returns.
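
Concretely, the experiment was just a runtime.SetFinalizer call on the struct before wrapping it, reusing the illustrative names from the sketch above (needs the runtime and log imports):

func (databaseImpl) Dolargequery(call Database_dolargequery) error {
    cur := &cursorImpl{rows: runLargeQuery()}
    // The finalizer only runs if cur ever becomes unreachable; I never see
    // this print, even after the rpc connection's Wait() returns.
    runtime.SetFinalizer(cur, func(c *cursorImpl) {
        log.Println("cursorImpl finalized")
    })
    return call.Results.SetCursor(Cursor_ServerToClient(cur))
}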

How does it all work?

immesys commented Aug 7, 2016

I have worked out why resources are not being freed, and perhaps this is a bug.

When dolargequery calls Cursor_ServerToClient, server.New spawns a goroutine for dispatch(). As long as that goroutine is alive, the original cursor struct is captured by the Method closures and stays reachable. The problem is that even after the remote disconnects and rpc.Conn.Wait() returns EOF, the dispatch goroutine remains alive, blocked in its select.

Am I doing something wrong that keeps server.dispatch() from ever ending?

immesys commented Aug 7, 2016

I have created a standalone reproducer here. Basically if you run rpc/example_test, it has the same behavior: goroutines persist after the connection is closed.
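
The check is nothing fancier than comparing goroutine counts around the connection's lifetime, roughly like this (conn is the *rpc.Conn from the example; imports are runtime, runtime/pprof, os, and time):

before := runtime.NumGoroutine()

// ... dial, make a call or two that hands out capabilities ...

conn.Close()
time.Sleep(100 * time.Millisecond) // give things a moment to wind down

if after := runtime.NumGoroutine(); after > before {
    // Dump stacks to see who is still alive; the server dispatch()
    // goroutines show up here even though the connection is gone.
    pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
}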

immesys changed the title from "question: ownership" to "dispatch goroutine leak" on Aug 7, 2016
zombiezen added the bug label on Aug 9, 2016
zombiezen commented:

Drat. The RPC package probably needs some shuffling of internals: the monitor goroutine approach is prone to these sorts of leaks. I'll try to get to this soon, but realistically it may be weeks or months.

immesys commented Aug 9, 2016

If possible, would you mind explaining how you would approach it (or approach a workaround)? I wanted to use the RPC package for a project, but I can't really afford to leak a goroutine for every capability.
One thing I considered was passing a context into X_ServerToClient and, when that context ends, calling Cancel. It would likely be a different context from the request context, perhaps one that gets cancelled when the rpc connection terminates.
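
Roughly this shape, as a sketch (conn is the server side's *rpc.Conn, and cursorImpl/runLargeQuery are the illustrative names from earlier):

// Derive a context that ends when the RPC connection does.
ctx, cancel := context.WithCancel(context.Background())
go func() {
    conn.Wait() // returns once the remote disconnects
    cancel()
}()

// Each capability created for this connection watches that context and
// drops its heavy state when it fires, even if its dispatch goroutine
// never exits. (A real version would need a mutex around rows.)
cur := &cursorImpl{rows: runLargeQuery()}
go func() {
    <-ctx.Done()
    cur.rows = nil
}()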

What happens if the client attempts to use a capability after its dispatch() goroutine has ended? How do other implementations of capnproto RPC release capabilities?

zombiezen commented:

You should be able to call Close() to stop the goroutine. Is that not the case?
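
That is, on the client side, release the capability once you're done with it. A sketch, assuming the generated Cursor client exposes its underlying capnp.Client (I'm writing the field name from memory):

// cursor is the generated Cursor client you got back from dolargequery.
// ... make next() calls ...

// Closing the underlying client releases the remote capability, which
// should let the server drop its Cursor_Server and end the dispatch
// goroutine for it.
if err := cursor.Client.Close(); err != nil {
    log.Println("releasing cursor:", err)
}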

zombiezen commented Aug 9, 2016

Oh okay, I think I see the issue. I haven't had a chance to sit down and reproduce this myself yet, but I'd imagine that Conn.Close and its ilk leak the currently exported capabilities and forget to call Close() on them. The fix would be to call Close on all capabilities here: https://github.com/zombiezen/go-capnproto2/blob/master/rpc/rpc.go#L162
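
In other words, something with roughly this shape (a sketch of the idea only; connState and exports are illustrative names, not the actual rpc package internals):

// connState stands in for the connection's bookkeeping.
type connState struct {
    exports map[uint32]capnp.Client // capabilities currently exported to the remote vat
}

// releaseExports would run during connection shutdown, closing every
// capability still held on behalf of the remote side so the underlying
// server structs can be freed.
func (c *connState) releaseExports() {
    for id, client := range c.exports {
        if err := client.Close(); err != nil {
            log.Printf("closing export %d: %v", id, err)
        }
        delete(c.exports, id)
    }
}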

zombiezen self-assigned this on Aug 12, 2016
zombiezen commented:

I've found a couple of different issues here; stay tuned. I'll try to fix this over the weekend.

immesys commented Aug 12, 2016

Awesome, much appreciated.

zombiezen commented:

Thanks so much for the great repro case; it really helped! I've made a slew of fixes around capability lifetimes and goroutine leaks. There are still more improvements I'd like to make to the rpc package to reduce resource consumption, since it hasn't seen much production usage yet, AFAIK. If you find any other issues, please let me know so I can take a look.

immesys commented Aug 15, 2016

Thanks for fixing it so fast! I love it when open source projects are so responsive. I plan to use it in a production system, and we have a lot of infrastructure for testing performance/resource problems, so I'll let you know how it goes.

zombiezen commented Aug 30, 2016

As a follow-up here, I am experimenting with reducing the number of goroutines and making the locking semantics clearer. It's mostly stable, but it's uncovering a long tail of weird edge cases. Sometimes I wish the RPC protocol weren't so geared toward a single-threaded implementation. I'll publish the branch when it's more stable.

zombiezen commented:

FYI, this work has now been merged in. I highly recommend pulling from master: all sorts of subtle issues have been resolved.
