
NettyResponseFuture never completes because netty provider uses closed channel #415

Closed
hvesalai opened this issue Nov 8, 2013 · 7 comments

hvesalai commented Nov 8, 2013

Using version 1.7.21, a NettyResponseFuture never gets completed because the provider uses a channel that has already been closed and handled by closeChannel(...).

If I have read the source correctly, what we have here is a classic race condition:

  1. [in client thread] doConnect(...) is called and a cached channel is selected.

    At this point, the attachment on the channel is still DiscardEvent.

    NettyAsyncHttpProvider.java#L913

    doConnect(...) checks that the connection isOpen(), which it still is.

    NettyAsyncHttpProvider.java#L923

  2. [in I/O thread] The channel is closed by the remote end and channelClosed(...) is called. Since the attachment is DiscardEvent, nothing is done.

    NettyAsyncHttpProvider.java#L1341

  3. [in client thread] doConnect(...) continues using the now closed channel by attaching a NettyResponseFuture to the channel.

    NettyAsyncHttpProvider.java#L937

  4. [in client thread] doConnect(...) calls writeRequest(...) which checks if the channel is closed, which it is.

    NettyAsyncHttpProvider.java#L447

    However, writeRequest(...) does nothing about this, relying on channelClosed(...) to handle the situation. But since channelClosed(...) already ran at step 2, neither abort(...) nor done(...) will ever be called, so the future's listeners are never notified.
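The four steps above can be replayed deterministically with a toy model. This is a hypothetical reduction (the `Channel`, `ResponseFuture`, and `DISCARD` names are illustrative stand-ins, not AHC's real classes), assuming only the check-then-attach ordering described in the report:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical reduction of the race: a pooled "channel" whose attachment
// starts as DISCARD, a close handler that only aborts real futures, and a
// client that checks isOpen before attaching a future.
public class PooledChannelRace {

    static final Object DISCARD = new Object(); // stands in for DiscardEvent

    static class ResponseFuture {
        volatile boolean completed; // set by abort(); never set in the race
        void abort() { completed = true; }
    }

    static class Channel {
        final AtomicReference<Object> attachment = new AtomicReference<>(DISCARD);
        volatile boolean open = true;

        // Mirrors channelClosed(...): it aborts a real future but silently
        // ignores a DISCARD attachment.
        void channelClosed() {
            open = false;
            Object att = attachment.get();
            if (att instanceof ResponseFuture) {
                ((ResponseFuture) att).abort();
            }
        }
    }

    // Replays steps 1-4 on a single thread, in the losing interleaving.
    static ResponseFuture replayRace() {
        Channel ch = new Channel();
        ResponseFuture future = new ResponseFuture();

        boolean wasOpen = ch.open;     // step 1: doConnect checks isOpen() -> true

        ch.channelClosed();            // step 2: remote close; attachment is
                                       // still DISCARD, so nothing is aborted

        if (wasOpen) {                 // steps 3-4: stale check; the future is
            ch.attachment.set(future); // attached to a dead channel, and the
        }                              // later "is it closed?" check does nothing

        return future;                 // nobody will ever complete this future
    }

    public static void main(String[] args) {
        ResponseFuture f = replayRace();
        System.out.println("future completed: " + f.completed); // prints "future completed: false"
    }
}
```

Because channelClosed(...) runs while the attachment is still DISCARD, the abort is skipped, and nothing ever observes the future attached afterwards.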

@javateck

Good analysis. In my stress test the client side has a thread pool and a latch; the callback simply does latch.countDown(), and I constantly see threads stuck, meaning some callbacks are never invoked. I wonder how something this basic was missed.

@slandelle
Contributor

Once again, please share your test case, as you claim to have one where you can easily reproduce.

@hvesalai
Author

@slandelle, are you asking for a test case to reproduce a race condition? Have you read the analysis? What do you think of it?

@slandelle
Contributor

@hvesalai I read it briefly but haven't dug in yet. That's typically the kind of thing that's easier to fix with a test case to reproduce it. @javateck claims he has a very obvious one, so I'd like to get my hands on it...

@ghost assigned slandelle Jan 18, 2014
@slandelle
Contributor

@hvesalai What a mess! Thanks for your analysis. I'm afraid I'll only be able to come up with an ugly fix for AHC 1. Will try to come up with something better for AHC 2.

@slandelle
Contributor

I've just pushed an imperfect fix on 1.7.X branch.

Limitations are:

  • there's still a time window between the moment a Channel is fetched from the pool and the moment a future is attached to it, during which channelClosed has no future to abort
  • the Netty HttpRequest is built again if the retry fails and we finally have to go with a new Channel
  • it's ugly

Will first port the logic as is on AHC2, but we'll try to come up with something better (maybe use Netty custom events).
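The remaining "time window" limitation comes from the check-then-attach ordering. One general way to shrink it (a sketch of the standard attach-first/re-check technique, not the actual 1.7.X patch; class names are again illustrative) is to publish the future first and re-check the channel afterwards. Since both sides use volatile/atomic operations, the Java memory model guarantees at least one of the two threads observes the other:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the attach-first / re-check-after ordering.
public class AttachThenRecheck {

    static final Object DISCARD = new Object();

    static class ResponseFuture {
        volatile boolean completed;
        void abort() { completed = true; }
    }

    static class Channel {
        final AtomicReference<Object> attachment = new AtomicReference<>(DISCARD);
        volatile boolean open = true;

        void channelClosed() {
            open = false;                      // (a) publish the close...
            Object att = attachment.get();     // (b) ...then read the attachment
            if (att instanceof ResponseFuture) {
                ((ResponseFuture) att).abort();
            }
        }
    }

    // Client side: attach first, check open afterwards. If the close races
    // in between, either (b) above sees our future and aborts it, or our
    // re-check at (2) sees open == false -- the future can no longer fall
    // through the crack.
    static boolean tryUseChannel(Channel ch, ResponseFuture future) {
        ch.attachment.set(future);  // (1) publish the future...
        if (!ch.open) {             // (2) ...then re-check the channel
            future.abort();         // we observed the close: fail fast
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Worst case: the close lands after the future is already attached.
        Channel ch = new Channel();
        ResponseFuture future = new ResponseFuture();
        ch.attachment.set(future);
        ch.channelClosed();         // aborts the future via the attachment
        System.out.println("completed after close: " + future.completed); // prints "completed after close: true"
    }
}
```

The window does not disappear entirely (the request may still need to be retried on a fresh Channel, as noted above), but the future is always completed one way or the other.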

@slandelle
Contributor

Still not perfect, but the odds are much, much lower now (the remaining window is just a few simple instructions).
