ByteBuf Memory Leak #466

Closed

dfjones opened this issue Jan 18, 2016 · 7 comments

dfjones commented Jan 18, 2016

In production, I found indications of a memory leak, and I believe I have traced it to the following situation.

In a project using the RxNetty 0.4.x client through Netflix Ribbon/Hystrix, on a box with high CPU contention, memory can leak when the Hystrix timeout fires before the client delivers the ByteBuf to the subscriber. I believe this is a race condition in a CPU-starved environment: if the Hystrix timer thread wins the race against the RxNetty I/O thread, the RxNetty thread eventually gets CPU time and produces a ByteBuf that is never consumed.
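
To make the shape of the race concrete, here is a minimal sketch (not the production code; the outer RxJava timeout stands in for the Hystrix timer, and the endpoint and port are placeholders):

import io.netty.buffer.ByteBuf;
import io.reactivex.netty.RxNetty;
import rx.Observable;

import java.nio.charset.Charset;
import java.util.concurrent.TimeUnit;

public final class TimeoutRaceSketch {
    public static void main(String[] args) {
        Observable<String> body = RxNetty
                .createHttpGet("http://localhost:8080/hello") // placeholder endpoint
                .flatMap(resp -> resp.getContent()
                        .map((ByteBuf buf) -> buf.toString(Charset.defaultCharset())))
                // Stand-in for the Hystrix timer: unsubscribe before the I/O thread
                // has delivered the content ByteBuf. Under heavy CPU contention the
                // ByteBuf can then arrive with nobody left to consume (and release) it.
                .timeout(1, TimeUnit.MILLISECONDS)
                .onErrorReturn(t -> "<timed out>");

        System.out.println(body.toBlocking().single());
    }
}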

I noticed that there are safeguards in the form of timeouts in UnicastContentSubject; however, I believe there is a bug in how these timeouts are established when a new pooled connection is created.

I was able to reproduce this issue locally with a test that recreates the situation described above. I've augmented the latest 0.4.x RxNetty code with additional logging/debugging information. I'm posting what I have here, starting with the attached ByteBuf leak report:
bytebuf-mem-leak-report.txt

The last touch of the ByteBuf comes from onNext in an augmented UnicastContentSubject shown here:

@Override
    public void onNext(final T t) {
        state.lastMem = memAddress(t);
        LOG.info("onNext called state={} buf={}:{}", state, t.toString(), state.lastMem, new Exception("trace"));
        // Retain so that post-buffer, the ByteBuf does not get released.
        // Release will be done after reading from the subject.
        ReferenceCountUtil.retain(t);
        BufferUntilSubscriber<T> bufferedSubject = (BufferUntilSubscriber<T>) state.bufferedObserver;
        touch(t, "bufferedObserver is next and bufferedObserver.hasObservers()= "
                 + bufferedSubject.hasObservers()
                 + " and UnicastContentSubject = " + this
                 + " and this.state.state = " + state.state
                 + " and state.timeoutScheduled = " + state.timeoutScheduled
                 + " and noTimeout = " + noTimeout
                 + " and state.releaseSubscription = " + state.releaseSubscription
                 + " and buf =" + t.toString() + ":" + state.lastMem
                 + " and state =" + state.toString());
        state.bufferedObserver.onNext(t);

        // Schedule timeout once and when not subscribed yet.
        if (state.casTimeoutScheduled() && state.state == State.STATES.UNSUBSCRIBED.ordinal()) {
            LOG.info("scheduling release state={} buf={}:{}", state, t, state.lastMem, new Exception("trace"));
            // Schedule timeout after the first content arrives.
            state.setReleaseSubscription(timeoutScheduler.subscribe(new Action1<Long>() {
                @Override
                public void call(Long aLong) {
                    LOG.info("timeout for state={} buf={}:{}", state, t, memAddress(t));
                    disposeIfNotSubscribed();
                }
            }));
        }
    }

Using this instrumentation, I'm including some log entries that will hopefully contain useful trace information (I've added some log statements along the way). See the attached log file:

rxnetty-mem-leak-trace-3-filtered.txt

If there is any more information I can provide, please let me know. I'm also willing to contribute a PR to fix this issue, but I'm at a loss as to what the "right" fix should be. Always creating UnicastContentSubject with a subscription timeout seems to fix the issue, but I'm guessing that is undesirable. I haven't been able to figure out why the timeout added in RequestProcessingOperator isn't being applied in this situation, but my hunch is that the actual fix lies there.
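
To make that distinction concrete, here is a rough sketch of the two creation paths (package and factory names as I read them in the 0.4.x source; this is illustrative only, not the actual ClientRequestResponseConverter code):

import io.netty.buffer.ByteBuf;
import io.reactivex.netty.protocol.http.UnicastContentSubject;

import java.util.concurrent.TimeUnit;

public final class ContentSubjectTimeoutSketch {
    public static void main(String[] args) {
        // The path the pooled-connection code appears to take today: no
        // "no subscription" timeout, so content buffered after the Hystrix
        // timeout has already fired is never disposed or released.
        UnicastContentSubject<ByteBuf> leakProne =
                UnicastContentSubject.createWithoutNoSubscriptionTimeout();

        // Always supplying a no-subscription timeout (what seems to fix the
        // issue): disposeIfNotSubscribed() eventually runs and releases any
        // buffered content that nobody subscribed to.
        UnicastContentSubject<ByteBuf> withTimeout =
                UnicastContentSubject.create(5, TimeUnit.SECONDS);
    }
}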

@NiteshKant (Member)

@dfjones thanks for the report. You mentioned that you were able to repro this with a test; do you mind sharing it?

dfjones commented Jan 18, 2016

@NiteshKant I can't share that test because it relies on too much internal stuff. However, I will try to create an equivalent reproduction of the issue using just RxNetty components. I'll let you know if I'm able to do this or not.

@NiteshKant (Member)

Awesome, thanks!

dfjones commented Jan 19, 2016

@NiteshKant Good news: I was able to reproduce the memory leak with a test using just RxNetty. I'll link the modified files here. If it is helpful, the full branch with all the extra debug logging/trace info is here: https://github.com/Squarespace/RxNetty/tree/memleak-debug

I created my test in the rxnetty-examples project. I used a modified HelloWorldServer because it appears the response needs to be sent with chunked transfer encoding.

ByteBufMemLeakTest.java
HelloWorldServer.java
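
For reference, a minimal sketch of a chunked "Hello World" handler in the 0.4.x style (the attached HelloWorldServer.java is the actual test server; this sketch only illustrates why the body goes out with Transfer-Encoding: chunked when no Content-Length is set):

import io.netty.buffer.ByteBuf;
import io.reactivex.netty.RxNetty;
import io.reactivex.netty.protocol.http.server.HttpServer;
import io.reactivex.netty.protocol.http.server.HttpServerRequest;
import io.reactivex.netty.protocol.http.server.HttpServerResponse;
import io.reactivex.netty.protocol.http.server.RequestHandler;
import rx.Observable;

public final class ChunkedHelloWorldServer {
    public static void main(String[] args) {
        HttpServer<ByteBuf, ByteBuf> server = RxNetty.createHttpServer(8080,
                new RequestHandler<ByteBuf, ByteBuf>() {
                    @Override
                    public Observable<Void> handle(HttpServerRequest<ByteBuf> request,
                                                   HttpServerResponse<ByteBuf> response) {
                        // No Content-Length is set, so Netty sends the body with
                        // Transfer-Encoding: chunked; each flushed write goes out
                        // as its own chunk.
                        response.writeStringAndFlush("Hello ");
                        return response.writeStringAndFlush("World!");
                    }
                });
        server.startAndWait();
    }
}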

dfjones commented Jan 19, 2016

Forgot to attach a leak report from this new test:
rxnetty-leak-report.txt

dfjones added a commit to Squarespace/RxNetty that referenced this issue Jan 20, 2016
…nto the ClientRequestResponseConverter so that it can create a UnicastContentSubject with a timeout. This fix addresses issue ReactiveX#466

dfjones commented Jan 20, 2016

@NiteshKant I put up the PR linked above. With this patch in place, the test I've shared here no longer produces a memory leak after running for a few minutes. Reading through the code, I couldn't see a reason not to set a timeout on creation of the UnicastContentSubject if the client is configured to use one. Please let me know your thoughts; I'm happy to revise this as necessary. Thanks!

NiteshKant added the bug label Jan 20, 2016
NiteshKant added a commit that referenced this issue Jan 20, 2016
Potential Fix for Memory Leak Issue #466
NiteshKant added this to the 0.4.16 milestone Jan 20, 2016
@NiteshKant (Member)

Fixed via #468
