Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poor performance in OmmConsumerImpl.unregister(handle) #247

Closed
dlipofsky opened this issue Aug 17, 2023 · 9 comments
Closed

poor performance in OmmConsumerImpl.unregister(handle) #247

dlipofsky opened this issue Aug 17, 2023 · 9 comments

Comments

@dlipofsky
Copy link

When large number of handles are registered then calls to OmmConsumerImpl.unregister(handle) become very slow.

I traced this to WlItemHandler.removeUserRequestFromClosedStream(WlRequest) which
iterates over a LinkedList using an integer index, calling LinkedList.get(i) for each index.
Since get(i) is $O(N)$ on a LinkedList (each time it is called it has to walk down the list to
get to that index), just this step is $O(N^{2})$. If it finds a match it then calls
LinkedList.remove(i) which again has to walk down the entire list to find that index and remove it.

Calling this on every handle in a set of 379233 handles took 48 minutes!
Doing the whole list makes it $O(N^{3})$
99% of the CPU time is in WlItemHandler.removeUserRequestFromClosedStream(WlRequest).

If you use an iterator you can just walk the list just once, turning the method into $O(N)$. The performance improvement will be enormous. Basically change this

    private int removeUserRequestFromClosedStream(WlRequest wlRequest)
    {
        int ret = ReactorReturnCodes.SUCCESS;
        
        // remove from _userStreamIdListToRecover list
        for (int i = 0; i < _userStreamIdListToRecover.size(); i++)
        {
            int listStreamId = _userStreamIdListToRecover.get(i);
        
            if (listStreamId == wlRequest.requestMsg().streamId())
            {
                _userStreamIdListToRecover.remove(i);
                break;
            }
        }
        
        // remove from _statusMsgDispatchList list
        for (int i = 0; i < _statusMsgDispatchList.size(); i++)
        {
            StatusMsg statusMsg = _statusMsgDispatchList.get(i);
        
            if (statusMsg.streamId() == wlRequest.requestMsg().streamId())
            {
                _statusMsgDispatchList.remove(i);
                _statusMsgPool.add(statusMsg);
                break;
            }
        }        
    ...

to

    private int removeUserRequestFromClosedStream(WlRequest wlRequest)
    {
        int ret = ReactorReturnCodes.SUCCESS;

        // remove from _userStreamIdListToRecover list
        Iterator<Integer> iter1 = _userStreamIdListToRecover.iterator();
        while (iter1.hasNext())
        {
            int listStreamId = iter1.next();
            if (listStreamId == wlRequest.requestMsg().streamId())
            {
                iter1.remove();
                break;
            }
        }

        // remove from _statusMsgDispatchList list
        Iterator<StatusMsg> iter2 = _statusMsgDispatchList.iterator();
        while (iter2.hasNext())
        {
            StatusMsg statusMsg = iter2.next();
            if (statusMsg.streamId() == wlRequest.requestMsg().streamId())
            {
                iter2.remove();
                _statusMsgPool.add(statusMsg);
                break;
            }
        }
    ...

or in the first case you could even just do

        _userStreamIdListToRecover.remove(wlRequest.requestMsg().streamId());
    ...

and let Java do the iterate and remove for you.

There are many other of examples of this same problem in this class, and they really all should be fixed.

However rather than making these code changes it might be better to consider a different Collection type. You can use LinkedHashSet if want to preserve order and don't want duplicates (and I assume you never want duplicates of handles; the break in the original list appears to assume no duplicates) or just HashSet if you don't care about order (and it would have a better memory footprint than the linked version). Both would give $O(1)$ performance (constant time) for these operations and thus be even faster. If you really need to access by index or you do want duplicates you could use ArrayList - this would certainly be the safest and easiest change with minimal behavior difference, and you could probably fix the whole class with just a find-and-replace; the remove call for ArrayList would be slower than either HashSet implementation but still much faster than removing from a LinkedList by index (it's technically O(N) but it's a very fast O(N) because it uses arraycopy to shift the internal array; over all it would still be much better than LinkedList). ArrayList will also have the best memory footprint of any of these. I'd recommend ArrayList; LinkedList only makes sense if the dominate behavior is inserting to the middle or head of the list, or removing from the middle or head of the list, and you need to allow duplicates.

@L-Karchevska
Copy link
Contributor

@dlipofsky Thank you for your performance improvement suggestions! We will look into them as to what extent they might affect the logic and make improvements in one of the future releases.

@dlipofsky
Copy link
Author

More data: a single call to OmmConsumerImpl.unregister(handle) can take over 6 minutes if there are 357000 registered RICs and the one you unregister is the last one added.

@L-Karchevska
Copy link
Contributor

@dlipofsky Thanks for more information!

@dlipofsky
Copy link
Author

I was able to compile a few alternatives and test it myself.
My first suggested fix (replacing the indexed loop with an iterator) is not a full solution.
In the case of a single call to unregister the last of 357000 registered RICs the
time was reduced from 6 minutes to 50 ms. Sounds great, right?
But 50 ms is still slow, and unregistering all 357000 handles still took 31 minutes.
In the case of unregistering them all, I was already doing it in the optimal
order (starting from what would be at the beginning of the linked list)
and so it was not doing that much indexed iterating.

The ArrayList solution won't work everywhere. It works for _userStreamIdListToRecover but not for _statusMsgDispatchList because in that case there are method calls that are specific to LinkedList.

I also noticed some of these lists are empty in my use case

  • _userStreamIdListToRecover size=0
  • _statusMsgDispatchList size=269611
  • _pendingRequestByIdTable size=0
  • _pendingRequestByNameTable size=1 with inner list size=269611

And 269611 is a significant difference from the 356278 open handles I have (these are RICs I registered and did not receive a StreamState CLOSED message).

I noticed a lot of pooling. Pooling is generally a bad idea (especially since Java 1.5) unless the objects are expensive to create. Modern generational GC expects mosts objects to be short lived, and so pooling intefers with it operating at its most efficient. You spend a lot of time adding and removing objects from the pool. It introduces the risk of a memory leak. You add risk of bugs if objects are not properly cleared, and add extra time to clear the objects. In a multi-threaded app you add a synchronization bottleneck and a risk of deadlock, or a race condition if not properly synchronized. It's better to let Java manage the memory for you. It would probably be much faster if you strip out all the pooling.

@L-Karchevska
Copy link
Contributor

@dlipofsky Thank you for more detailed analysis and suggestions! We will consider them in our own investigation. Regarding pooling, the objects that are pooled, say, on the ValueAdd level turn out to be expensive to create on the fly indeed.

@dlipofsky
Copy link
Author

I see for example the pooling of LinkedList (line 88 of WlItemHandler: the outer LinkedList is the pool and the inner one is objects being pooled)

LinkedList<LinkedList<WlRequest>> _pendingRequestListPool = new LinkedList<LinkedList<WlRequest>>();

LinkedList is definitely not something that should be pooled.

I see pooling on WlRequest, whose constructor looks like just setting simple fields.

and on StatusMsg on line 110; this the the one that attracted my attention in the profiler as using significant time adding it back to the pool

LinkedList<StatusMsg> _statusMsgPool = new LinkedList<StatusMsg>();

It looks like pooling was the default design choice throughout the SDK, and really it should be used only when profiling indicates it is necessary.

@vlevendel
Copy link
Contributor

This is included with tag, Real-Time-SDK-2.1.3.L1. Please let us know if you have further questions.

@dlipofsky
Copy link
Author

@vlevendel , We are currently using com.refinitiv.ema:ema:3.7.1.0 and it looks like the latest version is 3.7.3.0. Is there fix in there? How does it relate to the SDK version you mentioned? Thank you.

@vlevendel
Copy link
Contributor

vlevendel commented Jan 26, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants