poor performance in OmmConsumerImpl.unregister(handle) #247

dlipofsky · 2023-08-17T14:25:53Z

When large number of handles are registered then calls to OmmConsumerImpl.unregister(handle) become very slow.

I traced this to WlItemHandler.removeUserRequestFromClosedStream(WlRequest) which
iterates over a LinkedList using an integer index, calling LinkedList.get(i) for each index.
Since get(i) is $O(N)$ on a LinkedList (each time it is called it has to walk down the list to
get to that index), just this step is $O(N^{2})$. If it finds a match it then calls
LinkedList.remove(i) which again has to walk down the entire list to find that index and remove it.

Calling this on every handle in a set of 379233 handles took 48 minutes!
Doing the whole list makes it $O(N^{3})$
99% of the CPU time is in WlItemHandler.removeUserRequestFromClosedStream(WlRequest).

If you use an iterator you can just walk the list just once, turning the method into $O(N)$. The performance improvement will be enormous. Basically change this

    private int removeUserRequestFromClosedStream(WlRequest wlRequest)
    {
        int ret = ReactorReturnCodes.SUCCESS;
        
        // remove from _userStreamIdListToRecover list
        for (int i = 0; i < _userStreamIdListToRecover.size(); i++)
        {
            int listStreamId = _userStreamIdListToRecover.get(i);
        
            if (listStreamId == wlRequest.requestMsg().streamId())
            {
                _userStreamIdListToRecover.remove(i);
                break;
            }
        }
        
        // remove from _statusMsgDispatchList list
        for (int i = 0; i < _statusMsgDispatchList.size(); i++)
        {
            StatusMsg statusMsg = _statusMsgDispatchList.get(i);
        
            if (statusMsg.streamId() == wlRequest.requestMsg().streamId())
            {
                _statusMsgDispatchList.remove(i);
                _statusMsgPool.add(statusMsg);
                break;
            }
        }        
    ...

to

    private int removeUserRequestFromClosedStream(WlRequest wlRequest)
    {
        int ret = ReactorReturnCodes.SUCCESS;

        // remove from _userStreamIdListToRecover list
        Iterator<Integer> iter1 = _userStreamIdListToRecover.iterator();
        while (iter1.hasNext())
        {
            int listStreamId = iter1.next();
            if (listStreamId == wlRequest.requestMsg().streamId())
            {
                iter1.remove();
                break;
            }
        }

        // remove from _statusMsgDispatchList list
        Iterator<StatusMsg> iter2 = _statusMsgDispatchList.iterator();
        while (iter2.hasNext())
        {
            StatusMsg statusMsg = iter2.next();
            if (statusMsg.streamId() == wlRequest.requestMsg().streamId())
            {
                iter2.remove();
                _statusMsgPool.add(statusMsg);
                break;
            }
        }
    ...

or in the first case you could even just do

        _userStreamIdListToRecover.remove(wlRequest.requestMsg().streamId());
    ...

and let Java do the iterate and remove for you.

There are many other of examples of this same problem in this class, and they really all should be fixed.

However rather than making these code changes it might be better to consider a different Collection type. You can use LinkedHashSet if want to preserve order and don't want duplicates (and I assume you never want duplicates of handles; the break in the original list appears to assume no duplicates) or just HashSet if you don't care about order (and it would have a better memory footprint than the linked version). Both would give $O(1)$ performance (constant time) for these operations and thus be even faster. If you really need to access by index or you do want duplicates you could use ArrayList - this would certainly be the safest and easiest change with minimal behavior difference, and you could probably fix the whole class with just a find-and-replace; the remove call for ArrayList would be slower than either HashSet implementation but still much faster than removing from a LinkedList by index (it's technically O(N) but it's a very fast O(N) because it uses arraycopy to shift the internal array; over all it would still be much better than LinkedList). ArrayList will also have the best memory footprint of any of these. I'd recommend ArrayList; LinkedList only makes sense if the dominate behavior is inserting to the middle or head of the list, or removing from the middle or head of the list, and you need to allow duplicates.

The text was updated successfully, but these errors were encountered:

L-Karchevska · 2023-08-17T15:19:34Z

@dlipofsky Thank you for your performance improvement suggestions! We will look into them as to what extent they might affect the logic and make improvements in one of the future releases.

dlipofsky · 2023-08-17T22:33:33Z

More data: a single call to OmmConsumerImpl.unregister(handle) can take over 6 minutes if there are 357000 registered RICs and the one you unregister is the last one added.

L-Karchevska · 2023-08-18T09:26:26Z

@dlipofsky Thanks for more information!

dlipofsky · 2023-08-24T19:12:41Z

I was able to compile a few alternatives and test it myself.
My first suggested fix (replacing the indexed loop with an iterator) is not a full solution.
In the case of a single call to unregister the last of 357000 registered RICs the
time was reduced from 6 minutes to 50 ms. Sounds great, right?
But 50 ms is still slow, and unregistering all 357000 handles still took 31 minutes.
In the case of unregistering them all, I was already doing it in the optimal
order (starting from what would be at the beginning of the linked list)
and so it was not doing that much indexed iterating.

The ArrayList solution won't work everywhere. It works for _userStreamIdListToRecover but not for _statusMsgDispatchList because in that case there are method calls that are specific to LinkedList.

I also noticed some of these lists are empty in my use case

_userStreamIdListToRecover size=0
_statusMsgDispatchList size=269611
_pendingRequestByIdTable size=0
_pendingRequestByNameTable size=1 with inner list size=269611

And 269611 is a significant difference from the 356278 open handles I have (these are RICs I registered and did not receive a StreamState CLOSED message).

I noticed a lot of pooling. Pooling is generally a bad idea (especially since Java 1.5) unless the objects are expensive to create. Modern generational GC expects mosts objects to be short lived, and so pooling intefers with it operating at its most efficient. You spend a lot of time adding and removing objects from the pool. It introduces the risk of a memory leak. You add risk of bugs if objects are not properly cleared, and add extra time to clear the objects. In a multi-threaded app you add a synchronization bottleneck and a risk of deadlock, or a race condition if not properly synchronized. It's better to let Java manage the memory for you. It would probably be much faster if you strip out all the pooling.

L-Karchevska · 2023-08-25T09:50:44Z

@dlipofsky Thank you for more detailed analysis and suggestions! We will consider them in our own investigation. Regarding pooling, the objects that are pooled, say, on the ValueAdd level turn out to be expensive to create on the fly indeed.

dlipofsky · 2023-08-25T13:33:10Z

I see for example the pooling of LinkedList (line 88 of WlItemHandler: the outer LinkedList is the pool and the inner one is objects being pooled)

LinkedList<LinkedList<WlRequest>> _pendingRequestListPool = new LinkedList<LinkedList<WlRequest>>();

LinkedList is definitely not something that should be pooled.

I see pooling on WlRequest, whose constructor looks like just setting simple fields.

and on StatusMsg on line 110; this the the one that attracted my attention in the profiler as using significant time adding it back to the pool

LinkedList<StatusMsg> _statusMsgPool = new LinkedList<StatusMsg>();

It looks like pooling was the default design choice throughout the SDK, and really it should be used only when profiling indicates it is necessary.

vlevendel · 2023-12-28T15:18:14Z

This is included with tag, Real-Time-SDK-2.1.3.L1. Please let us know if you have further questions.

dlipofsky · 2024-01-26T19:36:21Z

@vlevendel , We are currently using com.refinitiv.ema:ema:3.7.1.0 and it looks like the latest version is 3.7.3.0. Is there fix in there? How does it relate to the SDK version you mentioned? Thank you.

vlevendel · 2024-01-26T22:43:33Z

@dlipofsky 2.1.3.L1 is the same as EMA/ETA 3.7.3. This information is found in the CHANGELOG files for each API Flavor:
https://github.com/Refinitiv/Real-Time-SDK/blob/master/Cpp-C/CHANGELOG.md
https://github.com/Refinitiv/Real-Time-SDK/blob/master/CSharp/CHANGELOG.md
https://github.com/Refinitiv/Real-Time-SDK/blob/master/Java/CHANGELOG.md

vlevendel closed this as completed Dec 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

poor performance in OmmConsumerImpl.unregister(handle) #247

poor performance in OmmConsumerImpl.unregister(handle) #247

dlipofsky commented Aug 17, 2023

L-Karchevska commented Aug 17, 2023

dlipofsky commented Aug 17, 2023

L-Karchevska commented Aug 18, 2023

dlipofsky commented Aug 24, 2023

L-Karchevska commented Aug 25, 2023

dlipofsky commented Aug 25, 2023

vlevendel commented Dec 28, 2023

dlipofsky commented Jan 26, 2024

vlevendel commented Jan 26, 2024 •

edited

poor performance in OmmConsumerImpl.unregister(handle) #247

poor performance in OmmConsumerImpl.unregister(handle) #247

Comments

dlipofsky commented Aug 17, 2023

L-Karchevska commented Aug 17, 2023

dlipofsky commented Aug 17, 2023

L-Karchevska commented Aug 18, 2023

dlipofsky commented Aug 24, 2023

L-Karchevska commented Aug 25, 2023

dlipofsky commented Aug 25, 2023

vlevendel commented Dec 28, 2023

dlipofsky commented Jan 26, 2024

vlevendel commented Jan 26, 2024 • edited

vlevendel commented Jan 26, 2024 •

edited