Deadlock with OmmConsumer #218

Closed
charanyarajagopalan opened this issue Nov 8, 2022 · 7 comments

Comments


charanyarajagopalan commented Nov 8, 2022

To give some background: we create an EMA consumer and then open item streams with it (100 items batched in each ReqMsg). A few minutes after the consumer is established, we simply stop seeing new messages, without any exceptions or errors. This happens while the loop that opens the item streams is still running. Debugging shows a potential deadlock, as illustrated by the stack traces below. Is there some kind of rate limit on how batched requests should be created? (We have a single Machine ID and hence a single EMA consumer.) The version used is 3.6.7.1.

(Attachments: thread-dump screenshot, deadlock-trace.txt)
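
For reference, each batched request follows the standard EMA Java batch pattern, roughly like this (a minimal sketch with placeholder service and item names, not our production code):

// assumes com.refinitiv.ema.access.* and com.refinitiv.ema.rdm.EmaRdm are imported
ElementList batch = EmaFactory.createElementList();
OmmArray items = EmaFactory.createOmmArray();
// 100 item names per ReqMsg in our case; two placeholders shown here
items.add(EmaFactory.createOmmArrayEntry().ascii("ITEM_1.RIC"));
items.add(EmaFactory.createOmmArrayEntry().ascii("ITEM_2.RIC"));
batch.add(EmaFactory.createElementEntry().array(EmaRdm.ENAME_BATCH_ITEM_LIST, items));
consumer.registerClient(EmaFactory.createReqMsg().serviceName("ELEKTRON_DD").payload(batch), appClient);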

@charanyarajagopalan
Author

Update: I increased the batch size to 5000, so our item list now goes through in 3 batched requests. I am still running into the same issue, though not every single time.

@dmykhailishen
Copy link
Contributor

@charanyarajagopalan, thank you for reporting this issue. We will investigate it and fix it in a future release.

@charanyarajagopalan
Author

Update: Switched to the USER_DISPATCH operational model instead of API_DISPATCH. This prevents message dispatch from being triggered while the item streams are being created, and the deadlock no longer occurs.
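
For anyone hitting the same issue, the workaround looks roughly like this (a minimal sketch; appClient, batchedRequests, and the running flag are placeholders from our setup):

// create the consumer in USER_DISPATCH mode so callbacks only run from dispatch()
OmmConsumerConfig config = EmaFactory.createOmmConsumerConfig()
        .operationModel(OmmConsumerConfig.OperationModel.USER_DISPATCH);
OmmConsumer consumer = EmaFactory.createOmmConsumer(config);

// open all item streams first; no callbacks can fire on another thread while this loop runs
for (ReqMsg reqMsg : batchedRequests)
    consumer.registerClient(reqMsg, appClient);

// then pump events on this same thread
while (running)
    consumer.dispatch(1000);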

@L-Karchevska
Contributor

@charanyarajagopalan Thank you for the additional details!
The stack traces do indicate a potential deadlock. The USER_DISPATCH mode eliminates the issue because it makes request submission and callbacks execute on the same thread. However, we have so far not been able to reproduce the deadlock in our environment. Could you send a code example that reproduces the issue?

@charanyarajagopalan
Author

@L-Karchevska thanks, https://github.com/charanyarajagopalan/issue-218-repro
Credentials need to be set in application.yml to run it.

@wasin-waeosri

Hello

I can replicate the same kind of deadlock with EMA Java 3.6.7 L2 by modifying the Consumer ex100_MP_Streaming example to subscribe to 10K invalid RICs as follows:

for (int index = 0; index <= 10000; index++) {
    System.out.println("Subscribing TEST_" + index + ".RIC");
    consumer.registerClient(reqMsg.serviceName("ELEKTRON_DD").name("TEST_" + index + ".RIC"), appClient);
}
System.out.println("Done");

Result: The example prints one "The record could not be found" status and then stops. The "Done" message is never printed. The stack trace shows a pattern similar to the client's above:

Found one Java-level deadlock:
=============================
"main":
  waiting for ownable synchronizer 0x00000006044dda40, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "pool-2-thread-1"
"pool-2-thread-1":
  waiting for ownable synchronizer 0x0000000604b0d398, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "main"

Java stack information for the threads listed above:
===================================================
"main":
	at jdk.internal.misc.Unsafe.park(java.base@11/Native Method)
	- parking to wait for  <0x00000006044dda40> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(java.base@11/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11/AbstractQueuedSynchronizer.java:885)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11/AbstractQueuedSynchronizer.java:917)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11/AbstractQueuedSynchronizer.java:1240)
	at java.util.concurrent.locks.ReentrantLock.lock(java.base@11/ReentrantLock.java:267)
	at com.refinitiv.eta.valueadd.reactor.ReactorChannel.submit(ReactorChannel.java:767)
	at com.refinitiv.ema.access.SingleItem.rsslSubmit(ItemCallbackClient.java:3009)
	at com.refinitiv.ema.access.SingleItem.open(ItemCallbackClient.java:2869)
	at com.refinitiv.ema.access.ItemCallbackClient.registerClient(ItemCallbackClient.java:2192)
	at com.refinitiv.ema.access.OmmBaseImpl.registerClient(OmmBaseImpl.java:532)
	at com.refinitiv.ema.access.OmmConsumerImpl.registerClient(OmmConsumerImpl.java:255)
	at com.refinitiv.ema.examples.training.consumer.series100.ex100_MP_Streaming.Consumer.main(Consumer.java:65)
"pool-2-thread-1":
	at jdk.internal.misc.Unsafe.park(java.base@11/Native Method)
	- parking to wait for  <0x0000000604b0d398> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(java.base@11/LockSupport.java:194)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base@11/AbstractQueuedSynchronizer.java:885)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base@11/AbstractQueuedSynchronizer.java:917)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@11/AbstractQueuedSynchronizer.java:1240)
	at java.util.concurrent.locks.ReentrantLock.lock(java.base@11/ReentrantLock.java:267)
	at com.refinitiv.ema.access.ItemCallbackClient.removeFromMap(ItemCallbackClient.java:2477)
	at com.refinitiv.ema.access.SingleItem.remove(ItemCallbackClient.java:2932)
	at com.refinitiv.ema.access.ItemCallbackClient.processStatusMsg(ItemCallbackClient.java:1893)
	at com.refinitiv.ema.access.ItemCallbackClient.defaultMsgCallback(ItemCallbackClient.java:1630)
	at com.refinitiv.eta.valueadd.reactor.Reactor.sendDefaultMsgCallback(Reactor.java:2787)
	at com.refinitiv.eta.valueadd.reactor.Reactor.sendAndHandleDefaultMsgCallback(Reactor.java:2896)
	at com.refinitiv.eta.valueadd.reactor.WlItemHandler.callbackUser(WlItemHandler.java:3029)
	at com.refinitiv.eta.valueadd.reactor.WlItemHandler.readMsg(WlItemHandler.java:2002)
	at com.refinitiv.eta.valueadd.reactor.Watchlist.readMsg(Watchlist.java:302)
	at com.refinitiv.eta.valueadd.reactor.Reactor.processRwfMessage(Reactor.java:4662)
	at com.refinitiv.eta.valueadd.reactor.Reactor.performChannelRead(Reactor.java:5001)
	at com.refinitiv.eta.valueadd.reactor.Reactor.dispatchAll(Reactor.java:7674)
	at com.refinitiv.ema.access.OmmBaseImpl.rsslReactorDispatchLoop(OmmBaseImpl.java:1759)
	at com.refinitiv.ema.access.OmmBaseImpl.run(OmmBaseImpl.java:1889)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11/Thread.java:834)

Found 1 deadlock.

I did a quick test with EMA Java 3.6.0 L1 and the same code. It works fine: the example shows 10K "The record could not be found" status messages and then the "Done" message.
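
For illustration only, the two traces boil down to the classic two-lock ordering problem: the registering thread takes one lock and then waits for the other, while the dispatch thread has taken the same two locks in the opposite order. A standalone sketch of that pattern in plain Java (stand-in lock names, not RTSDK code):

import java.util.concurrent.locks.ReentrantLock;

public class TwoLockDeadlock {
    // Stand-ins only: "registerLock" plays the role of the lock held while registering an item,
    // "dispatchLock" the one held by the reactor dispatch loop.
    static final ReentrantLock registerLock = new ReentrantLock();
    static final ReentrantLock dispatchLock = new ReentrantLock();

    public static void main(String[] args) {
        new Thread(() -> lockBoth(registerLock, dispatchLock), "main-like").start();
        new Thread(() -> lockBoth(dispatchLock, registerLock), "dispatch-like").start();
    }

    // Each thread takes its first lock, pauses, then blocks forever waiting for the other thread's lock.
    static void lockBoth(ReentrantLock first, ReentrantLock second) {
        first.lock();
        try {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            second.lock();
            second.unlock();
        } finally {
            first.unlock();
        }
    }
}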

@vlevendel
Contributor

@charanyarajagopalan This is addressed with tag Real-Time-SDK-2.0.8.L1. Please let us know if there are further concerns.
