
RtpsSendQueue is not scalable #3794

Merged: 5 commits merged into OpenDDS:master on Oct 13, 2022

Conversation

@jrw972 (Contributor) commented on Oct 11, 2022

Problem

The RtpsSendQueue maintains maps for heartbeats and acknacks that must be merged. These maps can become very large, leading to performance problems.

Solution

Do a final deduplication after all submessages have been enqueued.
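
As a rough illustration of this approach (not the actual OpenDDS code; the Submessage fields and the deduplicate helper below are assumptions), submessages can be appended without any per-insert merging and then collapsed to one entry per (writer, reader) key just before the queue is drained:

```cpp
// Minimal sketch of "enqueue everything, deduplicate once at the end".
// All names here are illustrative placeholders, not OpenDDS internals.
#include <map>
#include <utility>
#include <vector>

struct Submessage {
  int writer_id;   // hypothetical key fields
  int reader_id;
  long sequence;   // a later entry supersedes earlier ones for the same key
};

// Appended to by enqueue() with no per-insert merging.
std::vector<Submessage> pending;

// Called once, when the queue is about to be sent.
std::vector<Submessage> deduplicate()
{
  // Keep only the most recent submessage for each (writer, reader) pair.
  std::map<std::pair<int, int>, Submessage> latest;
  for (std::vector<Submessage>::const_iterator it = pending.begin();
       it != pending.end(); ++it) {
    latest[std::make_pair(it->writer_id, it->reader_id)] = *it;
  }
  std::vector<Submessage> result;
  for (std::map<std::pair<int, int>, Submessage>::const_iterator it = latest.begin();
       it != latest.end(); ++it) {
    result.push_back(it->second);
  }
  pending.clear();
  return result;
}
```

The per-enqueue cost drops to an append; the map is built once per flush instead of being maintained across every enqueue.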

@jrw972 self-assigned this on Oct 11, 2022
Problem
-------

The RtpsSendQueue maintains maps for heartbeats and acknacks that must
be merged.  These maps can become very large, leading to performance
problems.

Solution
--------

Do a final deduplication after all submessages have been enqueued.

// From the PR diff: block until all in-progress transactions finish.
ACE_Guard<ACE_Thread_Mutex> guard(mutex_);
while (active_transaction_count_ != 0) {
  condition_variable_.wait(thread_status_manager_);
}
Contributor

This worries me, since it means that the writing thread (in this case, the transport's single EventDispatcher thread) will block, potentially preventing it from running other scheduled events. The wait will probably be short (I think the transport's reactor thread is the only thing using transactions), but this may have unintended side effects when the transport thread is particularly busy (especially within a single transaction) or as the threading design changes within the transport framework.

Contributor Author (jrw972)

That's fair. I'll change it so the periodic timer will mark the queue ready for harvest and the last transaction to finish will actually drain the queue.
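
A minimal sketch of that idea, using std::mutex in place of the ACE primitives and invented member names (ready_to_harvest_, drain_and_send), so it should not be read as the actual implementation:

```cpp
#include <mutex>

class SendQueueSketch {
public:
  void begin_transaction()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    ++active_transaction_count_;
  }

  // Periodic timer: drain if idle, otherwise just mark the queue ready.
  void periodic_timer_fired()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (active_transaction_count_ == 0) {
      drain_and_send();          // nothing in flight: drain on the timer thread
    } else {
      ready_to_harvest_ = true;  // defer draining to the last transaction
    }
  }

  // The last transaction to finish actually drains the queue.
  void end_transaction()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (--active_transaction_count_ == 0 && ready_to_harvest_) {
      ready_to_harvest_ = false;
      drain_and_send();
    }
  }

private:
  void drain_and_send() {}       // stand-in for deduplicate + send

  std::mutex mutex_;
  int active_transaction_count_ = 0;
  bool ready_to_harvest_ = false;
};
```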

Contributor

I think this puts a large number of writes (probably the majority?) back onto the transport thread, which is also not great. Perhaps if the send queue is busy (mid-transaction), the DataLink could just set a flag that the next scheduling of flush_send_queue should be immediate (have zero delay), and then a successful flush can clear that flag. That should keep the writes on the EventDispatcher thread.
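
A sketch of this suggestion under the same caveats (std::mutex instead of the ACE types, invented names such as flush_immediately_ and schedule_flush_send_queue):

```cpp
#include <mutex>

class DataLinkSketch {
public:
  // Called when a write arrives while the send queue is mid-transaction.
  void note_busy_write()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    flush_immediately_ = true;  // next flush_send_queue scheduling gets zero delay
  }

  // Decides the delay for the next flush_send_queue on the EventDispatcher.
  void schedule_flush()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    schedule_flush_send_queue(flush_immediately_ ? 0 : normal_delay_ms_);
  }

  // Runs on the EventDispatcher thread.
  void flush_send_queue()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (drain_and_send()) {
      flush_immediately_ = false;  // a successful flush clears the flag
    }
  }

private:
  void schedule_flush_send_queue(int /*delay_ms*/) {}  // scheduling stand-in
  bool drain_and_send() { return true; }               // dedup + send stand-in

  std::mutex mutex_;
  bool flush_immediately_ = false;
  int normal_delay_ms_ = 10;  // arbitrary placeholder interval
};
```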

Contributor Author (jrw972)

I updated it so that the event thread attempts to harvest the send queue. If successful, it will send. If it is not successful, then other transactions are still in progress. In this case, the other threads will harvest the queue and a flush_send_queue will be scheduled. I'm using a SporadicEvent here, but the delay is always zero. So let me know if there is a more correct way of just scheduling an event for immediate execution.
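
For reference, a sketch of the approach described in this comment, again with std::mutex and invented names (harvest_pending_, schedule_immediate_flush) rather than the real members; in the PR itself the rescheduling is done with a SporadicEvent whose delay is zero:

```cpp
#include <mutex>
#include <vector>

class SendQueueHarvestSketch {
public:
  // Runs on the EventDispatcher thread.
  void flush_send_queue()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (active_transaction_count_ == 0) {
      send(harvest());          // harvest succeeded: send from this thread
    } else {
      harvest_pending_ = true;  // another transaction is still in progress
    }
  }

  // Called by whichever thread ends a transaction.
  void end_transaction()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (--active_transaction_count_ == 0 && harvest_pending_) {
      harvest_pending_ = false;
      pending_send_ = harvest();   // this thread harvests the queue
      schedule_immediate_flush();  // zero-delay event does the actual send
    }
  }

private:
  std::vector<int> harvest() { return std::vector<int>(); }  // dedup + drain stand-in
  void send(const std::vector<int>&) {}
  void schedule_immediate_flush() {}

  std::mutex mutex_;
  int active_transaction_count_ = 0;
  bool harvest_pending_ = false;
  std::vector<int> pending_send_;
};
```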

Contributor

Not sure if you saw this TSAN issue from the last run since the actions were semi-hidden for some reason: https://github.com/objectcomputing/OpenDDS/actions/runs/3242904729/jobs/5321460350 ... looks like something is maybe up with the fsq_vec_ indexing (maybe fsq_vec_size_ is out-of-sync with fsq_vec_?).

Contributor Author (jrw972)

They were. See dbd54c2

@mitza-oci merged commit b8c2690 into OpenDDS:master on Oct 13, 2022