RtpsSendQueue is not scalable #3794
Conversation
Problem

The RtpsSendQueue maintains maps for heartbeats and acknacks that must be merged. These maps can become very large, leading to performance problems.

Solution

Do a final deduplication after all submessages have been enqueued.
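The shape of that solution can be illustrated with a minimal sketch in standard C++. The names here (`Submessage`, `enqueue`, `deduplicate`) are hypothetical stand-ins, not the actual OpenDDS API: enqueueing becomes a cheap `push_back`, and a single map is built once at flush time instead of being maintained and merged on every insert.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an RTPS submessage keyed by (writer, reader).
struct Submessage {
  std::uint64_t writer;
  std::uint64_t reader;
  std::uint32_t count; // e.g. a heartbeat count; the latest one wins
};

// Enqueue is a cheap push_back; duplicates are tolerated until flush.
void enqueue(std::vector<Submessage>& queue, const Submessage& sm)
{
  queue.push_back(sm);
}

// Final deduplication: one map-build at flush time. Later entries for the
// same (writer, reader) key overwrite earlier ones.
std::vector<Submessage> deduplicate(const std::vector<Submessage>& queue)
{
  std::map<std::pair<std::uint64_t, std::uint64_t>, Submessage> latest;
  for (const Submessage& sm : queue) {
    latest[{sm.writer, sm.reader}] = sm;
  }
  std::vector<Submessage> out;
  out.reserve(latest.size());
  for (const auto& kv : latest) {
    out.push_back(kv.second);
  }
  return out;
}
```

This trades a small amount of extra memory (duplicates live in the vector until flush) for O(1) enqueues and a single O(n log n) pass, rather than paying map-lookup and merge costs on every submessage.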
Force-pushed from 8015cf5 to a62dfac
```cpp
ACE_Guard<ACE_Thread_Mutex> guard(mutex_);
while (active_transaction_count_ != 0) {
  condition_variable_.wait(thread_status_manager_);
}
```
This worries me, since it means that the writing thread (in this case, the transport's single EventDispatcher thread) will block, potentially preventing it from running other scheduled events. The wait will probably be small (I think the transport's reactor thread is the only thing using transactions), but this may have unintended side effects when the transport thread is particularly busy (especially within a single transaction) or as threading design changes within the transport framework.
That's fair. I'll change it so the periodic timer will mark the queue ready for harvest and the last transaction to finish will actually drain the queue.
I think this puts a large number of writes (probably the majority?) back onto the transport thread, which is also not great. Perhaps if the send queue is busy (mid-transaction), the datalink just sets a flag that the next scheduling of flush_send_queue should be immediate (have zero delay), and then a successful flush can clear that flag. That should keep the writes on the EventDispatcher thread.
I updated it so that the event thread attempts to harvest the send queue. If successful, it will send. If it is not successful, then other transactions are still in progress. In this case, the other threads will harvest the queue and a flush_send_queue will be scheduled. I'm using a SporadicEvent here, but the delay is always zero. So let me know if there is a more correct way of just scheduling an event for immediate execution.
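The scheme described above can be sketched in standard C++. All names here are hypothetical (`try_harvest`, `harvest_pending_`, the use of `int` as a submessage stand-in); the real code uses OpenDDS types and an ACE mutex. The key property is that the event thread never blocks: if a transaction is active it records a pending flush, and the thread that ends the last transaction harvests instead and asks for an immediate re-run of the flush event.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Minimal non-blocking sketch of the "try to harvest; otherwise the last
// transaction harvests and reschedules the flush" design.
class TransactionalSendQueue {
public:
  void begin_transaction()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    ++active_transactions_;
  }

  void enqueue(int submessage)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    queue_.push_back(submessage);
  }

  // Event thread: harvest only if no transaction is in progress.
  // On failure, remember that a flush was requested; never block.
  bool try_harvest(std::vector<int>& out)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (active_transactions_ != 0) {
      harvest_pending_ = true;
      return false;
    }
    out.swap(queue_);
    return true;
  }

  // Returns true if the caller ended the last transaction while a harvest
  // was pending; the caller then sends and schedules an immediate
  // flush_send_queue (e.g. a SporadicEvent with zero delay).
  bool end_transaction(std::vector<int>& out)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    --active_transactions_;
    if (active_transactions_ == 0 && harvest_pending_) {
      harvest_pending_ = false;
      out.swap(queue_);
      return true;
    }
    return false;
  }

private:
  std::mutex mutex_;
  std::size_t active_transactions_ = 0;
  bool harvest_pending_ = false;
  std::vector<int> queue_; // int stands in for a queued submessage
};
```

Compared with the earlier condition-variable wait, no thread ever sleeps here: contention is resolved by handing the harvest to whichever thread finishes last.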
Not sure if you saw this TSAN issue from the last run, since the actions were semi-hidden for some reason: https://github.com/objectcomputing/OpenDDS/actions/runs/3242904729/jobs/5321460350 ... looks like something is maybe up with the `fsq_vec_` indexing (maybe `fsq_vec_size_` is out-of-sync with `fsq_vec_`?).
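The hazard being suggested here (a size counter cached separately from the container it describes) can be illustrated with a deliberately buggy sketch. The names mirror the report but the code is hypothetical, and the race TSAN would flag is shown single-threaded for clarity: any path that updates one of the pair without the other (or without the same synchronization) lets indexing up to the cached size read past the end of the vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration of a cached size falling out of sync with
// the vector it is supposed to describe.
struct FlushSendQueueVec {
  std::vector<int> fsq_vec;     // hypothetical names echoing the report
  std::size_t fsq_vec_size = 0; // cached size; must track fsq_vec.size()

  void push(int v)
  {
    fsq_vec.push_back(v);
    ++fsq_vec_size; // omitting this, or updating it unsynchronized, desyncs
  }

  void clear_buggy()
  {
    fsq_vec.clear();
    // BUG: fsq_vec_size is not reset, so later indexing that trusts
    // fsq_vec_size reads past the end of fsq_vec.
  }

  bool consistent() const { return fsq_vec_size == fsq_vec.size(); }
};
```

In multithreaded code the same desync appears transiently whenever the two updates are not covered by one lock, which is exactly the kind of window a TSAN report points at.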
They were. See dbd54c2