
RtpsSendQueue is not scalable #3794

Merged: 5 commits merged into OpenDDS:master on Oct 13, 2022

Conversation

@jrw972 (Contributor) commented on Oct 11, 2022

Problem

The RtpsSendQueue maintains maps for heartbeats and acknacks that must be merged. These maps can become very large, leading to performance problems.

Solution

Do a final deduplication after all submessages have been enqueued.
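
As a rough illustration of this approach (not the actual OpenDDS code; the Submessage fields and the deduplicate helper below are assumptions), submessages can be appended without any per-insert merging and then collapsed to one entry per (writer, reader) key just before the queue is drained:

```cpp
// Minimal sketch of "enqueue everything, deduplicate once at the end".
// All names here are illustrative placeholders, not OpenDDS internals.
#include <map>
#include <utility>
#include <vector>

struct Submessage {
  int writer_id;   // hypothetical key fields
  int reader_id;
  long sequence;   // a later entry supersedes earlier ones for the same key
};

// Appended to by enqueue() with no per-insert merging.
std::vector<Submessage> pending;

// Called once, when the queue is about to be sent.
std::vector<Submessage> deduplicate()
{
  // Keep only the most recent submessage for each (writer, reader) pair.
  std::map<std::pair<int, int>, Submessage> latest;
  for (std::vector<Submessage>::const_iterator it = pending.begin();
       it != pending.end(); ++it) {
    latest[std::make_pair(it->writer_id, it->reader_id)] = *it;
  }
  std::vector<Submessage> result;
  for (std::map<std::pair<int, int>, Submessage>::const_iterator it = latest.begin();
       it != latest.end(); ++it) {
    result.push_back(it->second);
  }
  pending.clear();
  return result;
}
```

The per-enqueue cost drops to an append; the map is built once per flush instead of being maintained across every enqueue.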

@jrw972 self-assigned this on Oct 11, 2022
Problem
-------

The RtpsSendQueue maintains maps for heartbeats and acknacks that must
be merged.  These maps can become very large, leading to performance
problems.

Solution
--------

Do a final deduplication after all submessages have been enqueued.

// From the PR diff: block until all in-progress transactions finish.
ACE_Guard<ACE_Thread_Mutex> guard(mutex_);
while (active_transaction_count_ != 0) {
  condition_variable_.wait(thread_status_manager_);
}
Contributor

This worries me, since it means that the writing thread (in this case, the transport's single EventDispatcher thread) will block, potentially preventing it from running other scheduled events. The wait will probably be short (I think the transport's reactor thread is the only thing using transactions), but this may have unintended side effects when the transport thread is particularly busy (especially within a single transaction) or as the threading design changes within the transport framework.

Contributor Author (jrw972)

That's fair. I'll change it so the periodic timer will mark the queue ready for harvest and the last transaction to finish will actually drain the queue.
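
A minimal sketch of that idea, using std::mutex in place of the ACE primitives and invented member names (ready_to_harvest_, drain_and_send), so it should not be read as the actual implementation:

```cpp
#include <mutex>

class SendQueueSketch {
public:
  void begin_transaction()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    ++active_transaction_count_;
  }

  // Periodic timer: drain if idle, otherwise just mark the queue ready.
  void periodic_timer_fired()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (active_transaction_count_ == 0) {
      drain_and_send();          // nothing in flight: drain on the timer thread
    } else {
      ready_to_harvest_ = true;  // defer draining to the last transaction
    }
  }

  // The last transaction to finish actually drains the queue.
  void end_transaction()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (--active_transaction_count_ == 0 && ready_to_harvest_) {
      ready_to_harvest_ = false;
      drain_and_send();
    }
  }

private:
  void drain_and_send() {}       // stand-in for deduplicate + send

  std::mutex mutex_;
  int active_transaction_count_ = 0;
  bool ready_to_harvest_ = false;
};
```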

Contributor

I think this puts a large number of writes (probably the majority?) back onto the transport thread, which is also not great. Perhaps if the send queue is busy (mid-transaction), the DataLink could just set a flag that the next scheduling of flush_send_queue should be immediate (have zero delay), and then a successful flush can clear that flag. That should keep the writes on the EventDispatcher thread.
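
A sketch of this suggestion under the same caveats (std::mutex instead of the ACE types, invented names such as flush_immediately_ and schedule_flush_send_queue):

```cpp
#include <mutex>

class DataLinkSketch {
public:
  // Called when a write arrives while the send queue is mid-transaction.
  void note_busy_write()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    flush_immediately_ = true;  // next flush_send_queue scheduling gets zero delay
  }

  // Decides the delay for the next flush_send_queue on the EventDispatcher.
  void schedule_flush()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    schedule_flush_send_queue(flush_immediately_ ? 0 : normal_delay_ms_);
  }

  // Runs on the EventDispatcher thread.
  void flush_send_queue()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (drain_and_send()) {
      flush_immediately_ = false;  // a successful flush clears the flag
    }
  }

private:
  void schedule_flush_send_queue(int /*delay_ms*/) {}  // scheduling stand-in
  bool drain_and_send() { return true; }               // dedup + send stand-in

  std::mutex mutex_;
  bool flush_immediately_ = false;
  int normal_delay_ms_ = 10;  // arbitrary placeholder interval
};
```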

Contributor Author (jrw972)

I updated it so that the event thread attempts to harvest the send queue. If successful, it will send. If it is not successful, then other transactions are still in progress. In this case, the other threads will harvest the queue and a flush_send_queue will be scheduled. I'm using a SporadicEvent here, but the delay is always zero. So let me know if there is a more correct way of just scheduling an event for immediate execution.
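
For reference, a sketch of the approach described in this comment, again with std::mutex and invented names (harvest_pending_, schedule_immediate_flush) rather than the real members; in the PR itself the rescheduling is done with a SporadicEvent whose delay is zero:

```cpp
#include <mutex>
#include <vector>

class SendQueueHarvestSketch {
public:
  // Runs on the EventDispatcher thread.
  void flush_send_queue()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (active_transaction_count_ == 0) {
      send(harvest());          // harvest succeeded: send from this thread
    } else {
      harvest_pending_ = true;  // another transaction is still in progress
    }
  }

  // Called by whichever thread ends a transaction.
  void end_transaction()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    if (--active_transaction_count_ == 0 && harvest_pending_) {
      harvest_pending_ = false;
      pending_send_ = harvest();   // this thread harvests the queue
      schedule_immediate_flush();  // zero-delay event does the actual send
    }
  }

private:
  std::vector<int> harvest() { return std::vector<int>(); }  // dedup + drain stand-in
  void send(const std::vector<int>&) {}
  void schedule_immediate_flush() {}

  std::mutex mutex_;
  int active_transaction_count_ = 0;
  bool harvest_pending_ = false;
  std::vector<int> pending_send_;
};
```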

Contributor

Not sure if you saw this TSAN issue from the last run since the actions were semi-hidden for some reason: https://github.com/objectcomputing/OpenDDS/actions/runs/3242904729/jobs/5321460350 ... looks like something is maybe up with the fsq_vec_ indexing (maybe fsq_vec_size_ is out-of-sync with fsq_vec_?).

Contributor Author (jrw972)

They were. See dbd54c2

@mitza-oci merged commit b8c2690 into OpenDDS:master on Oct 13, 2022