Deadlock appears to be caused by write data and discovery entity threads using TCP transport on the writer side. #4203
Comments
Hi @chunyisong, thanks for the report. Could you please check if the issue persists with the latest release?
Hi @Mario-DL, I tested test-dds with Fast DDS v2.13.1. Unfortunately, the deadlock reappeared! However, with this version it is more difficult to trigger: just starting more subscribers (200 readers per sub) and one publisher (200 writers), without killing writers, did not reproduce the issue (after about 30 trials of the simple test, maybe lucky), although a different sequence of steps triggers the deadlock more reliably.
Additionally, some other issues showed up during these tests.
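For orientation only (this is not the test-dds source referenced below, nor the reproduction steps mentioned above), here is a generic Fast DDS 2.x sketch of what "200 writers on one publisher participant" looks like; the topic/type names and the count are placeholders, and type registration is omitted:

```cpp
// Generic sketch only (not the actual test-dds code): create `count` reliable
// DataWriters on a single participant, one topic each. The sample type must
// already be registered under "type_name" (registration omitted here).
#include <cstddef>
#include <string>
#include <vector>

#include <fastdds/dds/domain/DomainParticipant.hpp>
#include <fastdds/dds/publisher/DataWriter.hpp>
#include <fastdds/dds/publisher/Publisher.hpp>
#include <fastdds/dds/publisher/qos/DataWriterQos.hpp>
#include <fastdds/dds/topic/Topic.hpp>

using namespace eprosima::fastdds::dds;

std::vector<DataWriter*> create_writers(DomainParticipant* participant, std::size_t count)
{
    Publisher* publisher = participant->create_publisher(PUBLISHER_QOS_DEFAULT);

    DataWriterQos wqos = DATAWRITER_QOS_DEFAULT;
    wqos.reliability().kind = RELIABLE_RELIABILITY_QOS;

    std::vector<DataWriter*> writers;
    for (std::size_t i = 0; i < count; ++i)  // e.g. count == 200
    {
        Topic* topic = participant->create_topic(
                "topic_" + std::to_string(i), "type_name", TOPIC_QOS_DEFAULT);
        writers.push_back(publisher->create_datawriter(topic, wqos));
    }
    return writers;
}
```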
Hi @chunyisong, thanks for your report!
Hi @chunyisong, thanks for your patience. We've just released Fast DDS v2.14.0 with some TCP improvements and fixes (see the release notes). I think that the TCPSendResources cleanup may have fixed your issue. Could you check if it persists, please?
Sorry for the late reply. I will take some time to test it as soon as possible.
@JesusPoderoso Sorry for the late test. I tested Fast DDS today on master using the same method, and the problem still exists. Note: the XML uses Reliable QoS, no initial peers, and TCP transport.
The attached image shows two subs on host A (220.8), and a discovery server and two pubs on host B (0.202).
@JesusPoderoso Maybe PDP or EDP issues.
@JesusPoderoso Tested with Fast DDS 2.14.1 today, and the deadlock easily appeared.
Hi @chunyisong, thanks for the reproducer!
@JesusPoderoso According to the documentation of max_blocking_time, the writer should return with a timeout.
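For reference, a minimal sketch of the behavior this comment expects, assuming the standard Fast DDS 2.x DDS-layer API (the topic, sample, and 500 ms value are placeholders, not taken from test-dds): a reliable writer whose max_blocking_time bounds how long write() may block.

```cpp
// Minimal sketch (Fast DDS 2.x DDS API): a reliable writer with a bounded
// max_blocking_time. The topic and sample are placeholders.
#include <fastdds/dds/domain/DomainParticipant.hpp>
#include <fastdds/dds/publisher/DataWriter.hpp>
#include <fastdds/dds/publisher/Publisher.hpp>
#include <fastdds/dds/publisher/qos/DataWriterQos.hpp>
#include <fastdds/dds/topic/Topic.hpp>

using namespace eprosima::fastdds::dds;

void write_with_timeout(DomainParticipant* participant, Topic* topic, void* sample)
{
    Publisher* publisher = participant->create_publisher(PUBLISHER_QOS_DEFAULT);

    DataWriterQos wqos = DATAWRITER_QOS_DEFAULT;
    wqos.reliability().kind = RELIABLE_RELIABILITY_QOS;
    // Per the documentation, write() should block at most this long (500 ms here)
    // when it cannot make progress, and then return instead of hanging.
    wqos.reliability().max_blocking_time = eprosima::fastrtps::Duration_t(0, 500000000);

    DataWriter* writer = publisher->create_datawriter(topic, wqos);

    // In 2.x this overload returns false when the sample could not be written,
    // e.g. because the blocking time expired.
    if (!writer->write(sample))
    {
        // Handle or log the timeout here.
    }
}
```

In the reported scenario the call apparently never returns at all, which is why a deadlock rather than a plain timeout is suspected.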
Hi @wangzm-R, did you try v2.14.1? My test with v2.14.1 still cannot recover until the stuck reader/writer is killed (after that, the stuck discovery server recovers).
The publisher is on Linux, with one subscriber on Linux and another on Windows. When any subscriber device is powered down (killing the subscriber process is not applicable in that case), the publisher thread blocks until that subscriber restarts; about 1 min 30 s after the subscriber device is powered down, the publisher thread blocks in write().
Today I tested Fast DDS v2.14.2 with one publisher connected to two subscribers, using 2000 topics, and the deadlock almost always appeared. I also tested the built-in LARGE_DATA mode and a custom large-data configuration. No attempt showed any sign of improvement. Based on these tests, the problem is suspected to occur at the TCP EDP stage.
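For context, a minimal sketch of how the builtin LARGE_DATA mode is typically enabled, assuming "built-in LARGE_DATA mode" refers to the LARGE_DATA builtin transports option available since Fast DDS 2.12 (the domain id and function name are placeholders):

```cpp
// Minimal sketch (Fast DDS >= 2.12): enabling the builtin LARGE_DATA mode,
// which keeps UDP multicast for discovery and moves user data to TCP/SHM.
// The domain id and function name are placeholders.
#include <cstdint>

#include <fastdds/dds/domain/DomainParticipant.hpp>
#include <fastdds/dds/domain/DomainParticipantFactory.hpp>
#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
#include <fastdds/rtps/attributes/BuiltinTransports.hpp>

using namespace eprosima::fastdds::dds;

DomainParticipant* create_large_data_participant(uint32_t domain_id)
{
    // Option 1: select the LARGE_DATA builtin transports in code.
    DomainParticipantQos pqos = PARTICIPANT_QOS_DEFAULT;
    pqos.setup_transports(eprosima::fastdds::rtps::BuiltinTransports::LARGE_DATA);

    // Option 2 (alternative): export FASTDDS_BUILTIN_TRANSPORTS=LARGE_DATA in the
    // environment before launching the process, without changing the code.

    return DomainParticipantFactory::get_instance()->create_participant(domain_id, pqos);
}
```

The "custom large-data" variant mentioned above would presumably configure the TCP/UDP transport descriptors by hand instead of relying on this shortcut.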
Is there an already existing issue for this?
Expected behavior
Discovering new entities and writing data should not get stuck!
Current behavior
Deadlock appears to be caused by write data and discovery entity threads using TCP transport on the writer side.
I wrote a simple test program, test-dds, to reproduce this bug.
To reproduce it, open two different consoles:
In the first one, for the publisher: ./test-dds pub
Then edit DEFAULT_FASTRTPS_PROFILES.xml, changing the TCPv4 listen port to 0 or another port number (a programmatic equivalent of this TCP setup is sketched after these steps).
In the second one, for the subscriber: ./test-dds sub
Then the deadlock will most likely occur. If nothing gets stuck, restart the second console.
In the publisher console, the _currentMatchedPubs and _totalPubOkDatas log values do not change.
In the subscriber console, the _currentMatchedSubs and _totalSubValidDatas log values also do not change.
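As referenced in the steps above, here is a minimal sketch of configuring the TCPv4 transport with a listening port programmatically instead of through the XML profile, assuming the Fast DDS 2.x API (this is not the actual test-dds code; the port value, function name, and discovery details are placeholders):

```cpp
// Minimal sketch (Fast DDS 2.x): programmatic equivalent of the TCPv4
// <listening_ports> entry edited in DEFAULT_FASTRTPS_PROFILES.xml.
// The port value is illustrative (the reproduction above uses 0 or a
// different port on the subscriber side).
#include <cstdint>
#include <memory>

#include <fastdds/dds/domain/DomainParticipant.hpp>
#include <fastdds/dds/domain/DomainParticipantFactory.hpp>
#include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
#include <fastdds/rtps/transport/TCPv4TransportDescriptor.h>

using namespace eprosima::fastdds::dds;

DomainParticipant* create_tcp_participant(uint32_t domain_id, uint16_t listen_port)
{
    auto tcp = std::make_shared<eprosima::fastdds::rtps::TCPv4TransportDescriptor>();
    tcp->add_listener_port(listen_port);

    DomainParticipantQos pqos = PARTICIPANT_QOS_DEFAULT;
    pqos.transport().use_builtin_transports = false;  // typical for a TCP-only profile; depends on the actual XML
    pqos.transport().user_transports.push_back(tcp);

    return DomainParticipantFactory::get_instance()->create_participant(domain_id, pqos);
}
```

How peers are then discovered (initial peers, a discovery server, or keeping the builtin UDP transports) depends on the rest of the profile and is omitted here.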
Additionally, other tests produced the same deadlock.
Fast DDS version/commit
FastDDS v2.13.0/v2.13.1
Platform/Architecture
Ubuntu Focal 20.04 amd64
Transport layer
TCPv4
XML configuration file
DEFAULT_FASTRTPS_PROFILES.xml
Relevant log output
Network traffic capture
No response