Does UCX support TCP multi-rail? #9763
I checked the related code. It seems I must construct the data so that both am_zcopy_first and am_zcopy_middle are triggered in order to drive more than one TCP iface. So I added the following parameters to ucx_perftest and tried again:

From lsof -Pn -p, I can see that ucx_perftest builds two sockets to the remote node, but they use the same local and remote IP and port. Is this an issue?

ucx_perft 3264 root 15u IPv4 60939 0t0 TCP 192.168.200.2:37768->192.168.200.1:57119 (ESTABLISHED)

I would like to promote this discussion to an issue for more attention. Many thanks!

Version:
ucx_info -d
Run command:
Run log:
Profiling result output (run with a smaller value than -n 10000):
By examining the run log, it seems the following things happened (on one side of ucx_perftest; the other side is not listed) and eventually caused the problem:
I do not know how to solve this. If the above analysis is right, some current ideas are:
Confirmed that by adding the following code:
into here (Line 605 in dba7b0d), the problem seems to be solved: both NICs are now sending traffic to the peer. P.S. The traffic rate is 1.8 to 1.4.
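One way to double-check from the socket table that each NIC really owns a connection is to group the established connections by local IP. The following is only a minimal sketch, assuming IPv4 addresses and the lsof row format quoted earlier in this thread; the function names are hypothetical and are not part of UCX or lsof.

```python
import re
from collections import defaultdict

# NAME column of an lsof TCP row (IPv4 only in this sketch), e.g.
# "TCP 192.168.200.2:37768->192.168.200.1:57119 (ESTABLISHED)"
CONN_RE = re.compile(r"TCP\s+([\d.]+):(\d+)->([\d.]+):(\d+)\s+\((\w+)\)")


def established_by_local_ip(lsof_text):
    """Map each local IP to (local_port, remote_ip, remote_port) tuples
    of its ESTABLISHED TCP connections."""
    by_ip = defaultdict(list)
    for line in lsof_text.splitlines():
        match = CONN_RE.search(line)
        # Skip rows that are not TCP connections or not yet established.
        if match is None or match.group(5) != "ESTABLISHED":
            continue
        local_ip, local_port, remote_ip, remote_port, _ = match.groups()
        by_ip[local_ip].append((int(local_port), remote_ip, int(remote_port)))
    return dict(by_ip)


def rails_in_use(lsof_text, expected_local_ips):
    """Return the expected local IPs that own at least one connection."""
    by_ip = established_by_local_ip(lsof_text)
    return [ip for ip in expected_local_ips if ip in by_ip]


if __name__ == "__main__":
    sample = ("ucx_perft 3264 root 15u IPv4 60939 0t0 TCP "
              "192.168.200.2:37768->192.168.200.1:57119 (ESTABLISHED)")
    print(established_by_local_ip(sample))
```

Feeding it the output of lsof -Pn -p for the ucx_perftest process, together with the local IPs of both NICs, shows which rails have a connection. It does not measure how much traffic each one carries.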
This trick needs arp_filter=0 or 2, so it is not a solution for those who really need arp_filter=1. So does this issue count as a bug?
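Before relying on the trick, the per-interface setting can be read from the Linux sysctl tree. This is a rough sketch, assuming the standard /proc/sys/net/ipv4/conf/<iface>/arp_filter path; the helper names are hypothetical, and the set of acceptable values (0 or 2) is taken from the comment above rather than verified independently.

```python
from pathlib import Path

# Standard Linux location of per-interface IPv4 sysctls.
DEFAULT_CONF_ROOT = "/proc/sys/net/ipv4/conf"


def read_arp_filter(iface, conf_root=DEFAULT_CONF_ROOT):
    """Return the integer arp_filter value configured for one interface."""
    path = Path(conf_root) / iface / "arp_filter"
    return int(path.read_text().strip())


def trick_applicable(iface, allowed=(0, 2), conf_root=DEFAULT_CONF_ROOT):
    """True if the interface's arp_filter is one of the values the comment
    above says the workaround needs (assumption: 0 or 2)."""
    return read_arp_filter(iface, conf_root) in allowed
```

For example, calling trick_applicable on each of the two TCP devices (whatever their interface names are on the test machine) indicates whether the workaround would be expected to apply there.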
Discussed in #9753
Originally posted by huzhijiang March 17, 2024
Here is the TCP multi-rail test result. It seems UCX always chooses one rail to send messages, not both. Performance also drops a bit when trying to use two TCP devices: