fix(ibverbs): fix integration tests failing on InfiniBand link layer#106
Open
adamcavendish wants to merge 1 commit intoRDMA-Rust:mainfrom
Open
fix(ibverbs): fix integration tests failing on InfiniBand link layer#106adamcavendish wants to merge 1 commit intoRDMA-Rust:mainfrom
adamcavendish wants to merge 1 commit intoRDMA-Rust:mainfrom
Conversation
Contributor
There was a problem hiding this comment.
Code Review
This pull request introduces a lid() method to the PortAttr struct and updates test cases in queue_pair.rs, test_post_send.rs, and test_qp.rs to use dynamic LID values and include InfiniBand GIDs in the selection criteria. The review feedback suggests further refining the GID selection logic in these tests by explicitly filtering for the correct port number to ensure reliability on multi-port systems.
Codecov Report✅ All modified and coverable lines are covered by tests.
... and 2 files with indirect coverage changes 🚀 New features to boost your workflow:
|
The GID filter in tests rejected all InfiniBand GIDs because they are link-local and not RoceV1 type. Add GidType::InfiniBand to the filter and restrict selection to the correct port number so tests work on both IB and RoCE devices, including multi-port HCAs. Also add PortAttr::lid() accessor and use the queried port LID instead of hardcoding dest_lid=1, which is wrong for IB fabrics where LIDs are assigned by the subnet manager. Verified all tests pass on mlx5 ConnectX-6 InfiniBand HCAs.
352c427 to
4808c59
Compare
5 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
GidType::InfiniBand(previously only matched RoCE GIDs, causing all IB tests to panic onunwrap())PortAttr::lid()accessor and replace hardcodedsetup_dest_lid(1)with the actual port LID fromquery_port()test_qp,test_post_send,test_query_qp, andtest_post_recv_errorson InfiniBand fabricsRoot Cause
The GID filter used in all integration tests:
This filter accepts GIDs that are either:
On InfiniBand, all GIDs are
fe80::link-local with typeGidType::InfiniBand. Neither condition matches, sofind()returnsNoneandunwrap()panics.Additionally,
setup_dest_lid(1)was hardcoded, but on IB fabrics the subnet manager assigns LIDs dynamically (e.g.,port_lid: 3471on our test node).Failure Logs (before fix)
Tested on A800 node with 4x mlx5 ConnectX-6 InfiniBand HCAs (FW 20.39.3560):
Unit tests in
queue_pair.rsalso failed at the same filter:Fix
And replace
setup_dest_lid(1)with:After fix — all tests pass
Test plan
test_qppasses (QP state transitions Init→RTR→RTS)test_post_sendall 4 cases pass (loopback send/write/inline with both basic and extended QP)test_cqandtest_mr_and_cqstill passcompiletest(compile-fail ownership tests) still pass