Skip to content

Conversation

@dragonJACson
Copy link
Contributor

@dragonJACson dragonJACson commented Oct 3, 2024

RDMA C functions basically use return value (when it could be used) for returning an errno, or when sometimes the function should return a pointer but return a NULL, user should read errno themselves.

Most of the time, the errno actually suggests an error from kernel, users really need a lot of experience on RDMA to understand what's going wrong by just read a simple 8bit error number.

RDMAmojo and rdma-core man pages provides some hints on errnos, while that need users to search and grab. Rust provides a powerful error handling system, why not just integrate the hints with errno together, plus some context to build our error types? I provide a trial implementation in this commit.

Sometimes the complicated error types introduce performance penalty, so I make it optional.

Related issue: #3

@codecov
Copy link

codecov bot commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 7.35294% with 63 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/verbs/queue_pair.rs 7.35% 63 Missing ⚠️
Files with missing lines Coverage Δ
src/verbs/queue_pair.rs 71.49% <7.35%> (-6.31%) ⬇️

@dragonJACson dragonJACson requested a review from FujiZ October 3, 2024 13:53
@dragonJACson
Copy link
Contributor Author

Some examples:

$ ./target/debug/examples/rc_pingpong -d mlx5_0 -n 10 -s 2400 -m 256 -g 1 127.0.0.1
 local address: QPN 0x0326, PSN 0x8088ba, GID fe80:0000:0000:0000:e46d:faff:fefa:6bcb
remote address: QPN 0x0325, PSN 0xf49e1a, GID fe80:0000:0000:0000:e46d:faff:fefa:6bcb
Error: network unreachable, source gid index: 1, destination gid: fe80:0000:0000:0000:e46d:faff:fefa:6bcb

Caused by:
    No route to host (os error 113)
$ ./target/debug/examples/rc_pingpong -n 10 -s 2400 -m 256 -d mlx5_0 127.0.0.1 -g 1
Error: invalid transition from Reset to Init, possible invalid masks QueuePairAttributeMask[], possible needed masks QueuePairAttributeMask[AccessFlags, Port]

Caused by:
    Invalid argument (os error 22)

cc @FujiZ

@dragonJACson dragonJACson marked this pull request as draft October 3, 2024 13:55
@dragonJACson dragonJACson self-assigned this Oct 3, 2024
@dragonJACson dragonJACson force-pushed the dev/rc_pingpong branch 2 times, most recently from 82b75ca to 7363f1f Compare October 6, 2024 03:34
@dragonJACson dragonJACson force-pushed the dev/rc_pingpong branch 7 times, most recently from 214d994 to 23df7d8 Compare October 12, 2024 15:23
Base automatically changed from dev/rc_pingpong to main October 13, 2024 09:37
@dragonJACson dragonJACson force-pushed the dev/error-handling branch 2 times, most recently from 55beae3 to cbdd83e Compare October 13, 2024 16:29
@dragonJACson dragonJACson marked this pull request as ready for review October 22, 2024 14:34
@dragonJACson dragonJACson force-pushed the dev/error-handling branch 2 times, most recently from 7c9c8da to 1d93c3e Compare October 22, 2024 16:34
RDMA C functions basically use return value (when it could be used)
for returning an errno, or when sometimes the function should return
a pointer but return a NULL, user should read errno themselves.

Most of the time, the errno actually suggests an error from kernel,
users really need a lot of experience on RDMA to understand what's
going wrong by just read a simple 8bit error number.

RDMAmojo and rdma-core man pages provides some hints on errnos, while
that need users to search and grab. Rust provides a powerful error
handling system, why not just integrate the hints with errno together,
plus some context to build our error types?

I provide a trial implementation in this commit. Sometimes the complicated
error types introduce performance penalty, so I make it optional.

Signed-off-by: Luke Yue <lukedyue@gmail.com>
@dragonJACson dragonJACson merged commit f90a5ba into main Oct 23, 2024
@dragonJACson dragonJACson deleted the dev/error-handling branch October 23, 2024 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants