How is this issue impacting you?
Application hang
Share Your Debug Logs
Hi NVSHMEM team,
I encountered a hang in the IBRC transport on our internal cluster and traced it back to a hardcoded pkey_index = 0 during QP creation. In environments where the default partition key is not at index 0, CQ polling fails silently and the application eventually hangs.
Steps to Reproduce the Issue
I'd like to suggest replacing the hardcoded value with a configurable environment variable NVSHMEM_IB_PKEY_INDEX (defaulting to 0 to preserve existing behavior). The changes are minimal and touch three sites:
Happy to share the patch or discuss further. Thanks!
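To make the proposal concrete, here is a minimal sketch of how the index could be read from the environment. The helper name read_pkey_index and its error convention are hypothetical and not part of NVSHMEM; only the variable name NVSHMEM_IB_PKEY_INDEX and the default of 0 come from the suggestion above. It is not the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not an NVSHMEM API): read a partition key index
 * from the environment variable `name`.
 *
 * Returns 0 and stores the index in *out on success. An unset or empty
 * variable yields 0, which preserves the current hardcoded behavior.
 * Returns -1 if the value is not a decimal number that fits in uint16_t.
 */
static int read_pkey_index(const char *name, uint16_t *out)
{
    assert(name != NULL && out != NULL);

    const char *val = getenv(name);
    if (val == NULL || *val == '\0') {
        *out = 0; /* default: same as the existing hardcoded value */
        return 0;
    }

    char *end = NULL;
    errno = 0;
    unsigned long parsed = strtoul(val, &end, 10);

    /* Reject conversion errors, trailing junk, and values too large for uint16_t. */
    if (errno != 0 || end == val || *end != '\0' || parsed > UINT16_MAX) {
        return -1;
    }

    *out = (uint16_t)parsed;
    return 0;
}
```

In libibverbs terms, the result would be assigned to the pkey_index field of struct ibv_qp_attr before the ibv_modify_qp call whose attribute mask includes IBV_QP_PKEY_INDEX, in place of the literal 0. Returning an error for malformed input, rather than silently falling back to 0, is a design choice of this sketch: a mistyped index would otherwise reproduce the same silent hang.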
NVSHMEM Version
v3.6.5
Your platform details
No response
Error Message & Behavior
No response