This repository has been archived by the owner on May 16, 2024. It is now read-only.

Failed to Create QP #29

Open
zlwfrank opened this issue Mar 19, 2020 · 3 comments

Comments

@zlwfrank

I tried to deploy the RDMA device plugin in HCA mode in my Kubernetes cluster. I followed the instructions, and the device plugin registers successfully: "kubectl describe node [node_name]" lists the rdma/hca resource, and running "ibstat" inside the pods shows the InfiniBand information with the port state active/up.

However, when I ran a connection test with "ib_read_bw", it failed with the following error:

    Couldn't get device attribute.
    Unable to create QP.
    Failed to create QP.
    Couldn't create IB resource.

I ran the test by starting "ib_read_bw" in one pod and "ib_read_bw [target_pod_ip_addr]" in another pod. Could anyone please help with this issue? I appreciate your help.
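The two-pod procedure above boils down to one argument list per side. Here is a minimal Python sketch of those command lines. The pod IP 10.0.0.12 and the device name mlx5_0 are hypothetical placeholders. The optional "-d" flag, which is perftest's option for choosing an IB device by name, is an addition that was not used in the original report.

```python
from typing import List, Optional


def ib_read_bw_cmd(server_ip: Optional[str] = None,
                   device: Optional[str] = None) -> List[str]:
    """Build the argv for one side of the ib_read_bw bandwidth test.

    Without server_ip the command starts the listening (server) side;
    with server_ip it starts the client that connects to that pod.
    """
    cmd = ["ib_read_bw"]
    if device is not None:
        # -d picks a specific HCA instead of the first device found
        cmd += ["-d", device]
    if server_ip is not None:
        # The server address is the last, positional argument
        cmd.append(server_ip)
    return cmd


if __name__ == "__main__":
    # Hypothetical example values: substitute the real pod IP and device name
    print(ib_read_bw_cmd())                        # run in the server pod
    print(ib_read_bw_cmd("10.0.0.12", "mlx5_0"))   # run in the client pod
```

Each list can be passed to subprocess.run(cmd, check=True) inside the corresponding pod. Pinning the device explicitly helps rule out the test picking a different HCA than the one the plugin allocated.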

@paravmellanox
Collaborator

@zlwfrank
The container might not have the IPC_LOCK capability.

Refer to the example below and add the "IPC_LOCK" line at the appropriate place.

    spec:
      restartPolicy: OnFailure
      containers:
      - image: mellanox/mlnx_ofed_linux-4.4-1.0.0.0-centos7.4
        name: mofed-test-ctr
        securityContext:
          capabilities:
            add: [ "IPC_LOCK" ]
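IPC_LOCK matters because RDMA drivers pin the memory behind queues and registered buffers. The kernel checks that pinning against RLIMIT_MEMLOCK unless the process holds CAP_IPC_LOCK. The Python sketch below is a hypothetical diagnostic, not part of the device plugin. It checks from inside the pod whether CAP_IPC_LOCK actually ended up in the effective capability set, and it prints the memlock limits. It assumes Linux with /proc mounted and uses bit 14, the value of CAP_IPC_LOCK in linux/capability.h.

```python
import os
import resource
from typing import Tuple

# Bit number of CAP_IPC_LOCK as defined in linux/capability.h
CAP_IPC_LOCK = 14


def cap_eff_from_status(status_text: str) -> int:
    """Extract the effective capability mask (CapEff) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def has_capability(mask: int, cap: int) -> bool:
    """Return True if capability bit `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


def memlock_limits() -> Tuple[int, int]:
    """Soft and hard RLIMIT_MEMLOCK in bytes (resource.RLIM_INFINITY means unlimited)."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__" and os.path.exists("/proc/self/status"):
    with open("/proc/self/status") as f:
        mask = cap_eff_from_status(f.read())
    print("CAP_IPC_LOCK effective:", has_capability(mask, CAP_IPC_LOCK))
    print("RLIMIT_MEMLOCK (soft, hard):", memlock_limits())
```

If CAP_IPC_LOCK reports False even though the pod spec adds it, the capability is not reaching the container. If it reports True, the IPC_LOCK explanation can likely be ruled out.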

@zlwfrank
Author

@paravmellanox Thanks for the reply. I was already using the provided sample .yaml file, which adds the IPC_LOCK capability.

This is the file I used:

    apiVersion: v1
    kind: Pod
    metadata:
      name: ib-test-pod-1
    spec:
      restartPolicy: OnFailure
      containers:
      - image: mellanox/centos_7_4_mofed_4_2_1_2_0_0_60
        name: mofed-test-ctr
        securityContext:
          capabilities:
            add: [ "IPC_LOCK" ]
        resources:
          limits:
            rdma/hca: 1
        command:
        - sh
        - -c
        - |
          ls -l /dev/infiniband /sys/class/net
          sleep 1000000
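The sample pod only lists /dev/infiniband and /sys/class/net. libibverbs reaches the HCA through the /dev/infiniband/uverbsN character devices. Verbs applications such as ib_read_bw therefore need at least one of those devices to be present and openable read-write. Here is a minimal Python sketch of a more targeted check. It assumes the default /dev/infiniband path, and the helper names are hypothetical.

```python
import os
from typing import Dict, List


def uverbs_devices(dev_dir: str = "/dev/infiniband") -> List[str]:
    """Return the uverbs character devices found in dev_dir, sorted by name."""
    if not os.path.isdir(dev_dir):
        return []
    return sorted(n for n in os.listdir(dev_dir) if n.startswith("uverbs"))


def access_report(dev_dir: str = "/dev/infiniband") -> Dict[str, bool]:
    """Map each uverbs device to whether this process may open it read-write."""
    return {
        name: os.access(os.path.join(dev_dir, name), os.R_OK | os.W_OK)
        for name in uverbs_devices(dev_dir)
    }


if __name__ == "__main__":
    report = access_report()
    if not report:
        print("no uverbs devices visible; check the rdma/hca allocation")
    for name, ok in report.items():
        print(name, "read-write OK" if ok else "NOT accessible")
```

Running this in both pods alongside "ibstat" shows whether the allocated device nodes are usable, not just visible.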

@yh-xu

yh-xu commented Dec 22, 2020

@zlwfrank Have you resolved this problem? I see the same "Failed to create QP" symptom when running ib_read_bw inside a container, and I have no idea how to deal with it.
