If I use domestic (Chinese-made) accelerator cards, do I still need to install nvidia-container-toolkit? #325

Open
janetat opened this issue May 28, 2024 · 4 comments
Comments

janetat commented May 28, 2024

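Question 1: If I only have domestic cards such as Ascend, do I still need to perform the preparing-your-gpu-nodes steps?
Question 2: If my cluster is heterogeneous, with node A carrying NVIDIA cards and node B carrying Ascend cards, do all nodes need nvidia-container-toolkit installed?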

Contributor

CoderTH commented May 28, 2024

Question 1: If you are using Ascend NPUs, refer to this document: https://github.com/Project-HAMi/HAMi/blob/master/docs/ascend910b-support_cn.md
Question 2: Only NVIDIA GPU nodes need to install nvidia-container-toolkit. Ascend nodes need to install ascend-docker-runtime.
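
For a heterogeneous cluster, the per-node rule above can be written down as a small lookup. This is only a sketch: the vendor keys (`nvidia`, `ascend`) and node names are hypothetical placeholders, not labels that HAMi itself defines.

```python
# Hypothetical mapping: accelerator vendor -> runtime component named in this thread.
# The vendor keys are illustrative and are not HAMi node labels.
RUNTIME_BY_VENDOR = {
    "nvidia": "nvidia-container-toolkit",
    "ascend": "ascend-docker-runtime",
}


def runtimes_to_install(nodes):
    """Return {node_name: runtime component} based on each node's accelerator vendor."""
    return {name: RUNTIME_BY_VENDOR[vendor] for name, vendor in nodes.items()}


# Example cluster from the question: node A has NVIDIA cards, node B has Ascend cards.
cluster = {"node-a": "nvidia", "node-b": "ascend"}
plan = runtimes_to_install(cluster)
print(plan)
```

Each node gets only the runtime that matches its own hardware, so the Ascend node never needs nvidia-container-toolkit.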

Author

janetat commented May 28, 2024

Thanks! @CoderTH

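Question 3:
Is only one vendor's GPU allowed per node? Can a single node A have both NVIDIA GPUs and Ascend NPUs?

Question 4:
The Allocate interface of HAMi's NVIDIA device plugin returns information that the container runtime uses to mount device files and dynamic libraries. Why is nvidia-container-runtime still needed? Couldn't native runc also perform the mounts based on the Allocate response?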

Member

wawa0210 commented Jul 9, 2024

@janetat

Q3: It is recommended that a single node use the same type of GPU. Heterogeneous nodes are not recommended.

@hzliangbin

@janetat
Q4: In addition to mounting devices and library files, nvidia-container-runtime also manages additional GPU-related configuration (such as environment variables and CUDA paths), which runc does not provide.
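
To make the difference from plain runc more concrete, here is a simplified model of the idea, not the actual nvidia-container-runtime code. It assumes an OCI-style spec represented as a Python dict and shows a wrapper adding a prestart hook when the container's environment carries `NVIDIA_VISIBLE_DEVICES`. The function name `add_gpu_hook` and the hook path are illustrative assumptions; the real binary name and location depend on the installed toolkit version.

```python
import copy

# Illustrative path only; the real hook binary depends on the installed toolkit version.
HOOK_PATH = "/usr/bin/nvidia-container-runtime-hook"


def add_gpu_hook(spec):
    """Return a copy of an OCI-style spec with a prestart hook added
    when the container requests GPUs through NVIDIA_VISIBLE_DEVICES."""
    entries = spec.get("process", {}).get("env", [])
    env = dict(e.split("=", 1) for e in entries if "=" in e)
    visible = env.get("NVIDIA_VISIBLE_DEVICES", "")
    if visible in ("", "void"):
        # No GPU request: hand the spec on unchanged, as plain runc would see it.
        return spec
    patched = copy.deepcopy(spec)
    hooks = patched.setdefault("hooks", {}).setdefault("prestart", [])
    hooks.append({"path": HOOK_PATH, "args": [HOOK_PATH, "prestart"]})
    return patched


gpu_spec = {"process": {"env": ["PATH=/usr/bin", "NVIDIA_VISIBLE_DEVICES=0,1"]}}
print(add_gpu_hook(gpu_spec)["hooks"]["prestart"])
```

In this model the extra GPU setup (driver libraries, device nodes, related configuration) would happen inside the hook at container start. Plain runc only executes the spec it is given and adds nothing of its own.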
