Kubernetes version: v1.23.16

nvidia-docker info

Client: Docker Engine - Community
 Version: 24.0.2
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
   Version: v0.10.5
   Path: /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
   Version: v2.18.1
   Path: /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 110
  Running: 56
  Paused: 0
  Stopped: 54
 Images: 40
 Server Version: 20.10.24
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
 Default Runtime: nvidia
 Init Binary: docker-init
 containerd version: 3dce8...
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-150-generic
 Operating System: Ubuntu 20.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 48
 Total Memory: 125.6GiB
 Name: node01
 ID:
 Docker Root Dir: /app/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
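I deployed gpushare on my cluster for GPU sharing, and I use dcgm-exporter (https://github.com/NVIDIA/dcgm-exporter) for monitoring. However, Prometheus shows no values for the GPU utilization metrics, and I cannot monitor the GPU utilization of individual pods. Has anyone used this setup? Any help would be appreciated.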
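One way to narrow this down is to check what dcgm-exporter itself exposes before Prometheus scrapes it. The following is a minimal sketch, not a definitive diagnostic: it parses the exporter's text output and counts, for a few metric names, how many samples exist and how many carry a non-empty `pod` label. The URL `http://localhost:9400/metrics` is an assumption (it presumes the exporter's port has been made reachable locally, for example through a port-forward), and the helper names `parse_samples`, `summarize`, and `fetch` are hypothetical.

```python
import re
import urllib.request

# Metric names assumed to be enabled in the exporter's counter list.
EXPECTED = (
    "DCGM_FI_DEV_GPU_UTIL",
    "DCGM_FI_DEV_GPU_TEMP",
    "DCGM_FI_DEV_POWER_USAGE",
)

# One exposition line looks like: name{label="value",...} 42
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def summarize(text, expected=EXPECTED):
    """Count samples per expected metric and how many have a non-empty pod label."""
    report = {name: {"samples": 0, "with_pod": 0} for name in expected}
    for name, labels, _value in parse_samples(text):
        if name in report:
            report[name]["samples"] += 1
            if labels.get("pod"):
                report[name]["with_pod"] += 1
    return report


def fetch(url="http://localhost:9400/metrics", timeout=5):
    """Download the exporter's metrics page (URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `summarize(fetch())` against a reachable exporter returns one entry per metric. A count of zero samples suggests the metric is not being exported at all, while samples with `with_pod` at zero suggest the exporter is not attaching pod information. If both counts look fine, the gap is more likely in the Prometheus scrape configuration.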
Same question, +1.
Too few metrics are being collected at the moment. How can I get the temperature and power metrics?
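In dcgm-exporter, temperature and power are normally exported as `DCGM_FI_DEV_GPU_TEMP` (degrees Celsius) and `DCGM_FI_DEV_POWER_USAGE` (watts), provided they are included in the counter list the exporter was started with. As a hedged sketch, the snippet below reads them back through the Prometheus HTTP API (`/api/v1/query`). The Prometheus address in the usage note is an assumption, and `build_query_url`, `readings_by_gpu`, and `query` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, metric):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": metric})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def readings_by_gpu(payload):
    """Map (Hostname, gpu) label pairs to the latest value in an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    readings = {}
    for series in payload["data"]["result"]:
        labels = series["metric"]
        key = (labels.get("Hostname", ""), labels.get("gpu", ""))
        # An instant vector sample is [timestamp, "value-as-string"].
        readings[key] = float(series["value"][1])
    return readings


def query(base_url, metric, timeout=5):
    """Run one instant query and return per-GPU readings."""
    url = build_query_url(base_url, metric)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return readings_by_gpu(json.load(resp))
```

For example, `query("http://localhost:9090", "DCGM_FI_DEV_GPU_TEMP")` would return a dictionary keyed by host and GPU index, assuming Prometheus is reachable at that address. An empty dictionary means Prometheus holds no series for that metric, which points back to the exporter's counter list or to the scrape configuration.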