Request for Default GPU Utilization Metrics #281
Comments
/kind feature
@archlitchi cc
Yes, I'm implementing this feature. See #258.
Even if the pod does not use the GPU, will there be a default value?
@CoderTH There are two monitors; {gpu node ip}:31992/metrics reports the real-time GPU resource usage of each container. These metrics are read from an mmap cache file that HAMi-core generates only when the container calls the GPU-related CUDA and NVML interfaces. So if the pod never actually touches the GPU, that file is never created, and these metrics will be absent. It's hard to add a
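For context, one way to observe this behavior is to scrape the node monitor endpoint and check whether any sample mentions a given pod. The sketch below is only an illustration (the usage string, port, and matching-by-pod-name are assumptions, not HAMi's exact metric layout):

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strings"
)

// Fetch the per-node monitor endpoint and print every metric line that
// mentions the given pod. If the pod has not called CUDA/NVML yet, the
// mmap cache file does not exist and no lines will match.
func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: checkmetrics <gpu-node-ip>:31992 <pod-name>")
		os.Exit(1)
	}
	endpoint, pod := os.Args[1], os.Args[2]

	resp, err := http.Get("http://" + endpoint + "/metrics")
	if err != nil {
		fmt.Println("scrape failed:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	found := false
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "#") && strings.Contains(line, pod) {
			fmt.Println(line)
			found = true
		}
	}
	if !found {
		fmt.Printf("no metrics found for pod %q (it may not have used the GPU yet)\n", pod)
	}
}
```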
The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
1. Issue or feature description
After assigning vGPU to a pod, we can retrieve detailed information about the pod's vGPU usage through metrics, such as memory usage and computational power. However, if the pod does not actually use the GPU, these metrics will be absent. Is it possible to add default values to these metrics? This way, when a pod is allocated vGPU, corresponding metrics will be exposed regardless of actual GPU utilization.
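One possible shape for the requested behavior is sketched below. This is only an assumption of how a default could be published, not HAMi's actual exporter: the metric name, labels, and the hard-coded pod list are illustrative. The idea is to register a zero-valued gauge for every container that has been allocated vGPU, so the time series exists even before real usage data is available, and then overwrite it once HAMi-core writes the cache file.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Illustrative metric; not HAMi's real metric name or label set.
var vgpuMemUsed = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "vgpu_memory_used_bytes",
		Help: "vGPU device memory used per container (0 until the container touches the GPU).",
	},
	[]string{"pod", "container"},
)

func main() {
	prometheus.MustRegister(vgpuMemUsed)

	// Stand-in for the list of containers the scheduler has allocated vGPU to.
	allocated := []struct{ pod, container string }{
		{"demo-pod-a", "main"},
		{"demo-pod-b", "main"},
	}

	// Publish a default 0 sample for every allocated container so the
	// series exists even before any real usage has been recorded.
	for _, a := range allocated {
		vgpuMemUsed.WithLabelValues(a.pod, a.container).Set(0)
	}

	// Later, real readings (e.g. parsed from the mmap cache file) would
	// simply overwrite the default via the same Set call.

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":31992", nil))
}
```

With this approach, a scrape of the endpoint returns a vgpu_memory_used_bytes sample for every allocated container, whether or not the workload has started using the GPU.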
2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
Common error checking:
- The output of nvidia-smi -a on your host
- Your docker configuration file (e.g. /etc/docker/daemon.json)
- The kubelet logs on the node (e.g. sudo journalctl -r -u kubelet)

Additional information that might help better understand your environment and reproduce the bug:
- Docker version from docker version
- Kernel version from uname -a
- Any relevant kernel output lines from dmesg