
msg="Failed to collect metrics with error: Failed to transform metrics for transform unsupported KubernetesGPUIDType for MetricID 'device_name': podMapper" #145

Closed
suchisur opened this issue Mar 14, 2023 · 1 comment

Comments

@suchisur

No description provided.

@suchisur (author) commented Mar 14, 2023

I tried the approach from #27, i.e. adding the environment variable `- name: "DCGM_EXPORTER_KUBERNETES_GPU_ID_TYPE"` with `value: "device-name"`. After doing this, I run into the error in the title, as seen in the logs of the dcgm-exporter DaemonSet pod.
P.S. We are using GPU time-slicing, and each node has one GPU attached.

@suchisur changed the title to the current one on Mar 14, 2023. The previous title included this additional context: "I mounted the node's /proc to /proc in the dcgm-exporter pod and can now see the processes with nvidia-smi; however, no per-pod metrics are available in Prometheus." The rest of the previous title repeated the #27 attempt and the error message quoted above.