Fix NVML memory reporting regression on coherent UMA platforms (Fixes…#463

Open
parallelArchitect wants to merge 1 commit into Syllo:master from parallelArchitect:fix/gb10-unified-memory-detection

parallelArchitect commented Apr 15, 2026

#449)

On GB10 / DGX Spark, nvmlDeviceGetMemoryInfo returns NVML_SUCCESS with total == system MemTotal (~121 GB). This prevents has_unified_memory from being set, which has caused incorrect VRAM reporting and a broken memory graph since 3.3.1.

Fix: detect UMA by comparing NVML total against /proc/meminfo MemTotal. If total >= 90% of system RAM, classify as unified memory and use MemAvailable instead of MemTotal for display.
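The heuristic above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: the names `parse_meminfo`, `classify_memory`, and `UMA_RATIO_PERCENT` are hypothetical, and the NVML total is passed in as a plain integer rather than read through NVML. It assumes the `/proc/meminfo` text has already been read into a string.

```python
from typing import Dict, Optional, Tuple

# Threshold quoted in the PR description (90% of system RAM).
UMA_RATIO_PERCENT = 90


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: bytes} mapping."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts or not parts[0].isdigit():
            continue  # skip lines that are not "Name: <number> [unit]"
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # the kernel reports these fields in KiB
        fields[name.strip()] = value
    return fields


def classify_memory(nvml_total: int, meminfo_text: str) -> Tuple[bool, Optional[int]]:
    """Return (is_unified, bytes_to_display) for one device.

    nvml_total stands in for the `total` field that
    nvmlDeviceGetMemoryInfo reports (an assumption of this sketch).
    """
    info = parse_meminfo(meminfo_text)
    mem_total = info.get("MemTotal", 0)
    # Integer arithmetic avoids float rounding at the 90% boundary.
    unified = mem_total > 0 and nvml_total * 100 >= mem_total * UMA_RATIO_PERCENT
    if unified:
        # Unified memory: display MemAvailable (None if the field is missing).
        return True, info.get("MemAvailable")
    # Otherwise keep the NVML figure unchanged.
    return False, nvml_total
```

One property of a ratio test like this is that a discrete GPU whose VRAM is at least 90% of the host's RAM would also be classified as unified, so the threshold may need an additional check on such systems.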

Note: requires validation on GB10 / DGX Spark hardware. Author does not have access to a coherent UMA system.

References

NVML API documentation on SOC/UMA behavior: https://docs.nvidia.com/deploy/nvml-api/nvml-api-reference.html
Community NVML shim for GB10 UMA: https://forums.developer.nvidia.com/t/nvml-support-for-dgx-spark-grace-blackwell-unified-memory-community-solution/358869
NVML memory fix at the shim layer: https://github.com/parallelArchitect/nvml-unified-shim
btop PR: aristocratos/btop#1611
nvitop PR: XuehaiPan/nvitop#208

parallelArchitect (Author) commented

For anyone who needs the NVML memory fix while this PR is under review, it is available in this fork: https://github.com/parallelArchitect/nvml-unified-shim
