NVML ERROR: RM has detected an NVML/RM version mismatch. #41

Closed
bounlu opened this issue Sep 27, 2022 · 4 comments

@bounlu

bounlu commented Sep 27, 2022

I installed the nvitop via pip3 as described and it worked fine.

Then I installed nvcc via:

sudo apt install nvidia-cuda-toolkit

Then nvitop stopped working with the error:

NVML ERROR: RM has detected an NVML/RM version mismatch.

How to make both work?

@XuehaiPan
Owner

Then nvitop stopped working with the error:

NVML ERROR: RM has detected an NVML/RM version mismatch.

@bounlu The command sudo apt install nvidia-cuda-toolkit modifies your NVIDIA driver packages, so the user-space NVML library may no longer match the kernel module that is still loaded. You need to unload and reload the NVIDIA kernel module. The easiest and safest way is to restart your machine.

If you are sure that currently there are no processes (including the Desktop GUI) using the GPU, you can try the following command without a restart:

sudo modprobe -r -f $(sudo lsmod | grep '^nvidia' | awk '{ print $1 }')
nvidia-smi
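
To confirm that a version mismatch is really the cause, one option is to compare the version of the loaded kernel module with the version of the NVML library that the dynamic linker resolves. The following is a minimal sketch, not part of nvitop: the function names are hypothetical, and it assumes a Linux system where the loaded module exposes /proc/driver/nvidia/version, where ldconfig is on the PATH, and where libnvidia-ml.so.1 resolves to a file named libnvidia-ml.so.<version>.

```shell
# Extract the driver version from the text of /proc/driver/nvidia/version,
# read on stdin. The relevant line starts with "NVRM version:".
kernel_module_version() {
    grep -m 1 '^NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Extract the version suffix from a resolved library path such as
# /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.510.85.02
nvml_library_version() {
    basename "$1" | sed 's/^libnvidia-ml\.so\.//'
}

# Compare both versions on a live system. Returns non-zero on a mismatch.
# Assumption: the NVIDIA module is loaded, otherwise the /proc file is absent.
report_nvml_mismatch() {
    local module_version library_path library_version
    module_version=$(kernel_module_version < /proc/driver/nvidia/version)
    library_path=$(ldconfig -p | awk '/libnvidia-ml\.so\.1/ { print $NF; exit }')
    library_version=$(nvml_library_version "$(readlink -f "$library_path")")
    echo "kernel module: ${module_version}, NVML library: ${library_version}"
    [ "$module_version" = "$library_version" ]
}
```

After sourcing the functions, running report_nvml_mismatch prints both versions. If they differ after a reboot, the installed packages themselves are inconsistent and reloading the module alone is unlikely to help.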

@bounlu
Author

bounlu commented Sep 27, 2022

I already restarted the server, which didn't help.

nvidia-smi is not installed. I am afraid that installing it will conflict with the existing packages and break things again:

$ nvidia-smi
Command 'nvidia-smi' not found, but can be installed with:
sudo apt install nvidia-utils-390         # version 390.154-0ubuntu0.22.04.1, or
sudo apt install nvidia-utils-450-server  # version 450.203.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470         # version 470.141.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-470-server  # version 470.141.03-0ubuntu0.22.04.1
sudo apt install nvidia-utils-510         # version 510.85.02-0ubuntu0.22.04.1
sudo apt install nvidia-utils-510-server  # version 510.85.02-0ubuntu0.22.04.1
sudo apt install nvidia-utils-515         # version 515.65.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-515-server  # version 515.65.01-0ubuntu0.22.04.1
sudo apt install nvidia-utils-418-server  # version 418.226.00-0ubuntu4

$ sudo pip3 install --upgrade nvitop
Requirement already satisfied: nvitop in /usr/local/lib/python3.10/dist-packages (0.8.1)
Requirement already satisfied: psutil>=5.6.6 in /usr/local/lib/python3.10/dist-packages (from nvitop) (5.9.2)
Requirement already satisfied: cachetools>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from nvitop) (5.2.0)
Requirement already satisfied: nvidia-ml-py<11.500.0a0,>=11.450.51 in /usr/local/lib/python3.10/dist-packages (from nvitop) (11.495.46)
Requirement already satisfied: termcolor>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from nvitop) (2.0.1)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

$ nvitop 
NVML ERROR: RM has detected an NVML/RM version mismatch.

$ uname -r
5.15.0-48-generic

$ sudo apt install nvidia-utils-515-server
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libaccinj64-11.5 libasyncns0 libbabeltrace1 libboost-regex1.74.0 libcub-dev libcublas11 libcublaslt11 libcudart11.0 libcufft10 libcufftw10 libcupti-dev libcupti-doc libcupti11.5 libcurand10
  libcusolver11 libcusolvermg11 libcusparse11 libdebuginfod-common libdebuginfod1 libdouble-conversion3 libegl-dev libegl-mesa0 libegl1 libflac8 libgail-common libgail18 libgbm1 libgl-dev
  libgl1-mesa-dev libgles-dev libgles1 libgles2 libglvnd-core-dev libglvnd-dev libglx-dev libgtk2.0-0 libgtk2.0-bin libgtk2.0-common libipt2 libnppc11 libnppial11 libnppicc11 libnppidei11
  libnppif11 libnppig11 libnppim11 libnppist11 libnppisu11 libnppitc11 libnpps11 libnvblas11 libnvjpeg11 libnvrtc-builtins11.5 libnvrtc11.2 libnvtoolsext1 libnvvm4 libogg0 libopengl-dev
  libopengl0 libopus0 libpthread-stubs0-dev libpulse0 libqt5core5a libqt5dbus5 libqt5network5 libsndfile1 libsource-highlight-common libsource-highlight4v5 libtbb-dev libtbb12 libtbbmalloc2
  libthrust-dev libvdpau-dev libvdpau1 libvorbis0a libvorbisenc2 libwayland-server0 libx11-dev libxau-dev libxcb-icccm4 libxcb-image0 libxcb-keysyms1 libxcb-render-util0 libxcb-util1
  libxcb-xinerama0 libxcb-xkb1 libxcb1-dev libxdmcp-dev libxkbcommon-x11-0 mesa-vdpau-drivers node-html5shiv nsight-compute nsight-compute-target nvidia-cuda-gdb nvidia-cuda-toolkit-doc
  nvidia-opencl-dev ocl-icd-libopencl1 ocl-icd-opencl-dev opencl-c-headers opencl-clhpp-headers openjdk-8-jre qttranslations5-l10n vdpau-driver-all x11proto-dev xorg-sgml-doctools xtrans-dev
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  libnvidia-compute-515-server
Suggested packages:
  nvidia-driver-515-server
The following packages will be REMOVED:
  libcuinj64-11.5 libnvidia-compute-495 libnvidia-compute-510 libnvidia-ml-dev nsight-systems nsight-systems-target nvidia-cuda-dev nvidia-cuda-toolkit nvidia-profiler nvidia-visual-profiler
The following NEW packages will be installed:
  libnvidia-compute-515-server nvidia-utils-515-server
0 upgraded, 2 newly installed, 10 to remove and 8 not upgraded.
Need to get 365 kB/50.3 MB of archives.
After this operation, 2,733 MB disk space will be freed.
Do you want to continue? [Y/n] n
Abort.
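
A transaction like the one above can also be previewed without any risk of applying it. The sketch below assumes apt-get's --simulate option, whose output marks each planned action with Inst, Conf, or Remv; the function names are hypothetical and the package name is only an example taken from the suggestions above.

```shell
# Read `apt-get --simulate` output on stdin and print the packages that the
# transaction would remove (lines whose first field is "Remv").
packages_to_remove() {
    awk '$1 == "Remv" { print $2 }'
}

# Preview an install without changing the system (call by hand), e.g.:
#   preview_removals nvidia-utils-515-server
preview_removals() {
    apt-get --simulate install "$1" | packages_to_remove
}
```

An empty result means the simulated install would not remove anything.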

@XuehaiPan
Owner

XuehaiPan commented Sep 27, 2022

I installed the nvitop via pip3 as described and it worked fine.

nvidia-smi is not installed

How did you install the NVIDIA driver? By .run file or apt? If you installed the driver via a .run file, you should uninstall it via the .run file first.

If you installed the NVIDIA driver via apt, try:

dpkg-query --show --showformat='${binary:Package} ${Status}\n' | 
    grep -v deinstall | awk '{ print $1 }' | grep nvidia-driver |
    xargs -L 1 sudo apt remove --purge

sudo apt autoremove

to uninstall your driver first. Then:

git clone --depth=1 https://github.com/XuehaiPan/nvitop.git
cd nvitop
sudo chvt 3
./install-nvidia-driver.sh

See the "NVIDIA driver installer" section of the nvitop repository for more details.
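
Before running the purge pipeline above, it can help to list the packages it would select. This is a minimal sketch that reuses the same filters but stops before apt remove; the function names are hypothetical.

```shell
# Read "package status" lines (as printed by the dpkg-query command above)
# on stdin and print the nvidia-driver packages that are not marked
# "deinstall", one per line.
select_nvidia_driver_packages() {
    grep -v deinstall | awk '{ print $1 }' | grep nvidia-driver
}

# Preview on a live system without removing anything (call by hand).
list_nvidia_driver_packages() {
    dpkg-query --show --showformat='${binary:Package} ${Status}\n' |
        select_nvidia_driver_packages
}
```

If list_nvidia_driver_packages prints nothing, the driver was probably not installed through apt, and the .run file uninstaller is the more likely route.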

@bounlu
Author

bounlu commented Sep 27, 2022

Following these steps fixed it. Thanks a million.

bounlu closed this as completed Sep 27, 2022