[BUG] UTF-8 Error during decoding device name on R555 driver #127

Open
3 tasks done
kangkannnng opened this issue May 26, 2024 · 9 comments
Assignees
Labels
bug Something isn't working pynvml Something related to the `nvidia-ml-py` package upstream Something upstream related

Comments

@kangkannnng

kangkannnng commented May 26, 2024

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

What version of nvitop are you using?

1.3.2

Operating system and version

Ubuntu 22.04 / WSL

NVIDIA driver version

555.42.03

NVIDIA-SMI

Sun May 26 15:50:45 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.03              Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:01:00.0  On |                  N/A |
| 36%   42C    P8             31W /  370W |    1544MiB /  24576MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        24      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+

Python environment

3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0] linux
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.1.105

Problem description

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte

Steps to Reproduce

nvitop

Traceback

Traceback (most recent call last):
  File "/home/kang/miniconda3/envs/pytorch/bin/nvitop", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/cli.py", line 353, in main
    ui = UI(
         ^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/ui.py", line 43, in __init__
    self.main_screen = MainScreen(
                       ^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/screens/main/__init__.py", line 38, in __init__
    self.device_panel = DevicePanel(self.devices, compact, win=win, root=root)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/screens/main/device.py", line 61, in __init__
    self.snapshots = self.take_snapshots()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/cachetools/__init__.py", line 702, in wrapper
    v = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/screens/main/device.py", line 142, in take_snapshots
    snapshots = [device.as_snapshot() for device in self.all_devices]
                 ^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/gui/library/device.py", line 72, in as_snapshot
    self._snapshot = super().as_snapshot()
                     ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/api/device.py", line 2146, in as_snapshot
    **{key: getattr(self, key)() for key in self.SNAPSHOT_KEYS},
            ^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/api/device.py", line 868, in name
    self._name = libnvml.nvmlQuery('nvmlDeviceGetName', self.handle)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/nvitop/api/libnvml.py", line 433, in nvmlQuery
    retval = func(*args, **kwargs)  # type: ignore[operator]
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kang/miniconda3/envs/pytorch/lib/python3.12/site-packages/pynvml.py", line 1921, in wrapper
    return res.decode()
           ^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte
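The last frame of the traceback is a plain `bytes.decode()`, which defaults to strict UTF-8. A minimal sketch of how a single 0xf8 byte produces this exception is below. The byte string is a made-up stand-in, not the actual buffer returned by the driver, which is not shown in this report.

```python
# Hypothetical stand-in for the raw device name returned by NVML;
# the real bytes are not included in this report.
raw_name = b"\xf8"

try:
    # Strict UTF-8 decoding, the same kind of call as pynvml.py line 1921.
    raw_name.decode()
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# A lenient decode substitutes U+FFFD (the replacement character)
# for the invalid byte instead of raising.
lenient = raw_name.decode("utf-8", errors="replace")
print(repr(lenient))
```

The byte 0xf8 can never start a valid UTF-8 sequence, so a strict decode fails at position 0 regardless of what follows it.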

Logs

No response

Expected behavior

No response

Additional context

No response

@kangkannnng kangkannnng added the bug Something isn't working label May 26, 2024
@XuehaiPan XuehaiPan added upstream Something upstream related pynvml Something related to the `nvidia-ml-py` package labels May 26, 2024
@XuehaiPan XuehaiPan changed the title [BUG] UTF-8 Error [BUG] UTF-8 Error during decoding device name on R555 driver May 26, 2024
@XuehaiPan
Owner

Similar issues on other repos:

I cannot reproduce this on native Linux with the 555.42.02 driver (the latest driver shipped with CUDA Toolkit 12.5 at the time of this comment).

$ nvidia-smi
Sun May 26 22:40:20 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        On  |   00000000:01:00.0 Off |                  N/A |
| 53%   45C    P8             14W /  170W |       2MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

It seems this is a bug that only occurs in WSL with the 555.85 driver.

@ssjjrrr

ssjjrrr commented Jun 14, 2024

Same problem

$ nvidia-smi
Fri Jun 14 20:48:14 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.52.01              Driver Version: 555.99         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
| 71%   72C    P0            279W /  285W |    6457MiB /  16376MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     32210      C   /python3.8                                  N/A      |
+-----------------------------------------------------------------------------------------+

@Saya47

Saya47 commented Jul 3, 2024

Same issue for me as well, on Windows 11 with WSL2. nvitop used to work fine, then stopped working with the UnicodeDecodeError. I found this issue and remembered that I had also upgraded the NVIDIA driver.

@winkeylucky

Same problem, Windows 11, WSL2
| NVIDIA-SMI 555.58.02 Driver Version: 556.12 CUDA Version: 12.5

@winkeylucky

winkeylucky commented Jul 4, 2024

image
Following the error message, I edited this file (~/anaconda3/lib/python3.12/site-packages/pynvml.py, line 1921), and this is the result:
image
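The screenshots are not preserved here, so the exact edit is unknown. One way such a local patch could look, purely as an assumption, is to replace the strict decode with a lenient one. The helper name `decode_lenient` is hypothetical and is not part of `pynvml` or nvitop.

```python
def decode_lenient(value):
    """Decode bytes as UTF-8 without raising on invalid sequences.

    Hypothetical helper for illustration; not a pynvml or nvitop function.
    Non-bytes values are returned unchanged.
    """
    if isinstance(value, bytes):
        # Invalid bytes become U+FFFD instead of raising UnicodeDecodeError.
        return value.decode("utf-8", errors="replace")
    return value


# Well-formed names decode as usual; malformed ones no longer crash.
print(decode_lenient(b"NVIDIA GeForce RTX 3090"))
print(repr(decode_lenient(b"\xf8abc")))
```

A change like this only hides the symptom: if the driver returns a corrupted name, the decoded string will still contain replacement characters.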

@XuehaiPan
Owner

XuehaiPan commented Jul 4, 2024

Could you try to use the latest version of nvitop and downgrade the nvidia-ml-py version?

pip3 install git+https://github.com/XuehaiPan/nvitop.git#egg=nvitop
pip3 install nvidia-ml-py==11.515.48
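To confirm which versions are actually active in the environment after running these commands, a small check with the standard library's `importlib.metadata` may help. This is a sketch; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as they appear on PyPI.
for package in ("nvitop", "nvidia-ml-py"):
    print(package, installed_version(package))
```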

@winkeylucky

install nvidia-ml-py==11.515.48

I tried this:
nvidia-ml-py 11.515.48
nvitop 1.3.3.dev20+g6bc8a8b
image
Same problem; only the location of the error changed.

@Gh0stExp10it

I was able to confirm that the problem is fixed with NVIDIA driver version 560.70.

wookayin/gpustat#170 (comment)
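For scripts that want to warn about an older driver, a rough sketch of comparing a driver version string against 560.70 is below. It assumes the version is available as a plain string, such as the "Driver Version" field in the nvidia-smi header. The function names are hypothetical, and the check only compares numbers: this thread does not establish exactly which versions below 560.70 are affected.

```python
def parse_driver_version(text):
    """Turn a version string such as '560.70' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Version reported in this thread as no longer showing the problem.
FIXED_IN = parse_driver_version("560.70")


def at_least_fixed_version(driver_version):
    """Return True if driver_version is numerically >= 560.70."""
    return parse_driver_version(driver_version) >= FIXED_IN


# Driver versions quoted in the reports above.
for reported in ("555.85", "555.99", "556.12", "560.70"):
    print(reported, at_least_fixed_version(reported))
```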

@kenvix

kenvix commented Aug 11, 2024

The latest NVIDIA driver has fixed this issue. Simply download the latest Windows driver from https://www.nvidia.cn/drivers/lookup/


7 participants