nvtop: hiding one GPU aborts with "We should not be processing a client id twice per update" #222

Closed
nabijaczleweli opened this issue Jul 12, 2023 · 14 comments · Fixed by #247

Comments

@nabijaczleweli

Forwarding https://bugs.debian.org/1040892, nvtop/3.0.1-1.

I have a multi-GPU system:

$ lspci -s 0000:00:02.0
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
$ lspci -s 0000:03:00.0
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 24 [Radeon RX 6400/6500 XT/6500M] (rev c1)

and am presently using both, but Debian's nvtop isn't compiled with i915 support (or is it just not supported at all?), and I don't really care what happens there, so I've disabled the "Xeon" one in Setup, GPU Select. This caused the following on the next update (and on every subsequent restart with the same config):

nvtop: ./src/extract_gpuinfo_amdgpu.c:946: parse_drm_fdinfo_amd: Assertion `!cache_entry_check && "We should not be processing a client id twice per update"' failed.
Aborted

config at https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=1040892;filename=interface.ini;msg=5

Watching both GPUs does work.
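
For readers unfamiliar with the shape of the failing check: the `&& "..."` part of the assertion is a common C idiom for attaching a message. A string literal is a non-null pointer, so it never changes the truth value of the condition, but it is printed as part of the expression when the assertion fails. A minimal sketch follows; the helper names `check_passes` and `passing_check` are made up for illustration and are not nvtop code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: evaluates the same expression shape as the failing
 * assertion. The string literal is always "true", so the result depends
 * only on cache_entry_check. */
static bool check_passes(bool cache_entry_check) {
    return !cache_entry_check &&
           "We should not be processing a client id twice per update";
}

/* The passing case: the client id has not been seen yet in this update,
 * so the assertion holds and execution continues. */
static void passing_check(void) {
    bool cache_entry_check = false;
    assert(!cache_entry_check &&
           "We should not be processing a client id twice per update");
}
```

With `cache_entry_check` set to true the same `assert` would abort and print the whole expression, message included, which matches the output quoted above.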

@klausman

I have the same problem, but with an RTX2070S (that I want to watch) and a CPU-builtin AMD Radeon (which I don't care about). nvtop v3.0.2 as shipped by Debian.

@Lucas-Servi

Same issue here with an RTX 3060 on Ubuntu 23.04 (nvtop used to work, but now I get this error).

@towo2099

towo2099 commented Oct 19, 2023

Same problem here

towo@polaris:~$ inxi -Gxxx
Graphics:
  Device-1: NVIDIA TU106M [GeForce RTX 2060 Mobile] vendor: Tongfang Hongkong driver: nvidia
    v: 545.23.06 pcie: speed: 2.5 GT/s lanes: 8 ports: active: none empty: DP-1,DP-2,HDMI-A-1
    bus-ID: 01:00.0 chip-ID: 10de:1f15 class-ID: 0300
  Device-2: AMD Renoir vendor: Tongfang Hongkong driver: amdgpu v: kernel pcie: speed: 16 GT/s
    lanes: 16 ports: active: eDP-1 empty: none bus-ID: 04:00.0 chip-ID: 1002:1636 class-ID: 0300
  Device-3: Chicony HD Webcam type: USB driver: uvcvideo bus-ID: 3-4:4 chip-ID: 04f2:b642
    class-ID: 0e02
  Display: server: X.org v: 1.21.1.4 with: Xwayland v: 22.1.1 compositor: kwin_x11 driver: X:
    loaded: amdgpu,ati,nvidia unloaded: fbdev,modesetting,nouveau,vesa gpu: amdgpu tty: 256x47
  Monitor-1: eDP-1 model: BOE Display res: 1920x1080 dpi: 142 size: 344x194mm (13.5x7.6")
    diag: 395mm (15.5") modes: max: 1920x1080 min: 640x480
  Message: GL data unavailable in console. Try -G --display

It occurs only in nvidia mode; on-demand is working fine.

nvtop: ./src/extract_gpuinfo_amdgpu.c:964: parse_drm_fdinfo_amd: Assertion `!cache_entry_check && "We should not be processing a client id twice per update"' failed.

@Syllo
Owner

Syllo commented Oct 19, 2023

Hello guys,
I think I found the issue: AMD GPUs are added to the fdinfo callback entries when they are initialized, but they are not removed from that callback list when they are marked as hidden in the interface. The cache cleanup is only called for GPUs that are being watched, so a hidden GPU ends up hitting the assertion.

I'll come up with a patch to avoid needless fdinfo parsing/walk-through for hidden GPUs.
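
To make the reasoning above concrete, here is a deliberately simplified model of it. This is not nvtop's code and not the actual patch: `struct gpu_sketch`, `process_client`, `end_of_update_cleanup`, `update_once`, and the `skip_hidden` flag are all hypothetical names. The sketch only assumes what is described above: the fdinfo callback still runs for a hidden GPU, the per-update cache cleanup runs only for watched GPUs, and the intended remedy is to skip fdinfo parsing for hidden GPUs.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CLIENTS 8

/* Hypothetical per-GPU state; nvtop's real structures differ. */
struct gpu_sketch {
    bool hidden;                /* GPU deselected in the interface */
    unsigned seen[MAX_CLIENTS]; /* client ids processed in this update */
    size_t seen_count;
};

/* Models the fdinfo callback. Returns false where the real code would hit
 * the "client id twice per update" assertion. */
static bool process_client(struct gpu_sketch *gpu, unsigned client_id) {
    for (size_t i = 0; i < gpu->seen_count; i++) {
        if (gpu->seen[i] == client_id)
            return false; /* already processed: stale cache entry */
    }
    if (gpu->seen_count < MAX_CLIENTS)
        gpu->seen[gpu->seen_count++] = client_id;
    return true;
}

/* Models the per-update cache cleanup. */
static void end_of_update_cleanup(struct gpu_sketch *gpu) {
    gpu->seen_count = 0;
}

/* One refresh cycle. With skip_hidden == false this reproduces the described
 * bug: hidden GPUs are still parsed but never cleaned up. With
 * skip_hidden == true, hidden GPUs are not parsed at all. */
static bool update_once(struct gpu_sketch *gpu, unsigned client_id,
                        bool skip_hidden) {
    if (skip_hidden && gpu->hidden)
        return true; /* nothing parsed, nothing to clean up */
    bool ok = process_client(gpu, client_id);
    if (!gpu->hidden)
        end_of_update_cleanup(gpu); /* cleanup only for watched GPUs */
    return ok;
}
```

In this model, calling `update_once` twice with the same client id on a hidden GPU and `skip_hidden` set to false fails on the second call, while a watched GPU, or a hidden GPU with `skip_hidden` set to true, succeeds on every call.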

@Syllo
Owner

Syllo commented Oct 19, 2023

Can you guys please try with the patch in #247 to see if my reasoning was right?

@towo2099

Sadly, it doesn't work: the message remains and then it crashes.

@towo2099

And now it does not work on my Intel device anymore either; it fails with a similar message.

@jackyyf
Contributor

jackyyf commented Oct 20, 2023

I think this is similar to #196, which hasn't been fully fixed. I've proposed #248 as a full fix, and it should fix this issue as well. Please test #248 to see whether it does.

@towo2099

towo2099 commented Oct 20, 2023

Does it need #247 too?
Building with only #248 does not help.

@towo2099

towo2099 commented Oct 20, 2023

Hm, building with both, no luck.

@towo2099

Grr, I was a little blind: I had installed the wrong package rather than the one I built.
It is working on my AMD+Nvidia system; the Intel-AMD system I can test tomorrow.

@towo2099

So, I applied both MRs and it is working on both of my systems.

AMD+Nvidia ==> Ok
Intel+Nvidia ==> Ok

@Syllo
Owner

Syllo commented Oct 21, 2023

Thanks @jackyyf

Merging both automatically closed the issue. If anything persists feel free to re-open.

@jackyyf
Contributor

jackyyf commented Dec 16, 2023

Hi @Syllo, sorry to trouble you, but could we have a new release for this (and maybe other patches)? Debian, at least, only picks up changes with version-bump releases :)
