nvtop issues on 8-GPU systems #71

Closed
kaerka opened this issue Jun 15, 2020 · 7 comments

@kaerka

kaerka commented Jun 15, 2020

Great tool. I was testing this on a system with 8 GPUs (all NVIDIA V100s), and it breaks beyond some number of GPUs (6 or 7). It appears to be an ncurses rendering issue, though, as it works at a smaller window size than full-screen. I suspect it has to do with the screen size relative to the number of plots that it attempts to generate and present.

In full-screen, it gives a blank output, and the process won't end with either CTRL-C or F10. It will end if you suspend it with CTRL-Z and then kill -HUP the stopped process.

[Screenshot: nvtop-8-gpus-small-window]
[Screenshot: nvtop-8-gpus-noplot]
[Screenshot: nvtop-6-gpus]
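
To make the hypothesis above concrete, here is a minimal sketch of the kind of check it suggests: compare the number of plots requested against what the terminal can hold. It is written in Python for brevity rather than nvtop's C, and every name and threshold in it (`max_plots_that_fit`, `plots_to_draw`, the 7-row by 30-column minimum plot size) is hypothetical and not taken from nvtop's source.

```python
def max_plots_that_fit(term_rows, term_cols, header_rows,
                       min_plot_rows=7, min_plot_cols=30):
    """Return how many minimum-size plots fit below the device header.

    The minimum plot dimensions are illustrative assumptions.
    """
    free_rows = term_rows - header_rows
    if free_rows < min_plot_rows or term_cols < min_plot_cols:
        return 0  # not even one plot fits, so draw none
    return (free_rows // min_plot_rows) * (term_cols // min_plot_cols)


def plots_to_draw(num_gpus, term_rows, term_cols, header_rows):
    """Never request more plots than the terminal can hold."""
    return min(num_gpus, max_plots_that_fit(term_rows, term_cols, header_rows))
```

A layout routine built this way would fall back to fewer plots (or none) instead of attempting a layout that cannot be rendered.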

@Syllo
Owner

Syllo commented Jul 1, 2020

Hello and thank you for the report.
It is definitely an nvtop bug. It is similar to what was reported in issue #47, but I was unable to reproduce it on my system at the time, so I could not fix it.
I'll dig into this using gdb to modify the number of GPUs and the terminal size.

@Syllo
Owner

Syllo commented Aug 22, 2020

I think I have nailed the problem down.
Could you please try the version on the branch fix_plot_manygpu (commit 71b7f96)?

@kaerka
Author

kaerka commented Aug 26, 2020

I was able to test this, and it works pretty well now - no crashes. The formatting is a little weird, but that might be my screen resolution or terminal. Also, I had to censor this a bit, as the server was in use when I tested this.
[Screenshot: nvtop-8gpu-082620-censored]

@kaerka
Author

kaerka commented Aug 26, 2020

Also, if you're curious, as you may not have an 8-GPU system to test on, here is what it looks like with a smaller terminal window. Censored again, as it had a real workload running at the time.
[Screenshot: nvtop-8gpu-082620-smaller-window-censored]

@KGHustad

I also saw problems with nvtop freezing on a blank output previously on a 16 GPU server, but I cannot reproduce them anymore when building from the fix_plot_manygpu branch (commit 71b7f96).

There are, however, some issues with plots seemingly being placed outside the visible terminal window. The terminal window in the screenshot below has 101 lines and 177 columns. The plots for GPU 4-5 and 10-11 appear to be placed just outside the right edge of the window.
[Screenshot: nvtop_101_lines_177_cols]

It looks fine at 102 lines and 177 columns.
[Screenshot: nvtop_102_lines_177_cols]
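
The symptom described above (plots landing just past the right edge) is the sort of thing a bounds check on each computed window can catch. The following is a minimal sketch in Python, not nvtop's actual C code; the `Window` type and both helper functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """A rectangular region of the terminal (hypothetical type)."""
    y: int       # top row
    x: int       # left column
    height: int
    width: int


def fits_on_screen(win, term_rows, term_cols):
    """True if the window lies entirely inside the terminal."""
    return (win.y >= 0 and win.x >= 0
            and win.height > 0 and win.width > 0
            and win.y + win.height <= term_rows
            and win.x + win.width <= term_cols)


def offscreen_windows(windows, term_rows, term_cols):
    """Indices of windows that would be drawn partly or fully off screen."""
    return [i for i, w in enumerate(windows)
            if not fits_on_screen(w, term_rows, term_cols)]
```

Running `offscreen_windows` over a computed layout for a 101-line, 177-column terminal would flag any plot whose `x + width` exceeds 177 before anything is drawn.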

@Syllo
Owner

Syllo commented Aug 27, 2020

Nice to hear that it works without blanking.

@KGHustad I might have found the problem with the plots placed outside the window, and I pushed the patch e37c62c (the last commit in the branch fix_plot_manygpu).
Could you please tell me whether that resolves your problem? If so, I will merge the patches into master.

@kaerka Yes, the first layout may seem weird, but the algorithm maximizes the plot size and may end up packing the top device information like that!
I have never had so many GPUs at once to see the problem, but I might end up requiring a space between each device.
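
As a rough illustration of the trade-off mentioned above, here is a minimal Python sketch of how a spacer between devices eats into the rows left for plots. It assumes the device information is stacked in a single column with a fixed number of rows per device, which is a simplification and not necessarily how nvtop lays things out; `rows_left_for_plots` and its parameters are hypothetical.

```python
def rows_left_for_plots(term_rows, num_devices, rows_per_device, spacer_rows=0):
    """Rows remaining for plots after stacking the device information.

    Assumes a single-column stack with `spacer_rows` blank rows between
    consecutive devices (an illustrative simplification).
    """
    gaps = max(num_devices - 1, 0)
    header_rows = num_devices * rows_per_device + gaps * spacer_rows
    return max(term_rows - header_rows, 0)
```

Under these assumptions, each extra spacer row costs one plot row per gap between devices, which is why a layout that maximizes plot size tends to pack the device information tightly.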

@KGHustad

@Syllo That patch solves the problem. Thanks a lot!
[Screenshot: nvtop_101_lines_177_cols_new]

Syllo closed this as completed in 1d13593 on Aug 27, 2020