Add GPU monitoring support #529

Merged
merged 50 commits into from Nov 25, 2023
50 commits
d522a91
Add rudimentary, fullscreen single-GPU NVML utilization graph
romner-set May 12, 2023
adcdc58
Add GPU side panel
romner-set May 12, 2023
95b3228
Improve GPU side panel
romner-set May 13, 2023
bcffcdf
Make GPU window's size dynamic and integrate it with the rest of btop
romner-set May 14, 2023
0e0025a
Update makefile text, fix typo and adhere to contibuting guidelines
romner-set May 14, 2023
2d27f2f
Fix crash when no nvidia GPU is detected
romner-set May 14, 2023
917d568
Add multi-GPU support for NVML data collection
romner-set May 15, 2023
c352bf2
Add ROCm SMI backend for AMD GPU support
romner-set May 15, 2023
22a4639
Add GPU info to CPU panel
romner-set May 18, 2023
01acfd6
Bind GPU panel to 5,6,7,8,9,0 and fully implement multi-GPU support
romner-set May 19, 2023
8bae1ec
Fixed debug timer for gpu
aristocratos May 19, 2023
8c710a2
Makefile auto detection and initial logic for excluding gpu code when…
aristocratos May 19, 2023
04ed16a
Merged changes from main
aristocratos May 20, 2023
2e68c0b
Fixed key > gpu_names check
aristocratos May 20, 2023
1fee2bc
Add DebugTimer class and change some Logger::error calls to Logger::d…
aristocratos May 21, 2023
005de97
Add missing fmt prefixes
aristocratos May 21, 2023
414d7eb
Handle GPUs which cannot report certain stats in btop_collect.cpp and…
romner-set May 21, 2023
8c96bd5
Handle GPUs which cannot report certain stats in GPU panel
romner-set May 21, 2023
842c761
Fix crash when all GPU panels are open but the CPU panel is closed
romner-set May 22, 2023
547f17d
Add more GPU graph types to the CPU panel
romner-set May 30, 2023
b2df069
Dynamically load NVML
romner-set Jun 1, 2023
a0163ce
Statically link ROCm SMI
romner-set Jun 1, 2023
b9a4d31
Fix Makefile dependency order and layout
aristocratos Jun 1, 2023
093edfe
Minor changes in wording...
aristocratos Jun 1, 2023
daaa453
Load ROCm SMI dynamically by default, optionally statically compile a…
romner-set Jun 2, 2023
cd69792
Fix error when ROCm SMI static compilation fails
romner-set Jun 2, 2023
85a10f0
Fix ROCm SMI makefile flags
romner-set Jun 2, 2023
85892a9
Fix type: ulong -> size_t and compare std::cmp_less
aristocratos Jun 5, 2023
be10989
Parallelize NVML PCIe TX/RX data collection
romner-set Jun 6, 2023
d8ebbe1
Join NVML PCIe threads only if PCIe TX/RX is supported by GPU
romner-set Jun 8, 2023
746f716
Remove lib/rocm_smi_lib and add instructions for obtaining it to README
romner-set Jun 16, 2023
3fad8a6
Add GPU options
romner-set Jun 26, 2023
85fb28c
Fix RSMI_STATIC=true and add GPU section to README.md
romner-set Jul 14, 2023
46c6be0
Fix GPU horizontal text overflow in CPU panel
romner-set Jul 16, 2023
1f73453
Fix crashes when trying to open nth GPU box with only n-1 GPUs in the…
romner-set Jul 19, 2023
972b2b6
Fix available boxes in menu & config description
romner-set Jul 19, 2023
3a5e5fd
Improve 0-10 key input
romner-set Jul 19, 2023
346c9e4
Fix GPU text overflow in CPU panel, again
romner-set Jul 19, 2023
bd5d697
Squashed commit of the following:
aristocratos Aug 26, 2023
b3970ee
Fixed: Key 5-0 gpu box toggle
aristocratos Aug 26, 2023
a9bc087
Added show_gpu_info setting and Auto options for cpu graphs
aristocratos Aug 26, 2023
efddad4
Changed: cpu_graph_lower Auto defaults to cpu_graph_upper when show_g…
aristocratos Aug 26, 2023
283d463
Merge branch 'main' into pr/romner-set/529
aristocratos Aug 26, 2023
7290109
Merge fix
aristocratos Aug 26, 2023
08abf0b
Quickfixes for MacOS and FreeBSD compilation.
aristocratos Aug 26, 2023
975525d
Fix: Cpu gpu stats always shown when show_gpu_info is On and sizing i…
aristocratos Aug 27, 2023
b877726
Added definition GPU_SUPPORT to toggle GPU related code
aristocratos Nov 25, 2023
19bcff8
Squashed commit of the following:
aristocratos Nov 25, 2023
94d4502
Readme update and Makfile fixes.
aristocratos Nov 25, 2023
0bb8599
Merge branch 'main' into main
aristocratos Nov 25, 2023
3 changes: 3 additions & 0 deletions .gitignore
@@ -51,6 +51,9 @@ bin
btop
.*/

# Optional libraries
lib/rocm_smi_lib

# Don't ignore .github directory
!.github/

84 changes: 65 additions & 19 deletions Makefile
@@ -12,10 +12,7 @@ else
endif

ifneq ($(QUIET),true)
-override PRE := info info-quiet
override QUIET := false
-else
-override PRE := info-quiet
endif

OLDCXX := $(CXXFLAGS)
@@ -39,6 +36,20 @@ endif

override PLATFORM_LC := $(shell echo $(PLATFORM) | tr '[:upper:]' '[:lower:]')

#? GPU Support
ifeq ($(PLATFORM_LC)$(ARCH),linuxx86_64)
ifneq ($(STATIC),true)
GPU_SUPPORT := true
endif
endif
ifneq ($(GPU_SUPPORT),true)
GPU_SUPPORT := false
endif

ifeq ($(GPU_SUPPORT),true)
override ADDFLAGS += -DGPU_SUPPORT
endif

#? Compiler and Linker
ifeq ($(shell $(CXX) --version | grep clang >/dev/null 2>&1; echo $$?),0)
override CXX_IS_CLANG := true
@@ -206,24 +217,37 @@ endif

P := %%

ifeq ($(VERBOSE),true)
override SUPPRESS := 1>/dev/null
else
override SUPPRESS :=
endif

#? Default Make
-all: $(PRE) directories btop
+.ONESHELL:
+all: | info rocm_smi info-quiet directories btop

+ifneq ($(QUIET),true)
info:
@printf " $(BANNER)\n"
-@printf "\033[1;92mPLATFORM \033[1;93m?| \033[0m$(PLATFORM)\n"
-@printf "\033[1;96mARCH \033[1;93m?| \033[0m$(ARCH)\n"
-@printf "\033[1;93mCXX \033[1;93m?| \033[0m$(CXX) \033[1;93m(\033[97m$(CXX_VERSION)\033[93m)\n"
-@printf "\033[1;94mTHREADS \033[1;94m:| \033[0m$(THREADS)\n"
-@printf "\033[1;92mREQFLAGS \033[1;91m!| \033[0m$(REQFLAGS)\n"
-@printf "\033[1;91mWARNFLAGS \033[1;94m:| \033[0m$(WARNFLAGS)\n"
-@printf "\033[1;94mOPTFLAGS \033[1;94m:| \033[0m$(OPTFLAGS)\n"
-@printf "\033[1;93mLDCXXFLAGS \033[1;94m:| \033[0m$(LDCXXFLAGS)\n"
-@printf "\033[1;95mCXXFLAGS \033[1;92m+| \033[0;37m\$$(\033[92mREQFLAGS\033[37m) \$$(\033[93mLDCXXFLAGS\033[37m) \$$(\033[94mOPTFLAGS\033[37m) \$$(\033[91mWARNFLAGS\033[37m) $(OLDCXX)\n"
-@printf "\033[1;95mLDFLAGS \033[1;92m+| \033[0;37m\$$(\033[93mLDCXXFLAGS\033[37m) \$$(\033[94mOPTFLAGS\033[37m) \$$(\033[91mWARNFLAGS\033[37m) $(OLDLD)\n"
-
-info-quiet:
-@sleep 0.1 2>/dev/null || true
+@printf "\033[1;92mPLATFORM \033[1;93m?| \033[0m$(PLATFORM)\n"
+@printf "\033[1;96mARCH \033[1;93m?| \033[0m$(ARCH)\n"
+@printf "\033[1;95mGPU_SUPPORT \033[1;94m:| \033[0m$(GPU_SUPPORT)\n"
+@printf "\033[1;93mCXX \033[1;93m?| \033[0m$(CXX) \033[1;93m(\033[97m$(CXX_VERSION)\033[93m)\n"
+@printf "\033[1;94mTHREADS \033[1;94m:| \033[0m$(THREADS)\n"
+@printf "\033[1;92mREQFLAGS \033[1;91m!| \033[0m$(REQFLAGS)\n"
+@printf "\033[1;91mWARNFLAGS \033[1;94m:| \033[0m$(WARNFLAGS)\n"
+@printf "\033[1;94mOPTFLAGS \033[1;94m:| \033[0m$(OPTFLAGS)\n"
+@printf "\033[1;93mLDCXXFLAGS \033[1;94m:| \033[0m$(LDCXXFLAGS)\n"
+@printf "\033[1;95mCXXFLAGS \033[1;92m+| \033[0;37m\$$(\033[92mREQFLAGS\033[37m) \$$(\033[93mLDCXXFLAGS\033[37m) \$$(\033[94mOPTFLAGS\033[37m) \$$(\033[91mWARNFLAGS\033[37m) $(OLDCXX)\n"
+@printf "\033[1;95mLDFLAGS \033[1;92m+| \033[0;37m\$$(\033[93mLDCXXFLAGS\033[37m) \$$(\033[94mOPTFLAGS\033[37m) \$$(\033[91mWARNFLAGS\033[37m) $(OLDLD)\n"
+else
+info:
+@true
+endif
+
+
+info-quiet: | info rocm_smi
@printf "\n\033[1;92mBuilding btop++ \033[91m(\033[97mv$(BTOP_VERSION)\033[91m) \033[93m$(PLATFORM) \033[96m$(ARCH)\033[0m\n"

help:
@@ -300,9 +324,31 @@ uninstall:
#? Pull in dependency info for *existing* .o files
-include $(OBJECTS:.$(OBJEXT)=.$(DEPEXT))

#? Compile rocm_smi
ifeq ($(GPU_SUPPORT)$(RSMI_STATIC),truetrue)
.ONESHELL:
rocm_smi:
@printf "\n\033[1;92mBuilding ROCm SMI static library\033[37m...\033[0m\n"
@TSTAMP=$$(date +%s 2>/dev/null || echo "0")
@mkdir -p lib/rocm_smi_lib/build
@cd lib/rocm_smi_lib/build
@$(QUIET) || printf "\033[1;97mRunning CMake...\033[0m\n"
@cmake .. $(SUPPRESS) || { printf "\033[1;91mCMake failed, continuing build without statically linking ROCm SMI\033[37m...\033[0m\n"; exit 0; }
@$(QUIET) || printf "\n\033[1;97mBuilding and linking...\033[0m\n"
@$(MAKE) $(SUPPRESS) || { printf "\033[1;91mMake failed, continuing build without statically linking ROCm SMI\033[37m...\033[0m\n"; exit 0; }
@ar -crs rocm_smi/librocm_smi64.a $$(find rocm_smi -name '*.o') $(SUPPRESS) || { printf "\033[1;91mFailed to pack ROCm SMI into static library, continuing build without statically linking ROCm SMI\033[37m...\033[0m\n"; exit 0; }
@printf "\033[1;92m100$(P)\033[10D\033[5C-> \033[1;37mrocm_smi/librocm_smi64.a \033[100D\033[38C\033[1;93m(\033[1;97m$$(du -ah rocm_smi/librocm_smi64.a | cut -f1)iB\033[1;93m)\033[0m\n"
@printf "\033[1;92mROCm SMI build complete in \033[92m(\033[97m$$($(DATE_CMD) -d @$$(expr $$(date +%s 2>/dev/null || echo "0") - $(TIMESTAMP) 2>/dev/null) -u +%Mm:%Ss 2>/dev/null | sed 's/^00m://' || echo "unknown")\033[92m)\033[0m\n"
@$(eval override LDFLAGS += lib/rocm_smi_lib/build/rocm_smi/librocm_smi64.a -DRSMI_STATIC) # TODO: this seems to execute every time, no matter if the compilation failed or succeeded
@$(eval override CXXFLAGS += -DRSMI_STATIC)
else
rocm_smi:
@true
endif

#? Link
.ONESHELL:
-btop: $(OBJECTS) | directories
+btop: $(OBJECTS) | rocm_smi directories
@sleep 0.2 2>/dev/null || true
@TSTAMP=$$(date +%s 2>/dev/null || echo "0")
@$(QUIET) || printf "\n\033[1;92mLinking and optimizing binary\033[37m...\033[0m\n"
@@ -313,7 +359,7 @@ btop: $(OBJECTS) | directories

#? Compile
.ONESHELL:
-$(BUILDDIR)/%.$(OBJEXT): $(SRCDIR)/%.$(SRCEXT) | directories
+$(BUILDDIR)/%.$(OBJEXT): $(SRCDIR)/%.$(SRCEXT) | rocm_smi directories
@sleep 0.3 2>/dev/null || true
@TSTAMP=$$(date +%s 2>/dev/null || echo "0")
@$(QUIET) || printf "\033[1;97mCompiling $<\033[0m\n"
54 changes: 53 additions & 1 deletion README.md
@@ -33,12 +33,33 @@
* [Compilation Linux](#compilation-linux)
* [Compilation macOS](#compilation-macos-osx)
* [Compilation FreeBSD](#compilation-freebsd)
* [GPU compatibility](#gpu-compatibility)
* [Installing the snap](#installing-the-snap)
* [Configurability](#configurability)
* [License](#license)

## News

##### 25 November 2023

GPU monitoring added for Linux!

Compile from git main to try it out.

Use keys `5` through `9` and `0` to show/hide the GPU monitoring boxes. `5` = GPU 1, `6` = GPU 2, etc.

GPU stats/graphs can also be displayed, in less detail, in the "Cpu box"; see the CPU options menu for info and configuration.
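
The key-to-box relationship can be sketched in shell. This is only an illustration: the helper name `gpu_box_for_key` is hypothetical and not part of btop++, and the mapping assumes the keys `5`, `6`, `7`, `8`, `9`, `0` correspond, in that order, to the box names `gpu0` through `gpu5` used by the `shown_boxes` setting.

```shell
#!/bin/sh
# Hypothetical helper (not part of btop++): print the GPU box name that a
# number key is assumed to toggle. Assumption: keys 5,6,7,8,9,0 map in
# order to gpu0..gpu5.
gpu_box_for_key() {
    case "$1" in
        [5-9]) echo "gpu$(( $1 - 5 ))" ;;
        0)     echo "gpu5" ;;
        *)     return 1 ;;   # not a GPU box key
    esac
}

gpu_box_for_key 7   # prints gpu2
```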

Note that the binaries provided on the release page (when released) and the continuous builds will not have GPU support enabled.

Because GPU support relies on loading dynamic GPU libraries at runtime, it will not work in statically linked builds.

See [Compilation Linux](#compilation-linux) for more info on how to compile with gpu monitoring support.

Many thanks to [@romner-set](https://github.com/romner-set) who wrote the vast majority of the implementation for GPU support.

Big update with version bump to 1.3 coming soon.

##### 28 August 2022

[![btop4win](https://github.com/aristocratos/btop4win/raw/master/Img/logo.png)](https://github.com/aristocratos/btop4win)
@@ -309,6 +330,34 @@ Also needs a UTF8 locale and a font that covers:

The makefile also needs GNU coreutils and `sed` (should already be installed on any modern distribution).

### GPU compatibility

Btop++ supports NVIDIA and AMD GPUs out of the box on Linux x86_64, provided you have the correct drivers and libraries.

Compatibility with Intel GPUs using generic DRM calls is planned, as is compatibility for FreeBSD and macOS.

GPU support will not work when statically linking glibc (or musl, etc.)!

On x86_64 Linux the `GPU_SUPPORT` flag is automatically set to `true`. To disable GPU support manually, set the flag to `false`:

`make GPU_SUPPORT=false`
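
As a rough sketch, the default described above (enabled only when the lowercased platform plus architecture is `linuxx86_64` and `STATIC` is not `true`) can be mirrored in shell. The function name `gpu_support_default` is hypothetical, and `uname` is used here only as an approximation: the Makefile itself derives the platform and architecture from the compiler.

```shell
#!/bin/sh
# Hypothetical helper (not part of btop++): print the default value the
# Makefile would assign to GPU_SUPPORT.
# Arguments: platform, arch, static ("true" or anything else).
gpu_support_default() {
    platform_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    if [ "${platform_lc}$2" = "linuxx86_64" ] && [ "$3" != "true" ]; then
        echo true
    else
        echo false
    fi
}

# Approximate check for the current machine, assuming a non-static build.
gpu_support_default "$(uname -s)" "$(uname -m)" "false"
```

This models only the default; a value passed explicitly on the command line, such as `make GPU_SUPPORT=false`, takes precedence.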

* **NVIDIA**

You must use an official NVIDIA driver; both the closed-source and [open-source](https://github.com/NVIDIA/open-gpu-kernel-modules) drivers have been verified to work.

You must also have the `nvidia-ml` dynamic library installed, which should be included with the driver package of your distribution.

* **AMD**

AMDGPU data is queried using the [ROCm SMI](https://github.com/RadeonOpenCompute/rocm_smi_lib) library, which may or may not be packaged for your distribution. If your distribution doesn't provide a package, btop++ can be statically linked against ROCm SMI by passing the `RSMI_STATIC=true` make flag.

This flag expects the ROCm SMI source code in `lib/rocm_smi_lib`, and compilation will fail if it's not there. The latest tested version is 5.6.x, which can be obtained with the following command:

```bash
git clone https://github.com/RadeonOpenCompute/rocm_smi_lib.git --depth 1 -b rocm-5.6.x lib/rocm_smi_lib
```
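
To check whether the dynamic libraries mentioned above are visible to the dynamic linker, something like the following sketch may help. The helper `has_lib` is hypothetical, and the file names `libnvidia-ml.so` and `librocm_smi64.so` are assumptions based on the library names used in this section; your distribution may install them under different names or paths.

```shell
#!/bin/sh
# Hypothetical helper (not part of btop++): succeeds if the library name
# appears in a listing. The listing defaults to the dynamic linker cache
# printed by `ldconfig -p`; a second argument can supply it explicitly.
has_lib() {
    listing=${2-$(ldconfig -p 2>/dev/null)}
    printf '%s\n' "$listing" | grep -q -F "$1"
}

# Assumed library file names for NVML and ROCm SMI.
for lib in libnvidia-ml.so librocm_smi64.so; do
    if has_lib "$lib"; then
        echo "$lib: found"
    else
        echo "$lib: not found"
    fi
done
```

If `ldconfig` is not on your `PATH` (on some distributions it lives in `/sbin`), the sketch reports "not found" even when the library is installed.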

<details>

<summary>
@@ -347,6 +396,9 @@ Also needs a UTF8 locale and a font that covers:
Append `ARCH=<architecture>` to manually set the target architecture.
If omitted the makefile uses the machine triple (output of `-dumpmachine` compiler parameter) to detect the target system.

Append `RSMI_STATIC=true` to statically link the ROCm SMI library used for querying AMDGPU data.
See [GPU compatibility](#gpu-compatibility) for details.
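
A minimal sketch of a wrapper that verifies the ROCm SMI source is in place before starting a static build. The function name `build_with_static_rsmi` is hypothetical, and testing for `CMakeLists.txt` is an assumption based on the Makefile running CMake against `lib/rocm_smi_lib`.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of btop++): refuse to start a static
# ROCm SMI build when the source tree is missing.
# Optional argument: path to the ROCm SMI source (default lib/rocm_smi_lib).
build_with_static_rsmi() {
    src=${1:-lib/rocm_smi_lib}
    if [ ! -f "$src/CMakeLists.txt" ]; then
        echo "ROCm SMI source not found in $src; clone it first" >&2
        return 1
    fi
    make RSMI_STATIC=true
}

# Usage, from the btop++ source root:
#   build_with_static_rsmi
```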

Use `ADDFLAGS` variable for appending flags to both compiler and linker.

For example: `ADDFLAGS=-march=native` might give a performance boost if compiling only for your own system.
@@ -834,7 +886,7 @@ graph_symbol_net = "default"
# Graph symbol to use for graphs in proc box, "default", "braille", "block" or "tty".
graph_symbol_proc = "default"

-#* Manually set which boxes to show. Available values are "cpu mem net proc", separate values with whitespace.
+#* Manually set which boxes to show. Available values are "cpu mem net proc" and "gpu0" through "gpu5", separate values with whitespace.
shown_boxes = "proc cpu mem net"

#* Update time in milliseconds, recommended 2000 ms or above for better sample times for graphs.