```
!pip install nvidia-ml-py3
```

- SMI: System Management Interface

## basics

- `nvcc -V`
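    - The CUDA version shown here may well differ from the one reported by `nvidia-smi`: `nvcc` reports the installed CUDA toolkit version, while `nvidia-smi` reports the highest CUDA version the driver supports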
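To compare the two versions side by side, here is a minimal sketch. It assumes that `nvcc -V` prints a line containing `release X.Y` and that the `nvidia-smi` header contains `CUDA Version: X.Y`. The helper names `parse_nvcc_release`, `parse_smi_cuda_version`, and `run` are hypothetical, not part of any library.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return the toolkit version found in `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def parse_smi_cuda_version(text):
    """Return the 'CUDA Version' field of the `nvidia-smi` header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", text)
    return match.group(1) if match else None


def run(cmd):
    """Run a command and return its stdout, or '' if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


# Prints None for a tool that is missing or whose output does not match.
print("toolkit (nvcc):     ", parse_nvcc_release(run(["nvcc", "-V"])))
print("driver (nvidia-smi):", parse_smi_cuda_version(run(["nvidia-smi"])))
```

If PyTorch is installed, `torch.version.cuda` gives a third number: the CUDA version that the PyTorch build was compiled against.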

```
nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version,compute_cap --format=csv
```

In [1]:
!nvidia-smi --query-gpu=gpu_name,gpu_bus_id,vbios_version,compute_cap --format=csv

name, pci.bus_id, vbios_version, compute_cap
NVIDIA GeForce RTX 4090, 00000000:18:00.0, 95.02.3C.00.99, 8.9
NVIDIA GeForce RTX 4090, 00000000:8A:00.0, 95.02.3C.00.99, 8.9
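The CSV output above is easy to consume from Python. Here is a minimal sketch using only the standard library. `SAMPLE` is copied from the output shown above, and `parse_query_gpu` is a hypothetical helper name.

```python
import csv
import io

# Sample text copied from the query output above.
SAMPLE = """name, pci.bus_id, vbios_version, compute_cap
NVIDIA GeForce RTX 4090, 00000000:18:00.0, 95.02.3C.00.99, 8.9
NVIDIA GeForce RTX 4090, 00000000:8A:00.0, 95.02.3C.00.99, 8.9
"""


def parse_query_gpu(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv` output into a list of dicts."""
    # skipinitialspace drops the blank that follows each comma in the output.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [dict(row) for row in reader]


gpus = parse_query_gpu(SAMPLE)
for gpu in gpus:
    print(gpu["pci.bus_id"], gpu["name"], gpu["compute_cap"])
```

On a live machine the same function could be fed the stdout of the `nvidia-smi` command instead of `SAMPLE`.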


### device id

- https://blog.csdn.net/sdnuwjw/article/details/111615052
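    - `nvidia-smi` numbers GPUs in PCI_BUS_ID order by default, whereas CUDA applications such as PyTorch default to FASTEST_FIRST ordering (controlled by the `CUDA_DEVICE_ORDER` environment variable), so the same index may refer to different cards on a machine with mixed GPUs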
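One common way to make the two numberings agree is to set `CUDA_DEVICE_ORDER=PCI_BUS_ID` before CUDA is initialized. The sketch below assumes it runs at the very top of the script or notebook. `pin_device_order` is a hypothetical helper, and the PyTorch part is skipped when `torch` or a GPU is unavailable.

```python
import os


def pin_device_order(visible=None):
    """Make CUDA enumerate GPUs in PCI bus order, as nvidia-smi does.

    Must be called before the first CUDA call in the process;
    changing the variables afterwards has no effect.
    """
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    if visible is not None:
        # Optionally restrict which GPUs the process can see.
        os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in visible)


pin_device_order()

try:
    import torch
except ImportError:  # PyTorch is optional for this sketch
    torch = None

if torch is not None and torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))
```

On a machine with identical cards, such as the two RTX 4090s here, the two orderings may already coincide. The setting matters most when the GPUs differ.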

In [1]:
!nvidia-smi -L

GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-3aea0781-60bd-0145-884f-dcea78424adb)
GPU 1: NVIDIA GeForce RTX 4090 (UUID: GPU-3c7f0ec9-c4bd-0098-3e5b-2169faec6f6c)


In [2]:
import torch

In [3]:
torch.cuda.get_device_properties(1)

_CudaDeviceProperties(name='NVIDIA GeForce RTX 4090', major=8, minor=9, total_memory=24217MB, multi_processor_count=128)

In [4]:
torch.cuda.get_device_properties(0)

_CudaDeviceProperties(name='NVIDIA GeForce RTX 4090', major=8, minor=9, total_memory=24217MB, multi_processor_count=128)

## xorg

- `/usr/lib/xorg/Xorg`
    - This is the X.Org Server, the display server on Linux and UNIX systems that handles graphical output.

## nvlink

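- NVLink bridges for the 3090 and the A100 are interchangeable;
- The 3090 can only be linked in pairs
    - Card A can only be linked to card B
    - If card A is linked to card B, it cannot also be linked to card C
- The A100 can link 4 cards together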
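To check which cards are actually linked on a given machine, `nvidia-smi` offers the `topo -m` and `nvlink --status` subcommands. Here is a minimal sketch that calls them from Python. The wrapper `nvidia_smi` is a hypothetical helper that returns `None` when the tool is not installed.

```python
import shutil
import subprocess


def nvidia_smi(*args):
    """Run `nvidia-smi` with the given arguments; return stdout, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi", *args], capture_output=True, text=True)
    return result.stdout


# Topology matrix: NV# entries mark GPU pairs connected through NVLink.
print(nvidia_smi("topo", "-m"))

# Per-link status for every GPU.
print(nvidia_smi("nvlink", "--status"))
```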

## pynvml

In [5]:
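from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo


def print_gpu_utilization():
    # Reports the memory in use on GPU 0 across all processes, not just this one
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    info = nvmlDeviceGetMemoryInfo(handle)
    # info.used is in bytes; integer-divide by 1024**2 to get MiB
    print(f"GPU memory occupied: {info.used//1024**2} MB.")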


In [9]:
import pynvml
# Initialize NVML
pynvml.nvmlInit()

# Get the number of GPUs
device_count = pynvml.nvmlDeviceGetCount()

# Iterate over the GPUs and get each one's UUID
for i in range(device_count):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    uuid = pynvml.nvmlDeviceGetUUID(handle)
    print(f"GPU {i} UUID: {uuid}")
pynvml.nvmlShutdown()

GPU 0 UUID: GPU-3aea0781-60bd-0145-884f-dcea78424adb
GPU 1 UUID: GPU-3c7f0ec9-c4bd-0098-3e5b-2169faec6f6c


In [4]:
# Baseline GPU memory usage in the initial state
print_gpu_utilization()

GPU memory occupied: 2663 MB.


In [5]:
import torch
# CUDA kernels are also loaded at this point
torch.ones((1, 1)).to("cuda")
print_gpu_utilization()

GPU memory occupied: 2663 MB.
