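First, define a 4-byte tensor, which requests 4 bytes of GPU memory. The output shows that 512 bytes were allocated for it, while PyTorch reserved 2 MB from the GPU and keeps it in its cache.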

In [None]:
import torch
a = torch.zeros(1).to('cuda')
print(torch.cuda.memory_allocated(),'B')
print(torch.cuda.memory_reserved(),'B')

512 B
2097152 B
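The two numbers above can be reproduced with a little arithmetic. The following is a minimal sketch, not PyTorch's actual allocator code: the constants `MIN_BLOCK`, `SMALL_LIMIT` and `SMALL_SEGMENT` and both helper functions are illustrative assumptions chosen to match the observed output, and the real values may differ between PyTorch versions.

```python
MIN_BLOCK = 512                   # assumed allocation granularity, in bytes
SMALL_LIMIT = 1024 * 1024         # assumed upper bound for a "small" request
SMALL_SEGMENT = 2 * 1024 * 1024   # assumed segment size backing small requests


def round_block(nbytes):
    """Round a request up to a whole number of MIN_BLOCK-sized blocks."""
    blocks = max(1, -(-nbytes // MIN_BLOCK))  # ceiling division, at least one block
    return blocks * MIN_BLOCK


def small_request_footprint(nbytes):
    """Return (allocated, reserved) bytes for one small request on an empty cache."""
    allocated = round_block(nbytes)
    if allocated > SMALL_LIMIT:
        raise ValueError("this sketch only covers requests up to 1 MB")
    return allocated, SMALL_SEGMENT


print(small_request_footprint(1 * 4))  # one float32 element -> (512, 2097152)
```

Under these assumptions, a 4-byte request is rounded up to one 512-byte block, and that block is carved out of a 2 MB segment, which matches the `memory_allocated` and `memory_reserved` readings.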


Delete a, then check the GPU memory usage again.

In [None]:
del a

In [None]:
print(torch.cuda.memory_allocated(),'B')
print(torch.cuda.memory_reserved(),'B')

0 B
2097152 B


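Empty PyTorch's cache. This is only recommended when you want to release the cached memory so that other people can share the same GPU; otherwise there is no need to call this method.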

In [None]:
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated(),'B')
print(torch.cuda.memory_reserved(),'B')

0 B
0 B
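The three readings so far (after creating the tensor, after `del a`, and after `torch.cuda.empty_cache()`) follow the usual pattern of a caching allocator. The toy model below is only an illustration of that bookkeeping: the class `ToyCachingAllocator`, its methods and its constants are hypothetical and greatly simplified (no fragmentation, one pool), and they are not PyTorch's implementation.

```python
class ToyCachingAllocator:
    """Toy model: tracks 'allocated' and 'reserved' bytes like the cells above."""

    SEGMENT = 2 * 1024 * 1024  # assumed size of each chunk requested from the GPU
    BLOCK = 512                # assumed allocation granularity, in bytes

    def __init__(self):
        self.allocated = 0  # bytes held by live tensors
        self.reserved = 0   # bytes held by the allocator, in use or cached

    def malloc(self, nbytes):
        size = -(-nbytes // self.BLOCK) * self.BLOCK  # round up to whole blocks
        # Request more segments only when the cached free space is too small.
        while self.reserved - self.allocated < size:
            self.reserved += self.SEGMENT
        self.allocated += size
        return size

    def free(self, size):
        # A freed block goes back to the cache; the reservation does not shrink.
        self.allocated -= size

    def empty_cache(self):
        # Give back every segment that no live tensor needs.
        self.reserved = -(-self.allocated // self.SEGMENT) * self.SEGMENT


alloc = ToyCachingAllocator()
a = alloc.malloc(4)
print(alloc.allocated, alloc.reserved)  # 512 2097152
alloc.free(a)
print(alloc.allocated, alloc.reserved)  # 0 2097152
alloc.empty_cache()
print(alloc.allocated, alloc.reserved)  # 0 0
```

The point of the sketch is that freeing a tensor only lowers the allocated counter, while the reserved counter drops only when the cache is explicitly emptied.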


Now for another example. This time we request 10240\*1050\*4/1024/1024 = 41.015625 MB of GPU memory, and in the same way PyTorch allocates 42 MB for it.

In [None]:
import torch
a = torch.zeros(10240,1050).to('cuda')
print(torch.cuda.memory_allocated()/1024/1024,'MB')
print(torch.cuda.memory_reserved()/1024/1024,'MB')

42.0 MB
42.0 MB
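The 42 MB figure is consistent with rounding a large request up to the next multiple of 2 MB. The check below is a worked calculation under that assumption; `ROUND_LARGE` is an illustrative constant, not a value taken from this notebook.

```python
MB = 1024 * 1024
ROUND_LARGE = 2 * MB  # assumed rounding granularity for large requests

rows, cols, bytes_per_float32 = 10240, 1050, 4
requested = rows * cols * bytes_per_float32

# Round the request up to the next multiple of ROUND_LARGE (ceiling division).
reserved = -(-requested // ROUND_LARGE) * ROUND_LARGE

print(requested / MB)  # 41.015625
print(reserved / MB)   # 42.0
```

The unused remainder here is a little under 1 MB, so it is plausible that the allocator hands the whole 42 MB block to the tensor instead of splitting it, which would explain why `memory_allocated` also reports 42 MB.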


In [None]:
del a

In [None]:
torch.cuda.empty_cache()

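Let us look at how this behavior affects memory allocation during model training (this section only introduces page-wise allocation of the cache and does not analyze the other calculations; for the GPU memory footprint of a model, see the next chapter).
The code below shows that 8.515625 MB of GPU memory was allocated for the model output and intermediate variables, while the output itself occupies only 10240\*10\*4/1024/1024 = 0.390625 MB. Understanding how GPU memory is allocated in pages is therefore important for reasoning about a model's GPU footprint.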

In [None]:
import torch

last_gpu_memory = 0

def get_memory():
  """Return (allocated MB now, change in MB since the previous call)."""
  global last_gpu_memory
  last = last_gpu_memory
  now = torch.cuda.memory_allocated()/1024/1024
  last_gpu_memory = now
  return now,now-last


class BasicModel(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.lr = torch.nn.Linear(1024,10)

  def forward(self,x):
    x = self.lr(x)
    return x

model = BasicModel().to('cuda')
print('mymodel: ',get_memory())
data = torch.zeros(10240,1024).to('cuda')
print('input: ',get_memory())
out = model(data)
print("output and intermediate: ",get_memory())
print(torch.cuda.memory_reserved()/1024/1024,'MB')

mymodel:  (0.03955078125, 0.03955078125)
input:  (40.03955078125, 40.0)
output and intermediate:  (48.55517578125, 8.515625)
62.0 MB
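The first two readings, and the size of the output tensor, can be checked by hand. The sketch below assumes float32 tensors and a 512-byte allocation granularity (`BLOCK` is an assumption, as is the helper `rounded`); it does not run the model, it only redoes the arithmetic.

```python
MB = 1024 * 1024
BLOCK = 512  # assumed allocation granularity, in bytes


def rounded(nbytes):
    """Round a byte count up to a multiple of BLOCK."""
    return -(-nbytes // BLOCK) * BLOCK


weight = rounded(1024 * 10 * 4)     # Linear(1024, 10) weight: 40960 bytes
bias = rounded(10 * 4)              # bias: 40 bytes, rounded up to 512
inputs = rounded(10240 * 1024 * 4)  # input batch
output = rounded(10240 * 10 * 4)    # output of the linear layer

print((weight + bias) / MB)     # 0.03955078125
print(inputs / MB)              # 40.0
print(output / MB)              # 0.390625
print(8.515625 - output / MB)   # 8.125
```

The model and input figures match the printed values exactly. The remaining 8.125 MB of the last step is not explained by the output tensor; one plausible source, which this notebook does not confirm, is workspace memory that the matrix-multiplication library allocates on its first call.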


Using the pynvml library

In [1]:
pip install pynvml

Collecting pynvml
  Downloading pynvml-11.5.0-py3-none-any.whl (53 kB)
Installing collected packages: pynvml
Successfully installed pynvml-11.5.0


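Check the GPU memory in use. Before using PyTorch, first measure how much GPU memory the kernel already occupies (empty the memory PyTorch requested earlier before checking).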

In [3]:
import torch
from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo


def print_gpu_utilization():
    nvmlInit()
    handle = nvmlDeviceGetHandleByIndex(0)
    info = nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU memory occupied: {info.used//1024**2} MB.")

print_gpu_utilization()

torch.ones((1, 1)).to("cuda")
print_gpu_utilization()

GPU memory occupied: 258 MB.
GPU memory occupied: 363 MB.
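The jump from 258 MB to 363 MB is far larger than the tensor itself. A rough way to see how much of it lies outside PyTorch's own counters is sketched below; the function `context_overhead_mb` is a hypothetical helper, and the 2 MB reservation is an assumption carried over from the earlier cells, where a tiny tensor caused a 2 MB reservation.

```python
def context_overhead_mb(used_before_mb, used_after_mb, torch_reserved_mb):
    """Growth in device memory that PyTorch's reserved counter does not cover."""
    return used_after_mb - used_before_mb - torch_reserved_mb


# 258 and 363 are the readings printed above; 2 is the assumed reservation in MB.
print(context_overhead_mb(258, 363, 2))  # 103
```

Under these assumptions roughly 103 MB is taken by something other than the tensor cache, most likely the one-time setup cost of using CUDA in the process. The exact figure depends on the GPU, driver and PyTorch version.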
