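Define a function that reports GPU memory usage. It returns the total memory currently allocated and the change since the previous call, both in MiB.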

In [1]:
import torch
import torch.nn.functional as F

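last_gpu_memory = 0  # MiB allocated at the previous get_memory() call

def get_memory():
  """Return (MiB currently allocated, change in MiB since the previous call)."""
  global last_gpu_memory
  last = last_gpu_memory
  now = torch.cuda.memory_allocated()/1024/1024  # bytes -> MiB
  last_gpu_memory = now
  return now,now-last
print(get_memory())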

(0.0, 0.0)


The model used for the memory analysis: three linear layers, one ReLU activation layer, and one softmax layer.

In [2]:
class BasicModel(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.lr1 = torch.nn.Linear(1024,1024)
    self.relu = torch.nn.ReLU()
    self.lr2 = torch.nn.Linear(1024,1024)
    self.sof = torch.nn.Softmax(dim = -1)
    self.lr3 = torch.nn.Linear(1024,2048)

  def forward(self,x):
    x = self.lr1(x)
    x = self.relu(x)
    x = self.lr2(x)
    x = self.sof(x)
    x = self.lr3(x)
    return x


Measure the memory used by the model parameters.

In [3]:
model = BasicModel().to('cuda')
print('mymodel: ',get_memory())

mymodel:  (16.015625, 16.015625)
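
The 16.015625 MiB reported above can be checked by hand. This is a minimal sketch, assuming the default float32 parameters (4 bytes each); the helper name `linear_param_count` is introduced here for illustration only.

```python
# Estimate parameter memory for BasicModel, assuming float32 (4 bytes per element).
BYTES_PER_FLOAT32 = 4
MIB = 1024 * 1024

def linear_param_count(in_features, out_features):
    # weight matrix plus bias vector of one torch.nn.Linear layer
    return in_features * out_features + out_features

layers = [(1024, 1024), (1024, 1024), (1024, 2048)]  # lr1, lr2, lr3
param_count = sum(linear_param_count(i, o) for i, o in layers)
param_mib = param_count * BYTES_PER_FLOAT32 / MIB
print(param_count, param_mib)  # 4198400 16.015625
```

ReLU and softmax have no parameters, so only the three linear layers contribute.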


Measure the memory used by the input data.

In [4]:
data = torch.zeros(10240,1024).to('cuda')
print('input: ',get_memory())

input:  (56.015625, 40.0)
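
The 40 MiB increase is simply the element count times the element size. A small sketch, assuming float32 by default; `tensor_mib` is a hypothetical helper, not part of PyTorch.

```python
# Size of a dense tensor in MiB, computed from its shape and element size.
def tensor_mib(shape, bytes_per_element=4):  # 4 bytes assumes float32
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element / (1024 * 1024)

input_mib = tensor_mib((10240, 1024))                     # the notebook's input
half_mib = tensor_mib((10240, 1024), bytes_per_element=2)  # same shape in a 2-byte dtype
print(input_mib, half_mib)  # 40.0 20.0
```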


Measure the memory used by the output and the intermediate activations.

In [5]:
out = model(data)
print("output and intermediate: ",get_memory())

output and intermediate:  (224.140625, 168.125)
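
One plausible breakdown of the 168.125 MiB is sketched below. It rests on two assumptions that the notebook does not verify: that only the ReLU output, the softmax output, and the final output stay alive after the forward pass (the `lr1` and `lr2` outputs being released once nothing references them), and that the remaining 8.125 MiB is a cuBLAS workspace of 2 x 4096 KiB + 8 x 16 KiB.

```python
MIB = 1024 * 1024
BATCH, BYTES = 10240, 4  # batch size from the notebook; float32 assumed

def act_mib(features):
    # memory of one (BATCH, features) float32 activation, in MiB
    return BATCH * features * BYTES / MIB

# Assumption: these are the tensors still alive after the forward pass.
kept = {
    "relu output": act_mib(1024),
    "softmax output": act_mib(1024),
    "model output (out)": act_mib(2048),
}
activations_mib = sum(kept.values())

# Assumption: the remainder is a cuBLAS workspace of 2*4096 KiB + 8*16 KiB.
workspace_mib = (2 * 4096 + 8 * 16) / 1024

print(activations_mib, workspace_mib, activations_mib + workspace_mib)
# 160.0 8.125 168.125
```

The arithmetic reproduces the measured value, but matching numbers alone do not prove that this is where the memory went.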


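GPU memory is allocated in blocks with a minimum size of 512 bytes, so even the 4-byte scalar loss takes 512 bytes (0.00048828125 MiB).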

In [6]:
loss = torch.sum(out)
print("loss: ",get_memory())

loss:  (224.14111328125, 0.00048828125)
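
The rounding can be sketched in a few lines. This assumes requests are rounded up to a multiple of 512 bytes, which matches the measurement for this small tensor; `rounded_size` is a hypothetical helper, and the real allocator has further rules for large requests that are not modeled here.

```python
BLOCK = 512  # minimum allocation granularity, in bytes

def rounded_size(nbytes, block=BLOCK):
    # round up to the next multiple of `block`
    return (nbytes + block - 1) // block * block

loss_bytes = rounded_size(4)  # one float32 scalar
print(loss_bytes, loss_bytes / (1024 * 1024))  # 512 0.00048828125
```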


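Measure memory usage after the backward pass. The intermediate activations are freed and the parameter gradients are allocated, so the total drops; `max_memory_allocated` reports the peak reached so far.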

In [7]:
loss.backward()
print("after backward: ",get_memory())
torch.cuda.max_memory_allocated()/1024/1024  # peak allocation so far, in MiB

after backward:  (168.28173828125, -55.859375)


360.2744140625
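
The -55.859375 MiB change can be reproduced with the following sketch. It assumes that the backward pass frees the ReLU and softmax outputs (80 MiB), allocates one float32 gradient per parameter, and that a second 8.125 MiB cuBLAS workspace is created for the backward pass. The workspace term is an unverified assumption chosen because it makes the numbers agree.

```python
MIB = 1024 * 1024
BYTES = 4  # float32 assumed

param_count = (1024 * 1024 + 1024) * 2 + (1024 * 2048 + 2048)
grads_mib = param_count * BYTES / MIB        # one gradient per parameter
freed_mib = 2 * (10240 * 1024 * BYTES / MIB)  # relu and softmax outputs
workspace_mib = 8.125                         # assumption: extra cuBLAS workspace

delta_mib = grads_mib + workspace_mib - freed_mib
print(grads_mib, freed_mib, delta_mib)  # 16.015625 80.0 -55.859375
```

The output tensor `out` is still referenced by a Python variable, so its 80 MiB is not released.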

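Measure the memory used by the optimizer state. AdamW keeps two moment estimates per parameter, and they are allocated on the first call to `step()`.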

In [8]:
import torch.optim as optim
optimizer = optim.AdamW(model.parameters())
optimizer.step()
print("optimizer: ",get_memory())
torch.cuda.max_memory_allocated()/1024/1024  # peak allocation so far, in MiB

optimizer:  (200.31298828125, 32.03125)


360.2744140625
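
The 32.03125 MiB increase is twice the parameter memory, which is consistent with AdamW holding two float32 buffers (first and second moment estimates) per parameter. A minimal sketch of that arithmetic, assuming float32 throughout:

```python
MIB = 1024 * 1024
BYTES = 4  # float32 assumed

param_count = (1024 * 1024 + 1024) * 2 + (1024 * 2048 + 2048)
param_mib = param_count * BYTES / MIB

adamw_state_mib = 2 * param_mib  # exp_avg and exp_avg_sq
# parameters + gradients + optimizer state, excluding activations and workspaces
steady_state_mib = param_mib + param_mib + adamw_state_mib
print(adamw_state_mib, steady_state_mib)  # 32.03125 64.0625
```

Under these assumptions, training with AdamW needs roughly four times the parameter memory before any activations are counted.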