[toc]

# PyTorch GPU Training

## Single-GPU training

Check whether a GPU is available:

In [5]:
import torch
torch.cuda.is_available()

False
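Beyond the yes/no check, it can help to see which devices PyTorch can actually use. The following is a minimal sketch using `torch.cuda.device_count()` and `torch.cuda.get_device_name()`; the variable names are illustrative only.

```python
import torch

# Number of CUDA devices visible to this process (0 on a CPU-only machine).
n_gpus = torch.cuda.device_count()

if torch.cuda.is_available():
    for i in range(n_gpus):
        # Print the index and the marketing name of each visible GPU.
        print(i, torch.cuda.get_device_name(i))
else:
    print("No CUDA device visible; computation will run on the CPU")
```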

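When training on a GPU, keep three things in mind:

1. Put the model on the GPU. Moving the top-level model is enough; the layers it contains are moved automatically.
2. Put the training data on the GPU.
3. Move the output back from the GPU (for example, to the CPU before converting it to NumPy).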
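The three steps above can be sketched as follows. This is a minimal illustration with random data and an arbitrary `Linear(4, 1)` model, both assumptions made only for the example; it falls back to the CPU when no GPU is present.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(4, 1).to(device)   # 1. model on the device
x = torch.randn(8, 4).to(device)           # 2. input data on the device

with torch.no_grad():                      # inference only, no gradients needed
    yhat = model(x)                        # the result lives on `device`

yhat_cpu = yhat.cpu()                      # 3. bring the output back to the CPU
yhat_np = yhat_cpu.numpy()                 # NumPy conversion requires a CPU tensor
print(yhat.device, yhat_cpu.device, yhat_np.shape)
```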

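There are two ways to put a model on the GPU:

1. `model = model.to("cuda:0")`
2. `model = model.cuda()`

Note: for an `nn.Module`, both calls move the parameters in place and return the module itself, so the assignment is optional. For a tensor, `.to()` and `.cuda()` return a new tensor and leave the original where it was, so the result must be assigned (`x = x.to(device)`).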
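One detail worth checking by hand is how assignment behaves: `nn.Module.to()` converts the module in place and returns the same object, whereas `Tensor.to()` returns a new tensor whenever a conversion is needed. The sketch below uses a dtype conversion for the tensor case so the effect is visible even on a machine without a GPU; a device move behaves the same way.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(2, 3)
returned = model.to(device)     # moves the parameters in place
print(returned is model)        # Module.to returns the module itself

t = torch.zeros(2)              # float32 by default
t.to(torch.float64)             # result discarded: t itself is unchanged
print(t.dtype)
t = t.to(torch.float64)         # assign the result to keep the converted tensor
print(t.dtype)
```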

### Using tensor.to(device)


In [44]:
import torch
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
from sklearn.datasets import load_boston

boston = load_boston()
x = torch.Tensor(boston['data'])
y = torch.Tensor(boston['target'])

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Move the model and both tensors to the device once, before the loop.
# y must be on the same device as yhat, otherwise the loss call fails on a GPU.
model = torch.nn.Linear(13, 1)
model = model.to(device)
x = x.to(device)
y = y.to(device)

loss = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

n_epochs = 100
for epoch in range(n_epochs):
    yhat = model(x).view(-1)
    mse = loss(yhat, y)  # MSELoss expects (input, target)

    optimizer.zero_grad()
    mse.backward()
    optimizer.step()

### Using tensor.cuda()

In [46]:
import torch
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
from sklearn.datasets import load_boston

boston = load_boston()
x = torch.Tensor(boston['data'])
y = torch.Tensor(boston['target'])

model = torch.nn.Linear(13, 1)
# Move the model and both tensors once; y must be on the same device as yhat.
if torch.cuda.is_available():
    model = model.cuda()
    x = x.cuda()
    y = y.cuda()

loss = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

n_epochs = 100
for epoch in range(n_epochs):
    yhat = model(x).view(-1)
    mse = loss(yhat, y)  # MSELoss expects (input, target)

    optimizer.zero_grad()
    mse.backward()
    optimizer.step()
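The two cells above move the whole dataset to the device in one go. When the data is fed in mini-batches, a common pattern is to move each batch inside the loop instead. The sketch below assumes synthetic random data (64 samples, 13 features) in place of the Boston dataset and uses `DataLoader` with `TensorDataset`; the sizes and names are illustrative only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in data: 64 samples with 13 features each.
features = torch.randn(64, 13)
targets = torch.randn(64)
loader = DataLoader(TensorDataset(features, targets), batch_size=16, shuffle=True)

model = torch.nn.Linear(13, 1).to(device)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

losses = []
for epoch in range(3):
    for xb, yb in loader:
        # Move the inputs and the targets of this batch to the model's device.
        xb, yb = xb.to(device), yb.to(device)
        batch_loss = loss_fn(model(xb).view(-1), yb)

        optimizer.zero_grad()
        batch_loss.backward()
        optimizer.step()

        # .item() copies the scalar loss back to the host as a Python float.
        losses.append(batch_loss.item())

print(len(losses))
```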

## How to tell which device a model or tensor is on

For a tensor, check `tensor.device`:

In [38]:
x = torch.randn(3, 2)
print(x.device)

cpu


For a model, check the device of its parameters:

In [37]:
import torch.nn as nn

class TestModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(TestModel, self).__init__()
        self.fc = nn.Linear(input_dim, output_dim)
        
    def forward(self, x):
        return self.fc(x)
    
model = TestModel(2, 3)
print(next(model.parameters()).device)

cpu
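`next(model.parameters())` inspects only the first parameter and raises `StopIteration` for a module that has none. A slightly more thorough check, sketched below, collects the devices of all parameters and buffers. The helper name `module_devices` is hypothetical, not a PyTorch API.

```python
import torch
import torch.nn as nn

def module_devices(module: nn.Module) -> set:
    """Hypothetical helper: the set of devices holding the module's parameters and buffers."""
    tensors = list(module.parameters()) + list(module.buffers())
    return {t.device for t in tensors}

# BatchNorm1d contributes buffers (running mean/variance) as well as parameters.
model = nn.Sequential(nn.Linear(2, 3), nn.BatchNorm1d(3))
print(module_devices(model))

# A module without parameters or buffers yields an empty set instead of raising.
print(module_devices(nn.ReLU()))
```

A result with more than one device would mean the model is split across devices, whether intentionally or by accident.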


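## Intermediate tensors created inside a module must be moved to the device manually

The code below fails when run on a GPU with `RuntimeError: Input and hidden tensors are not at the same device, found input tensor at cuda:0 and hidden tensor at cpu`. On a CPU-only machine it runs without error, as in the output shown after the cell, because every tensor is on the CPU anyway.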

In [26]:
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):
    
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleRNN, self).__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.RNN(input_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        batch_size = x.shape[1]
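        hidden = torch.zeros([1, batch_size, self.hidden_dim])  # created on the CPU and never moved to the model's device
        _, hidden = self.rnn(x, hidden)
        # hidden is [1, batch_size, hidden_dim]; squeeze(1) is a no-op unless batch_size == 1,
        # so logits keeps the shape [1, batch_size, output_dim].
        logits = self.fc(hidden.squeeze(1))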
        return logits
    
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
batch_size = 3
seq_len = 4
input_dim = 10
output_dim = 3
hidden_dim = 5

x = torch.randn(seq_len, batch_size, input_dim)
x = x.to(device)

rnn = SimpleRNN(input_dim, hidden_dim, output_dim)
rnn = rnn.to(device)
output = rnn(x)
print(output)

tensor([[[ 0.2704,  0.3590, -0.7894],
         [ 0.1826,  0.0097,  0.1279],
         [ 0.2195,  0.1916, -0.1293]]], grad_fn=<AddBackward0>)


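`model.to(device)` moves the model's parameters (and registered buffers) to the device. `hidden`, however, is a tensor created inside `forward`, so nothing moves it automatically.

The fix is to move `hidden` to the device by hand. Two workable approaches follow.

### Solutions

#### Get the device at run time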

In [28]:
class SimpleRNN(nn.Module):
    
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleRNN, self).__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.RNN(input_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        batch_size = x.shape[1]
        
        # self.parameters() is a generator, so use next() to get its first element
        device = next(self.parameters()).device
        hidden = torch.zeros([1, batch_size, self.hidden_dim])  # created on the CPU
        hidden = hidden.to(device)  # move it to the device the parameters live on
        
        _, hidden = self.rnn(x, hidden)
        logits = self.fc(hidden.squeeze(1))
        return logits
    
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
batch_size = 3
seq_len = 4
input_dim = 10
output_dim = 3
hidden_dim = 5

x = torch.randn(seq_len, batch_size, input_dim)
x = x.to(device)

rnn = SimpleRNN(input_dim, hidden_dim, output_dim)
rnn = rnn.to(device)
output = rnn(x)
print(output)
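A related variant, not taken from the original cell, reads the device from the input tensor and creates `hidden` there directly through the `device=` argument of `torch.zeros`, which avoids the extra copy. The sketch below also uses `squeeze(0)` to drop the leading layer dimension, a choice of this sketch that makes the output 2-D.

```python
import torch
import torch.nn as nn

class SimpleRNN(nn.Module):

    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.RNN(input_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        batch_size = x.shape[1]
        # Allocate the initial hidden state on the same device (and dtype) as the input.
        hidden = torch.zeros(1, batch_size, self.hidden_dim,
                             device=x.device, dtype=x.dtype)
        _, hidden = self.rnn(x, hidden)
        # hidden: [1, batch_size, hidden_dim] -> [batch_size, hidden_dim]
        return self.fc(hidden.squeeze(0))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
seq_len, batch_size, input_dim, hidden_dim, output_dim = 4, 3, 10, 5, 3

x = torch.randn(seq_len, batch_size, input_dim).to(device)
rnn = SimpleRNN(input_dim, hidden_dim, output_dim).to(device)
output = rnn(x)
print(output.shape, output.device)
```

This assumes the caller has already placed the input on the same device as the model, which the forward pass requires in any case.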

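#### Decide the device at initialization

Another workable approach is to pass the device as a constructor argument. Note that the stored `self.device` is a plain attribute: it is not updated if the model is later moved with `.to()`, so it has to match the device the model ends up on.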

In [30]:
class SimpleRNN(nn.Module):
    
    def __init__(self, input_dim, hidden_dim, output_dim, device):
        super(SimpleRNN, self).__init__()
        self.hidden_dim = hidden_dim
        self.device = device
        self.rnn = nn.RNN(input_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        batch_size = x.shape[1]
        
        hidden = torch.zeros([1, batch_size, self.hidden_dim])  # created on the CPU
        hidden = hidden.to(self.device)  # move it to the device passed to __init__
        
        _, hidden = self.rnn(x, hidden)
        logits = self.fc(hidden.squeeze(1))
        return logits
    
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
batch_size = 3
seq_len = 4
input_dim = 10
output_dim = 3
hidden_dim = 5

x = torch.randn(seq_len, batch_size, input_dim)
x = x.to(device)

rnn = SimpleRNN(input_dim, hidden_dim, output_dim, device)
rnn = rnn.to(device)
output = rnn(x)
print(output)

tensor([[[-0.1413,  0.0965,  0.2120],
         [-0.4704,  0.0670,  0.2349],
         [ 0.1077,  0.5577, -0.1399]]], grad_fn=<AddBackward0>)


# References
1. [How to correctly use a GPU for training in PyTorch? - Zhihu](https://www.zhihu.com/question/345418003)

2. [Checking whether a torch.Tensor and a model are on CUDA in PyTorch - WYXHAHAHA123's blog, CSDN](https://blog.csdn.net/WYXHAHAHA123/article/details/86596981)