### **The final class of** Deep Learning with PyTorch: A 60 Minute Blitz

*Awoooooooooo ┗|｀O′|┛ awoo~~*

#### Training CIFAR-10 on a GPU

In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import torch.optim as optim

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        self.conv1=nn.Conv2d(3,100,5)
        self.pool=nn.MaxPool2d(2,2)
        self.conv2=nn.Conv2d(100,16,5)
        self.fc1=nn.Linear(16*5*5,120) # 5*5 comes from the image dimensions
        self.fc2=nn.Linear(120,84)
        self.fc3=nn.Linear(84,10)
        
    def forward(self,x):
        x=self.pool(F.relu(self.conv1(x)))
        x=self.pool(F.relu(self.conv2(x)))
        x=x.view(-1,16*5*5)
        x=F.relu(self.fc1(x))
        x=F.relu(self.fc2(x))
        x=self.fc3(x)
        return x
net=Net()

transform=transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
trainset=torchvision.datasets.CIFAR10(root='./data',train=True,download=True,transform=transform)
trainloader=torch.utils.data.DataLoader(trainset,batch_size=4,shuffle=True,num_workers=2)
testset=torchvision.datasets.CIFAR10(root='./data',train=False,download=True,transform=transform)
testloader=torch.utils.data.DataLoader(testset,batch_size=4,shuffle=False,num_workers=2)
classes=('plane','car','bird','cat','deer','dog','frog','horse','ship','truck')

criterion=nn.CrossEntropyLoss()
optimizer=optim.SGD(net.parameters(),lr=0.001,momentum=0.9)


Files already downloaded and verified
Files already downloaded and verified
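
The `16*5*5` input size of `fc1` follows from how the two convolution and pooling stages shrink a 32x32 CIFAR-10 image. Below is a minimal sketch of that arithmetic in plain Python. The helper `conv_output_size` is a hypothetical name introduced only for this illustration; it applies the standard output-size formula for a convolution or pooling window without dilation.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a conv/pool window (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 32                                    # CIFAR-10 images are 32x32
size = conv_output_size(size, 5)             # conv1, 5x5 kernel: 32 -> 28
size = conv_output_size(size, 2, stride=2)   # 2x2 max pool:      28 -> 14
size = conv_output_size(size, 5)             # conv2, 5x5 kernel: 14 -> 10
size = conv_output_size(size, 2, stride=2)   # 2x2 max pool:      10 -> 5

# conv2 has 16 output channels, so the flattened feature vector has 16*5*5 entries.
fc1_in_features = 16 * size * size
print(size, fc1_in_features)  # 5 400
```

This matches `in_features=400` in the printed model summary.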


Training on a single GPU

In [3]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)

Net(
  (conv1): Conv2d(3, 100, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(100, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

In [4]:
%%time
for epoch in range(2):
    running_loss=0
    for i,data in enumerate(trainloader,0):
        # get the inputs; data is a list of [inputs, labels]
        inputs,labels=data[0].to(device),data[1].to(device)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs,labels)
        loss.backward()
        optimizer.step() # Does the update
        
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999: # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss/2000))
            running_loss=0.0
print("Finished Training")

[1,  2000] loss: 2.080
[1,  4000] loss: 1.709
[1,  6000] loss: 1.562
[1,  8000] loss: 1.482
[1, 10000] loss: 1.403
[1, 12000] loss: 1.359
[2,  2000] loss: 1.285
[2,  4000] loss: 1.260
[2,  6000] loss: 1.225
[2,  8000] loss: 1.184
[2, 10000] loss: 1.171
[2, 12000] loss: 1.159
Finished Training
Wall time: 2min 25s


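Compared with training on the CPU (*which took 6 min 1 s*), training on the GPU (2 min 25 s here) is much faster.

However, when the network is very small, the GPU speedup is not noticeable, and training can even be slower than on the CPU.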
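
One way to check the small-network claim on a given machine is to time a fixed number of training steps on each available device. The following is only a rough sketch under stated assumptions: the tiny fully connected model and the helper `time_training_steps` are hypothetical and are not the `Net` used above, and random tensors stand in for CIFAR-10 batches. `torch.cuda.synchronize()` is called because GPU kernels run asynchronously, so the timer would otherwise stop before the work has finished.

```python
import time

import torch
import torch.nn as nn


def time_training_steps(device, steps=20, batch_size=4):
    """Return wall-clock seconds for `steps` SGD updates on `device`."""
    torch.manual_seed(0)
    # Hypothetical tiny model, standing in for "a very small network".
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 32 * 32, 64),
        nn.ReLU(),
        nn.Linear(64, 10),
    ).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    # Random data shaped like a CIFAR-10 mini-batch (assumption: no real data).
    inputs = torch.randn(batch_size, 3, 32, 32, device=device)
    labels = torch.randint(0, 10, (batch_size,), device=device)

    def step():
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()

    step()  # warm-up step so one-time setup cost is not measured
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(steps):
        step()
    if device.type == "cuda":
        torch.cuda.synchronize()  # make sure all timed GPU work has finished
    return time.perf_counter() - start


devices = [torch.device("cpu")]
if torch.cuda.is_available():
    devices.append(torch.device("cuda:0"))

timings = {str(d): time_training_steps(d) for d in devices}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 20 steps")
```

The numbers depend heavily on the hardware, the batch size, and the model, so they should be read as a local sanity check rather than a benchmark.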

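Since this computer has only one GPU, I have not yet tried data parallelism (*Data Parallelism*).

Once I have access to a server, I will give [data parallelism](https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html) a try.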
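
For reference, a minimal sketch of what that attempt might look like, assuming the `nn.DataParallel` wrapper that the linked tutorial covers. `SmallNet` is a hypothetical stand-in for the `Net` class above, and the random batch replaces real CIFAR-10 data. With one GPU or none, the wrapper is skipped and the code runs as usual.

```python
import torch
import torch.nn as nn


class SmallNet(nn.Module):
    """Hypothetical stand-in for the `Net` class defined earlier."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3 * 32 * 32, 10)

    def forward(self, x):
        # Flatten everything except the batch dimension.
        return self.fc(x.flatten(1))


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = SmallNet()
if torch.cuda.device_count() > 1:
    # DataParallel splits each input batch along dim 0 across the visible GPUs
    # and gathers the outputs back on the first device.
    model = nn.DataParallel(model)
model.to(device)

batch = torch.randn(8, 3, 32, 32, device=device)  # random stand-in batch
outputs = model(batch)
print(tuple(outputs.shape))  # (8, 10)
```

The training loop itself would stay the same; only the model is wrapped. A larger batch size than 4 would probably be needed for the split across GPUs to pay off.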