# DeZero with GPU

With Google Colab, DeZero can run on a GPU (Colab offers GPUs free of charge). This notebook runs the same DeZero computation first on the CPU and then on the GPU, and measures how much faster the GPU is.

# Installing DeZero

First, install DeZero. It is published on [PyPI](https://pypi.org/project/dezero/), so it can be installed with `pip install dezero`.

In [1]:
pip install dezero

Collecting dezero
  Downloading https://files.pythonhosted.org/packages/1c/d0/bdc1949ff8bcba4a1cf572174e17cc7971daf30989f278c904f97c91ff3a/dezero-0.0.11-py3-none-any.whl
Installing collected packages: dezero
Successfully installed dezero-0.0.11


Next, check whether DeZero can use the GPU.

In [0]:
import dezero
dezero.cuda.gpu_enable

True

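If the result is `True`, the GPU is available and you can continue with the steps below.

If the result is `False`, the GPU has to be enabled in Google Colab as follows:

* From the "Runtime" menu, select "Change runtime type"
* In the "Hardware accelerator" drop-down menu, select "GPU"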
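To see why the flag can be `False`, it may help to check the GPU backend directly. The sketch below assumes that DeZero's GPU support relies on CuPy and that `dezero.cuda.gpu_enable` reflects whether CuPy can be imported; that is an assumption about DeZero's internals, not something stated above. The helper name `cupy_importable` is hypothetical.

```python
def cupy_importable():
    """Return True if CuPy can be imported in this environment.

    Assumption: DeZero's GPU flag depends on this import succeeding.
    """
    try:
        import cupy  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


if cupy_importable():
    print('CuPy imported: GPU mode should be usable')
else:
    print('CuPy could not be imported: switch the runtime to GPU and rerun')
```

A successful import does not by itself prove that a GPU is attached, so treat this only as a first diagnostic step.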

# Train MNIST on CPU
Now train a model on MNIST with DeZero.
The first run uses the CPU.

In [0]:
import time
import dezero
import dezero.functions as F
from dezero import optimizers
from dezero import DataLoader
from dezero.models import MLP

max_epoch = 5
batch_size = 100
cpu_times = []

train_set = dezero.datasets.MNIST(train=True)
train_loader = DataLoader(train_set, batch_size)
model = MLP((1000, 10))
optimizer = optimizers.SGD().setup(model)

for epoch in range(max_epoch):
    start = time.time()
    sum_loss = 0

    for x, t in train_loader:
        y = model(x)
        loss = F.softmax_cross_entropy(y, t)
        model.cleargrads()
        loss.backward()
        optimizer.update()
        sum_loss += float(loss.data) * len(t)

    elapsed_time = time.time() - start
    cpu_times.append(elapsed_time)
    print('epoch: {}, loss: {:.4f}, time: {:.4f}[sec]'.format(
        epoch + 1, sum_loss / len(train_set), elapsed_time))

epoch: 1, loss: 1.9140, time: 7.8949[sec]
epoch: 2, loss: 1.2791, time: 7.8918[sec]
epoch: 3, loss: 0.9211, time: 7.9565[sec]
epoch: 4, loss: 0.7381, time: 7.8198[sec]
epoch: 5, loss: 0.6339, time: 7.9302[sec]


# Train MNIST on GPU
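Next, run the same training on the GPU. The model is not re-created here, so training continues from the weights obtained on the CPU and the loss picks up where the CPU run left off; only the time per epoch is being compared.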

In [0]:
gpu_times = []

# GPU mode
train_loader.to_gpu()
model.to_gpu()

for epoch in range(max_epoch):
    start = time.time()
    sum_loss = 0

    for x, t in train_loader:
        y = model(x)
        loss = F.softmax_cross_entropy(y, t)
        model.cleargrads()
        loss.backward()
        optimizer.update()
        sum_loss += float(loss.data) * len(t)

    elapsed_time = time.time() - start
    gpu_times.append(elapsed_time)
    print('epoch: {}, loss: {:.4f}, time: {:.4f}[sec]'.format(
        epoch + 1, sum_loss / len(train_set), elapsed_time))

epoch: 1, loss: 0.5678, time: 1.5356[sec]
epoch: 2, loss: 0.5227, time: 1.5687[sec]
epoch: 3, loss: 0.4898, time: 1.5498[sec]
epoch: 4, loss: 0.4645, time: 1.5433[sec]
epoch: 5, loss: 0.4449, time: 1.5512[sec]
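The CPU and GPU cells repeat the same timing logic around the training loop. One way to avoid the duplication is to factor the timing into a small helper, sketched below. The names `time_epochs` and `dummy_epoch` are hypothetical, and the workload is a stand-in so the sketch runs without DeZero or a GPU; in the notebook, the body of one training epoch would take its place.

```python
import time


def time_epochs(run_epoch, max_epoch):
    """Call run_epoch() max_epoch times and return each call's wall-clock time."""
    times = []
    for _ in range(max_epoch):
        start = time.perf_counter()
        run_epoch()
        times.append(time.perf_counter() - start)
    return times


def dummy_epoch():
    # Stand-in workload (not real training) so the sketch is self-contained
    sum(i * i for i in range(10000))


times = time_epochs(dummy_epoch, 3)
print(['{:.4f}'.format(t) for t in times])
```

`time.perf_counter()` is used here instead of `time.time()` because it is a monotonic, high-resolution clock intended for measuring intervals.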


The measurements above are summarized below.

In [0]:
cpu_avg_time = sum(cpu_times) / len(cpu_times)
gpu_avg_time = sum(gpu_times) / len(gpu_times)

print('CPU: {:.2f}[sec]'.format(cpu_avg_time))
print('GPU: {:.2f}[sec]'.format(gpu_avg_time))
print('GPU speedup over CPU: {:.1f}x'.format(cpu_avg_time/gpu_avg_time))

CPU: 7.90[sec]
GPU: 1.55[sec]
GPU speedup over CPU: 5.1x
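The average hides how stable the speedup is from epoch to epoch. As a quick check, the sketch below recomputes the ratio per epoch from the times printed in the two training runs above. The values are copied by hand from those outputs, so the lists are an illustration rather than a fresh measurement.

```python
# Per-epoch times in seconds, copied from the CPU and GPU outputs above
cpu_times = [7.8949, 7.8918, 7.9565, 7.8198, 7.9302]
gpu_times = [1.5356, 1.5687, 1.5498, 1.5433, 1.5512]

# Speedup of each epoch: CPU time divided by GPU time
speedups = [c / g for c, g in zip(cpu_times, gpu_times)]
for epoch, s in enumerate(speedups, start=1):
    print('epoch: {}, speedup: {:.2f}x'.format(epoch, s))

# Ratio of the average times, the same quantity as in the summary cell
avg_speedup = (sum(cpu_times) / len(cpu_times)) / (sum(gpu_times) / len(gpu_times))
print('average: {:.1f}x'.format(avg_speedup))
```

In this run the speedup stays close to 5x in every epoch, so the average is not driven by a single outlier.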
