## Train Chainer : GPU

References:<br>
- Make full use of the deep learning compiler TVM and the major deep learning frameworks on Colaboratory<br>
https://qiita.com/stakemura/items/1761be70a06fa8ee853f

- Speed comparison of deep learning libraries using a simple CNN<br>
https://qiita.com/daigo0927/items/8092f3ff5276ffc4f088

## Setting up GPU mode

From the menu, select<br>
　　<strong>Runtime  ⇒  Change runtime type</strong> <br>
 and in the dialog that appears, set<br>
- Runtime type  = <font color='red'><strong>Python3</strong></font>
- Hardware accelerator  = <font color='red'><strong>GPU</strong></font>
- Omit code cell output when saving this notebook = <font color='red'><strong>OFF</strong></font>

then press the [Save] button.

## Mount Google Drive

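### <font color='red'>Note</font>
When the cell below is run at the start of a runtime, it displays a URL for obtaining an <font color='red'><strong>authorization code</strong></font>.<br>
Click the URL, choose your account on the linked page, and complete the authorization;<br>
then copy the displayed authorization code and paste it into the input field below to finish mounting.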

### Reference:
　　How to use Google Drive<br>
　　https://www.appsupport.jp/category/drive/

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


## Add a path so that custom packages can be imported

In [0]:
import sys
sys.path.append('/content/drive/My Drive/compare_deeplibs')
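If Drive is not mounted yet, `sys.path.append` still succeeds silently, and the problem only surfaces later as an `ImportError` for `utils`. A minimal sketch of a guard, using a hypothetical helper `add_import_path` (not part of the original notebook), that checks the directory exists and avoids appending the same entry twice when the cell is re-run:

```python
import os
import sys

def add_import_path(path):
    """Append `path` to sys.path once, failing early if the directory is missing.

    Hypothetical helper for illustration; the notebook itself calls sys.path.append directly.
    """
    # A missing directory here usually means Google Drive has not been mounted yet.
    if not os.path.isdir(path):
        raise FileNotFoundError('directory not found: {}'.format(path))
    # Re-running the cell should not add duplicate entries.
    if path not in sys.path:
        sys.path.append(path)
    return sys.path

# Usage in this notebook, after drive.mount has succeeded:
# add_import_path('/content/drive/My Drive/compare_deeplibs')
```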

In [3]:
!ls -l /content/drive/'My Drive'/compare_deeplibs

total 1306
drwx------ 2 root root    4096 Aug 25 05:33 CIFAR10
-rw------- 1 root root    9130 Aug 25 08:35 lap_record.csv
drwx------ 2 root root    4096 Aug 25 07:33 model_chainer
drwx------ 2 root root    4096 Aug 25 05:02 model_keras
drwx------ 2 root root    4096 Aug 25 04:20 model_tf
-rw------- 1 root root 1172512 Aug 25 08:28 model_torch.pth
drwx------ 2 root root    4096 Aug 24 23:11 __pycache__
-rw------- 1 root root   19190 Aug 25 08:22 train_Chainer_GPU_Tesla-T4.ipynb
-rw------- 1 root root   20845 Aug 25 08:22 train_Keras_GPU_Tesla-T4.ipynb
-rw------- 1 root root   30977 Aug 25 06:51 train_Keras_TPU.ipynb
-rw------- 1 root root   18555 Aug 25 08:35 train_PyTorch_GPU_Tesla-T4.ipynb
-rw------- 1 root root   19678 Aug 25 06:38 train_TensorFlow_GPU_Tesla-K80.ipynb
-rw------- 1 root root   19621 Aug 25 08:22 train_TensorFlow_GPU_Tesla-T4.ipynb
-rw------- 1 root root    3432 Aug 24 22:59 utils.py


In [4]:
from utils import load_cifar10, load_cifar100, show_progress

Using TensorFlow backend.
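Chainer's convolution layers expect images in (N, C, H, W) order as float32, and `F.softmax_cross_entropy` expects int32 labels. The `np.transpose(..., (0, 3, 1, 2))` and `flatten()` calls in `Trainer.load_cifar10` suggest that `load_cifar10` from `utils.py` returns channels-last images of shape (N, 32, 32, 3) and labels of shape (N, 1), as `keras.datasets.cifar10` does; since `utils.py` is not shown, that is an assumption. A sketch of the same conversion on random dummy arrays, with a hypothetical helper `to_chainer_layout`:

```python
import numpy as np

def to_chainer_layout(x, y):
    """Convert channels-last images and (N, 1) labels to the layout Chainer expects."""
    # (N, H, W, C) -> (N, C, H, W), cast to float32 ('f')
    x_nchw = np.transpose(x.astype('f'), (0, 3, 1, 2))
    # (N, 1) -> (N,), cast to int32 ('i')
    y_flat = y.flatten().astype('i')
    return x_nchw, y_flat

# Dummy stand-ins for CIFAR-10 arrays (random pixels, not real data)
x_dummy = np.random.randint(0, 256, size=(4, 32, 32, 3)).astype('uint8')
y_dummy = np.array([[3], [8], [0], [6]], dtype='uint8')
x_c, y_c = to_chainer_layout(x_dummy, y_dummy)
print(x_c.shape, x_c.dtype)  # (4, 3, 32, 32) float32
print(y_c.shape, y_c.dtype)  # (4,) int32
```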


## Check the Chainer version

In [5]:
import chainer
print(chainer.__version__)

5.4.0


## Display GPU device information

In [6]:
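# PyTorch's CUDA API is used here only to check the GPU and read its name;
# device_name is also written to lap_record.csv by Trainer.train()
from torch import cuda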
assert cuda.is_available()
assert cuda.device_count() > 0
device_name = cuda.get_device_name(cuda.current_device())
print(device_name)

Tesla T4


## Class for building the model

In [0]:
from chainer import Chain
import chainer.links as L
import chainer.functions as F

class CNN(Chain):
    def __init__(self, num_output=10):
        super(CNN, self).__init__()
        
        with self.init_scope():
            self.conv1 = L.Convolution2D(in_channels=3, out_channels=64, ksize=5, stride=1, pad=2)
            self.bn1   = L.BatchNormalization(64)
            self.conv2 = L.Convolution2D(64, 128, 5, 1, 2)
            self.bn2   = L.BatchNormalization(128)
            self.fc    = L.Linear(None, num_output)

    def __call__(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.max_pooling_2d(x, 2, 2)
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.max_pooling_2d(x, 2, 2)
        
        return self.fc(x)
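`L.Linear(None, num_output)` leaves the input size undetermined until the first forward pass. The sketch below traces a 3x32x32 CIFAR-10 image through the layers to show what that size turns out to be, using the usual output-size formula `(size + 2*pad - ksize) // stride + 1` and its `cover_all` variant, which is the default for `F.max_pooling_2d` in Chainer. `conv_outsize` and `fc_input_size` are illustrative helpers, not Chainer APIs.

```python
def conv_outsize(size, k, s, p, cover_all=False):
    """Spatial output size of a convolution or pooling window."""
    if cover_all:
        # cover_all rounds up so that every input pixel is covered by some window
        return (size + 2 * p - k + s - 1) // s + 1
    return (size + 2 * p - k) // s + 1

def fc_input_size(h=32, w=32, channels=128):
    """Trace H and W through the CNN above and return the flattened Linear input size."""
    for _ in range(2):
        # Convolution2D with ksize=5, stride=1, pad=2 keeps H and W unchanged
        h, w = conv_outsize(h, 5, 1, 2), conv_outsize(w, 5, 1, 2)
        # F.max_pooling_2d(x, 2, 2) with Chainer's default cover_all=True halves H and W
        h, w = conv_outsize(h, 2, 2, 0, True), conv_outsize(w, 2, 2, 0, True)
    # channels = out_channels of conv2; L.Linear flattens (N, C, H, W) to (N, C*H*W)
    return channels * h * w

print(fc_input_size())  # 8192 = 128 * 8 * 8
```

For the even sizes in this network the `cover_all` variant gives the same result as the plain formula; it differs only for odd inputs (for example 33 pools to 17 instead of 16).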

## Class that manages training

In [0]:
import time
import os
import numpy as np
from chainer.optimizers import Adam
from chainer.backends.cuda import to_gpu, to_cpu
from chainer import serializers

class Trainer(object):
    def __init__(self, num_epochs, batch_size, gpu_id):
        self.num_epochs = num_epochs
        self.batch_size = batch_size
        self.gpu_id     = gpu_id
        self.net = CNN()
        self.opt = Adam()
        self.opt.setup(self.net)
        self.load_cifar10()

    def load_cifar10(self):
        (x_train, y_train), (x_test, y_test) = load_cifar10(to_categoric=False)
        self.x_train = np.transpose(x_train.astype('f'), (0, 3, 1, 2))
        self.y_train = y_train.flatten().astype('i')
        self.x_test  = np.transpose(x_test.astype('f'), (0, 3, 1, 2))
        self.y_test  = y_test.flatten().astype('i')

    def load_cifar100(self):
        (x_train, y_train), (x_test, y_test) = load_cifar100(to_categoric = False)
        self.x_train = np.transpose(x_train.astype('f'), (0, 3, 1, 2))
        self.y_train = y_train.flatten().astype('i')
        self.x_test  = np.transpose(x_test.astype('f'), (0, 3, 1, 2))
        self.y_test  = y_test.flatten().astype('i')

    def train(self):
        if self.gpu_id is not None:
            self.net.to_gpu(self.gpu_id)
            self.x_test = to_gpu(self.x_test, self.gpu_id)
            self.y_test = to_gpu(self.y_test, self.gpu_id)

        num_batches = int(len(self.x_train) / self.batch_size)
        print('epochs : {}, number of batches : {}'.format(self.num_epochs, num_batches))

        lap_times = []
        # training iteration
        for e in range(self.num_epochs):
            # reshuffle the training indices every epoch
            permute_idx = np.random.permutation(len(self.x_train))
            lap_time = []

            for b in range(num_batches):
                x_batch = self.x_train[permute_idx[b*self.batch_size:(b+1)*self.batch_size]]
                y_batch = self.y_train[permute_idx[b*self.batch_size:(b+1)*self.batch_size]]

                s_time = time.time()
                if self.gpu_id is not None:
                    x_batch = to_gpu(x_batch, self.gpu_id)
                    y_batch = to_gpu(y_batch, self.gpu_id)
                logits = self.net(x_batch)
                loss = F.softmax_cross_entropy(logits, y_batch)
                self.net.cleargrads()
                loss.backward()
                self.opt.update()
                e_time = time.time()
                lap_time.append(e_time - s_time)

                if b % 10 == 0:
                    loss = to_cpu(loss.data)
                    acc = F.accuracy(logits, y_batch)
                    acc = to_cpu(acc.data)
                    show_progress(e+1, b+1, num_batches, loss, acc)

            # record single epoch training lap-time
            lap_times.append(np.sum(lap_time))

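            # validation in inference mode: BatchNormalization uses its running
            # statistics instead of batch statistics, and no graph is built for backprop
            accs_val = []
            with chainer.using_config('train', False), chainer.no_backprop_mode():
                for b in range(int(len(self.x_test) / self.batch_size)):
                    x_val = self.x_test[b*self.batch_size:(b+1)*self.batch_size]
                    y_val = self.y_test[b*self.batch_size:(b+1)*self.batch_size]

                    preds_val = self.net(x_val)
                    acc_val = F.accuracy(preds_val, y_val)
                    accs_val.append(to_cpu(acc_val.data))
            print('\n{} epoch validation accuracy {}'.format(e+1, np.mean(accs_val)))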

            # save trained model
            #if not os.path.exists('/content/drive/My Drive/compare_deeplibs/model_chainer'):
            #    os.mkdir('/content/drive/My Drive/compare_deeplibs/model_chainer')
            #serializers.save_npz('/content/drive/My Drive/compare_deeplibs/model_chainer/chainer{}.model'.format(e), self.net)

        with open('/content/drive/My Drive/compare_deeplibs/lap_record.csv', 'a') as f:
            f.write('Chainer-GPU')
            f.write(',' + device_name)
            for lap in lap_times:
                f.write(',' + str(lap))
            f.write('\n')
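Each epoch in `Trainer.train` draws a fresh permutation of the training indices and slices it into `int(len / batch_size)` batches, so the last partial batch is dropped. A sketch of the same indexing, using a hypothetical generator `epoch_batches` that takes the sample count from its argument:

```python
import numpy as np

def epoch_batches(num_samples, batch_size, rng=None):
    """Yield shuffled index arrays for one epoch, dropping the incomplete last batch
    in the same way as num_batches = int(num_samples / batch_size) in Trainer.train."""
    rng = np.random if rng is None else rng
    permute_idx = rng.permutation(num_samples)  # shuffled 0 .. num_samples-1
    num_batches = num_samples // batch_size
    for b in range(num_batches):
        yield permute_idx[b * batch_size:(b + 1) * batch_size]

batches = list(epoch_batches(50000, 128, np.random.RandomState(0)))
print(len(batches))       # 390
print(50000 - 390 * 128)  # 80 samples not used in this epoch
```

With 50,000 CIFAR-10 training images and a batch size of 128 this gives the 390 batches reported in the training log, and 80 images are skipped in each epoch (a different 80 each time, because the permutation is redrawn).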

## Function that runs training

In [0]:
def train_chainer(args):
    trainer = Trainer(num_epochs = args['epochs'],
                      batch_size = args['batch_size'],
                      gpu_id     = args['gpu_id'])
    trainer.train()

## Record the start time of the computation

To display times in Japan Standard Time when running on Google Colaboratory, the time zone has to be specified explicitly; here it is obtained with pytz.

In [10]:
import datetime
import pytz

start_time = datetime.datetime.now(pytz.timezone('Asia/Tokyo'))
print(start_time)

2019-08-25 17:38:27.633142+09:00
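The cell above uses pytz. Because Japan Standard Time is a fixed UTC+9 offset with no daylight saving time, the standard library's `datetime.timezone` can express it as well. The sketch below is such an alternative, not what the notebook itself uses; `JST` and `now_jst` are illustrative names.

```python
import datetime

# JST is a fixed UTC+9 offset (Japan does not observe daylight saving time),
# so a plain fixed-offset tzinfo is sufficient here.
JST = datetime.timezone(datetime.timedelta(hours=9), 'JST')

def now_jst():
    """Current time as a timezone-aware datetime in Japan Standard Time."""
    return datetime.datetime.now(JST)

# Converting a UTC timestamp to JST
utc = datetime.datetime(2019, 8, 25, 8, 38, 27, tzinfo=datetime.timezone.utc)
print(utc.astimezone(JST))  # 2019-08-25 17:38:27+09:00
```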


## Run training

In [11]:
args={
    'epochs'     : 20,
    'batch_size' : 128,
    'gpu_id'     : 0
}

print(args)

for key, value in args.items():
    print('{:12s} : {}'.format(key, value))

train_chainer(args)

{'epochs': 20, 'batch_size': 128, 'gpu_id': 0}
epochs       : 20
batch_size   : 128
gpu_id       : 0
load cifar10 data ...
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
epochs : 20, number of batches : 390
  1: [  381 /   390] loss: 1.150800 acc: 0.625000
1 epoch validation accuracy 0.582932710647583
  2: [  381 /   390] loss: 0.946729 acc: 0.703125
2 epoch validation accuracy 0.6391226053237915
  3: [  381 /   390] loss: 0.912600 acc: 0.687500
3 epoch validation accuracy 0.6977163553237915
  4: [  381 /   390] loss: 0.954384 acc: 0.734375
4 epoch validation accuracy 0.7116386294364929
  5: [  381 /   390] loss: 0.529003 acc: 0.804688
5 epoch validation accuracy 0.7126402258872986
  6: [  381 /   390] loss: 0.607578 acc: 0.789062
6 epoch validation accuracy 0.7395833134651184
  7: [  381 /   390] loss: 0.506572 acc: 0.796875
7 epoch validation accuracy 0.7333734035491943
  8: [  381 /   390] loss: 0.631446 acc: 0.765625
8 epoch validation accuracy 0.7380

## Display the time taken for training

In [12]:
end_time = datetime.datetime.now(pytz.timezone('Asia/Tokyo'))
print("\nStart   Time  : " + str(start_time))
print(  "End     Time  : " + str(end_time))
print(  "Elapsed Time  : " + str(end_time - start_time))


Start   Time  : 2019-08-25 17:38:27.633142+09:00
End     Time  : 2019-08-25 17:41:04.741417+09:00
Elapsed Time  : 0:02:37.108275
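`Trainer.train` appends one row per run to `lap_record.csv` in the form `<label>,<device name>,<lap 1>,<lap 2>,...`, where each lap is the summed per-batch time of one epoch (host-to-GPU transfer, forward, backward and update; batch slicing, progress display and validation are outside the timed window). A minimal sketch of reading those rows back for comparison, assuming device names contain no commas; `summarize_laps` is a hypothetical helper and the sample row is made up:

```python
import csv
import io

def summarize_laps(lines):
    """Parse rows of the form '<label>,<device>,<lap1>,<lap2>,...' and return
    (label, device, total_seconds, mean_seconds_per_epoch) tuples."""
    summary = []
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip empty or malformed lines
        laps = [float(v) for v in row[2:]]
        summary.append((row[0], row[1], sum(laps), sum(laps) / len(laps)))
    return summary

# Hypothetical example row (made-up lap times, not results from this notebook)
sample = io.StringIO('Chainer-GPU,Tesla T4,8.0,7.5,7.5\n')
for label, device, total, mean in summarize_laps(sample):
    print('{} ({}): total {:.1f} s, {:.2f} s/epoch'.format(label, device, total, mean))
# prints: Chainer-GPU (Tesla T4): total 23.0 s, 7.67 s/epoch
```

On Colab the real file could be passed in with `open('/content/drive/My Drive/compare_deeplibs/lap_record.csv')`. Because only part of each training loop is timed, the summed lap time is expected to be shorter than the elapsed wall-clock time printed above.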


## Display the elapsed time since the Google Colaboratory session started

In [13]:
!cat /proc/uptime | awk '{print "Elapsed time : " ($1 / 3600) " hours (" $1 " sec)"}'

Elapsed time : 0.0711889 hours (256.28 sec)
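The awk one-liner divides the first field of `/proc/uptime` (seconds since the machine booted; the second field is cumulative idle time) by 3600. The same calculation in Python, as a sketch with a hypothetical `uptime_hours` helper and a sample string in place of the real file (the idle value in it is made up):

```python
def uptime_hours(uptime_text):
    """Parse the contents of /proc/uptime ('<uptime_sec> <idle_sec>') and
    return (hours, seconds) for the first field."""
    seconds = float(uptime_text.split()[0])
    return seconds / 3600, seconds

hours, sec = uptime_hours('256.28 980.51')
print('Elapsed time : {:g} hours ({:g} sec)'.format(hours, sec))
# prints: Elapsed time : 0.0711889 hours (256.28 sec)
```

On the Colab VM the real value could be read with `open('/proc/uptime').read()`.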
