## Train TensorFlow : GPU

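References:<br>
- Making full use of the deep learning compiler TVM and the major deep learning frameworks on Colaboratory<br>
https://qiita.com/stakemura/items/1761be70a06fa8ee853f

- Speed comparison of deep learning libraries using a simple CNN<br>
https://qiita.com/daigo0927/items/8092f3ff5276ffc4f088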

## Setting GPU mode

From the menu, select<br>
　　<strong>Runtime  ⇒  Change runtime type</strong> <br>
and in the dialog that appears, set<br>
- Runtime type  = <font color='red'><strong>Python3</strong></font>
- Hardware accelerator  = <font color='red'><strong>GPU</strong></font>
- Omit code cell output when saving this notebook = <font color='red'><strong>OFF</strong></font>

then press the [Save] button.

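## Mounting Google Drive

### <font color='red'>Note</font>
When the code below is run at the start of a runtime, a URL for obtaining an <font color='red'><strong>authorization code</strong></font> is displayed.<br>
Click the URL, select and authenticate your account on the linked page,<br>
then copy the authorization code that is shown and paste it into the input field below to complete the mount.

### Reference:
　　How to use Google Drive<br>
　　https://www.appsupport.jp/category/drive/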

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


## Adding a path so that custom packages can be imported

In [0]:
import sys
sys.path.append('/content/drive/My Drive/compare_deeplibs')
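
If Drive is not mounted or the folder name differs, `sys.path.append` succeeds silently and the failure only shows up later at `from utils import ...`. A minimal sketch of a guarded version follows; the helper name `add_package_dir` is hypothetical and not part of the original notebook.

```python
import os
import sys


def add_package_dir(path):
    """Append `path` to sys.path once; return False if the directory is missing."""
    if not os.path.isdir(path):
        # Typically means Drive is not mounted yet or the folder name differs.
        return False
    if path not in sys.path:
        # Avoid duplicate entries when the cell is re-run.
        sys.path.append(path)
    return True


# Same folder as in the cell above.
ok = add_package_dir('/content/drive/My Drive/compare_deeplibs')
print('package directory found:', ok)
```

Re-running the cell leaves `sys.path` unchanged, which keeps the import search order stable across repeated executions.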

In [3]:
!ls -l /content/drive/'My Drive'/compare_deeplibs

total 1296
drwx------ 2 root root    4096 Aug 25 05:33 CIFAR10
-rw------- 1 root root   11115 Aug 25 09:28 lap_record.csv
-rw------- 1 root root 1172512 Aug 25 08:28 model_torch.pth
drwx------ 2 root root    4096 Aug 24 23:11 __pycache__
-rw------- 1 root root   19208 Aug 25 09:05 train_Chainer_GPU_Tesla-T4.ipynb
-rw------- 1 root root   20858 Aug 25 08:43 train_Keras_GPU_Tesla-T4.ipynb
-rw------- 1 root root   30977 Aug 25 06:51 train_Keras_TPU.ipynb
-rw------- 1 root root   19410 Aug 25 09:29 train_PyTorch_GPU_Tesla-T4.ipynb
-rw------- 1 root root   19678 Aug 25 06:38 train_TensorFlow_GPU_Tesla-K80.ipynb
-rw------- 1 root root   19634 Aug 25 08:44 train_TensorFlow_GPU_Tesla-T4.ipynb
-rw------- 1 root root    3432 Aug 24 22:59 utils.py


In [4]:
from utils import load_cifar10, load_cifar100, show_progress

Using TensorFlow backend.


## Checking the TensorFlow version

In [5]:
import tensorflow as tf
print("TensorFlow: ", tf.__version__)

TensorFlow:  1.14.0


## Displaying GPU device information

In [6]:
# PyTorch is used here only to query the CUDA device name.
from torch import cuda
assert cuda.is_available()
assert cuda.device_count() > 0
device_name = cuda.get_device_name(cuda.current_device())
print(device_name)

Tesla T4
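
The cell above imports PyTorch only to read the device name. As a sketch of an alternative that does not depend on any deep learning framework, the name can be read from `nvidia-smi`, assuming that command is on the `PATH` (as it normally is on a GPU runtime). The helper name `gpu_names` is hypothetical.

```python
import shutil
import subprocess


def gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if unavailable."""
    if shutil.which('nvidia-smi') is None:
        return []
    result = subprocess.run(
        ['nvidia-smi', '--query-gpu=name', '--format=csv,noheader'],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        universal_newlines=True)
    if result.returncode != 0:
        return []
    # One line per GPU; drop empty lines.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(gpu_names())
```

On a CPU-only runtime the function returns an empty list instead of raising, so it can be used as a soft check before training starts.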


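## Suppressing warnings

Upcoming API changes and similar notices are shown as warnings, so it is worth commenting out each statement in the cell below and reading through them at least once.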

In [0]:
import warnings
warnings.filterwarnings('ignore')

import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)
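
`warnings.filterwarnings('ignore')` silences Python warnings for the rest of the session. As a sketch of a narrower approach, `warnings.catch_warnings()` limits the suppression to one block, and `record=True` collects the warnings so they can be read once. The function `noisy` is a hypothetical stand-in for a library call; this covers only Python's `warnings` module, not TensorFlow's own logging.

```python
import warnings


def noisy():
    """Hypothetical function standing in for a library call that warns."""
    warnings.warn('this API will change', DeprecationWarning)
    return 42


# Suppress warnings only inside this block; the filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    value = noisy()

# Record warnings instead of printing them, to inspect them once.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    noisy()

print(value, [str(w.message) for w in caught])
```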

## Class that builds the model

In [0]:
from tensorflow.layers import Conv2D, BatchNormalization
from tensorflow.layers import MaxPooling2D, Flatten, Dense
from tensorflow.nn import relu

class CNN(object):
    def __init__(self, num_output, name='cnn'):
        self.num_output = num_output
        self.name = name

    def __call__(self, inputs):
        with tf.variable_scope(self.name) as vs:
            x = Conv2D(64, (5, 5), (1, 1), 'same')(inputs)
            x = BatchNormalization()(x)
            x = relu(x)
            x = MaxPooling2D((2, 2), (2, 2), 'same')(x)
            
            x = Conv2D(128, (5, 5), (1, 1), 'same')(x)
            x = BatchNormalization()(x)
            x = relu(x)
            x = MaxPooling2D((2, 2), (2, 2), 'same')(x)
            
            x = Flatten()(x)
            logits = Dense(self.num_output)(x)

            return logits

    @property
    def vars(self):
        return [var for var in tf.global_variables() if self.name in var.name]
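
To see what reaches the final `Dense` layer, the spatial sizes can be traced by hand. The sketch below assumes a 32x32x3 CIFAR input and uses the rule that a `'same'`-padded layer with stride `s` produces `ceil(n / s)` outputs along each spatial axis; the helper names are hypothetical.

```python
import math


def same_out(size, stride):
    """Output length of a 'same'-padded conv/pool layer: ceil(size / stride)."""
    return math.ceil(size / stride)


def cnn_feature_shape(height=32, width=32):
    """Trace (H, W, C) through the two conv/pool stages of the CNN above."""
    shape = (height, width, 3)
    for filters in (64, 128):
        h, w, _ = shape
        # 5x5 conv, stride 1, 'same' padding: spatial size is unchanged.
        h, w = same_out(h, 1), same_out(w, 1)
        # 2x2 max-pool, stride 2, 'same' padding: spatial size is halved.
        h, w = same_out(h, 2), same_out(w, 2)
        shape = (h, w, filters)
    return shape


h, w, c = cnn_feature_shape()
print((h, w, c), h * w * c)  # (8, 8, 128) 8192
```

`Flatten` therefore hands an 8192-element vector per image to the `Dense` layer.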

## Class that manages training

In [0]:
import time
import os
import numpy as np
from tensorflow.train import Saver, AdamOptimizer

class Trainer(object):
    def __init__(self, num_epochs, batch_size):
        self.num_epochs = num_epochs
        self.batch_size = batch_size
        self.sess = tf.Session()
        self.load_cifar10()
        self.build_graph()

    def load_cifar10(self):
        (self.x_train, self.y_train), (self.x_test, self.y_test) = load_cifar10()

    def load_cifar100(self):
        (self.x_train, self.y_train), (self.x_test, self.y_test) = load_cifar100()

    def build_graph(self):
        self.images = tf.placeholder(tf.float32, shape=(self.batch_size, 32, 32, 3), name='images')
        self.labels = tf.placeholder(tf.float32, shape=(self.batch_size, 10)       , name='labels')

        self.net = CNN(num_output=10)
        self.logits = self.net(self.images)

        self.loss = tf.losses.softmax_cross_entropy(self.labels, self.logits)

        self.preds = tf.nn.softmax(self.logits)
        self.accuracy = tf.reduce_mean(tf.reduce_sum(self.labels * self.preds, axis=1))
        
        self.opt = AdamOptimizer().minimize(self.loss, var_list=self.net.vars)

        self.saver = Saver()
        self.sess.run(tf.global_variables_initializer())

    def train(self):
        num_batches = int(len(self.x_train) / self.batch_size)
        print('epochs : {}, number of batches : {}'.format(self.num_epochs, num_batches))

        lap_times = []
        # training iteration
        for e in range(self.num_epochs):
            permute_idx = np.random.permutation(len(self.x_train))
            lap_time = []
            
            for b in range(num_batches):
                x_batch = self.x_train[permute_idx[b*self.batch_size:(b+1)*self.batch_size]]
                y_batch = self.y_train[permute_idx[b*self.batch_size:(b+1)*self.batch_size]]

                s_time = time.time()
                _, loss, acc = self.sess.run([self.opt, self.loss, self.accuracy],
                                             feed_dict={self.images:x_batch, self.labels:y_batch})
                e_time = time.time()
                lap_time.append(e_time - s_time)

                if b % 10 == 0:
                    show_progress(e+1, b+1, num_batches, loss, acc)

            # record single epoch training lap-time
            lap_times.append(np.sum(lap_time))
            
            # validation
            accs_val = []
            for b in range(int(len(self.x_test) / self.batch_size)):
                x_val = self.x_test[b*self.batch_size:(b+1)*self.batch_size]
                y_val = self.y_test[b*self.batch_size:(b+1)*self.batch_size]
                
                acc_val = self.sess.run(self.accuracy,
                                        feed_dict={self.images:x_val, self.labels:y_val})
                accs_val.append(acc_val)
            print('\n{} epoch validation accuracy {}'.format(e+1, np.mean(accs_val)))

            # save trained model
            #if not os.path.exists('/content/drive/My Drive/compare_deeplibs/model_tf'):
            #    os.mkdir('/content/drive/My Drive/compare_deeplibs/model_tf')

            #self.saver.save(self.sess, '/content/drive/My Drive/compare_deeplibs/model_tf/model{}.ckpt'.format(e))

        # record training time
        with open('/content/drive/My Drive/compare_deeplibs/lap_record.csv', 'a') as f:
            f.write('TensorFlow-GPU')
            f.write(',' + device_name)
            for lap in lap_times:
                f.write(',' + str(lap))
            f.write('\n')
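
The `accuracy` tensor in `build_graph` is the mean softmax probability assigned to the true class, not the usual argmax hit rate, so the `acc` values in the log tend to read lower than a conventional accuracy would. The pure-Python sketch below contrasts the two, assuming one-hot labels (which the `(batch_size, 10)` float placeholder suggests); the function names and the tiny 3-class batch are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax for one example."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_accuracy(labels, logits):
    """Mean probability assigned to the true class (as in build_graph)."""
    per_example = [sum(l * p for l, p in zip(lab, softmax(log)))
                   for lab, log in zip(labels, logits)]
    return sum(per_example) / len(per_example)


def argmax_accuracy(labels, logits):
    """Fraction of examples whose highest logit matches the one-hot label."""
    hits = [lab.index(max(lab)) == log.index(max(log))
            for lab, log in zip(labels, logits)]
    return sum(hits) / len(hits)


# Hypothetical 3-class batch, for illustration only.
labels = [[1, 0, 0], [0, 1, 0]]
logits = [[2.0, 1.0, 0.0], [0.0, 1.0, 0.5]]
print(soft_accuracy(labels, logits), argmax_accuracy(labels, logits))
```

Both examples are classified correctly here, yet the soft measure stays well below 1 because the predicted probabilities are not saturated.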

## Function that runs training

In [0]:
def train_tf(args):
    os.environ['CUDA_VISIBLE_DEVICES'] = str(args['gpu_id'])

    trainer = Trainer(num_epochs = args['epochs'],
                      batch_size = args['batch_size'])
    trainer.train()

## Recording the computation start time

When running on Google Colaboratory, a time zone object is needed to display the time in Japan Standard Time.

In [11]:
import datetime
import pytz

start_time = datetime.datetime.now(pytz.timezone('Asia/Tokyo'))
print(start_time)

2019-08-25 18:30:41.042922+09:00
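
Because Japan Standard Time is a fixed UTC+9 offset with no daylight saving time, the same timestamp can also be produced with the standard library alone. This is a sketch of that alternative, not what the notebook uses; the name `JST` is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time is UTC+9 with no daylight saving time.
JST = timezone(timedelta(hours=9), 'JST')

now_jst = datetime.now(JST)
print(now_jst)
print(now_jst.utcoffset())
```

For zones that do observe daylight saving time a fixed offset is not enough, and a time zone database such as the one `pytz` provides is the safer choice.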


## Running training

In [12]:
args={
    'epochs'     : 20,
    'batch_size' : 128,
    'gpu_id'     : 0
}

print(args)

for key, value in args.items():
    print('{:12s} : {}'.format(key, value))

train_tf(args)

{'epochs': 20, 'batch_size': 128, 'gpu_id': 0}
epochs       : 20
batch_size   : 128
gpu_id       : 0
load cifar10 data ...
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
epochs : 20, number of batches : 390
  1: [  381 /   390] loss: 1.148852 acc: 0.447839
1 epoch validation accuracy 0.44495514035224915
  2: [  381 /   390] loss: 0.941039 acc: 0.544278
2 epoch validation accuracy 0.5342670679092407
  3: [  381 /   390] loss: 0.717604 acc: 0.615286
3 epoch validation accuracy 0.553236186504364
  4: [  381 /   390] loss: 0.646558 acc: 0.648324
4 epoch validation accuracy 0.5980484485626221
  5: [  381 /   390] loss: 0.818102 acc: 0.598083
5 epoch validation accuracy 0.5985779166221619
  6: [  381 /   390] loss: 0.439115 acc: 0.751228
6 epoch validation accuracy 0.6307581663131714
  7: [  381 /   390] loss: 0.574835 acc: 0.702958
7 epoch validation accuracy 0.6495941877365112
  8: [  381 /   390] loss: 0.573189 acc: 0.699909
8 epoch validation accuracy 0.657
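
The `number of batches : 390` line in the log follows from the fixed-size placeholders: only full batches can be fed, so the remainder is dropped. A small sketch of the arithmetic, using the CIFAR-10 split sizes (50,000 training and 10,000 test images); the helper name `batch_plan` is hypothetical.

```python
def batch_plan(num_samples, batch_size):
    """Return (full batches, samples left over) when partial batches are dropped."""
    return divmod(num_samples, batch_size)


print(batch_plan(50000, 128))  # (390, 80)
print(batch_plan(10000, 128))  # (78, 16)
```

During training the 80 skipped images differ from epoch to epoch because of the shuffling, whereas validation always skips the same last 16 test images.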

## Displaying the time taken for training

In [13]:
end_time = datetime.datetime.now(pytz.timezone('Asia/Tokyo'))
print("\nStart   Time  : " + str(start_time))
print(  "End     Time  : " + str(end_time))
print(  "Elapsed Time  : " + str(end_time - start_time))


Start   Time  : 2019-08-25 18:30:41.042922+09:00
End     Time  : 2019-08-25 18:33:03.390253+09:00
Elapsed Time  : 0:02:22.347331
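
The elapsed time above covers everything in the cell, including the dataset download and validation, while the laps appended to `lap_record.csv` time only the training `sess.run` calls. A sketch for summarising such rows follows, assuming the column layout written by `Trainer.train` (framework, device, then one lap per epoch). The sample rows and the function name are made up for illustration.

```python
import csv
import io


def mean_lap_per_run(csv_text):
    """Map (framework, device) to the mean per-epoch lap time in seconds."""
    summary = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        framework, device = row[0], row[1]
        laps = [float(v) for v in row[2:]]
        # A later row with the same key overwrites an earlier one.
        summary[(framework, device)] = sum(laps) / len(laps)
    return summary


# Hypothetical rows in the layout written by Trainer.train(); numbers are made up.
sample = 'TensorFlow-GPU,Tesla T4,4.0,3.0,3.5\nTensorFlow-GPU,Tesla K80,9.0,8.0\n'
print(mean_lap_per_run(sample))
```

To apply it to the real file, pass the text read from `lap_record.csv` instead of `sample`.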


## Displaying the time elapsed since the Google Colaboratory session started

In [14]:
!cat /proc/uptime | awk '{print "Elapsed time : " ($1 / 3600) " hours (" $1 " sec)"}'

Elapsed time : 0.0619056 hours (222.86 sec)
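
The same reading can be taken from Python, which makes the value available to later cells. This is a minimal sketch: the first field of `/proc/uptime` is the uptime in seconds, and the second field in the sample string is a made-up idle time. The function name `parse_uptime` is hypothetical.

```python
def parse_uptime(text):
    """Return (hours, seconds) from the contents of /proc/uptime."""
    seconds = float(text.split()[0])
    return seconds / 3600, seconds


# On Linux (including Colab): parse_uptime(open('/proc/uptime').read())
hours, seconds = parse_uptime('222.86 400.12')
print('Elapsed time : {:.7f} hours ({} sec)'.format(hours, seconds))
```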
