## Train PyTorch : GPU

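References:<br>
- "Mastering the deep learning compiler TVM and the major deep learning frameworks on Colaboratory" (Qiita, in Japanese)<br>
https://qiita.com/stakemura/items/1761be70a06fa8ee853f

- "Speed comparison of deep learning libraries using a simple CNN" (Qiita, in Japanese)<br>
https://qiita.com/daigo0927/items/8092f3ff5276ffc4f088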

## Enabling GPU mode

From the menu, select<br>
　　<strong>Runtime  ⇒  Change runtime type</strong> <br>
 and set the following in the dialog that appears:<br>
- Runtime type  = <font color='red'><strong>Python3</strong></font>
- Hardware accelerator  = <font color='red'><strong>GPU</strong></font>
- Omit code cell output when saving this notebook = <font color='red'><strong>OFF</strong></font>

Then click the [Save] button.

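## Mounting Google Drive

### <font color='red'>Note</font>
When the code below is run at the start of a runtime session, a URL for obtaining an <font color='red'><strong>authorization code</strong></font> is displayed.<br>
Click the URL, choose your account on the linked page, and authorize access.<br>
Then copy the authorization code that is shown and paste it into the input field below to complete the mount.

### Reference:
　　How to use Google Drive<br>
　　https://www.appsupport.jp/category/drive/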

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


## Adding a path so that custom packages can be imported

In [0]:
import sys
sys.path.append('/content/drive/My Drive/compare_deeplibs')
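
The `sys.path.append` call above adds the directory unconditionally, so re-running the cell adds a duplicate entry, and a wrong path goes unnoticed until the import fails. A minimal sketch of a more defensive variant follows; the helper name `add_import_path` is hypothetical and not part of the original notebook.

```python
import os
import sys

def add_import_path(path):
    """Append `path` to sys.path once; return True if the directory exists.

    Hypothetical helper, not part of the original notebook.
    """
    if not os.path.isdir(path):
        # Usually means Google Drive is not mounted yet or the folder name is wrong.
        print('directory not found:', path)
        return False
    if path not in sys.path:
        sys.path.append(path)
    return True

# Path taken from the cell above.
add_import_path('/content/drive/My Drive/compare_deeplibs')
```

A `False` return value is a hint to re-run the Drive mount cell before importing `utils`.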

In [0]:
!ls -l /content/drive/'My Drive'/compare_deeplibs

total 1289
drwx------ 2 root root    4096 Aug 25 05:33 CIFAR10
-rw------- 1 root root   10716 Aug 25 09:22 lap_record.csv
-rw------- 1 root root 1172512 Aug 25 08:28 model_torch.pth
drwx------ 2 root root    4096 Aug 24 23:11 __pycache__
-rw------- 1 root root   19208 Aug 25 09:05 train_Chainer_GPU_Tesla-T4.ipynb
-rw------- 1 root root   20858 Aug 25 08:43 train_Keras_GPU_Tesla-T4.ipynb
-rw------- 1 root root   30977 Aug 25 06:51 train_Keras_TPU.ipynb
-rw------- 1 root root   12104 Aug 25 09:23 train_PyTorch_GPU_Tesla-T4.ipynb
-rw------- 1 root root   19678 Aug 25 06:38 train_TensorFlow_GPU_Tesla-K80.ipynb
-rw------- 1 root root   19634 Aug 25 08:44 train_TensorFlow_GPU_Tesla-T4.ipynb
-rw------- 1 root root    3432 Aug 24 22:59 utils.py


In [0]:
from utils import show_progress

Using TensorFlow backend.


## Checking the PyTorch version

In [0]:
import torch
print(torch.__version__)

1.1.0


## Displaying GPU device information

In [0]:
from torch import cuda
assert cuda.is_available()
assert cuda.device_count() > 0
device_name = cuda.get_device_name(cuda.current_device())
print(device_name)

Tesla T4


## Class for building the model

In [0]:
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 5, 1, 2),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(64, 128, 5, 1, 2),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.MaxPool2d(2, 2)
        )

        self.classifier = nn.Sequential(
            nn.Linear(128 * 8 * 8, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 128 * 8 * 8)
        x = self.classifier(x)
        
        return x
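
The `128 * 8 * 8` input size of the classifier follows from the layer arithmetic: with 32x32 CIFAR-10 inputs, each 5x5 convolution with stride 1 and padding 2 preserves the spatial size, and each 2x2 max pooling halves it. The sketch below checks this with the standard output-size formula `floor((n + 2p - k) / s) + 1`; the helper `out_size` is hypothetical and assumes dilation 1.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (dilation 1, floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1

n = 32                                          # CIFAR-10 images are 32x32
n = out_size(n, kernel=5, stride=1, padding=2)  # Conv2d(3, 64, 5, 1, 2)   -> 32
n = out_size(n, kernel=2, stride=2)             # MaxPool2d(2, 2)          -> 16
n = out_size(n, kernel=5, stride=1, padding=2)  # Conv2d(64, 128, 5, 1, 2) -> 16
n = out_size(n, kernel=2, stride=2)             # MaxPool2d(2, 2)          -> 8

flat_features = 128 * n * n  # channels * height * width fed to nn.Linear
print(n, flat_features)      # 8 8192
```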

## Loading the CIFAR-10 dataset

In [0]:
from torchvision.datasets import CIFAR10
from torchvision import transforms
from torch.utils.data import DataLoader

def load_CIFAR10(batch_size):
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

    trainset = CIFAR10(root='/content/drive/My Drive/compare_deeplibs/CIFAR10',
                       train=True, download=True, transform=transform)
    trainloader = DataLoader(trainset, batch_size=batch_size,
                             shuffle=True, num_workers=1, pin_memory=True)
    testset = CIFAR10(root='/content/drive/My Drive/compare_deeplibs/CIFAR10',
                      train=False, download=True, transform=transform)
    testloader = DataLoader(testset, batch_size=batch_size,
                            shuffle=False, num_workers=1, pin_memory=True)
    
    return (trainloader, testloader)
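
`transforms.ToTensor()` scales pixel values to [0, 1], and `Normalize` with mean 0.5 and standard deviation 0.5 per channel then maps them to [-1, 1]. The number of mini-batches per epoch is the dataset size divided by the batch size, rounded up, because `DataLoader` keeps the last partial batch by default. A small torch-free sketch of both calculations, assuming the standard CIFAR-10 split of 50,000 training and 10,000 test images (the function names are hypothetical):

```python
import math

def normalize(x, mean=0.5, std=0.5):
    """Per-channel operation applied by transforms.Normalize: (x - mean) / std."""
    return (x - mean) / std

print(normalize(0.0), normalize(0.5), normalize(1.0))  # -1.0 0.0 1.0

def num_batches(num_samples, batch_size):
    """Length of a DataLoader with drop_last=False (the default)."""
    return math.ceil(num_samples / batch_size)

print(num_batches(50000, 128))  # 391 training batches
print(num_batches(10000, 128))  # 79 test batches
```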

## Computing prediction accuracy

In [0]:
def accuracy(out, labels):
    # index of the largest score along dim 1 = predicted class of each sample
    _, pred = torch.max(out, 1)
    return (pred == labels).sum().item() / labels.size(0)
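
`torch.max(out, 1)` returns the maximum value and its index along the class dimension, so `pred` holds the predicted class of each sample. The same calculation on plain Python lists, as a torch-free illustration (`accuracy_lists` is a hypothetical name, not part of the notebook):

```python
def accuracy_lists(out, labels):
    """Fraction of rows in `out` whose arg-max equals the corresponding label."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in out]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)

scores = [[0.1, 0.9],   # predicted class 1
          [0.8, 0.2],   # predicted class 0
          [0.3, 0.7]]   # predicted class 1
print(accuracy_lists(scores, [1, 0, 0]))  # 2 of 3 predictions are correct
```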

## Function for running training

In [0]:
import time
import torch.optim as optim

def train(args):
    # load dataset
    # ==========================
    trainloader, testloader = load_CIFAR10(args['batch_size'])
    N = len(trainloader)  # number of mini-batches per epoch (not the number of images)
    print('# of trainset: ', N)
    print('epoch: %d  batch size: %d' % (args['epochs'], args['batch_size']))

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(device)
 
    cnn = CNN()
    cnn.to(device)
    criterion = nn.CrossEntropyLoss()
    criterion.to(device)

    optimizer = optim.Adam(cnn.parameters())

    
    # train
    # ==========================
    #loss_history = []
    #acc_history  = []
    time_history = []
    
    for e in range(args['epochs']):
        loss_cum = 0.0
        acc_cum  = 0.0
        time_cum = 0.0
        
        for b, (imgs, labels) in enumerate(trainloader):
            start = time.time()
            imgs, labels = imgs.to(device), labels.to(device)
            cnn.zero_grad()
            outputs = cnn(imgs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            time_cum += time.time() - start

            loss_cum += loss.item()
            acc = accuracy(outputs, labels)
            acc_cum += acc

            if b % 10 == 0:
                show_progress(e+1, b+1, N, loss.item(), acc)

        print('\t mean acc: %f' % (acc_cum / N))
        #loss_history.append(loss_cum / N)
        #acc_history.append(acc_cum / N)
        time_history.append(time_cum)

        # test accuracy
        #cnn.eval()
        correct, total = 0, 0
        for imgs, labels in testloader:
            imgs, labels = imgs.to(device), labels.to(device)
            outputs = cnn(imgs)
            _, pred = torch.max(outputs, 1)  # maximum value and its index, taken along dim 1 (the class scores)
            total += labels.size(0)
            correct += (pred == labels).sum().item()
        
        print('mean accuracy on %d test images: %f' % (total, correct / total))

    # save histories
    # with open('/content/drive/My Drive/compare_deeplibs/loss_pytorch.csv', 'w') as f:
    #     f.write('pytorch')
    #     for l in loss_history:
    #         f.write(',' + str(l))
    #     f.write('\n')
    # print('saved loss history')
    # with open('/content/drive/My Drive/compare_deeplibs/acc_pytorch.csv', 'w') as f:
    #     f.write('pytorch')
    #     for l in acc_history:
    #         f.write(',' + str(l))
    #     f.write('\n')
    # print('saved acc history')
    
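    # append one row: framework, GPU name (global set in the device-info cell), per-epoch training times
    with open('/content/drive/My Drive/compare_deeplibs/lap_record.csv', 'a') as f: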
        f.write('Pytorch-GPU')
        f.write(',' + device_name)
        for t in time_history:
            f.write(',' + str(t))
        f.write('\n')
    
    # save models
    #torch.save(cnn.state_dict(), '/content/drive/My Drive/compare_deeplibs/model_torch.pth')
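
Each call to `train` appends one row to `lap_record.csv` in the form `framework,device name,time of epoch 1,time of epoch 2,...`, where each time is the accumulated training-step time of that epoch in seconds. A minimal sketch for summarizing such rows follows. The function name `summarize_laps` and the sample row are hypothetical, and rows written by the other notebooks are assumed to use the same layout.

```python
import csv
import io

def summarize_laps(lines):
    """Return [(framework, device, mean seconds per epoch)] for each CSV row."""
    summary = []
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip empty or incomplete rows
        framework, device = row[0], row[1]
        times = [float(t) for t in row[2:]]
        summary.append((framework, device, sum(times) / len(times)))
    return summary

# Made-up sample data, not measurements from the original run.
sample = io.StringIO('Pytorch-GPU,Tesla T4,10.0,9.0,8.0\n')
print(summarize_laps(sample))  # [('Pytorch-GPU', 'Tesla T4', 9.0)]
```

To read the real file, pass an open file object instead of the `StringIO` sample, for example `with open(path, newline='') as f: summarize_laps(f)`.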

## Recording the computation start time

When running on Google Colaboratory, the time zone must be specified explicitly in order to display the time in Japan Standard Time.

In [0]:
import datetime
import pytz

start_time = datetime.datetime.now(pytz.timezone('Asia/Tokyo'))
print(start_time)

2019-08-25 18:23:58.718828+09:00
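
If `pytz` is not available, the standard library can produce the same kind of time zone aware timestamp. The sketch below assumes a fixed UTC+9 offset, which is sufficient for Japan because it does not observe daylight saving time; the name `JST` is introduced here for illustration only.

```python
import datetime

# Fixed-offset time zone for Japan Standard Time (UTC+09:00).
JST = datetime.timezone(datetime.timedelta(hours=9), 'JST')

now_jst = datetime.datetime.now(JST)
print(now_jst)           # same format as above, ending in +09:00
print(now_jst.tzname())  # JST
```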


## Running training

In [0]:
args={
    'epochs'     : 20,
    'batch_size' : 128,
    'gpu_id'     : 0    # not read by train(); the device is chosen via torch.cuda.is_available()
}

print(args)

for key, value in args.items():
    print('{:12s} : {}'.format(key, value))

train(args)

{'epochs': 20, 'batch_size': 128, 'gpu_id': 0}
epochs       : 20
batch_size   : 128
gpu_id       : 0
Files already downloaded and verified
Files already downloaded and verified
# of trainset:  391
epoch: 20  batch size: 128
cuda
  1: [  391 /   391] loss: 1.130980 acc: 0.587500	 mean acc: 0.543722
mean accuracy on 10000 test images: 0.645600
  2: [  391 /   391] loss: 0.752534 acc: 0.737500	 mean acc: 0.681754
mean accuracy on 10000 test images: 0.695200
  3: [  391 /   391] loss: 0.670533 acc: 0.775000	 mean acc: 0.733780
mean accuracy on 10000 test images: 0.730700
  4: [  391 /   391] loss: 0.551118 acc: 0.837500	 mean acc: 0.768067
mean accuracy on 10000 test images: 0.735300
  5: [  391 /   391] loss: 0.637408 acc: 0.725000	 mean acc: 0.792455
mean accuracy on 10000 test images: 0.752400
  6: [  391 /   391] loss: 0.536278 acc: 0.837500	 mean acc: 0.815381
mean accuracy on 10000 test images: 0.752100
  7: [  391 /   391] loss: 0.667059 acc: 0.750000	 mean acc: 0.837056
mean accura

## Displaying the time required for training

In [0]:
end_time = datetime.datetime.now(pytz.timezone('Asia/Tokyo'))
print("\nStart   Time  : " + str(start_time))
print(  "End     Time  : " + str(end_time))
print(  "Elapsed Time  : " + str(end_time - start_time))


Start   Time  : 2019-08-25 18:23:58.718828+09:00
End     Time  : 2019-08-25 18:28:40.221854+09:00
Elapsed Time  : 0:04:41.503026
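
The elapsed time above is measured between two notebook cells, so it includes the test pass after each epoch and any pause between running the cells, not only the training steps that are written to `lap_record.csv`. As a quick check of the printed values, the two timestamps can be parsed back and subtracted. This sketch requires Python 3.7 or later for `fromisoformat`, and the division by 20 assumes that all 20 configured epochs completed.

```python
from datetime import datetime

# Timestamps copied from the output above.
start = datetime.fromisoformat('2019-08-25 18:23:58.718828+09:00')
end = datetime.fromisoformat('2019-08-25 18:28:40.221854+09:00')

elapsed = end - start
print(elapsed)                       # 0:04:41.503026
print(elapsed.total_seconds() / 20)  # rough wall-clock seconds per epoch
```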


## Displaying the time elapsed since the Google Colaboratory session started

In [0]:
!cat /proc/uptime | awk '{print "Elapsed time : " ($1 / 3600) " hours (" $1 " sec)"}'

Elapsed time : 0.121469 hours (437.29 sec)
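
The first field of `/proc/uptime` is the number of seconds since the virtual machine booted. The same report can be produced in Python without `awk`. A minimal sketch follows, where `format_uptime` is a hypothetical helper that uses an English label, and the second field of the sample string is made up.

```python
from pathlib import Path

def format_uptime(text):
    """Format the contents of /proc/uptime as hours and seconds since boot."""
    first_field = text.split()[0]  # seconds since boot, as written by the kernel
    hours = float(first_field) / 3600
    # %g keeps 6 significant digits, matching awk's default number format.
    return 'Elapsed time : %g hours (%s sec)' % (hours, first_field)

print(format_uptime('437.29 812.45'))  # Elapsed time : 0.121469 hours (437.29 sec)

uptime_file = Path('/proc/uptime')  # exists on Linux only
if uptime_file.exists():
    print(format_uptime(uptime_file.read_text()))
```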
