![Pytorch](../../../pytorch_logo_2018.svg)

# PyTorch Extras: TensorBoard in PyTorch

>Reference code
>
>**yunjey's [pytorch tutorial series](https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/04-utils/tensorboard)**

## TensorBoard resources

TensorBoard is the official visualization tool released by the TensorFlow team.

>**Official introduction**
>
>[TensorBoard: Visualizing Learning](https://www.tensorflow.org/guide/summaries_and_tensorboard)
>
>[Hands-on TensorBoard (TensorFlow Dev Summit 2017)](https://www.youtube.com/watch?v=eBbEDRsCmv4&feature=youtu.be)

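>**Related blog posts**
>
>[Getting started with TensorBoard, TensorFlow's visualization tool (in Chinese)](https://blog.csdn.net/sinat_33761963/article/details/62433234)
>
>[TensorFlow tutorial 4: TensorBoard, a handy visualization helper (in Chinese)](https://blog.csdn.net/u012052268/article/details/75394077)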

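## PyTorch implementation

The code here trains an MNIST classifier with a simple neural network and uses **TensorBoard** to visualize the training process.

During training, `scalar_summary` plots the loss and accuracy, and `image_summary` displays the training images.

In addition, `histo_summary` visualizes the values and gradients of the network's parameters as histograms.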

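*Required packages*
- tensorflow (1.x; the `Logger` below uses the TF1 summary API, such as `tf.summary.FileWriter`)
- torch
- torchvision
- scipy (older than 1.2, since `scipy.misc.toimage` was removed in SciPy 1.2)
- numpy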

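### Logging (the Logger class)

The `Logger` class builds on TensorBoard to give PyTorch training code an interface for saving training information.

TensorBoard can record and display the following kinds of data:
- Scalars
- Images
- Audio
- Graphs
- Distributions
- Histograms
- Embeddings

The code implements logging for scalars, images, and histograms.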

In [2]:
# Packages
import tensorflow as tf
import numpy as np
import scipy.misc
from io import BytesIO  # in-memory byte buffer, available on Python 2.7 and 3.x

In [3]:
class Logger(object):

    def __init__(self, log_dir):
        """Create a summary writer logging to log_dir."""
        self.writer = tf.summary.FileWriter(log_dir)

    def scalar_summary(self, tag, value, step):
        """Log a scalar variable."""
        summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value)])
        self.writer.add_summary(summary, step)

    def image_summary(self, tag, images, step):
        """Log a list of images."""
        img_summaries = []
        for i, img in enumerate(images):
            # Write the image to an in-memory PNG buffer
            s = BytesIO()
            scipy.misc.toimage(img).save(s, format="png")

            # Create an Image object
            img_sum = tf.Summary.Image(encoded_image_string=s.getvalue(),
                                       height=img.shape[0],
                                       width=img.shape[1])
            # Create a Summary value
            img_summaries.append(tf.Summary.Value(tag='%s/%d' % (tag, i), image=img_sum))

        # Create and write Summary
        summary = tf.Summary(value=img_summaries)
        self.writer.add_summary(summary, step)
        
    def histo_summary(self, tag, values, step, bins=1000):
        """Log a histogram of the tensor of values."""
        # Create a histogram using numpy
        counts, bin_edges = np.histogram(values, bins=bins)

        # Fill the fields of the histogram proto
        hist = tf.HistogramProto()
        hist.min = float(np.min(values))
        hist.max = float(np.max(values))
        hist.num = int(np.prod(values.shape))
        hist.sum = float(np.sum(values))
        hist.sum_squares = float(np.sum(values**2))

        # Drop the start of the first bin
        bin_edges = bin_edges[1:]

        # Add bin edges and counts
        for edge in bin_edges:
            hist.bucket_limit.append(edge)
        for c in counts:
            hist.bucket.append(c)

        # Create and write Summary
        summary = tf.Summary(value=[tf.Summary.Value(tag=tag, histo=hist)])
        self.writer.add_summary(summary, step)
        self.writer.flush()
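
To make the histogram bookkeeping in `histo_summary` easier to follow, here is a minimal sketch that computes the same fields with NumPy alone and returns them as a plain dictionary. The function name `histogram_fields` and the dictionary layout are illustrative assumptions, not part of the original code or of TensorFlow; no `HistogramProto` is created.

```python
import numpy as np


def histogram_fields(values, bins=10):
    """Compute the statistics and buckets that histo_summary copies into the proto.

    Hypothetical helper for illustration only; it returns a plain dict.
    """
    values = np.asarray(values, dtype=np.float64)
    counts, bin_edges = np.histogram(values, bins=bins)
    return {
        "min": float(values.min()),
        "max": float(values.max()),
        "num": int(values.size),
        "sum": float(values.sum()),
        "sum_squares": float((values ** 2).sum()),
        # np.histogram returns bins + 1 edges. Dropping the first edge leaves
        # one upper limit per bucket, so both lists have the same length.
        "bucket_limit": bin_edges[1:].tolist(),
        "bucket": counts.tolist(),
    }


fields = histogram_fields([0.0, 0.5, 1.0, 1.5, 2.0], bins=4)
print(fields["bucket_limit"])
print(fields["bucket"])
```

Dropping the first edge is what keeps `bucket_limit` and `bucket` aligned: each count is paired with the upper boundary of its bin.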

### Building and training the model (logging during training)

In [6]:
# Packages
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

In [7]:
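# Device configuration
# Pick the second GPU only if it exists; an unconditional set_device(1) fails on machines with fewer than two GPUs
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    torch.cuda.set_device(1)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')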

In [8]:
# MNIST dataset
dataset = torchvision.datasets.MNIST(root='../../../data/minist', 
                                     train=True, 
                                     transform=transforms.ToTensor(),  
                                     download=True)

# Data loader
data_loader = torch.utils.data.DataLoader(dataset=dataset, 
                                          batch_size=100, 
                                          shuffle=True)

In [9]:
# Fully connected neural network with one hidden layer
class NeuralNet(nn.Module):
    def __init__(self, input_size=784, hidden_size=500, num_classes=10):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)  
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out
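
As a quick check on what the histogram logging will later record for this network, the sketch below lists the parameter shapes implied by the default sizes and derives the TensorBoard tags. It uses plain Python only and relies on the `nn.Linear` convention that the weight has shape `(out_features, in_features)`; the names `param_shapes`, `num_elements`, and `histogram_tags` are illustrative, not taken from the original code.

```python
# Default sizes from the NeuralNet constructor above
input_size, hidden_size, num_classes = 784, 500, 10

# nn.Linear stores weight as (out_features, in_features) and bias as (out_features,)
param_shapes = {
    "fc1.weight": (hidden_size, input_size),
    "fc1.bias": (hidden_size,),
    "fc2.weight": (num_classes, hidden_size),
    "fc2.bias": (num_classes,),
}


def num_elements(shape):
    """Multiply the dimensions of a shape tuple."""
    n = 1
    for dim in shape:
        n *= dim
    return n


total_params = sum(num_elements(shape) for shape in param_shapes.values())

# The training loop replaces '.' with '/' and logs one histogram for the
# values and one for the gradients of each parameter.
histogram_tags = []
for name in param_shapes:
    tag = name.replace(".", "/")
    histogram_tags += [tag, tag + "/grad"]

print(total_params)
print(histogram_tags)
```

The `/` separator matters because TensorBoard groups tags by their prefix, so each layer's weight, bias, and gradient histograms end up side by side.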

In [10]:
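# Instantiate the model
model = NeuralNet().to(device)

In [11]:
# Create the logger and point it at the log directory
logger = Logger('./logs')

In [12]:
# Loss function and optimizer
criterion = nn.CrossEntropyLoss()  
optimizer = torch.optim.Adam(model.parameters(), lr=0.00001)  

In [13]:
# Training-loop settings (data iterator, batches per epoch, total steps)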
data_iter = iter(data_loader)
iter_per_epoch = len(data_loader)
total_step = 50000
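
To get a feel for what these settings mean in practice, the arithmetic below estimates the number of batches per epoch, the number of epochs covered by `total_step`, and how many times the loop will write logs. It is a rough sketch that assumes the standard MNIST training split of 60,000 images and the batch size of 100 used above; `log_every` is a hypothetical name mirroring the `% 100` check in the training loop.

```python
import math

dataset_size = 60000  # size of the MNIST training split
batch_size = 100      # same value as in the DataLoader above
total_step = 50000
log_every = 100       # hypothetical name for the logging interval

# With the default drop_last=False, a DataLoader yields ceil(N / batch_size) batches
iter_per_epoch = math.ceil(dataset_size / batch_size)
epochs = total_step / iter_per_epoch
log_points = total_step // log_every

print(iter_per_epoch)    # batches per epoch
print(round(epochs, 1))  # approximate number of epochs
print(log_points)        # number of logged steps
```

Under these assumptions the run covers roughly 83 passes over the data and produces 500 logged points per scalar curve.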

In [14]:
# Start training
for step in range(total_step):
    
    # Reset the data iterator
    if (step+1) % iter_per_epoch == 0:
        data_iter = iter(data_loader)

    # Fetch images and labels
    images, labels = next(data_iter)
    images, labels = images.view(images.size(0), -1).to(device), labels.to(device)
    
    # Forward pass
    outputs = model(images)
    loss = criterion(outputs, labels)
    
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Compute the accuracy on the current batch
    _, argmax = torch.max(outputs, 1)
    accuracy = (labels == argmax.squeeze()).float().mean()

    if (step+1) % 100 == 0:
        print ('Step [{}/{}], Loss: {:.4f}, Acc: {:.2f}' 
               .format(step+1, total_step, loss.item(), accuracy.item()))

        # ================================================================== #
        #                        TensorBoard logging                         #
        # ================================================================== #

        # 1. Log scalar values (scalar summary)
        info = { 'loss': loss.item(), 'accuracy': accuracy.item() }

        for tag, value in info.items():
            logger.scalar_summary(tag, value, step+1)

        # 2. Log values and gradients of the parameters (histogram summary)
        for tag, value in model.named_parameters():
            tag = tag.replace('.', '/')
            logger.histo_summary(tag, value.data.cpu().numpy(), step+1)
            logger.histo_summary(tag+'/grad', value.grad.data.cpu().numpy(), step+1)

        # 3. Log training images (image summary)
        info = { 'images': images.view(-1, 28, 28)[:10].cpu().numpy() }

        for tag, images in info.items():
            logger.image_summary(tag, images, step+1)

Step [100/50000], Loss: 2.1946, Acc: 0.44
Step [200/50000], Loss: 2.1081, Acc: 0.51
Step [300/50000], Loss: 1.9934, Acc: 0.68
Step [400/50000], Loss: 1.7980, Acc: 0.78
Step [500/50000], Loss: 1.7040, Acc: 0.71
Step [600/50000], Loss: 1.5549, Acc: 0.73
Step [700/50000], Loss: 1.4596, Acc: 0.73
Step [800/50000], Loss: 1.3418, Acc: 0.80
Step [900/50000], Loss: 1.2599, Acc: 0.73
Step [1000/50000], Loss: 1.1744, Acc: 0.78
Step [1100/50000], Loss: 1.1137, Acc: 0.81
Step [1200/50000], Loss: 1.0074, Acc: 0.85
Step [1300/50000], Loss: 0.8924, Acc: 0.86
Step [1400/50000], Loss: 0.9266, Acc: 0.84
Step [1500/50000], Loss: 0.8485, Acc: 0.87
Step [1600/50000], Loss: 0.8521, Acc: 0.82
Step [1700/50000], Loss: 0.8131, Acc: 0.85
Step [1800/50000], Loss: 0.6989, Acc: 0.86
Step [1900/50000], Loss: 0.6134, Acc: 0.95
Step [2000/50000], Loss: 0.7695, Acc: 0.83
Step [2100/50000], Loss: 0.6143, Acc: 0.90
Step [2200/50000], Loss: 0.6876, Acc: 0.84
Step [2300/50000], Loss: 0.5857, Acc: 0.88
Step [2400/50000], L

Step [19000/50000], Loss: 0.2679, Acc: 0.92
Step [19100/50000], Loss: 0.2031, Acc: 0.91
Step [19200/50000], Loss: 0.3118, Acc: 0.88
Step [19300/50000], Loss: 0.2032, Acc: 0.95
Step [19400/50000], Loss: 0.3927, Acc: 0.89
Step [19500/50000], Loss: 0.2999, Acc: 0.92
Step [19600/50000], Loss: 0.1275, Acc: 0.98
Step [19700/50000], Loss: 0.1703, Acc: 0.94
Step [19800/50000], Loss: 0.2165, Acc: 0.93
Step [19900/50000], Loss: 0.4208, Acc: 0.89
Step [20000/50000], Loss: 0.2030, Acc: 0.95
Step [20100/50000], Loss: 0.3723, Acc: 0.89
Step [20200/50000], Loss: 0.1659, Acc: 0.95
Step [20300/50000], Loss: 0.1654, Acc: 0.93
Step [20400/50000], Loss: 0.1170, Acc: 0.96
Step [20500/50000], Loss: 0.2645, Acc: 0.95
Step [20600/50000], Loss: 0.2268, Acc: 0.91
Step [20700/50000], Loss: 0.2426, Acc: 0.91
Step [20800/50000], Loss: 0.1867, Acc: 0.95
Step [20900/50000], Loss: 0.3319, Acc: 0.92
Step [21000/50000], Loss: 0.2269, Acc: 0.94
Step [21100/50000], Loss: 0.2745, Acc: 0.88
Step [21200/50000], Loss: 0.1546

Step [37700/50000], Loss: 0.1356, Acc: 0.95
Step [37800/50000], Loss: 0.2596, Acc: 0.93
Step [37900/50000], Loss: 0.2773, Acc: 0.94
Step [38000/50000], Loss: 0.1568, Acc: 0.95
Step [38100/50000], Loss: 0.1733, Acc: 0.94
Step [38200/50000], Loss: 0.0969, Acc: 0.97
Step [38300/50000], Loss: 0.0885, Acc: 0.96
Step [38400/50000], Loss: 0.1053, Acc: 0.97
Step [38500/50000], Loss: 0.1265, Acc: 0.97
Step [38600/50000], Loss: 0.1126, Acc: 0.97
Step [38700/50000], Loss: 0.1779, Acc: 0.94
Step [38800/50000], Loss: 0.0966, Acc: 0.99
Step [38900/50000], Loss: 0.1019, Acc: 0.97
Step [39000/50000], Loss: 0.1765, Acc: 0.94
Step [39100/50000], Loss: 0.0542, Acc: 1.00
Step [39200/50000], Loss: 0.2043, Acc: 0.95
Step [39300/50000], Loss: 0.2087, Acc: 0.94
Step [39400/50000], Loss: 0.0576, Acc: 0.98
Step [39500/50000], Loss: 0.1783, Acc: 0.94
Step [39600/50000], Loss: 0.1845, Acc: 0.95
Step [39700/50000], Loss: 0.1496, Acc: 0.95
Step [39800/50000], Loss: 0.1212, Acc: 0.98
Step [39900/50000], Loss: 0.0735
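
The training loop above counts steps rather than epochs, so it has to restart the data iterator by hand. As written, the modulo check appears to restart it one batch before the loader is exhausted, which skips the last batch of each pass. One possible alternative, sketched below, wraps the loader in a generator that starts a new pass only when the previous one is used up. The name `cycle` and the toy list standing in for the `DataLoader` are assumptions for illustration.

```python
def cycle(loader):
    """Yield batches forever, starting a new pass when the loader is exhausted.

    Assumes the loader is non-empty and can be iterated more than once,
    as a DataLoader or a list can.
    """
    while True:
        for batch in loader:
            yield batch


# A plain list stands in for the DataLoader (hypothetical data)
toy_loader = ["batch_a", "batch_b", "batch_c"]
batches = cycle(toy_loader)

# Draw more batches than one pass contains
seen = [next(batches) for _ in range(7)]
print(seen)
```

With a real `DataLoader` created with `shuffle=True`, every new pass started by the inner `for` loop draws a fresh shuffle, so the step-based loop would still see reshuffled data in each epoch.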

## Visualizing with TensorBoard

After training, the log data is saved in the `./logs` folder. Run the following command to start the visualization:

```
$ tensorboard --logdir='./logs' --port=6006
```

Then open `http://localhost:6006/` in a local browser to view the results.
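
If the dashboard comes up empty, a first thing to check is whether any event files were actually written. The sketch below lists them using only the standard library. It assumes the usual `events.out.tfevents.*` file naming used by TensorBoard writers; `list_event_files` is a hypothetical helper, not a TensorBoard API.

```python
import glob
import os


def list_event_files(log_dir):
    """Return the TensorBoard event files found directly in log_dir, sorted by name."""
    pattern = os.path.join(log_dir, "events.out.tfevents.*")
    return sorted(glob.glob(pattern))


# './logs' is the directory passed to Logger in this tutorial
for path in list_event_files("./logs"):
    print(path, os.path.getsize(path), "bytes")
```

An empty result usually means the `--logdir` path does not match the directory the logger wrote to, or that training has not reached its first logging step yet.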

### Scalars

![Scalars](TensorBoard_Scalar.png)

### Images

![Images](TensorBoard_Image.png)

### Histograms

![Histograms](TensorBoard_Histogram.png)

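# Acknowledgements

I have finally worked through [yunjey's PyTorch tutorial](https://github.com/yunjey/pytorch-tutorial). Along the way I picked up the basics of PyTorch and revisited the key concepts of deep learning, so it was well worth the effort.

The author's code is remarkably concise; many thanks.

With PyTorch covered, my next focus will be object detection. Having at least one deep learning framework under my belt should make the hands-on work much smoother.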