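# Speeding up training
Once I actually started training, it turned out to be slower than I expected. The GPU performance monitor showed that most of the time the GPU only had memory allocated while its compute utilization was 0, whereas the CPU was fully loaded.

[GPU](../images/GPU.png)

After testing and reading up on it, I found the cause: the CPU loads data too slowly to keep up with the GPU's computation, so most of the time is spent on the CPU fetching data while the GPU sits idle. If this only happened on my own machine it would not matter much, since it could just mean my CPU is weak, but a test on Kaggle shows the same behavior.

[kaggle](../images/kaggle.png)

Further reading and testing showed that setting num_workers sensibly alleviates the problem. The figure below compares the time consumed with different num_workers values.

[time_consume_w.png](../images/time_consume_w.png)

As the figure shows, the training time itself is basically stable and accounts for only a small share. The bulk goes to entering and exiting the for loop and the epoch, which is all time the CPU spends reading data. Below I test several optimization methods to see which ones help.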
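To make the split between "waiting for data" and "actual training" concrete, here is a minimal sketch of how such a measurement could be taken. It is not the code used for the figures above. The names `profile_epoch`, `loader` and `train_step` are hypothetical, and `loader` can be any iterable, for example a PyTorch DataLoader.

```python
import time


def profile_epoch(loader, train_step):
    """Return (data_time, compute_time) in seconds for one pass over loader.

    data_time is the time spent waiting for the next batch,
    compute_time is the time spent inside train_step (hypothetical callback).
    """
    data_time = 0.0
    compute_time = 0.0
    batches = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(batches)  # blocks while the CPU prepares the batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)          # forward/backward/optimizer step
        t2 = time.perf_counter()
        data_time += t1 - t0
        compute_time += t2 - t1
    return data_time, compute_time
```

When the training step runs on the GPU, CUDA kernels are launched asynchronously, so `train_step` should end with `torch.cuda.synchronize()` for the compute time to be meaningful. If `data_time` dominates, the loader is the bottleneck, which matches the idle GPU seen above.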

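## 1. Setting different num_workers

Changing num_workers does make a difference, but the best value seems to need re-tuning whenever the other parameters change. With batch_size = 32, the effect of different num_workers values is shown in the figures below.
Results on Cornell: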

[time_consume_c.png](../images/time_consume_c.png)

Results on Jacquard:

[time_consume_j.png](../images/time_consume_j.png)
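Since the best num_workers seems to depend on the other settings, a small sweep can be rerun whenever they change. The following is only a sketch under assumptions: `sweep_num_workers`, `best_num_workers` and `make_loader` are hypothetical names, and `make_loader(n)` is expected to return a fresh loader that uses `n` workers.

```python
import time


def sweep_num_workers(make_loader, candidates, max_batches=50):
    """Time a partial pass over the loader for each candidate worker count.

    make_loader is a hypothetical factory: make_loader(n) -> iterable of batches.
    Returns a dict mapping num_workers -> elapsed seconds.
    """
    results = {}
    for n in candidates:
        loader = make_loader(n)
        start = time.perf_counter()
        for i, _batch in enumerate(loader):
            if i + 1 >= max_batches:  # stop early, a partial pass is enough
                break
        results[n] = time.perf_counter() - start
    return results


def best_num_workers(results):
    """Return the worker count with the smallest elapsed time."""
    return min(results, key=results.get)


# Possible use with PyTorch (dataset is a placeholder for your own Dataset):
# from torch.utils.data import DataLoader
# results = sweep_num_workers(
#     lambda n: DataLoader(dataset, batch_size=32, num_workers=n),
#     candidates=[0, 2, 4, 8],
# )
# print(results, best_num_workers(results))
```

The sweep only measures data loading, with no training step in the loop, so it gives an upper bound on how fast batches can be delivered. Worker start-up cost is included in each measurement, which can penalize larger values when `max_batches` is small.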

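## 2. Using the DALI library
NVIDIA DALI (the NVIDIA Data Loading Library) is a collection of highly optimized building blocks and an execution engine that accelerates the preprocessing of input data for deep learning applications. DALI offers both performance and flexibility, works as a single library for accelerating different data pipelines, and can be easily integrated into different deep learning training and inference applications.

What I mainly want from it is to read image files from disk in batches and convert them to Tensors, optionally applying some augmentation along the way.

Let's try it by following the official example.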

In [1]:
from nvidia.dali.pipeline import Pipeline

help(Pipeline)

Help on class Pipeline in module nvidia.dali.pipeline:

class Pipeline(builtins.object)
 |  Pipeline class is the base of all DALI data pipelines. The pipeline
 |  encapsulates the data processing graph and the execution engine.
 |  
 |  Parameters
 |  ----------
 |  `batch_size` : int, optional, default = -1
 |      Batch size of the pipeline. Negative values for this parameter
 |      are invalid - the default value may only be used with
 |      serialized pipeline (the value stored in serialized pipeline
 |      is used instead).
 |  `num_threads` : int, optional, default = -1
 |      Number of CPU threads used by the pipeline.
 |      Negative values for this parameter are invalid - the default
 |      value may only be used with serialized pipeline (the value
 |      stored in serialized pipeline is used instead).
 |  `device_id` : int, optional, default = -1
 |      Id of GPU used by the pipeline.
 |      A None value for this parameter means that DALI should not use GPU nor CUDA

In [17]:
import os.path

test_data_root = os.environ['DALI_EXTRA_PATH']

# Caffe LMDB
lmdb_folder = os.path.join(test_data_root, 'db', 'lmdb')

N = 8             # number of GPUs
BATCH_SIZE = 128  # batch size per GPU
ITERATIONS = 32
IMAGE_SIZE = 3

In [18]:
from nvidia.dali.pipeline import Pipeline
import nvidia.dali.ops as ops
import nvidia.dali.types as types

class CaffeReadPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, num_gpus):
        super(CaffeReadPipeline, self).__init__(batch_size, num_threads, device_id)

        self.input = ops.CaffeReader(path = lmdb_folder,
                                     random_shuffle = True, shard_id = device_id, num_shards = num_gpus)
        self.decode = ops.ImageDecoder(device = "mixed", output_type = types.RGB)
        self.resize = ops.Resize(device = "gpu",
                                 interp_type = types.INTERP_LINEAR)
        self.cmn = ops.CropMirrorNormalize(device = "gpu",
                                            dtype = types.FLOAT,
                                            crop = (227, 227),
                                            mean = [128., 128., 128.],
                                            std = [1., 1., 1.])
        self.uniform = ops.Uniform(range = (0.0, 1.0))
        self.resize_rng = ops.Uniform(range = (256, 480))

    def define_graph(self):
        inputs, labels = self.input(name="Reader")
        images = self.decode(inputs)
        images = self.resize(images, resize_shorter = self.resize_rng())
        output = self.cmn(images, crop_pos_x = self.uniform(),
                          crop_pos_y = self.uniform())
        return (output, labels)

In [19]:
import numpy as np
from nvidia.dali.plugin.pytorch import DALIGenericIterator

label_range = (0, 999)
pipes = [CaffeReadPipeline(batch_size=BATCH_SIZE, num_threads=2, device_id = device_id, num_gpus = N) for device_id in range(N)]
pipes[0].build()
dali_iter = DALIGenericIterator(pipes, ['data', 'label'], pipes[0].epoch_size("Reader"))
for i, data in enumerate(dali_iter):
    if i >= ITERATIONS:
        break
    # Testing correctness of labels
    for d in data:
        label = d["label"]
        image = d["data"]
        # labels need to be integers
        assert(np.equal(np.mod(label, 1), 0).all())
        # labels need to be within label_range
        assert((label >= label_range[0]).all())
        assert((label <= label_range[1]).all())
print("OK")

RuntimeError: Critical error when building pipeline:
Error when constructing operator: CaffeReader encountered:
[/opt/dali/dali/operators/reader/loader/lmdb.h:52] Assert on "mdb_env_open(mdb_env_, path.c_str(), mdb_flags, 0664) == 0" failed: LMDB Error: Invalid argument, with file: /opt/dali_extra/db/lmdb
Stacktrace (100 entries):
[frame 0]: /home/ldh/anaconda3/envs/gg_cnn/lib/python3.6/site-packages/nvidia/dali/libdali_operators.so(+0x3cf8ee) [0x7fa7aa7218ee]
[frame 1]: /home/ldh/anaconda3/envs/gg_cnn/lib/python3.6/site-packages/nvidia/dali/libdali_operators.so(+0x182a83b) [0x7fa7abb7c83b]
[frame 2]: /home/ldh/anaconda3/envs/gg_cnn/lib/python3.6/site-packages/nvidia/dali/libdali_operators.so(+0x1830cc9) [0x7fa7abb82cc9]
[frame 3]: /home/ldh/anaconda3/envs/gg_cnn/lib/python3.6/site-packages/nvidia/dali/libdali_operators.so(std::_Function_handler<std::unique_ptr<dali::OperatorBase, std::default_delete<dali::OperatorBase> > (dali::OpSpec const&), std::unique_ptr<dali::OperatorBase, std::default_delete<dali::OperatorBase> > (*)(dali::OpSpec const&)>::_M_invoke(std::_Any_data const&, dali::OpSpec const&)+0xc) [0x7fa7aa7196bc]
[frame 4]: /home/ldh/anaconda3/envs/gg_cnn/lib/python3.6/site-packages/nvidia/dali/libdali.so(+0x18b9e4) [0x7fa7dc6ea9e4]
[frame 5]: /home/ldh/anaconda3/envs/gg_cnn/lib/python3.6/site-packages/nvidia/dali/libdali.so(dali::InstantiateOperator(dali::OpSpec const&)+0x264) [0x7fa7dc6ea324]
[frame 6]: /home/ldh/anaconda3/envs/gg_cnn/lib/python3.6/site-packages/nvidia/dali/libdali.so(dali::OpGraph::InstantiateOperators()+0xa2) [0x7fa7dc6a7352]
[frame 7]: /home/ldh/anaconda3/envs/gg_cnn/lib/python3.6/site-packages/nvidia/dali/libdali.so(dali::Pipeline::Build(std::vector<std::pair<std::string, std::string>, std::allocator<std::pair<std::string, std::string> > >)+0x9e0) [0x7fa7dc703fb0]
[frame 8]: /home/ldh/anaconda3/envs/gg_cnn/lib/python3.6/site-packages/nvidia/dali/backend_impl.cpython-36m-x86_64-linux-gnu.so(+0x444af) [0x7fa7dd4d34af]
[frame 9]: /home/ldh/anaconda3/envs/gg_cnn/lib/python3.6/site-packages/nvidia/dali/backend_impl.cpython-36m-x86_64-linux-gnu.so(+0x90bb4) [0x7fa7dd51fbb4]
[frames 10 to 99 omitted: Python interpreter frames from /home/ldh/anaconda3/envs/gg_cnn/bin/python]

Current pipeline object is no longer valid.
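The error comes from `mdb_env_open` on the folder that `lmdb_folder` points to. I have not confirmed the cause here, so the following is only a generic sanity check that could be run before building the pipeline. The helper name `check_lmdb_dir` is hypothetical. It relies only on the fact that an LMDB environment stored as a directory keeps its data in a file named `data.mdb`.

```python
import os


def check_lmdb_dir(path):
    """Return a list of human-readable problems found with an LMDB folder.

    An empty list means the basic checks passed; it does not prove that
    LMDB (or DALI) can actually open the database.
    """
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} is not an existing directory")
        return problems
    data_file = os.path.join(path, "data.mdb")
    if not os.path.isfile(data_file):
        problems.append(f"{data_file} is missing")
    elif os.path.getsize(data_file) == 0:
        problems.append(f"{data_file} is empty")
    return problems


# Possible use with the path from the cells above:
# lmdb_folder = os.path.join(os.environ["DALI_EXTRA_PATH"], "db", "lmdb")
# print(check_lmdb_dir(lmdb_folder) or "basic checks passed")
```

If these checks pass and the error persists, the problem is more likely in how the file system or environment handles the database than in the path itself, and that needs a separate investigation.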