
Should we use both a global use_mkldnn flag and a per-OP use_mkldnn flag, as in Paddle v2? #8313

Closed
luotao1 opened this issue Feb 9, 2018 · 9 comments

@luotao1
Contributor

luotao1 commented Feb 9, 2018

This issue comes from Pawel Zelazko. It is related to question #6 in https://github.com/PaddlePaddle/Paddle/wiki/Intel-Open-Questions

In fluid, for CUDNN, GetExpectedKernel decides whether to use the CUDNN library based on the operator's use_cudnn flag.
How does control of kernel selection differ between unit tests and the normal use case?
With use_mkldnn both as a global flag and as an OP field, should MKLDNN be chosen only when both are set to true, or based on the global flag alone?
I'm not sure whether I will have access to global fields in the GetExpectedKernel method; I will check that later.
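For concreteness, here is a minimal, self-contained sketch of the two semantics the question distinguishes: requiring both the global flag and the per-OP attribute, versus honoring the global flag alone. This is not Paddle's actual API; the flag and function names are illustrative only.

#include <iostream>

enum class LibraryType { kPlain, kMKLDNN };

// Hypothetical flags, for illustration only.
struct Flags {
  bool global_use_mkldnn;  // process-wide flag, as in Paddle v2
  bool op_use_mkldnn;      // per-OP attribute, as proposed for fluid
};

// Semantics A: MKLDNN is chosen only when both flags are true.
LibraryType ExpectedLibraryBoth(const Flags& f) {
  return (f.global_use_mkldnn && f.op_use_mkldnn) ? LibraryType::kMKLDNN
                                                  : LibraryType::kPlain;
}

// Semantics B: the global flag alone decides.
LibraryType ExpectedLibraryGlobalOnly(const Flags& f) {
  return f.global_use_mkldnn ? LibraryType::kMKLDNN : LibraryType::kPlain;
}

int main() {
  Flags f{true, false};  // global on, per-OP off: the two semantics disagree
  std::cout << (ExpectedLibraryBoth(f) == LibraryType::kMKLDNN) << " "
            << (ExpectedLibraryGlobalOnly(f) == LibraryType::kMKLDNN) << "\n";
}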

@luotao1 luotao1 added the Intel label Feb 9, 2018
@jacquesqiao
Member

global flag

We will have a priority mechanism for kernel selection. For example, the default priority could be:

std::vector<std::tuple<platform::Place, LibraryType>> kKernelPriority = {
    std::make_tuple(platform::CUDAPlace(0), LibraryType::kCUDNN),
    std::make_tuple(platform::CUDAPlace(0), LibraryType::kPlain),
    std::make_tuple(platform::CPUPlace(), LibraryType::kMKLDNN),
    std::make_tuple(platform::CPUPlace(), LibraryType::kPlain),
};

and the global use_mkldnn flag will give MKLDNN a higher priority; if the user sets it, the priority can be:

std::vector<std::tuple<platform::Place, LibraryType>> kKernelPriority = {
    std::make_tuple(platform::CPUPlace(), LibraryType::kMKLDNN),
    std::make_tuple(platform::CUDAPlace(0), LibraryType::kCUDNN),
    std::make_tuple(platform::CUDAPlace(0), LibraryType::kPlain),
    std::make_tuple(platform::CPUPlace(), LibraryType::kPlain),
};

Then kernel selection will choose the MKLDNN kernel first if there is one; if there is not, it will fall back to a kernel with lower priority.

flag in op

The flag in the op attribute is used to force kernel selection to find a kernel of a certain type; if it cannot find one, the framework will throw an exception.
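For concreteness, here is a self-contained sketch of the two mechanisms described above: the priority-ordered fallback, and a per-OP flag that forces a library and throws when no matching kernel is registered. This is not the actual framework code; the types and names are illustrative only.

#include <iostream>
#include <set>
#include <stdexcept>
#include <utility>
#include <vector>

enum class Place { kCUDA, kCPU };
enum class LibraryType { kCUDNN, kMKLDNN, kPlain };

using KernelKey = std::pair<Place, LibraryType>;

// registered: kernels that exist for this OP.
// priority:   the kKernelPriority-style ordering.
// forced:     library requested via the OP attribute, or nullptr if unset.
LibraryType SelectKernel(const std::set<KernelKey>& registered,
                         const std::vector<KernelKey>& priority,
                         const LibraryType* forced) {
  if (forced != nullptr) {
    for (const auto& key : registered) {
      if (key.second == *forced) return *forced;  // forced library is available
    }
    // The OP attribute forces a certain type; fail loudly if it is missing.
    throw std::runtime_error("no kernel registered for the forced library");
  }
  // Fallback: walk the priority list and take the first registered kernel.
  for (const auto& key : priority) {
    if (registered.count(key) > 0) return key.second;
  }
  throw std::runtime_error("no suitable kernel found");
}

int main() {
  std::set<KernelKey> registered = {{Place::kCPU, LibraryType::kPlain},
                                    {Place::kCPU, LibraryType::kMKLDNN}};
  std::vector<KernelKey> priority = {{Place::kCPU, LibraryType::kMKLDNN},
                                     {Place::kCUDA, LibraryType::kCUDNN},
                                     {Place::kCUDA, LibraryType::kPlain},
                                     {Place::kCPU, LibraryType::kPlain}};
  // No forced library: the fallback picks the CPU MKLDNN kernel here.
  std::cout << (SelectKernel(registered, priority, nullptr) ==
                LibraryType::kMKLDNN)
            << "\n";
}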

@pzelazko-intel
Contributor

I see that the kKernelPriority table is not going to be used until all transformations are implemented: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/operator.cc#L516

As kKernelPriority is not honored at the moment, I am not implementing a global use_mkldnn flag, only an OP-level use_mkldnn flag, set to True by default.

How will the kKernelPriority and OP-flag mechanisms interact? What if, for example, kKernelPriority ranks the kPlain library higher, but GetExpectedKernelType returns the kMKLDNN library?

Is the global use_gpu flag used in fluid? It seems unnecessary, as we choose the place at the script level.

@jacquesqiao
Member

As kKernelPriority is not honored at the moment, I am not implementing a global use_mkldnn flag, only an OP-level use_mkldnn flag, set to True by default.

OK

How will the kKernelPriority and OP-flag mechanisms interact? What if, for example, kKernelPriority ranks the kPlain library higher, but GetExpectedKernelType returns the kMKLDNN library?

kKernelPriority will be a fallback mechanism: the framework will first check the op flag; if that does not decide the kernel, it will use the priority list to choose a proper one.

Is the global use_gpu flag used in fluid? It seems unnecessary, as we choose the place at the script level.

use_gpu is not used in current fluid; in fact, the place is currently created by the user on the Python side and passed to the executor, but this will change in the future.

@luotao1
Contributor Author

luotao1 commented Feb 27, 2018

but only an OP-level use_mkldnn flag, set to True by default.

We'd better not set the default value to True for now. We should first align the accuracy of the MKLDNN ops on the PaddlePaddle Book models in Fluid (see https://github.com/dzhwinter/benchmark).

@pzelazko-intel
Contributor

@luotao1 To align the accuracy of the MKLDNN ops, should we run the scripts from https://github.com/dzhwinter/benchmark/tree/master/fluid and check that accuracy does not decrease for MKLDNN kernels? I also don't see any actual results there... Is someone responsible for running the benchmarks, or do developers run them themselves?

@dzhwinter
Contributor

@pzelazko-intel Aligning the MKLDNN operators can be divided into two parts.

  1. Single operator/kernel implementation.
    If you want to validate the accuracy of a single op/kernel implementation, we have a small operator test framework to do the job. Take mul_op for example:

https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/unittests/test_mul_op.py#L33

import numpy as np
from op_test import OpTest


class TestMulOp(OpTest):
    def setUp(self):
        self.op_type = "mul"
        # Random inputs; the reference output is computed with numpy.
        self.inputs = {
            'X': np.random.random((32, 84)).astype("float32"),
            'Y': np.random.random((84, 100)).astype("float32")
        }
        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}

    def test_check_output(self):
        # Forward check: compare the kernel output with the numpy reference.
        self.check_output()

    def test_check_grad_normal(self):
        # Backward check: gradient checking with a relative-error tolerance.
        self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)

In the forward pass, we compare the result with numpy.
In the backward pass, we use gradient checking to validate the result; max_relative_error is the maximum tolerable relative error of the gradient.
http://ufldl.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization
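As a rough sketch of what the gradient check verifies (following the UFLDL note above; the exact normalization used by OpTest's check_grad may differ), the numerically estimated gradient is compared against the analytic one under a relative-error tolerance:

\frac{\partial L}{\partial x_i} \approx \frac{L(x_i + \epsilon) - L(x_i - \epsilon)}{2\epsilon},
\qquad
\frac{\left| g_i^{\text{analytic}} - g_i^{\text{numeric}} \right|}{\max\left( \left| g_i^{\text{analytic}} \right|, \left| g_i^{\text{numeric}} \right| \right)} \le \text{max\_relative\_error}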

  2. Composed operators as a model.
    We need to check the convergence rate of the model: given the same randomness, see whether the two models converge to the same point under critical conditions.

@dzhwinter
Contributor

For the benchmark models you mentioned, sorry for the inconvenience.
Recently we have been focused on optimizing kernel implementations and saving memory in our framework, so the markdown has not been updated in time. We plan to release the benchmark results at the end of this week.

The benchmark results are organized as below.

  1. Model convergence rate comparison.
    We provide the training curves in a VisualDL graph: https://github.com/PaddlePaddle/Paddle/VisualDL
    Then we can check the accuracy/loss of every minibatch during the training process. After release, these results will be shown on the paddlepaddle.org website.

  2. Model speed comparison.
    We compare the training speed at different batch sizes to measure the framework's performance in different situations.
    https://docs.google.com/spreadsheets/d/12enqFEnw3lJrovdVS3By641zonqySl5bUR3bp1gKlOQ/edit#gid=2045395118
    This spreadsheet will be updated in the benchmark repo.
    @pzelazko-intel

Here are the release notes; we can track the release process at #8533.

@pzelazko-intel
Contributor

Thank you, @dzhwinter
I actually have unit tests for single kernels:
https://github.com/PaddlePaddle/Paddle/pull/8451/files#diff-d3ce28c8e4cf715ded3772881fc24aa7R224 - files test_conv2d_op.py and test_pool2d_op.py

Also, @jczaja ran tests on the MNIST dataset with the MKLDNN kernels, and they converged to the values obtained with the Caffe framework.

@luotao1 I can set the use_mkldnn flag to False by default for now and change it in the future when the MKLDNN kernels are more mature. What's your opinion?
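For illustration, here is roughly how such a per-OP attribute with a False default might be declared in an operator's proto maker (a sketch following the use_cudnn pattern in Paddle at the time, not a verbatim excerpt):

// Inside the operator's OpProtoAndCheckerMaker (illustrative sketch):
AddAttr<bool>("use_mkldnn",
              "(bool, default false) Use the MKLDNN kernel for this op.")
    .SetDefault(false);  // flip the default to true once the kernels mature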

@luotao1
Contributor Author

luotao1 commented Mar 16, 2018

@pzelazko-intel Sorry for the late reply.
Yes, you can change the use_mkldnn flag to True in the future, when the MKLDNN kernels are more mature.
