
Should we use both a global use_mkldnn flag and a per-OP use_mkldnn flag, as in Paddle v2? #8313

Closed
luotao1 opened this issue Feb 9, 2018 · 9 comments

@luotao1
Contributor

luotao1 commented Feb 9, 2018

This issue comes from Pawel Zelazko. It is related to question #6 in https://github.com/PaddlePaddle/Paddle/wiki/Intel-Open-Questions

In fluid, for CUDNN, GetExpectedKernel decides whether to use the CUDNN library based on the operator's use_cudnn flag.
How does control of kernel selection differ between unit tests and the normal use case?
With use_mkldnn both as a global flag and as an OP field, should MKLDNN be chosen only when both are set to true, or based on the global flag alone?
I'm not sure whether I will have access to global fields in the GetExpectedKernel method; I will check that later.
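For concreteness, here is a minimal, self-contained sketch of the two semantics the question distinguishes: requiring both the global flag and the per-OP attribute, versus honoring the global flag alone. This is not Paddle's actual API; the flag and function names are illustrative only.

#include <iostream>

enum class LibraryType { kPlain, kMKLDNN };

// Hypothetical flags, for illustration only.
struct Flags {
  bool global_use_mkldnn;  // process-wide flag, as in Paddle v2
  bool op_use_mkldnn;      // per-OP attribute, as proposed for fluid
};

// Semantics A: MKLDNN is chosen only when both flags are true.
LibraryType ExpectedLibraryBoth(const Flags& f) {
  return (f.global_use_mkldnn && f.op_use_mkldnn) ? LibraryType::kMKLDNN
                                                  : LibraryType::kPlain;
}

// Semantics B: the global flag alone decides.
LibraryType ExpectedLibraryGlobalOnly(const Flags& f) {
  return f.global_use_mkldnn ? LibraryType::kMKLDNN : LibraryType::kPlain;
}

int main() {
  Flags f{true, false};  // global on, per-OP off: the two semantics disagree
  std::cout << (ExpectedLibraryBoth(f) == LibraryType::kMKLDNN) << " "
            << (ExpectedLibraryGlobalOnly(f) == LibraryType::kMKLDNN) << "\n";
}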

@luotao1 luotao1 added the Intel label Feb 9, 2018
@jacquesqiao
Member

global flag

We will have a priority mechanism for kernel selection. For example, the default priority could be:

std::vector<std::tuple<platform::Place, LibraryType>> kKernelPriority = {
    std::make_tuple(platform::CUDAPlace(0), LibraryType::kCUDNN),
    std::make_tuple(platform::CUDAPlace(0), LibraryType::kPlain),
    std::make_tuple(platform::CPUPlace(), LibraryType::kMKLDNN),
    std::make_tuple(platform::CPUPlace(), LibraryType::kPlain),
};

and the global use_mkldnn flag will give MKLDNN a higher priority; if the user sets it, the priority can be:

std::vector<std::tuple<platform::Place, LibraryType>> kKernelPriority = {
    std::make_tuple(platform::CPUPlace(), LibraryType::kMKLDNN),
    std::make_tuple(platform::CUDAPlace(0), LibraryType::kCUDNN),
    std::make_tuple(platform::CUDAPlace(0), LibraryType::kPlain),
    std::make_tuple(platform::CPUPlace(), LibraryType::kPlain),
};

Then kernel selection will choose the MKLDNN kernel first if there is one; if there is not, it will fall back to a kernel with lower priority.

flag in op

The flag in the op attribute is used to force kernel selection to find a kernel of a certain type; if it cannot find one, the framework will throw an exception.
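For concreteness, here is a self-contained sketch of the two mechanisms described above: the priority-ordered fallback, and a per-OP flag that forces a library and throws when no matching kernel is registered. This is not the actual framework code; the types and names are illustrative only.

#include <iostream>
#include <set>
#include <stdexcept>
#include <utility>
#include <vector>

enum class Place { kCUDA, kCPU };
enum class LibraryType { kCUDNN, kMKLDNN, kPlain };

using KernelKey = std::pair<Place, LibraryType>;

// registered: kernels that exist for this OP.
// priority:   the kKernelPriority-style ordering.
// forced:     library requested via the OP attribute, or nullptr if unset.
LibraryType SelectKernel(const std::set<KernelKey>& registered,
                         const std::vector<KernelKey>& priority,
                         const LibraryType* forced) {
  if (forced != nullptr) {
    for (const auto& key : registered) {
      if (key.second == *forced) return *forced;  // forced library is available
    }
    // The OP attribute forces a certain type; fail loudly if it is missing.
    throw std::runtime_error("no kernel registered for the forced library");
  }
  // Fallback: walk the priority list and take the first registered kernel.
  for (const auto& key : priority) {
    if (registered.count(key) > 0) return key.second;
  }
  throw std::runtime_error("no suitable kernel found");
}

int main() {
  std::set<KernelKey> registered = {{Place::kCPU, LibraryType::kPlain},
                                    {Place::kCPU, LibraryType::kMKLDNN}};
  std::vector<KernelKey> priority = {{Place::kCPU, LibraryType::kMKLDNN},
                                     {Place::kCUDA, LibraryType::kCUDNN},
                                     {Place::kCUDA, LibraryType::kPlain},
                                     {Place::kCPU, LibraryType::kPlain}};
  // No forced library: the fallback picks the CPU MKLDNN kernel here.
  std::cout << (SelectKernel(registered, priority, nullptr) ==
                LibraryType::kMKLDNN)
            << "\n";
}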

@pzelazko-intel
Contributor

I see that the kKernelPriority table is not going to be used until all transformations are implemented: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/framework/operator.cc#L516

As kKernelPriority is not honored at the moment, I am not implementing a global use_mkldnn flag, only an OP-level use_mkldnn flag, set to True by default.

How will the kKernelPriority and OP-flag mechanisms interact? What if, for example, kKernelPriority ranks the kPlain library higher, but GetExpectedKernelType returns the kMKLDNN library?

Is the global use_gpu flag used in fluid? It seems unnecessary, as we choose the place at the script level.

@jacquesqiao
Member

As kKernelPriority is not honored at the moment, I am not implementing a global use_mkldnn flag, only an OP-level use_mkldnn flag, set to True by default.

OK

How will the kKernelPriority and OP-flag mechanisms interact? What if, for example, kKernelPriority ranks the kPlain library higher, but GetExpectedKernelType returns the kMKLDNN library?

kKernelPriority will be a fallback mechanism: the framework will first check the op flag; if that does not decide the kernel, it will use the priority list to choose a proper one.

Is the global use_gpu flag used in fluid? It seems unnecessary, as we choose the place at the script level.

use_gpu is not used in current fluid; in fact, the place is currently created by the user on the Python side and passed to the executor, but this will change in the future.

@luotao1
Contributor Author

luotao1 commented Feb 27, 2018

but only an OP-level use_mkldnn flag, set to True by default.

We'd better not set the default value to True for now. We should first align the accuracy of the MKLDNN ops on the PaddlePaddle Book models in Fluid (see https://github.com/dzhwinter/benchmark).

@pzelazko-intel
Contributor

@luotao1 To align the accuracy of the MKLDNN ops, should we run the scripts from https://github.com/dzhwinter/benchmark/tree/master/fluid and check that accuracy does not decrease for MKLDNN kernels? I also don't see any actual results there... Is someone responsible for running the benchmarks, or do developers run them themselves?

@dzhwinter
Contributor

@pzelazko-intel Aligning the MKLDNN operators can be divided into two parts.

  1. Single operator/kernel implementation.
    If you want to validate the accuracy of a single op/kernel implementation, we have a small operator test framework to do the job. Take mul_op for example:

https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/unittests/test_mul_op.py#L33

import numpy as np
from op_test import OpTest


class TestMulOp(OpTest):
    def setUp(self):
        self.op_type = "mul"
        # Random inputs; the reference output is computed with numpy.
        self.inputs = {
            'X': np.random.random((32, 84)).astype("float32"),
            'Y': np.random.random((84, 100)).astype("float32")
        }
        self.outputs = {'Out': np.dot(self.inputs['X'], self.inputs['Y'])}

    def test_check_output(self):
        # Forward check: compare the kernel output with the numpy reference.
        self.check_output()

    def test_check_grad_normal(self):
        # Backward check: gradient checking with a relative-error tolerance.
        self.check_grad(['X', 'Y'], 'Out', max_relative_error=0.5)

In the forward pass, we compare the result with numpy.
In the backward pass, we use gradient checking to validate the result; max_relative_error is the maximum tolerable relative error of the gradient.
http://ufldl.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization
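As a rough sketch of what the gradient check verifies (following the UFLDL note above; the exact normalization used by OpTest's check_grad may differ), the numerically estimated gradient is compared against the analytic one under a relative-error tolerance:

\frac{\partial L}{\partial x_i} \approx \frac{L(x_i + \epsilon) - L(x_i - \epsilon)}{2\epsilon},
\qquad
\frac{\left| g_i^{\text{analytic}} - g_i^{\text{numeric}} \right|}{\max\left( \left| g_i^{\text{analytic}} \right|, \left| g_i^{\text{numeric}} \right| \right)} \le \text{max\_relative\_error}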

  2. Composed operators as a model.
    We need to check the convergence rate of the model: given the same randomness, see whether the two models converge to the same point under critical conditions.

@dzhwinter
Contributor

For the benchmark models you mentioned, sorry for the inconvenience.
Recently we have been focused on optimizing kernel implementations and saving memory in our framework, so the markdown has not been updated in time. We plan to release the benchmark results at the end of this week.

The benchmark results are organized as below.

  1. Model convergence rate comparison.
    We provide the training curves in a VisualDL graph: https://github.com/PaddlePaddle/Paddle/VisualDL
    Then we can check the accuracy/loss of every minibatch during the training process. After release, these results will be shown on the paddlepaddle.org website.

  2. Model speed comparison.
    We compare the training speed at different batch sizes to measure the framework's performance in different situations.
    https://docs.google.com/spreadsheets/d/12enqFEnw3lJrovdVS3By641zonqySl5bUR3bp1gKlOQ/edit#gid=2045395118
    This spreadsheet will be updated in the benchmark repo.
    @pzelazko-intel

Here are the release notes; we can track the release process at #8533.

@pzelazko-intel
Contributor

Thank you, @dzhwinter
I actually have unit tests for single kernels:
https://github.com/PaddlePaddle/Paddle/pull/8451/files#diff-d3ce28c8e4cf715ded3772881fc24aa7R224 - files test_conv2d_op.py and test_pool2d_op.py

Also, @jczaja ran tests on the MNIST dataset with the MKLDNN kernels, and they converged to the values obtained with the Caffe framework.

@luotao1 I can set the use_mkldnn flag to False by default for now and change it in the future when the MKLDNN kernels are more mature. What's your opinion?
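For illustration, here is roughly how such a per-OP attribute with a False default might be declared in an operator's proto maker (a sketch following the use_cudnn pattern in Paddle at the time, not a verbatim excerpt):

// Inside the operator's OpProtoAndCheckerMaker (illustrative sketch):
AddAttr<bool>("use_mkldnn",
              "(bool, default false) Use the MKLDNN kernel for this op.")
    .SetDefault(false);  // flip the default to true once the kernels mature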

@luotao1
Contributor Author

luotao1 commented Mar 16, 2018

@pzelazko-intel Sorry for the late reply.
Yes, you can change the use_mkldnn flag to True in the future, when the MKLDNN kernels are more mature.
