
Introducing MKL to softmax for inference #14437

Merged
merged 3 commits into PaddlePaddle:develop from jczaja:prv-softmax-mkl on Nov 20, 2018

@jczaja (Contributor) commented Nov 15, 2018

This PR introduces MKL-based execution of the softmax operator.

Profiling of the CAPI DAM test shows a ~2x improvement in softmax op execution time with this optimization.
Num threads: 1
Batch sizes: 1, 8, 32, 128
Platform: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz

Notes:

  • The optimization is enabled when Paddle is configured with the ON_INFER=ON flag.
  • To unit test it, build with ON_INFER=ON and run test_softmax_op.py.

@jczaja jczaja requested a review from luotao1 Nov 15, 2018

@jczaja jczaja force-pushed the jczaja:prv-softmax-mkl branch from 3a04fb6 to 616965e Nov 15, 2018

@jczaja jczaja requested review from tpatejko and removed request for tpatejko Nov 15, 2018

@jczaja jczaja force-pushed the jczaja:prv-softmax-mkl branch from 616965e to e64ad52 Nov 15, 2018

@luotao1 luotao1 added the Intel label Nov 16, 2018

Squashing MKL based softmax for inference
test=develop

- Added profiling to softmax functors

- MKL based softmax inference op

- Fix to softmax computation via MKL

- cleaning

- Cosmetic fixes to softmax MKL

- Fix to ON_INFER lack of propagation

@jczaja jczaja force-pushed the jczaja:prv-softmax-mkl branch from e64ad52 to 513bb6c Nov 16, 2018

- Fix to GPU
test=develop

@jczaja jczaja force-pushed the jczaja:prv-softmax-mkl branch from 3a5a17b to be80bb4 Nov 16, 2018

@jczaja jczaja requested a review from tpatejko Nov 18, 2018

#ifdef PADDLE_ON_INFERENCE
math::SoftmaxFunctor<
DeviceContext, T,
std::is_same<DeviceContext, platform::CPUDeviceContext>::value>()(
context.template device_context<DeviceContext>(), &X_2d, &Out_2d);

@tpatejko (Contributor), Nov 19, 2018

Why is this std::is_same needed? The result of this expression sets the is_test template parameter, and the expression checks whether DeviceContext is of type CPUDeviceContext. The same check is done by std::enable_if in the preceding code, so one of them seems to be redundant.

@jczaja (Author, Contributor), Nov 19, 2018

You are correct; that condition is left over from template experiments. Let me remove it in my next softmax-for-inference PR.

@@ -19,7 +19,8 @@ namespace paddle {
 namespace operators {
 namespace math {
 
-template <typename DeviceContext, typename T, bool is_test>
+template <typename DeviceContext, typename T, bool is_test,
@tpatejko (Contributor), Nov 19, 2018

A suggestion just for future reference; you can safely ignore it.

Using the boolean is_test flag makes the code a bit difficult to follow, because the class specializations carry the constant values true or false.

Maybe in the future you could experiment with a property type for inference, defined along these lines:
struct is_test {}; and use it wherever an inference-specific type is needed.
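
For illustration, a minimal sketch of the tag-type approach described above, using hypothetical names rather than code from this PR:

// Minimal sketch of tag dispatch: the inference path is selected by a
// property type instead of a bool constant. Names are hypothetical.
#include <iostream>

struct is_test {};      // property type marking the inference path, as suggested above
struct is_training {};  // hypothetical counterpart for the training path

// Primary template: training path.
template <typename T, typename Mode = is_training>
struct SoftmaxFunctorSketch {
  void operator()() const { std::cout << "training path\n"; }
};

// Specialization selected by the tag type; no true/false constants needed.
template <typename T>
struct SoftmaxFunctorSketch<T, is_test> {
  void operator()() const { std::cout << "inference path\n"; }
};

int main() {
  SoftmaxFunctorSketch<float>()();           // training path
  SoftmaxFunctorSketch<float, is_test>()();  // inference path
  return 0;
}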

@jczaja (Author, Contributor), Nov 19, 2018

Interesting suggestion, I will look into it more closely in my next PR related to softmax for inference.

#TODO(luotao), combine this warning with `make inference_lib_dist` command.
message(WARNING "On inference mode, will take place some specific optimization. Turn on the ON_INFER flag when building inference_lib only.")
endif()

@tpatejko (Contributor), Nov 19, 2018

This code is partially present in the main CMakeLists.txt file:

Paddle/CMakeLists.txt

Lines 316 to 321 in f4c869d

if (ON_INFER)
message(STATUS "On inference mode, will take place some specific optimization.")
else()
#TODO(luotao), combine this warning with `make inference_lib_dist` command.
message(WARNING "On inference mode, will take place some specific optimization. Turn on the ON_INFER flag when building inference_lib only.")
endif()

Is it necessary to copy it just to set an additional define?

@jczaja (Author, Contributor), Nov 19, 2018

I do not quite understand. I moved the quoted section from the very bottom of the main CMakeLists.txt to a bit higher up in the same CMakeLists.txt. I cannot see any redundancy here.

@luotao1 (Contributor), Nov 19, 2018

@tpatejko This code was only moved, not copied.

@tpatejko (Contributor), Nov 19, 2018

My mistake, I hadn't noticed that.

@@ -19,7 +19,8 @@ namespace paddle {
 namespace operators {
 namespace math {
 
-template <typename DeviceContext, typename T, bool is_test>
+template <typename DeviceContext, typename T, bool is_test,
+          typename Enable = void>

@luotao1 (Contributor), Nov 19, 2018

Enable: Could you rename this parameter to make it clearer?

@jczaja (Author, Contributor), Nov 19, 2018

OK, I will change the name of the mentioned template parameter in the next PR. Do you have any suggestions for a clearer name?

@luotao1 (Contributor), Nov 20, 2018

How about renaming it to enable_if_CPU?
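
For illustration, a minimal sketch of what this defaulted parameter does and why a name like enable_if_CPU reads more clearly than Enable (hypothetical, simplified types; not the Paddle code):

#include <iostream>
#include <type_traits>

struct CPUDeviceContext {};
struct CUDADeviceContext {};

// Primary template: the last parameter exists only as a slot that
// specializations can constrain via std::enable_if.
template <typename DeviceContext, typename T, bool is_test,
          typename enable_if_CPU = void>
struct SoftmaxSketch {
  void operator()() const { std::cout << "generic path\n"; }
};

// Specialization chosen only when DeviceContext is CPUDeviceContext and
// is_test is true; for other device contexts enable_if has no ::type and
// the primary template is used instead (SFINAE).
template <typename DeviceContext, typename T>
struct SoftmaxSketch<
    DeviceContext, T, true,
    typename std::enable_if<
        std::is_same<DeviceContext, CPUDeviceContext>::value>::type> {
  void operator()() const { std::cout << "CPU inference path\n"; }
};

int main() {
  SoftmaxSketch<CPUDeviceContext, float, true>()();   // CPU inference path
  SoftmaxSketch<CUDADeviceContext, float, true>()();  // generic path
  return 0;
}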

const int batch_size = in_dims[kBatchDim];
const int num_classes = in_dims[kClassDim];
std::vector<float> entities(batch_size);
auto blas = math::GetBlas<DeviceContext, float>(context);

@luotao1 (Contributor), Nov 19, 2018

How about computing int xxx = n * num_classes outside of the for loop, so there is less to compute?

@jczaja (Author, Contributor), Nov 19, 2018

From my experience, the compiler can detect that such a computation is done many times on the same data and automatically hoist it out of the loop.
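
For illustration, a minimal sketch of the hoist suggested above, with hypothetical names; the loop is simplified compared to the actual operator code:

#include <vector>

// Sketch: the row offset n * num_classes is computed once per row instead of
// being re-evaluated in every iteration of the inner loop.
void RowMax(const float* in_data, int batch_size, int num_classes,
            std::vector<float>* entities) {
  for (int n = 0; n < batch_size; ++n) {
    const int row_offset = n * num_classes;  // hoisted out of the inner loop
    (*entities)[n] = in_data[row_offset];
    for (int c = 1; c < num_classes; ++c) {
      (*entities)[n] = in_data[row_offset + c] > (*entities)[n]
                           ? in_data[row_offset + c]
                           : (*entities)[n];
    }
  }
}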

for (int c = 1; c < num_classes; ++c) {
entities[n] = in_data[n * num_classes + c] > entities[n]
? in_data[n * num_classes + c]
: entities[n];

@luotao1 (Contributor), Nov 19, 2018

For lines 91-93, would the following code be faster?

if (in_data[n * num_classes + c] > entities[n]) entities[n] = in_data[n * num_classes + c];

Besides, lines 89-93 are used for finding the max of in_data[n * num_classes + c] (c=0->num_classes); could these lines use the cblas_i?amax function?

@jczaja (Author, Contributor), Nov 19, 2018

No, I do not think we can use that. cblas_i?amax searches for the element with the maximal absolute value, and at that point we have not yet exponentiated the array, so there may be both negative and positive values. We could therefore end up choosing a different maximum value than the original code does.
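
For illustration, a small sketch of the difference: cblas_i?amax returns the index of the largest absolute value, which is not the row maximum when negative entries are present (assumes a CBLAS implementation such as MKL or OpenBLAS is linked):

#include <algorithm>
#include <cstdio>

#include <cblas.h>

int main() {
  // Pre-exponentiation softmax inputs may be negative.
  const float x[4] = {-5.0f, 2.0f, 1.0f, -3.0f};
  const int idx_abs = static_cast<int>(cblas_isamax(4, x, 1));           // max |x|  -> index 0 (-5)
  const int idx_max = static_cast<int>(std::max_element(x, x + 4) - x);  // true max -> index 1 (2)
  std::printf("isamax picks x[%d] = %g, true max is x[%d] = %g\n",
              idx_abs, x[idx_abs], idx_max, x[idx_max]);
  return 0;
}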

entities[n] = out_data[n * num_classes];
for (int c = 1; c < num_classes; ++c) {
entities[n] += out_data[n * num_classes + c];
}

@luotao1 (Contributor), Nov 19, 2018

Lines 103-106 are used to sum out_data[n * num_classes + c] (c=0->num_classes); could these lines use cblas_?asum?

@jczaja (Author, Contributor), Nov 19, 2018

Yes, in that situation we could do that, but I'm working on the next part of the optimization, e.g. enhancing vectorization using OpenMP SIMD directives. I will check whether cblas_?asum is better than the OpenMP SIMD code and present the faster one in the next PR.
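
For illustration, a minimal sketch of the two alternatives being compared; since the values have already been exponentiated and are non-negative, the absolute sum computed by cblas_?asum equals the plain row sum (hypothetical helper names; assumes CBLAS and OpenMP are available):

#include <cblas.h>

// BLAS variant: sum of |x_i| over one row of num_classes elements.
float RowSumAsum(const float* row, int num_classes) {
  return cblas_sasum(num_classes, row, 1);
}

// OpenMP SIMD variant: explicitly vectorized reduction over the same row.
float RowSumSimd(const float* row, int num_classes) {
  float sum = 0.0f;
#pragma omp simd reduction(+ : sum)
  for (int c = 0; c < num_classes; ++c) {
    sum += row[c];
  }
  return sum;
}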

@luotao1 (Contributor) commented Nov 19, 2018

What do you think about #14437 (comment)? @jczaja

@luotao1 luotao1 merged commit 1b894e4 into PaddlePaddle:develop Nov 20, 2018

4 checks passed

PR_CI (Paddle): TeamCity build finished
PR_CI_python35 (Paddle): TeamCity build finished
continuous-integration/travis-ci/pr: The Travis CI build passed
license/cla: Contributor License Agreement is signed.

@luotao1 (Contributor) commented Nov 20, 2018

Looking forward to your next PR!
