Upgrade MKL-DNN commit #15116
…convolutions; test=develop
http://ci.paddlepaddle.org/downloadBuildLog.html?buildId=43515&plain=true
[03:43:23]W: [Step 1/1] /paddle/build/third_party/ngraph/src/extern_ngraph/src/ngraph/runtime/cpu/mkldnn_emitter.cpp:810:51: error: expected type-specifier
@baojun-nervana We want to upgrade the MKL-DNN commit, which requires changing the relu primitive to elementwise. Could you please update the code accordingly and include this PR? Thanks. |
@luotao1 Will try to reproduce locally first. |
Confirmed with the MKL-DNN team: MKL-DNN breaks compilation because of the new packed BLAS API, which is not available in earlier versions of Intel MKL. So we have to upgrade MKLML to: https://github.com/intel/mkl-dnn/releases/download/v0.17.2/mklml_lnx_2019.0.1.20181227.tgz Is it possible to upgrade MKLML first? @luotao1 |
@hshen14 I will upgrade MKLML first. |
PR_Windows_CI fails:
[01:05:12] : [Step 2/5] C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\Microsoft.CppCommon.targets(171,5): error MSB6006: "cmd.exe" exited with code 1. [D:\home\BuildAgent\work\a9b0372f0aea0a80\build\python\paddle_python.vcxproj]
Another GPU failure?
local_stderr: b'W0109 01:57:43.922690 30317 init.cc:121] AVX is available, Please re-compile on local machine\nW0109 01:57:49.484200 30317 device_context.cc:257] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.0, Runtime API Version: 8.0\nW0109 01:57:49.484375 30317 device_context.cc:265] device: 0, cuDNN Version: 7.0.\n\n\n\n\n\n\nTraceback (most recent call last):\n File "dist_se_resnext.py", line 258, in \n runtime_main(DistSeResneXt2x2)\n File "/paddle/build/python/paddle/fluid/tests/unittests/test_dist_base.py", line 204, in runtime_main\n model.run_trainer(args)\n File "/paddle/build/python/paddle/fluid/tests/unittests/test_dist_base.py", line 164, in run_trainer\n feed=feeder.feed(get_data()))\n File "/paddle/build/python/paddle/fluid/tests/unittests/test_dist_base.py", line 151, in get_data\n origin_batch = next(reader_generator)\n File "/paddle/build/python/paddle/batch.py", line 35, in batch_reader\n for instance in r:\n File "/paddle/build/python/paddle/reader/decorator.py", line 52, in reader\n for e in map(func, *rs):\n File "/paddle/build/python/paddle/dataset/flowers.py", line 130, in reader\n data = batch['data']\nKeyError: 'data'\n' |
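For anyone trying to read an escaped local_stderr dump like the one above, here is a minimal sketch of turning a printed bytes literal back into a readable traceback. The helper names and the shortened sample are hypothetical, and it assumes the literal still has its quote escapes intact (a copy pasted from a web page may have lost them and would need re-escaping first).

```python
import ast


def decode_stderr(raw: str) -> str:
    """Turn a printed bytes literal (b'...') back into readable text."""
    # literal_eval only parses Python literals; it does not execute code.
    value = ast.literal_eval(raw)
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return str(value)


def last_error_line(text: str) -> str:
    """Return the last non-empty line, which in a traceback names the exception."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


# Hypothetical, shortened sample in the same shape as the CI log.
sample = r"""b'W0109 init.cc:121] AVX is available\nTraceback (most recent call last):\n  File "flowers.py", line 130, in reader\n    data = batch[\'data\']\nKeyError: \'data\'\n'"""

text = decode_stderr(sample)
print(text)
print("Failing exception:", last_error_line(text))
```

Here the last line points at the dataset reader (a missing 'data' key), which suggests this particular failure is unrelated to the MKL-DNN change.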
The test results seem to fail with different symptoms after several re-runs. Is there any way to make the infrastructure more stable? @luotao1 |
Need help from @yinghu5 |
Yes, the MKL-DNN team decided to move the destination folder from lib to bin/, in line with how binary libraries are generally distributed. The change applies to current master and later versions (after v0.17.2).
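Since the library folder differs between versions, a build or CI script may want to check both locations. A minimal sketch, assuming a hypothetical install root and using mklml.dll only as an example file name; none of these names come from the Paddle build scripts:

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_library(root: str, name: str,
                 subdirs: Iterable[str] = ("bin", "lib")) -> Optional[Path]:
    """Return the first existing <root>/<subdir>/<name>, or None if absent."""
    for sub in subdirs:
        candidate = Path(root) / sub / name
        if candidate.is_file():
            return candidate
    return None


# Demo with a throwaway directory that mimics the newer layout (bin/).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "bin").mkdir()
    (Path(tmp) / "bin" / "mklml.dll").write_bytes(b"")
    found = find_library(tmp, "mklml.dll")
    found_subdir = found.parent.name if found else None
    missing = find_library(tmp, "does_not_exist.dll")

print(found_subdir, missing)
```

Checking bin/ before lib/ prefers the newer layout while still working with older packages.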
The Windows test seems to fail: [08:30:43]W: [Step 4/5] Errors while running CTest |
@wopeizl adds the MKL-DNN bin/ directory to the CI configure path and reruns the Windows CI. |
It seems that MKL has a random diff. |
The earliest MKL diff was on Dec 15th. @yinghu5 @bingyanghuang @yihuaxu
|
@hshen14 I reran PR_CI, and the new error is:
|
http://ci.paddlepaddle.org/viewType.html?buildTypeId=Paddle_PrCi&branch_Paddle=15116&tab=buildTypeStatusDiv |
@luotao1 One Windows failure, rerunning now. @panyx0718 We need your approval for the cmake external change introduced by the MKL-DNN change http://ci.paddlepaddle.org/viewLog.html?buildId=55852&buildTypeId=Paddle_PrCi&tab=buildLog |
@luotao1 @kbinias @jianhang-liu @yinghu5 After discussion, we agreed on a two-stage MKL-DNN upgrade. @kbinias's team is working on validation with MKL-DNN v0.18rc and will prepare a PR after successful validation. I will keep this PR open for reference and close it when the new PR with v0.18rc is submitted. |
The new MKL-DNN upgrade is in progress in #15861. Closing this intermediate PR. |
The commit includes the optimizations for:
test=develop