Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Upgrade 3rdparty/openmp to release_90 version #17012

Merged
merged 1 commit into from Dec 9, 2019

Conversation

leezu
Copy link
Contributor

@leezu leezu commented Dec 8, 2019

Description

Fixes #10856

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Upgrade 3rdparty/openmp to release_90 version

Comments

See #10856 for a discussion on the symptom. It seems it's due to a bug in llvm openmp fixed in the meantime.

The previously failing assertion is still part of the release_90 version. It is at https://github.com/llvm-mirror/openmp/blob/release_90/runtime/src/kmp_runtime.cpp#L6616
But the assertion does not fail anymore (based on my test with cmake -DUSE_CUDA=0 -DUSE_MKLDNN=0 -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DBUILD_CYTHON_MODULES=1 -DUSE_OPENCV=1 -DCMAKE_BUILD_TYPE=Debug .., which leads to assertion failure without this PR: #10856 (comment))

You can compare the lines in blame mode and conclude they are the same

https://github.com/llvm-mirror/openmp/blame/37c72127e90360a020f351f18d9cccfc30e5145a/runtime/src/kmp_runtime.cpp#L6481
https://github.com/llvm-mirror/openmp/blame/release_90/runtime/src/kmp_runtime.cpp#L6616

Copy link
Member

@cjolivier01 cjolivier01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im fine with this after we can resolve that there’s no bug in mxnet which is causing the assert which led to this PR.
Also would be interesting to note if the assert is still there in this version and if not, why was it removed (ie git blame PR)?

@leezu
Copy link
Contributor Author

leezu commented Dec 9, 2019

Please see the last line of the PR description above. Given that the assert is still there and it is not failing anymore I think we can establish that there is no bug in MXNet. Thus I can't follow your reasoning that there's a bug in MXNet.
I think it's more likely that there was a bug in the 2 year old openmp version.

Then please rescind your veto or give some evidence that there is a bug. Currently I consider this to be a veto without technical justification.

Thanks!

@leezu
Copy link
Contributor Author

leezu commented Dec 9, 2019

Force push to retrigger CI. Failed with Github API error for 5/11 jobs (other 6 passed):

Pull request #17012 opened

Connecting to https://api.github.com using anirudh2290/****** (anirudh2290/****** (Anirudh Subramanians personal access token to checkout the public MXNet repository. https://github.com/settings/tokens/324757639))

ERROR: Error while retrieving pull request 17012 merge hash : org.kohsuke.github.HttpException: Server returned HTTP response code: -1, message: 'null' for URL: https://api.github.com/repos/apache/incubator-mxnet/commits/83a782e27e6a6b1f9d093a374ef042ba1994ea79

Finished: FAILURE

CC @marcoabreu

@leezu
Copy link
Contributor Author

leezu commented Dec 9, 2019

@leezu leezu merged commit e40cceb into apache:master Dec 9, 2019
@leezu leezu deleted the fix3rdpartyopenmp branch December 9, 2019 08:19
@cjolivier01
Copy link
Member

cjolivier01 commented Dec 9, 2019 via email

@leezu
Copy link
Contributor Author

leezu commented Dec 9, 2019

Sorry, that wasn't my intention. I just wanted to reference your approval on the mailinglist.

leezu added a commit to leezu/mxnet that referenced this pull request Dec 17, 2019
OpenMP offloading was introduced some time during the past two years and is
enabled by default. With upgrading 3rdparty/openmp in
apache#17012
it was made part of the MXNet CMake build.

But we don't use OpenMP offloading and the Cuda target in the llvm OpenMP
Offloading build is broken in our setting.
leezu added a commit to leezu/mxnet that referenced this pull request Dec 17, 2019
OpenMP offloading was introduced some time during the past two years and is
enabled by default. With upgrading 3rdparty/openmp in
apache#17012
it was made part of the MXNet CMake build.

But we don't use OpenMP offloading and the Cuda target in the llvm OpenMP
Offloading build is broken in our setting.
leezu added a commit to leezu/mxnet that referenced this pull request Dec 18, 2019
OpenMP offloading was introduced some time during the past two years and is
enabled by default. With upgrading 3rdparty/openmp in
apache#17012
it was made part of the MXNet CMake build.

But we don't use OpenMP offloading and the Cuda target in the llvm OpenMP
Offloading build is broken in our setting.
szha pushed a commit that referenced this pull request Dec 23, 2019
* Disable OpenMP offloading support for 3rdparty/openmp

OpenMP offloading was introduced some time during the past two years and is
enabled by default. With upgrading 3rdparty/openmp in
#17012
it was made part of the MXNet CMake build.

But we don't use OpenMP offloading and the Cuda target in the llvm OpenMP
Offloading build is broken in our setting.

* Update CMake on CI
leezu added a commit to leezu/mxnet that referenced this pull request Dec 30, 2019
leezu added a commit to leezu/mxnet that referenced this pull request Dec 30, 2019
* Disable OpenMP offloading support for 3rdparty/openmp

OpenMP offloading was introduced some time during the past two years and is
enabled by default. With upgrading 3rdparty/openmp in
apache#17012
it was made part of the MXNet CMake build.

But we don't use OpenMP offloading and the Cuda target in the llvm OpenMP
Offloading build is broken in our setting.

* Update CMake on CI
leezu added a commit that referenced this pull request Dec 30, 2019
* Upgrade 3rdparty/openmp to release_90 version (#17012)

Fixes #10856

* Fix omp assert issue (#17039)

* Disable OpenMP offloading support for 3rdparty/openmp (#17098)

* Disable OpenMP offloading support for 3rdparty/openmp

OpenMP offloading was introduced some time during the past two years and is
enabled by default. With upgrading 3rdparty/openmp in
#17012
it was made part of the MXNet CMake build.

But we don't use OpenMP offloading and the Cuda target in the llvm OpenMP
Offloading build is broken in our setting.

* Update CMake on CI

* Fix reshape interoperability test (#17155)

* fix reshape interoperability test

* fix for scipy import

Co-authored-by: Chris Olivier <cjolivier01@gmail.com>
Co-authored-by: Hao Jin <hjjn.amzn@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
3 participants