This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[BUGFIX] Ubuntu 20.04: support for in-distro Intel MKL libraries #19766

Merged: 2 commits, merged Feb 2, 2021


@akarbown (Contributor) commented Jan 19, 2021

Description

Since Ubuntu 20.04, Intel MKL packages are available in the distribution itself. Finding MKL can therefore be simplified by using CMake's standard FindBLAS module.

Changes

  • add support for the intel-mkl package shipped with the distribution;
  • use the CMake FindBLAS module to look for intel-mkl;
  • add the upstream version of FindBLAS.cmake (for its Intel10_64_dyn vendor
    and --start-group/--end-group linker option support);
  • remove FindMKL.cmake;
  • preserve Ubuntu 18.04 images for the TensorRT pipeline;
  • preserve support for linking MKL libraries on OSes other than Ubuntu 20.04
    (without the FindMKL.cmake file).

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Comments

Reconsidering the MKL_USE_SINGLE_DYNAMIC_LIBRARY option as a potential solution for 18255. This needs more investigation.

@mxnet-bot

Hey @akarbown, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, centos-cpu, unix-gpu, centos-gpu, windows-cpu, unix-cpu, miscellaneous, website, sanity, windows-gpu, edge]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@lanking520 added the pr-work-in-progress label (PR is still work in progress) Jan 19, 2021
@akarbown (Contributor, Author)

@mxnet-bot run ci [unix-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

@leezu (Contributor) commented Jan 22, 2021

@akarbown, the CI fails because the TensorRT GPU build uses Ubuntu 18.04, as TensorRT is not available for 20.04 yet (see https://gitlab.com/nvidia/container-images/cuda/-/issues/99).

But intel-mkl is not available for 18.04, so you may need to add a bash if/else statement that installs intel-mkl if the base image is 20.04 and relies on the apt repo otherwise.

[2021-01-21T18:40:17.268Z] E: Unable to locate package intel-mkl

[2021-01-21T18:40:18.193Z] Service 'ubuntu_tensorrt_cu111' failed to build: The command '/bin/sh -c export DEBIAN_FRONTEND=noninteractive &&     apt-get update &&     apt-get install -y wget software-properties-common &&     apt-get update &&     apt-get install -y         curl         unzip         pandoc         build-essential         ninja-build         git         protobuf-compiler         libprotobuf-dev         default-jdk         clang-6.0         python-yaml         clang-10         clang-tidy-10         g++         g++-7         g++-8         intel-mkl         libomp-dev         libgomp1         libturbojpeg0-dev         libcurl4-openssl-dev         libatlas-base-dev         libzmq3-dev         libopencv-dev         libxml2-dev         numactl         libnuma-dev         python3         python3-pip         doxygen         pandoc         autoconf         gperf         libb2-dev         libzstd-dev         gfortran &&     rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100

Another option is to wait until NVIDIA makes TensorRT available for 20.04.
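The if/else suggested above could look roughly like the following bash sketch. It assumes the release can be read from VERSION_ID in /etc/os-release inside the base image, and that releases newer than 20.04 also package intel-mkl. The function names (ubuntu_version, mkl_source, install_mkl) are hypothetical, not taken from the MXNet CI scripts, and the Intel apt-repo branch is only a placeholder because its setup steps are not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the actual MXNet CI script: pick where intel-mkl
# comes from based on the Ubuntu release of the Docker base image.
set -euo pipefail

# Print VERSION_ID (e.g. "18.04" or "20.04") from an os-release file.
# Defaults to /etc/os-release; another path can be passed for testing.
ubuntu_version() {
    local os_release="${1:-/etc/os-release}"
    # os-release is a shell-compatible KEY=value file, so source it in a
    # subshell to avoid leaking its variables into the caller.
    ( . "$os_release" && echo "${VERSION_ID:-}" )
}

# Echo "distro" when the in-distro intel-mkl package can be used (20.04, and
# ASSUMED to hold for later releases too), otherwise "intel-repo".
mkl_source() {
    local major="${1%%.*}"
    if [[ "$major" =~ ^[0-9]+$ ]] && (( 10#$major >= 20 )); then
        echo "distro"
    else
        echo "intel-repo"
    fi
}

install_mkl() {
    case "$(mkl_source "$(ubuntu_version)")" in
        distro)
            apt-get update
            DEBIAN_FRONTEND=noninteractive apt-get install -y intel-mkl
            ;;
        *)
            # Placeholder: keep the existing Intel apt-repo installation
            # steps here (they are not reproduced in this thread).
            echo "intel-mkl is not packaged for this release; use the apt repo path" >&2
            return 1
            ;;
    esac
}
```

In a Dockerfile RUN step, the script could be sourced and install_mkl called in place of the unconditional intel-mkl entry in the apt-get install list, so the 18.04-based TensorRT image would no longer fail with "Unable to locate package intel-mkl".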

@akarbown (Contributor, Author)

I thought it would be possible to remove the apt repo.

OK, thanks!
I was just trying to test this:

@@ -95,7 +95,7 @@ services:
       dockerfile: Dockerfile.build.ubuntu
       target: gpu
       args:
-        BASE_IMAGE: nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
+        BASE_IMAGE: nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04
       cache_from:
         - ${DOCKER_CACHE_REGISTRY}/build.ubuntu_tensorrt_cu111:latest
   ubuntu_gpu_cu111:

but it seems there is no point in checking it, is there?

@leezu (Contributor) commented Jan 22, 2021

Yes, it won't work until https://gitlab.com/nvidia/container-images/cuda/-/issues/99 is resolved.

cc @TristonC, as the missing TensorRT for Ubuntu 20.04 is causing issues in this PR.

@akarbown force-pushed the akarbown-ubuntu_20_04 branch 3 times, most recently from 402a2b1 to a182d89, January 27, 2021 22:38
@akarbown (Contributor, Author)

@mxnet-bot run ci [sanity]

@mxnet-bot

Jenkins CI successfully triggered : [sanity]

@akarbown (Contributor, Author)

@mxnet-bot run ci [unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu]

@akarbown (Contributor, Author)

@mxnet-bot run ci [unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu]

@akarbown (Contributor, Author)

@mxnet-bot run ci [miscellaneous]

@mxnet-bot

Jenkins CI successfully triggered : [miscellaneous]

@leezu (Contributor) left a comment

LGTM, thank you!

@akarbown changed the title from [WIP] Ubuntu 20.04: support for in-distro Intel MKL libraries to [FEATURE] Ubuntu 20.04: support for in-distro Intel MKL libraries Feb 1, 2021
@akarbown changed the title from [FEATURE] Ubuntu 20.04: support for in-distro Intel MKL libraries to [BUGFIX] Ubuntu 20.04: support for in-distro Intel MKL libraries Feb 1, 2021
@lanking520 removed the pr-work-in-progress label (PR is still work in progress) Feb 1, 2021
@lanking520 added the pr-awaiting-testing (PR is reviewed and waiting CI build and test) and pr-work-in-progress (PR is still work in progress) labels and removed the pr-awaiting-testing label Feb 1, 2021
@akarbown (Contributor, Author) commented Feb 2, 2021

@mxnet-bot run ci [centos-gpu, unix-cpu, unix-gpu, website, windows-gpu, windows-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-cpu, unix-cpu, windows-gpu, centos-gpu, website, unix-gpu]

@lanking520 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Feb 2, 2021
Ubuntu 20.04:
  - add support for the intel-mkl package shipped with the distribution;
  - use the CMake FindBLAS module to look for intel-mkl;
  - add the upstream version of FindBLAS.cmake (for its Intel10_64_dyn vendor
    and --start-group/--end-group linker option support);
  - remove FindMKL.cmake;
  - preserve Ubuntu 18.04 images for the TensorRT pipeline;
  - preserve support for linking MKL libraries on OSes other than Ubuntu 20.04
    (without the FindMKL.cmake file).
@akarbown changed the title from [BUGFIX] Ubuntu 20.04: support for in-distro Intel MKL libraries to [WIP] Ubuntu 20.04: support for in-distro Intel MKL libraries Feb 2, 2021
@leezu (Contributor) commented Feb 2, 2021

@akarbown, as you marked this WIP, can you clarify what more you would like to include?

@akarbown (Contributor, Author) commented Feb 2, 2021

I just wanted to make sure that, after I squashed the commits into one, all the tests pass without issues.

@leezu (Contributor) commented Feb 2, 2021

Thank you. Actually, we can squash the commits automatically during the merge. I think the tests passed previously, but now there seems to be an issue with np.linspace. I think it's due to the new NumPy release; to work around the issue, you can pin numpy<1.20.0.
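A minimal bash sketch of that workaround, assuming NumPy is installed with pip for python3 in the CI image. The helper names version_lt and pin_numpy are hypothetical, and the version check relies on GNU coreutils sort (-V for version ordering, -C for a quiet order check).

```shell
#!/usr/bin/env bash
# Sketch of a temporary CI workaround (hypothetical helpers): pin NumPy below
# 1.20.0 and fail loudly if a newer version still ends up installed.
set -euo pipefail

# Succeed if version $1 is strictly lower than version $2.
# GNU sort -V -C exits 0 only when its input is already in version order.
version_lt() {
    [[ "$1" != "$2" ]] && printf '%s\n%s\n' "$1" "$2" | sort -V -C
}

pin_numpy() {
    python3 -m pip install 'numpy<1.20.0'
    local installed
    installed="$(python3 -c 'import numpy; print(numpy.__version__)')"
    if ! version_lt "$installed" "1.20.0"; then
        echo "numpy ${installed} is not below 1.20.0" >&2
        return 1
    fi
    echo "numpy ${installed} pinned below 1.20.0"
}
```

The explicit check after the install is optional; it simply makes a failed pin visible in the CI log rather than surfacing later as an np.linspace test failure.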

@akarbown changed the title from [WIP] Ubuntu 20.04: support for in-distro Intel MKL libraries to [BUGFIX] Ubuntu 20.04: support for in-distro Intel MKL libraries Feb 2, 2021
@leezu (Contributor) commented Feb 2, 2021

@mxnet-bot run ci [unix-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu]

@lanking520 added the pr-awaiting-testing and pr-awaiting-review (PR is waiting for code review) labels and removed the pr-work-in-progress and pr-awaiting-testing labels Feb 2, 2021
@leezu merged commit f9d90c9 into apache:master Feb 2, 2021
@access2rohit mentioned this pull request Feb 17, 2021
Labels: pr-awaiting-review (PR is waiting for code review)

4 participants