Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Use CMake standard library to handle CUDA #17031

Merged
merged 1 commit into from Dec 30, 2019
Merged

Conversation

leezu
Copy link
Contributor

@leezu leezu commented Dec 10, 2019

Description

Use CMake standard library to handle CUDA.
Makes our code more maintainable and may fix bugs, because we rely on code tested by all CMake users, instead of the small subset of MXNet CMake users.

Include FindCUDAToolkit that replaces the deprecated FindCUDA module as of CMake 1.17.
https://gitlab.kitware.com/cmake/cmake/merge_requests/4093

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Refactor cuda handling in cmake build

Comments

@leezu leezu requested a review from szha as a code owner December 10, 2019 02:23
@leezu leezu force-pushed the cmakerefactorcuda branch 2 times, most recently from 5e35717 to 24e7209 Compare December 10, 2019 04:51
CMakeLists.txt Outdated Show resolved Hide resolved
@yajiedesign
Copy link
Contributor

Consider integration #16980 and update cmake to 3.15.5.

@leezu
Copy link
Contributor Author

leezu commented Dec 10, 2019

I'll integrate the changes of #16980 once it's merged. Or would you like me to do it directly?
I'm currently not sure why edge build is failing. So #16980 may be merged first.

@yajiedesign
Copy link
Contributor

Currently #16980 is waiting for ci to upgrade cmake. I don't know when the upgrade is complete.

@yajiedesign
Copy link
Contributor

i see apache/mxnet-ci#17 is merged.When can upgrade complete? I'm not sure about this process.

@leezu
Copy link
Contributor Author

leezu commented Dec 10, 2019

@yajiedesign I think the change will be picked up within 7 days. But @marcoabreu may provide more details (or can force the change to be picked up earlier).

The current PR is also waiting for apache/mxnet-ci#17 to be picked up.

@larroy
Copy link
Contributor

larroy commented Dec 11, 2019

i see apache/incubator-mxnet-ci#17 is merged.When can upgrade complete? I'm not sure about this process.

The change should be picked up as soon as you merge this from master in your branch.

@leezu
Copy link
Contributor Author

leezu commented Dec 12, 2019

i see apache/incubator-mxnet-ci#17 is merged.When can upgrade complete? I'm not sure about this process.

The change should be picked up as soon as you merge this from master in your branch.

What do you mean? It's for the Windows images used by the CI. When will the CI use the updated images?

@leezu leezu force-pushed the cmakerefactorcuda branch 4 times, most recently from f1c0dbe to 37c9680 Compare December 12, 2019 06:47
@leezu
Copy link
Contributor Author

leezu commented Dec 12, 2019

@marcoabreu can you share the date when mxnet-CI will use the updated Windows AMI?

@larroy
Copy link
Contributor

larroy commented Dec 12, 2019

i see apache/incubator-mxnet-ci#17 is merged.When can upgrade complete? I'm not sure about this process.

The change should be picked up as soon as you merge this from master in your branch.

What do you mean? It's for the Windows images used by the CI. When will the CI use the updated images?

Oh yes I got confused sorry. We would have to update the AMIs used by CI. This needs to be done by an Amazon engineer. Marco is not involved.

@leezu
Copy link
Contributor Author

leezu commented Dec 12, 2019

Ok, thank you. I saw Marco mentioned this is done automatically every 7 days (https://github.com/apache/incubator-mxnet-ci/blob/master/services/jenkins-slave-creation-windows/README.md). Is it wrong?
@larroy could you help to get the update done? Thank you

@leezu leezu force-pushed the cmakerefactorcuda branch 2 times, most recently from 19088f7 to 08017fe Compare December 12, 2019 15:19
@larroy
Copy link
Contributor

larroy commented Dec 12, 2019

I passed on your request to the engineering team.

@leezu leezu force-pushed the cmakerefactorcuda branch 2 times, most recently from a94bbf8 to e8163e6 Compare December 13, 2019 14:08
@leezu leezu force-pushed the cmakerefactorcuda branch 4 times, most recently from 899277f to 80d3e7a Compare December 17, 2019 14:30
@larroy
Copy link
Contributor

larroy commented Dec 18, 2019

Why are the dockcross images related to CMake changes??

@leezu
Copy link
Contributor Author

leezu commented Dec 18, 2019

Because the edge build is currently completely broken and only keeps on working in some special settings due to caching effects. Updating cmake on the current dockcross images is one case that will break the caching effects and thus disable the build. Updating the dockcross images fixes the edge build, and further the updated version ships with cmake 13.

It's not necessary to merge the dockcross change as part of this PR. But until the edge build is not fixed in master, the dockcross update is a necessary part of this PR.

@leezu leezu added the pr-awaiting-review PR is waiting for code review label Dec 18, 2019
@leezu leezu added this to In progress in MXNet 2.0 via automation Dec 18, 2019
@leezu
Copy link
Contributor Author

leezu commented Dec 23, 2019

Rebased after #17098 got merged.

@leezu
Copy link
Contributor Author

leezu commented Dec 26, 2019

Ping for review

@szha @yajiedesign or anyone willing

Introduce unified MXNET_CUDA_ARCH option to specify cuda architectures.
Previously cuda architecture setting was partially broken and different options
were applied to different parts of the build (CUDA_ARCH_NAME CUDA_ARCH_BIN
CUDA_ARCH_PTX and CUDA_ARCH_LIST).

Include FindCUDAToolkit from CMake 3.17, which replaces the deprecated FindCUDA
functionality for finding the cuda toolkit include directories and libraries.
@szha szha merged commit 230ceee into apache:master Dec 30, 2019
MXNet 2.0 automation moved this from In progress to Done Dec 30, 2019
@leezu leezu deleted the cmakerefactorcuda branch December 30, 2019 09:54
yajiedesign pushed a commit to yajiedesign/mxnet that referenced this pull request Jan 4, 2020
Introduce unified MXNET_CUDA_ARCH option to specify cuda architectures.
Previously cuda architecture setting was partially broken and different options
were applied to different parts of the build (CUDA_ARCH_NAME CUDA_ARCH_BIN
CUDA_ARCH_PTX and CUDA_ARCH_LIST).
Include FindCUDAToolkit from CMake 3.17, which replaces the deprecated FindCUDA
functionality for finding the cuda toolkit include directories and libraries.

Workaround for DLL size limitation on Windows (apache#16980)
* change windows build system.
add gen_warp cpp version
add add_custom_command to run warp_gen
add download cmake
add option
change option
add dynamic read mxnet dll
yajiedesign pushed a commit to yajiedesign/mxnet that referenced this pull request Jan 5, 2020
Switch to modern CMake CUDA handling (apache#17031)

Introduce unified MXNET_CUDA_ARCH option to specify cuda architectures.
Previously cuda architecture setting was partially broken and different options
were applied to different parts of the build (CUDA_ARCH_NAME CUDA_ARCH_BIN
CUDA_ARCH_PTX and CUDA_ARCH_LIST).

Include FindCUDAToolkit from CMake 3.17, which replaces the deprecated FindCUDA
functionality for finding the cuda toolkit include directories and libraries.

Workaround for DLL size limitation on Windows (apache#16980)

* change windows build system.
add gen_warp cpp version
add add_custom_command to run warp_gen
add download cmake
add option
change option
add dynamic read mxnet dll
yajiedesign pushed a commit to yajiedesign/mxnet that referenced this pull request Jan 5, 2020
…ranch

Fix CUDNN detection for CMake build (apache#17019)

Replace mxnet_option macro with standard CMAKE_DEPENDENT_OPTION (apache#17018)

Switch to modern CMake CUDA handling (apache#17031)

Introduce unified MXNET_CUDA_ARCH option to specify cuda architectures.
Previously cuda architecture setting was partially broken and different options
were applied to different parts of the build (CUDA_ARCH_NAME CUDA_ARCH_BIN
CUDA_ARCH_PTX and CUDA_ARCH_LIST).

Include FindCUDAToolkit from CMake 3.17, which replaces the deprecated FindCUDA
functionality for finding the cuda toolkit include directories and libraries.

Workaround for DLL size limitation on Windows (apache#16980)

* change windows build system.
add gen_warp cpp version
add add_custom_command to run warp_gen
add download cmake
add option
change option
add dynamic read mxnet dll
ptrendx pushed a commit that referenced this pull request Jan 5, 2020
Fix CUDNN detection for CMake build (#17019)

Replace mxnet_option macro with standard CMAKE_DEPENDENT_OPTION (#17018)

Switch to modern CMake CUDA handling (#17031)

Introduce unified MXNET_CUDA_ARCH option to specify cuda architectures.
Previously cuda architecture setting was partially broken and different options
were applied to different parts of the build (CUDA_ARCH_NAME CUDA_ARCH_BIN
CUDA_ARCH_PTX and CUDA_ARCH_LIST).

Include FindCUDAToolkit from CMake 3.17, which replaces the deprecated FindCUDA
functionality for finding the cuda toolkit include directories and libraries.

Workaround for DLL size limitation on Windows (#16980)

* change windows build system.
add gen_warp cpp version
add add_custom_command to run warp_gen
add download cmake
add option
change option
add dynamic read mxnet dll

Co-authored-by: Leonard Lausen <leonard@lausen.nl>
endif()


get_filename_component(CUDAToolkit_ROOT_DIR ${CUDAToolkit_BIN_DIR} DIRECTORY ABSOLUTE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this has been included from https://gitlab.kitware.com/cmake/cmake/merge_requests/4093
However, I'm confused between CUDAToolkit_ROOT_DIR and CUDA_TOOLKIT_ROOT_DIR that's used at rest of the places in cmake in our codebase.
Are they both pointing to same thing? @leezu Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be insensitive to capitalization

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CMake CMake related bugs/issues/improvements pr-awaiting-review PR is waiting for code review
Projects
MXNet 2.0
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

5 participants