Demonstration of cmake refinement for HIP support. #9165

Merged 3 commits on Mar 22, 2018

sabreshao
Contributor

  1. Add option WITH_AMD_GPU.
  2. Add cmake/hip.cmake for HIP toolchain.
  3. Some external module such as eigen may need HIP port.
  4. Add macro hip_library/hip_binary/hip_test to cmake/generic.cmake.
  5. Add one HIP source concat.hip.cu as an example. Each .cu may have its corresponding .hip.cu.
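
As a minimal sketch of how items 1 and 2 might fit together in the top-level CMakeLists.txt: the help string, the `CMAKE_MODULE_PATH` line, and the `OPTIONAL` keyword are assumptions made for illustration and are not copied from the PR.

```cmake
# Item 1: a build option for AMD GPU support, off by default.
# The help text is illustrative, not the PR's wording.
option(WITH_AMD_GPU "Compile PaddlePaddle with AMD GPU (HIP) support" OFF)

# Assumption: helper modules live under cmake/ and that directory is searched for modules.
list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")

if(WITH_AMD_GPU)
  # Item 2: pull in the HIP toolchain settings from cmake/hip.cmake.
  # OPTIONAL keeps this sketch from failing if the module is absent.
  include(hip OPTIONAL)
endif()
```

With a layout like this, the HIP path would be enabled by configuring with `-DWITH_AMD_GPU=ON`.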

CLAassistant commented Mar 16, 2018

CLA assistant check
All committers have signed the CLA.

CMakeLists.txt Outdated
@@ -69,6 +70,9 @@ if(NOT CMAKE_BUILD_TYPE)
FORCE)
endif()

if(WITH_AMD_GPU)
endif()
Contributor

Lines 73-74 could be removed since there is nothing between them.

Contributor Author

Removed.

INSTALL_COMMAND ""
TEST_COMMAND ""
)
INCLUDE_DIRECTORIES(${EIGEN_SOURCE_DIR}/src/extern_eigen3)
Contributor

if(WITH_AMD_GPU)
    SET(GIT_REPOSITORY "https://github.com/sabreshao/hipeigen.git")
    SET(GIT_TAG        0cba03ff9f8f9f70bbd92ac5857b031aa8fed6f9)
else()
    SET(GIT_REPOSITORY "https://github.com/RLovelett/eigen.git")
    SET(GIT_TAG        0cba03ff9f8f9f70bbd92ac5857b031aa8fed6f9)
endif()
ExternalProject_Add(
    extern_eigen3
    ${EXTERNAL_PROJECT_LOG_ARGS}
    GIT_REPOSITORY ${GIT_REPOSITORY}
    GIT_TAG        ${GIT_TAG}
    ...
)

This may be cleaner.

Contributor Author

In the AMD GPU build, we use hipeigen instead, since it includes the patches needed for HIP.

include_directories("/opt/rocm/hiprand/include")
include_directories("/opt/rocm/rocrand/include")
include_directories("/opt/rocm/rccl/include")
include_directories("/opt/rocm/thrust")
Contributor

Could /opt/rocm/include etc. be expressed through a variable like ${CUDA_INCLUDE_DIRS} instead of a hard-coded path?

Contributor Author

So far, no such variable is defined in HIP/ROCm.

No, we don't have such a definition. NVIDIA makes /usr/local/cuda a symbolic link, which is why they need those environment variables: users can set them to point to the absolute CUDA paths.
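
For illustration only, one way the hard-coded prefix could be made overridable is a cache variable. `ROCM_PATH` below is a hypothetical name introduced for this sketch; as the replies above note, neither the PR nor HIP/ROCm defines it at this point.

```cmake
# Hypothetical cache variable; the PR itself hard-codes /opt/rocm.
set(ROCM_PATH "/opt/rocm" CACHE PATH "Root directory of the ROCm installation")

# The same include directories that appear in the diff, expressed relative to ROCM_PATH.
foreach(subdir hiprand/include rocrand/include rccl/include thrust)
  include_directories("${ROCM_PATH}/${subdir}")
endforeach()
```

A user with ROCm installed elsewhere could then configure with `-DROCM_PATH=<install prefix>` without editing the file.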

cmake/hip.cmake Outdated
list(APPEND HIP_HCC_FLAGS ${CMAKE_CXX_FLAGS_DEBUG})
elseif(CMAKE_BUILD_TYPE STREQUAL "Release")
# Disable optimization since one eigen symbol will be removed in math_function.cu
#list(APPEND HIP_HCC_FLAGS ${CMAKE_CXX_FLAGS_RELEASE})
Contributor

Lines 31-32 could be removed if they are not necessary.

Contributor Author

Removed.

@@ -76,6 +76,9 @@ function(op_library TARGET)
if (WITH_GPU)
nv_library(${TARGET} SRCS ${cc_srcs} ${cu_cc_srcs} ${cudnn_cu_cc_srcs} ${mkldnn_cc_srcs} ${cu_srcs} DEPS ${op_library_DEPS}
${op_common_deps})
elseif (WITH_AMD_GPU)
hip_library(${TARGET} SRCS ${cc_srcs} ${hip_cc_srcs} ${miopen_cu_cc_srcs} ${mkldnn_cc_srcs} ${hip_srcs} DEPS
${op_library_DEPS} ${op_common_deps})
Contributor

Where are the definitions of ${hip_cc_srcs}, ${miopen_cu_cc_srcs} and ${hip_srcs}?

Contributor Author

Refined paddle/fluid/operators/CMakeLists.txt.

@@ -1,9 +1,16 @@
if(WITH_PYTHON)
Contributor

What's the purpose of updating this pybind/CMakeLists.txt?

Contributor Author

pybind/CMakeLists.txt is updated because any shared library that references a HIP kernel must be linked with hipcc rather than gcc.

reyoung previously approved these changes Mar 21, 2018
Collaborator

@reyoung reyoung left a comment

LGTM. There are some tiny comments. However, do not let them block this PR.

include_directories("/opt/rocm/rccl/include")
include_directories("/opt/rocm/thrust")

list(APPEND EXTERNAL_LIBS "-L/opt/rocm/lib/ -lhip_hcc")
Collaborator

There are too many hard-coded values in this file. Maybe we can modify FindHip.cmake after this PR?
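
A rough sketch of what such a follow-up could look like for the library lookup, using CMake's standard `find_library` command. `ROCM_PATH` and `HIP_HCC_LIBRARY` are hypothetical names chosen for this example; they are not defined by the PR or by FindHip.cmake.

```cmake
# Hypothetical search root; /opt/rocm is the path hard-coded in the diff.
set(ROCM_PATH "/opt/rocm" CACHE PATH "Root directory of the ROCm installation")

# Search for libhip_hcc instead of passing "-L/opt/rocm/lib/ -lhip_hcc" verbatim.
find_library(HIP_HCC_LIBRARY
  NAMES hip_hcc
  HINTS "${ROCM_PATH}/lib")

if(HIP_HCC_LIBRARY)
  # Link against the full path that was actually found.
  list(APPEND EXTERNAL_LIBS "${HIP_HCC_LIBRARY}")
else()
  message(FATAL_ERROR "hip_hcc not found under ${ROCM_PATH}/lib; set ROCM_PATH")
endif()
```

Failing at configure time gives an earlier and clearer error than a linker failure caused by a stale path.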


list(APPEND EXTERNAL_LIBS "-L/opt/rocm/lib/ -lhip_hcc")

set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -fPIC -DPADDLE_WITH_HIP -std=c++14" )
Collaborator

Should we use c++14 by default? Paddle is currently using c++11.

@sabreshao, could you check the c++14 option here? HIP does not require it either.

Contributor Author

@sunway513 @reyoung c++14 is there to enable "function return type deduction in lambda", which may be needed in a following PR.
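
If the standard level needs to stay adjustable while this is settled, one possible arrangement is to take it from a cache variable. This is only a sketch: `HIP_CXX_STANDARD` is a hypothetical variable name, and the PR itself hard-codes `-std=c++14`.

```cmake
# Hypothetical cache variable; defaults to the value the PR hard-codes.
set(HIP_CXX_STANDARD "14" CACHE STRING "C++ standard passed to the HIP compiler")
# Offer the two levels discussed in this thread in cmake-gui/ccmake.
set_property(CACHE HIP_CXX_STANDARD PROPERTY STRINGS 11 14)

# Same flags as in the diff, with the standard level substituted in.
set(HIP_HCC_FLAGS "${HIP_HCC_FLAGS} -fPIC -DPADDLE_WITH_HIP -std=c++${HIP_CXX_STANDARD}")
```

Configuring with `-DHIP_CXX_STANDARD=11` would then match the c++11 level used by the rest of Paddle.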

@helinwang
Contributor

@sabreshao thanks for the PR!

The CI seems to be failing; can you take a look? (The PR cannot be merged if CI fails.)

[02:47:35] :     [Step 1/1]   2/215 Test #209: test_recognize_digits ...........................***Failed    0.70 sec
[02:47:35] :     [Step 1/1] Traceback (most recent call last):
[02:47:35] :     [Step 1/1]   File "test_recognize_digits.py", line 16, in <module>
[02:47:35] :     [Step 1/1]     import paddle.fluid as fluid
[02:47:35] :     [Step 1/1]   File "/paddle/build/python/build/lib-python/paddle/fluid/__init__.py", line 17, in <module>
[02:47:35] :     [Step 1/1]     import framework
[02:47:35] :     [Step 1/1]   File "/paddle/build/python/build/lib-python/paddle/fluid/framework.py", line 22, in <module>
[02:47:35] :     [Step 1/1]     from . import core
[02:47:35] :     [Step 1/1] ImportError: /paddle/build/python/build/lib-python/paddle/fluid/core.so: undefined symbol: _ZN6paddle6pybind18BindRecordIOWriterERN8pybind116moduleE
[02:47:35] :     [Step 1/1]
[02:47:35] :     [Step 1/1]         Start 210: test_word2vec
[02:47:41] :     [Step 1/1]   3/215 Test #206: test_image_classification .......................***Failed    7.25 sec
[02:47:41] :     [Step 1/1] Traceback (most recent call last):
[02:47:41] :     [Step 1/1]   File "test_image_classification.py", line 18, in <module>
[02:47:41] :     [Step 1/1]     import paddle.fluid as fluid
[02:47:41] :     [Step 1/1]   File "/paddle/build/python/build/lib-python/paddle/fluid/__init__.py", line 17, in <module>
[02:47:41] :     [Step 1/1]     import framework
[02:47:41] :     [Step 1/1]   File "/paddle/build/python/build/lib-python/paddle/fluid/framework.py", line 22, in <module>
[02:47:41] :     [Step 1/1]     from . import core
[02:47:41] :     [Step 1/1] ImportError: /paddle/build/python/build/lib-python/paddle/fluid/core.so: undefined symbol: _ZN6paddle6pybind18BindRecordIOWriterERN8pybind116moduleE
[02:47:41] :     [Step 1/1]
[02:47:41] :     [Step 1/1]         Start   1: serialization_test
[02:47:41] :     [Step 1/1]   4/215 Test #208: test_label_semantic_roles .......................***Failed    7.25 sec
[02:47:41] :     [Step 1/1] Traceback (most recent call last):
[02:47:41] :     [Step 1/1]   File "test_label_semantic_roles.py", line 20, in <module>
[02:47:41] :     [Step 1/1]     import paddle.fluid as fluid
[02:47:41] :     [Step 1/1]   File "/paddle/build/python/build/lib-python/paddle/fluid/__init__.py", line 17, in <module>
[02:47:41] :     [Step 1/1]     import framework
[02:47:41] :     [Step 1/1]   File "/paddle/build/python/build/lib-python/paddle/fluid/framework.py", line 22, in <module>
[02:47:41] :     [Step 1/1]     from . import core
[02:47:41] :     [Step 1/1] ImportError: /paddle/build/python/build/lib-python/paddle/fluid/core.so: undefined symbol: _ZN6paddle6pybind18BindRecordIOWriterERN8pybind116moduleE

Collaborator

@reyoung reyoung left a comment

Excellent!

@luotao1 luotao1 merged commit 9126e62 into PaddlePaddle:develop Mar 22, 2018
@sabreshao sabreshao deleted the amd_cmake_01 branch August 10, 2018 02:36