Enable HIPRTC support as default from ROCm 5.0 #1237
…_PCH_ENFORCE: Removed possibility to enable PCH.
…ess. [hiprtc] Added WORKAROUND_ISSUE_HIPRTC_TRUE_TYPE.
…sing types + WORKAROUND_ISSUE_HIPRTC_TRUE_TYPE
…rmer is not necessary.
…d when SWDEV-297217 is resolved)
…RTC_HALF_CONVERSION. Host side changes.
… when "get binary" fails.
Resolved conflicts: test/CMakeLists.txt
Resolved conflicts: CMakeLists.txt, src/comgr.cpp, test/CMakeLists.txt
# Do not enable HIPRTC by default for older ROCm versions in order to avoid
# build time errors, because HIPRTC is a relatively new component.
set_var_to_condition(MIOPEN_USE_HIPRTC_DEFAULT ${MIOPEN_USE_COMGR} AND (${MIOPEN_hip_VERSION_FLAT} GREATER 500000000))
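A minimal C++ sketch of what the condition above evaluates to. It assumes the flattened version is computed as (major * 1000 + minor) * 100000 + patch, so that HIP 5.0.0 maps exactly to 500000000; that formula is an assumption for illustration and is not shown in this PR. The function names are hypothetical.

```cpp
#include <cstdint>

// Assumed flattening formula (hypothetical, for illustration only):
// (major * 1000 + minor) * 100000 + patch.
constexpr std::int64_t hip_version_flat(std::int64_t major, std::int64_t minor, std::int64_t patch)
{
    return (major * 1000 + minor) * 100000 + patch;
}

// Mirrors:
// set_var_to_condition(MIOPEN_USE_HIPRTC_DEFAULT
//     ${MIOPEN_USE_COMGR} AND (${MIOPEN_hip_VERSION_FLAT} GREATER 500000000))
constexpr bool hiprtc_default(bool use_comgr, std::int64_t version_flat)
{
    return use_comgr && version_flat > 500000000;
}

static_assert(hip_version_flat(5, 0, 0) == 500000000, "5.0.0 sits exactly on the threshold");
static_assert(!hiprtc_default(true, hip_version_flat(4, 5, 2)), "HIP 4.x keeps HIPRTC off");
static_assert(hiprtc_default(true, hip_version_flat(5, 1, 0)), "later releases enable it with COMGR");
static_assert(!hiprtc_default(false, hip_version_flat(5, 1, 0)), "no COMGR, no HIPRTC default");
```

Because GREATER is a strict comparison, a version that flattens to exactly 500000000 would leave the default off under this assumed formula; any 5.0.x release with a nonzero patch number, or any later release, turns it on when MIOPEN_USE_COMGR is true.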
MIOPEN_USE_COMGR does not have a default value, which causes a default cmake run to fail, for example:
CXX=/opt/rocm/llvm/bin/clang++ cmake ..
Please update the PR so that the value of MIOPEN_USE_COMGR is always specified.
auto opts =
    miopen::SplitSpaceSeparated(options, miopen::comgr::compiler::lc::GetOptionsNoSplit());
compiler::lc::RemoveOptionsUnwanted(opts);
opts.push_back("-DWORKAROUND_ISSUE_HIPRTC_TRUE_TYPE"); // Workaround for SWDEV-308073
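The fragment above splits the option string, strips unwanted options and appends the workaround define. A self-contained C++ sketch of the same flow follows. split_space_separated, remove_unwanted and build_hiprtc_options are hypothetical stand-ins for miopen::SplitSpaceSeparated and compiler::lc::RemoveOptionsUnwanted; the sketch ignores the no-split list that the real call receives from GetOptionsNoSplit(), and because the real unwanted-option list is not shown here, it is passed in as a parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for miopen::SplitSpaceSeparated: splits on whitespace.
std::vector<std::string> split_space_separated(const std::string& options)
{
    std::istringstream in(options);
    std::vector<std::string> out;
    std::string token;
    while(in >> token)
        out.push_back(token);
    return out;
}

// Hypothetical stand-in for RemoveOptionsUnwanted: drops every option that
// appears in the caller-supplied `unwanted` list.
void remove_unwanted(std::vector<std::string>& opts, const std::vector<std::string>& unwanted)
{
    opts.erase(std::remove_if(opts.begin(),
                              opts.end(),
                              [&](const std::string& o) {
                                  return std::find(unwanted.begin(), unwanted.end(), o) !=
                                         unwanted.end();
                              }),
               opts.end());
}

// Split, filter, then append the workaround define (SWDEV-308073).
std::vector<std::string> build_hiprtc_options(const std::string& options,
                                              const std::vector<std::string>& unwanted)
{
    auto opts = split_space_separated(options);
    remove_unwanted(opts, unwanted);
    opts.push_back("-DWORKAROUND_ISSUE_HIPRTC_TRUE_TYPE");
    return opts;
}
```

Appending the define after the filtering step means the unwanted-options filter can never strip the workaround.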
I suggest we refactor these common defines into a place where they can be used both from hip_build_utils.cpp and here (comgr.cpp), so that if something needs to be fixed, it only needs to be fixed in one place.
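One way to act on this suggestion, sketched in C++ under the assumption that both build paths collect options in a std::vector<std::string>: keep the shared defines behind a single hypothetical function and have each path append them, so a fix to the list applies to both. Only WORKAROUND_ISSUE_HIPRTC_TRUE_TYPE is taken from this PR; which other defines belong in the list is left open, and all function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical single source of truth for workaround defines that both
// hip_build_utils.cpp and comgr.cpp would otherwise duplicate.
inline const std::vector<std::string>& common_workaround_defines()
{
    static const std::vector<std::string> defines = {
        "-DWORKAROUND_ISSUE_HIPRTC_TRUE_TYPE", // SWDEV-308073
    };
    return defines;
}

// Appends the shared defines to an option list built by either path.
inline void append_common_defines(std::vector<std::string>& opts)
{
    const auto& defs = common_workaround_defines();
    opts.insert(opts.end(), defs.begin(), defs.end());
}

// Hypothetical callers standing in for the two build paths.
std::vector<std::string> hip_clang_options()
{
    std::vector<std::string> opts{"-O3"};
    append_common_defines(opts);
    return opts;
}

std::vector<std::string> hiprtc_options()
{
    std::vector<std::string> opts{"-O3"};
    append_common_defines(opts);
    return opts;
}
```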
so that if something needs to be fixed, it only needs to be fixed in one place.
I do not understand. Can you please clarify the use case?
/// /opt/rocm/include/hip/amd_detail/amd_hip_vector_types.h,
/// which defines std::true_type as well (which is wrong).

namespace std {
Can we move these to a common file and include that file everywhere instead?
Could this workaround (WA) be causing the numerical changes reported in #1237 (comment)?
Let's fix this issue in follow-up PRs.
Can we move these to a common file and include that file everywhere instead?
No. This is a workaround. We do not know how the problem will evolve in the future, so applying "good design practices" here could be a waste of time.
@atamazov @JehandadKhan @asroy @zjing14 @qianfengz could you take a look at CK related changes?
Very strange! I could not reproduce the issue on MI100 and MI50 using the hiprtc branch. In your test, the reduced result obtained on the host is just half of the GPU value.
This might not be hiprtc related but instead ROCm 5.0 related. Removing the blocker for this PR for now.
This could be a compiler issue. I found that the …
We should stop the chaos from spreading and start adding comments to the relevant tickets.
Yes, there are currently two priorities for this week: (1) what might have caused the workspace diffs in the last tuning updates; (2) warpSize being inconsistent between different HIP kernel build methods, e.g. hip-Clang and hipRTC. Each is tracked by an issue with blocking urgency: #1429 and #1431. The first one is actively being resolved. The second has a workaround for now (though there may be other issues).
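To illustrate why item (2) matters for reductions, here is a host-side C++ simulation (not MIOpen code; all names are hypothetical). AMD GPUs such as the MI50 and MI100 execute 64-wide wavefronts; if a kernel's compile-time warp-size constant is smaller than the number of lanes actually holding data, the tree reduction simply never reads the extra lanes. With uniform input, reducing a 64-lane wavefront as if it were 32 wide yields exactly half the sum. This only illustrates the failure mode; it does not establish the cause of the discrepancy discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side simulation of a tree reduction across one wavefront.
// `lanes` holds one value per hardware lane; `assumed_warp_size` plays the
// role of a warp-size constant baked into the kernel at compile time.
float simulate_wave_reduce(std::vector<float> lanes, std::size_t assumed_warp_size)
{
    // The tree reduction below requires a power-of-two width that fits the data.
    assert(assumed_warp_size > 0 && assumed_warp_size <= lanes.size());
    assert((assumed_warp_size & (assumed_warp_size - 1)) == 0);

    // Each step folds the upper half of the active lanes into the lower half.
    // Lanes at index >= assumed_warp_size are never read.
    for(std::size_t offset = assumed_warp_size / 2; offset > 0; offset /= 2)
        for(std::size_t i = 0; i < offset; ++i)
            lanes[i] += lanes[i + offset];

    return lanes[0]; // lane 0 ends up holding the reduced value
}
```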
@atamazov resnet is getting gradient overflow with this PR enabling hipRTC as default. Is it safe to revert it? Thanks!
@junliume Just change MIOPEN_USE_HIPRTC_DEFAULT (line 226 in ./CMakeLists.txt) to something like …
Then you'll be able to use the MIOPEN_DEBUG_USE_HIPRTC env var to fall back to COMGR.
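A hedged C++ sketch of the fallback described above: a runtime switch that picks hipRTC or COMGR. Only the variable name MIOPEN_DEBUG_USE_HIPRTC comes from this thread. The accepted values (treating "0", "false", "no", "off" or "disable", in any case, as "use COMGR"), the behavior when the variable is unset, and the KernelBackend/select_backend names are assumptions, not MIOpen's actual parsing rules.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

enum class KernelBackend { HipRtc, Comgr };

// Hypothetical rule (assumption): unset keeps the build-time default; a
// "disabling" value falls back to COMGR; any other value selects hipRTC.
KernelBackend select_backend(const char* env_value, bool hiprtc_default)
{
    if(env_value == nullptr)
        return hiprtc_default ? KernelBackend::HipRtc : KernelBackend::Comgr;

    std::string v(env_value);
    for(auto& c : v)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    const bool disabled = v == "0" || v == "false" || v == "no" || v == "off" || v == "disable";
    return disabled ? KernelBackend::Comgr : KernelBackend::HipRtc;
}

// Thin wrapper that reads the process environment.
KernelBackend select_backend_from_env(bool hiprtc_default)
{
    return select_backend(std::getenv("MIOPEN_DEBUG_USE_HIPRTC"), hiprtc_default);
}
```

Keeping the decision separate from the std::getenv call makes the rule testable without touching the process environment.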