Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable HIPRTC support as default from ROCm 5.0 #1237

Merged
merged 40 commits into from
Feb 16, 2022
Merged

Enable HIPRTC support as default from ROCm 5.0 #1237

merged 40 commits into from
Feb 16, 2022

Conversation

atamazov
Copy link
Contributor

@atamazov atamazov commented Oct 22, 2021

  • HIPRTC support is added and enabled by default starting from ROCm 5.0
    • This will automatically enable HIPRTC testing after CI upgrade
    • ⚠️ HIP version must be fixed in Mainline in order to enable HIPRTC testing by QA
  • Added MIOPEN_DEBUG_USE_HIPRTC env var, which can be used to fall back to COMGR.
  • Workarounds:
    • Added W/A for SWDEV-308073 (WORKAROUND_ISSUE_HIPRTC_TRUE_TYPE)
    • Added W/A for SWDEV-307838 (WORKAROUND_ISSUE_HIPRTC_HIPRTC_HEADER_H)

@atamazov atamazov marked this pull request as ready for review February 11, 2022 22:45
CMakeLists.txt Outdated Show resolved Hide resolved

# Do not enable HIPRTC by default for older ROCm versions in order to avoid
# build time errors, because HIPRTC is a relatively new component.
set_var_to_condition(MIOPEN_USE_HIPRTC_DEFAULT ${MIOPEN_USE_COMGR} AND (${MIOPEN_hip_VERSION_FLAT} GREATER 500000000))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MIOPEN_USE_COMGR does not have a default value, which causes a default cmake run to fail. Such as

CXX=/opt/rocm/llvm/bin/clang++ cmake ..

Please update the PR so that the value of MIOPEN_USE_COMGR is always specified.

auto opts =
miopen::SplitSpaceSeparated(options, miopen::comgr::compiler::lc::GetOptionsNoSplit());
compiler::lc::RemoveOptionsUnwanted(opts);
opts.push_back("-DWORKAROUND_ISSUE_HIPRTC_TRUE_TYPE"); // Workaround for SWDEV-308073
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we refactor these common defines to a place where they may be used both from hip_build_utils.cpp and here(comgr.cpp) so that we something needs to be fixed, it only needs to be fixed in one place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JehandadKhan

so that we something needs to be fixed, it only needs to be fixed in one place.

I do not understand. Can you please clarify the use case?

/// /opt/rocm/include/hip/amd_detail/amd_hip_vector_types.h,
/// which defines std::true_type as well (which is wrong).

namespace std {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these to a common file and include that file everywhere instead ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this WA potentially causing the numerical changes in #1237 (comment) ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's fix this issue in follow up PRs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JehandadKhan

Can we move these to a common file and include that file everywhere instead ?

No. This is workaround. We do not know how the problem will evolve in the future. Applying the "good design practices" can be a waste of time.

@junliume
Copy link
Collaborator

@atamazov @JehandadKhan @asroy @zjing14 @qianfengz could you take a look at CK related changes?
The composable kernel-related changes in this PR might have affected accuracy:

# ./bin/test_reduce_test --float --D 64 3 280 81 --R 0 --ReduceOp 0 --CompType 1 --N 0 --I 0 --scales 1 0
./bin/test_reduce_test --float --D 64 3 280 81 --R 0 --ReduceOp 0 --CompType 1 --N 0 --I 0 --scales 1 0 
FAILED: 0.255031
Iteration: 0
verify_reduce_no_indices failed
Input Tensor 64, 3, 280, 81
Max diff: 2.27564
Mismatch at 0: 1.00436 != 2.00873

@qianfengz
Copy link
Contributor

@atamazov @JehandadKhan @asroy @zjing14 @qianfengz could you take a look at CK related changes? The composable kernel-related changes in this PR might have affected accuracy:

# ./bin/test_reduce_test --float --D 64 3 280 81 --R 0 --ReduceOp 0 --CompType 1 --N 0 --I 0 --scales 1 0
./bin/test_reduce_test --float --D 64 3 280 81 --R 0 --ReduceOp 0 --CompType 1 --N 0 --I 0 --scales 1 0 
FAILED: 0.255031
Iteration: 0
verify_reduce_no_indices failed
Input Tensor 64, 3, 280, 81
Max diff: 2.27564
Mismatch at 0: 1.00436 != 2.00873

Very strange! I could not reproduce the issue on MI100 and MI50 using hiprtc branch. From your test, the reduced result value get on Host is just half that of the GPU.

@junliume
Copy link
Collaborator

@atamazov @JehandadKhan @asroy @zjing14 @qianfengz could you take a look at CK related changes? The composable kernel-related changes in this PR might have affected accuracy:

# ./bin/test_reduce_test --float --D 64 3 280 81 --R 0 --ReduceOp 0 --CompType 1 --N 0 --I 0 --scales 1 0
./bin/test_reduce_test --float --D 64 3 280 81 --R 0 --ReduceOp 0 --CompType 1 --N 0 --I 0 --scales 1 0 
FAILED: 0.255031
Iteration: 0
verify_reduce_no_indices failed
Input Tensor 64, 3, 280, 81
Max diff: 2.27564
Mismatch at 0: 1.00436 != 2.00873

Very strange! I could not reproduce the issue on MI100 and MI50 using hiprtc branch. From your test, the reduced result value get on Host is just half that of the GPU.

This might not be hiprtc related but instead ROCm 5.0 related. Removing the blocker for this PR for now.

@junliume junliume changed the title HIPRTC support Enable HIPRTC support as default from ROCm 5.0 Feb 16, 2022
@junliume junliume merged commit b735eb2 into develop Feb 16, 2022
@qianfengz
Copy link
Contributor

@atamazov @JehandadKhan @asroy @zjing14 @qianfengz could you take a look at CK related changes? The composable kernel-related changes in this PR might have affected accuracy:

# ./bin/test_reduce_test --float --D 64 3 280 81 --R 0 --ReduceOp 0 --CompType 1 --N 0 --I 0 --scales 1 0
./bin/test_reduce_test --float --D 64 3 280 81 --R 0 --ReduceOp 0 --CompType 1 --N 0 --I 0 --scales 1 0 
FAILED: 0.255031
Iteration: 0
verify_reduce_no_indices failed
Input Tensor 64, 3, 280, 81
Max diff: 2.27564
Mismatch at 0: 1.00436 != 2.00873

Very strange! I could not reproduce the issue on MI100 and MI50 using hiprtc branch. From your test, the reduced result value get on Host is just half that of the GPU.

This might not be hiprtc related but instead ROCm 5.0 related. Removing the blocker for this PR for now.

This could be compiler issue. I found that the warpSize used by my DirectWarpSize reduction kernels is 64, while the value of handle.GetWavefrontWidth() is 32. My understand is that warpSize is provided as a constant by the compiler, while handle.GetWavefrontWidth() gets the warp size from the HIP Runtime device properties. So the constant value maintained by the compiler under ROCM 5.0 is problematic on Navi. The issue can be worked-around by passing the value of handle.GetWavefrontWidth() to the kernel and let the DirectWarpWise kernel to use it instead of the warpSize

@atamazov
Copy link
Contributor Author

We should stop the chaos from spreading and start adding comments to the relevant tickets.

@junliume
Copy link
Collaborator

We should stop the chaos from spreading and start adding comments to the relevant tickets.

Yes, currently two priorities for this week: (1) what might have caused workspace diffs in last tuning updates; (2) warpSize inconsistent between different HIP kernel building methods, e.g. hip-Clang and hipRTC.

Each is tracked by an issue in blocking urgency. #1429 and #1431 The first one is actively been resolved. The second has a workaround for now (not sure if there will be other issues though).

@junliume
Copy link
Collaborator

@atamazov resnet is getting gradient overflow with this PR enabling hipRTC as default. Is it safe to revert it? Thanks!

@atamazov
Copy link
Contributor Author

@junliume Just change MIOPEN_USE_HIPRTC_DEFAULT (line 226 in ./CMakeLists.txt) to something like ...GREATER 900000000)) as a workaround.

@atamazov
Copy link
Contributor Author

Then you'll be able to use -DMIOPEN_USE_HIPRTC=On/Off for experiments.

@junliume
Copy link
Collaborator

@junliume Just change MIOPEN_USE_HIPRTC_DEFAULT (line 226 in ./CMakeLists.txt) to something like ...GREATER 900000000)) as a workaround.

Thank you! @atamazov

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants