v5r1 fusion kernels for inference #49

Merged: asroy merged 68 commits into develop from v5r1_add on Nov 18, 2021

@zjing14 zjing14 commented Oct 29, 2021

  • Add Ops, including Bias, Activation, Resize+Add, and MaxPool
  • Extract the Write-out stage from GridwiseGemm, treat Write-Out and Gridwise as separate Ops
  • Merge this PR after V5r1 refactor #36
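
The second bullet (treating the write-out as its own op, separate from the GEMM) can be illustrated with a small host-side sketch. This is not the PR's code: `gemm_accumulate`, `write_out_bias_activation`, and `Relu` are hypothetical names chosen for illustration, and the loops run on the CPU rather than as a GPU kernel. It only shows the idea that the GEMM stage yields raw accumulators while a separate write-out stage applies bias and activation as it stores.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the GEMM stage: produces raw accumulators only.
// a is M x K and b is K x N, both row-major.
std::vector<float> gemm_accumulate(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t M, std::size_t N, std::size_t K)
{
    std::vector<float> acc(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t k = 0; k < K; ++k)
                acc[m * N + n] += a[m * K + k] * b[k * N + n];
    return acc;
}

// Hypothetical activation functor (illustrative, not the PR's type).
struct Relu
{
    float operator()(float x) const { return std::max(x, 0.0f); }
};

// Hypothetical write-out stage, kept separate from the GEMM stage:
// adds a per-column bias and applies the activation while storing.
template <typename Activation>
std::vector<float> write_out_bias_activation(const std::vector<float>& acc,
                                             const std::vector<float>& bias,
                                             std::size_t M, std::size_t N,
                                             Activation act)
{
    std::vector<float> out(M * N);
    for (std::size_t m = 0; m < M; ++m)
        for (std::size_t n = 0; n < N; ++n)
            out[m * N + n] = act(acc[m * N + n] + bias[n]);
    return out;
}
```

Because the write-out takes the activation as a template parameter, swapping in a different epilogue in this sketch means passing a different functor without touching the GEMM stage.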

@zjing14 zjing14 requested a review from asroy November 1, 2021 16:36
@zjing14 zjing14 changed the title [WIP] v5r1 fusion kernels for inference v5r1 fusion kernels for inference Nov 1, 2021
@zjing14 zjing14 mentioned this pull request Nov 15, 2021
}
else if constexpr(N == 8)
{
#if 0

Is something wrong with llvm_amdgcn_raw_buffer_store_fp16x4?

Comment thread composable_kernel/include/utility/utility.hpp Outdated

@asroy asroy left a comment

CMakeFiles/device_gemm_instance.dir/__/device_operation/device_gemm_xdl_instance_f32_f32_f32_mk_nk_mn.cpp.o: In function `bfloat16_to_float':
/opt/rocm-4.1.0/hip/include/hip/hcc_detail/hip_vector_types.h:373: multiple definition of `bfloat16_to_float'
CMakeFiles/device_gemm_instance.dir/__/device_operation/device_gemm_xdl_instance_f32_f32_f32_mk_kn_mn.cpp.o:/opt/rocm-4.1.0/hip/include/hip/hcc_detail/hip_vector_types.h:373: first defined here
CMakeFiles/device_gemm_instance.dir/__/device_operation/device_gemm_xdl_instance_f32_f32_f32_mk_nk_mn.cpp.o: In function `size':
/root/workspace/composable_kernel/external/rocm/include/bfloat16_dev.hpp:68: multiple definition of `float_to_bfloat16'

Please fix the build issue for ckProfiler
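
A "multiple definition" error like the one in the log above commonly appears when a header defines a non-inline free function and more than one translation unit includes it. The log does not confirm that this is the cause here, so the following is only a sketch of the usual remedy: marking header-defined functions `inline`. The function names and signatures below are hypothetical and are not the ones in `bfloat16_dev.hpp` or `hip_vector_types.h`.

```cpp
#include <cstdint>
#include <cstring>

// Header-style definitions. `inline` allows the same definition to appear in
// several translation units without a "multiple definition" link error.
// Names and signatures are illustrative assumptions, not the real headers' API.

// Widen a bfloat16 bit pattern to float by placing it in the upper 16 bits.
inline float bf16_bits_to_float(std::uint16_t bits)
{
    std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Narrow a float to a bfloat16 bit pattern by truncation (no rounding).
inline std::uint16_t float_to_bf16_bits_truncate(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return static_cast<std::uint16_t>(u >> 16);
}
```

Other options include moving the definitions into a single source file or giving them internal linkage with `static`. Which one fits depends on how the headers are shared between host and device code.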

// experimental implementation
#ifndef CK_EXPERIMENTAL_USE_BUFFER_LOAD_OOB_CHECK_OFFSET_TRICK
#define CK_EXPERIMENTAL_USE_BUFFER_LOAD_OOB_CHECK_OFFSET_TRICK 0
#define CK_EXPERIMENTAL_USE_BUFFER_LOAD_OOB_CHECK_OFFSET_TRICK 1

@asroy asroy Nov 15, 2021

Why use this? Is it for a correctness issue?

This trick (OOB feature + setting the offset to > 4GB) should no longer give better performance than the alternative (OOB feature + setting the padding element to 0).

Contributor Author

Yes, this was used because of a correctness issue.
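
The two approaches compared in this thread can be contrasted with a CPU emulation. This is a rough sketch under stated assumptions, not the library's implementation: `buffer_load_emulated` imitates a range-checked buffer load that returns 0 for out-of-range offsets, `kFarOutOfRange` is an arbitrary constant above 4 GiB picked to match the wording of the comment, and all function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Emulates a range-checked buffer load: any offset at or beyond the end of
// the buffer yields 0 instead of reading memory.
float buffer_load_emulated(const std::vector<float>& buf, std::uint64_t offset)
{
    return offset < buf.size() ? buf[offset] : 0.0f;
}

// Assumed shift that pushes any offset past the checked range (illustrative).
constexpr std::uint64_t kFarOutOfRange = 1ull << 33;

// Approach 1 ("offset trick"): for an invalid (padding) element, shift the
// offset out of range so that the range check itself produces the 0.
float load_with_offset_trick(const std::vector<float>& buf,
                             std::uint64_t offset, bool element_valid)
{
    return buffer_load_emulated(buf, element_valid ? offset
                                                   : offset + kFarOutOfRange);
}

// Approach 2: load with the unmodified offset, then set padding elements to 0.
float load_then_zero_padding(const std::vector<float>& buf,
                             std::uint64_t offset, bool element_valid)
{
    const float value = buffer_load_emulated(buf, offset);
    return element_valid ? value : 0.0f;
}
```

In this emulation both approaches return the same values. The thread indicates that on the actual hardware path the choice was made for correctness, which this sketch does not attempt to reproduce.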

@asroy asroy merged commit 970fa3e into develop Nov 18, 2021
@illsilin illsilin deleted the v5r1_add branch December 8, 2023 16:03
2 participants