V5r1 refactor #36

Closed
zjing14 wants to merge 36 commits into develop from v5r1_refactor
Contributor

zjing14 commented Oct 2, 2021

  • refactor v5r1 kernels and corresponding gridwise/blockwise/threadwise functions
  • add outer loops to allow larger C/K
  • To-Do (in a separate PR)
    • move b_thread_copy into blockwise_gemm
    • move c_thread_buffer allocation into blockwise_gemm

asroy pushed a commit that referenced this pull request Oct 6, 2021
…WD-v4r5 (#36)

* experimenting magic number division

* overhauling fwd-v4r4 to clearly reflect transformation graph

* added fwd-v4r5

* bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2

* bug fix and added sanity-check in transform_dynamic_tensor_descriptor

* added conv_driver_v2
// experimental implementation
 #ifndef CK_EXPERIMENTAL_USE_BUFFER_LOAD_OOB_CHECK_OFFSET_TRICK
-#define CK_EXPERIMENTAL_USE_BUFFER_LOAD_OOB_CHECK_OFFSET_TRICK 0
+#define CK_EXPERIMENTAL_USE_BUFFER_LOAD_OOB_CHECK_OFFSET_TRICK 1
Contributor

Why is this needed: for performance or correctness?

Contributor Author

Should it bring us better performance?
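
For context, the flag name suggests the following idea, shown here as a host-side sketch rather than the actual kernel code. It assumes a buffer load that returns zero for any offset outside the buffer, and it compares two ways of handling an invalid element: load and then select, or shift the offset out of range so the load itself yields zero. The function names (`emulated_buffer_load`, `load_with_select`, `load_with_offset_trick`) and the shift constant are illustrative assumptions, not identifiers taken from this PR.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a range-checked hardware buffer load:
// any offset at or past the end of the buffer reads as zero.
float emulated_buffer_load(const std::vector<float>& buf, std::uint32_t offset)
{
    return offset < buf.size() ? buf[offset] : 0.0f;
}

// Variant A: always issue the load, then discard the value if the
// element was flagged invalid.
float load_with_select(const std::vector<float>& buf, std::uint32_t offset, bool valid)
{
    const float tmp = emulated_buffer_load(buf, offset);
    return valid ? tmp : 0.0f;
}

// Variant B (offset trick): for an invalid element, add a large shift
// so the offset falls out of range and the load itself returns zero.
// Assumes offset < 2^31 so the unsigned sum cannot wrap around.
float load_with_offset_trick(const std::vector<float>& buf, std::uint32_t offset, bool valid)
{
    const std::uint32_t shift = valid ? 0u : 0x7fffffffu;
    return emulated_buffer_load(buf, offset + shift);
}
```

Both variants return the same values under these assumptions, so the sketch only illustrates equivalence of results; whether variant B is faster on a given GPU is the open question in this thread.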

a_k0_m_k1_grid_desc,
b_k0_n_k1_grid_desc,
c_m0_n0_m1_n1_m2_m3_m4_n2_grid_desc,
c_m0_m1_m2_n_grid_desc,
Contributor

Merge issue?

Contributor Author

Fixed

Contributor Author

zjing14 commented Oct 7, 2021

@asroy Updated. Added a separate driver, conv_fwd_driver_offline_nchwc, for the NCHWC format, and removed the data transform inside the device function.

Contributor Author

zjing14 commented Oct 29, 2021

@asroy Please review the PR again.

zjing14 requested a review from asroy October 29, 2021 02:07
Contributor Author

zjing14 commented Nov 15, 2021

This PR is included in PR #49

zjing14 closed this Nov 15, 2021
junliume deleted the v5r1_refactor branch October 21, 2023 06:09
carlushuang pushed a commit that referenced this pull request Mar 26, 2024
Initial MI350 enablement.