
GroupedGEMM more bigger tiles. #577

Merged
zjing14 merged 6 commits into develop from aosewski/ggemm on Feb 13, 2023

@aosewski
Collaborator

aosewski commented Feb 8, 2023

I've added some more configurations with bigger tiles.

@zjing14 Please note that I've removed the smallest tile configurations (16x16), as they showed the worst performance in the tested scenarios.
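
One common explanation for why tiny tiles underperform (not stated in this PR, so treat it as a hedged sketch) is data reuse: per step along K, an MPerBlock x NPerBlock output tile does 2·M·N FLOPs but only loads M elements of A and N elements of B. The model below ignores caches and LDS and assumes 2-byte elements (fp16/bf16); the function name is hypothetical.

```python
def tile_flops_per_byte(m_per_block: int, n_per_block: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte of A/B data loaded for one MPerBlock x NPerBlock output tile.

    Per unit of K the tile does 2*M*N FLOPs (multiply + add) and loads
    M elements of A plus N elements of B. bytes_per_elem=2 assumes fp16/bf16.
    """
    flops = 2 * m_per_block * n_per_block
    bytes_loaded = (m_per_block + n_per_block) * bytes_per_elem
    return flops / bytes_loaded


# Compare the removed 16x16 tile with the tile shapes used in this PR.
for m, n in [(16, 16), (64, 128), (128, 64), (128, 128)]:
    print(f"{m}x{n}: {tile_flops_per_byte(m, n):.1f} FLOP/byte")
```

Under this simplified model a 16x16 tile reuses each loaded byte 8 times less than a 128x128 tile, which is consistent with (but does not prove) the observation above.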

aosewski requested a review from zjing14 on February 8, 2023 13:31
aosewski self-assigned this on Feb 8, 2023
@zjing14
Contributor

zjing14 commented Feb 9, 2023

@aosewski Any performance improvement with larger tiles?

@aosewski
Collaborator Author

aosewski commented Feb 10, 2023

For row 12 of the table (irregular, mk-nk-mn layout):

```
# enlarging tiles first part
Best Perf: 1.03203 ms, 56.1824 TFlops, 268.232 GB/s, DeviceGroupedGemm_Xdl<256, 128, 64, 32, 8, 8, 32, 32, 2, 1>
# enlarging tiles second part
Best Perf: 0.752408 ms, 77.062 TFlops, 367.918 GB/s, DeviceGroupedGemm_Xdl<256, 128, 128, 32, 8, 8, 32, 32, 2, 2>
```

Overall speedup (relative to the table): 11.469 ms -> 0.752 ms, ≈ 15.25x
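
A quick way to sanity-check numbers like these is to parse the `Best Perf` lines, recompute the speedup against the table time, and confirm that TFlops × ms (which equals GFLOP) is the same for both kernels, since they solve the same problem. This is a minimal sketch: the regex assumes the exact line format quoted above, and `BASELINE_MS` is simply the 11.469 ms table value from this comment.

```python
import re

# Benchmark lines copied from the row 12 (mk-nk-mn) results above.
LINES = [
    "Best Perf: 1.03203 ms, 56.1824 TFlops, 268.232 GB/s, "
    "DeviceGroupedGemm_Xdl<256, 128, 64, 32, 8, 8, 32, 32, 2, 1>",
    "Best Perf: 0.752408 ms, 77.062 TFlops, 367.918 GB/s, "
    "DeviceGroupedGemm_Xdl<256, 128, 128, 32, 8, 8, 32, 32, 2, 2>",
]

PERF_RE = re.compile(r"Best Perf: ([\d.]+) ms, ([\d.]+) TFlops, ([\d.]+) GB/s")

BASELINE_MS = 11.469  # row 12 time from the table referenced in the comment


def parse(line: str) -> tuple:
    """Return (time_ms, tflops, gb_per_s) from one 'Best Perf' line."""
    match = PERF_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    return tuple(float(g) for g in match.groups())


for line in LINES:
    ms, tflops, gbps = parse(line)
    # 1e12 FLOP/s * 1e-3 s = 1e9 FLOP, so TFlops * ms is the work in GFLOP;
    # likewise GB/s * ms is the traffic in MB. Both should match across kernels.
    print(f"{ms:.3f} ms  work={tflops * ms:.2f} GFLOP  "
          f"traffic={gbps * ms:.1f} MB  speedup={BASELINE_MS / ms:.2f}x")
```

With the unrounded 0.752408 ms the ratio comes out at about 15.24x; the 15.25x above uses the rounded 0.752 ms.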

For row 13 of the table (irregular, mk-kn-mn layout):

```
# enlarging tiles first part
Best Perf: 0.334912 ms, 86.5632 TFlops, 469.635 GB/s, DeviceGroupedGemm_Xdl<256, 64, 128, 32, 8, 2, 32, 32, 1, 2>
```

Speedup (relative to the table): 1.113 ms -> 0.3349 ms, ≈ 3.32x
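
The instance names encode the tile configuration. The sketch below decodes them assuming the field order BlockSize, MPerBlock, NPerBlock, KPerBlock, AK1, BK1, MPerXDL, NPerXDL, MXdlPerWave, NXdlPerWave; that order is an assumption based on Composable Kernel naming and should be checked against the CK source. As a consistency check, it verifies that each tile splits into BlockSize / 64 waves, assuming 64-lane wavefronts. `XdlTile`, `parse_instance` and `waves_per_block` are hypothetical names.

```python
import re
from typing import NamedTuple


class XdlTile(NamedTuple):
    # Assumed field order of the DeviceGroupedGemm_Xdl<...> type string;
    # verify against the Composable Kernel source before relying on it.
    block_size: int
    m_per_block: int
    n_per_block: int
    k_per_block: int
    ak1: int
    bk1: int
    m_per_xdl: int
    n_per_xdl: int
    m_xdl_per_wave: int
    n_xdl_per_wave: int


def parse_instance(name: str) -> XdlTile:
    """Extract the ten integers between '<' and '>' of an instance name."""
    match = re.search(r"<([^>]*)>", name)
    if match is None:
        raise ValueError(f"no template arguments in {name!r}")
    return XdlTile(*(int(v) for v in match.group(1).split(",")))


def waves_per_block(t: XdlTile) -> int:
    """Waves implied by the tile split: (waves along M) * (waves along N)."""
    m_waves = t.m_per_block // (t.m_xdl_per_wave * t.m_per_xdl)
    n_waves = t.n_per_block // (t.n_xdl_per_wave * t.n_per_xdl)
    return m_waves * n_waves


WAVE_SIZE = 64  # assumption: 64-lane wavefronts

for name in [
    "DeviceGroupedGemm_Xdl<256, 128, 64, 32, 8, 8, 32, 32, 2, 1>",
    "DeviceGroupedGemm_Xdl<256, 128, 128, 32, 8, 8, 32, 32, 2, 2>",
    "DeviceGroupedGemm_Xdl<256, 64, 128, 32, 8, 2, 32, 32, 1, 2>",
]:
    t = parse_instance(name)
    print(f"{t.m_per_block}x{t.n_per_block}x{t.k_per_block}: "
          f"{waves_per_block(t)} waves, block_size/{WAVE_SIZE} = {t.block_size // WAVE_SIZE}")
```

Under the assumed order, all three tiles used in these results map onto 4 waves of a 256-thread block, which is at least consistent with that reading of the parameters.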

@zjing14
Contributor

zjing14 commented Feb 11, 2023

That is awesome. Will merge the PR once it passes CI.

zjing14 merged commit 8f42780 into develop on Feb 13, 2023
illsilin deleted the aosewski/ggemm branch on December 14, 2023 17:04