[DLight] Update GEMV rule to support Adreno outer reduction #15730

Merged
Hzfengsy merged 1 commit into apache:unity from Hzfengsy:outer_reduction_gemv
Sep 13, 2023
@Hzfengsy
Member

Because Adreno has poor shared memory performance, we prefer `q4f16_0` over `q4f16_1` for LLM workloads.

This PR adds GEMV rule support for Adreno outer reduction. Compared with `q4f16_1`, we get a performance gain:

  • Prefill (not related to the changes in this PR): 3636.6501 ms -> 1241.9469 ms
  • Decode: 211.0834 ms -> 174.9357 ms
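
To make "outer reduction" concrete, here is a minimal plain-Python sketch of the same GEMV computed over two weight layouts. It assumes that "inner reduction" means the reduction axis `k` is the last (contiguous) axis of the weight buffer, and "outer reduction" means `k` is the leading axis. The function names are hypothetical and are not part of TVM or DLight; the actual rule in this PR generates GPU schedules rather than Python loops.

```python
# Illustrative only: hypothetical helpers, not TVM/DLight APIs.

def gemv_inner_reduction(w_nk, x):
    """y[n] = sum_k W[n][k] * x[k]; reduction axis k is innermost in W."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in w_nk]

def gemv_outer_reduction(w_kn, x):
    """y[n] = sum_k W[k][n] * x[k]; reduction axis k is outermost in W."""
    n_out = len(w_kn[0])
    y = [0.0] * n_out
    for k, row in enumerate(w_kn):   # walk the reduction axis
        for n in range(n_out):       # each output keeps its own accumulator
            y[n] += row[n] * x[k]
    return y

w_nk = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # shape (N=2, K=3)
w_kn = [list(col) for col in zip(*w_nk)]    # transposed, shape (K=3, N=2)
x = [1.0, 0.5, 2.0]

y_inner = gemv_inner_reduction(w_nk, x)
y_outer = gemv_outer_reduction(w_kn, x)
```

Both functions produce the same result. In the outer-reduction form each output element accumulates independently over `k`, so a GPU mapping with one thread per output would plausibly need no cross-thread reduction through shared memory, which is consistent with the motivation stated above.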

@Hzfengsy Hzfengsy force-pushed the outer_reduction_gemv branch from 64e50dc to 3faa86b Compare September 13, 2023 01:58
@Hzfengsy Hzfengsy merged commit 93bb647 into apache:unity Sep 13, 2023
Hzfengsy pushed a commit to Hzfengsy/mlc-llm that referenced this pull request Sep 14, 2023
Now that apache/tvm#15730 is merged, there is no need to
dispatch the pre-tuned kernel anymore. This PR disables the dispatch.
Hzfengsy pushed a commit to Hzfengsy/tvm that referenced this pull request Sep 14, 2023
PR apache#15730 introduced outer_reduction
for Adreno GEMV. This PR fixes the length issue when applying it to dynamic workloads.
MasterJH5574 pushed a commit to mlc-ai/mlc-llm that referenced this pull request Sep 14, 2023
Now that apache/tvm#15730 is merged, there is no need to
dispatch the pre-tuned kernel anymore. This PR disables the dispatch.
junrushao pushed a commit that referenced this pull request Sep 16, 2023
PR #15730 introduced outer_reduction
for Adreno GEMV. This PR fixes the length issue when applying it to dynamic workloads.
@Hzfengsy Hzfengsy deleted the outer_reduction_gemv branch January 25, 2024 09:53
smickey040404 added a commit to smickey040404/mlc-llm that referenced this pull request Feb 11, 2025
Now that apache/tvm#15730 is merged, there is no need to
dispatch the pre-tuned kernel anymore. This PR disables the dispatch.
tristankincaid added a commit to tristankincaid/mlc-llm that referenced this pull request Feb 16, 2025
Now that apache/tvm#15730 is merged, there is no need to
dispatch the pre-tuned kernel anymore. This PR disables the dispatch.