Enable gemv schedule for adreno #16932

Merged 7 commits on Apr 29, 2024

krishnaraj36

Enabled a new GEMV schedule for the OpenCL target, which improves decode performance of mlc-llm LLM models in the q4f16_0 format.

Decode performance of a few LLM models on Snapdragon Gen-3 (Android):

Model        Baseline      Latest (improved)
Llama-2-7B   10 tok/sec    12.5 tok/sec
Qwen-7b      8.5 tok/sec   11 tok/sec
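
For a quick read of the numbers above, the relative decode speedup can be computed directly from the reported tok/sec figures. This is only a small sketch; the `speedup` helper and the `results` dictionary are illustrative names, not part of the PR.

```python
def speedup(baseline_tok_s: float, improved_tok_s: float) -> float:
    """Return the throughput ratio improved / baseline."""
    return improved_tok_s / baseline_tok_s


# Figures copied from the table in the PR description (tok/sec).
results = {
    "Llama-2-7B": (10.0, 12.5),
    "Qwen-7b": (8.5, 11.0),
}

for model, (base, new) in results.items():
    print(f"{model}: {speedup(base, new):.2f}x")
```

This prints roughly 1.25x for Llama-2-7B and 1.29x for Qwen-7b.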

@krishnaraj36

@srkreddy1238 @tqchen: Could you please take a look at this PR?

@Hzfengsy Hzfengsy self-assigned this Apr 27, 2024
@Hzfengsy

Hzfengsy commented Apr 27, 2024

Thanks @krishnaraj36 for the great PR and significant perf improvement.

However, q4f16_0 should be outer_reduction, as the layout is KN. I wonder why the rule is named sch_adreno_inner_reduction.

If it's a naming issue, we can replace the current rule of sch_outer_reduction as it is specially designed for android only
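
To make the inner/outer distinction in the comment above concrete, here is a minimal NumPy sketch of the two loop orders for a GEMV. It assumes "KN" means the weight is stored with shape (K, N), so the reduction axis K is the outer (strided) axis, and "NK" means shape (N, K), where K is innermost. The function names are hypothetical and this is not the dlight schedule itself, only the access pattern each name refers to.

```python
import numpy as np


def gemv_inner_reduction(w_nk: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Weight stored as (N, K): the reduction axis K is innermost."""
    n, k = w_nk.shape
    y = np.zeros(n, dtype=np.float32)
    for i in range(n):            # spatial axis (one output element per row)
        acc = 0.0
        for j in range(k):        # reduction axis, contiguous in memory
            acc += w_nk[i, j] * x[j]
        y[i] = acc
    return y


def gemv_outer_reduction(w_kn: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Weight stored as (K, N): the reduction axis K is outermost."""
    k, n = w_kn.shape
    y = np.zeros(n, dtype=np.float32)
    for j in range(k):            # reduction axis, one row of the weight per step
        y += x[j] * w_kn[j, :]    # each step updates all N outputs at once
    return y


rng = np.random.default_rng(0)
w_kn = rng.standard_normal((8, 4)).astype(np.float32)
x = rng.standard_normal(8).astype(np.float32)

# Both loop orders compute the same y = x @ W; only the memory access differs.
y_outer = gemv_outer_reduction(w_kn, x)
y_inner = gemv_inner_reduction(np.ascontiguousarray(w_kn.T), x)
print(np.allclose(y_outer, y_inner, atol=1e-4))
```

With a KN layout, neighbouring outputs read neighbouring weight elements, which is why the reduction is described as "outer" for q4f16_0.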

@krishnaraj36

krishnaraj36 commented Apr 29, 2024

> Thanks @krishnaraj36 for the great PR and significant perf improvement.
>
> However, q4f16_0 should be outer_reduction as the layout is KN. I wonder why the rule is named as sch_adreno_inner_reduction
>
> If it's a naming issue, we can replace the current rule of sch_outer_reduction as it is specially designed for android only

@Hzfengsy Thanks for your review.
Yes, it's a naming issue. I have renamed the function so that the name makes sense.


@Hzfengsy Hzfengsy left a comment


Overall LGTM

Three resolved review threads on python/tvm/dlight/gpu/gemv.py (two on outdated revisions).
@tqchen tqchen merged commit b4a69de into apache:main Apr 29, 2024
18 checks passed