Support mistral and sliding window attention #1075
Conversation
@grimoire is very productive, and the support for new models on the PyTorch engine is very timely. Currently, Mistral, Qwen 1.5, and DeepSeek MoE all rely on sliding window attention. @lvhan028 @RunningLeon Will we consider prioritizing this PR for review and merging it as soon as possible? Given that these are relatively large features, the community would likely appreciate being able to use them sooner rather than later.
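For context on what the models above have in common: sliding window attention restricts each query position to attend only to the most recent `window` key positions rather than the full causal prefix. The following is a minimal illustrative sketch of such a mask in plain Python; it is not taken from this PR's implementation, and the function name and shape conventions are assumptions for illustration only.

```python
def sliding_window_causal_mask(seq_len, window):
    """Boolean mask (hypothetical helper, not from this PR):
    query position i may attend to key position j iff 0 <= i - j < window,
    i.e. causal attention limited to the last `window` tokens."""
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]
```

For example, with `seq_len=5` and `window=3`, position 4 can attend to positions 2, 3, and 4, but not to 0 or 1, which is what lets these models bound attention cost for long sequences.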
@zhyncs Hi, feel free to review and test this PR. Any comments would be sincerely appreciated.
This PR makes a great change. QA needs more time to test it.
@RunningLeon Are there evaluation results for this PR?
lmdeploy/pytorch/paging/eviction_helper/recompute_eviction_helper.py
@zhulinJulia24 may perform the regression test.
@lvhan028 The regression test passed, as shown here.
LGTM
Important: This PR refactors the core mechanism of the engine.