Do not erase a cache_position passed explicitly to generate(), if there is one #37986
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default, and the CI will be paused while the PR is in draft mode. When it is ready for review, please click the “Ready for review” button.
Thanks for the PR! I am curious, is the logic to create cache positions from scratch not working in some cases?
Not many users know about cache_position or pass a custom new position, so I guess this is for a custom usage with a compressed cache.
The use case is related to Mamba models: the usual logic for initializing the cache position doesn't work there, because there is no way to know how many tokens have already been processed, and starting at index 0 disables the cache, since the cache is assumed not to be initialized yet for the first token.
Calling generate() is the easiest way to continue generating after the first batch of tokens. Calling forward() directly would also work, but then it's necessary to reimplement all the sampling logic that generate() usually handles for you in a very convenient way.
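To illustrate why a 0-based cache_position breaks continuation, here is a minimal sketch. TinyRecurrentCache and its reset-at-position-0 rule are hypothetical stand-ins for a Mamba-style state cache, not the actual transformers API:

```python
class TinyRecurrentCache:
    """Toy stand-in for a Mamba-style recurrent cache (hypothetical).

    It keeps a single running state rather than per-token entries, so the
    number of tokens already absorbed cannot be recovered from the cache
    itself -- generate() has no way to infer the right position offset.
    """

    def __init__(self):
        self.state = 0

    def step(self, token, cache_position):
        # Mirrors the behavior described above: position 0 is taken to
        # mean "first token ever", so the state is (re)initialized.
        if cache_position == 0:
            self.state = 0
        self.state += token
        return self.state


cache = TinyRecurrentCache()
for pos, tok in enumerate([1, 2, 3]):    # first batch of tokens
    cache.step(tok, cache_position=pos)  # running state is now 6

# Continuing with the default 0-based positions silently wipes the state:
wiped = cache.step(4, cache_position=0)  # state reset to 0, then +4 -> 4

# Continuing with an explicitly passed offset preserves it:
cache.state = 6                          # restore the state for comparison
kept = cache.step(4, cache_position=3)   # 6 + 4 -> 10
```

In the real models the state is a tensor, but the failure mode is the same: a reinitialized cache_position makes the cache behave as if no tokens had been processed.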
Thanks for explaining, that makes sense. For special models we don't infer the cache position correctly. Can you mark the PR ready for review and rebase on main, so it can be merged?
Yes, will do tomorrow.
👌
Thank you for noticing it!
@FremyCompany to make our CI happy:
…erate(), if there is one. But: Let initialization replace cache_position if it's set to None. I assume that if the value is explicitly passed but None, we should initialize anyway.
Force-pushed from aa4e0e5 to 21c3373.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Do not erase a cache_position initialization passed explicitly to generate(), if there is one (huggingface#37986). But: let initialization replace cache_position if it's set to None. I assume that if the value is explicitly passed but None, we should initialize anyway.
What does this PR do?
Currently, any cache_position passed to model.generate() is silently replaced by the default initialization. This seems wrong, since there is no need to reinitialize an argument that has already been passed, and in some models, being able to pass custom cache_positions that are not 0-initialized is required for proper behavior. This PR adds a check to the initialization code of model_kwargs['cache_position'] so that it returns early if that argument was already provided.
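The check can be sketched roughly as follows. This is a simplified, hypothetical version: the real initialization lives on GenerationMixin and builds a tensor with torch.arange, while this sketch uses plain lists so it stays self-contained:

```python
def init_cache_position(model_kwargs, seq_len):
    # Hypothetical sketch of the guard this PR describes: return early
    # if the caller already provided cache_position, instead of
    # silently overwriting it with a fresh 0-based range.
    if model_kwargs.get("cache_position") is not None:
        return model_kwargs
    # Default behavior: assume nothing was processed yet and number the
    # prompt tokens 0..seq_len-1 (the real code uses torch.arange).
    model_kwargs["cache_position"] = list(range(seq_len))
    return model_kwargs


# Default path: no cache_position supplied, so one is created.
fresh = init_cache_position({}, seq_len=4)  # cache_position [0, 1, 2, 3]

# Explicit path: the caller's positions survive (the fix).
cont = init_cache_position({"cache_position": [4, 5]}, seq_len=2)

# Explicitly passed but None: treated like the default path.
none_case = init_cache_position({"cache_position": None}, seq_len=2)
```

Note the `is not None` comparison: it implements the behavior described in the commit message, where an explicit `cache_position=None` still triggers the default initialization.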
There is no linked issue because I fixed the bug before filing an issue, so a PR was the easier route.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
I don't think this warrants any documentation change; the current behavior is not described anywhere. I could add a test, but it would be very artificial. I think merging as-is is fine, but I welcome feedback.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker @zucchini-nlp @gante