
Do not erase a cache_position passed explicitly to generate(), if there is one #37986


Merged: 1 commit merged into huggingface:main from keep_cache_position on May 9, 2025

Conversation

@FremyCompany (Contributor) commented May 6, 2025

What does this PR do?

Currently, any cache_position passed to model.generate() is silently replaced by the default initialization. This seems wrong: there is no need to reinitialize an argument that has already been passed, and some models require a custom, non-zero-initialized cache_position for correct behavior.

This PR adds a check to the initialization code for model_kwargs['cache_position'] so that it returns early if that argument was already provided.
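In sketch form, the change amounts to a guard at the top of the initialization helper. The snippet below is a minimal illustration of the idea, assuming the logic lives in a GenerationMixin method along the lines of `_get_initial_cache_position`; the default-initialization branch is simplified relative to the real code:

```python
import torch

def _get_initial_cache_position(self, input_ids, model_kwargs):
    """Initialize `cache_position` for the pre-fill stage (sketch)."""
    # The fix: if the caller already passed a (non-None) cache_position,
    # keep it instead of overwriting it with a fresh 0-based range.
    if model_kwargs.get("cache_position") is not None:
        return model_kwargs

    # Default behavior (simplified): start counting from the number of
    # tokens already in the cache, which is 0 without past key/values.
    past_length = 0
    if model_kwargs.get("past_key_values") is not None:
        past_length = model_kwargs["past_key_values"].get_seq_length()
    model_kwargs["cache_position"] = torch.arange(
        past_length, input_ids.shape[1], device=input_ids.device
    )
    return model_kwargs
```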

There is no linked issue: I fixed the bug before filing one, so opening a PR directly was easier.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

I don't think this warrants any documentation change, since the current behavior is not described anywhere. I could add a test, but it would be very artificial. I think merging as-is is fine, but I welcome feedback.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker @zucchini-nlp @gante

@github-actions github-actions bot marked this pull request as draft May 6, 2025 21:18
github-actions bot commented May 6, 2025

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@zucchini-nlp (Member) left a comment

Thanks for the PR! I am curious, is the logic that builds cache positions from scratch not working in some cases?

Not many users know about cache_position or pass a custom position, so I guess this is for a custom usage with a compressed cache

@FremyCompany (Contributor, Author)

The use case is related to Mamba models. The default logic for initializing the cache position doesn't work there, because there is no way to know how many tokens have already been processed, and starting at index 0 disables the cache: index 0 signals that the state is not supposed to be initialized yet, as for the first token.

@FremyCompany (Contributor, Author)

Calling generate is the easiest way to continue generating after the first batch of tokens. Calling forward directly would also work, but then you have to reimplement all the sampling logic that generate conveniently handles for you.
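To make the pattern concrete, here is a hedged sketch of the kind of continuation this enables. The names saved_state, n_processed, and next_input_ids are illustrative placeholders from a hypothetical earlier generate() call, not code from the PR:

```python
import torch

# Suppose an earlier generate() call primed the model's recurrent state
# (e.g. for a Mamba-style model) and we kept:
#   saved_state      - the cache object captured from that first pass
#   n_processed      - how many tokens that pass consumed
#   next_input_ids   - the tokens to continue from
#
# To resume, pass the state back together with a non-zero cache_position.
# Before this PR, the explicit cache_position was silently reset to a
# 0-based range, which told the model its state was still uninitialized
# and effectively disabled the cache.
continued = model.generate(
    next_input_ids,
    max_new_tokens=64,
    cache_params=saved_state,  # illustrative kwarg for Mamba-style caches
    cache_position=torch.tensor([n_processed], device=model.device),
)
```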

@zucchini-nlp (Member) left a comment

Thanks for explaining, makes sense. For special models we don't infer the cache position correctly. Can you mark the PR ready for review and rebase on main, so it can be merged?

@FremyCompany (Contributor, Author)

Yes, will do tomorrow.

@gante (Member) left a comment

👌

Thank you for noticing it!

@gante (Member) commented May 8, 2025

@FremyCompany to make our CI happy:

  • Tag the PR as Ready for review (button at the bottom of the PR page)
  • run make fixup in your terminal, inside the transformers root folder (this runs automated code formatting)
  • commit the changes

Do not erase a cache_position initialization passed explicitly to generate(), if there is one.

But: Let initialization replace cache_position if it's set to None. I assume that if the value is explicitly passed but None, we should initialize anyway.
@FremyCompany FremyCompany force-pushed the keep_cache_position branch from aa4e0e5 to 21c3373 on May 9, 2025 09:19
@FremyCompany FremyCompany marked this pull request as ready for review May 9, 2025 09:19
@zucchini-nlp zucchini-nlp enabled auto-merge (squash) May 9, 2025 10:43
@zucchini-nlp zucchini-nlp merged commit 774dc27 into huggingface:main May 9, 2025
21 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
Do not erase a cache_position initialization passed explicitly to generate(), if there is one (huggingface#37986)

Do not erase a cache_position initialization passed explicitly to generate(), if there is one.

But: Let initialization replace cache_position if it's set to None. I assume that if the value is explicitly passed but None, we should initialize anyway.