
Support min_new_tokens generation config in pytorch engine #1096

Merged
13 commits merged, Feb 29, 2024

Conversation

grimoire
Collaborator

@grimoire grimoire commented Feb 1, 2024

tested on chat and benchmark_pytorch_throughput

Collaborator

@AllentDan AllentDan left a comment


Is this expected behavior of the model for min_new_tokens?
[screenshot]

@grimoire
Collaborator Author

@AllentDan Fixed: eos is now ignored until min_new_tokens is reached.

@AllentDan
Collaborator

[screenshot]

Got this.

@grimoire
Collaborator Author

[screenshot]

Got this.

@lvhan028 is this OK?

@lvhan028
Collaborator

@irexyc What is the behavior of transformers and turbomind when min_new_tokens is set?

@irexyc
Collaborator

irexyc commented Feb 19, 2024

what are the behavior of transformers and turbomind if min_new_token is set?

They set the score of the eos token to -inf while the generated token length is less than min_new_tokens:

https://github.com/huggingface/transformers/blob/main/src/transformers/generation/logits_process.py#L160
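The behavior described above can be sketched as follows. This is a simplified, framework-free sketch operating on a plain list of scores; the actual transformers processor linked above works on tensors and handles multiple eos ids.

```python
import math

def mask_eos_before_min_new_tokens(scores, generated_len, min_new_tokens,
                                   eos_token_id):
    """Set the eos token's score to -inf until min_new_tokens tokens
    have been generated, so eos can never be sampled or win argmax.
    Simplified sketch, not the actual transformers implementation."""
    if generated_len < min_new_tokens:
        scores = scores.copy()
        scores[eos_token_id] = -math.inf
    return scores

# With fewer than min_new_tokens generated, eos (id 2 here) is suppressed
# even though it has the highest raw score.
scores = [0.1, 0.3, 5.0, 0.2]
masked = mask_eos_before_min_new_tokens(scores, generated_len=3,
                                        min_new_tokens=100, eos_token_id=2)
```

Once `generated_len` reaches `min_new_tokens`, the scores pass through unchanged and the model is free to emit eos.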

@lvhan028
Collaborator

@AllentDan @irexyc I think PyTorch engine did the right thing. When min_new_tokens is set, eos_id and stop_words tokens should be banned when the generated token number is less than min_new_tokens

@AllentDan
Collaborator

@AllentDan @irexyc I think PyTorch engine did the right thing. When min_new_tokens is set, eos_id and stop_words tokens should be banned when the generated token number is less than min_new_tokens

Is it possible that stop_words are banned but other candidate tokens are still generated when n>1? To me, min_new_tokens is the minimum number of newly generated tokens, not counting tokens like <eoa> or <Bot> that are not shown to users.

@lvhan028
Collaborator

stop_words should be banned. They cannot be generated.
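Under that reading, banning stop words works the same way as banning eos: suppress every banned token id until the minimum length is reached. A sketch, assuming a flat list of banned ids (eos plus the lead token of each stop word); the token ids and this representation are illustrative assumptions, not lmdeploy's actual data structures.

```python
import math

def ban_tokens(scores, generated_len, min_new_tokens, banned_token_ids):
    """Suppress eos and stop-word lead tokens while the generated
    length is below min_new_tokens. Hedged sketch only."""
    if generated_len < min_new_tokens:
        scores = scores.copy()
        for tok in banned_token_ids:
            scores[tok] = -math.inf
    return scores

# eos_id=2 plus stop-word lead tokens 0 and 3 are all suppressed,
# leaving only ordinary tokens eligible.
out = ban_tokens([4.0, 1.0, 9.0, 6.0], generated_len=10,
                 min_new_tokens=100, banned_token_ids=[2, 0, 3])
```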

@lvhan028
Collaborator

from lmdeploy import pipeline, GenerationConfig, PytorchEngineConfig

# test min_new_tokens with the pytorch engine
pipe = pipeline('/workspace/models-140/InternLM/internlm2-chat-7b/',
                backend_config=PytorchEngineConfig())

response = pipe('hi', gen_config=GenerationConfig(min_new_tokens=100))
print(response)

The generated response is:

Response(text='你好!有什么我可以帮助你的吗?', generate_token_len=8, input_token_len=103, session_id=0, finish_reason='stop')

I think generate_token_len (8 here) shouldn't be less than min_new_tokens (100).

@grimoire
Collaborator Author

@lvhan028
https://github.com/grimoire/lmdeploy/blob/637cc512027b41986fcfabba8f2011e682aa37e5/lmdeploy/messages.py#L87

The gen_config conversion does not include min_new_tokens, so the setting is silently dropped before it reaches the engine.
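A minimal illustration of this kind of omission: when the conversion from the user-facing config to the engine config skips a field, the engine silently falls back to its default. Class and field names here are hypothetical, not lmdeploy's actual definitions.

```python
from dataclasses import dataclass

@dataclass
class UserGenConfig:          # hypothetical user-facing config
    max_new_tokens: int = 512
    min_new_tokens: int = 0

@dataclass
class EngineGenConfig:        # hypothetical engine-side config
    max_new_tokens: int = 512
    min_new_tokens: int = 0

def convert_buggy(cfg: UserGenConfig) -> EngineGenConfig:
    # min_new_tokens is dropped, so the engine never sees it
    return EngineGenConfig(max_new_tokens=cfg.max_new_tokens)

def convert_fixed(cfg: UserGenConfig) -> EngineGenConfig:
    # forward every generation field, including min_new_tokens
    return EngineGenConfig(max_new_tokens=cfg.max_new_tokens,
                           min_new_tokens=cfg.min_new_tokens)

user = UserGenConfig(min_new_tokens=100)
```

With the buggy conversion the engine sees `min_new_tokens=0` and stops at eos early, which matches the short response observed above.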

@lvhan028 lvhan028 requested a review from irexyc February 29, 2024 12:03
@lvhan028 lvhan028 changed the title Support torch min new tokens Support min_new_tokens generation config in pytorch engine Feb 29, 2024
@lvhan028 lvhan028 merged commit 16da6ae into InternLM:main Feb 29, 2024
4 checks passed