drop stop words #1823
Conversation
Fixed!
lgtm
@lvhan028 Is the behavior aligned with turbomind?
Turbomind caches the stop_words but not the …
(lmdeploy/lmdeploy/turbomind/turbomind.py, line 751 in da439df)
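For illustration only, here is a minimal sketch of the behavior under discussion: trailing stop-word token ids are stripped from the generated sequence before it is appended to the cached session history. The helper names (`strip_stop_words`, `append_to_session_cache`) are hypothetical and are not the actual turbomind.py code.

```python
# Hypothetical sketch, not lmdeploy's real API.
def strip_stop_words(output_ids, stop_word_ids):
    """Drop trailing stop-word token ids from a generated sequence."""
    end = len(output_ids)
    while end > 0 and output_ids[end - 1] in stop_word_ids:
        end -= 1
    return output_ids[:end]


def append_to_session_cache(session_history, output_ids, stop_word_ids):
    """Cache only the visible tokens; the stop words themselves are not kept."""
    session_history.extend(strip_stop_words(output_ids, set(stop_word_ids)))


# Usage: the trailing stop token (id 2 here) never enters the cached history.
history = []
append_to_session_cache(history, [101, 9234, 887, 2], stop_word_ids=[2])
print(history)  # [101, 9234, 887]
```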
"Since we won't give stop words to user, we should not cache it too." Regarding the baichuan model, I think we should experiment with the |
Sure, it does.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig


def main():
    model_path = '/path/to/Baichuan2-13B-Chat/'
    # Toggle between the two settings: '' (w/o eos) and '</s>' (with eos).
    eos = ''
    eos = '</s>'
    messages = [
        {"role": "user", "content": "Do you know John Wick?"},
        {"role": "assistant", "content": f"Yes, it is a movie. {eos}"},
        {"role": "user", "content": "Tell me more about it."},
    ]
    tokenizer = AutoTokenizer.from_pretrained(model_path,
                                              revision="v2.0",
                                              use_fast=False,
                                              trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_path,
                                                 revision="v2.0",
                                                 device_map="auto",
                                                 torch_dtype=torch.bfloat16,
                                                 trust_remote_code=True)
    model.generation_config = GenerationConfig.from_pretrained(model_path,
                                                               revision="v2.0")
    with torch.inference_mode():
        response = model.chat(tokenizer, messages)
        print(response)


if __name__ == '__main__':
    main()
```

output with eos:

output w/o eos:
I can align the behavior with TurboMind, but putting the logic in the template is more reasonable.
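As a rough sketch of what "putting the logic in the template" could look like, the hypothetical template below appends the eos marker after every finished assistant turn when the prompt is rebuilt, so the cached or user-visible text never needs to carry the stop word. The class and attribute names (`ChatTemplate`, `eoa`, the Baichuan-style role markers) are illustrative assumptions, not necessarily lmdeploy's actual chat-template API.

```python
# Hypothetical sketch: the chat template, not the cache, owns the eos handling.
class ChatTemplate:
    user = '<reserved_106>'       # Baichuan2-style user marker (illustrative)
    assistant = '<reserved_107>'  # Baichuan2-style assistant marker (illustrative)
    eoa = '</s>'                  # appended after every finished assistant turn

    def messages2prompt(self, messages):
        parts = []
        for msg in messages:
            if msg['role'] == 'user':
                parts.append(f"{self.user}{msg['content']}")
            else:
                # The template re-adds the stop word here, so the assistant
                # text stored in the conversation never has to contain it.
                parts.append(f"{self.assistant}{msg['content']}{self.eoa}")
        parts.append(self.assistant)  # leave the prompt open for the next reply
        return ''.join(parts)


prompt = ChatTemplate().messages2prompt([
    {'role': 'user', 'content': 'Do you know John Wick?'},
    {'role': 'assistant', 'content': 'Yes, it is a movie.'},
    {'role': 'user', 'content': 'Tell me more about it.'},
])
print(prompt)
```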
fix for #1754
Stop words should NOT be cached.