Describe the bug
To avoid discrepancies between the tokenizer versions used by the training framework and the generation framework, the generation framework should accept only token IDs as input, along with the special token IDs it needs (e.g., the stop and EOS token IDs), and return only token IDs as output.
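A rough sketch of the intended token-in/token-out contract (the names `SamplingConfig` and `GenerationBackend` below are illustrative, not existing APIs):

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class SamplingConfig:
    """Token-level sampling settings; no tokenizer required."""
    eos_token_id: int
    stop_token_ids: list[int] = field(default_factory=list)
    max_new_tokens: int = 256


class GenerationBackend(Protocol):
    """Token-in/token-out contract: the backend never sees text."""

    def generate(
        self,
        prompt_token_ids: list[int],
        config: SamplingConfig,
    ) -> list[int]:
        """Return generated token IDs, stopping on EOS/stop token IDs."""
        ...
```

Detokenization would then happen outside the backend, using the same tokenizer instance the training framework uses.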
Steps/Code to reproduce bug
The current vLLM backend implementation holds a reference to the tokenizer.
The fix for this issue should include a test that ensures no generation backend holds a tokenizer reference.
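A minimal sketch of such a guard test, assuming a hypothetical `BACKEND_CLASSES` registry that lists the project's generation backend classes:

```python
import inspect

# Hypothetical: replace with the project's actual backend registry,
# e.g. [VllmBackend, ...].
from my_project.generation import BACKEND_CLASSES


def test_no_backend_references_a_tokenizer():
    """Guard against reintroducing a tokenizer into any generation backend."""
    for cls in BACKEND_CLASSES:
        # No backend constructor should accept a tokenizer argument.
        params = inspect.signature(cls.__init__).parameters
        assert "tokenizer" not in params, (
            f"{cls.__name__} takes a tokenizer argument; backends must be "
            "token-in/token-out only"
        )
```

Checking constructor signatures is one cheap invariant; the real test could also assert that no backend instance exposes a `tokenizer` attribute.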