Adding chat template to vllm decode #2978
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.
hi @NicoGrande, I keep getting the error below. However, I tried commenting out this line and it works, and the generation looks good to me now.
Hi @ChingTsai! To include
Hi @NicoGrande, I just tested your PR, and it seems to only support unscanned models: only models with scan_layers=false can run successfully. When using a scanned model, I ran into errors where the model generates unreasonable tokens. Since we use scanned models for SFT, could this PR also support scanned models? cc: @ChingTsai
Hi @cychiuak! Thank you for flagging this! I think we should be able to add some functionality to support this. The codepath will likely involve unrolling the scanned checkpoint and then using the unscanned version for inference. I can add this to my TODO list and update this PR when the feature is ready.
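The unrolling step described above can be sketched roughly as follows: a scanned checkpoint stacks each layer's weights along a leading layer axis (as produced by a `jax.lax.scan` over decoder layers), so unrolling amounts to slicing that axis into per-layer parameter dicts. The parameter names and shapes here are illustrative placeholders, not MaxText's actual checkpoint layout:

```python
import numpy as np

def unroll_scanned_params(scanned, num_layers):
    """Split scanned (stacked) weights into per-layer parameter dicts.

    `scanned` maps parameter names to arrays whose leading axis is the
    layer axis (shape [num_layers, ...]). Names are hypothetical.
    """
    unrolled = []
    for i in range(num_layers):
        # Each unscanned layer gets its own slice of every stacked array.
        unrolled.append({name: w[i] for name, w in scanned.items()})
    return unrolled

# Toy example: 4 layers, two stacked weight arrays.
scanned = {
    "mlp/kernel": np.arange(4 * 2 * 3).reshape(4, 2, 3),
    "attn/qkv": np.arange(4 * 6).reshape(4, 6),
}
layers = unroll_scanned_params(scanned, num_layers=4)
print(len(layers), layers[0]["mlp/kernel"].shape)  # 4 (2, 3)
```

A real implementation would also need to rename the unrolled parameters to match the unscanned model's naming scheme before running inference.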
Thanks @NicoGrande! Supporting scanned checkpoints would significantly improve the training user experience.
+1, thank you @NicoGrande for taking this on! This feature will unblock us. Otherwise, the only viable workflow is to convert the Orbax-format checkpoint to HF (safetensors) format and run inference there, which is painful for hyperparameter searches. It would be great if you could add this support. Thanks!
I will work to get this PR merged for now and open a follow-up PR with support for scanned checkpoints. @ChingTsai @cychiuak
bvandermoon
left a comment
Generally LGTM. Left some questions, but not all of them are related to this PR.
bvandermoon
left a comment
LGTM. Were you planning to remove the enable_expert_parallel flag?
Description
Adds optional support for chat templates in vllm_decode.py. This helps MaxText on vLLM produce responses similar to the native vLLM model implementation.
Fixes: b/476253050
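At a high level, a chat template renders a list of role/content messages into the single prompt string a chat-tuned model expects, typically with an open assistant turn appended so the model continues from there. The sketch below illustrates the idea with a loosely ChatML-style tag format; the tags, function name, and messages are illustrative only, and the actual PR would rely on the template bundled with the model's tokenizer (e.g. `tokenizer.apply_chat_template`):

```python
def render_chat_template(messages, add_generation_prompt=True):
    """Render {role, content} messages into one prompt string.

    The <|im_start|>/<|im_end|> tags are illustrative (ChatML-style);
    real models ship their own template with their tokenizer.
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is JAX?"},
]
prompt = render_chat_template(messages)
print(prompt)
```

Formatting the prompt this way, rather than feeding raw text, is what lets the decode path match the behavior of a chat-tuned model served natively.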
Tests
Running the following command resulted in the following diff with respect to the vLLM native model:
https://diff.googleplex.com/#key=j8DNmAFF934Q
Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.