-
I'm using bartowski's 4-bit quantization of DeepSeek-R1-Distill-Llama-8B-GGUF (https://huggingface.co/bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF/resolve/main/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf?download=true). When I run it through llama.cpp directly (using llama-simple-chat), it emits the initial `<think>` token fine. But when I run it through node-llama-cpp (using the general chat wrapper, since the parser crashes trying to extract the Jinja2 template when using the auto wrapper), it doesn't (i.e. the output starts directly inside the chain of thought, even though it still emits the closing `</think>` token). Has anyone experienced this?
Edit: My bad, should have just tried the obvious thing first. Updating
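For reference, here's roughly how I'm setting it up — a minimal sketch assuming node-llama-cpp v3's `getLlama` / `LlamaChatSession` / `GeneralChatWrapper` API; the model path and prompt are placeholders for my actual setup:

```ts
import path from "path";
import {getLlama, LlamaChatSession, GeneralChatWrapper} from "node-llama-cpp";

// Load the Q4_K_M quant downloaded from the URL above (path is a placeholder)
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(process.cwd(), "DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf")
});

const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    // Forcing the general wrapper because resolving the Jinja template crashes with the auto wrapper
    chatWrapper: new GeneralChatWrapper()
});

const answer = await session.prompt("What is 17 * 23?");
// The response begins mid-chain-of-thought: no opening <think> tag, but </think> still appears
console.log(answer);
```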
-
I've created a PR on `@huggingface/jinja` to address this exact issue. I'll release a new version of `node-llama-cpp` in the next few hours with various fixes and improvements for DeepSeek, including the updated `@huggingface/jinja` version. First-class support for DeepSeek and chain of thought will come in the next week or so.