I'm trying to get this working with my local vLLM server on Fedora 41. Here's the server command I'm currently running:
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct --tensor-parallel-size 4 --enable-auto-tool-choice --tool-call-parser hermes --gpu-memory-utilization 0.8 --max-model-len 30000 --trust-remote-code
The model seems to generate the tool calls fine, but opencode just shows them as raw text instead of executing them. Is support for this planned, or am I missing an easy workaround?
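In case it's relevant, here's a sketch of the kind of opencode provider config I'd expect to point at the vLLM OpenAI-compatible endpoint. Note this is my best guess at the shape: the exact schema, the `vllm` provider key, and the `baseURL` port are assumptions from my setup, not verified against the docs:

```json
{
  "provider": {
    "vllm": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:8000/v1"
      },
      "models": {
        "Qwen/Qwen2.5-Coder-32B-Instruct": {}
      }
    }
  }
}
```

If the issue is on the parsing side rather than the config, I'm happy to share full request/response logs from the vLLM server.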
