Closed
Description
I am using Optillm as a proxy in front of my locally hosted models (llama.cpp, vLLM, SGLang).
These backends accept additional parameters that the standard OpenAI API does not define.
Two of them are top_k, an additional sampling parameter, and chat_template_kwargs, which dynamically modifies the prompt template (e.g. to set the reasoning level).
I would like to include these parameters in my requests to Optillm so that they are forwarded to the locally hosted models, since they make a big difference in generation quality. Currently, Optillm responds with a 400 Bad Request error.
Here is the test curl command for reference:
curl -X POST http://127.0.0.1:8101/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"qwen3-32b-awq","messages":[{"role":"system","content":"You are a concise assistant."},{"role":"user","content":"I have this mermaid diagram, but its giving me an error. How do I fix it? Can you think step by step and give me the reasoning leading up to the final conclusion. Please include at least 10 thinking steps enclosed in <think></think>, then give me 10 alternate solutions.\n\nMermaid diagram: flowchart TD\n %% ====================== Build Pipeline ======================\n subgraph BuildPipeline[\"Build Pipeline\"]\n Dev[Developer Machine] -->|1. Build Go\nbinary (static, cross-compiled)| GoBin[Go Static Binary]\n GoBin -->|2. Build Docker image (scratch)| DockerImg[Docker Image]\n DockerImg -->|3. Push to registry| Registry[Container Registry]\n end\n\nError: Error: Error: Parse error on line 4:\n...|1. Build Go\nbinary (static, cross-compi\n-----------------------^\nExpecting \"SQE\", \"DOUBLECIRCLEEND\", \"PE\", \"-)\", \"STADIUMEND\", \"SUBROUTINEEND\", \"PIPE\", \"CYLINDEREND\", \"DIAMOND_STOP\", \"TAGEND\", \"TRAPEND\", \"INVTRAPEND\", \"UNICODE_TEXT\", \"TEXT\", \"TAGSTART\", got \"PS\""}],"temperature":0.75,"max_tokens":8192,"top_p": 0.85,"top_k": 25,"n": 1,"stream": false, "chat_template_kwargs": {"enable_thinking": true}}'
Would it be possible to forward these parameters?
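To illustrate the kind of forwarding I have in mind, here is a minimal sketch. It is not Optillm's actual code: the STANDARD_PARAMS allow-list and the split_request_params helper are hypothetical names, and the allow-list is deliberately incomplete. The idea is to separate the fields the OpenAI API defines from everything else, then pass the remainder through to the backend unchanged instead of rejecting the request.

```python
# Hypothetical allow-list of standard Chat Completions fields.
# Not exhaustive and not taken from Optillm's source.
STANDARD_PARAMS = {
    "model", "messages", "temperature", "max_tokens", "top_p", "n", "stream",
    "stop", "presence_penalty", "frequency_penalty", "seed",
}


def split_request_params(body: dict) -> tuple[dict, dict]:
    """Split a request body into standard fields and backend-specific extras."""
    standard = {k: v for k, v in body.items() if k in STANDARD_PARAMS}
    extra = {k: v for k, v in body.items() if k not in STANDARD_PARAMS}
    return standard, extra


# Same parameters as the curl command above, with a shortened prompt.
request_body = {
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.75,
    "max_tokens": 8192,
    "top_p": 0.85,
    "top_k": 25,
    "n": 1,
    "stream": False,
    "chat_template_kwargs": {"enable_thinking": True},
}

standard, extra = split_request_params(request_body)

# If the proxy talks to the backend through the OpenAI Python client,
# the extras could be forwarded via its extra_body argument:
#   client.chat.completions.create(**standard, extra_body=extra)
```

With this split, top_k and chat_template_kwargs end up in extra and would reach llama.cpp, vLLM, or SGLang as top-level fields of the JSON body, which is where those servers expect them.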