
Optillm unable to serve requests with 'top_k' or 'chat_template_kwargs' parameters #233

@itsmeknt

Description


I am using Optillm as a proxy to my locally hosted models (llama.cpp, vLLM, SGLang).

These locally hosted models have some additional parameters that the standard OpenAI API doesn't have.

Two of them are `top_k`, an additional sampling parameter, and `chat_template_kwargs`, which dynamically modifies the prompt template (e.g. to set the reasoning level).
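For illustration, here is a minimal sketch of what a request body carrying these two extra fields looks like. The field names match common llama.cpp / vLLM server extensions, but the exact set of supported fields depends on the backend:

```python
import json

# Request body with the two non-standard parameters alongside
# the usual OpenAI-style fields.
payload = {
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Hello"}],
    "top_k": 25,  # extra sampling parameter, not in the OpenAI spec
    "chat_template_kwargs": {"enable_thinking": True},  # template switch
}
body = json.dumps(payload)
# A pass-through proxy would need to forward these keys verbatim
# instead of rejecting the request with 400 Bad Request.
print(sorted(json.loads(body)))
```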

I would like to include these parameters in requests to Optillm so that it forwards them to my locally hosted models, since they make a big difference in generation quality, but Optillm returns a 400 Bad Request error.

Here is the test curl command for reference:

```shell
curl -X POST http://127.0.0.1:8101/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"qwen3-32b-awq","messages":[{"role":"system","content":"You are a concise assistant."},{"role":"user","content":"I have this mermaid diagram, but its giving me an error. How do I fix it? Can you think step by step and give me the reasoning leading up to the final conclusion. Please include at least 10 thinking steps enclosed in <think></think>, then give me 10 alternate solutions.\n\nMermaid diagram: flowchart TD\n  %% ====================== Build Pipeline ======================\n  subgraph BuildPipeline[\"Build Pipeline\"]\n    Dev[Developer Machine] -->|1. Build Go\nbinary (static, cross-compiled)| GoBin[Go Static Binary]\n    GoBin -->|2. Build Docker image &lpar;scratch&rpar;| DockerImg[Docker Image]\n    DockerImg -->|3. Push to registry| Registry[Container Registry]\n  end\n\nError: Error: Error: Parse error on line 4:\n...|1. Build Go\nbinary (static, cross-compi\n-----------------------^\nExpecting \"SQE\", \"DOUBLECIRCLEEND\", \"PE\", \"-)\", \"STADIUMEND\", \"SUBROUTINEEND\", \"PIPE\", \"CYLINDEREND\", \"DIAMOND_STOP\", \"TAGEND\", \"TRAPEND\", \"INVTRAPEND\", \"UNICODE_TEXT\", \"TEXT\", \"TAGSTART\", got \"PS\""}],"temperature":0.75,"max_tokens":8192,"top_p": 0.85,"top_k": 25,"n": 1,"stream": false, "chat_template_kwargs": {"enable_thinking": true}}'
```

Would it be possible to forward these parameters?
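One possible approach (a hypothetical sketch, not Optillm's actual code; the parameter list and function name are my assumptions) would be for the proxy to separate the standard OpenAI fields it validates from any unrecognized keys, and merge the extras back into the upstream request unchanged:

```python
# Hypothetical pass-through sketch: forward unknown keys instead of
# rejecting the request. KNOWN_PARAMS is illustrative, not exhaustive.
KNOWN_PARAMS = {"model", "messages", "temperature", "max_tokens",
                "top_p", "n", "stream"}

def split_request(body: dict):
    """Separate standard OpenAI fields from backend-specific extras."""
    known = {k: v for k, v in body.items() if k in KNOWN_PARAMS}
    extras = {k: v for k, v in body.items() if k not in KNOWN_PARAMS}
    return known, extras

request = {"model": "m", "messages": [], "top_k": 25,
           "chat_template_kwargs": {"enable_thinking": True}}
known, extras = split_request(request)
# `extras` would be merged verbatim into the request sent to the
# local backend (llama.cpp, vLLM, SGLang), which knows how to
# interpret top_k and chat_template_kwargs.
```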
