rmccorm4 (Contributor) commented on Dec 2, 2025

Overview:

Draft to test the current state of logprobs support:

  • The frontend already supports logprobs.
  • The backend does not propagate logprobs in the LLMEngineOutput responses it yields (see the sketch after this list).
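
For context, here is a minimal sketch of what propagating logprobs from the worker could look like, assuming the handler yields plain serializable fields back to the frontend. The extract_logprobs helper and the "log_probs" field name are placeholders, not Dynamo's actual API; the vLLM side follows its RequestOutput/CompletionOutput/Logprob types (logprob, rank, decoded_token).

from typing import Any, Dict, List, Optional

def extract_logprobs(completion_output) -> Optional[List[Dict[str, Any]]]:
    """Flatten vLLM CompletionOutput.logprobs into JSON-friendly dicts.

    completion_output.logprobs is a list with one {token_id: Logprob}
    mapping per generated token, or None when logprobs were not requested.
    """
    if completion_output.logprobs is None:
        return None

    flattened = []
    for token_id, logprob_dict in zip(
        completion_output.token_ids, completion_output.logprobs
    ):
        flattened.append(
            {
                "token_id": token_id,
                # vLLM includes the sampled token itself in the mapping.
                "logprob": logprob_dict[token_id].logprob,
                "top_logprobs": {
                    # decoded_token can be None; fall back to the token id.
                    (lp.decoded_token or str(tid)): lp.logprob
                    for tid, lp in logprob_dict.items()
                },
            }
        )
    return flattened

# Inside the worker's generate loop (illustrative, not Dynamo's actual API),
# the flattened logprobs would be attached to whatever is yielded back to
# the frontend alongside the token ids:
#
#   out = request_output.outputs[0]
#   yield {"token_ids": out.token_ids, "log_probs": extract_logprobs(out), ...}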

Details:

Example server setup:

uv venv venv
source venv/bin/activate

uv pip install ai-dynamo[vllm]==0.7.0

python -m dynamo.frontend &

python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager --connector none &
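
Optionally, before sending requests, confirm the model has registered with the frontend. This assumes the frontend exposes the standard OpenAI-compatible /v1/models endpoint on the same port used below:

import requests

resp = requests.get("http://localhost:8000/v1/models", timeout=5)
resp.raise_for_status()
# Expect "Qwen/Qwen3-0.6B" to show up in the served model list.
print([m["id"] for m in resp.json().get("data", [])])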

Example completions request:

MODEL="Qwen/Qwen3-0.6B"

curl -X POST http://localhost:8000/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "'${MODEL}'",
    "prompt": "What is the test plan?",
    "logprobs": 2,
    "temperature": 0,
    "max_tokens": 10
  }' | jq
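
The same request via Python, as a quick check that choices[0].logprobs is populated once the backend propagates it. The expected shape (tokens / token_logprobs / top_logprobs) follows the OpenAI completions schema; exact field parity in Dynamo is an assumption:

import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "Qwen/Qwen3-0.6B",
        "prompt": "What is the test plan?",
        "logprobs": 2,
        "temperature": 0,
        "max_tokens": 10,
    },
    timeout=60,
)
resp.raise_for_status()
choice = resp.json()["choices"][0]
print(choice["text"])
# Should be an object with tokens/token_logprobs/top_logprobs rather than null.
print(choice.get("logprobs"))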

Example chat completions request:

MODEL="Qwen/Qwen3-0.6B"

curl -X POST http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "'${MODEL}'",
    "messages": [{"role": "user", "content": "What is the test plan?"}],
    "logprobs": true,
    "top_logprobs": 2,
    "temperature": 0,
    "max_tokens": 50
  }' | jq
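
And the chat variant. In an OpenAI-style response, "logprobs": true with "top_logprobs": 2 yields choices[0].logprobs.content, a list of per-token entries with token, logprob, and top_logprobs; again, exact field parity in Dynamo is an assumption:

import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "Qwen/Qwen3-0.6B",
        "messages": [{"role": "user", "content": "What is the test plan?"}],
        "logprobs": True,
        "top_logprobs": 2,
        "temperature": 0,
        "max_tokens": 50,
    },
    timeout=60,
)
resp.raise_for_status()
choice = resp.json()["choices"][0]
print(choice["message"]["content"])
for entry in ((choice.get("logprobs") or {}).get("content") or [])[:5]:
    print(entry["token"], entry["logprob"], [t["token"] for t in entry["top_logprobs"]])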

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

rmccorm4 changed the title from "PoC: vllm logprobs support for completions and chat/completions" to "PoC: logprobs support for vLLM backend" on Dec 2, 2025