
[Frontend] Expose custom args in OpenAI APIs #16862

Merged

njhill merged 38 commits into vllm-project:main from afeldman-nm/extra_args on Jun 19, 2025

Conversation

afeldman-nm
Contributor

@afeldman-nm afeldman-nm commented Apr 18, 2025

Add a vllm_xargs: Optional[dict[str, Union[str, int, float]]] field to CompletionRequest, ChatCompletionRequest, and TranscriptionRequest (these are the only OpenAIBaseModel subclasses which had a logits_processors field in v0). This field is injected into SamplingParams.extra_args via SamplingParams.from_optional(); each dict key/value pair in extra_args becomes an assignment to an attribute of sampling_params.
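
For illustration only, a minimal, self-contained sketch of the data flow described above. The CompletionRequest and SamplingParams classes below are simplified stand-ins for the real vLLM classes, and to_sampling_params is a hypothetical helper; only the vllm_xargs and extra_args names come from this PR.

```python
from dataclasses import dataclass
from typing import Optional, Union

from pydantic import BaseModel


class CompletionRequest(BaseModel):
    """Toy subset of the OpenAI-compatible request, showing the new field."""
    model: str
    prompt: str
    temperature: float = 1.0
    # New in this PR: arbitrary engine-specific key/value args.
    vllm_xargs: Optional[dict[str, Union[str, int, float]]] = None


@dataclass
class SamplingParams:
    """Stand-in for vllm.SamplingParams; only the fields relevant here."""
    temperature: float = 1.0
    extra_args: Optional[dict] = None

    @classmethod
    def from_optional(cls, temperature=None, extra_args=None):
        return cls(
            temperature=1.0 if temperature is None else temperature,
            extra_args=extra_args,
        )


def to_sampling_params(req: CompletionRequest) -> SamplingParams:
    # vllm_xargs is forwarded into SamplingParams.extra_args, where downstream
    # consumers (logits processors, plugins, ...) can read the custom values.
    return SamplingParams.from_optional(
        temperature=req.temperature,
        extra_args=dict(req.vllm_xargs) if req.vllm_xargs else None,
    )


req = CompletionRequest(model="m", prompt="Hello", vllm_xargs={"my_plugin_arg": 3})
print(to_sampling_params(req).extra_args)  # -> {'my_plugin_arg': 3}
```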

Purpose

Enable extensible features such as logits processors and plugins to receive arbitrary custom arguments via the REST API. Mirror SamplingParams.extra_args in the REST API.

Test plan

Does not require additional unit tests; when logitsprocs extensibility is introduced later, that work will implicitly test custom args. Pre-existing unit tests must pass so we know existing features are not broken.

Test results

N/A

Documentation changes

  • The SamplingParams docstring clarifies that extra_args may plumb custom args to logitsprocs, plugins, etc. (previously it mentioned only logitsprocs).
  • In vllm/entrypoints/openai/protocol.py: for CompletionRequest, ChatCompletionRequest, and TranscriptionRequest, move the vllm_xargs definition inside the # --8<-- [start:completion-extra-params] section and add more detail to the description string.

Final note

The pre-existing behavior of protocol.py ChatCompletionRequest and CompletionRequest is that kv_transfer_params is passed into the engine via SamplingParams.extra_args; this PR simply merges vllm_xargs into SamplingParams.extra_args alongside kv_transfer_params. In the future it may be worth considering whether SamplingParams.extra_args is the best pathway for plumbing kv_transfer_params into the engine; it would seem to break the convention that SamplingParams.extra_args is not intended for "in-tree" functionality.
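
A hypothetical sketch of the merge described above; only the key names vllm_xargs and kv_transfer_params come from the PR text, and the helper itself is not the actual protocol.py code.

```python
from typing import Optional, Union

CustomArgs = dict[str, Union[str, int, float]]


def build_extra_args(
    kv_transfer_params: Optional[dict],
    vllm_xargs: Optional[CustomArgs],
) -> Optional[dict]:
    """Merge vllm_xargs into extra_args alongside the pre-existing kv_transfer_params."""
    extra_args: dict = {}
    if kv_transfer_params:
        extra_args["kv_transfer_params"] = kv_transfer_params
    if vllm_xargs:
        extra_args.update(vllm_xargs)
    return extra_args or None


print(build_extra_args({"remote_host": "kv-cache-node"}, {"my_custom_arg": 0.5}))
# -> {'kv_transfer_params': {'remote_host': 'kv-cache-node'}, 'my_custom_arg': 0.5}
```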

RFC: #17191

Fixes #16802

Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI will run, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the frontend label Apr 18, 2025
@njhill
Member

njhill commented Apr 18, 2025

Thanks @afeldman-nm! It would be good to include a test that shows how these can be passed via the OpenAI client sdk using its extra_body option: https://github.com/openai/openai-python?tab=readme-ov-file#undocumented-request-params

I'm unsure whether we want these new custom args to be in a nested json object (as you've done here) or just extra top-level args.
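
For reference, a minimal sketch of what such a client-side call could look like using the SDK's extra_body option. The server URL, model name, and custom argument names below are placeholders, and this is not the unit test requested above.

```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is running locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="my-model",
    prompt="Hello",
    # extra_body entries are sent as additional top-level JSON fields, so the
    # server receives vllm_xargs as the nested dict added in this PR.
    extra_body={"vllm_xargs": {"my_custom_arg": "value", "another_arg": 2}},
)
print(completion.choices[0].text)
```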

@afeldman-nm
Contributor Author

Thanks @njhill. Agreed regarding the unit test; I need to think a bit about the right way to do it.

Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
@mergify mergify bot added the v1 label Apr 22, 2025
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
@afeldman-nm afeldman-nm marked this pull request as ready for review April 23, 2025 15:03
@afeldman-nm
Contributor Author

Thanks for your review @comaniac . After chatting with Cody, I think this interface change is sufficiently impactful to merit an RFC which I will write and share shortly.

Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
@afeldman-nm afeldman-nm requested a review from aarnphm as a code owner June 18, 2025 02:33
@mergify mergify bot removed the needs-rebase label Jun 18, 2025
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
@afeldman-nm
Contributor Author

> @afeldman-nm Glad to see it is still in progress. My use case is passing in the truncate_prompt_tokens sampling parameter. Can we just unit test what we can test now and add more comprehensive unit tests when the logits processor work is done?

Hi @helloworld1 - working on getting this PR landed as-is.

Signed-off-by: Andrew Feldman <afeldman@redhat.com>
@afeldman-nm afeldman-nm deleted the afeldman-nm/extra_args branch June 18, 2025 18:16
@afeldman-nm afeldman-nm restored the afeldman-nm/extra_args branch June 18, 2025 18:16
@afeldman-nm afeldman-nm reopened this Jun 18, 2025
Member

@njhill njhill left a comment

Thanks @afeldman-nm LGTM, just a couple of minor comments.

Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
afeldman-nm and others added 2 commits June 18, 2025 15:18
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Member

@njhill njhill left a comment

Thanks @afeldman-nm

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 18, 2025
@njhill njhill changed the title [V1] vLLM OpenAI API custom args [Frontend] Expose custom args in OpenAI APIs Jun 19, 2025
@njhill njhill merged commit dfada85 into vllm-project:main Jun 19, 2025
78 checks passed
@njhill njhill deleted the afeldman-nm/extra_args branch June 19, 2025 00:41
yeqcharlotte pushed a commit to yeqcharlotte/vllm that referenced this pull request Jun 22, 2025
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: minpeter <kali2005611@gmail.com>
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Yang Wang <elainewy@meta.com>
Labels: frontend, ready (ONLY add when PR is ready to merge/full CI is needed), v1
Development

Successfully merging this pull request may close these issues:

  • [Feature]: Support custom args in OpenAI (chat) completion requests

4 participants