Skip to content

[V1] Logits processors extensibility #19912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 340 commits into
base: main
Choose a base branch
from

Conversation

afeldman-nm
Copy link
Contributor

@afeldman-nm afeldman-nm commented Jun 20, 2025

Purpose

Enable V1 logits processors support to be extended with custom logits processors.

New Python interface for custom logitsprocs

    # Specify logitproc by entrypoint
    llm = LLM(
        model="facebook/opt-125m",
        logits_processors_entrypoints=["dummy_logitproc"],
    )

    # Specify logitproc by fully-qualified name
    llm = LLM(
        model="facebook/opt-125m",
        logits_processors_fqns=["vllm.v1.sample.logits_processor.impls:DummyLogitsProcessor"],
    )

New CLI interface for custom logitsprocs

# Engine CLI args for specifying logitsprocs by entrypoint
--logits-processors-entrypoints=dummy_logitproc,<other logitproc>,...

# Engine CLI args for specifying logitsprocs by fully-qualified name
--logits-processors-fqns=vllm.v1.sample.logits_processor.impls:DummyLogitsProcessor,<other logitproc>,...

Test Plan

(WIP)

  • Configuring a contrived custom logits proc
  • E2E tests using REST API and Python interfaces
  • Wrap a V0-style logits proc to create a V1-style logits proc

Test Result

WIP

(Optional) Documentation Update

WIP

Fixes #17799
Fixes #12678

Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @afeldman-nm it's looking a lot better now. A few remaining comments but getting close I think.

Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @afeldman-nm


# (index, params, output_tok_ids, prompt_tok_ids) tuples for new
# requests added to the batch.
AddedRequest = tuple[int, Union[SamplingParams, PoolingParams], list[int],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to support LP for pooling requests? Maybe we should just keep this as SamplingParams for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I guess since logitsprocs are enabled at server init time, we can't have a given request fail because it "enabled" some particular logitsproc. So the proper behavior is probably just to bypass logitsprocs logic for pooling requests

Copy link

mergify bot commented Jul 22, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @afeldman-nm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 22, 2025
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
@mergify mergify bot removed the needs-rebase label Jul 22, 2025
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Comment on lines +26 to +27
AddedRequest = tuple[list[int], int, Union[SamplingParams, PoolingParams],
list[int]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering whether we should keep this to just SamplingParams

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/build documentation Improvements or additions to documentation frontend performance Performance-related issues speculative-decoding v1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC]: Logits processor extensibility [Bug]: V1 engine ignores logits processors and min-p sampling
4 participants