Feat/vllm provider #3067
Conversation
Added vLLM AI service provider support
Change overview
Changed files
Sequence diagram
```mermaid
sequenceDiagram
participant HC as HttpContext
participant VP as vllmProvider
participant PC as ProviderConfig
HC->>VP: OnRequestHeaders(apiName)
VP->>PC: handleRequestHeaders(provider, ctx, apiName)
HC->>VP: OnRequestBody(apiName, body)
alt unsupported API
VP-->>HC: ActionContinue with error
else supported API
VP->>PC: handleRequestBody(provider, cache, ctx, apiName, body)
end
HC->>VP: TransformRequestHeaders(apiName, headers)
alt direct custom path
VP->>util: OverwriteRequestPathHeader(headers, customPath)
else capability-based routing
VP->>util: OverwriteRequestPathHeaderByCapability(headers, apiName, capabilities)
end
alt custom domain set
VP->>util: OverwriteRequestHostHeader(headers, customDomain)
else fallback to server host
VP->>util: OverwriteRequestHostHeader(headers, defaultOrConfiguredHost)
end
opt API tokens present
VP->>util: OverwriteRequestAuthorizationHeader(headers, Bearer token)
end
VP->>headers: Del("Content-Length")
HC->>VP: TransformRequestBody(apiName, body)
VP->>PC: defaultTransformRequestBody(ctx, apiName, body)
HC->>VP: GetApiName(path)
HC->>VP: TransformResponseHeaders(apiName, headers)
VP->>headers: Del("Content-Length")
HC->>VP: TransformResponseBody(apiName, body)
HC->>VP: OnStreamingResponseBody(name, chunk, isLastChunk)
```
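Read as Go, the header-transformation leg of this flow might look roughly like the sketch below. The helper names come from the diagram; the receiver type, config fields, and exact signatures are assumptions for illustration, not the plugin's actual API.

```go
// Rough sketch of the TransformRequestHeaders leg above, assumed to live in
// the ai-proxy provider package. Config field names are hypothetical.
func (v *vllmProvider) TransformRequestHeaders(ctx wrapper.HttpContext, apiName ApiName, headers http.Header) {
	// Path: a directly configured custom path takes precedence over
	// capability-based routing.
	if v.config.customPath != "" { // hypothetical field
		util.OverwriteRequestPathHeader(headers, v.config.customPath)
	} else {
		util.OverwriteRequestPathHeaderByCapability(headers, string(apiName), v.config.capabilities)
	}

	// Host: prefer a custom domain, otherwise fall back to the configured
	// or default server host.
	if v.config.customDomain != "" { // hypothetical field
		util.OverwriteRequestHostHeader(headers, v.config.customDomain)
	} else {
		util.OverwriteRequestHostHeader(headers, v.config.serverHost)
	}

	// Authorization: only set a Bearer token when API tokens are configured,
	// since vLLM also supports unauthenticated deployments.
	if len(v.config.apiTokens) > 0 {
		util.OverwriteRequestAuthorizationHeader(headers, "Bearer "+v.config.apiTokens[0])
	}

	// Drop Content-Length so the proxy recomputes it after body transformation.
	headers.Del("Content-Length")
}
```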
Codecov Report
❌ Patch coverage is
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #3067       +/-   ##
===========================================
+ Coverage   35.91%   45.97%   +10.06%
===========================================
  Files          69      137       +68
  Lines       11576    20747     +9171
===========================================
+ Hits         4157     9539     +5382
- Misses       7104    10731     +3627
- Partials      315      477      +162
```
Flags with carried forward coverage won't be shown.
johnlanni left a comment:
LGTM
feat: Add vLLM provider to ai-proxy to support rerank API (GitHub alibaba#3067) See merge request framework/ai-gateway!5
Ⅰ. Describe what this PR did
This PR adds support for vLLM as a new AI provider in the ai-proxy plugin, enabling Higress to proxy requests to vLLM inference services.
Key Features:
OpenAI-Compatible API Support: Implements support for multiple OpenAI-compatible APIs including:
- Chat Completions (/v1/chat/completions)
- Completions (/v1/completions)
- Models (/v1/models)
- Embeddings (/v1/embeddings)
- Rerank (/v1/rerank)
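A hedged sketch of what path-based API detection could look like for these endpoints. The PR only shows that GetApiName(path) exists; the ApiName constant names and the matching strategy below are assumptions.

```go
// Illustrative path-to-API mapping for the endpoints listed above; constant
// names are assumed, and strings is the only import needed.
func (v *vllmProvider) GetApiName(path string) ApiName {
	switch {
	case strings.Contains(path, "/v1/chat/completions"):
		return ApiNameChatCompletion
	case strings.Contains(path, "/v1/completions"):
		return ApiNameCompletion
	case strings.Contains(path, "/v1/models"):
		return ApiNameModels
	case strings.Contains(path, "/v1/embeddings"):
		return ApiNameEmbeddings
	case strings.Contains(path, "/v1/rerank"):
		return ApiNameRerank
	default:
		return "" // unsupported API: the caller answers with ActionContinue and an error
	}
}
```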
Flexible Configuration:
- vllmCustomUrl for specifying custom vLLM service endpoints
- Legacy vllmServerHost configuration
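A minimal sketch of this configuration surface, assuming JSON-tagged fields as in other wasm plugin configs. Only vllmCustomUrl and vllmServerHost are named by this PR; apiTokens is an assumed companion field.

```go
// Sketch of the vLLM provider configuration. vllmCustomUrl and vllmServerHost
// come from the PR description; everything else is illustrative.
type vllmConfig struct {
	// Preferred: full custom endpoint of the vLLM service,
	// e.g. "http://vllm.internal:8000".
	VllmCustomUrl string `json:"vllmCustomUrl,omitempty"`
	// Legacy: host-only setting kept for backward compatibility.
	VllmServerHost string `json:"vllmServerHost,omitempty"`
	// Optional Bearer tokens; leave empty for unauthenticated deployments.
	ApiTokens []string `json:"apiTokens,omitempty"`
}
```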
Authentication Options:
- Optional API-token (Bearer) authentication; unauthenticated deployments are also supported
Standard Provider Features:
Implementation Details:
- Registers the vllm provider type in ProviderConfig
- Uses vllm-service.cluster.local as the default service domain
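Assuming the initializer/factory pattern that existing ai-proxy providers follow, registration might look roughly like this; every name below is an assumption, not the PR's actual code.

```go
// Hypothetical registration sketch in the provider package.
const (
	providerTypeVllm  = "vllm"
	vllmDefaultDomain = "vllm-service.cluster.local" // default domain from this PR
)

type vllmProviderInitializer struct{}

func (i *vllmProviderInitializer) CreateProvider(config ProviderConfig) (Provider, error) {
	return &vllmProvider{config: config}, nil
}
```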
Ⅱ. Does this pull request fix one issue?
This PR adds a new feature rather than fixing a specific issue. It enables integration with vLLM, a popular high-throughput LLM inference engine.
Ⅲ. Why don't you add test cases (unit test/integration test)?
Test cases follow the existing pattern in the ai-proxy plugin:
- Core request handling is delegated to the shared ProviderConfig.handleRequestBody() and defaultTransformRequestBody() paths
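For instance, a table-driven test in the plugin's usual style might look like this; it is illustrative only and reuses the assumed names from the sketches above.

```go
// Illustrative in-package table-driven test for path-to-API mapping
// (import "testing").
func TestVllmGetApiName(t *testing.T) {
	p := &vllmProvider{}
	cases := map[string]ApiName{
		"/v1/chat/completions": ApiNameChatCompletion,
		"/v1/rerank":           ApiNameRerank,
		"/unknown":             "",
	}
	for path, want := range cases {
		if got := p.GetApiName(path); got != want {
			t.Errorf("GetApiName(%q) = %q, want %q", path, got, want)
		}
	}
}
```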
Ⅳ. Describe how to verify it
Prerequisites:
Verification Steps:
Ⅴ. Special notes for reviews
Design Philosophy: The implementation closely follows the established patterns from openaiProvider and other providers, ensuring consistency across the codebase.
Authentication Flexibility: Unlike some providers that require API tokens, vLLM supports both authenticated and unauthenticated deployments. The implementation handles both scenarios gracefully.
Legacy Compatibility: The provider maintains backward compatibility with the legacy vllmServerHost configuration while encouraging migration to the more flexible vllmCustomUrl approach.
Path Handling: The provider includes intelligent path detection (isVllmDirectPath) to distinguish standard OpenAI-style API paths (e.g., /v1/chat/completions) from custom endpoint paths; a sketch follows below.
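The detection logic itself is not spelled out in this description; one plausible reading, purely an assumption, is a prefix check:

```go
// Assumed sketch of isVllmDirectPath: treat any OpenAI-style /v1/ path as a
// "direct" path that can be forwarded without capability-based rewriting
// (import "strings").
func isVllmDirectPath(path string) bool {
	return strings.HasPrefix(path, "/v1/")
}
```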
Future Extensions: The response transformation methods (TransformResponseBody, OnStreamingResponseBody) are currently pass-through implementations but provide hooks for future vLLM-specific transformations if needed.
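As a sketch, a pass-through hook simply returns the body unchanged; the signature below is assumed from the surrounding provider interface, not taken from the PR.

```go
// Pass-through response transform: no vLLM-specific rewriting yet, but the
// hook is in place for future changes.
func (v *vllmProvider) TransformResponseBody(ctx wrapper.HttpContext, apiName ApiName, body []byte) ([]byte, error) {
	return body, nil
}
```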