@wydream wydream commented Oct 28, 2025

Ⅰ. Describe what this PR did

This PR adds support for vLLM as a new AI provider in the ai-proxy plugin, enabling Higress to proxy requests to vLLM inference services.

Key Features:

  • OpenAI-Compatible API Support: Implements support for multiple OpenAI-compatible APIs including:

    • Chat Completions (/v1/chat/completions)
    • Text Completions (/v1/completions)
    • Model Listing (/v1/models)
    • Embeddings (/v1/embeddings)
    • Cohere Rerank API (/v1/rerank)
  • Flexible Configuration:

    • Custom URL support via vllmCustomUrl for specifying custom vLLM service endpoints
    • Automatic path routing based on API capabilities
    • Support for both direct path and capability-based routing
    • Backward compatibility with legacy vllmServerHost configuration
  • Authentication Options:

    • Optional API token authentication (supports scenarios with or without authentication)
    • Standard Bearer token authentication when tokens are configured
  • Standard Provider Features:

    • Model name mapping support
    • Streaming response handling
    • Request/response header and body transformation
    • Context caching for improved performance

Implementation Details:

  • Follows the standard Provider interface pattern established in the ai-proxy architecture
  • Reuses common transformation logic from ProviderConfig
  • Provides sensible defaults (e.g., vllm-service.cluster.local as default domain)
  • Handles both authenticated and unauthenticated vLLM deployments
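
To make the path, host, and authentication handling above concrete, here is a rough sketch of the kind of upstream request the plugin ends up sending for a chat completion. The host, port, token, and model name are illustrative placeholders (the default domain is the provider default mentioned above; the port is assumed to be vLLM's default), not values taken from this PR:

```bash
# Approximate upstream request after the plugin rewrites path, host, and authorization.
# Host, port, token, and model name are placeholders; the Authorization header is only
# added when apiTokens is configured.
curl http://vllm-service.cluster.local:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-token" \
  -d '{
    "model": "your-vllm-model-name",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```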

Ⅱ. Does this pull request fix one issue?

This PR adds a new feature rather than fixing a specific issue. It enables integration with vLLM, a popular high-throughput LLM inference engine.

Ⅲ. Why don't you add test cases (unit test/integration test)?

No dedicated test cases are added for this provider, following the existing approach in the ai-proxy plugin:

  • The implementation reuses well-tested common logic from ProviderConfig.handleRequestBody() and defaultTransformRequestBody()
  • Provider-specific logic (path routing, header transformation) is straightforward and follows established patterns from other providers (OpenAI, Qwen, etc.)
  • Integration testing can be performed using the provided example configuration with actual vLLM deployments

Ⅳ. Describe how to verify it

Prerequisites:

  • A running vLLM service (can be deployed in Kubernetes or accessible via HTTP)
  • Higress gateway deployed with ai-proxy plugin
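
Before configuring the plugin, it can help to confirm that the vLLM service itself is reachable. A minimal check, assuming the service name and port from the example configuration below (adjust to your deployment):

```bash
# Query the vLLM service directly (bypassing the gateway) to confirm it is up
# and to see which model names it serves; host and port are assumptions taken
# from the example configuration and may differ in your environment.
curl http://vllm-service:8000/v1/models
```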

Verification Steps:

  1. Configure the vLLM Provider:

```yaml
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-proxy-vllm
  namespace: higress-system
spec:
  defaultConfig:
    provider:
      type: vllm
      vllmCustomUrl: "http://vllm-service:8000/v1"  # or use vllmServerHost for legacy config
      # Optional: add apiTokens if authentication is required
      # apiTokens: ["your-api-token"]
      modelMapping:
        "gpt-3.5-turbo": "your-vllm-model-name"
```
  2. Test Chat Completion API:

```bash
curl http://your-higress-gateway/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
  3. Test Streaming Response:

```bash
curl http://your-higress-gateway/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
  4. Verify Model Mapping:
  • Configure model mapping in the plugin configuration
  • Send requests with OpenAI model names
  • Confirm they are correctly mapped to vLLM model names
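
Optionally, the other OpenAI-compatible endpoints listed under Key Features can be exercised the same way. A minimal sketch, assuming the gateway routes these paths to the same provider and the backing vLLM deployment serves an embedding-capable model (the model names are placeholders):

```bash
# List the models exposed through the gateway.
curl http://your-higress-gateway/v1/models

# Request embeddings; requires an embedding-capable model on the vLLM side.
curl http://your-higress-gateway/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-embedding-model",
    "input": ["Hello, world!"]
  }'
```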

Ⅴ. Special notes for reviews

  1. Design Philosophy: The implementation closely follows the established patterns from openaiProvider and other providers, ensuring consistency across the codebase.

  2. Authentication Flexibility: Unlike some providers that require API tokens, vLLM supports both authenticated and unauthenticated deployments. The implementation handles both scenarios gracefully.

  3. Legacy Compatibility: The provider maintains backward compatibility with the legacy vllmServerHost configuration while encouraging migration to the more flexible vllmCustomUrl approach.

  4. Path Handling: The provider includes intelligent path detection (isVllmDirectPath) to distinguish between:

    • Direct API endpoints (e.g., /v1/chat/completions)
    • Custom base paths that need capability-based routing
  5. Future Extensions: The response transformation methods (TransformResponseBody, OnStreamingResponseBody) are currently pass-through implementations but provide hooks for future vLLM-specific transformations if needed.
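
Because the response transforms are currently pass-through, an end-to-end call is enough to confirm the rerank capability as well. A minimal sketch, assuming the request body follows the Cohere-style rerank shape listed under Key Features and that the backing model supports reranking (names are placeholders):

```bash
curl http://your-higress-gateway/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-rerank-model",
    "query": "What is Higress?",
    "documents": [
      "Higress is a cloud-native API gateway.",
      "vLLM is a high-throughput LLM inference engine."
    ]
  }'
```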

Change-Id: I86abbfb5c718225fab3dc6fd44cf64c3996e3a01
Change-Id: I001382122ac90abc185f0235ead9fd0dcf9cc106

lingma-agents bot commented Oct 28, 2025

Add vLLM AI Provider Support

Change Overview
  • New Features

    • Added full support for vLLM as a new AI provider, enabling Higress to proxy requests to vLLM inference services.
    • Implemented support for multiple OpenAI-compatible APIs, including chat completions (/v1/chat/completions), text completions (/v1/completions), model listing (/v1/models), embeddings (/v1/embeddings), and the Cohere rerank API (/v1/rerank).
    • Introduced flexible configuration options, such as specifying a custom vLLM service endpoint via vllmCustomUrl, with support for both path-based and capability-based routing.
    • Supports an optional authentication mechanism, allowing vLLM services to be accessed with or without credentials.
  • Refactoring

    • Added new vLLM-related fields and getter methods to the ProviderConfig struct for handling vLLM-specific configuration parameters.
    • Updated the initialization logic to recognize and create vLLM provider instances.
  • Test Updates

    • Although no unit or integration test code was added directly, the implementation follows the common patterns of existing providers, so existing test strategies can be reused for verification.
  • Documentation

    • Added Chinese descriptions for the new vLLM configuration options, improving the readability and usability of the configuration.
  • Other

    • Maintained backward compatibility by continuing to support the legacy vllmServerHost configuration while recommending the more flexible vllmCustomUrl approach.
Changed Files
File Path | Description of Changes
plugins/wasm-go/extensions/ai-proxy/provider/provider.go | Registers the vLLM provider type and its initializer, adds two new configuration fields `vllmCustomUrl` and `vllmServerHost` to `ProviderConfig` together with their getter methods, and adds support for these fields in JSON parsing.
plugins/wasm-go/extensions/ai-proxy/provider/vllm.go | Adds a new provider module dedicated to handling vLLM requests. Implements validation, default capabilities, and provider creation; overrides the request header and body transformation functions to adapt to the vLLM service; also includes a basic framework for response header and body handling plus a streaming response data-processing interface.
Sequence Diagram

```mermaid
sequenceDiagram
    participant HC as HttpContext
    participant VP as vllmProvider
    participant PC as ProviderConfig
    HC->>VP: OnRequestHeaders(apiName)
    VP->>PC: handleRequestHeaders(provider, ctx, apiName)
    HC->>VP: OnRequestBody(apiName, body)
    alt unsupported API
        VP-->>HC: ActionContinue with error
    else supported API
        VP->>PC: handleRequestBody(provider, cache, ctx, apiName, body)
    end
    HC->>VP: TransformRequestHeaders(apiName, headers)
    alt direct custom path
        VP->>util: OverwriteRequestPathHeader(headers, customPath)
    else capability-based routing
        VP->>util: OverwriteRequestPathHeaderByCapability(headers, apiName, capabilities)
    end
    alt custom domain set
        VP->>util: OverwriteRequestHostHeader(headers, customDomain)
    else fallback to server host
        VP->>util: OverwriteRequestHostHeader(headers, defaultOrConfiguredHost)
    end
    opt API tokens present
        VP->>util: OverwriteRequestAuthorizationHeader(headers, Bearer token)
    end
    VP->>headers: Del("Content-Length")
    HC->>VP: TransformRequestBody(apiName, body)
    VP->>PC: defaultTransformRequestBody(ctx, apiName, body)
    HC->>VP: GetApiName(path)
    HC->>VP: TransformResponseHeaders(apiName, headers)
    VP->>headers: Del("Content-Length")
    HC->>VP: TransformResponseBody(apiName, body)
    HC->>VP: OnStreamingResponseBody(name, chunk, isLastChunk)
```


codecov-commenter commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 2.19780% with 89 lines in your changes missing coverage. Please review.
✅ Project coverage is 45.97%. Comparing base (ef31e09) to head (3be750a).
⚠️ Report is 759 commits behind head on main.

Files with missing lines Patch % Lines
...ugins/wasm-go/extensions/ai-proxy/provider/vllm.go 0.00% 85 Missing ⚠️
...s/wasm-go/extensions/ai-proxy/provider/provider.go 33.33% 4 Missing ⚠️
Additional details and impacted files


```diff
@@             Coverage Diff             @@
##             main    #3067       +/-   ##
===========================================
+ Coverage   35.91%   45.97%   +10.06%     
===========================================
  Files          69      137       +68     
  Lines       11576    20747     +9171     
===========================================
+ Hits         4157     9539     +5382     
- Misses       7104    10731     +3627     
- Partials      315      477      +162     
```
Flag Coverage Δ
wasm-go-plugin-ai-proxy 48.78% <2.19%> (?)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...s/wasm-go/extensions/ai-proxy/provider/provider.go 57.52% <33.33%> (ø)
...ugins/wasm-go/extensions/ai-proxy/provider/vllm.go 0.00% <0.00%> (ø)

... and 150 files with indirect coverage changes



@johnlanni johnlanni left a comment


LGTM

@johnlanni johnlanni merged commit 5e4c262 into alibaba:main Oct 29, 2025
15 checks passed
ink-hz pushed a commit to ink-hz/higress-ai-capability-auth that referenced this pull request Nov 5, 2025
CH3CHO pushed a commit to CH3CHO/higress that referenced this pull request Dec 2, 2025
feat: Add vLLM provider to ai-proxy to support rerank API (GitHub alibaba#3067)

See merge request framework/ai-gateway!5