@wydream wydream commented Oct 28, 2025

Ⅰ. Describe what this PR did

This PR adds support for vLLM as a new AI provider in the ai-proxy plugin, enabling Higress to proxy requests to vLLM inference services.

Key Features:

  • OpenAI-Compatible API Support: Implements support for multiple OpenAI-compatible APIs including:

    • Chat Completions (/v1/chat/completions)
    • Text Completions (/v1/completions)
    • Model Listing (/v1/models)
    • Embeddings (/v1/embeddings)
    • Cohere Rerank API (/v1/rerank)
  • Flexible Configuration:

    • Custom URL support via vllmCustomUrl for specifying custom vLLM service endpoints
    • Automatic path routing based on API capabilities
    • Support for both direct path and capability-based routing
    • Backward compatibility with legacy vllmServerHost configuration
  • Authentication Options:

    • Optional API token authentication (supports scenarios with or without authentication)
    • Standard Bearer token authentication when tokens are configured
  • Standard Provider Features:

    • Model name mapping support
    • Streaming response handling
    • Request/response header and body transformation
    • Context caching for improved performance

Implementation Details:

  • Follows the standard Provider interface pattern established in the ai-proxy architecture
  • Reuses common transformation logic from ProviderConfig
  • Provides sensible defaults (e.g., vllm-service.cluster.local as default domain)
  • Handles both authenticated and unauthenticated vLLM deployments
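
To make the path, host, and authentication handling above concrete, here is a rough sketch of the kind of upstream request the plugin ends up sending for a chat completion. The host, port, token, and model name are illustrative placeholders (the default domain is the provider default mentioned above; the port is assumed to be vLLM's default), not values taken from this PR:

```bash
# Approximate upstream request after the plugin rewrites path, host, and authorization.
# Host, port, token, and model name are placeholders; the Authorization header is only
# added when apiTokens is configured.
curl http://vllm-service.cluster.local:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-token" \
  -d '{
    "model": "your-vllm-model-name",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```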

Ⅱ. Does this pull request fix one issue?

This PR adds a new feature rather than fixing a specific issue. It enables integration with vLLM, a popular high-throughput LLM inference engine.

Ⅲ. Why don't you add test cases (unit test/integration test)?

No dedicated test cases are added for this provider, following the existing approach in the ai-proxy plugin:

  • The implementation reuses well-tested common logic from ProviderConfig.handleRequestBody() and defaultTransformRequestBody()
  • Provider-specific logic (path routing, header transformation) is straightforward and follows established patterns from other providers (OpenAI, Qwen, etc.)
  • Integration testing can be performed using the provided example configuration with actual vLLM deployments

Ⅳ. Describe how to verify it

Prerequisites:

  • A running vLLM service (can be deployed in Kubernetes or accessible via HTTP)
  • Higress gateway deployed with ai-proxy plugin
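
Before configuring the plugin, it can help to confirm that the vLLM service itself is reachable. A minimal check, assuming the service name and port from the example configuration below (adjust to your deployment):

```bash
# Query the vLLM service directly (bypassing the gateway) to confirm it is up
# and to see which model names it serves; host and port are assumptions taken
# from the example configuration and may differ in your environment.
curl http://vllm-service:8000/v1/models
```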

Verification Steps:

  1. Configure the vLLM Provider:

```yaml
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ai-proxy-vllm
  namespace: higress-system
spec:
  defaultConfig:
    provider:
      type: vllm
      vllmCustomUrl: "http://vllm-service:8000/v1"  # or use vllmServerHost for legacy config
      # Optional: add apiTokens if authentication is required
      # apiTokens: ["your-api-token"]
      modelMapping:
        "gpt-3.5-turbo": "your-vllm-model-name"
```
  2. Test Chat Completion API:

```bash
curl http://your-higress-gateway/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
  3. Test Streaming Response:

```bash
curl http://your-higress-gateway/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
  4. Verify Model Mapping:
  • Configure model mapping in the plugin configuration
  • Send requests with OpenAI model names
  • Confirm they are correctly mapped to vLLM model names
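
Optionally, the other OpenAI-compatible endpoints listed under Key Features can be exercised the same way. A minimal sketch, assuming the gateway routes these paths to the same provider and the backing vLLM deployment serves an embedding-capable model (the model names are placeholders):

```bash
# List the models exposed through the gateway.
curl http://your-higress-gateway/v1/models

# Request embeddings; requires an embedding-capable model on the vLLM side.
curl http://your-higress-gateway/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-embedding-model",
    "input": ["Hello, world!"]
  }'
```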

Ⅴ. Special notes for reviews

  1. Design Philosophy: The implementation closely follows the established patterns from openaiProvider and other providers, ensuring consistency across the codebase.

  2. Authentication Flexibility: Unlike some providers that require API tokens, vLLM supports both authenticated and unauthenticated deployments. The implementation handles both scenarios gracefully.

  3. Legacy Compatibility: The provider maintains backward compatibility with the legacy vllmServerHost configuration while encouraging migration to the more flexible vllmCustomUrl approach.

  4. Path Handling: The provider includes intelligent path detection (isVllmDirectPath) to distinguish between:

    • Direct API endpoints (e.g., /v1/chat/completions)
    • Custom base paths that need capability-based routing
  5. Future Extensions: The response transformation methods (TransformResponseBody, OnStreamingResponseBody) are currently pass-through implementations but provide hooks for future vLLM-specific transformations if needed.
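
Because the response transforms are currently pass-through, an end-to-end call is enough to confirm the rerank capability as well. A minimal sketch, assuming the request body follows the Cohere-style rerank shape listed under Key Features and that the backing model supports reranking (names are placeholders):

```bash
curl http://your-higress-gateway/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-rerank-model",
    "query": "What is Higress?",
    "documents": [
      "Higress is a cloud-native API gateway.",
      "vLLM is a high-throughput LLM inference engine."
    ]
  }'
```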

Change-Id: I86abbfb5c718225fab3dc6fd44cf64c3996e3a01
Change-Id: I001382122ac90abc185f0235ead9fd0dcf9cc106

lingma-agents bot commented Oct 28, 2025

Add vLLM AI Provider Support

Change Overview
  • New Features

    • Added full support for vLLM as a new AI provider, enabling Higress to proxy requests to vLLM inference services.
    • Implemented support for multiple OpenAI-compatible APIs, including chat completions (/v1/chat/completions), text completions (/v1/completions), model listing (/v1/models), embeddings (/v1/embeddings), and the Cohere rerank API (/v1/rerank).
    • Introduced flexible configuration options, such as specifying a custom vLLM service endpoint via vllmCustomUrl, with support for both path-based and capability-based routing.
    • Supports an optional authentication mechanism, allowing vLLM services to be accessed with or without credentials.
  • Refactoring

    • Added new vLLM-related fields and getter methods to the ProviderConfig struct for handling vLLM-specific configuration parameters.
    • Updated the initialization logic to recognize and create vLLM provider instances.
  • Test Updates

    • Although no unit or integration test code was added directly, the implementation follows the common patterns of existing providers, so existing test strategies can be reused for verification.
  • Documentation

    • Added Chinese descriptions for the new vLLM configuration options, improving the readability and usability of the configuration.
  • Other

    • Maintained backward compatibility by continuing to support the legacy vllmServerHost configuration while recommending the more flexible vllmCustomUrl approach.
Changed Files
File Path | Description of Changes
plugins/wasm-go/extensions/ai-proxy/provider/provider.go | Registers the vLLM provider type and its initializer, adds two new configuration fields `vllmCustomUrl` and `vllmServerHost` to `ProviderConfig` together with their getter methods, and adds support for these fields in JSON parsing.
plugins/wasm-go/extensions/ai-proxy/provider/vllm.go | Adds a new provider module dedicated to handling vLLM requests. Implements validation, default capabilities, and provider creation; overrides the request header and body transformation functions to adapt to the vLLM service; also includes a basic framework for response header and body handling plus a streaming response data-processing interface.
Sequence Diagram

```mermaid
sequenceDiagram
    participant HC as HttpContext
    participant VP as vllmProvider
    participant PC as ProviderConfig
    HC->>VP: OnRequestHeaders(apiName)
    VP->>PC: handleRequestHeaders(provider, ctx, apiName)
    HC->>VP: OnRequestBody(apiName, body)
    alt unsupported API
        VP-->>HC: ActionContinue with error
    else supported API
        VP->>PC: handleRequestBody(provider, cache, ctx, apiName, body)
    end
    HC->>VP: TransformRequestHeaders(apiName, headers)
    alt direct custom path
        VP->>util: OverwriteRequestPathHeader(headers, customPath)
    else capability-based routing
        VP->>util: OverwriteRequestPathHeaderByCapability(headers, apiName, capabilities)
    end
    alt custom domain set
        VP->>util: OverwriteRequestHostHeader(headers, customDomain)
    else fallback to server host
        VP->>util: OverwriteRequestHostHeader(headers, defaultOrConfiguredHost)
    end
    opt API tokens present
        VP->>util: OverwriteRequestAuthorizationHeader(headers, Bearer token)
    end
    VP->>headers: Del("Content-Length")
    HC->>VP: TransformRequestBody(apiName, body)
    VP->>PC: defaultTransformRequestBody(ctx, apiName, body)
    HC->>VP: GetApiName(path)
    HC->>VP: TransformResponseHeaders(apiName, headers)
    VP->>headers: Del("Content-Length")
    HC->>VP: TransformResponseBody(apiName, body)
    HC->>VP: OnStreamingResponseBody(name, chunk, isLastChunk)
```


codecov-commenter commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 2.19780% with 89 lines in your changes missing coverage. Please review.
✅ Project coverage is 45.97%. Comparing base (ef31e09) to head (3be750a).
⚠️ Report is 759 commits behind head on main.

Files with missing lines Patch % Lines
...ugins/wasm-go/extensions/ai-proxy/provider/vllm.go 0.00% 85 Missing ⚠️
...s/wasm-go/extensions/ai-proxy/provider/provider.go 33.33% 4 Missing ⚠️
Additional details and impacted files


```diff
@@             Coverage Diff             @@
##             main    #3067       +/-   ##
===========================================
+ Coverage   35.91%   45.97%   +10.06%     
===========================================
  Files          69      137       +68     
  Lines       11576    20747     +9171     
===========================================
+ Hits         4157     9539     +5382     
- Misses       7104    10731     +3627     
- Partials      315      477      +162     
```
Flag Coverage Δ
wasm-go-plugin-ai-proxy 48.78% <2.19%> (?)

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...s/wasm-go/extensions/ai-proxy/provider/provider.go 57.52% <33.33%> (ø)
...ugins/wasm-go/extensions/ai-proxy/provider/vllm.go 0.00% <0.00%> (ø)

... and 150 files with indirect coverage changes



@johnlanni johnlanni left a comment


LGTM

@johnlanni johnlanni merged commit 5e4c262 into alibaba:main Oct 29, 2025
15 checks passed
ink-hz pushed a commit to ink-hz/higress-ai-capability-auth that referenced this pull request Nov 5, 2025
CH3CHO pushed a commit to CH3CHO/higress that referenced this pull request Dec 2, 2025
feat: Add vLLM provider to ai-proxy to support rerank API (GitHub alibaba#3067)

See merge request framework/ai-gateway!5