
Vision mode not triggered when images selected in Claude CLI flow #10

@KlementMultiverse


Problem

When using the Claude CLI (claude) with image selection, vision preprocessing is skipped even though vision.enabled: true is set in config.yaml. Images are passed directly to the Cursor API without OCR/vision processing. Specifically:

  1. The image handling path in src/openai-handler.ts doesn't detect that CLI requests contain images
  2. No vision mode logic executes before the request is sent to Cursor
  3. Related to #8 ("After selecting an image in Claude CLI, vision is not triggered"): users report that images selected in the CLI are ignored

Root Cause

The Anthropic Messages API flow (used by the Claude CLI) sends images in the content array as ImageBlockParam objects ("type": "image" with a "source" object). The current vision preprocessing in converter.ts only processes OpenAI-style image parts ("type": "image_url", carrying a URL or base64 data URL), not Anthropic-style image blocks.
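To make the mismatch concrete, here is a minimal TypeScript sketch of the two content shapes and a detector that accepts both. The types mirror the public Anthropic Messages and OpenAI Chat Completions request formats (only the base64 Anthropic source is modeled); they are assumptions, not the proxy's own definitions, and hasImageContent is a hypothetical helper.

```typescript
// Sketch of the two image shapes involved. These mirror the public Anthropic
// Messages and OpenAI Chat Completions request formats; they are assumptions,
// not the proxy's own type definitions. Only base64 Anthropic sources are modeled.

// Anthropic: {"type": "image", "source": {"type": "base64", ...}}
interface AnthropicImageBlock {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
}

// OpenAI: {"type": "image_url", "image_url": {"url": ...}}; url may be a data: URL
interface OpenAIImagePart {
  type: "image_url";
  image_url: { url: string };
}

interface TextPart {
  type: "text";
  text: string;
}

type ContentPart = AnthropicImageBlock | OpenAIImagePart | TextPart;

interface Message {
  role: string;
  content: string | ContentPart[];
}

// Hypothetical helper: true if any message carries an image in either format.
// A check that only matches "image_url" parts would miss Anthropic blocks.
function hasImageContent(messages: Message[]): boolean {
  return messages.some(
    (m) =>
      Array.isArray(m.content) &&
      m.content.some((part) => part.type === "image" || part.type === "image_url"),
  );
}
```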

src/index.ts routes /v1/messages requests directly to the converter without checking for image content first. The vision check should happen before protocol conversion, but currently happens only in openai-handler.ts (post-conversion).

Expected Behavior

When the Claude CLI sends a request such as:

```json
{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."}},
      {"type": "text", "text": "analyze this image"}
    ]
  }]
}
```

The system should:

  1. Detect image blocks in messages[].content[]
  2. Extract the base64 data before calling the Cursor API
  3. Run OCR or the external vision API, as selected by vision.mode
  4. Replace image blocks with a text description in the prompt
  5. Send a text-only request to Cursor, with the vision results also injected into the system prompt
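Steps 1, 2, 4 and 5 above could be sketched as pure functions over the Anthropic message shape from the example request. The function names (extractImages, replaceImagesWithText, injectIntoSystem) and the placeholder text format are hypothetical; step 3 is left to the caller, which is assumed to supply one description per extracted image, in order.

```typescript
// Minimal sketch, assuming the Anthropic block shapes shown in the request
// above. All names here are hypothetical, not existing proxy functions.

interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
}
interface TextBlock {
  type: "text";
  text: string;
}
type Block = ImageBlock | TextBlock;

interface Message {
  role: "user" | "assistant";
  content: string | Block[];
}

interface Base64Image {
  mediaType: string;
  data: string;
}

// Steps 1-2: find image blocks and pull out their base64 payloads, in order.
function extractImages(messages: Message[]): Base64Image[] {
  const images: Base64Image[] = [];
  for (const m of messages) {
    if (typeof m.content === "string") continue;
    for (const block of m.content) {
      if (block.type === "image") {
        images.push({ mediaType: block.source.media_type, data: block.source.data });
      }
    }
  }
  return images;
}

// Step 4: swap each image block for a text block, consuming descriptions in
// the same order extractImages produced the images.
function replaceImagesWithText(messages: Message[], descriptions: string[]): Message[] {
  let next = 0;
  return messages.map((m) => {
    if (typeof m.content === "string") return m;
    const content = m.content.map((block): Block => {
      if (block.type !== "image") return block;
      const idx = next++;
      const text = descriptions[idx] ?? "(no description available)";
      return { type: "text", text: `[Image ${idx + 1}: ${text}]` };
    });
    return { ...m, content };
  });
}

// Step 5: append the same results to the (optional) system prompt.
function injectIntoSystem(system: string | undefined, descriptions: string[]): string {
  const section = descriptions.map((d, i) => `Image ${i + 1}: ${d}`).join("\n");
  return [system, "Vision results:\n" + section].filter(Boolean).join("\n\n");
}
```

Keeping these steps free of I/O means the OCR/vision call can be swapped or mocked without touching the message rewriting.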

Why This Matters

The vision feature (v2.3.0) only works for OpenAI-compatible clients (ChatBox, LobeChat) and is broken for the primary use case: the Claude CLI (Claude Code). This defeats the purpose of image support in a Claude-focused proxy.

Solution Scope

Add a preprocessImages() function in converter.ts that:

  • Detects ImageBlockParam objects in Anthropic message format
  • Extracts and processes images before cursor-client.ts makes the API call
  • Handles both OCR and external vision API modes
  • Returns modified messages with vision results injected
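For the OCR/external-API bullet, one possible shape is a dispatcher keyed on vision.mode. The mode names ("ocr", "api"), the VisionBackend signature and describeImages are assumptions for illustration; the real OCR engine and vision API client would be plugged in as backends.

```typescript
// Hypothetical sketch: send each extracted image to the backend selected by
// vision.mode. Mode names and backend signatures are assumptions.

interface Base64Image {
  mediaType: string;
  data: string;
}

type VisionMode = "ocr" | "api";

// Each backend turns one image into a text description.
type VisionBackend = (img: Base64Image) => Promise<string>;

async function describeImages(
  images: Base64Image[],
  mode: VisionMode,
  backends: Record<VisionMode, VisionBackend>,
): Promise<string[]> {
  const backend = backends[mode];
  // Process images concurrently; a failure on one image yields a placeholder
  // description instead of failing the whole request.
  const results = await Promise.allSettled(images.map((img) => backend(img)));
  return results.map((r) =>
    r.status === "fulfilled" ? r.value : `[vision ${mode} failed: ${String(r.reason)}]`,
  );
}
```

Whether a failed image should degrade to a placeholder or abort the request is a design choice; the placeholder keeps the rest of the prompt usable.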

Call preprocessImages() in the Anthropic message handler before converting to Cursor format.
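A minimal sketch of that ordering in the /v1/messages handler, with all collaborators injected so the control flow is visible. AnthropicRequest, HandlerDeps, handleMessages and convertToCursor are hypothetical names, not the proxy's actual types or functions.

```typescript
// Hypothetical sketch: vision preprocessing runs on the Anthropic-format body
// before protocol conversion. All names are illustrative assumptions.

interface AnthropicRequest {
  system?: string;
  messages: { role: string; content: unknown }[];
}

interface HandlerDeps<CursorRequest> {
  visionEnabled: boolean; // vision.enabled from config.yaml
  hasImageContent: (req: AnthropicRequest) => boolean;
  preprocessImages: (req: AnthropicRequest) => Promise<AnthropicRequest>;
  convertToCursor: (req: AnthropicRequest) => CursorRequest;
}

async function handleMessages<CursorRequest>(
  req: AnthropicRequest,
  deps: HandlerDeps<CursorRequest>,
): Promise<CursorRequest> {
  // 1. Vision step on the original Anthropic blocks, while image blocks
  //    are still in their native shape.
  const prepared =
    deps.visionEnabled && deps.hasImageContent(req)
      ? await deps.preprocessImages(req)
      : req;
  // 2. Only then convert to the Cursor format.
  return deps.convertToCursor(prepared);
}
```

Checking on the Anthropic-format body avoids relying on a post-conversion check in openai-handler.ts, which is where the issue reports images going undetected.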


Contributed by Klement Gunndu

Metadata


Assignees

No one assigned

Labels

No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
