
feat: canonical model client types, protocols, and LiteLLM bridge adapter#359

Merged
nabinchha merged 24 commits into main from nm/overhaul-model-facade-guts-pr1
Mar 5, 2026

Conversation

@nabinchha
Contributor

📋 Summary

Introduces a provider-agnostic ModelClient adapter boundary under engine/models/clients/ as the foundation for replacing LiteLLM with native provider adapters. This is PR-1 of the model facade overhaul described in plans/343/model-facade-overhaul-plan-step-1.md.

PR-1 is intentionally non-invasive: no ModelFacade call-site changes, no provider routing changes, no retry/throttle migration.

🔄 Changes

✨ Added

  • Canonical request/response types (clients/types.py) — ChatCompletionRequest/Response, EmbeddingRequest/Response, ImageGenerationRequest/Response, plus supporting types (Usage, ToolCall, ImagePayload, AssistantMessage)
  • ModelClient protocol (clients/base.py) — Composable protocols for chat completion, embeddings, and image generation with sync/async parity, capability checks, and close/aclose lifecycle methods
  • Canonical error taxonomy (clients/errors.py) — ProviderError with 13 ProviderErrorKind values, deterministic HTTP status mapping (401→AUTHENTICATION, 403→PERMISSION_DENIED, etc.), and proper Python exception chaining via __cause__
  • LiteLLM bridge adapter (clients/adapters/litellm_bridge.py) — Wraps the existing LiteLLM router and normalizes responses into canonical types. Handles all three operations (chat, embeddings, images) with robust parsing for tool calls, image format waterfall (data URIs, base64, URLs, dicts), list-of-blocks content coercion, and usage extraction
  • Architecture notes (model-facade-overhaul-pr-1-architecture-notes.md) — Documents canonical adapter boundary contract, bridge purpose, and planned follow-on
  • 33 unit tests across test_client_errors.py and test_litellm_bridge.py — Covers error mapping (12 parametrized status codes), exception chaining, sync/async completion, embeddings, image generation (chat + diffusion paths), list-content coercion, tool-call argument serialization, empty-choices edge case, and lifecycle methods
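To make the error-taxonomy piece concrete, here is a minimal sketch of the deterministic status mapping and `__cause__` chaining described above. The names (`ProviderErrorKind` members shown, `map_status_to_kind`, `wrap_http_error`) are illustrative assumptions, not the actual API in `clients/errors.py`:

```python
from enum import Enum


class ProviderErrorKind(Enum):
    # Illustrative subset of the 13 kinds described in the summary
    AUTHENTICATION = "authentication"
    PERMISSION_DENIED = "permission_denied"
    RATE_LIMITED = "rate_limited"
    UNKNOWN = "unknown"


# Hypothetical deterministic HTTP status -> kind table
_STATUS_TO_KIND = {
    401: ProviderErrorKind.AUTHENTICATION,
    403: ProviderErrorKind.PERMISSION_DENIED,
    429: ProviderErrorKind.RATE_LIMITED,
}


class ProviderError(Exception):
    def __init__(self, kind: ProviderErrorKind, message: str):
        super().__init__(message)
        self.kind = kind
        self.message = message


def map_status_to_kind(status_code: int) -> ProviderErrorKind:
    # Unrecognized statuses fall back to a catch-all kind
    return _STATUS_TO_KIND.get(status_code, ProviderErrorKind.UNKNOWN)


def wrap_http_error(status_code: int, exc: Exception) -> ProviderError:
    # Preserve the original exception via Python's __cause__ chaining
    err = ProviderError(map_status_to_kind(status_code), str(exc))
    err.__cause__ = exc
    return err
```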

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • clients/errors.py — _extract_response_text tries structured JSON extraction before falling back to raw text. This matters because httpx response.text contains the full body (including raw JSON), so without this priority order every error message from providers with structured error bodies would be a raw JSON blob
  • clients/adapters/litellm_bridge.py — Image parsing waterfall in _parse_image_payload handles 5+ response formats. Verify coverage against known provider response shapes
  • clients/base.py — the ModelClient protocol includes close/aclose lifecycle methods for PR-2 registry/resource-teardown wiring
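The structured-first priority order for error text can be sketched as follows. This is a hedged approximation of the behavior described above, not the actual `_extract_response_text` implementation; the real helper takes an httpx response object rather than a string:

```python
import json


def extract_response_text(body: str) -> str:
    """Prefer a structured error message; fall back to the raw body.

    Illustrative sketch: try common structured shapes ({"error":
    {"message": ...}} and {"detail": ...}) before returning raw text.
    """
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return body  # not JSON -- raw text is the best we have
    if isinstance(payload, dict):
        error = payload.get("error")
        if isinstance(error, dict) and isinstance(error.get("message"), str):
            return error["message"]
        if isinstance(payload.get("detail"), str):
            return payload["detail"]
    return body
```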

🤖 Generated with AI

@nabinchha nabinchha requested a review from a team as a code owner February 28, 2026 00:11

greptile-apps bot commented Feb 28, 2026

Greptile Summary

This PR establishes a provider-agnostic ModelClient adapter boundary under engine/models/clients/ as PR-1 of the model-facade overhaul. It introduces canonical request/response types (ChatCompletionRequest/Response, EmbeddingRequest/Response, ImageGenerationRequest/Response), a ModelClient composable Protocol hierarchy, a canonical error taxonomy (ProviderError + 13 ProviderErrorKind values with deterministic HTTP-status mapping), a LiteLLMBridgeClient that wraps the existing LiteLLM router, and shared parsing helpers with full sync/async parity. The changes are intentionally non-invasive (no call-site changes to ModelFacade) and are well-covered by 33 unit tests.

Previously flagged critical issues have been verified as resolved:

  • HTTP-date Retry-After parsing now handles both delay-seconds and HTTP-date formats
  • FastAPI list-format detail arrays are correctly extracted in error messages
  • acompletion no longer blocks the event loop (uses aparse_chat_completion_response)
  • Image download failures log at WARNING level for operator visibility
  • total_tokens auto-computation uses coerced token values

One remaining concern:

  • _handle_non_provider_errors sets ProviderError.message = str(exc), which can expose raw JSON blobs for LiteLLM errors that embed full HTTP response bodies in their string representation. The project already has _extract_structured_message and _extract_response_text helpers in errors.py, but they require an HttpResponse object unavailable at this call site.
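The boundary-normalization pattern at issue can be sketched as a context manager. This is an assumption-laden approximation of `_handle_non_provider_errors` (the real helper also maps status codes to ProviderErrorKind); the minimal `ProviderError` here is a stand-in for the canonical type in `clients/errors.py`:

```python
from contextlib import contextmanager


class ProviderError(Exception):
    """Minimal stand-in for the canonical error type."""

    def __init__(self, message: str):
        super().__init__(message)
        self.message = message


@contextmanager
def handle_non_provider_errors():
    try:
        yield
    except ProviderError:
        raise  # already canonical; pass through untouched
    except Exception as exc:
        # str(exc) may embed a raw HTTP body for LiteLLM errors -- the
        # observability concern flagged above; a structured-message
        # extractor would tighten this up.
        raise ProviderError(str(exc)) from exc
```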

Confidence Score: 4/5

  • Safe to merge as a non-invasive PR-1 foundation; one minor style concern about error message formatting.
  • This PR establishes a solid foundation for the model facade overhaul with well-tested, non-invasive changes. All previously-flagged critical issues (event-loop blocking, HTTP-date retry parsing, FastAPI error extraction, token computation) have been resolved. The remaining concern is a style-level issue (error message clarity when exceptions are stringified) that doesn't affect correctness but would improve observability. No existing call-sites are modified, and the new code is comprehensively tested.
  • packages/data-designer-engine/src/data_designer/engine/models/clients/adapters/litellm_bridge.py (error message extraction in _handle_non_provider_errors)

Sequence Diagram

sequenceDiagram
    participant Caller
    participant LiteLLMBridgeClient
    participant _handle_non_provider_errors
    participant LiteLLMRouter
    participant parsing

    Caller->>LiteLLMBridgeClient: completion(ChatCompletionRequest)
    LiteLLMBridgeClient->>_handle_non_provider_errors: enter context
    LiteLLMBridgeClient->>LiteLLMRouter: router.completion(model, messages, **optional_fields)
    alt Provider error
        LiteLLMRouter-->>_handle_non_provider_errors: raises Exception (w/ status_code)
        _handle_non_provider_errors-->>Caller: raises ProviderError(kind, message=str(exc))
    else Success
        LiteLLMRouter-->>LiteLLMBridgeClient: raw LiteLLM response
        LiteLLMBridgeClient->>_handle_non_provider_errors: exit (no error)
        LiteLLMBridgeClient->>parsing: parse_chat_completion_response(response)
        parsing->>parsing: extract_tool_calls, extract_images_from_chat_message, extract_usage
        parsing-->>LiteLLMBridgeClient: ChatCompletionResponse
        LiteLLMBridgeClient-->>Caller: ChatCompletionResponse
    end

    Caller->>LiteLLMBridgeClient: generate_image(ImageGenerationRequest)
    alt messages is not None
        LiteLLMBridgeClient->>LiteLLMRouter: router.completion(model, messages, **image_kwargs)
        LiteLLMRouter-->>LiteLLMBridgeClient: chat response
        LiteLLMBridgeClient->>parsing: extract_images_from_chat_response(response)
    else messages is None
        LiteLLMBridgeClient->>LiteLLMRouter: router.image_generation(prompt, model, **image_kwargs)
        LiteLLMRouter-->>LiteLLMBridgeClient: image response
        LiteLLMBridgeClient->>parsing: extract_images_from_image_response(response)
    end
    parsing-->>LiteLLMBridgeClient: list[ImagePayload]
    LiteLLMBridgeClient-->>Caller: ImageGenerationResponse

Last reviewed commit: 61024c0


@andreatgretel andreatgretel left a comment


(moved to line comments below)


@andreatgretel andreatgretel left a comment


Ran some smoke tests against edge cases in the bridge helpers. Most things held up well -- malformed responses, missing choices, None messages, non-list embeddings all degrade gracefully. Plain text content doesn't false-positive as an image either.

One thing that came up: _extract_usage returns a Usage(input_tokens=None, output_tokens=None, ...) object (not None) when providers return token counts as strings (e.g. "10" instead of 10). That's because _to_int_or_none only handles int and float, so the string passes the is None guard on line 319 but then gets coerced to None by _to_int_or_none. End result: callers can't rely on if usage is not None to check if usage data was actually captured. Probably worth adding string handling to _to_int_or_none.


@andreatgretel andreatgretel left a comment


Thinking ahead to PR-3+: a bunch of the parsing helpers in the bridge (_parse_image_payload, _extract_images_from_chat_message, _extract_usage, _coerce_message_content) will be needed by the native OpenAI adapter too -- the image extraction waterfall in the plan is basically the same logic. Right now they're private functions inside the bridge module, which is supposed to be temporary. Might be worth pulling them into a shared clients/parsing.py (or similar) before PR-2 adds dependencies on the bridge. Easier to move now than later.


@andreatgretel andreatgretel left a comment


Requesting changes for 2 small but meaningful things before we merge:

  1. In LiteLLMBridgeClient, raw router/LiteLLM exceptions can still leak out. At this layer, we should normalize those into ProviderError/ProviderErrorKind so callers only see the canonical error shape.

  2. There’s a small usage parsing edge case: if a provider returns token counts as strings (like "10"), _extract_usage can return a Usage object where token fields end up None. That makes usage is not None look like we captured usage when we didn’t. Can we either parse numeric strings in _to_int_or_none or return None when normalized usage is empty?

Everything else I left is optional cleanup/follow-up.

…provements

- Wrap all LiteLLM router calls in try/except to normalize raw exceptions
  into canonical ProviderError at the bridge boundary (blocking review item)
- Extract reusable response-parsing helpers into clients/parsing.py for
  shared use across future native adapters
- Add async image parsing path using httpx.AsyncClient to avoid blocking
  the event loop in agenerate_image
- Add retry_after field to ProviderError for future retry engine support
- Fix _to_int_or_none to parse numeric strings from providers
- Create test conftest.py with shared mock_router/bridge_client fixtures
- Parametrize duplicate image generation and error mapping tests
- Add tests for exception wrapping across all bridge methods
@nabinchha
Contributor Author

Requesting changes for 2 small but meaningful things before we merge:

  1. In LiteLLMBridgeClient, raw router/LiteLLM exceptions can still leak out. At this layer, we should normalize those into ProviderError/ProviderErrorKind so callers only see the canonical error shape.
  2. There’s a small usage parsing edge case: if a provider returns token counts as strings (like "10"), _extract_usage can return a Usage object where token fields end up None. That makes usage is not None look like we captured usage when we didn’t. Can we either parse numeric strings in _to_int_or_none or return None when normalized usage is empty?

Everything else I left is optional cleanup/follow-up.

Thanks @andreatgretel! Addressed in ec5ed9b

@nabinchha nabinchha requested a review from andreatgretel March 4, 2026 16:56
…larity

- Parse RFC 7231 HTTP-date strings in Retry-After header (used by
  Azure and Anthropic during rate-limiting) in addition to numeric
  delay-seconds
- Clarify collect_non_none_optional_fields docstring explaining why
  f.default is None is the correct check for optional field forwarding
- Add tests for HTTP-date and garbage Retry-After values
- Fix misleading comment about prompt field defaults in _IMAGE_EXCLUDE
- Handle list-format detail arrays in _extract_structured_message for
  FastAPI/Pydantic validation errors
- Document scope boundary for vision content in collect_raw_image_candidates
Comment on lines +120 to +128
def parse_image_payload(raw_image: Any) -> ImagePayload | None:
    try:
        result = resolve_image_payload(raw_image)
        if isinstance(result, str):
            return ImagePayload(b64_data=load_image_url_to_base64(result), mime_type=None)
        return result
    except Exception:
        logger.debug("Unable to parse image payload from response object.", exc_info=True)
        return None


Silent image download failures produce empty response with no diagnostic

parse_image_payload (and its async twin aparse_image_payload) catch every exception from load_image_url_to_base64 and log it only at DEBUG level before returning None. Since parse_image_list filters out None values, a caller receives an ImageGenerationResponse.images = [] when all images were URL-format and every download failed — identical to the response for a provider that simply returned no images.

DALL-E 3 (and similar diffusion APIs) commonly return temporary presigned URLs when response_format="url" is requested. If the network download fails (transient 5xx, expired URL, DNS failure), the entire image result is silently lost. There is no WARNING/ERROR log, no raised exception, and no way for callers to distinguish "no images generated" from "images generated but download failed".

Consider logging at WARNING level so operators can diagnose missing images in production:

def parse_image_payload(raw_image: Any) -> ImagePayload | None:
    try:
        result = resolve_image_payload(raw_image)
        if isinstance(result, str):
            return ImagePayload(b64_data=load_image_url_to_base64(result), mime_type=None)
        return result
    except Exception:
        logger.warning("Failed to parse image payload from response object; image dropped.", exc_info=True)
        return None

Same change applies to aparse_image_payload at line 131.


andreatgretel
andreatgretel previously approved these changes Mar 5, 2026

@andreatgretel andreatgretel left a comment


looks good! all the original feedback was addressed. left two minor nits on the image generation exception scope and the total_tokens coercion order, neither blocking.

@nabinchha

looks good! all the original feedback was addressed. left two minor nits on the image generation exception scope and the total_tokens coercion order, neither blocking.

Thanks @andreatgretel! All should be resolved now.

@nabinchha nabinchha requested a review from andreatgretel March 5, 2026 19:30
@nabinchha nabinchha merged commit 68a7f71 into main Mar 5, 2026
47 checks passed