
[BUG] InternalInstructor discards base_url when creating instructor client — breaks OpenAI-compatible endpoints #5204

@dnivra26

Description

When using an OpenAI-compatible provider with a custom base_url (e.g., self-hosted vLLM, Ollama, or any non-OpenAI endpoint), CrewAI's Converter/InternalInstructor silently discards the base_url and hits api.openai.com instead.

The root cause is in internal_instructor.py: _create_instructor_client() extracts only the model name (from self.llm.model) and the provider string (from self.llm.provider), then calls:

return instructor.from_provider(f"{provider}/{model_string}")

The base_url from the LLM object is never forwarded, and instructor.from_provider() creates a fresh OpenAI client that defaults to https://api.openai.com/v1/.

Note: There is a _get_llm_extra_kwargs() method that forwards base_url, but it is guarded behind is_litellm=True, so non-LiteLLM OpenAI-compatible providers are still affected.

Steps to Reproduce

  1. Configure a CrewAI agent with an OpenAI-compatible LLM that has a custom base_url (e.g., vLLM, Ollama, or any self-hosted endpoint)
  2. Set output_pydantic or output_json on a Task so that CrewAI invokes the Converter
  3. The Converter creates an InternalInstructor via converter.py:145-152, passing the full LLM object (which has base_url set)
  4. InternalInstructor._create_instructor_client() (internal_instructor.py:76-101) extracts only self.llm.model and self.llm.provider, then calls instructor.from_provider(f"{provider}/{model_string}"); base_url is lost
  5. The instructor client sends requests to api.openai.com instead of the configured endpoint
  6. Result: ConnectTimeout / ConverterError in environments that cannot reach api.openai.com
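The discard in step 4 can be seen in miniature with a stand-in for the LLM object (the SimpleNamespace below is purely illustrative; the real object is CrewAI's BaseLLM):

```python
from types import SimpleNamespace

# Hypothetical stand-in for an LLM configured against a local vLLM server.
llm = SimpleNamespace(
    model="my-local-model",
    provider="openai",
    base_url="http://localhost:8000/v1",  # custom endpoint the user set
)

# All that _create_instructor_client() forwards to instructor:
provider_string = f"{llm.provider}/{llm.model}"

print(provider_string)  # openai/my-local-model
# base_url never makes it into the provider string, so the instructor
# client falls back to the library's default endpoint (api.openai.com).
```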

Expected behavior

When an LLM is configured with a custom base_url, the InternalInstructor should forward that base_url to the instructor client so that structured output parsing requests go to the configured endpoint, not to api.openai.com.

Screenshots/Code snippets

The buggy code path (internal_instructor.py:76-101):

def _create_instructor_client(self) -> Any:
    import instructor

    if isinstance(self.llm, str):
        model_string = self.llm
    elif self.llm is not None and hasattr(self.llm, "model"):
        model_string = self.llm.model  # ← extracts model name
    else:
        raise ValueError(...)

    if isinstance(self.llm, str):
        provider = self._extract_provider()
    elif self.llm is not None and hasattr(self.llm, "provider"):
        provider = self.llm.provider  # ← extracts provider
    else:
        provider = "openai"

    # ← base_url is NEVER forwarded here
    return instructor.from_provider(f"{provider}/{model_string}")

The incomplete fix: _get_llm_extra_kwargs() exists but is guarded:

def _get_llm_extra_kwargs(self) -> dict[str, Any]:
    # This guard means non-litellm providers never get base_url forwarded
    if not getattr(self.llm, "is_litellm", False):
        return {}  # ← base_url lost for OpenAI-compatible providers

    extra = {}
    for attr in ("api_base", "base_url", "api_key"):
        value = getattr(self.llm, attr, None)
        if value is not None:
            extra[attr] = value
    return extra
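Under that guard, any non-LiteLLM provider gets an empty kwargs dict. A stand-alone reproduction of the guard's logic (mirrored in spirit, not copied verbatim from CrewAI, with dummy LLM objects):

```python
from types import SimpleNamespace
from typing import Any

def get_llm_extra_kwargs(llm: Any) -> dict[str, Any]:
    # Mirror of the guarded logic: bail out for anything not marked litellm.
    if not getattr(llm, "is_litellm", False):
        return {}
    extra: dict[str, Any] = {}
    for attr in ("api_base", "base_url", "api_key"):
        value = getattr(llm, attr, None)
        if value is not None:
            extra[attr] = value
    return extra

# A non-LiteLLM OpenAI-compatible provider (e.g., vLLM) vs. a LiteLLM one:
vllm_llm = SimpleNamespace(base_url="http://localhost:8000/v1", api_key="dummy")
litellm_llm = SimpleNamespace(is_litellm=True, base_url="http://localhost:8000/v1")

print(get_llm_extra_kwargs(vllm_llm))     # {} -- base_url silently dropped
print(get_llm_extra_kwargs(litellm_llm))  # {'base_url': 'http://localhost:8000/v1'}
```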

Operating System

macOS Sonoma

Python Version

3.12

crewAI Version

latest (main branch)

crewAI Tools Version

latest

Virtual Environment

Venv

Evidence

Traced through the source code:

  1. converter.py:145-152: _create_instructor() correctly passes the full LLM object (with base_url):

     def _create_instructor(self):
         return InternalInstructor(
             llm=self.llm,  # ← has base_url set
             model=self.model,
             content=self.text,
         )

  2. internal_instructor.py:76-101: _create_instructor_client() discards base_url:

     model_string = self.llm.model       # only extracts model name
     provider = self.llm.provider        # only extracts provider
     return instructor.from_provider(f"{provider}/{model_string}")  # base_url lost

  3. base_llm.py:123: confirms base_url is a defined field on BaseLLM:

     base_url: str | None = None

  4. _get_llm_extra_kwargs() exists but is guarded by is_litellm:

     if not getattr(self.llm, "is_litellm", False):
         return {}  # non-litellm providers never get base_url forwarded

  5. Result in production: all structured output requests (output_pydantic, output_json) from OpenAI-compatible providers hit api.openai.com → ConnectTimeout → ConverterError.

Possible Solution

Two options:

Option A (minimal fix): In _create_instructor_client(), pass base_url to instructor.from_provider() if the instructor library supports it, or construct an explicit OpenAI(base_url=...) client and use instructor.from_openai(client) instead.

Option B (broader fix): Remove the is_litellm guard from _get_llm_extra_kwargs() so that base_url and api_key are forwarded for all provider types, not just LiteLLM-backed ones. The guard was added because "non-litellm instructor clients (from_provider) don't accept them" — but the fix should be to use a client constructor that does accept them (e.g., instructor.from_openai(OpenAI(base_url=..., api_key=...))) rather than silently dropping the config.
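Option B's core change can be sketched stand-alone, minus the is_litellm gate (dummy LLM object below; attribute names follow the guarded snippet above, but this is an illustrative sketch, not CrewAI's actual patch):

```python
from types import SimpleNamespace
from typing import Any

def get_llm_extra_kwargs_unguarded(llm: Any) -> dict[str, Any]:
    # Same attribute walk as the guarded version, but base_url and
    # api_key are forwarded for every provider type, not just LiteLLM.
    extra: dict[str, Any] = {}
    for attr in ("api_base", "base_url", "api_key"):
        value = getattr(llm, attr, None)
        if value is not None:
            extra[attr] = value
    return extra

# A non-LiteLLM OpenAI-compatible provider now keeps its endpoint config:
vllm_llm = SimpleNamespace(base_url="http://localhost:8000/v1", api_key="dummy")
print(get_llm_extra_kwargs_unguarded(vllm_llm))
# {'base_url': 'http://localhost:8000/v1', 'api_key': 'dummy'}
```

The returned dict could then be used to build an explicit client (e.g., along the lines of instructor.from_openai(OpenAI(**extra))) rather than relying on provider-string resolution.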

Additional context

This affects anyone using OpenAI-compatible endpoints (vLLM, Ollama remote, Azure OpenAI with custom endpoints, etc.) with structured output tasks (output_pydantic or output_json). The agent's main LLM calls work fine because they go through the LLM class directly, but the Converter/InternalInstructor path bypasses the LLM and creates its own client — losing the base_url in the process.

There are partial fixes on feature branches (commits 9dabb3e, 9bdc7b9 for LiteLLM, and f9ae6c5 for A2A) but none address the non-LiteLLM OpenAI-compatible path on main.
