[BUG] InternalInstructor discards base_url when creating instructor client — breaks OpenAI-compatible endpoints #5204
Description
When using an OpenAI-compatible provider with a custom base_url (e.g., self-hosted vLLM, Ollama, or any non-OpenAI endpoint), CrewAI's Converter/InternalInstructor silently discards the base_url and hits api.openai.com instead.
The root cause is in internal_instructor.py — `_create_instructor_client()` extracts only the model name and provider string from `self.llm.model`, then calls:

```python
return instructor.from_provider(f"{provider}/{model_string}")
```

The `base_url` from the LLM object is never forwarded. `instructor.from_provider()` creates a fresh OpenAI client defaulting to `api.openai.com/v1/`.
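The failure mode can be modeled in isolation: a `from_provider`-style factory that receives only a `"provider/model"` string has nowhere to carry a custom endpoint. The stub below is a hypothetical stand-in for `instructor.from_provider()`, not the real library code:

```python
# Toy model of the bug: the factory only gets a single string,
# so it can only fall back to the default OpenAI endpoint.
DEFAULT_BASE_URL = "https://api.openai.com/v1/"


def from_provider_stub(provider_model: str) -> dict:
    """Hypothetical stand-in for instructor.from_provider().
    There is no base_url parameter, so the default always wins."""
    provider, _, model = provider_model.partition("/")
    return {"provider": provider, "model": model, "base_url": DEFAULT_BASE_URL}


client = from_provider_stub("openai/my-vllm-model")
print(client["base_url"])  # the configured custom endpoint never made it in
```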
Note: There is a _get_llm_extra_kwargs() method that forwards base_url, but it is guarded behind is_litellm=True, so non-LiteLLM OpenAI-compatible providers are still affected.
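The effect of that guard can be shown with a standalone sketch — `FakeLLM` below is a hypothetical stand-in for CrewAI's LLM object, and `get_llm_extra_kwargs` copies the guarded method's logic for illustration:

```python
from typing import Any


class FakeLLM:
    """Hypothetical OpenAI-compatible LLM config (not real CrewAI code)."""
    model = "my-vllm-model"
    provider = "openai"
    base_url = "http://localhost:8000/v1"
    api_key = "dummy-key"
    is_litellm = False  # not LiteLLM-backed, so it hits the guard


def get_llm_extra_kwargs(llm: Any) -> dict[str, Any]:
    # Same logic as _get_llm_extra_kwargs(): the is_litellm guard
    # short-circuits before base_url is ever read.
    if not getattr(llm, "is_litellm", False):
        return {}
    extra = {}
    for attr in ("api_base", "base_url", "api_key"):
        value = getattr(llm, attr, None)
        if value is not None:
            extra[attr] = value
    return extra


print(get_llm_extra_kwargs(FakeLLM()))  # {} — base_url silently dropped
```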
Steps to Reproduce
- Configure a CrewAI agent with an OpenAI-compatible LLM that has a custom `base_url` (e.g., vLLM, Ollama, or any self-hosted endpoint)
- Set `output_pydantic` or `output_json` on a Task so that CrewAI invokes the Converter
- The Converter creates an `InternalInstructor` via `converter.py:145-152`, passing the full LLM object (which has `base_url` set)
- `InternalInstructor._create_instructor_client()` (`internal_instructor.py:76-101`) extracts only `self.llm.model` and `self.llm.provider`, then calls `instructor.from_provider(f"{provider}/{model_string}")` — `base_url` is lost
- The instructor client sends requests to `api.openai.com` instead of the configured endpoint
- Result: `ConnectTimeout` / `ConverterError` in environments that cannot reach `api.openai.com`
Expected behavior
When an LLM is configured with a custom `base_url`, the `InternalInstructor` should forward that `base_url` to the instructor client so that structured output parsing requests go to the configured endpoint — not to `api.openai.com`.
Screenshots/Code snippets
The buggy code path (`internal_instructor.py:76-101`):

```python
def _create_instructor_client(self) -> Any:
    import instructor

    if isinstance(self.llm, str):
        model_string = self.llm
    elif self.llm is not None and hasattr(self.llm, "model"):
        model_string = self.llm.model  # ← extracts model name
    else:
        raise ValueError(...)

    if isinstance(self.llm, str):
        provider = self._extract_provider()
    elif self.llm is not None and hasattr(self.llm, "provider"):
        provider = self.llm.provider  # ← extracts provider
    else:
        provider = "openai"

    # ← base_url is NEVER forwarded here
    return instructor.from_provider(f"{provider}/{model_string}")
```

The incomplete fix — `_get_llm_extra_kwargs()` exists but is guarded:
```python
def _get_llm_extra_kwargs(self) -> dict[str, Any]:
    # This guard means non-litellm providers never get base_url forwarded
    if not getattr(self.llm, "is_litellm", False):
        return {}  # ← base_url lost for OpenAI-compatible providers
    extra = {}
    for attr in ("api_base", "base_url", "api_key"):
        value = getattr(self.llm, attr, None)
        if value is not None:
            extra[attr] = value
    return extra
```

Operating System
macOS Sonoma
Python Version
3.12
crewAI Version
latest (main branch)
crewAI Tools Version
latest
Virtual Environment
Venv
Evidence
Traced through the source code:

- `converter.py:145-152` — `_create_instructor()` correctly passes the full LLM object (with `base_url`):

```python
def _create_instructor(self):
    return InternalInstructor(
        llm=self.llm,  # ← has base_url set
        model=self.model,
        content=self.text,
    )
```

- `internal_instructor.py:76-101` — `_create_instructor_client()` discards `base_url`:

```python
model_string = self.llm.model  # only extracts model name
provider = self.llm.provider  # only extracts provider
return instructor.from_provider(f"{provider}/{model_string}")  # base_url lost
```

- `base_llm.py:123` — confirms `base_url` is a defined field on `BaseLLM`:

```python
base_url: str | None = None
```

- `_get_llm_extra_kwargs()` exists but is guarded by `is_litellm`:

```python
if not getattr(self.llm, "is_litellm", False):
    return {}  # non-litellm providers never get base_url forwarded
```

- Result in production: all structured output requests (`output_pydantic`, `output_json`) from OpenAI-compatible providers hit `api.openai.com` → `ConnectTimeout` → `ConverterError`.
Possible Solution
Two options:
Option A (minimal fix): In `_create_instructor_client()`, pass `base_url` to `instructor.from_provider()` if the instructor library supports it, or construct an explicit `OpenAI(base_url=...)` client and use `instructor.from_openai(client)` instead.
Option B (broader fix): Remove the `is_litellm` guard from `_get_llm_extra_kwargs()` so that `base_url` and `api_key` are forwarded for all provider types, not just LiteLLM-backed ones. The guard was added because "non-litellm instructor clients (from_provider) don't accept them" — but the fix should be to use a client constructor that does accept them (e.g., `instructor.from_openai(OpenAI(base_url=..., api_key=...))`) rather than silently dropping the config.
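A minimal sketch of Option B's kwargs collection with the guard removed. `FakeLLM` is again a hypothetical stub, and the client construction is left as a comment because the exact wiring (`instructor.from_openai` vs. a patched `from_provider`) depends on the instructor version in use:

```python
from typing import Any


class FakeLLM:
    """Hypothetical OpenAI-compatible LLM config (not real CrewAI code)."""
    base_url = "http://localhost:11434/v1"  # e.g., a remote Ollama endpoint
    api_key = "not-needed"
    is_litellm = False


def get_llm_extra_kwargs(llm: Any) -> dict[str, Any]:
    # Option B: no is_litellm guard — forward connection details
    # for every provider type, not just LiteLLM-backed ones.
    extra = {}
    for attr in ("base_url", "api_key"):
        value = getattr(llm, attr, None)
        if value is not None:
            extra[attr] = value
    return extra


extra = get_llm_extra_kwargs(FakeLLM())
print(extra)
# These kwargs would then feed an explicit client, e.g.:
#   client = instructor.from_openai(OpenAI(**extra))
# instead of instructor.from_provider(f"{provider}/{model}").
```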
Additional context
This affects anyone using OpenAI-compatible endpoints (vLLM, Ollama remote, Azure OpenAI with custom endpoints, etc.) with structured output tasks (`output_pydantic` or `output_json`). The agent's main LLM calls work fine because they go through the `LLM` class directly, but the Converter/`InternalInstructor` path bypasses the `LLM` and creates its own client — losing the `base_url` in the process.
There are partial fixes on feature branches (commits 9dabb3e, 9bdc7b9 for LiteLLM, and f9ae6c5 for A2A) but none address the non-LiteLLM OpenAI-compatible path on main.