
[BUG] InternalInstructor discards base_url when creating instructor client — breaks OpenAI-compatible endpoints #5204

@dnivra26

Description

When using an OpenAI-compatible provider with a custom base_url (e.g., self-hosted vLLM, Ollama, or any non-OpenAI endpoint), CrewAI's Converter/InternalInstructor silently discards the base_url and hits api.openai.com instead.

The root cause is in internal_instructor.py: _create_instructor_client() extracts only the model name (from self.llm.model) and the provider string (from self.llm.provider), then calls:

return instructor.from_provider(f"{provider}/{model_string}")

The base_url from the LLM object is never forwarded, and instructor.from_provider() creates a fresh OpenAI client that defaults to https://api.openai.com/v1/.

Note: There is a _get_llm_extra_kwargs() method that forwards base_url, but it is guarded behind is_litellm=True, so non-LiteLLM OpenAI-compatible providers are still affected.

Steps to Reproduce

  1. Configure a CrewAI agent with an OpenAI-compatible LLM that has a custom base_url (e.g., vLLM, Ollama, or any self-hosted endpoint)
  2. Set output_pydantic or output_json on a Task so that CrewAI invokes the Converter
  3. The Converter creates an InternalInstructor via converter.py:145-152, passing the full LLM object (which has base_url set)
  4. InternalInstructor._create_instructor_client() (internal_instructor.py:76-101) extracts only self.llm.model and self.llm.provider, then calls instructor.from_provider(f"{provider}/{model_string}"); base_url is lost
  5. The instructor client sends requests to api.openai.com instead of the configured endpoint
  6. Result: ConnectTimeout / ConverterError in environments that cannot reach api.openai.com
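The discard in step 4 can be seen in miniature with a stand-in for the LLM object (the SimpleNamespace below is purely illustrative; the real object is CrewAI's BaseLLM):

```python
from types import SimpleNamespace

# Hypothetical stand-in for an LLM configured against a local vLLM server.
llm = SimpleNamespace(
    model="my-local-model",
    provider="openai",
    base_url="http://localhost:8000/v1",  # custom endpoint the user set
)

# All that _create_instructor_client() forwards to instructor:
provider_string = f"{llm.provider}/{llm.model}"

print(provider_string)  # openai/my-local-model
# base_url never makes it into the provider string, so the instructor
# client falls back to the library's default endpoint (api.openai.com).
```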

Expected behavior

When an LLM is configured with a custom base_url, the InternalInstructor should forward that base_url to the instructor client so that structured output parsing requests go to the configured endpoint, not to api.openai.com.

Screenshots/Code snippets

The buggy code path (internal_instructor.py:76-101):

def _create_instructor_client(self) -> Any:
    import instructor

    if isinstance(self.llm, str):
        model_string = self.llm
    elif self.llm is not None and hasattr(self.llm, "model"):
        model_string = self.llm.model  # ← extracts model name
    else:
        raise ValueError(...)

    if isinstance(self.llm, str):
        provider = self._extract_provider()
    elif self.llm is not None and hasattr(self.llm, "provider"):
        provider = self.llm.provider  # ← extracts provider
    else:
        provider = "openai"

    # ← base_url is NEVER forwarded here
    return instructor.from_provider(f"{provider}/{model_string}")

The incomplete fix: _get_llm_extra_kwargs() exists but is guarded:

def _get_llm_extra_kwargs(self) -> dict[str, Any]:
    # This guard means non-litellm providers never get base_url forwarded
    if not getattr(self.llm, "is_litellm", False):
        return {}  # ← base_url lost for OpenAI-compatible providers

    extra = {}
    for attr in ("api_base", "base_url", "api_key"):
        value = getattr(self.llm, attr, None)
        if value is not None:
            extra[attr] = value
    return extra
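Under that guard, any non-LiteLLM provider gets an empty kwargs dict. A stand-alone reproduction of the guard's logic (mirrored in spirit, not copied verbatim from CrewAI, with dummy LLM objects):

```python
from types import SimpleNamespace
from typing import Any

def get_llm_extra_kwargs(llm: Any) -> dict[str, Any]:
    # Mirror of the guarded logic: bail out for anything not marked litellm.
    if not getattr(llm, "is_litellm", False):
        return {}
    extra: dict[str, Any] = {}
    for attr in ("api_base", "base_url", "api_key"):
        value = getattr(llm, attr, None)
        if value is not None:
            extra[attr] = value
    return extra

# A non-LiteLLM OpenAI-compatible provider (e.g., vLLM) vs. a LiteLLM one:
vllm_llm = SimpleNamespace(base_url="http://localhost:8000/v1", api_key="dummy")
litellm_llm = SimpleNamespace(is_litellm=True, base_url="http://localhost:8000/v1")

print(get_llm_extra_kwargs(vllm_llm))     # {} -- base_url silently dropped
print(get_llm_extra_kwargs(litellm_llm))  # {'base_url': 'http://localhost:8000/v1'}
```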

Operating System

macOS Sonoma

Python Version

3.12

crewAI Version

latest (main branch)

crewAI Tools Version

latest

Virtual Environment

Venv

Evidence

Traced through the source code:

  1. converter.py:145-152: _create_instructor() correctly passes the full LLM object (with base_url):

     def _create_instructor(self):
         return InternalInstructor(
             llm=self.llm,  # ← has base_url set
             model=self.model,
             content=self.text,
         )

  2. internal_instructor.py:76-101: _create_instructor_client() discards base_url:

     model_string = self.llm.model       # only extracts model name
     provider = self.llm.provider        # only extracts provider
     return instructor.from_provider(f"{provider}/{model_string}")  # base_url lost

  3. base_llm.py:123: confirms base_url is a defined field on BaseLLM:

     base_url: str | None = None

  4. _get_llm_extra_kwargs() exists but is guarded by is_litellm:

     if not getattr(self.llm, "is_litellm", False):
         return {}  # non-litellm providers never get base_url forwarded

  5. Result in production: all structured output requests (output_pydantic, output_json) from OpenAI-compatible providers hit api.openai.com → ConnectTimeout → ConverterError.

Possible Solution

Two options:

Option A (minimal fix): In _create_instructor_client(), pass base_url to instructor.from_provider() if the instructor library supports it, or construct an explicit OpenAI(base_url=...) client and use instructor.from_openai(client) instead.

Option B (broader fix): Remove the is_litellm guard from _get_llm_extra_kwargs() so that base_url and api_key are forwarded for all provider types, not just LiteLLM-backed ones. The guard was added because "non-litellm instructor clients (from_provider) don't accept them" — but the fix should be to use a client constructor that does accept them (e.g., instructor.from_openai(OpenAI(base_url=..., api_key=...))) rather than silently dropping the config.
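Option B's core change can be sketched stand-alone, minus the is_litellm gate (dummy LLM object below; attribute names follow the guarded snippet above, but this is an illustrative sketch, not CrewAI's actual patch):

```python
from types import SimpleNamespace
from typing import Any

def get_llm_extra_kwargs_unguarded(llm: Any) -> dict[str, Any]:
    # Same attribute walk as the guarded version, but base_url and
    # api_key are forwarded for every provider type, not just LiteLLM.
    extra: dict[str, Any] = {}
    for attr in ("api_base", "base_url", "api_key"):
        value = getattr(llm, attr, None)
        if value is not None:
            extra[attr] = value
    return extra

# A non-LiteLLM OpenAI-compatible provider now keeps its endpoint config:
vllm_llm = SimpleNamespace(base_url="http://localhost:8000/v1", api_key="dummy")
print(get_llm_extra_kwargs_unguarded(vllm_llm))
# {'base_url': 'http://localhost:8000/v1', 'api_key': 'dummy'}
```

The returned dict could then be used to build an explicit client (e.g., along the lines of instructor.from_openai(OpenAI(**extra))) rather than relying on provider-string resolution.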

Additional context

This affects anyone using OpenAI-compatible endpoints (vLLM, Ollama remote, Azure OpenAI with custom endpoints, etc.) with structured output tasks (output_pydantic or output_json). The agent's main LLM calls work fine because they go through the LLM class directly, but the Converter/InternalInstructor path bypasses the LLM and creates its own client — losing the base_url in the process.

There are partial fixes on feature branches (commits 9dabb3e, 9bdc7b9 for LiteLLM, and f9ae6c5 for A2A) but none address the non-LiteLLM OpenAI-compatible path on main.
