Add PydanticAIHook to common.ai provider #62546

Merged
kaxil merged 7 commits into apache:main from astronomer:aip-99-phase1-pydantic-ai-hook on Feb 27, 2026

@kaxil (Member) commented Feb 27, 2026

Adds PydanticAIHook to the common.ai provider — a hook for LLM access via pydantic-ai. This ships the connection and hook foundation for AIP-99. Future PRs will add operators (LLMSQLQueryOperator) and decorators (@task.llm_sql_query) on top.

The hook handles Airflow connection credentials and creates pydantic-ai Model and Agent objects. It works with any provider pydantic-ai supports: OpenAI, Anthropic, Google, Bedrock, Groq, Mistral, Ollama, vLLM, etc.

Usage

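```python
from airflow.sdk import dag, task
from airflow.providers.common.ai.hooks.pydantic_ai import PydanticAIHook


@dag(schedule=None)
def my_llm_pipeline():
    @task
    def summarize(text: str) -> str:
        # Credentials come from the Airflow connection; model_id selects the model.
        hook = PydanticAIHook(llm_conn_id="openai_default", model_id="openai:gpt-5")
        agent = hook.create_agent(output_type=str, instructions="Summarize concisely.")
        result = agent.run_sync(text)
        return result.output

    # Calling the task inside the DAG function adds it to the DAG (example input).
    summarize("Apache Airflow is a platform to programmatically author, schedule and monitor workflows.")


# Invoke the @dag-decorated function at module level so DagBag can discover it.
my_llm_pipeline()
```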

Connection fields:

| Field | Purpose | Example |
|---|---|---|
| Password | API key | `sk-...` |
| Host | Base URL (optional) | `http://localhost:11434/v1` for Ollama |
| Extra (JSON) | Model identifier | `{"model": "openai:gpt-5"}` |

Cloud providers that use native auth chains (Bedrock, Vertex) leave the password empty; pydantic-ai then picks up AWS_PROFILE, GOOGLE_APPLICATION_CREDENTIALS, etc. automatically.
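
A minimal sketch of how the three connection fields could map onto model settings. `LLMSettings` and `settings_from_connection` are hypothetical names, not the hook's API, and the rule that an explicit `model_id` overrides the `model` key in Extra is an assumption; the PR does not state the precedence.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical container for values read from an Airflow connection."""

    model_id: str
    api_key: str | None
    base_url: str | None


def settings_from_connection(
    password: str | None,
    host: str | None,
    extra: str | None,
    model_id: str | None = None,
) -> LLMSettings:
    """Map Password, Host and Extra onto LLM settings (illustrative only)."""
    extras = json.loads(extra) if extra else {}
    # Assumption: an explicit model_id takes precedence over extra["model"].
    resolved = model_id or extras.get("model")
    if not resolved:
        raise ValueError("No model configured: pass model_id or set 'model' in extra")
    return LLMSettings(
        model_id=resolved,
        # Empty password becomes None so provider SDKs can fall back to their
        # native credential chains (AWS_PROFILE, GOOGLE_APPLICATION_CREDENTIALS, ...).
        api_key=password or None,
        # Empty host becomes None, meaning "use the provider's default endpoint".
        base_url=host or None,
    )
```

For an Ollama-style setup, only Host and Extra would be filled in, and `api_key` stays `None`.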

Why these choices

get_conn() returns a pydantic-ai Model, not an Agent or a Connection. Airflow convention is that get_conn() returns a reusable SDK client (OpenAIHook → OpenAI client, DbApiHook → DBAPI connection). A pydantic-ai Model is the connection-level object (credentials + model ID). An Agent is session-level (it binds a model to task-specific config), so it is created in create_agent().
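
To make the connection-level vs session-level split concrete, here is a toy sketch with hypothetical `ToyModel`, `ToyAgent` and `ToyHook` classes (not pydantic-ai or Airflow classes). It assumes the hook caches the model it builds in `get_conn()`; the PR does not say whether it does.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyModel:
    """Connection-level object: credentials plus model ID (hypothetical stand-in)."""

    model_id: str
    api_key: str | None = None


@dataclass(frozen=True)
class ToyAgent:
    """Session-level object: binds a model to task-specific configuration."""

    model: ToyModel
    instructions: str
    output_type: type = str


class ToyHook:
    """get_conn() returns the reusable model; create_agent() wraps it per task."""

    def __init__(self, model_id: str, api_key: str | None = None) -> None:
        self._model_id = model_id
        self._api_key = api_key
        self._model: ToyModel | None = None

    def get_conn(self) -> ToyModel:
        # Assumption: build the model once and reuse it, like a cached SDK client.
        if self._model is None:
            self._model = ToyModel(self._model_id, self._api_key)
        return self._model

    def create_agent(self, output_type: type = str, instructions: str = "") -> ToyAgent:
        # Each agent gets its own config but shares the same connection-level model.
        return ToyAgent(self.get_conn(), instructions, output_type)
```

One model, many agents: each task can call `create_agent()` with its own instructions and output type while sharing the same credentials.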

No abstract LLMHook base class. Every Airflow LLM hook (OpenAIHook, CohereHook, GenAIHook) extends BaseHook directly. LLMs don't share a stable interface beyond "send text, get text"; divergence starts immediately with structured output, tools, streaming, and vision. Pydantic-ai's Model protocol already handles the abstraction. We can extract a base class later if a second framework provides real evidence of what a shared interface should look like.
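
A toy illustration of the trade-off, using `typing.Protocol` as a stand-in (pydantic-ai's real `Model` is its own library class, and `TextModel`, `EchoBackend` and `UpperBackend` are hypothetical). Structural typing lets unrelated backends satisfy a "send text, get text" interface without a shared base class, but that interface is all they share; structured output, tools or streaming would each need methods that backends implement differently, which is the divergence described in the paragraph above.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TextModel(Protocol):
    """Hypothetical lowest-common-denominator interface: send text, get text."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    # No shared base class: it matches TextModel purely by method shape.
    def complete(self, prompt: str) -> str:
        return prompt


class UpperBackend:
    def complete(self, prompt: str) -> str:
        return prompt.upper()


def run(model: TextModel, prompt: str) -> str:
    # Callers depend only on the structural interface, not on inheritance.
    return model.complete(prompt)
```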

Credential injection via provider_factory. infer_model() doesn't accept api_key or base_url directly; instead it takes a provider_factory callback that creates provider instances with the credentials. Google Vertex/GLA are special-cased since they use ADC (Application Default Credentials) and don't accept api_key.
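
A sketch of the callback pattern, using hypothetical stand-ins (`ToyProvider`, `ResolvedModel`, `toy_infer_model`, `make_provider_factory`) rather than pydantic-ai's real `infer_model()` and provider classes, whose signatures may differ. The factory is a closure over the connection's credentials; the model-resolution code only passes it the provider name parsed from the `provider:model` string.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyProvider:
    """Hypothetical stand-in for a pydantic-ai provider instance."""

    name: str
    api_key: str | None = None
    base_url: str | None = None


@dataclass
class ResolvedModel:
    provider: ToyProvider
    model_name: str


# Providers that, per the PR, authenticate via ADC instead of an api_key.
ADC_PROVIDERS = {"google-vertex", "google-gla"}


def toy_infer_model(model_id: str, provider_factory: Callable[[str], ToyProvider]) -> ResolvedModel:
    """Split 'provider:model' and ask the factory for a credentialed provider."""
    provider_name, sep, model_name = model_id.partition(":")
    if not sep or not model_name:
        raise ValueError(f"Expected 'provider:model', got {model_id!r}")
    return ResolvedModel(provider_factory(provider_name), model_name)


def make_provider_factory(api_key: str | None, base_url: str | None) -> Callable[[str], ToyProvider]:
    """Build a closure that injects connection credentials into each provider."""

    def factory(provider_name: str) -> ToyProvider:
        if provider_name in ADC_PROVIDERS:
            # Google providers: pass no api_key and rely on default ADC auth.
            return ToyProvider(provider_name)
        return ToyProvider(provider_name, api_key=api_key, base_url=base_url)

    return factory
```

With this shape, `toy_infer_model("openai:gpt-5", make_provider_factory("sk-...", None))` yields a provider carrying the key, while a `google-vertex:` prefix yields one without it.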

Known issues

CI image build conflict: pydantic-ai-slim requires opentelemetry-api>=1.28.0, which transitively pulls in protobuf>=5.0. This conflicts with yandexcloud (protobuf<5) when all providers are installed together with --resolution highest. The same issue was identified in #61794. The provider is state: not-ready, so this doesn't affect releases; it only affects the CI all-providers image build. Waiting on the mailing-list thread: https://lists.apache.org/thread/qbx1b8p3296z5pj1hlg3qfggftgjw4m3

Co-Authored-By: GPK <gopidesupavan@gmail.com>

@kaxil kaxil force-pushed the aip-99-phase1-pydantic-ai-hook branch from a1bd0dc to 6be3efe on February 27, 2026 01:20

@gopidesupavan (Member) left a comment

Cool thanx kaxil :)

@kaxil (Member, Author) commented Feb 27, 2026

> Cool thanx kaxil :)

All your hard work man 🙏

@kaxil kaxil force-pushed the aip-99-phase1-pydantic-ai-hook branch 2 times, most recently from f035e2c to 2d90a0a on February 27, 2026 13:26

Adds a hook for LLM access via pydantic-ai to the common.ai provider.
The hook manages connection credentials and creates pydantic-ai Model
and Agent objects, supporting any provider (OpenAI, Anthropic, Google,
Bedrock, Ollama, vLLM, etc.).

- get_conn() returns a pydantic-ai Model configured with credentials
  from the Airflow connection (api_key, base_url via provider_factory)
- create_agent() creates a pydantic-ai Agent with the hook's model
- test_connection() validates model resolution without an API call
- Connection UI fields: password (API Key), host (base URL), extra (model)
- Google Vertex/GLA providers delegate to default ADC auth

Co-Authored-By: GPK <gopidesupavan@gmail.com>
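
A sketch of model validation that raises `ValueError` and a connection check that reports the result as Airflow's usual `(success, message)` tuple without any network call. `validate_model_id` and `check_connection` are hypothetical names, and requiring the `provider:model` form is an assumption based on the model strings used in this PR.

```python
from __future__ import annotations


def validate_model_id(model_id: str | None) -> str:
    """Raise ValueError for a missing or malformed 'provider:model' identifier."""
    if not model_id:
        raise ValueError("No model specified")
    provider, sep, name = model_id.partition(":")
    if not (provider and sep and name):
        raise ValueError(f"Invalid model identifier {model_id!r}; expected 'provider:model'")
    return model_id


def check_connection(model_id: str | None) -> tuple[bool, str]:
    """Return an Airflow-style (success, message) pair; purely local, no API call."""
    try:
        validate_model_id(model_id)
    except ValueError as err:
        return False, str(err)
    return True, f"Model {model_id!r} resolved"
```
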
@kaxil kaxil force-pushed the aip-99-phase1-pydantic-ai-hook branch from 3427dad to fa55516 on February 27, 2026 13:53

TypeVar on create_agent() lets mypy propagate the output_type
through Agent[None, OutputT] → RunResult[OutputT] → result.output,
so callers like example_pydantic_ai_hook.py don't need type: ignore.

Also fix black-docs blank line in RST code block.

- Move SQLResult inside task function so Sphinx autoapi doesn't
  document Pydantic BaseModel internals (fixes RST indentation errors)
- Add Groq, Ollama, vLLM to spelling wordlist
- Change "parseable" to "valid" in test_connection docstring
- Remove separate code-block from RST (class is now in exampleinclude)
- Import BaseHook from common.compat.sdk for Airflow 2.x/3.x compat
- Import dag/task from common.compat.sdk in example DAG
- Replace AirflowException with ValueError for model validation
- Use @overload for create_agent so mypy handles the default correctly
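
A self-contained sketch of the typing pattern these commits describe: a `TypeVar` carries the requested output type through the agent and its result, and `@overload` gives the no-argument call a concrete `str` default. `ToyHook`, `ToyAgent` and `ToyResult` are hypothetical; pydantic-ai's `Agent` has two type parameters (dependencies and output), and the toy accepts any `str -> T` converter as `output_type` so it type-checks without casts.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Generic, TypeVar, overload

OutputT = TypeVar("OutputT")


@dataclass
class ToyResult(Generic[OutputT]):
    output: OutputT


@dataclass
class ToyAgent(Generic[OutputT]):
    # Toy simplification: output_type is any callable turning raw text into OutputT.
    output_type: Callable[[str], OutputT]

    def run_sync(self, raw_text: str) -> ToyResult[OutputT]:
        # A real agent would call an LLM; here we only convert the given text.
        return ToyResult(self.output_type(raw_text))


class ToyHook:
    @overload
    def create_agent(self, output_type: None = None) -> ToyAgent[str]: ...

    @overload
    def create_agent(self, output_type: Callable[[str], OutputT]) -> ToyAgent[OutputT]: ...

    def create_agent(self, output_type: Callable[[str], Any] | None = None) -> ToyAgent[Any]:
        # With no output_type, default to str; mypy matches the ToyAgent[str] overload.
        return ToyAgent(str if output_type is None else output_type)
```

Under mypy, `reveal_type(ToyHook().create_agent(int).run_sync("42").output)` should report `builtins.int`, and the bare `create_agent()` call should resolve to `ToyAgent[str]`, so callers need no `type: ignore`.
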
@dag-decorated functions must be invoked at module level for
DagBag to discover them. Without the calls, DagBag finds 0 DAGs.
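
A toy model of why the call matters, with hypothetical `toy_dag` and `REGISTRY` standing in for `@dag` and DAG discovery: decorating a function only wraps it, and the DAG object exists only once the wrapper is called. This is a simplification, not how DagBag is implemented.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stand-in for the DAGs a loader can find after importing a file.
REGISTRY: list[str] = []


def toy_dag(func: Callable[[], None]) -> Callable[[], str]:
    """Like @dag: decorating only wraps the function; calling the wrapper builds the DAG."""

    def build() -> str:
        func()  # run the body, which would normally declare tasks
        REGISTRY.append(func.__name__)  # only now does a discoverable "DAG" exist
        return func.__name__

    return build


@toy_dag
def my_llm_pipeline() -> None:
    pass


# Without a module-level my_llm_pipeline() call, REGISTRY stays empty.
```

In the real example file, the fix is a bare `my_llm_pipeline()`-style call at module level after each decorated function.
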
The grpcio>=1.70.0 pin was only applied for Python 3.13 when it was
added in apache#61380, but yandexcloud>=0.328.0 ships generated protobuf
stubs that require grpcio>=1.70.0 at runtime on all Python versions.
@kaxil kaxil merged commit 840bcf3 into apache:main Feb 27, 2026
101 checks passed
AkshayArali pushed a commit to AkshayArali/airflow_630 that referenced this pull request Feb 28, 2026
dominikhei pushed a commit to dominikhei/airflow that referenced this pull request Mar 11, 2026