Add Langchain hook to common ai provider by vikramkoka · Pull Request #67192 · apache/airflow

vikramkoka · 2026-05-19T15:43:03Z

Summary

Adds LangChainHook to the common.ai provider, bridging an Airflow connection to LangChain chat and embedding models. The hook resolves credentials from a connection of type langchain and dispatches to the right vendor implementation via LangChain's universal initializers (init_chat_model and init_embeddings).

Design rationale

Own `langchain` connection type

Vendor names shouldn't be load-bearing across hook families. An earlier revision of this PR reused the existing pydanticai conn_type, but that muddled the UI: a user configuring LangChain had to open a "Pydantic AI" connection form, which is a leaky abstraction. The four pydanticai-* connection shapes also have different field layouts (Azure endpoint+deployment, Bedrock IAM, Vertex GCP), so the "shared conn" story silently misrouted for three of them.

Each framework now owns its conn_type. Future LangChain cloud-auth variants (Bedrock, Vertex, Azure) will follow the per-vendor-subclass pattern already established by PydanticAIBedrockHook / PydanticAIVertexHook / PydanticAIAzureHook.

Vendor-agnostic dispatch via `init_chat_model` and `init_embeddings`

The hook uses langchain.chat_models.init_chat_model("provider:name", api_key=..., base_url=...) for chat and the parallel langchain.embeddings.init_embeddings("provider:name", ...) for embeddings. Dispatch covers any provider those initializers support that accepts the api_key + optional base_url credential shape: OpenAI itself, OpenAI-compatible endpoints (Ollama, vLLM, LM Studio) via the openai: prefix + custom host, Anthropic, Groq, Mistral AI chat, DeepSeek, and others.

Providers with bespoke auth (Bedrock, Vertex, Azure for chat; Cohere, HuggingFace, Mistral embeddings, Bedrock embeddings) reject the api_key/base_url kwarg shape and are deferred to per-vendor subclasses. The docs scope the listed providers honestly so users don't hit a ValidationError at runtime trying a provider that looked supported on paper.
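The api_key + optional base_url credential shape can be sketched as follows. This is a minimal illustration, not the hook's actual code: `ConnFields`, `build_init_kwargs`, and `make_chat_model` are hypothetical names, and the mapping of `password` to `api_key` and `host` to `base_url` is taken from the Usage section. The `init_chat_model` call is imported lazily and needs both `langchain` and the vendor's integration package installed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ConnFields:
    """Hypothetical stand-in for the fields read from an Airflow connection."""

    password: str | None = None  # API key
    host: str | None = None  # optional custom base URL


def build_init_kwargs(conn: ConnFields) -> dict[str, Any]:
    """Forward only the credentials that are actually set on the connection."""
    kwargs: dict[str, Any] = {}
    if conn.password:
        kwargs["api_key"] = conn.password
    if conn.host:
        kwargs["base_url"] = conn.host
    return kwargs


def make_chat_model(model: str, conn: ConnFields):
    """Dispatch on a "provider:name" identifier, e.g. "openai:<model-name>"."""
    # Lazy import: this module still loads when langchain is not installed.
    from langchain.chat_models import init_chat_model

    return init_chat_model(model, **build_init_kwargs(conn))
```

Omitting `base_url` when `host` is empty leaves the vendor's default endpoint in place, which is presumably why the base URL is optional on the connection.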

Single hook serves chat + embeddings; optional `embed_conn_id`

get_chat_model() reads llm_conn_id + llm_model; get_embedding_model() reads embed_conn_id (falls back to llm_conn_id) + embed_model. The common one-provider case stays a single hook instance. When chat and embeddings live on different API keys (premium chat vs free-tier embeddings), pass an explicit embed_conn_id.
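The fallback rule can be sketched in a few lines. `ConnIdResolver` and the two non-default connection ids are hypothetical names used only for illustration; the real hook carries more state than this.

```python
from __future__ import annotations


class ConnIdResolver:
    """Hypothetical sketch of the conn-id fallback; not the hook itself."""

    def __init__(self, llm_conn_id: str, embed_conn_id: str | None = None):
        self.llm_conn_id = llm_conn_id
        # Embeddings reuse the chat connection unless one is given explicitly.
        self.embed_conn_id = embed_conn_id or llm_conn_id


shared = ConnIdResolver("langchain_default")
split = ConnIdResolver("premium_chat", embed_conn_id="free_tier_embeddings")
print(shared.embed_conn_id)  # langchain_default
print(split.embed_conn_id)  # free_tier_embeddings
```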

`conn.extra_dejson` for parity with `PydanticAIHook`

The hook parses extra via conn.extra_dejson (matching PydanticAIHook), which swallows JSONDecodeError, returns {} for empty values, and applies secret masking. Sibling consistency matters: a user mis-keying their extra JSON gets the same behavior across both hooks.
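The tolerant parsing described here behaves roughly like the sketch below. `tolerant_extra` is a hypothetical helper that only approximates the behaviour stated above; it is not Airflow's implementation, it omits secret masking, and the final `isinstance` guard is an added assumption.

```python
from __future__ import annotations

import json
from typing import Any


def tolerant_extra(extra: str | None) -> dict[str, Any]:
    """Parse a connection's extra field; return {} when it is empty or malformed."""
    if not extra:
        return {}
    try:
        parsed = json.loads(extra)
    except json.JSONDecodeError:
        # Malformed JSON degrades to "no extras" instead of failing the task.
        return {}
    # Assumption: non-object JSON (a list, a number) is treated as empty too.
    return parsed if isinstance(parsed, dict) else {}


print(tolerant_extra('{"model": "openai:<model-name>"}'))
print(tolerant_extra("{not valid json"))  # {}
```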

`[langchain]` extra is framework-only

pip install apache-airflow-providers-common-ai[langchain] installs only langchain itself. Bundling langchain-openai (or any other vendor's integration package) under a framework-named extra would conflate the framework with a vendor choice -- the same kind of mistake as the conn_type. Users install their vendor's LangChain integration package separately (langchain-openai, langchain-anthropic, langchain-groq, etc.).
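Because the vendor package is a separate install, it can help to check for it before a DAG runs. The sketch below is one possible pre-flight check using only the standard library; `missing_packages` is a hypothetical helper, and `langchain_openai` is used as the example import name for the `langchain-openai` distribution.

```python
from __future__ import annotations

from importlib.util import find_spec


def missing_packages(top_level_modules: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in top_level_modules if find_spec(name) is None]


# langchain comes from the [langchain] extra; langchain_openai is installed separately.
print(missing_packages(["langchain", "langchain_openai"]))
```

An empty list means both the framework and the chosen vendor integration are importable.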

Usage

from airflow.providers.common.ai.hooks.langchain import LangChainHook
from airflow.sdk import task


@task
def summarize(text: str) -> str:
    hook = LangChainHook(
        llm_conn_id="langchain_default",
        llm_model="anthropic:claude-3-7-sonnet",
    )
    llm = hook.get_chat_model()
    # .content is typed str | list[str | dict]; wrap it so the str return type holds.
    return str(llm.invoke(f"Summarise: {text}").content)

Configure the langchain_default connection (type langchain) with the API key in password, optionally a custom base URL in host, and optionally extra={"model": "...", "embed_model": "..."} to set default model identifiers on the connection.
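As one possible way to define that connection outside the UI, the sketch below builds a JSON value for Airflow's `AIRFLOW_CONN_<CONN_ID>` environment-variable format. The field mapping follows the paragraph above; the key, the host (a local OpenAI-compatible endpoint), and the model identifiers are placeholders, not values from this PR.

```python
import json

connection = {
    "conn_type": "langchain",
    "password": "<your-api-key>",  # API key
    "host": "http://localhost:11434/v1",  # optional base URL (example value)
    "extra": {
        "model": "openai:<chat-model>",  # default chat model identifier
        "embed_model": "openai:<embedding-model>",  # default embedding model identifier
    },
}

# Export the printed value as AIRFLOW_CONN_LANGCHAIN_DEFAULT.
env_value = json.dumps(connection)
print(env_value)
```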

See example_langchain_hook.py for chat-only, embedding-only, dual-capability, and separate-conn patterns, and example_langchain_tool_agent.py for an end-to-end ReAct agent demo with HITL review.

Gotchas

- Cloud-auth providers (Bedrock, Vertex, Azure) are not covered by the api_key + base_url surface. LangChainBedrockHook / LangChainVertexHook / LangChainAzureHook subclasses (mirroring the pydantic-ai pattern) are deferred to a follow-up.
- Cohere, HuggingFace, Mistral embeddings, etc. require provider-specific credential kwargs (cohere_api_key, AWS auth chain, GCP service-account) that this hook does not forward. They are deferred to the same follow-up.
- default_conn_name is langchain_default, not pydanticai_default. Users adopting this hook need to create a new langchain connection in the UI rather than reusing an existing pydanticai_default. The per-framework conn_type is the right tradeoff; a back-compat alias would carry the wrong abstraction forward.

Deferred follow-ups

- BaseChatHook / BaseAgentHook / BaseEmbeddingHook contract extraction in common.ai. Once that lands, LangChainHook will inherit from BaseChatHook + BaseEmbeddingHook, and operators will dispatch via BaseHook.get_hook(conn_id) instead of hardcoded conn_type checks.
- LangChain cloud-auth variants (Bedrock, Vertex, Azure).
- A @task.langchain decorator. Leaving it out is consistent with the absence of @task.pydantic_ai today; it will land alongside the BaseChatHook refactor.

Was generative AI tooling used to co-author this PR?

Yes (please specify the tool below)

Generated-by: [Claude] following the guidelines

- Adds LangChainHook to bridge Airflow connections to LangChain model constructors (ChatOpenAI, OpenAIEmbeddings), using constructor injection for credentials
- Reuses the existing pydanticai connection type so users configure one connection for PydanticAI, LlamaIndex, and LangChain
- Follows the same pattern as LlamaIndexHook: _resolve_connection_kwargs() extracts api_key and base_url from the Airflow connection and passes them directly to LangChain constructors
- Adds langchain optional dependency extra (langchain>=1.0.0, langchain-openai>=0.3.0)

What's included

- hooks/langchain.py -- LangChainHook(BaseHook) with get_chat_model() and get_embedding_model()
- tests/unit/common/ai/hooks/test_langchain.py -- full test coverage (init, connection resolution, chat model, embedding model)
- docs/hooks/langchain.rst -- hook documentation with usage examples
- provider.yaml -- LangChain integration and hook registration
- pyproject.toml -- langchain optional dependency extra

Design decisions

- BaseHook, not BaseAIHook -- BaseAIHook is still in development. Will migrate in a follow-up PR once it ships.
- Constructor injection -- credentials passed as api_key=/base_url= kwargs to LangChain constructors. No environment variable mutation. Matches the LlamaIndexHook pattern.
- Shared connection type -- reuses pydanticai connection type rather than introducing a new one. One connection works across all three frameworks.
- No @task.langchain yet -- consistent with LlamaIndex (no @task.llamaindex). Deferred to the BaseAIHook migration PR.

- Own `langchain` connection type instead of reusing `pydanticai`, so the UI is honest about which framework a connection configures. The four pydanticai-* conn shapes don't map uniformly to LangChain either, so the "shared conn" framing silently misrouted for three of them.
- Replace hardcoded `langchain_openai.ChatOpenAI` with `langchain.chat_models.init_chat_model("<provider>:<model>", api_key=..., base_url=...)`. Same parallel API for embeddings via `langchain.embeddings.init_embeddings`. Dispatch covers anything those initializers support that accepts the api_key + base_url credential shape (OpenAI, OpenAI-compatible endpoints, Ollama). Providers with bespoke auth (Bedrock, Vertex, Azure, Cohere, HuggingFace, Mistral embeddings) are deferred to per-vendor subclasses, mirroring the pydantic-ai pattern.
- `embed_conn_id` (optional, falls back to `llm_conn_id`) keeps the single hook instance ergonomic for the common case while supporting different API keys for chat and embeddings.
- Parse `conn.extra` via `conn.extra_dejson` for parity with PydanticAIHook (swallows JSONDecodeError, applies secret masking).
- `default_conn_name` resolves at runtime rather than at class-def time, so future per-vendor subclasses (Bedrock/Vertex/Azure) inherit it cleanly.
- Example DAG: build the FAISS vectorstore once in `_build_tools` and close over it (the search tool is invoked many times per agent run), drop the eval-based calculator tool, add a `get_current_utc_time` tool instead.
- Docs explicitly scope the supported provider list to ones whose embedding/chat classes accept the api_key + base_url surface.
- `default_conn_name` is now `langchain_default`.
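The point about resolving `default_conn_name` at runtime rather than at class-definition time can be illustrated with a small sketch. `SketchHook`, `SketchBedrockHook`, and `langchain_bedrock_default` are hypothetical names, not classes or connection ids from this PR.

```python
from __future__ import annotations


class SketchHook:
    """Hypothetical hook showing runtime resolution of the default conn id."""

    default_conn_name = "langchain_default"

    def __init__(self, llm_conn_id: str | None = None):
        # Looked up on self at call time, so a subclass override is honoured.
        # A signature default of `llm_conn_id=default_conn_name` would instead
        # freeze the base-class value when the class body is executed.
        self.llm_conn_id = llm_conn_id or self.default_conn_name


class SketchBedrockHook(SketchHook):
    default_conn_name = "langchain_bedrock_default"  # hypothetical name


print(SketchHook().llm_conn_id)  # langchain_default
print(SketchBedrockHook().llm_conn_id)  # langchain_bedrock_default
```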

- Add `langchain>=1.0.0` and `langchain-openai>=0.3.0` to the `dev` dependency group. The test suite uses `@patch("langchain.chat_models.init_chat_model")` and `@patch("langchain.embeddings.init_embeddings")`, which import the target modules at decorator-resolution time. Without langchain in the dev environment, the LangChainHook tests fail at collection. Mirrors the `pydantic-ai-slim[mcp]` line that's there for MCPHook tests.
- Drop the stale `# TODO: inherit from BaseChatHook ...` comment from the hook. A future contributor adding the `BaseChatHook` contract will refactor every framework hook in one pass; a per-hook TODO doesn't help and the parenthetical was PR-process commentary that shouldn't be in source.
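The dependency on an importable target can be reproduced with the standard library alone: `unittest.mock.patch` has to import the module that owns its target before it can replace the attribute. The sketch below uses a made-up module path, `fake_langchain`, so that it does not depend on the real package; registering stub modules in `sys.modules` is shown only as one way to make such a target resolvable.

```python
import sys
import types
from unittest.mock import patch

TARGET = "fake_langchain.chat_models.init_chat_model"  # hypothetical module path


def can_patch(target: str) -> bool:
    """Return True if mock.patch can import and replace the target."""
    try:
        with patch(target):
            return True
    except ImportError:
        return False


# Without the package installed, entering the patch fails on import.
print(can_patch(TARGET))  # False

# Registering stub modules makes the same target resolvable.
pkg = types.ModuleType("fake_langchain")
sub = types.ModuleType("fake_langchain.chat_models")
sub.init_chat_model = lambda *args, **kwargs: None
pkg.chat_models = sub
with patch.dict(sys.modules, {"fake_langchain": pkg, "fake_langchain.chat_models": sub}):
    print(can_patch(TARGET))  # True
```

Installing the real package in the test environment avoids the stubs entirely, which is the route this commit takes for `langchain`.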

The `[langchain]` extra previously installed `langchain-openai` alongside `langchain` itself. That conflated the framework with a specific vendor's integration package -- the same kind of vendor-specificity leak we fixed in the conn_type. Users wanting Anthropic, Groq, Mistral AI, etc. would get `langchain-openai` for no reason.

- `[langchain]` now installs only `langchain>=1.0.0`. Users install their vendor's LangChain integration package separately (langchain-openai, langchain-anthropic, langchain-groq, etc.).
- Drop `langchain-openai` from the `dev` group too. Hook tests mock `init_chat_model` / `init_embeddings`, neither of which imports vendor classes at decorator-resolution time. `langchain` alone is enough for unit tests to pass.
- Docs updated to list the per-vendor packages users should install alongside the extra.

Mirrors the pydantic_ai hook docs pattern (and the other operator docs in common.ai): runnable snippets live in an example DAG with START/END markers, and `docs/hooks/langchain.rst` `exampleinclude`s them. The doc prose stops drifting from the code that has to actually work.

Adds `example_langchain_hook.py` with four minimal DAGs, one per pattern:

- `howto_hook_langchain_chat` -- get_chat_model() + invoke
- `howto_hook_langchain_embedding` -- get_embedding_model() + embed_documents
- `howto_hook_langchain_chat_and_embedding` -- single hook serves both
- `howto_hook_langchain_different_conns` -- explicit embed_conn_id

The richer ReAct agent demo stays in `example_langchain_tool_agent.py`.

`hooks/langchain.rst` was an orphan; Sphinx with `-W` would fail with "document isn't included in any toctree". Adds `hooks/index.rst` with a `:glob: *` toctree mirroring the `operators/index.rst` pattern, and points the top-level `Hooks` toctree entry at it. Future hooks added to `hooks/` auto-appear in nav with no top-level edits. `hooks/index.rst` includes a small "Choosing a hook" table covering PydanticAIHook and LangChainHook (MCPHook has no hook guide -- it's documented from the connection page).

LangChain's `BaseMessage.content` is typed `str | list[str | dict]` to support multi-modal responses (text + images + tool calls). The example DAGs only exercise the text-only path, but `summarize() -> str` returning `.content` directly fails mypy with "Incompatible return value type". Wraps `.content` in `str(...)` at the three call sites. The other two sites land inside untyped dicts that mypy doesn't flag, but the consistency matters: the docs reference these snippets via `exampleinclude`, so users copy this code into their own typed `@task` functions. Better to show the pattern that works in both cases.
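A stripped-down version of the `str(...)` wrap, using a stand-in class so the snippet runs without LangChain installed. `FakeMessage` is hypothetical; only the shape of its `content` annotation is taken from the description above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a LangChain message; content may be str or list."""

    content: str | list[str | dict]


def summarize(message: FakeMessage) -> str:
    # str(...) narrows the union so the declared return type holds.
    return str(message.content)


print(summarize(FakeMessage(content="A short summary.")))  # A short summary.
```

For the text-only path this is a no-op at runtime; for list content it yields the list's string representation, so multi-modal responses would need real handling instead.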

CI's docs spell-check (sphinxcontrib-spelling) flagged four words not in the global wordlist. Rephrased to use plainer terms instead of padding the wordlist:

- "initialisers" / "initializers" -> "entry-point functions"
- "dispatchable" -> "accepted"
- Dropped the bare verb form "exampleinclude-d" from the example DAG docstring; the docstring now describes the file's content rather than how docs reference it.


* Add LlamaIndex operators to common.ai provider

- Adds LlamaIndexHook to bridge Airflow connections to LlamaIndex's Settings singleton. Reuses the pydanticai connection type, supports separate embedding and LLM connections.
- Adds EmbeddingOperator to chunk documents and produce embedding vectors via LlamaIndex's SentenceSplitter. Input is list[dict(text, metadata)] (same shape as DocumentLoaderOperator output), output includes chunks with vectors ready for downstream vector store ingest operators (pgvector, Pinecone, Weaviate).
- Adds RetrievalOperator to load a persisted LlamaIndex index and perform similarity search. Output is scored chunks ready for synthesis via LLMOperator.

Design notes

All LlamaIndex imports are lazy (inside execute() / method bodies), so modules parse without llama-index installed. The hook currently hardcodes OpenAI embedding/LLM providers; a follow-up PR will refactor to use BaseAIHook for provider-agnostic model resolution when it lands.

What's included

| File | Purpose |
| --- | --- |
| hooks/llamaindex.py | Hook (~110 lines) |
| operators/llamaindex_embedding.py | EmbeddingOperator (~110 lines) |
| operators/llamaindex_retrieval.py | RetrievalOperator (~90 lines) |
| tests/.../test_llamaindex.py | 12 hook tests |
| tests/.../test_llamaindex_embedding.py | 10 operator tests |
| tests/.../test_llamaindex_retrieval.py | 8 operator tests |
| docs/hooks/llamaindex.rst | Hook docs |
| docs/operators/llamaindex_embedding.rst | EmbeddingOperator docs |
| docs/operators/llamaindex_retrieval.rst | RetrievalOperator docs |
| provider.yaml | Integration, hook, operator registration |
| docs/index.rst | LlamaIndex Hook in Guides toctree |
| docs/operators/index.rst | Chooser table rows |

Test plan

- uv run --project providers/common/ai pytest providers/common/ai/tests/unit/common/ai/hooks/test_llamaindex.py -xvs (12 tests)
- uv run --project providers/common/ai pytest providers/common/ai/tests/unit/common/ai/operators/test_llamaindex_embedding.py providers/common/ai/tests/unit/common/ai/operators/test_llamaindex_retrieval.py -xvs (18 tests)
- Hook: init defaults, separate embed_conn_id, connection kwargs extraction, embedding model, LLM, Settings configuration
- EmbeddingOperator: output shape, chunking, index persistence, vector inclusion/omission, splitter params
- RetrievalOperator: output shape, chunk keys, top_k forwarding, multiple results, storage context

---

Was generative AI tooling used to co-author this PR?
- Yes -- Claude Code (Opus 4.6)

Generated-by: Claude Code (Opus 4.6) following https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions

* Refactor LlamaIndex hook + operators: no Settings mutation, BYO models, cloud URIs

Same playbook as #67192 (LangChain) and #67120 (DocumentLoader) plus three LlamaIndex-specific architectural fixes:

Critical fixes

- Stop mutating LlamaIndex's global ``Settings`` singleton. The previous ``LlamaIndexHook.configure_settings()`` wrote ``Settings.embed_model`` / ``Settings.llm`` process-wide, which leaks across concurrent tasks in the same worker. Replaced with per-call ``embed_model=`` / ``llm=`` parameters on ``VectorStoreIndex(...)`` and ``load_index_from_storage(...)``.
- Own ``llamaindex`` connection type instead of squatting on ``pydanticai``. Mirrors the LangChain / CrewAI fix.
- Remove ``documents`` from ``EmbeddingOperator.template_fields``. ``list[dict]`` doesn't survive Jinja stringification, and worse, a user document containing literal ``{{ var.value.api_key }}`` would leak secrets into the embedding store. Bind via ``loader.output`` instead.

BYO embedding/LLM for non-OpenAI vendors

- LlamaIndex doesn't ship an ``init_chat_model`` / ``init_embedding_model`` equivalent (verified in ``llama_index.core.embeddings.utils.resolve_embed_model`` -- only ``"default"`` / ``"local"`` / ``"clip:"`` dispatch). The hook therefore covers OpenAI (matching LlamaIndex's own ``resolve_embed_model("default")`` behaviour) and operators accept a pre-built ``BaseEmbedding`` / ``LLM`` instance to bypass the hook for Cohere / Bedrock / Vertex / HuggingFace / etc.

Cloud-URI persistence

- ``EmbeddingOperator.persist_dir`` and ``RetrievalOperator.index_persist_dir`` accept storage URIs (``s3://``, ``gs://``, ``azure://``) resolved via ``ObjectStoragePath`` and fsspec, matching the merged ``DocumentLoaderOperator`` pattern.
Hook plumbing playbook (mirrors LangChain / CrewAI / DocumentLoader)

- ``conn_type = "llamaindex"`` + new ``connection-types`` entry in ``provider.yaml`` with ``embed_model`` / ``llm_model`` conn-fields.
- ``default_conn_name`` resolves at runtime via ``llm_conn_id: str | None = None``.
- ``_resolve_model`` honours ``conn.extra_dejson`` for parity with the sibling hooks (swallows ``JSONDecodeError``, applies secret masking).
- ``get_ui_field_behaviour`` added.
- ``[llamaindex]`` extra in ``pyproject.toml`` pinning ``llama-index-core``, ``llama-index-embeddings-openai``, ``llama-index-llms-openai`` (enough to back the hook's default OpenAI return values). Same in the ``dev`` group.

Misc operator/test fixes

- Wrap lazy ``llama_index`` imports with ``AirflowOptionalProviderFeatureException`` so missing extras surface cleanly.
- ``RetrievalOperator`` returns ``{"query": ..., "chunks": [...]}`` (was ``"question"``) and ``chunks[*].node_id`` (was the misleading ``"source"`` key).
- ``RetrievalOperator`` raises ``FileNotFoundError`` with a "did you run EmbeddingOperator first?" hint when ``index_persist_dir`` is missing.
- All three test files get an autouse fixture stubbing ``llama_index.*`` in ``sys.modules`` so ``@patch`` resolves without ``llama-index-*`` packages installed in CI's non-DB test env (mirrors #67237).
- New ``example_llamaindex_hook.py`` with ``[START howto_*]`` markers for the docs to ``exampleinclude``.

* Rename LlamaIndex operators with framework prefix; fold in #67189 RAG examples

Per Kaxil's review r3267387604: ``RetrievalOperator`` / ``EmbeddingOperator`` are too generic in the common.ai namespace -- they risk colliding when other frameworks add their own embedding/retrieval operators.
Renamed both with the LlamaIndex prefix:

- ``EmbeddingOperator`` -> ``LlamaIndexEmbeddingOperator``
- ``RetrievalOperator`` -> ``LlamaIndexRetrievalOperator``

Renames applied across the two operator modules, three docs RSTs, the two test files, both example DAGs, and the cross-refs in ``docs/operators/index.rst``, ``docs/hooks/llamaindex.rst``, ``docs/operators/document_loader.rst``, and ``docs/hooks/index.rst``.

Folds in #67189 (``example_llamaindex_rag.py``) which would otherwise sit blocked waiting for this PR to merge. Rewritten for the new API:

- Uses the renamed classes
- Drops ``documents="{{ ti.xcom_pull(...) }}"`` Jinja templating (template_fields removed; bind via ``loader.output`` direct)
- Switches LlamaIndex operators to ``llamaindex_default`` conn (was ``pydanticai_default``); the synthesis-step ``LLMOperator`` keeps ``pydanticai_default`` because it's pydantic-ai-backed (different framework, intentional split documented in the module docstring)
- Adds explicit ``embed_model="text-embedding-3-small"`` to every embedding/retrieval call (new operator validation requires it)
- Fixes the string-reference task chains (``load >> "build_index"`` -> ``load >> build_index``) which weren't valid task dependencies

Closes #67189.

* Address code-review findings on LlamaIndex operators

- Fix ObjectStoragePath conn_id mangling: pass raw URI to LlamaIndex persist_dir= and supply target.fs separately. str(target) returns s3://<conn_id>@<bucket>/..., which fsspec misinterprets.
- Add documents / embed_model / embed_conn_id to template_fields so XComArg resolution fires. The previous "list[dict] doesn't survive stringification" rationale was wrong; Templater unwraps resolvables before Jinja.
- Default llm_conn_id to None on both operators; LlamaIndexHook resolves to default_conn_name at runtime. Hard-coding "llamaindex_default" undid the hook's careful runtime resolution.
- Add embed_conn_id pass-through for separate embedding credentials.
- Replace isinstance(str) duck-typing with hasattr-based BaseEmbedding check; raise TypeError with a clear pointer instead of letting an unresolved XComArg or random object explode later.
- Hoist 'import os' and 'from pathlib import Path' to module top.
- Pad RST title underlines and refresh docs/tests to match the new surface.

* Fix mypy on LlamaIndex embedding operator

- Pass persist_dir as a typed str arg to _persist so the existing None-narrowing # type: ignore comments can go away.
- Cast SentenceSplitter nodes to list[TextNode] for the .text access: the splitter only ever returns TextNode, but the base get_nodes_from_documents signature is typed as list[BaseNode].

* Install llama-index in tests instead of stubbing sys.modules

llama-index-core / -embeddings-openai / -llms-openai were declared in the common.ai provider's dev dependency group but missing from uv.lock, so CI never actually installed them. The tests papered over that by faking out llama_index.* in sys.modules with MagicMocks.

Refresh uv.lock so the packages get installed, then drop the sys.modules manipulation:

- test_llamaindex.py: remove the autouse _stub_llama_index_modules fixture entirely; @patch resolves against the real modules.
- test_llamaindex_embedding.py / test_llamaindex_retrieval.py: replace the _stub_li fixture (sys.modules setitem) with a smaller _li fixture that uses monkeypatch.setattr against real llama_index.core symbols.

* Apply ruff lint/format fixes

---------

Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>

vikramkoka added 2 commits May 19, 2026 16:34

vikramkoka requested review from gopidesupavan and kaxil as code owners May 19, 2026 15:43

boring-cyborg Bot added area:providers kind:documentation provider:common-ai labels May 19, 2026

vikramkoka changed the title ~~Aip99 langchain~~ Add Langchain hook to common ai provider May 19, 2026

Prab-27 reviewed May 19, 2026


Comment thread providers/common/ai/pyproject.toml

kaxil added 7 commits May 19, 2026 22:01

kaxil approved these changes May 19, 2026


kaxil merged commit dcdd124 into main May 19, 2026
8 checks passed

kaxil deleted the aip99-langchain branch May 19, 2026 23:27

This was referenced May 19, 2026

Add error codes mapping with doc pages generation and static check #65423

Draft

Add Apache Arrow provider #52330

Open

Add GCSToAzureBlobStorageOperator for GCS to Azure Blob transfer #64966

Open

kaxil mentioned this pull request May 20, 2026

Fix LangChain hook tests failing when langchain is not installed #67237

Merged

1 task

kaxil merged 9 commits into main from aip99-langchain

vikramkoka commented May 19, 2026 (edited by kaxil)


3 participants
