AIP-99: Add DataFusionToolset#62850
Should we create a new toolset for object stores, e.g. ObjectStoreSQLToolSet(SQLToolset), so that the current SQLToolset stays much cleaner?
Yes, separate please.
Oops, my bad. Old debugging habits.
Pull request overview
Adds a new DataFusionToolset to the Common AI provider, enabling pydantic-ai agents to discover tables, inspect schemas, and run SQL queries against object-store-backed datasets via Apache DataFusion (through DataFusionEngine from providers-common-sql).
Changes:
- Introduces `DataFusionToolset` with `list_tables`, `get_schema`, and `query` tools, plus lazy `DataFusionEngine` initialization.
- Adds unit tests validating tool registration, tool behavior, and engine lazy creation/caching.
- Extends toolset documentation to include `DataFusionToolset` and its security posture.
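To illustrate the shape described in the overview (three tools plus an engine that is created lazily and then cached), here is a minimal sketch. It is not the PR's implementation: `SketchToolset` and `StandInEngine` are hypothetical names, and an in-memory SQLite database stands in for `DataFusionEngine` so the example runs without extra dependencies.

```python
import sqlite3
from functools import cached_property


class StandInEngine:
    """Hypothetical stand-in for a DataFusion-style engine (in-memory SQLite)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
        self.conn.executemany(
            "INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 20.0)]
        )


class SketchToolset:
    """Hypothetical toolset exposing list_tables, get_schema, and query."""

    def __init__(self, engine_factory=StandInEngine):
        self._engine_factory = engine_factory
        self.engine_creations = 0  # counts how often the engine is built

    @cached_property
    def engine(self):
        # Built on first access only; later accesses reuse the cached object.
        self.engine_creations += 1
        return self._engine_factory()

    def list_tables(self):
        rows = self.engine.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]

    def get_schema(self, table):
        # Only accept names the engine already knows about.
        if table not in self.list_tables():
            raise ValueError(f"Unknown table: {table}")
        rows = self.engine.conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        return {row[1]: row[2] for row in rows}  # column name -> declared type

    def query(self, sql):
        cursor = self.engine.conn.execute(sql)
        columns = [description[0] for description in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Constructing `SketchToolset()` does not touch the engine; the first tool call builds it, and every later call reuses the same instance. That is the behavior the unit tests for lazy creation and caching would be expected to check.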
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `providers/common/ai/src/airflow/providers/common/ai/toolsets/datafusion.py` | Implements the new DataFusion-backed toolset and tool call dispatch. |
| `providers/common/ai/tests/unit/common/ai/toolsets/test_datafusion.py` | Adds unit coverage for initialization, tool behavior, errors, and engine resolution. |
| `providers/common/ai/docs/toolsets.rst` | Documents DataFusionToolset, its parameters, and security defaults. |
Comments suppressed due to low confidence (1)
providers/common/ai/docs/toolsets.rst:31
- The intro now says “Three toolsets are included”, but the bullet list still only includes HookToolset and SQLToolset. Please add a DataFusionToolset bullet and update the nearby wording (e.g. “Both implement…”) so the section stays consistent.
Three toolsets are included:
- :class:`~airflow.providers.common.ai.toolsets.hook.HookToolset` — generic
adapter for any Airflow Hook.
- :class:`~airflow.providers.common.ai.toolsets.sql.SQLToolset` — curated
* Add objectstorage support to SQLToolset via DataFusion
* Add DataFusionToolset
* Update tests
* Resolve comments
* Resolve comments
* Resolve comments

Summary
Add DataFusionToolset, which accepts datasource_configs for object stores, enabling LLM agents to query files on object stores (S3, local filesystem, Iceberg) through Apache DataFusion.