Merged
1 change: 1 addition & 0 deletions docs/spelling_wordlist.txt
@@ -837,6 +837,7 @@ instanceTemplates
InstanceType
instanceType
instantiation
InstructionPart
integrations
ints
intvl
1 change: 1 addition & 0 deletions providers/common/ai/docs/index.rst
@@ -129,6 +129,7 @@ Dependent package
================================================================================================================== =================
`apache-airflow-providers-common-compat <https://airflow.apache.org/docs/apache-airflow-providers-common-compat>`_ ``common.compat``
`apache-airflow-providers-common-sql <https://airflow.apache.org/docs/apache-airflow-providers-common-sql>`_ ``common.sql``
`apache-airflow-providers-git <https://airflow.apache.org/docs/apache-airflow-providers-git>`_ ``git``
`apache-airflow-providers-standard <https://airflow.apache.org/docs/apache-airflow-providers-standard>`_ ``standard``
================================================================================================================== =================

2 changes: 1 addition & 1 deletion providers/common/ai/docs/operators/agent.rst
@@ -304,7 +304,7 @@ Parameters
- ``output_type``: Expected output type (default: ``str``). Set to a Pydantic
``BaseModel`` for structured output.
- ``toolsets``: List of pydantic-ai toolsets (``SQLToolset``, ``HookToolset``,
etc.).
``AgentSkillsToolset`` for :ref:`agent-skills`, etc.).
- ``enable_tool_logging``: Wrap each toolset in
:class:`~airflow.providers.common.ai.toolsets.logging.LoggingToolset` so that
every tool call is logged in real time. Default ``True``.
95 changes: 95 additions & 0 deletions providers/common/ai/docs/toolsets.rst
@@ -313,6 +313,101 @@ This works because PydanticAI's MCP server classes implement
code instead of being managed through Airflow connections and secret backends.


.. _agent-skills:

``AgentSkillsToolset``
----------------------

:class:`~airflow.providers.common.ai.toolsets.skills.AgentSkillsToolset` loads
`Agent Skills <https://agentskills.io>`__ -- ``SKILL.md`` bundles (instructions,
and optionally scripts and resources) that the model discovers and loads *on
demand*. Only a compact catalog of skill names and descriptions sits in the
prompt until the model decides it needs one, so a large skill library costs few
tokens until used (progressive disclosure).
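
The progressive-disclosure idea above can be illustrated with a small, standard-library-only sketch. It is not how ``pydantic-ai-skills`` builds its catalog; ``parse_front_matter`` and ``build_catalog`` are hypothetical helpers, and the front matter is assumed to contain only simple ``key: value`` lines:

```python
from __future__ import annotations


def parse_front_matter(skill_md: str) -> dict[str, str]:
    """Read the simple ``key: value`` pairs between the leading ``---`` fences."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_catalog(skills: dict[str, str]) -> str:
    """Render one line per skill; the skill bodies are left out of the result."""
    entries = []
    for text in skills.values():
        meta = parse_front_matter(text)
        entries.append(f"- {meta['name']}: {meta['description']}")
    return "\n".join(entries)


# Illustrative skill text (hypothetical, shortened).
SKILL_MD = """---
name: sql-reporting
description: Conventions for analytics SQL.
---
# SQL Reporting Skill

1. Always SELECT explicit column names.
"""

catalog = build_catalog({"sql-reporting/SKILL.md": SKILL_MD})
print(catalog)
```

Only the catalog string would be placed in the prompt up front; the full body of a skill is fetched later, when the model asks for it.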

It is backed by the community `pydantic-ai-skills
<https://github.com/DougTrajano/pydantic-ai-skills>`__ package (MIT); native
progressive disclosure is in flight upstream in `pydantic/pydantic-ai#5230
<https://github.com/pydantic/pydantic-ai/pull/5230>`__. Install the optional
extra to use it:

.. code-block:: bash

    pip install "apache-airflow-providers-common-ai[skills]"

Each source is a local directory or a connection-resolved
:class:`~airflow.providers.common.ai.skills.GitSkills`. Sources are resolved when
the agent enters the toolset, on the worker -- never while the DAG processor
parses the file -- so a Git token is never baked into the serialized DAG, and
cloned repositories are removed when the run ends.

A local directory of ``SKILL.md`` bundles:

.. exampleinclude:: /../../ai/src/airflow/providers/common/ai/example_dags/example_agent_skills.py
    :language: python
    :start-after: [START howto_operator_agent_skills_local]
    :end-before: [END howto_operator_agent_skills_local]

A Git repository, with credentials from an Airflow connection:

.. exampleinclude:: /../../ai/src/airflow/providers/common/ai/example_dags/example_agent_skills.py
    :language: python
    :start-after: [START howto_operator_agent_skills_git]
    :end-before: [END howto_operator_agent_skills_git]

For a private repository, point ``conn_id`` at a
:doc:`git connection <apache-airflow-providers-git:connections/git>`; credentials
are resolved through the Git provider's ``GitHook`` (an HTTPS token in the
connection password, or an SSH key in the connection's extra). A plain ``http://``
URL with ``conn_id`` is rejected so a credential is never sent in cleartext, and a
``repo_url`` that embeds a username/password is rejected (use ``conn_id``). After
cloning, the credential is stripped from the checkout's ``.git/config``. As with
any ``git clone``, the worker's own git configuration (credential helpers, SSH
agent) may still apply, so run workers without ambient git credentials if you
need strict isolation.
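
The URL rules above could be checked roughly as follows. This is a minimal sketch using only the standard library; ``validate_repo_url`` is a hypothetical helper, not the provider's API, and accepting ``http://`` when no ``conn_id`` is given is an assumption rather than documented behaviour:

```python
from __future__ import annotations

from urllib.parse import urlsplit


def validate_repo_url(repo_url: str, conn_id: str | None = None) -> None:
    """Raise ValueError for URLs that would expose a credential."""
    parts = urlsplit(repo_url)
    # Credentials belong in the Airflow connection, never in the URL itself.
    if parts.username or parts.password:
        raise ValueError("repo_url must not embed credentials; use conn_id instead")
    # With a connection attached, a token would travel with the request,
    # so cleartext HTTP is refused.
    if conn_id is not None and parts.scheme == "http":
        raise ValueError("refusing to send credentials over plain http://")


validate_repo_url("https://github.com/org/skills", conn_id="github_skills")
```

Doing this check before any clone is attempted means a misconfigured source fails early, before a credential is ever read from the connection.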

.. warning::

    Skill bundles can contain scripts that the agent may run on the worker via
    the ``run_skill_script`` tool. For a remote source, anyone who can modify the
    repository can introduce code that executes on your worker, outside DAG
    review and versioning. Point ``GitSkills`` at a trusted repository, pin
    ``branch`` to a trusted ref, and treat skill contents as code that runs in
    your environment.

Parameters
^^^^^^^^^^

- ``sources``: List of skill sources -- local directory paths and/or
  :class:`~airflow.providers.common.ai.skills.GitSkills`.
- ``exclude_tools``: Optional set of skill tool names to hide from the agent
  (e.g. ``{"run_skill_script"}`` to disable on-worker script execution).
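
The effect of ``exclude_tools`` can be pictured as a plain name filter. The sketch below is illustrative only: ``visible_tools`` is a hypothetical helper, and apart from ``run_skill_script`` the tool names are placeholders rather than a confirmed list of what the toolset exposes:

```python
from __future__ import annotations


def visible_tools(all_tools: list[str], exclude_tools: set[str] | None = None) -> list[str]:
    """Return the tool names the agent may still see, preserving order."""
    excluded = exclude_tools or set()
    return [name for name in all_tools if name not in excluded]


# Placeholder names; only ``run_skill_script`` is taken from the documentation.
ALL_TOOLS = ["list_skills", "load_skill", "run_skill_script"]

remaining = visible_tools(ALL_TOOLS, {"run_skill_script"})
print(remaining)
```

Hiding ``run_skill_script`` this way keeps the instructional content of a skill available while removing the path by which bundled scripts would execute on the worker.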

Using Agent Skills with other frameworks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``AgentSkillsToolset`` is a standard pydantic-ai toolset, so it also works with a
plain ``pydantic_ai.Agent`` you build yourself, not just ``AgentOperator``.

Because Agent Skills is a cross-framework format, the connection handling is also
reusable through :func:`~airflow.providers.common.ai.skills.resolve_skills`, which
resolves sources to local ``SKILL.md`` directories that any loader accepts:

.. code-block:: python

    from airflow.providers.common.ai.skills import GitSkills, resolve_skills

    # create_deep_agent, Agent, and AgentSkills come from their own frameworks;
    # those imports are omitted here.
    sources = ["./skills", GitSkills(repo_url="https://github.com/org/skills", conn_id="github_skills")]
    with resolve_skills(sources) as dirs:
        # LangChain DeepAgents
        agent = create_deep_agent(model="openai:gpt-5.4", skills=dirs)
        # ...or Strands
        agent = Agent(plugins=[AgentSkills(skills=dirs)])

``resolve_skills`` needs the Git provider (for ``GitSkills``) but not pydantic-ai,
and removes any cloned directories when the ``with`` block exits.
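
The cleanup-on-exit behaviour described above follows the usual context-manager pattern. The sketch below is a stand-in, not the provider's implementation: ``resolve_sources`` is a hypothetical name, Git sources are modelled as plain dicts, and the "clone" is just an empty temporary directory:

```python
from __future__ import annotations

import shutil
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def resolve_sources(sources: list[str | dict]) -> Iterator[list[str]]:
    """Yield local directories; delete any temporary checkouts on exit."""
    temp_dirs: list[str] = []
    try:
        resolved: list[str] = []
        for source in sources:
            if isinstance(source, str):
                # Local directory: passed through untouched and never deleted.
                resolved.append(source)
            else:
                # Stand-in for a real clone: an empty temporary directory.
                checkout = tempfile.mkdtemp(prefix="skills-")
                temp_dirs.append(checkout)
                resolved.append(str(Path(checkout) / source.get("path", "")))
        yield resolved
    finally:
        # Runs on normal exit and on exceptions alike.
        for checkout in temp_dirs:
            shutil.rmtree(checkout, ignore_errors=True)
```

Because removal sits in the ``finally`` clause, temporary checkouts are deleted even if the code inside the ``with`` block raises, while local directories supplied by the caller are left alone.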


Security
--------

14 changes: 14 additions & 0 deletions providers/common/ai/pyproject.toml
@@ -80,6 +80,15 @@ dependencies = [
"google" = ["pydantic-ai-slim[google]"]
"openai" = ["pydantic-ai-slim[openai]"]
"mcp" = ["pydantic-ai-slim[mcp]"]
# Agent Skills (agentskills.io) support. pydantic-ai-skills provides the toolset
# (pulls in pydantic-ai-slim>=1.74 transitively; the provider base floor stays
# 1.71); the git provider supplies GitHook + GitPython for cloning GitSkills with
# credentials from a `git` connection. Native progressive disclosure is tracked
# upstream in pydantic/pydantic-ai#5230; revisit this extra once that lands.
"skills" = [
"apache-airflow-providers-git>=0.4.0",
"pydantic-ai-skills>=0.11.0",
]
"avro" = [
'fastavro>=1.10.0; python_version < "3.14"',
'fastavro>=1.12.1; python_version >= "3.14"',
@@ -105,6 +114,9 @@ dependencies = [
]
"pdf" = ["pypdf>=4.0.0"]
"docx" = ["python-docx>=1.0.0"]
"git" = [
"apache-airflow-providers-git"
]

[dependency-groups]
dev = [
@@ -113,10 +125,12 @@ dev = [
"apache-airflow-devel-common",
"apache-airflow-providers-common-compat",
"apache-airflow-providers-common-sql",
"apache-airflow-providers-git",
"apache-airflow-providers-standard",
# Additional devel dependencies (do not remove this line and add extra development dependencies)
"sqlglot>=30.0.0",
"pydantic-ai-slim[mcp]",
"pydantic-ai-skills>=0.11.0",
"apache-airflow-providers-common-sql[datafusion]",
"langchain>=1.0.0",
"llama-index-core>=0.13.0",
@@ -0,0 +1,106 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Example DAGs demonstrating Agent Skills with ``AgentOperator``.

`Agent Skills <https://agentskills.io>`__ are ``SKILL.md`` bundles the model
discovers and loads on demand (progressive disclosure). They are passed to the
agent as an ``AgentSkillsToolset`` in the operator's ``toolsets=`` list. Skill
sources are resolved when the task runs, on the worker (not while the DAG
processor parses the file), so a Git token resolved from an Airflow connection
is never baked into the serialized DAG.

These DAGs need the optional ``skills`` extra::

    pip install "apache-airflow-providers-common-ai[skills]"
"""

from __future__ import annotations

from pathlib import Path

from airflow.providers.common.ai.operators.agent import AgentOperator
from airflow.providers.common.ai.skills import GitSkills
from airflow.providers.common.ai.toolsets.skills import AgentSkillsToolset
from airflow.providers.common.ai.toolsets.sql import SQLToolset
from airflow.providers.common.compat.sdk import dag

# Skills ship next to this DAG file; resolve relative to __file__ so the path
# holds regardless of the dag-processor's working directory.
SKILLS_DIR = Path(__file__).parent / "skills"


# ---------------------------------------------------------------------------
# 1. Local filesystem skills (a directory of SKILL.md bundles)
# ---------------------------------------------------------------------------


# [START howto_operator_agent_skills_local]
@dag(tags=["example"])
def example_agent_skills_local():
    AgentOperator(
        task_id="reporter",
        prompt="How many orders did our top 5 customers place last month?",
        llm_conn_id="pydanticai_default",
        system_prompt="You are a data analyst. Consult your skills before writing SQL.",
        toolsets=[
            AgentSkillsToolset(sources=[str(SKILLS_DIR)]),
            SQLToolset(
                db_conn_id="postgres_default",
                allowed_tables=["customers", "orders"],
                max_rows=50,
            ),
        ],
    )


# [END howto_operator_agent_skills_local]

example_agent_skills_local()


# ---------------------------------------------------------------------------
# 2. Remote skills from a Git repo, credentials from an Airflow connection
# ---------------------------------------------------------------------------
# ``github_skills`` is a git connection (HTTPS token in the password, or an SSH
# key in the extra). The DAG only references it by id; no credential is inlined.


# [START howto_operator_agent_skills_git]
@dag(tags=["example"])
def example_agent_skills_git():
    AgentOperator(
        task_id="support_agent",
        prompt="Summarize our refund policy and apply it to order 12345.",
        llm_conn_id="pydanticai_default",
        system_prompt="You are a support agent. Load the relevant skill before answering.",
        toolsets=[
            AgentSkillsToolset(
                sources=[
                    GitSkills(
                        repo_url="https://github.com/my-org/agent-skills",
                        conn_id="github_skills",
                        path="skills",
                    ),
                ],
            ),
        ],
    )


# [END howto_operator_agent_skills_git]

example_agent_skills_git()
@@ -0,0 +1,41 @@
---
name: sql-reporting
description: Conventions and review steps for writing analytics SQL against the warehouse. Use whenever the task involves querying tables, building a report, or aggregating metrics.
license: Apache-2.0
---
<!-- SPDX-License-Identifier: Apache-2.0
https://www.apache.org/licenses/LICENSE-2.0 -->

# SQL Reporting Skill

Apply this skill before writing or running any analytics SQL so reports stay
consistent and safe.

## When to Use This Skill

Use this skill when the task involves:

- Querying warehouse tables for a metric, report, or dashboard figure
- Aggregating rows (counts, sums, rolling windows)
- Cross-referencing two or more tables

## Conventions

1. Always `SELECT` explicit column names, never `SELECT *`.
2. Filter on a partition/date column first to bound the scan.
3. Alias aggregates with snake_case names (`order_count`, not `count(*)`).
4. Cap exploratory queries with `LIMIT` unless an aggregate already collapses
the result set.
5. Prefer `COUNT(DISTINCT ...)` over a sub-query when de-duplicating.

## Review Checklist (run before returning an answer)

- [ ] No `SELECT *`.
- [ ] A date or partition predicate is present.
- [ ] Every aggregate has an explicit alias.
- [ ] The query reads only from tables the task actually needs.

## Output Format

Return the final SQL in a fenced ```sql block, then one sentence describing
what the query returns.