Add Agent Skills support to the Common AI provider #67786

Merged
kaxil merged 3 commits into apache:main from astronomer:prototype-skills-toolset
May 31, 2026


@kaxil
Member

@kaxil kaxil commented May 30, 2026

Adds Agent Skills support to the common.ai provider as AgentSkillsToolset, a pydantic-ai toolset (alongside SQLToolset, HookToolset, MCPToolset). Skills are SKILL.md bundles the model discovers and loads on demand (progressive disclosure), so a large skill library costs few tokens until a skill is actually used.

from airflow.providers.common.ai.operators.agent import AgentOperator
from airflow.providers.common.ai.skills import GitSkills
from airflow.providers.common.ai.toolsets.skills import AgentSkillsToolset

AgentOperator(
    task_id="agent",
    prompt="...",
    llm_conn_id="pydanticai_default",
    toolsets=[
        AgentSkillsToolset(sources=[
            "./skills",                                   # local SKILL.md directory
            GitSkills(repo_url="https://github.com/my-org/agent-skills",
                      conn_id="github_skills", path="skills"),  # private repo (git connection)
        ]),
    ],
)

Backed by the community pydantic-ai-skills package (MIT), pulled in only through the optional skills extra:

pip install "apache-airflow-providers-common-ai[skills]"

Why

pydantic-ai has no native skills primitive yet; native progressive disclosure is in flight upstream in pydantic/pydantic-ai#5230. This wires the community implementation in behind a small toolset so users get connection-based skill loading today, with a surface that maps onto the native primitive when it lands.
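
To make the progressive-disclosure idea concrete, here is a minimal sketch of the pattern, not the pydantic-ai-skills implementation. It assumes each skill lives in its own directory with a SKILL.md that starts with a `---`-delimited front-matter block carrying `name` and `description` keys. `SkillSummary`, `list_skills`, and `load_skill` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SkillSummary:
    """The cheap part of a skill: what the model sees before it picks one."""

    name: str
    description: str
    path: Path


def _parse_front_matter(text: str) -> dict[str, str]:
    """Parse simple `key: value` lines between two `---` delimiters (assumed layout)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def list_skills(root: Path) -> list[SkillSummary]:
    """Discovery step: return only names and descriptions, not skill bodies."""
    summaries = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta = _parse_front_matter(skill_md.read_text(encoding="utf-8"))
        summaries.append(
            SkillSummary(
                name=meta.get("name", skill_md.parent.name),
                description=meta.get("description", ""),
                path=skill_md,
            )
        )
    return summaries


def load_skill(summaries: list[SkillSummary], name: str) -> str:
    """On-demand step: return the full SKILL.md text for one chosen skill."""
    for summary in summaries:
        if summary.name == name:
            return summary.path.read_text(encoding="utf-8")
    raise KeyError(name)
```

In this sketch only the output of `list_skills` would go into the model's context up front; the body of a skill is read when the model asks for it by name, which is why a large library stays cheap until a skill is used.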

Design notes

  • Resolved at run time, not parse time. The underlying SkillsToolset loads its registries eagerly at construction, so building a Git-backed toolset in the DAG body would clone the repo while the DAG processor parses the file and bake the token into the serialized DAG. AgentSkillsToolset instead resolves connections and clones on __aenter__ (when the agent enters the toolset, on the worker) and removes cloned directories on __aexit__. A Git token is never present in the serialized DAG; only the conn_id is.

  • Reusable beyond the operator. AgentSkillsToolset is a normal pydantic-ai AbstractToolset, so it also works with a raw pydantic_ai.Agent you build yourself (anywhere the Airflow connection backend is reachable). The operator is unchanged; skills are just a toolset.

  • Framework-portable core. Because Agent Skills is a cross-framework format, the connection handling is exposed framework-agnostically through resolve_skills(...), which returns local SKILL.md directories that any loader accepts (it needs only GitPython, no pydantic-ai):

    from airflow.providers.common.ai.skills import GitSkills, resolve_skills
    
    with resolve_skills(["./skills", GitSkills(repo_url="https://...", conn_id="github_skills")]) as dirs:
        create_deep_agent(model="openai:gpt-5.4", skills=dirs)   # LangChain DeepAgents
        Agent(plugins=[AgentSkills(skills=dirs)])                # Strands

    resolve_skills needs the Git provider (for GitSkills) but not pydantic-ai.

  • Why not pydantic-ai-skills' GitSkillsRegistry? It clones git too, but does its own credential handling (a token arg plus a silent GITHUB_TOKEN env fallback, no Airflow-connection or SSH-key support). To take credentials from an Airflow git connection (HTTPS token or SSH key) and to keep resolve_skills() usable by non-pydantic frameworks (it returns plain SKILL.md directories, so LangChain/Strands users don't need pydantic-ai-skills just to clone), the clone goes through GitHook + GitPython instead. The tradeoff is some duplicated clone/scrub logic.

  • Git and local sources only for now. Object storage (S3/GCS) is deferred so the recursive-download layout and lifecycle can be verified against a real bucket first.
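
The "resolved at run time, not parse time" note can be illustrated with a small sketch of the pattern. This is not the provider's code: `GitSource`, `LazySkillSources`, and the `_fetch` placeholder are hypothetical, and the placeholder writes a marker file instead of resolving a connection and cloning. What it shows is that construction stores only the `conn_id`, that work happens in `__aenter__`, and that fetched directories are removed in `__aexit__`.

```python
import asyncio
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass(frozen=True)
class GitSource:
    """Hypothetical Git-backed source: holds a conn_id reference, never a token."""

    repo_url: str
    conn_id: Optional[str] = None


class LazySkillSources:
    """Defers all fetching until the async context is entered."""

    def __init__(self, sources: list[Union[str, GitSource]]) -> None:
        # Cheap and side-effect free, so it is safe to run at DAG-parse time.
        self._sources = list(sources)
        self._fetched: list[Path] = []
        self.dirs: list[Path] = []

    async def __aenter__(self) -> "LazySkillSources":
        for source in self._sources:
            if isinstance(source, GitSource):
                target = Path(tempfile.mkdtemp(prefix="skills-"))
                self._fetched.append(target)
                self._fetch(source, target)
                self.dirs.append(target)
            else:
                self.dirs.append(Path(source))  # local directory, used as-is
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Remove only what this object created; local directories are untouched.
        for path in self._fetched:
            shutil.rmtree(path, ignore_errors=True)
        self._fetched.clear()
        self.dirs.clear()

    def _fetch(self, source: GitSource, target: Path) -> None:
        # Placeholder: a real implementation would resolve the connection
        # and clone the repository into `target` at this point.
        (target / "README").write_text(f"placeholder for {source.repo_url}\n")


async def demo() -> None:
    lazy = LazySkillSources(["./skills", GitSource("https://example.invalid/repo", "github_skills")])
    async with lazy as entered:
        print([str(d) for d in entered.dirs])


if __name__ == "__main__":
    asyncio.run(demo())
```

Because nothing secret is resolved in `__init__`, an object like this can be built in the DAG body and serialized without carrying a credential.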

Security and gotchas

  • Skill bundles can contain scripts the agent may run on the worker through pydantic-ai-skills' run_skill_script tool. This keeps the upstream default and is documented: point GitSkills at a trusted repository and pin branch to a trusted ref.
  • GitSkills credentials come from an Airflow git connection resolved through the Git provider's GitHook (HTTPS token in the password, or an SSH key in the extra). Credentials come only from the connection: omitting conn_id gives an anonymous clone, and plain http:// with conn_id is rejected so a credential is never sent in cleartext. After cloning, the token is stripped from the checkout's .git/config. As with any git clone, the worker's own git configuration (credential helpers, SSH agent) can still apply, so run workers without ambient git credentials if you need strict isolation.
  • The skills extra pulls apache-airflow-providers-git (GitHook + GitPython) and pydantic-ai-skills (which requires pydantic-ai-slim>=1.74; the provider base floor stays at 1.71).
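
The URL rules described above (no cleartext http:// with a connection, no credential-bearing URLs, token stripped after cloning) can be sketched with the standard library alone. This is an illustrative sketch, not the provider's implementation: `validate_repo_url` and `strip_credentials` are hypothetical helpers, and allowing an anonymous http:// clone when no `conn_id` is given is an assumption.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def validate_repo_url(repo_url: str, conn_id: Optional[str]) -> None:
    """Raise ValueError for URLs that could leak or smuggle a credential."""
    parts = urlsplit(repo_url)
    if parts.username or parts.password:
        # Credentials must come from the connection, not from the URL itself.
        raise ValueError("repo_url must not embed credentials")
    if conn_id is not None and parts.scheme == "http":
        # A connection implies a credential; never send it in cleartext.
        raise ValueError("http:// is not allowed together with conn_id")


def strip_credentials(url: str) -> str:
    """Return the URL without any `user:password@` part, e.g. before persisting it."""
    parts = urlsplit(url)
    if not (parts.username or parts.password):
        return url
    host = parts.netloc.rpartition("@")[2]  # keep host and optional port
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

A check like `validate_repo_url` would run before any network access, while something like `strip_credentials` would apply to the remote URL recorded in the checkout after the clone completes.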

Follow-ups

  • Object-storage skill sources (S3/GCS via ObjectStoragePath).
  • Migrate to native pydantic-ai on-demand capabilities once #5230 ships.

@kaxil kaxil force-pushed the prototype-skills-toolset branch from fa2989b to c9a0249 Compare May 30, 2026 21:42
Add AgentSkillsToolset, a pydantic-ai toolset that loads agentskills.io SKILL.md bundles from a local directory or a Git repository. Git credentials come from an Airflow git connection (HTTPS token or SSH key) resolved through the Git provider's GitHook: cleartext http and credential-bearing URLs are rejected, interactive credential prompts are disabled, and the token is stripped from the clone's .git/config. Sources are resolved on the worker when the agent enters the toolset, so a token is never baked into the serialized DAG, and clones are removed when the run ends. Pass it via AgentOperator's toolsets=, or use it with a raw pydantic-ai Agent. The framework-agnostic resolve_skills() helper returns local SKILL.md directories for other Agent Skills loaders (LangChain DeepAgents, Strands).
@kaxil kaxil force-pushed the prototype-skills-toolset branch from c9a0249 to 25903aa Compare May 30, 2026 21:47
Comment thread providers/common/ai/pyproject.toml Outdated
Comment thread providers/common/ai/src/airflow/providers/common/ai/skills.py
@kaxil kaxil merged commit 86d8b47 into apache:main May 31, 2026
140 checks passed
@kaxil kaxil deleted the prototype-skills-toolset branch May 31, 2026 01:07
@github-actions
Contributor

Backport failed to create: v3-2-test. View the failure log Run details

Note: Per "Merging PRs targeted for Airflow 3.X", the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

If in doubt, please ask in the #release-management Slack channel.

Status Branch Result
v3-2-test Commit Link

You can attempt to backport this manually by running:

cherry_picker 86d8b47 v3-2-test

This should apply the commit to the v3-2-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

If you don't have cherry-picker installed, see the installation guide.

3 participants