Add PolicyExposureReport to AgentOperator #64433
gopidesupavan wants to merge 1 commit into apache:main
Conversation
```python
ti = context["task_instance"]
llm_exposure = LLMExposure(
    llm_conn_id=self.llm_conn_id,
    connection_type=self.llm_hook.conn_type,
```
self.llm_hook.conn_type triggers the @cached_property, which resolves the Airflow connection and instantiates the hook. If the connection is misconfigured, the outer try/except swallows the error and the task then fails inside _build_agent() with a different traceback. The user sees both a "Failed to build policy exposure report" warning and the actual connection error, which can be confusing.
Consider wrapping just the conn_type access:
```python
try:
    connection_type = self.llm_hook.conn_type
except Exception:
    connection_type = None
```

Or just use self.llm_conn_id here and skip resolving the connection type in the report.
```python
for table_name in sorted(self._allowed_tables or ()):
    resources.append(ResourceExposure(category="table", name=table_name, access_mode="read"))
```
Per-table resources are always access_mode="read", even when self._allow_writes is True. The database-level entry correctly says "read_write", but the individual table entries are inconsistent. Should this be:

```python
access_mode="read_write" if self._allow_writes else "read"
```

```python
"""Return True when a hook method name suggests write-like side effects."""
return any(
    token in method_name.lower()
    for token in ("create", "delete", "drop", "update", "insert", "write", "post", "put", "patch")
```
This misses some common mutating patterns: remove, send, execute, upload, publish. For example, S3Hook.delete_objects would match, but S3Hook.upload_file or SlackHook.send_message would not.
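A minimal sketch of the heuristic with those extra tokens folded in. The added tokens ("remove", "send", "execute", "upload", "publish") are this review's suggestion, not what the PR currently ships:

```python
# Suggested broader token set; the last five entries are additions
# proposed in this review, not part of the current implementation.
MUTATING_TOKENS = (
    "create", "delete", "drop", "update", "insert", "write",
    "post", "put", "patch",
    "remove", "send", "execute", "upload", "publish",
)


def is_mutating_method(method_name: str) -> bool:
    """Return True when a hook method name suggests write-like side effects."""
    lowered = method_name.lower()
    return any(token in lowered for token in MUTATING_TOKENS)
```

With this set, S3Hook.upload_file and SlackHook.send_message would now be flagged alongside S3Hook.delete_objects.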
```python
if runtime_notes:
    return PolicyRiskSummary(level="low", reasons=["configured access includes runtime controls"])
```
The reason "configured access includes runtime controls" fires when runtime_notes has entries like "tool logging enabled" or "durable replay enabled". These are observability features, not governance controls. Something like "no external tool access configured" (same as the no-runtime-notes branch) might be more accurate here, since the risk level is "low" either way.
Pull request overview
Adds a first-pass “Policy Exposure Report” for common.ai’s AgentOperator, persisting a structured, configuration-derived snapshot of LLM/tool access to XCom for governance/observability.
Changes:
- Introduces PolicyExposureReport Pydantic models plus helpers to summarize toolset/resource exposure and deterministically classify risk.
- Writes the policy exposure snapshot to XCom (airflow_common_ai_policy_exposure) at the start of AgentOperator.execute(), best-effort.
- Implements describe_policy_exposure() across common toolsets (SQL/MCP/Hook/DataFusion) with unit tests and docs.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| providers/common/ai/src/airflow/providers/common/ai/utils/policy_exposure.py | New models/helpers for toolset exposure, report structure, and deterministic risk classification. |
| providers/common/ai/src/airflow/providers/common/ai/operators/agent.py | Builds and pushes the report to XCom at task start. |
| providers/common/ai/src/airflow/providers/common/ai/toolsets/sql.py | Adds exposure description for SQL toolset (db/tables + write flags). |
| providers/common/ai/src/airflow/providers/common/ai/toolsets/mcp.py | Adds exposure description for MCP server + prefix. |
| providers/common/ai/src/airflow/providers/common/ai/toolsets/hook.py | Adds exposure description for hook methods + mutating-method heuristics. |
| providers/common/ai/src/airflow/providers/common/ai/toolsets/datafusion.py | Adds exposure description for datasources/URIs + wildcard flags. |
| providers/common/ai/tests/unit/common/ai/utils/test_policy_exposure.py | New unit tests for unwrap/describe fallback and risk classification logic. |
| providers/common/ai/tests/unit/common/ai/toolsets/test_sql.py | Tests SQLToolset.describe_policy_exposure(). |
| providers/common/ai/tests/unit/common/ai/toolsets/test_mcp.py | Tests MCPToolset.describe_policy_exposure(). |
| providers/common/ai/tests/unit/common/ai/toolsets/test_hook.py | Tests HookToolset.describe_policy_exposure() and mutating flags. |
| providers/common/ai/tests/unit/common/ai/toolsets/test_datafusion.py | Tests DataFusionToolset.describe_policy_exposure() and wildcard flags. |
| providers/common/ai/tests/unit/common/ai/operators/test_agent.py | Extends operator tests to validate report XCom push and runtime notes. |
| providers/common/ai/tests/unit/common/ai/decorators/test_agent.py | Adjusts decorator tests to provide a task_instance context for XCom push. |
| providers/common/ai/docs/operators/agent.rst | Documents the new XCom-backed policy exposure report. |
| providers/common/ai/AGENTS.md | Adds guidance to implement describe_policy_exposure() on toolsets. |
```python
def test_describe_toolset_exposure_uses_base_toolset_for_wrappers():
    wrapped = MagicMock()
```
MagicMock() is created without a spec, which can hide bugs by allowing any attribute/method access. Consider using MagicMock(spec=["describe_policy_exposure", "id"]) (or a concrete toolset class) so the mock matches the expected toolset surface.
```diff
-wrapped = MagicMock()
+wrapped = MagicMock(spec=["describe_policy_exposure", "id"])
```
```python
ti.run_id = "run_1"
ti.task_id = "test"
ti.map_index = map_index
ti.xcom_push = MagicMock()
```
ti is already created with a spec that includes xcom_push, so reassigning ti.xcom_push = MagicMock() is redundant and also creates an unspecced mock. Prefer relying on the existing ti.xcom_push mock created by MagicMock(spec=...), or assign a specced callable mock if you need custom behavior.
```diff
-ti.xcom_push = MagicMock()
```
```python
    ),
    patch("airflow.providers.common.ai.durable.storage._get_base_path"),
    patch("pydantic_ai.models.infer_model", autospec=True, return_value=MagicMock()),
    patch("pydantic_ai.models.wrapper.infer_model", side_effect=lambda model: model),
):
```
The patched infer_model uses return_value=MagicMock() without a spec, which can mask interface mismatches with the real model object. Use a specced mock (or a lightweight fake) that matches the attributes accessed by CachingModel/agent.override in this code path.
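One way to do that, sketched with a hypothetical FakeModel stand-in; the actual attribute surface touched by CachingModel/agent.override would need checking against the real pydantic_ai model class:

```python
from unittest.mock import MagicMock


class FakeModel:
    """Lightweight stand-in for a model object.

    The attributes here are assumptions about what the code path touches,
    not the real pydantic_ai surface.
    """

    model_name = "fake-model"

    def request(self, *args, **kwargs):
        raise NotImplementedError


# spec=FakeModel makes attribute typos fail fast with AttributeError
# instead of silently returning auto-created child mocks.
specced = MagicMock(spec=FakeModel)
```

The same effect can be had with autospec'ing the real model class once it is importable in the test.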
```python
ti.run_id = "run_1"
ti.task_id = "test"
ti.map_index = -1
ti.xcom_push = MagicMock()
```
Similar to the operator tests, ti is created with a spec that already includes xcom_push, so reassigning ti.xcom_push = MagicMock() is redundant and leaves you with an unspecced callable mock. Prefer relying on the existing ti.xcom_push mock from MagicMock(spec=...), or replace it with a specced callable if needed.
```diff
-ti.xcom_push = MagicMock()
```
```python
resources.append(
    ResourceExposure(
        category="uri",
        name=config.uri,
        access_mode="read_write" if self._allow_writes else "read",
        details={"table_name": config.table_name},
    )
```
describe_policy_exposure() currently stores the raw config.uri value into the XCom-backed policy exposure report. URIs can sometimes include sensitive details (e.g. bucket/key names, local paths, or embedded credentials/query params depending on scheme), and XCom values may be broadly visible. Consider redacting or normalizing URIs before persisting (e.g., strip credentials/query params, or only store scheme + bucket/table identifier).
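A possible redaction helper, sketched with urllib.parse; the helper name and exact policy (drop credentials, query string, and fragment; keep scheme, host, and path) are illustrative suggestions, not part of the PR:

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri(uri: str) -> str:
    """Strip credentials and query parameters from a URI before persisting.

    Hypothetical helper sketching the redaction suggested above; keeps
    scheme + host + path, drops user:password@, query string, and fragment.
    """
    parts = urlsplit(uri)
    # Drop any user:password@ prefix from the netloc, keeping host[:port].
    netloc = parts.netloc.rsplit("@", 1)[-1]
    # Drop the query string and fragment entirely (e.g. presigned params).
    return urlunsplit((parts.scheme, netloc, parts.path, "", ""))
```

The report could then store redact_uri(config.uri) instead of the raw value.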
```python
describe = getattr(base_toolset, "describe_policy_exposure", None)
if callable(describe):
    try:
        exposure = describe()
        if isinstance(exposure, ToolsetExposure):
            return exposure
        return ToolsetExposure(
            toolset_type=type(base_toolset).__name__,
            toolset_id=_get_toolset_id(base_toolset),
            summary="Toolset returned an invalid policy exposure report.",
            risk_flags=["invalid toolset exposure report"],
        )
    except Exception:
        return ToolsetExposure(
            toolset_type=type(base_toolset).__name__,
            toolset_id=_get_toolset_id(base_toolset),
            summary="Toolset exposure details are unavailable because report generation failed.",
            risk_flags=["toolset exposure report failed"],
        )
```
describe_toolset_exposure() swallows exceptions from describe_policy_exposure() and returns a fallback exposure, but it does not log the failure. The docs for the policy exposure report state that toolset report generation failures are logged; consider logging the exception here (or re-raising and logging at the caller) so operators have visibility into why a toolset’s exposure report was unavailable.
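A sketch of the except branch with the failure logged; safe_describe and the dict fallback are illustrative stand-ins for the real helper and Pydantic model:

```python
import logging

log = logging.getLogger(__name__)


def safe_describe(base_toolset):
    """Call describe_policy_exposure() but log failures instead of swallowing them silently."""
    describe = getattr(base_toolset, "describe_policy_exposure", None)
    if not callable(describe):
        return None
    try:
        return describe()
    except Exception:
        # log.exception records the traceback at ERROR level, so operators
        # can see why a toolset's exposure report was unavailable.
        log.exception(
            "Policy exposure report generation failed for toolset %s",
            type(base_toolset).__name__,
        )
        # Stand-in for the ToolsetExposure fallback in the real code.
        return {"risk_flags": ["toolset exposure report failed"]}
```

This keeps the best-effort behavior while matching what the docs promise about logging.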
```python
resources.append(
    ResourceExposure(
        category="uri",
        name=config.uri,
        access_mode="read_write" if self._allow_writes else "read",
        details={"table_name": config.table_name},
    )
)
```
DataSourceConfig.uri is optional for catalog-managed/table-provider formats (e.g. Iceberg) and may be an empty string. This implementation always adds a ResourceExposure(category="uri", name=config.uri, ...), which can yield empty/meaningless URI entries in the report. Consider only emitting a uri resource when config.uri is non-empty (and/or adding a different resource describing the catalog/table identifier from config.options when is_table_provider is true).
```diff
-resources.append(
-    ResourceExposure(
-        category="uri",
-        name=config.uri,
-        access_mode="read_write" if self._allow_writes else "read",
-        details={"table_name": config.table_name},
-    )
-)
+if config.uri:
+    resources.append(
+        ResourceExposure(
+            category="uri",
+            name=config.uri,
+            access_mode="read_write" if self._allow_writes else "read",
+            details={"table_name": config.table_name},
+        )
+    )
```
What
Implemented a first pass of policy exposure reporting for the common.ai AgentOperator.
This adds a structured, XCom-backed PolicyExposureReport snapshot for each task run so users can inspect the configured AI access surface without reading operator code. The report captures task identity, LLM metadata, attached toolset exposure summaries, runtime notes, and a deterministic risk summary.
For each task, the output is stored under the airflow_common_ai_policy_exposure XCom key.
This kind of view is very useful in large enterprises: if anyone is using an LLM, it is an easy way to track exactly which configs a run uses, instead of digging through logs or the DAG's inputs.
e.g. (example report output omitted):
Was generative AI tooling used to co-author this PR?