
Task SDK: Add Variable.keys() to list variable keys by prefix#66022

Open
junyeong0619 wants to merge 2 commits into apache:main from junyeong0619:feat/variable-search


junyeong0619 commented Apr 28, 2026

Airflow 3 removed direct ORM access from tasks, which means the old
Variable.all() no longer works. This PR adds Variable.keys() as a lazy
replacement that goes through the Execution API, just as get/set/delete do.

closes: #61166

What

  • Add a Variable.keys(prefix=None) classmethod to the Task SDK that returns
    a lazy proxy over the variable keys (fetched on first access)
  • Add a GET /variables/keys?prefix= endpoint to the Execution API
  • Register the new endpoint under Execution API version 2026-04-28

Scope

Metastore only: secrets backends don't currently expose a listing primitive
in Airflow, and extending BaseSecretsBackend needs a separate design
discussion. This limitation is documented in the docstring. This work is also
distinct from AIP-103 (Task State), which is for tasks persisting their own
state across retries.

Why a lazy keys() instead of a list() that returns full Variable objects

Per discussion on the dev list (thread),
returning all variables (key + value) at once doesn't scale well — for large
variable sets the response can be huge, and callers often only need a subset
of values. keys() returns just the matching keys, and the user can call
Variable.get(key) for the values they actually need:

from airflow.sdk import Variable

for key in Variable.keys(prefix="team_a_config_"):
    val = Variable.get(key)  # fetch only the values you actually need

How

Follows the exact same layered architecture as the existing Variable.get():

| Layer | get | keys |
|---|---|---|
| Public API | Variable.get() | Variable.keys() |
| Execution bridge | _get_variable() | _get_variable_keys() |
| Comms request | GetVariable | GetVariableKeys |
| Comms response | VariableResult | VariableKeysResult |
| Shared handler | handle_get_variable() | handle_get_variable_keys() |
| SDK client | VariableOperations.get() | VariableOperations.keys() |
| Execution API route | GET /variables/{key} | GET /variables/keys?prefix= |

The keys endpoint uses SessionDep and a SQLAlchemy select(Variable.key) query
directly, consistent with the rest of the Execution API.
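One detail worth checking in any prefix query: in SQL `LIKE`, `_` and `%` are wildcards, so an unescaped prefix such as `team_a_config_` also matches `team_a_configXy`. The sketch below uses the standard-library `sqlite3` module and a hypothetical `variable` table to show the pitfall and an escaped alternative; whether the PR's query escapes the prefix is not stated here. In SQLAlchemy, `Variable.key.startswith(prefix, autoescape=True)` is one way to generate the escaped form.

```python
from __future__ import annotations

import sqlite3


def escape_like(prefix: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards (and the escape char itself) so they match literally."""
    return (
        prefix.replace(escape, escape * 2)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def variable_keys(conn: sqlite3.Connection, prefix: str | None = None) -> list[str]:
    """Return keys from a hypothetical `variable` table, optionally filtered by prefix."""
    sql = "SELECT key FROM variable"
    params: tuple[str, ...] = ()
    if prefix:
        sql += " WHERE key LIKE ? ESCAPE '\\'"
        params = (escape_like(prefix) + "%",)
    sql += " ORDER BY key"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variable (key TEXT PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO variable (key, val) VALUES (?, ?)",
    [("team_a_config_x", "1"), ("team_a_configXy", "2"), ("team_b", "3")],
)

# Naive pattern: the trailing "_" in the prefix matches any single character.
naive = [
    row[0]
    for row in conn.execute(
        "SELECT key FROM variable WHERE key LIKE ? ORDER BY key", ("team_a_config_%",)
    )
]
print(naive)                                  # ['team_a_configXy', 'team_a_config_x']
print(variable_keys(conn, "team_a_config_"))  # ['team_a_config_x']
```

Note that SQLite's `LIKE` is case-insensitive for ASCII by default; other database backends may behave differently.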

All three supervisor types (TaskRunnerSupervisor, the triggerer, and the DAG file processor (DFP)) are updated to handle the new request.

Testing

  • airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_variables.py — TestGetVariableKeys: 4 cases (no-prefix, with-prefix, no-match, empty DB)
  • airflow-core/tests/unit/api_fastapi/execution_api/versions/v2026_04_28/test_variables.py — endpoint not available in previous version
  • task-sdk/tests/task_sdk/definitions/test_variables.py — TestVariableKeys: 3 parametrized cases (verifying lazy fetch) + caching test
  • task-sdk/tests/task_sdk/execution_time/test_supervisor.py — GetVariableKeys request/response handling

Was generative AI tooling used to co-author this PR?
  • Yes — Claude Opus 4.7

Generated-by: Claude Opus 4.7 following the guidelines

boring-cyborg Bot commented Apr 28, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide.
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

junyeong0619 (Author) commented Apr 29, 2026

After investigating #61595, I've identified a significant inconsistency in the Variable API that could lead to user confusion regarding data integrity.

Currently, Variable.get() provides a consistent lookup experience by traversing the Cache, Secrets Backend, and Metadata DB. However, due to the lack of an abstracted list method across all secret backends, Variable.list() (and the FastAPI get_variables endpoint) only queries the Metadata DB.

This creates a functional inconsistency:

Variable.get("secret_key") returns a value from AWS Secrets Manager.

Variable.list() does not show "secret_key" because it isn't stored in the DB.

To prevent user confusion regarding data integrity, I suggest making this behavior explicit. Renaming the internal or public method to something like list_from_db() would clearly communicate that the results are limited to the metastore.

Providing this explicit naming will help users immediately understand why their secret-backed variables are missing from the list without needing to dive into the core source code.
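The inconsistency can be modelled with a small, self-contained sketch. `VariableSources` is a hypothetical class, not Airflow code: `get()` checks the sources in the order described above (cache, secrets backend, metadata DB), while `list_from_db()` can only enumerate the metastore, so a key that lives solely in a secrets backend is readable but never listed.

```python
from __future__ import annotations


class VariableSources:
    """Hypothetical model of the three places Variable.get() is described as looking."""

    def __init__(
        self,
        cache: dict[str, str],
        secrets_backend: dict[str, str],
        metastore: dict[str, str],
    ) -> None:
        self.cache = cache
        self.secrets_backend = secrets_backend
        self.metastore = metastore

    def get(self, key: str) -> str:
        # Lookup order as described: cache, then secrets backend, then metadata DB.
        for source in (self.cache, self.secrets_backend, self.metastore):
            if key in source:
                return source[key]
        raise KeyError(key)

    def list_from_db(self, prefix: str | None = None) -> list[str]:
        # Only the metastore can be enumerated; secrets-backed keys never appear here.
        return sorted(k for k in self.metastore if prefix is None or k.startswith(prefix))


sources = VariableSources(
    cache={},
    secrets_backend={"secret_key": "value-from-secrets-manager"},
    metastore={"db_key": "value-from-db"},
)
print(sources.get("secret_key"))  # value-from-secrets-manager
print(sources.list_from_db())     # ['db_key']
```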

I would love to hear your thoughts on this direction.

airflow-core/src/airflow/api_fastapi/core_api/routes/public/variables.py:118-127:

def get_variables(
    limit: QueryLimit,
    offset: QueryOffset,
    order_by: Annotated[
        SortParam,
        Depends(
            SortParam(
                ["key", "id", "_val", "description", "is_encrypted", "team_name"],
                Variable,
            ).dynamic_depends()
        ),
    ],
    readable_variables_filter: ReadableVariablesFilterDep,
    session: SessionDep,
    variable_key_pattern: QueryVariableKeyPatternSearch,
    variable_key_prefix_pattern: QueryVariableKeyPrefixPatternSearch,
) -> VariableCollectionResponse:
    """Get all Variables entries."""
    variable_select, total_entries = paginated_select(
        statement=select(Variable),  # DB-only query, no secrets backend traversal
        filters=[variable_key_pattern, variable_key_prefix_pattern, readable_variables_filter],
        order_by=order_by,
        offset=offset,
        limit=limit,
        session=session,
    )

    variables = session.scalars(variable_select)

    return VariableCollectionResponse(
        variables=variables,
        total_entries=total_entries,
    )

junyeong0619 force-pushed the feat/variable-search branch from 7e2f534 to bc2cc50 on May 5, 2026 01:43
junyeong0619 (Author) commented May 5, 2026

Hi @ashb, just rebased onto the latest main — would love to get your thoughts on this when you have a chance!

junyeong0619 changed the title from "Task SDK: Add Variable.list() to list variables by prefix" to "Task SDK: Add Variable.keys() to list variable keys by prefix" on May 5, 2026
junyeong0619 force-pushed the feat/variable-search branch from f5bd2ca to 330a2eb on May 5, 2026 09:38
junyeong0619 (Author) commented

Hi @amoghrajesh, thanks for the feedback! I agree the lazy iterator
pattern scales much better for large variable sets. I've updated the
PR to use Variable.keys(prefix=...) returning a list[str] of keys
instead, and the user can call Variable.get(key) for the values they
actually need:

for key in Variable.keys(prefix="team_a_config_"):
    val = Variable.get(key)

The endpoint is now GET /variables/keys?prefix= returning just keys.
Would appreciate another look when you have time!

junyeong0619 force-pushed the feat/variable-search branch from 330a2eb to 0bcf88e on May 5, 2026 11:02
potiuk added the "ready for maintainer review" label on May 5, 2026
Per dev list feedback, wrap the result in lazy_object_proxy.Proxy so the
Execution API call only happens on first access (iteration, indexing,
len, etc.) and is cached for subsequent accesses. Matches the pattern
already used for template context values.
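The lazy, cached behaviour described in this commit can be illustrated without the third-party `lazy_object_proxy` package. `LazyKeys` below is a hypothetical stand-in, not the PR's implementation: it calls a fetch function on the first access (iteration, indexing, `len`) and reuses the cached list afterwards. Unlike `lazy_object_proxy.Proxy`, which forwards nearly every operation to the wrapped object, this sketch only implements the `Sequence` protocol.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator, Sequence


class LazyKeys(Sequence):
    """Hypothetical stand-in for a lazy proxy around a list of variable keys."""

    def __init__(self, factory: Callable[[], list[str]]) -> None:
        self._factory = factory
        self._value: list[str] | None = None

    def _resolve(self) -> list[str]:
        if self._value is None:
            self._value = self._factory()  # first access: perform the (assumed) API call
        return self._value  # later accesses: reuse the cached result

    def __getitem__(self, index):
        return self._resolve()[index]

    def __len__(self) -> int:
        return len(self._resolve())

    def __iter__(self) -> Iterator[str]:
        return iter(self._resolve())


fetch_log: list[str] = []


def fetch_keys() -> list[str]:
    # Hypothetical fetch; records each call so laziness and caching are observable.
    fetch_log.append("fetch")
    return ["team_a_config_x", "team_a_config_y"]


keys = LazyKeys(fetch_keys)
print(len(fetch_log))  # 0: nothing fetched yet
for key in keys:
    print(key)
print(len(keys), keys[0], len(fetch_log))  # 2 team_a_config_x 1
```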

Also clarifies in the docstring that only keys stored in the metadata
database are returned — secrets backends are not consulted.

Labels

area:API, area:DAG-processing, area:task-sdk, area:Triggerer, ready for maintainer review

Development

Successfully merging this pull request may close these issues.

Task SDK: no supported way to list Variable keys/values (replacement for ORM Variable.all) in Airflow 3

2 participants