Task SDK: Add Variable.keys() to list variable keys by prefix #66022
junyeong0619 wants to merge 2 commits into apache:main
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide.
After investigating #61595, I've identified a significant inconsistency in the Variable API that could lead to user confusion regarding data integrity.

Currently, Variable.get() provides a consistent lookup experience by traversing the Cache, Secrets Backend, and Metadata DB. However, due to the lack of an abstracted list method across all secret backends, Variable.list() (and the FastAPI get_variables endpoint) only queries the Metadata DB.

This creates a functional inconsistency:
- Variable.get("secret_key") returns a value from AWS Secrets Manager.
- Variable.list() does not show "secret_key" because it isn't stored in the DB.

To prevent user confusion regarding data integrity, I suggest making this behavior explicit. Renaming the internal or public method to something like list_from_db() would clearly communicate that the results are limited to the metastore. Providing this explicit naming will help users immediately understand why their secret-backed variables are missing from the list without needing to dive into the core source code.

I would love to hear your thoughts on this direction.

airflow-core/src/airflow/api_fastapi/core_api/routes/public/variables.py:118-127:

```python
def get_variables(
    limit: QueryLimit,
    offset: QueryOffset,
    order_by: Annotated[
        SortParam,
        Depends(
            SortParam(
                ["key", "id", "_val", "description", "is_encrypted", "team_name"],
                Variable,
            ).dynamic_depends()
        ),
    ],
    readable_variables_filter: ReadableVariablesFilterDep,
    session: SessionDep,
    variable_key_pattern: QueryVariableKeyPatternSearch,
    variable_key_prefix_pattern: QueryVariableKeyPrefixPatternSearch,
) -> VariableCollectionResponse:
    """Get all Variables entries."""
    variable_select, total_entries = paginated_select(
        statement=select(Variable),  # DB-only query, no secrets backend traversal
        filters=[variable_key_pattern, variable_key_prefix_pattern, readable_variables_filter],
        order_by=order_by,
        offset=offset,
        limit=limit,
        session=session,
    )
    variables = session.scalars(variable_select)
    return VariableCollectionResponse(
        variables=variables,
        total_entries=total_entries,
    )
```
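To make the get/list mismatch from this comment concrete, here is a toy model in plain Python. It is only a sketch: `ToyVariableStore`, its two dictionaries, and the lookup order are illustrative assumptions, not Airflow classes or the real lookup implementation (the cache layer is left out entirely).

```python
from __future__ import annotations


class ToyVariableStore:
    """Hypothetical stand-in for the lookup chain described in the comment."""

    def __init__(self, secrets_backend: dict[str, str], metadata_db: dict[str, str]) -> None:
        self.secrets_backend = secrets_backend
        self.metadata_db = metadata_db

    def get(self, key: str) -> str | None:
        # A single-key lookup walks every source in order (assumed order).
        for source in (self.secrets_backend, self.metadata_db):
            if key in source:
                return source[key]
        return None

    def list_keys(self) -> list[str]:
        # Listing only reads the metadata DB, so backend-only keys are invisible.
        return sorted(self.metadata_db)


store = ToyVariableStore(
    secrets_backend={"secret_key": "from-secrets-manager"},
    metadata_db={"db_key": "from-db"},
)
secret_value = store.get("secret_key")  # resolved via the secrets backend
listed_keys = store.list_keys()         # contains only metastore keys
```

In this model a key can be readable through `get()` while never appearing in `list_keys()`, which is the behavior the comment proposes to make explicit in the method name.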
Hi @ashb, just rebased onto the latest main. Would love to get your thoughts on this when you have a chance!
Hi @amoghrajesh, thanks for the feedback! I agree the lazy iterator The endpoint is now
Per dev list feedback, wrap the result in lazy_object_proxy.Proxy so the Execution API call only happens on first access (iteration, indexing, len, etc.) and is cached for subsequent accesses. This matches the pattern already used for template context values. Also clarifies in the docstring that only keys stored in the metadata database are returned; secrets backends are not consulted.
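The "fetch on first access, then cache" behavior described in this commit can be illustrated without any third-party dependency. The sketch below uses a hypothetical `LazyKeys` class and a fake `fetch_keys` function as stand-ins; it is not lazy_object_proxy.Proxy and not the code in this PR, only a minimal model of the same idea.

```python
from __future__ import annotations

from typing import Callable, Iterator


class LazyKeys:
    """Hypothetical minimal lazy sequence; a stand-in, not lazy_object_proxy."""

    def __init__(self, factory: Callable[[], list[str]]) -> None:
        self._factory = factory
        self._cached: list[str] | None = None

    def _resolve(self) -> list[str]:
        # Call the factory once, on first access, and reuse the result afterwards.
        if self._cached is None:
            self._cached = self._factory()
        return self._cached

    def __iter__(self) -> Iterator[str]:
        return iter(self._resolve())

    def __len__(self) -> int:
        return len(self._resolve())

    def __getitem__(self, index: int) -> str:
        return self._resolve()[index]


calls: list[str] = []


def fetch_keys() -> list[str]:
    # Stands in for the Execution API round trip; records each invocation.
    calls.append("fetch")
    return ["db_host", "db_port"]


keys = LazyKeys(fetch_keys)
calls_before_access = len(calls)  # nothing fetched at construction time
first = keys[0]                   # first access triggers the fetch
count = len(keys)                 # served from the cached result
listed = list(keys)               # still served from the cached result
calls_after_access = len(calls)
```

The design trade-off is that a task that builds the object but never touches it pays no API cost, while repeated iteration, indexing, or `len()` calls share a single fetch.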
Airflow 3 removed direct ORM access from tasks, which means the old
Variable.all() no longer works. This PR adds Variable.keys() as a lazy
replacement, going through the Execution API like get/set/delete do.
closes: #61166
What
- Variable.keys(prefix=None) classmethod to the Task SDK that returns a lazy proxy over the variable keys (fetched on first access)
- GET /variables/keys?prefix= endpoint to the Execution API

Scope
Metastore only: secrets backends don't currently expose a listing primitive
in Airflow, and extending BaseSecretsBackend needs a separate design
discussion. Documented in the docstring. This is also distinct from AIP-103
(Task State), which is for tasks persisting their own state across retries.
Why a lazy keys() instead of a list() that returns full Variable objects
Per discussion on the dev list (thread),
returning all variables (key + value) at once doesn't scale well: for large
variable sets the response can be huge, and callers often only need a subset
of values.
keys() returns just the matching keys, and the user can call Variable.get(key) for the values they actually need.

How
Follows the exact same layered architecture as the existing Variable.get():

| Variable.get() | Variable.keys() |
| --- | --- |
| _get_variable() | _get_variable_keys() |
| GetVariable | GetVariableKeys |
| VariableResult | VariableKeysResult |
| handle_get_variable() | handle_get_variable_keys() |
| VariableOperations.get() | VariableOperations.keys() |
| GET /variables/{key} | GET /variables/keys?prefix= |

The keys endpoint uses SessionDep + SQLAlchemy select(Variable.key) directly, consistent with the rest of the Execution API.
All three supervisor types (TaskRunnerSupervisor, triggerer, DFP) are updated.
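The keys-then-get pattern described under "Why a lazy keys()" might look roughly like the following. This is a self-contained sketch against an in-memory dictionary: `METASTORE`, `variable_keys`, and `variable_get` are hypothetical stand-ins for the metadata database, Variable.keys(prefix=...), and Variable.get(key). The stand-in returns a plain list rather than a lazy proxy, and the `startswith` prefix match is an assumption about the filter semantics.

```python
from __future__ import annotations

# Hypothetical in-memory metastore; the real data lives in the Airflow metadata DB.
METASTORE = {"db/host": "localhost", "db/port": "5432", "api/token": "abc123"}

# Records which values were actually fetched, to show that only a subset is read.
fetched: list[str] = []


def variable_keys(prefix: str | None = None) -> list[str]:
    """Stand-in for Variable.keys(prefix=...): returns keys only, never values."""
    return sorted(k for k in METASTORE if prefix is None or k.startswith(prefix))


def variable_get(key: str) -> str:
    """Stand-in for Variable.get(key): fetches one value per call."""
    fetched.append(key)
    return METASTORE[key]


# List the keys under one prefix, then fetch only the values that are needed.
db_keys = variable_keys(prefix="db/")
db_settings = {key: variable_get(key) for key in db_keys}
```

Only the two `db/` values are read here; the value stored under `api/token` is never transferred, which is the scaling argument made above.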
Testing
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Opus 4.7 following the guidelines