-
Notifications
You must be signed in to change notification settings - Fork 17.1k
Unify task/asset state storage between Core API and Execution API #67547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,6 +16,7 @@ | |
| # under the License. | ||
| from __future__ import annotations | ||
|
|
||
| import json | ||
| from typing import Annotated | ||
|
|
||
| from fastapi import Depends, HTTPException, status | ||
|
|
@@ -85,7 +86,9 @@ def list_asset_states( | |
| session=session, | ||
| ) | ||
| rows = session.execute(paginated).all() | ||
| entries = [AssetStateResponse(key=r.key, value=r.value, updated_at=r.updated_at) for r in rows] | ||
| entries = [ | ||
| AssetStateResponse(key=r.key, value=json.loads(r.value), updated_at=r.updated_at) for r in rows | ||
| ] | ||
| return AssetStateCollectionResponse(asset_states=entries, total_entries=total_entries) | ||
|
|
||
|
|
||
|
|
@@ -115,7 +118,7 @@ def get_asset_state( | |
| status_code=status.HTTP_404_NOT_FOUND, | ||
| detail=f"Asset state key {key!r} not found", | ||
| ) | ||
| return AssetStateResponse(key=row.key, value=row.value, updated_at=row.updated_at) | ||
| return AssetStateResponse(key=row.key, value=json.loads(row.value), updated_at=row.updated_at) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above — no non-JSON rows can exist since both write paths always do json.dumps(...) before storing. |
||
|
|
||
|
|
||
| @asset_state_router.put( | ||
|
|
@@ -131,7 +134,7 @@ def set_asset_state( | |
| session: SessionDep, | ||
| ) -> None: | ||
| """Set an asset state value. Creates or overwrites the key.""" | ||
| get_state_backend().set(AssetScope(asset_id=asset_id), key, body.value, session=session) | ||
| get_state_backend().set(AssetScope(asset_id=asset_id), key, json.dumps(body.value), session=session) | ||
|
|
||
|
|
||
| @asset_state_router.delete( | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,6 +16,7 @@ | |
| # under the License. | ||
| from __future__ import annotations | ||
|
|
||
| import json | ||
| from typing import Annotated | ||
|
|
||
| from fastapi import Depends, HTTPException, Query, status | ||
|
|
@@ -87,7 +88,9 @@ def list_task_states( | |
| ) | ||
| rows = session.execute(paginated).all() | ||
| entries = [ | ||
| TaskStateResponse(key=r.key, value=r.value, updated_at=r.updated_at, expires_at=r.expires_at) | ||
| TaskStateResponse( | ||
| key=r.key, value=json.loads(r.value), updated_at=r.updated_at, expires_at=r.expires_at | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In a list endpoint this is worse than in Suggest wrapping the decode in a try/except (skip+log, or return a sentinel) so a single legacy/odd row doesn't 500 the whole listing. Same pattern needed at
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 3.3 hasn't shipped, so the pre-#67418 row scenario cannot exist in any deployed cluster. For the custom backend concern: the core API currently calls
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Both write paths (task execution and core rest API) guarantee the DB always contains valid JSON: the execution API does |
||
| ) | ||
| for r in rows | ||
| ] | ||
| return TaskStateCollectionResponse(task_states=entries, total_entries=total_entries) | ||
|
|
@@ -127,7 +130,7 @@ def get_task_state( | |
| detail=f"Task state key {key!r} not found", | ||
| ) | ||
| return TaskStateResponse( | ||
| key=row.key, value=row.value, updated_at=row.updated_at, expires_at=row.expires_at | ||
| key=row.key, value=json.loads(row.value), updated_at=row.updated_at, expires_at=row.expires_at | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above. No such rows can exist since both write paths (execution API and Core API PUT) always store |
||
| ) | ||
|
|
||
|
|
||
|
|
@@ -162,7 +165,7 @@ def set_task_state( | |
| ) | ||
| scope = _get_scope(dag_id, dag_run_id, task_id, map_index) | ||
| try: | ||
| get_state_backend().set(scope, key, body.value, session=session) | ||
| get_state_backend().set(scope, key, json.dumps(body.value), session=session) | ||
| except ValueError as e: | ||
| raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=str(e)) from e | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
json.loads500 risk noted ontask_state.py:92-- one bad row in the asset's state table will 500 the whole list endpoint. Worth defensive decoding here, especially since asset state is meant to be a long-lived watermark (so legacy rows from pre-#67418 are more likely to still be around than transient task state).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as task state — both write paths always store json.dumps(...), so no non-JSON rows can exist. The long-lived watermark concern doesn't apply since 3.3 hasn't shipped yet.