Feature/cursor pagination task instances UI #64953

pierrejeambrun wants to merge 6 commits into apache:main
- Add cursor-based (keyset) pagination as an alternative to offset-based pagination on the `get_task_instances` endpoint. Offset pagination remains the default and is not deprecated globally.
- The response uses a discriminated union: offset responses include `total_entries`; cursor responses include `next_cursor` and `previous_cursor`.
- Refactor `SortParam` to lazily cache column resolution instead of mutating state in `to_orm`.
- Move cursor helpers (encode/decode/apply) to a dedicated `common/db/cursors.py` module.
- Cleanly separate the cursor and offset code paths in the endpoint handler.
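The `SortParam` refactor can be pictured roughly like this. It is a hypothetical sketch, not Airflow's class: `SortParamSketch`, `allowed_columns`, `resolved`, and `resolve_calls` are illustrative names, and plain strings stand in for SQLAlchemy columns. Resolution runs once, on first access, through `functools.cached_property`, so `to_orm` only reads cached state instead of mutating the instance. The `(name, column, is_desc)` triple mirrors the shape unpacked by the keyset loop reviewed further down.

```python
from functools import cached_property


class SortParamSketch:
    """Hypothetical stand-in for SortParam: parses ``order_by`` once, lazily."""

    def __init__(self, order_by: list[str], allowed_columns: dict[str, str]) -> None:
        self.order_by = order_by
        self.allowed_columns = allowed_columns
        self.resolve_calls = 0  # only here to demonstrate that caching works

    @cached_property
    def resolved(self) -> list[tuple[str, str, bool]]:
        """Return (attribute name, column, is_desc) triples, computed on first access."""
        self.resolve_calls += 1
        result = []
        for item in self.order_by:
            is_desc = item.startswith("-")
            name = item[1:] if is_desc else item
            if name not in self.allowed_columns:
                raise ValueError(f"Ordering with '{name}' is not allowed")
            result.append((name, self.allowed_columns[name], is_desc))
        return result

    def to_orm(self) -> list[str]:
        # Reads the cached resolution; repeated calls never re-resolve or mutate state.
        return [f"{col} DESC" if is_desc else f"{col} ASC" for _, col, is_desc in self.resolved]
```

Because the cursor encoder and the keyset filter both need the same resolved columns as `to_orm`, caching the resolution once keeps all three consistent without any call-order dependency.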
- Remove `order_by` from the cursor token (it is now just a list of values).
- Support an empty-string cursor for the first page (no fake sentinel needed).
- Drop the `order_by` consistency check between the cursor and the query parameter.
Encode type information directly into cursor tokens as
{"type": ..., "value": ...} objects, removing the fragile
column-based type guessing during deserialization.
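A minimal sketch of such a typed token, assuming base64url-encoded JSON (the PR's exact encoding is not shown here). Each sort value is stored as a `{"type": ..., "value": ...}` object so decoding never has to infer types from the column, and an empty string means "first page". The names `encode_cursor_values` and `decode_cursor_values` are hypothetical, and only `datetime`, `int`, `str`, and `None` are handled.

```python
import base64
import json
from datetime import datetime
from typing import Any, Optional

# Hypothetical helpers; the PR's real token format may differ.


def _to_typed(value: Any) -> dict[str, Any]:
    """Wrap one sort value with an explicit type tag."""
    if value is None:
        return {"type": "null", "value": None}
    if isinstance(value, datetime):
        return {"type": "datetime", "value": value.isoformat()}
    if isinstance(value, bool):  # bool is a subclass of int, so reject it explicitly
        raise TypeError("bool sort values are not supported in this sketch")
    if isinstance(value, int):
        return {"type": "int", "value": value}
    if isinstance(value, str):
        return {"type": "str", "value": value}
    raise TypeError(f"Unsupported cursor value type: {type(value).__name__}")


def _from_typed(item: dict[str, Any]) -> Any:
    """Rebuild a Python value from its type tag, without consulting any column."""
    kind, raw = item["type"], item["value"]
    if kind == "null":
        return None
    if kind == "datetime":
        return datetime.fromisoformat(raw)
    if kind == "int":
        return int(raw)
    if kind == "str":
        return str(raw)
    raise ValueError(f"Unknown cursor value type: {kind!r}")


def encode_cursor_values(values: list[Any]) -> str:
    """Serialize the last row's sort values into an opaque, URL-safe token."""
    payload = json.dumps([_to_typed(v) for v in values], separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode()).decode()


def decode_cursor_values(token: str) -> Optional[list[Any]]:
    """Return the decoded sort values, or None for the empty first-page cursor."""
    if token == "":
        return None
    payload = base64.urlsafe_b64decode(token.encode())
    return [_from_typed(item) for item in json.loads(payload)]
```

Tagging each value makes a timezone-aware `datetime` round-trip as a `datetime` rather than an ISO string, which matters when the decoded value is bound into a SQL comparison.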
Narrow return types for endpoints that only return offset
pagination (patch, clear, batch, mapped) so the OpenAPI spec
and generated UI client reflect the correct types. Only
get_task_instances retains the discriminated union response.
Update UI components to use the narrowed types from the spec.
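The discriminated response shape can be illustrated with plain dataclasses. This is a hypothetical sketch: the PR's real models live in `datamodels/common.py` and `datamodels/task_instances.py`, and `OffsetPage`, `CursorPage`, and `describe_page` are invented names. It shows why a `pagination` literal on every response makes consumers simple: they branch on one field, and each branch only reads fields that shape guarantees.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional, Union


@dataclass
class OffsetPage:
    """Hypothetical offset response: carries a total count."""

    total_entries: int
    task_instances: list[dict] = field(default_factory=list)
    pagination: Literal["offset"] = "offset"


@dataclass
class CursorPage:
    """Hypothetical cursor response: no total count, only navigation tokens."""

    next_cursor: Optional[str]
    previous_cursor: Optional[str]
    task_instances: list[dict] = field(default_factory=list)
    pagination: Literal["cursor"] = "cursor"


Page = Union[OffsetPage, CursorPage]


def describe_page(page: Page) -> str:
    """Branch on the discriminator; each branch touches only its own fields."""
    if page.pagination == "offset":
        return f"{len(page.task_instances)} of {page.total_entries}"
    # Cursor mode: no COUNT(*), so report only whether more rows follow.
    more = "more" if page.next_cursor is not None else "last page"
    return f"{len(page.task_instances)} ({more})"
```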
Replace offset-based pagination with cursor-based pagination for the TaskInstances listing page, leveraging the new cursor API endpoint. Pagination now shows only previous/next buttons without page numbers or total count, which eliminates the expensive COUNT(*) query for large datasets. Add generic cursor pagination support to DataTable via an optional cursorPagination prop so other tables can adopt it.
Merge backend first.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds cursor-based pagination for TaskInstances to avoid expensive COUNT(*) queries and speeds up the default listing UI, while keeping offset pagination for existing endpoints/actions.
Changes:
- Introduces cursor pagination helpers and API support (`cursor`, `next_cursor`, `previous_cursor`) and response-model discrimination via `pagination`.
- Updates the TaskInstances UI to use cursor pagination and extends `DataTable` with an optional `cursorPagination` control.
- Regenerates OpenAPI client types/services and updates unit tests for the new pagination mode.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| airflow-ctl/src/airflowctl/api/datamodels/generated.py | Adds separate cursor/offset task instance collection response models in generated client datamodels. |
| airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_task_instances.py | Adds cursor pagination tests and updates existing assertions to include pagination: "offset". |
| airflow-core/tests/unit/api_fastapi/common/test_parameters.py | Reformats imports to multi-line for readability. |
| airflow-core/tests/unit/api_fastapi/common/test_cursors.py | Adds unit tests for cursor encode/decode and keyset filter behavior. |
| airflow-core/src/airflow/ui/src/queries/useClearTaskInstances.ts | Updates types to use the offset response model for clear operations. |
| airflow-core/src/airflow/ui/src/queries/useClearDagRunDryRun.ts | Updates dry-run clear typing to offset response model. |
| airflow-core/src/airflow/ui/src/queries/useBulkMarkAsDryRun.ts | Updates response typing and EMPTY constant to offset model. |
| airflow-core/src/airflow/ui/src/queries/useBulkClearDryRun.ts | Updates response typing and EMPTY constant to offset model. |
| airflow-core/src/airflow/ui/src/pages/TaskInstances/TaskInstances.tsx | Switches listing to cursor pagination and wires next/previous into DataTable. |
| airflow-core/src/airflow/ui/src/pages/Task/Overview/Overview.tsx | Makes failed-task count robust to unioned cursor/offset responses. |
| airflow-core/src/airflow/ui/src/pages/Dag/Overview/Overview.tsx | Makes failed-task count robust to unioned cursor/offset responses. |
| airflow-core/src/airflow/ui/src/pages/Dag/Overview/FailedLogs.tsx | Broadens prop type to accept unioned task instance listing response. |
| airflow-core/src/airflow/ui/src/components/DataTable/index.ts | Re-exports cursor pagination prop type. |
| airflow-core/src/airflow/ui/src/components/DataTable/DataTable.tsx | Adds cursorPagination support with prev/next icon buttons and hides total row count in cursor mode. |
| airflow-core/src/airflow/ui/src/components/ActionAccordion/ActionAccordion.tsx | Updates affected tasks typing to offset response model. |
| airflow-core/src/airflow/ui/openapi-gen/requests/types.gen.ts | Introduces cursor/offset response types and unions GetTaskInstancesResponse. |
| airflow-core/src/airflow/ui/openapi-gen/requests/services.gen.ts | Adds cursor query param and updates docs/return types for updated task instance endpoints. |
| airflow-core/src/airflow/ui/openapi-gen/requests/schemas.gen.ts | Adds OpenAPI schemas for cursor/offset collection responses. |
| airflow-core/src/airflow/ui/openapi-gen/queries/suspense.ts | Updates generated query hook typing/docs for task instance endpoints with cursor support. |
| airflow-core/src/airflow/ui/openapi-gen/queries/queries.ts | Updates generated query hook typing/docs for task instance endpoints with cursor support. |
| airflow-core/src/airflow/ui/openapi-gen/queries/prefetch.ts | Updates generated prefetch helpers for task instance endpoints with cursor support. |
| airflow-core/src/airflow/ui/openapi-gen/queries/ensureQueryData.ts | Updates generated ensureQueryData helpers for task instance endpoints with cursor support. |
| airflow-core/src/airflow/ui/openapi-gen/queries/common.ts | Includes cursor in task instance query keys for cache separation. |
| airflow-core/src/airflow/api_fastapi/core_api/routes/public/task_instances.py | Implements cursor pagination path and returns discriminated cursor/offset response models. |
| airflow-core/src/airflow/api_fastapi/core_api/routes/public/dag_run.py | Updates clear DAG run response to offset pagination model. |
| airflow-core/src/airflow/api_fastapi/core_api/openapi/v2-rest-api-generated.yaml | Adds cursor query param + oneOf cursor/offset response schema with discriminator mapping. |
| airflow-core/src/airflow/api_fastapi/core_api/datamodels/task_instances.py | Splits cursor/offset responses and defines discriminated union response type. |
| airflow-core/src/airflow/api_fastapi/core_api/datamodels/common.py | Adds base models for offset/cursor paginated responses. |
| airflow-core/src/airflow/api_fastapi/common/parameters.py | Adds cached resolution to SortParam and exposes resolved columns for cursor encoding/filtering. |
| airflow-core/src/airflow/api_fastapi/common/cursors.py | Adds cursor token encoding/decoding and keyset WHERE clause builder. |
```python
task_instance_select = apply_filters_to_select(statement=query, filters=[*filters, order_by, limit])
if cursor:
    task_instance_select = apply_cursor_filter(task_instance_select, cursor, order_by)

task_instances = list(session.scalars(task_instance_select))
return TaskInstanceCursorCollectionResponse(
    task_instances=task_instances,
    next_cursor=encode_cursor(task_instances[-1], order_by) if task_instances else None,
    previous_cursor=encode_cursor(task_instances[0], order_by) if task_instances else None,
```
next_cursor/previous_cursor are currently non-null whenever the page has any rows, which makes the UI think both directions are available even on the first/last page. Additionally, previous_cursor as “cursor of the first row” won’t reliably navigate backwards with the current apply_cursor_filter semantics (it always paginates forward relative to the comparator). Consider: (1) fetching limit + 1 rows to decide whether a next page exists and only setting next_cursor when there is one (return exactly limit rows), and (2) introducing an explicit direction (e.g., cursor_direction=next|prev) or separate before_cursor/after_cursor parameters so the server can compute a true previous page (typically by reversing sort + comparator, then reversing results) and return previous_cursor=None on the first page.
```diff
-task_instance_select = apply_filters_to_select(statement=query, filters=[*filters, order_by, limit])
-if cursor:
-    task_instance_select = apply_cursor_filter(task_instance_select, cursor, order_by)
-task_instances = list(session.scalars(task_instance_select))
-return TaskInstanceCursorCollectionResponse(
-    task_instances=task_instances,
-    next_cursor=encode_cursor(task_instances[-1], order_by) if task_instances else None,
-    previous_cursor=encode_cursor(task_instances[0], order_by) if task_instances else None,
+cursor_limit = LimitFilter(limit.value + 1)
+task_instance_select = apply_filters_to_select(statement=query, filters=[*filters, order_by, cursor_limit])
+if cursor:
+    task_instance_select = apply_cursor_filter(task_instance_select, cursor, order_by)
+task_instances = list(session.scalars(task_instance_select))
+has_next_page = len(task_instances) > limit.value
+task_instances = task_instances[: limit.value]
+return TaskInstanceCursorCollectionResponse(
+    task_instances=task_instances,
+    next_cursor=encode_cursor(task_instances[-1], order_by) if has_next_page and task_instances else None,
+    previous_cursor=None,
```
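Both ideas from the comment (fetch `limit + 1` rows to detect a next page, and page backwards by flipping the comparator and sort order, then reversing the result) can be sketched over an in-memory list. This is not the PR's code: `paginate`, the `after`/`before` parameters, and the tuple keys are hypothetical, and a sorted Python list stands in for the SQL query. It assumes every sort column is ascending, non-null, and that the key tuple is unique per row.

```python
from typing import Optional, Sequence


def paginate(
    rows: Sequence[tuple],
    limit: int,
    after: Optional[tuple] = None,
    before: Optional[tuple] = None,
) -> tuple[list[tuple], Optional[tuple], Optional[tuple]]:
    """Hypothetical keyset paginator over rows sorted ascending by a unique key.

    Returns (page, next_cursor, previous_cursor). Pass next_cursor as ``after``
    and previous_cursor as ``before`` to move forward or backward.
    Assumes cursors always come from rows that exist in ``rows``.
    """
    if after is not None and before is not None:
        raise ValueError("pass at most one of 'after' and 'before'")

    if before is None:
        # Forward: WHERE key > :after ORDER BY key ASC LIMIT :limit + 1
        fetched = [r for r in rows if after is None or r > after][: limit + 1]
        page = fetched[:limit]
        has_next = len(fetched) > limit  # the extra row proves a next page exists
        has_prev = after is not None  # we arrived here from an earlier page
    else:
        # Backward: WHERE key < :before ORDER BY key DESC LIMIT :limit + 1, then reverse
        fetched = [r for r in reversed(rows) if r < before][: limit + 1]
        page = list(reversed(fetched[:limit]))
        has_prev = len(fetched) > limit  # the extra row proves an earlier page exists
        has_next = True  # we arrived here from a later page

    next_cursor = page[-1] if has_next and page else None
    previous_cursor = page[0] if has_prev and page else None
    return page, next_cursor, previous_cursor
```

Python tuple comparison plays the role of the SQL row-value comparison `(col_a, col_b) > (:a, :b)`, which matches the OR-of-ANDs expansion only when every column is ascending and non-null; mixed directions or nullable columns need the per-column expansion instead.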
```python
or_clauses = []
for i, (_, col, is_desc) in enumerate(resolved):
    eq_conditions = [resolved[j][1] == parsed_values[j] for j in range(i)]
    if is_desc:
        bound = col < parsed_values[i]
    else:
        bound = col > parsed_values[i]
    or_clauses.append(and_(*eq_conditions, bound))
```
This comparison logic breaks when any cursor value is None (e.g., sorting by nullable columns like start_date): SQL comparisons such as col > NULL/col < NULL evaluate to NULL/false, causing paging to return empty or inconsistent results. To make cursor pagination correct, either disallow cursor mode when ordering includes nullable columns (unless you apply a deterministic null ordering), or implement null-aware keyset comparisons that match the exact ORDER BY behavior (e.g., using nulls_last/nulls_first + corresponding IS NULL tie-break conditions or coalesce consistently in both ordering and cursor filtering).
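One way to make the keyset predicate null-aware, sketched in plain Python rather than SQLAlchemy so it runs standalone. It assumes every sort column is ordered ascending with NULLS LAST and that the final column is a non-null unique tiebreaker (for example the primary key); `after_key`, `nulls_last_sort_key`, and `SAMPLE_ROWS` are hypothetical. The loop mirrors the OR-of-ANDs expansion in the snippet above, but "equal" treats NULL as equal to NULL and "strictly after" treats NULL as sorting after every value. In SQL, a `_gt_nulls_last(col, v)` term would become `col > v OR col IS NULL` for a non-null `v` and `FALSE` for a NULL `v`; descending or NULLS FIRST columns need the mirrored logic, and database defaults for NULL ordering differ.

```python
from typing import Any, Sequence


def _eq(a: Any, b: Any) -> bool:
    """Null-safe equality: NULL matches NULL (like IS NOT DISTINCT FROM)."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def _gt_nulls_last(a: Any, b: Any) -> bool:
    """True if a sorts strictly after b under ASC NULLS LAST."""
    if b is None:
        return False  # nothing sorts after NULL in NULLS LAST order
    if a is None:
        return True  # NULL sorts after every non-null value
    return a > b


def after_key(row: Sequence[Any], cursor: Sequence[Any]) -> bool:
    """Keyset predicate: OR over i of (first i columns equal AND column i strictly after)."""
    for i in range(len(cursor)):
        if all(_eq(row[j], cursor[j]) for j in range(i)) and _gt_nulls_last(row[i], cursor[i]):
            return True
    return False


def nulls_last_sort_key(row: Sequence[Any]) -> tuple:
    """Python sort key equivalent to ORDER BY each column ASC NULLS LAST."""
    return tuple((True, 0) if v is None else (False, v) for v in row)


# Example: (start_date, id) rows where start_date is nullable and id is unique.
SAMPLE_ROWS = [(3, 1), (None, 2), (1, 3), (None, 4), (3, 5)]
```

The property to preserve is that, for any cursor row, the rows accepted by `after_key` are exactly the rows that follow it in `nulls_last_sort_key` order, so no row is skipped or repeated when a page boundary falls inside a run of NULLs.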
```ts
export type TaskInstanceCursorCollectionResponse = {
  pagination?: "cursor";
  next_cursor: string | null;
  previous_cursor: string | null;
  task_instances: Array<TaskInstanceResponse>;
  total_entries: number;
};
```
The discriminator field `pagination` is optional in both response types, which weakens reliable client-side narrowing (and contradicts the OpenAPI discriminator usage); it should be required (`pagination: "cursor"` / `pagination: "offset"`). Since the server always sets this field (a defaulted literal in the base models), updating the OpenAPI schema/codegen so `pagination` is required will make the union safer and simplify consumers.
```ts
export type TaskInstanceOffsetCollectionResponse = {
  pagination?: "offset";
  total_entries: number;
  task_instances: Array<TaskInstanceResponse>;
};
```
```diff
 };

-export type GetTaskInstancesResponse = TaskInstanceCollectionResponse;
+export type GetTaskInstancesResponse = TaskInstanceOffsetCollectionResponse | TaskInstanceCursorCollectionResponse;
```
The discriminator field `pagination` is optional in both response types, which weakens reliable client-side narrowing (and contradicts the OpenAPI discriminator usage); it should be required (`pagination: "cursor"` / `pagination: "offset"`). Since the server always sets this field (a defaulted literal in the base models), updating the OpenAPI schema/codegen so `pagination` is required will make the union safer and simplify consumers.
```diff
-export type GetTaskInstancesResponse = TaskInstanceOffsetCollectionResponse | TaskInstanceCursorCollectionResponse;
+export type GetTaskInstancesResponse =
+  | (TaskInstanceOffsetCollectionResponse & {
+      pagination: "offset";
+    })
+  | (TaskInstanceCursorCollectionResponse & {
+      pagination: "cursor";
+    });
```
```diff
  * @param data.offset
  * @param data.orderBy Attributes to order by, multi criteria sort is supported. Prefix with `-` for descending order. Supported attributes: `id, state, duration, start_date, end_date, map_index, try_number, logical_date, run_after, data_interval_start, data_interval_end, rendered_map_index, operator, logical_date, run_after, data_interval_start, data_interval_end`
- * @returns TaskInstanceCollectionResponse Successful Response
+ * @returns unknown Successful Response
```
The generated JSDoc says @returns unknown even though the method signature returns CancelablePromise<GetTaskInstancesResponse>. This is misleading for consumers and suggests the generator template isn’t mapping the new oneOf/discriminator response properly; update the OpenAPI TS generator/templates so the JSDoc return type matches GetTaskInstancesResponse (and similarly in the generated queries/prefetch/ensure helpers where the same unknown appears).
```diff
- * @returns unknown Successful Response
+ * @returns GetTaskInstancesResponse Successful Response
```
```python
def test_cursor_pagination_first_page(self, test_client, session):
    """First page with cursor='' returns cursor response without needing a real token."""
    dag_id = "example_python_operator"
    self.create_task_instances(
        session,
        task_instances=[
            {"start_date": DEFAULT_DATETIME_1 + dt.timedelta(minutes=(i + 1))} for i in range(5)
        ],
        dag_id=dag_id,
    )
    response = test_client.get(
        "/dags/~/dagRuns/~/taskInstances",
        params={"limit": 3, "order_by": ["map_index"], "cursor": ""},
    )
    assert response.status_code == 200, response.json()
    body = response.json()
    assert body["pagination"] == "cursor"
    assert "next_cursor" in body
    assert "previous_cursor" in body
    assert "total_entries" not in body
    assert len(body["task_instances"]) == 3
```
The cursor pagination tests validate the shape of the response but don’t assert correct boundary semantics (e.g., previous_cursor is null on the first page; next_cursor is null on the last page; using previous_cursor actually navigates backwards without overlap). Adding these assertions would catch the current behavior where cursors are always returned whenever rows exist.
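The boundary checks could be factored into a reusable test helper. This is a hypothetical sketch: `walk_cursor_pages` and `make_fake_endpoint` are invented names, the separate `fetch_prev` callable is an assumption (the reviewed endpoint has a single `cursor` parameter, so how a "previous" request is expressed depends on how the direction question above is resolved), and an in-memory fake stands in for `test_client.get(...)`.

```python
from typing import Any, Callable

Body = dict[str, Any]


def walk_cursor_pages(fetch_next: Callable[[str], Body], fetch_prev: Callable[[str], Body]) -> list[list[Any]]:
    """Walk forward to the last page, then back to the first, checking boundary semantics."""
    pages: list[list[Any]] = []
    body = fetch_next("")  # empty cursor = first page
    assert body["previous_cursor"] is None, "first page must not offer a previous page"
    while True:
        pages.append(body["task_instances"])
        if body["next_cursor"] is None:  # last page reached
            break
        body = fetch_next(body["next_cursor"])
    seen = [item for page in pages for item in page]
    assert len(seen) == len(set(seen)), "forward pages must not overlap"
    # Walk backwards from the last page and expect the same pages in reverse order.
    for expected in reversed(pages[:-1]):
        body = fetch_prev(body["previous_cursor"])
        assert body["task_instances"] == expected, "previous page must match the forward walk"
    assert body["previous_cursor"] is None, "walking back must end on the first page"
    return pages


def make_fake_endpoint(ids: list[int], limit: int) -> tuple[Callable[[str], Body], Callable[[str], Body]]:
    """Hypothetical in-memory fake standing in for the HTTP endpoint; cursors are stringified ids."""

    def body_for(start: int) -> Body:
        page = ids[start : start + limit]
        end = start + len(page)
        return {
            "task_instances": page,
            "next_cursor": str(page[-1]) if end < len(ids) else None,
            "previous_cursor": str(page[0]) if start > 0 else None,
        }

    def fetch_next(cursor: str) -> Body:
        return body_for(0 if cursor == "" else ids.index(int(cursor)) + 1)

    def fetch_prev(cursor: str) -> Body:
        return body_for(max(ids.index(int(cursor)) - limit, 0))

    return fetch_next, fetch_prev
```

In a real test, `fetch_next` could wrap `test_client.get(url, params={..., "cursor": c}).json()`, and the helper would fail on exactly the behavior described above, since a first page that always returns `previous_cursor` trips the first assertion.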
Builds on top of #64845 to plug the UI into the new cursor-based pagination. The default table view is much faster.
Responses are much faster. Currently it takes ~60 to load on main.