Skip to content

Fix/import error total entries#67856

Closed
GayathriSrividya wants to merge 3 commits into
apache:mainfrom
GayathriSrividya:fix/import-error-total-entries
Closed

Fix/import error total entries#67856
GayathriSrividya wants to merge 3 commits into
apache:mainfrom
GayathriSrividya:fix/import-error-total-entries

Conversation

@GayathriSrividya
Copy link
Copy Markdown
Contributor

@GayathriSrividya GayathriSrividya commented Jun 1, 2026

closes: #67525

When a single Python file contains multiple DAGs, GET /api/v2/importErrors returns an inflated total_entries count — one entry per DAG instead of one per file. This causes pagination to be incorrect and leads to duplicate display of the same import error.

Root cause: The query joined ImportError with DagModel via the DagWarning association, producing one row per DAG per import error. The total_entries count was taken from this joined result, multiplying the actual number of distinct import errors by the number of DAGs in the file.

Fix: Use .distinct() on the base query so total_entries reflects unique import error rows. The DAG association query is kept separate (used only for populating the filename/stacktrace fields), ensuring the count is always one-per-file regardless of how many DAGs share that file.

Comment thread airflow-core/newsfragments/67525.bugfix.rst Outdated
@Pedrinhonitz
Copy link
Copy Markdown
Contributor

I believe this PR has a similar idea to this one #67550

@GayathriSrividya GayathriSrividya force-pushed the fix/import-error-total-entries branch from 1c6f1ec to 3adc6c1 Compare June 1, 2026 19:17
@GayathriSrividya
Copy link
Copy Markdown
Contributor Author

I believe this PR has a similar idea to this one #67550

Thanks for pointing this out.

I checked #67550, and I agree both PRs address the same underlying issue from #67525. I opened this PR because the bug is not only about the inflated total_entries; pagination also needs to happen on distinct ParseImportError objects before fetching the joined DAG rows.

This PR uses a two-step approach:

  1. paginate/count using a deduplicated ParseImportError statement, so total_entries, limit, and offset operate on import-error objects rather than joined DAG rows;
  2. fetch the full DAG associations only for the paginated import-error IDs, so authorization/redaction logic still has the DAG context;
  3. restore the final response order using the paginated ID order after the second query.

The regression test also covers both the normal request and limit=1 for a single import-error file mapped to multiple DAGs.

I understand #67550 was opened earlier and has a very similar direction. If maintainers prefer that PR, I’m happy to close this one. But if the order-preserving two-query shape or the added test coverage here is useful, I’m happy to adapt this PR based on your feedback.

@GayathriSrividya GayathriSrividya force-pushed the fix/import-error-total-entries branch 5 times, most recently from d366425 to 0209607 Compare June 2, 2026 08:04
@GayathriSrividya GayathriSrividya force-pushed the fix/import-error-total-entries branch 2 times, most recently from 35de696 to b23b5d5 Compare June 2, 2026 16:31
Gayathri Srividya Rajavarapu added 3 commits June 3, 2026 08:28
… multiple DAGs

When a single import-error file mapped to N DAGs, the previous query
JOINed ParseImportError with file_dags_cte producing N rows per error.
paginated_select then counted those N rows, inflating total_entries and
applying LIMIT/OFFSET against joined rows rather than distinct errors.

Fix uses a two-query approach:
1. dedup_stmt with DISTINCT - one row per import error for correct count
   and pagination via paginated_select
2. import_errors_stmt - full join only for the paginated IDs to gather
   dag_id associations for authorization/stacktrace redaction

Closes apache#67525
Add keyword-only marker (*,) before session parameter in the new
import_error_with_multiple_dags fixture. The check-no-new-provide-session-positional
prek hook requires all @provide_session-decorated functions to declare session
as keyword-only.
@GayathriSrividya GayathriSrividya force-pushed the fix/import-error-total-entries branch from b23b5d5 to f919a12 Compare June 3, 2026 02:58
@GayathriSrividya
Copy link
Copy Markdown
Contributor Author

Closing this as it duplicates my earlier PR #67640, which targets the same issue (#67525) with the same two-query approach. Since #67550 has already received a maintainer approval from @pierrejeambrun and been assigned to the Airflow 3.2.3 milestone, I'm closing both my PRs to let that one proceed.


Drafted-by: GitHub Copilot (Claude Sonnet 4.6); reviewed by @GayathriSrividya before posting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

/api/v2/importErrors inflates total_entries when one import-error file maps to multiple DAGs

2 participants