Fix/import error total entries #67856
I believe this PR takes a similar approach to #67550.
Force-pushed from 1c6f1ec to 3adc6c1.
Thanks for pointing this out. I checked #67550, and I agree both PRs address the same underlying issue from #67525. I opened this PR because the bug is not only about the inflated
This PR uses a two-step approach:
The regression test also covers both the normal request and
I understand #67550 was opened earlier and has a very similar direction. If maintainers prefer that PR, I’m happy to close this one. But if the order-preserving two-query shape or the added test coverage here is useful, I’m happy to adapt this PR based on your feedback.
Force-pushed from d366425 to 0209607.
Yes, I was just referencing it to make your PR easier to review. I also noticed that this PR duplicates #67640. Please also make sure the quality checks run on this PR pass.
Force-pushed from 35de696 to b23b5d5.
… multiple DAGs
When a single import-error file mapped to N DAGs, the previous query JOINed ParseImportError with file_dags_cte, producing N rows per error. paginated_select then counted those N rows, inflating total_entries and applying LIMIT/OFFSET against joined rows rather than distinct errors.
Fix uses a two-query approach:
1. dedup_stmt with DISTINCT: one row per import error, for correct count and pagination via paginated_select
2. import_errors_stmt: full join only for the paginated IDs, to gather dag_id associations for authorization/stacktrace redaction
Closes apache#67525
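The two-query shape described in this commit message can be sketched outside Airflow with the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the `import_error` and `dag` tables, their columns, and the `list_import_errors` helper are hypothetical stand-ins, not Airflow's actual models, `file_dags_cte`, or `paginated_select`.

```python
import sqlite3

# Hypothetical schema: a.py has an import error and maps to three DAGs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE import_error (id INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, fileloc TEXT);
    INSERT INTO import_error VALUES (1, 'a.py'), (2, 'b.py'), (3, 'c.py');
    INSERT INTO dag VALUES ('a1', 'a.py'), ('a2', 'a.py'), ('a3', 'a.py'),
                           ('b1', 'b.py'), ('c1', 'c.py');
""")

JOIN = "FROM import_error ie JOIN dag d ON d.fileloc = ie.filename"


def list_import_errors(limit, offset):
    # Query 1: DISTINCT gives one row per import error, so both the total
    # count and LIMIT/OFFSET operate on errors, not on joined rows.
    dedup = f"SELECT DISTINCT ie.id AS id {JOIN}"
    total_entries = conn.execute(f"SELECT COUNT(*) FROM ({dedup})").fetchone()[0]
    page_ids = [
        row[0]
        for row in conn.execute(f"{dedup} ORDER BY ie.id LIMIT ? OFFSET ?", (limit, offset))
    ]
    if not page_ids:
        return total_entries, []

    # Query 2: full join, restricted to the IDs on this page, to collect
    # the dag_id associations for each error.
    placeholders = ",".join("?" * len(page_ids))
    rows = conn.execute(
        f"SELECT ie.id, ie.filename, d.dag_id {JOIN} "
        f"WHERE ie.id IN ({placeholders}) ORDER BY ie.id, d.dag_id",
        page_ids,
    ).fetchall()

    grouped = {}  # insertion order follows ORDER BY ie.id
    for error_id, filename, dag_id in rows:
        entry = grouped.setdefault(
            error_id, {"id": error_id, "filename": filename, "dag_ids": []}
        )
        entry["dag_ids"].append(dag_id)
    return total_entries, list(grouped.values())


total, page = list_import_errors(limit=2, offset=0)
print(total, [e["id"] for e in page])  # 3 [1, 2]
```

In this sketch the second query never influences the count or the page boundaries; it only decorates the already-chosen page with DAG IDs.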
Add keyword-only marker (*,) before session parameter in the new import_error_with_multiple_dags fixture. The check-no-new-provide-session-positional prek hook requires all @provide_session-decorated functions to declare session as keyword-only.
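To illustrate what the keyword-only marker changes, here is a minimal sketch. `provide_session_stub` is a simplified, hypothetical stand-in for Airflow's `@provide_session` decorator (it only injects a placeholder string), and `make_import_error` is an invented function, not the fixture from this PR.

```python
import functools


def provide_session_stub(func):
    """Hypothetical stand-in: injects a placeholder session if none is passed."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("session", "fake-session")
        return func(*args, **kwargs)

    return wrapper


@provide_session_stub
def make_import_error(filename, *, session=None):
    # The bare `*` makes `session` keyword-only: it can be passed as
    # session=..., but never positionally.
    return (filename, session)


def positional_session_rejected():
    try:
        make_import_error("a.py", "my-session")  # session passed positionally
    except TypeError:
        return True
    return False


print(make_import_error("a.py"))                        # ('a.py', 'fake-session')
print(make_import_error("a.py", session="my-session"))  # ('a.py', 'my-session')
print(positional_session_rejected())                    # True
```

With the marker in place, a caller cannot accidentally bind some other positional argument to `session`.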
Force-pushed from b23b5d5 to f919a12.
Closing this as it duplicates my earlier PR #67640, which targets the same issue (#67525) with the same two-query approach. Since #67550 has already received a maintainer approval from @pierrejeambrun and been assigned to the Airflow 3.2.3 milestone, I'm closing both my PRs to let that one proceed. Drafted-by: GitHub Copilot (Claude Sonnet 4.6); reviewed by @GayathriSrividya before posting
closes: #67525
When a single Python file contains multiple DAGs, GET /api/v2/importErrors returns an inflated total_entries count — one entry per DAG instead of one per file. This causes pagination to be incorrect and leads to duplicate display of the same import error.
Root cause: The query joined ImportError with DagModel via the DagWarning association, producing one row per DAG per import error. The total_entries count was taken from this joined result, multiplying the actual number of distinct import errors by the number of DAGs in the file.
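The row multiplication described in the root cause can be reproduced with a small `sqlite3` sketch. The `import_error` and `dag` tables below are hypothetical stand-ins chosen for illustration, not the Airflow models named above.

```python
import sqlite3

# Hypothetical data: multi.py defines three DAGs, single.py defines one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE import_error (id INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, fileloc TEXT);
    INSERT INTO import_error VALUES (1, 'multi.py'), (2, 'single.py');
    INSERT INTO dag VALUES ('m1', 'multi.py'), ('m2', 'multi.py'),
                           ('m3', 'multi.py'), ('s1', 'single.py');
""")

joined = (
    "SELECT ie.id AS id FROM import_error ie "
    "JOIN dag d ON d.fileloc = ie.filename"
)

# Counting joined rows yields one entry per DAG, not one per import error.
joined_count = conn.execute(f"SELECT COUNT(*) FROM ({joined})").fetchone()[0]

# Counting distinct error IDs yields one entry per import error.
distinct_count = conn.execute(
    f"SELECT COUNT(*) FROM (SELECT DISTINCT id FROM ({joined}))"
).fetchone()[0]

# LIMIT applied to joined rows returns the same error twice on one page.
first_page = [row[0] for row in conn.execute(f"{joined} ORDER BY ie.id LIMIT 2")]

print(joined_count, distinct_count, first_page)  # 4 2 [1, 1]
```

Two import errors are reported as four entries, and a page of size two shows error 1 twice while error 2 is pushed to a later page.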
Fix: Use .distinct() on the base query so total_entries reflects unique import error rows. The DAG association query is kept separate (used only for populating the filename/stacktrace fields), ensuring the count is always one-per-file regardless of how many DAGs share that file.