fix: compute historical metrics percentages from uncapped counts#67549
fix: compute historical metrics percentages from uncapped counts#67549MD-Mushfiqur123 wants to merge 0 commit into
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide
|
henry3260
left a comment
There was a problem hiding this comment.
I think the percentages can still be misleading when any state hits STATE_COUNT_CAP, since the per-state counts remain capped while the total is uncapped.
I'm also not sure we should add potentially expensive uncapped count queries just to make these percentages precise.
|
Friendly ping — this PR is ready for review. All CI checks are passing. |
|
Thanks for the review @henry3260! Re: misleading percentages when capped — The per-state counts are still capped at 1000, and the frontend already shows "N+" when a count reaches that cap (see Re: expensive COUNT() queries* — Both An alternative would be to remove the caps entirely, but that was intentionally avoided to keep the endpoint fast for deployments with millions of task instances. The current approach is a minimal, pragmatic improvement. |
IMO, we cannot assume these queries are cheap just because they are simple My concern is that the existing capped queries are bounded by |
pierrejeambrun
left a comment
There was a problem hiding this comment.
Yes, this count were omitted on purpose. On tables with 10 millions of rows a simple count can be very long depending on the joints, etc... performed.
Current behavior is expected, % are computed based on the front-end information (truncated) returned. We can improve this if that's confusing but computing a real hard count() on the entire table is not a possibility.
That's why we introduced cursor based pagination and reworked this dashboard page. (It would takes seconds to answer on big tables)
|
Thank you for the detailed review @pierrejeambrun. I understand now — the COUNT(*) queries were omitted on purpose due to performance concerns on large tables. I will revert this PR to keep the original behavior. Would it be acceptable to add a |
628f11a to
1d5150e
Compare
Problem
The dashboard "Historical Metrics" section shows incorrect percentages because the API endpoint
/ui/dashboard/historical_metrics_datacaps per-state counts atSTATE_COUNT_CAP = 1000(for performance), but the frontend computes the total as the sum of these capped values and uses it as the denominator for percentage calculations.For example, if
successhas 100,000 task instances and every other state has 0, the API returnssuccess: 1000(capped) while all others are 0. The frontend computestotal = 1000and showssuccess: 100%. With many states hitting the cap, the total itself is wrong and percentages are meaningless.Fix
Return the real uncapped total counts (
dag_run_total_countandtask_instance_total_count) alongside the existing capped per-state counts. The frontend now uses these uncapped totals as the denominator for percentage and progress-bar computations.This is a minimal, backward-compatible change that:
COUNT(*)queries (fast aggregate queries with the same filters)Closes #67336