Fix for duplicate Rookie authors selection in Top Rookie PR list #65914

Merged
potiuk merged 1 commit into apache:main from Srabasti:fix_multiple_rookies
Apr 27, 2026
@Srabasti
Contributor

This is a fix to ensure duplicate authors do not show up in Top Rookie PR list.

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
    Sonnet 4.6 Adaptive

Fix duplicate rookie authors in Top Rookie PR list

Problem

When the script runs in --rookie mode, the same author can appear multiple times in the Top N list. For example, in the April 2026 run, X appeared at both positions 10 and 11.

This is unfair to other rookie contributors: a single prolific rookie can dominate the list, crowding out other genuine newcomers.

Root Cause

The final selection in main() uses heapq.nlargest on scores with no per-author constraint:

# Applied to both rookie and non-rookie mode — no author deduplication
top_final = heapq.nlargest(top_number, scores.items(), key=lambda x: x[1])
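
To make the failure mode concrete, here is a minimal sketch of how the unconstrained selection can return the same author twice. The PR numbers, logins, scores, and the `authors` mapping are hypothetical and chosen only for illustration; the real script stores this data differently.

```python
import heapq

# Hypothetical data for illustration only: PR number -> score, PR number -> author login.
scores = {101: 9.5, 102: 9.1, 103: 8.7, 104: 7.2}
authors = {101: "alice", 102: "alice", 103: "bob", 104: "carol"}

top_number = 3

# Same call shape as in the PR description: purely score-based, no per-author constraint.
top_final = heapq.nlargest(top_number, scores.items(), key=lambda x: x[1])

# Map the selected PRs back to their authors to expose the duplicate.
top_authors = [authors[pr_number] for pr_number, _ in top_final]
print(top_authors)  # ['alice', 'alice', 'bob']
```

Because both of the highest-scoring PRs belong to the same login, that author takes two of the three slots and "carol" is pushed out.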

Fix

In --rookie mode, enforce a maximum of 1 PR per author when selecting the Top N, while leaving the non-rookie path completely unchanged.
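
The merged diff is not reproduced on this page, so the following is only a sketch of what a one-PR-per-author selection could look like, not the actual change. It assumes the details mentioned in this description (a `set`, `sorted`, `pr_stat.author`, and an `else` branch that keeps `heapq.nlargest`). The `PrStat` class, the `select_top` function, and the `pr_stats` mapping are hypothetical names introduced for the example.

```python
import heapq
from dataclasses import dataclass


@dataclass
class PrStat:
    """Hypothetical stand-in for the script's per-PR stats object."""

    author: str  # GitHub login of the PR author


def select_top(pr_stats, scores, top_number, rookie):
    """Return up to top_number (pr_number, score) pairs, highest score first.

    In rookie mode each author contributes at most one PR (their best one).
    """
    if rookie:
        seen_authors = set()
        top_final = []
        # Walk PRs from highest to lowest score and keep the first PR per author.
        for pr_number, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
            author = pr_stats[pr_number].author
            if author in seen_authors:
                continue
            seen_authors.add(author)
            top_final.append((pr_number, score))
            if len(top_final) == top_number:
                break
    else:
        # Non-rookie path: same unconstrained selection as before.
        top_final = heapq.nlargest(top_number, scores.items(), key=lambda x: x[1])
    return top_final


# Hypothetical data for illustration only.
scores = {101: 9.5, 102: 9.1, 103: 8.7, 104: 7.2}
pr_stats = {
    101: PrStat("alice"),
    102: PrStat("alice"),
    103: PrStat("bob"),
    104: PrStat("carol"),
}

print(select_top(pr_stats, scores, 3, rookie=True))   # [(101, 9.5), (103, 8.7), (104, 7.2)]
print(select_top(pr_stats, scores, 3, rookie=False))  # [(101, 9.5), (102, 9.1), (103, 8.7)]
```

Sorting the full score table costs more than `heapq.nlargest`, but the number of candidate PRs in a monthly run is small, and a full descending walk makes it straightforward to skip authors who already hold a slot.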

Why this is safe

  • The else branch is identical to the original heapq.nlargest call, so non-rookie mode is completely unchanged
  • pr_stat.author consistently uses pr_data["author"]["login"] (GitHub login) throughout the script, so there is no mismatch risk
  • No new imports are required: set and sorted are Python builtins
  • Only one line in main() is replaced; no other logic is touched

Impact

Each rookie author can appear at most once in the Top N list, ensuring the award highlights a broader set of new contributors rather than being dominated by a single active newcomer.

I tested this locally; the script runs fine both before and after the fix. Local pre-commit checks report no errors related to this PR.

@potiuk potiuk merged commit eac6ee6 into apache:main Apr 27, 2026
140 checks passed
@github-actions
Contributor

Backport successfully created: v3-2-test

Note: As per "Merging PRs targeted for Airflow 3.X", the committer who merges the PR is responsible for backporting PRs that are bug fixes (generally speaking) to the maintenance branches.

In case of doubt, please ask in the #release-management Slack channel.

Status Branch Result
v3-2-test PR Link

github-actions Bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Apr 27, 2026
(cherry picked from commit eac6ee6)

Co-authored-by: Srabasti Banerjee <srabasti_b@ymail.com>
aws-airflow-bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Apr 27, 2026
(cherry picked from commit eac6ee6)

Co-authored-by: Srabasti Banerjee <srabasti_b@ymail.com>
vatsrahul1001 pushed a commit that referenced this pull request Apr 27, 2026
(cherry picked from commit eac6ee6)

Co-authored-by: Srabasti Banerjee <srabasti_b@ymail.com>
potiuk pushed a commit that referenced this pull request May 1, 2026
(cherry picked from commit eac6ee6)

Co-authored-by: Srabasti Banerjee <srabasti_b@ymail.com>
potiuk pushed a commit that referenced this pull request May 2, 2026
(cherry picked from commit eac6ee6)

Co-authored-by: Srabasti Banerjee <srabasti_b@ymail.com>
Labels

area:dev-tools backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch
