Skip to content

fix(devex): make auto-reviewer assignment resilient to stale team slugs#60033

Merged
webjunkie merged 6 commits into
masterfrom
fix/devex/assign-reviewers-team-validation
May 27, 2026
Merged

fix(devex): make auto-reviewer assignment resilient to stale team slugs#60033
webjunkie merged 6 commits into
masterfrom
fix/devex/assign-reviewers-team-validation

Conversation

@webjunkie
Copy link
Copy Markdown
Contributor

Problem

GitHub's POST /pulls/{n}/requested_reviewers rejects the whole batch with 422 if any team in team_reviewers isn't a collaborator on the repo. A single stale team slug in CODEOWNERS-soft or a product.yaml quietly drops every reviewer (teams and users) for affected PRs. So plenty of PRs landed without auto-assigned reviewers and nobody noticed.

Existing product:lint had an owners check, but it queried /orgs/{org}/teams and silently skipped in CI when the default GITHUB_TOKEN couldn't authenticate to that endpoint — so the rot kept growing.

Changes

Three layers:

Script-level fallbackassign-reviewers.js now catches the 422, splits users out, and retries each team individually. Valid entries still land, bad slugs get logged as cleanup nudges in the action log. Minimal diff.

Lint + path-filtered CI job — new hogli product:lint:owners subcommand validates product.yaml owners against the actual repo-collaborator team set (/repos/{repo}/teams, which works with the default GITHUB_TOKEN). New validate-product-yamls job in ci-backend.yml gated on a products/*/product.yaml paths filter, scoped to the changed yamls so unrelated pre-existing rot can't block a PR.

Data fixes — 6 product.yaml files pointing at non-existent teams: customer_analyticsteam-web-analytics, revenue_analyticsteam-growth, access_controlteam-platform-features, core_eventsteam-web-analytics, business_knowledge + conversationsconversations (real slug had no team- prefix).

Org-admin actions taken alongside this PR (not in diff) — granted push access on PostHog/posthog to 6 existing org teams that lacked repo access: conversations, team-events, team-data-tools, team-managed-warehouse, team-warehouse-sources, team-ai-gateway. These were already real teams with members; they just weren't repo collaborators, which is why the 422 fired.

After both the data fixes and the access grants, hogli product:lint:owners reports 0 failures across all 55 products.

How did you test this code?

I'm an agent and didn't manually open a PR to trigger the workflow. Verified locally:

  • hogli product:lint:owners (full sweep) — 0 failures across 55 products.
  • hogli product:lint:owners customer_analytics revenue_analytics (scoped) — passes; mirrors what CI runs.
  • hogli product:lint:owners access_control customer_analytics (with a known-bad input from before the fix) — fails with exit 1, correct CI behavior.
  • hogli product:lint customer_analytics — no longer hits the GitHub API (owners check removed from default CHECKS).
  • node -c .github/scripts/assign-reviewers.js — syntax valid.
  • python3 -c "import yaml; yaml.safe_load(open('.github/workflows/ci-backend.yml'))" — workflow yaml valid.
  • Re-ran the dorny path-stripping shell under bash (CI uses bash, not zsh) to confirm word-splitting passes positional args correctly.

No new automated tests for the lint subcommand — tools/hogli-commands/hogli_commands/product/ has no existing test module to extend.

Publish to changelog?

no

🤖 Agent context

I (Claude) authored this. Started from Julian's report that the assign-reviewers action 422'd on team-customer-analytics. Tracked the failure mode through several reframes:

  1. Identified the single bad team in that payload via GraphQL + REST cross-check.
  2. Slack-researched the right replacement owners (#project-customer-analytics channel had an explicit "team absorbed into web analytics" announcement; #project-revenue-analytics topic pointed at @rafaaudibert).
  3. Found the existing ProductYamlOwnersCheck was silently skipping in CI due to wrong endpoint + token scope mismatch.
  4. After Julian pushed back on the "what about CODEOWNERS-soft rot?" angle I re-audited and found 5 of the supposedly-dead teams actually exist on the org and just need repo access — much smaller fix than rewriting ownership.
  5. Granted access to those 6 teams via gh api PUT /orgs/PostHog/teams/{slug}/repos/PostHog/posthog -f permission=push, matching what other teams already have.

A few rejected approaches worth noting:

  • Initially used /orgs/PostHog/teams for the lint cache, switched to /repos/PostHog/posthog/teams so it matches the exact set assign-reviewers needs and works with default token scope in CI.
  • Considered a GraphQL pre-check inside assign-reviewers.js to validate all teams up-front in one round trip. Rejected — the simpler 422-fallback gives the same end result without adding a happy-path API call.
  • Considered fixing all 9 stale product.yamls + 3 CODEOWNERS-soft entries in one go — rejected as too much bundled ownership decision-making for a single PR. The path-filtered lint is now the forcing function for incremental cleanup.

GitHub's `POST /pulls/{n}/requested_reviewers` rejects the entire batch
with 422 if any team in team_reviewers isn't a collaborator on the repo.
A single stale slug in CODEOWNERS-soft or a product.yaml has been
silently dropping every reviewer (teams *and* users) for affected PRs.

Three layers of fix:

- assign-reviewers.js: on 422, retry users + each team individually so
  valid entries still land. Bad slugs are logged so they're visible in
  the action log as cleanup nudges.
- hogli product:lint:owners + a paths-filtered CI job validates every
  product.yaml against the actual repo-collaborator team set. Scoped to
  changed yamls so unrelated pre-existing rot doesn't block PRs.
- product.yaml fixes for products that pointed at non-existent teams
  (customer_analytics, revenue_analytics, access_control, core_events,
  business_knowledge, conversations).

Also switched the existing owner check to query /repos/{repo}/teams
instead of /orgs/{org}/teams — it's the exact set assign-reviewers
requires, and works with the default GITHUB_TOKEN so CI no longer
silently skips the check on auth failure.

Org-admin actions taken alongside this PR (not in diff):
granted push access on PostHog/posthog to conversations, team-events,
team-data-tools, team-managed-warehouse, team-warehouse-sources,
team-ai-gateway. These exist as org teams but lacked repo access.
@webjunkie webjunkie marked this pull request as ready for review May 26, 2026 09:43
@webjunkie webjunkie requested a review from a team as a code owner May 26, 2026 09:43
Copilot AI review requested due to automatic review settings May 26, 2026 09:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves reviewer auto-assignment reliability by salvaging valid reviewers when GitHub rejects a batch request due to stale team slugs, and adds a CI-gated lint to prevent new stale product.yaml owners from being introduced.

Changes:

  • Add a 422 fallback path in .github/scripts/assign-reviewers.js that retries users and then teams individually, logging dropped stale team slugs.
  • Introduce hogli product:lint:owners (and a path-filtered CI job) to validate product.yaml owners against repo-collaborator GitHub teams via /repos/{repo}/teams.
  • Fix several products/*/product.yaml owner slugs that were pointing to invalid/stale teams.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tools/hogli-commands/hogli_commands/product/lint.py Adds lint_owners entrypoint to run owners validation across all or selected products.
tools/hogli-commands/hogli_commands/product/gh.py Switches team slug fetch to repo-collaborator teams via gh api repos/{repo}/teams.
tools/hogli-commands/hogli_commands/product/cli.py Exposes new product:lint:owners Click command.
tools/hogli-commands/hogli_commands/product/checks.py Updates owners check semantics, skips @username, flags scaffold placeholder, removes default inclusion.
products/revenue_analytics/product.yaml Updates owners to a valid team slug.
products/customer_analytics/product.yaml Updates owners to a valid team slug.
products/core_events/product.yaml Updates owners to a valid team slug.
products/conversations/product.yaml Updates owners to the correct team slug.
products/business_knowledge/product.yaml Updates owners to the correct team slug.
products/access_control/product.yaml Replaces team-CHANGEME placeholder with a real owning team.
hogli.yaml Registers the new product:lint:owners command.
.github/workflows/ci-backend.yml Adds path-filter output + a new validate-product-yamls job gated on products/*/product.yaml changes.
.github/scripts/assign-reviewers.js Implements per-team retry fallback when the bulk reviewers request returns 422.
Comments suppressed due to low confidence (1)

tools/hogli-commands/hogli_commands/product/checks.py:792

  • This change removes the previous behavior of skipping ProductYamlOwnersCheck in CI when GitHub team slugs can’t be fetched (GITHUB_ACTIONS == 'true'). There’s an existing unit test that asserts the old behavior (TestProductYamlOwnersCheck.test_gh_unavailable_skips_in_ci in tools/hogli-commands/hogli_commands/tests/test_checks.py), so this PR will likely make the test suite fail unless it’s updated to expect an error/issue in CI too (or the skip behavior is retained intentionally).

        gh_teams, fetch_err = get_team_slugs()
        if gh_teams is None:
            return CheckResult(
                lines=[f"✗ {fetch_err}"],
                issues=[fetch_err],
                file=f"products/{ctx.name}/product.yaml",
            )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tools/hogli-commands/hogli_commands/product/gh.py
Comment on lines +799 to +803
# `@username` entries are individual reviewers, not teams — validating
# them needs a different endpoint, and the assign-reviewers script
# already routes them separately. Skip here.
if owner.startswith("@"):
continue
CI feedback from PR #60033:

- New validate-product-yamls job hit 403 on /repos/{repo}/teams. Default
  GITHUB_TOKEN is repo-scoped and can't list collaborator teams. Reused
  the assign-reviewers GitHub App secrets (same use case, same scope
  needs). Gated on same-repo PRs only — fork PRs can't access app
  secrets, and they can't introduce product.yaml changes that ship to
  master without going through a same-repo PR anyway.

- Removed test_gh_unavailable_skips_in_ci. It asserted the old behavior
  where the check silently skipped on fetch failure when GITHUB_ACTIONS
  was set. That special-case is gone (it was what hid the rot in the
  first place). Renamed the local counterpart to drop the "_locally"
  suffix since the behavior is uniform now.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b3c937eda

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/hogli-commands/hogli_commands/product/gh.py Outdated
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 26, 2026

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
tools/hogli-commands/hogli_commands/product/gh.py:13-14
The module-level `_fetch_attempted` flag ignores the `repo` parameter on every call after the first. If `get_team_slugs("other/repo")` were ever called in the same process, it would silently return the cached result for `"PostHog/posthog"` instead. The parameter is a dead letter — the docstring says "Cached after the first call" but gives no indication the argument is discarded on repeat calls. Dropping it and using a constant makes the API honest.

```suggestion
_REPO = "PostHog/posthog"

def get_team_slugs() -> tuple[set[str] | None, str]:
    """Fetch the GitHub team slugs that have access to PostHog/posthog. Cached after the first call.
```

### Issue 2 of 2
tools/hogli-commands/hogli_commands/product/lint.py:162-164
When all names in the list are unknown product directories they are skipped with a warning, `targets` ends up empty, and the function exits 0 printing "✓ All 0 product owners are valid PostHog/posthog teams". This looks like a passing check when it validated nothing. If the sed/xargs pipeline in CI ever produces an unexpected slug, the step would silently green.

```suggestion
    if names and not targets:
        click.echo("⚠  None of the provided names matched a product directory — nothing to validate")
        raise SystemExit(1)

    click.echo(f"Validating product.yaml owners across {scope} against PostHog/posthog teams\n")

    for product_dir in targets:
```

Reviews (1): Last reviewed commit: "fix(devex): give validate-product-yamls ..." | Re-trigger Greptile

Comment thread tools/hogli-commands/hogli_commands/product/gh.py Outdated
Comment on lines +162 to +164
click.echo(f"Validating product.yaml owners across {scope} against PostHog/posthog teams\n")

for product_dir in targets:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 When all names in the list are unknown product directories they are skipped with a warning, targets ends up empty, and the function exits 0 printing "✓ All 0 product owners are valid PostHog/posthog teams". This looks like a passing check when it validated nothing. If the sed/xargs pipeline in CI ever produces an unexpected slug, the step would silently green.

Suggested change
click.echo(f"Validating product.yaml owners across {scope} against PostHog/posthog teams\n")
for product_dir in targets:
if names and not targets:
click.echo("⚠ None of the provided names matched a product directory — nothing to validate")
raise SystemExit(1)
click.echo(f"Validating product.yaml owners across {scope} against PostHog/posthog teams\n")
for product_dir in targets:
Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/hogli-commands/hogli_commands/product/lint.py
Line: 162-164

Comment:
When all names in the list are unknown product directories they are skipped with a warning, `targets` ends up empty, and the function exits 0 printing "✓ All 0 product owners are valid PostHog/posthog teams". This looks like a passing check when it validated nothing. If the sed/xargs pipeline in CI ever produces an unexpected slug, the step would silently green.

```suggestion
    if names and not targets:
        click.echo("⚠  None of the provided names matched a product directory — nothing to validate")
        raise SystemExit(1)

    click.echo(f"Validating product.yaml owners across {scope} against PostHog/posthog teams\n")

    for product_dir in targets:
```

How can I resolve this? If you propose a fix, please make it concise.

The 403 persisted because actions/create-github-app-token defaults to a
repo-scoped token even when the app has org-level permissions. Passing
`owner: PostHog` (via github.repository_owner) explicitly scopes the
issued token at the org so the app's `members: read` permission flows
through, which is what /repos/{repo}/teams needs.
@webjunkie webjunkie requested a review from a team May 26, 2026 10:17
webjunkie added 2 commits May 26, 2026 12:24
The /repos/{repo}/teams endpoint kept returning 403 with the
assign-reviewers app token even after scoping at the org level. That
endpoint needs a different combination of permissions than the app
currently has.

/orgs/{org}/teams only needs `members: read`, which the app definitely
has (it's required for POST team_reviewers to resolve team slugs).
Trade-off: the lint now validates "team exists in the PostHog org" but
not "team has access to this repo". The latter case (team exists but
lacks repo access) is covered defensively by assign-reviewers.js's 422
fallback — it retries each team individually and logs the bad slugs so
they're visible as cleanup nudges.
Two follow-ups from Greptile:

- gh.get_team_slugs no longer takes a parameter — the module-level cache
  meant the arg was silently ignored after the first call, which would
  return the wrong slug set if anyone ever called it for a different
  org. Hardcoded to PostHog (the only org this lives in) so the API is
  honest about what it does.
- lint_owners now fails loud when the caller passed positional names
  but none matched a product directory. Previously exited 0 with
  "All 0 products valid", which would silently green a CI bug if the
  sed/xargs pipeline ever produced an unexpected slug.
Copy link
Copy Markdown
Member

@Piccirello Piccirello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement

@webjunkie webjunkie enabled auto-merge (squash) May 27, 2026 06:29
@webjunkie webjunkie merged commit 013a7e5 into master May 27, 2026
216 checks passed
@webjunkie webjunkie deleted the fix/devex/assign-reviewers-team-validation branch May 27, 2026 06:52
@deployment-status-posthog
Copy link
Copy Markdown

deployment-status-posthog Bot commented May 27, 2026

Deploy status

Environment Status Deployed At Workflow
dev ✅ Deployed 2026-05-27 07:13 UTC Run
prod-us ✅ Deployed 2026-05-27 07:25 UTC Run
prod-eu ✅ Deployed 2026-05-27 07:27 UTC Run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants