ops(registry): backfill stale member_profiles.agents types + crawler reclassify-on-disagreement #3541

Merged
EmmaLouise2018 merged 2 commits into main from EmmaLouise2018/registry-backfill-stale-types
Apr 30, 2026

@EmmaLouise2018
Contributor

@EmmaLouise2018 EmmaLouise2018 commented Apr 29, 2026

Refs #3538.

PR #3498 added resolveAgentTypes() server-side, but it only runs on writes (POST/PUT to /api/me/member-profile). Rows saved before #3498 never get re-evaluated. The crawler's type-update path at crawler.ts:580 only wrote back when the stored type was missing — once any non-unknown value was set, the row was frozen, so Bidcliq and Swivel (registered as 'buying' while being sales agents) cannot self-correct.

This is the cleanup for Problem 1 in #3538.

Crawler type-update policy (server/src/crawler.ts)

Old: write back only when no stored type and inferred is non-unknown.

New:

  • Promote when (stored is missing OR stored is 'unknown') AND inferred is non-unknown. Same intent as before, broadened to cover the 'unknown' case that was previously frozen.
  • Log on disagreement (stored non-unknown ≠ inferred non-unknown). Do NOT auto-flip — single probes can be wrong; auto-flipping would corrupt good rows on a transient bad probe. Operator runs the backfill explicitly to reconcile.
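The policy above can be sketched as a small pure function. This is a minimal illustration, not the code in server/src/crawler.ts: the names `decideTypeUpdate` and `TypeUpdateDecision`, and the `"noop"` label for the agree case, are hypothetical.

```typescript
// Hypothetical sketch of the promote / disagreement matrix described above.
type TypeUpdateDecision = "promote" | "disagreement" | "noop";

function decideTypeUpdate(
  stored: string | null | undefined,
  inferred: string,
): TypeUpdateDecision {
  // A probe that inferred nothing useful never changes the stored value.
  if (inferred === "unknown") return "noop";
  // Stored type is missing or 'unknown': promote to the inferred type.
  if (stored == null || stored === "unknown") return "promote";
  // Both sides are concrete but differ: log only, never auto-flip.
  if (stored !== inferred) return "disagreement";
  // Stored and inferred agree: nothing to do.
  return "noop";
}
```

Keeping the decision separate from the write makes the matrix easy to pin in a unit test without a database.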

Backfill script (server/scripts/backfill-member-agent-types.ts)

Walks every member_profiles row, calls resolveAgentTypes() on its agents[], writes back any agent whose stored type disagrees with the snapshot's inferred type. Idempotent. Has a --dry-run mode.
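A rough in-memory sketch of the walk, for readers who want the shape without opening the script. Everything here is a stand-in: `ProfileRow`, `planBackfill`, `applyBackfill`, and the `inferredTypeByUrl` map are hypothetical, the real script calls resolveAgentTypes() against the database, and skipping 'unknown' inferences is an assumption of this sketch.

```typescript
// Illustrative stand-ins for member_profiles rows and the snapshot lookup.
interface Agent { url: string; type: string }
interface ProfileRow { id: string; agents: Agent[] }
interface Flip { profileId: string; url: string; from: string; to: string }

// Compute the flips without writing anything (what --dry-run would report).
function planBackfill(rows: ProfileRow[], inferredTypeByUrl: Map<string, string>): Flip[] {
  const flips: Flip[] = [];
  for (const row of rows) {
    for (const agent of row.agents) {
      const inferred = inferredTypeByUrl.get(agent.url);
      // Assumption: no snapshot or an 'unknown' inference gives nothing to reconcile.
      if (inferred === undefined || inferred === "unknown") continue;
      if (agent.type !== inferred) {
        flips.push({ profileId: row.id, url: agent.url, from: agent.type, to: inferred });
      }
    }
  }
  return flips;
}

// Apply the planned flips; in dry-run mode the rows are returned untouched.
function applyBackfill(rows: ProfileRow[], flips: Flip[], dryRun: boolean): ProfileRow[] {
  if (dryRun) return rows;
  return rows.map((row) => ({
    ...row,
    agents: row.agents.map((agent) => {
      const flip = flips.find((f) => f.profileId === row.id && f.url === agent.url);
      return flip ? { ...agent, type: flip.to } : agent;
    }),
  }));
}
```

Idempotence falls out of the plan step: once the flips are applied, a second `planBackfill` over the result finds no disagreements.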

Operator runbook

Owner: member-tools / registry oncall. Runs from a workstation with DATABASE_URL set to the target environment (staging first, then prod). The script is read/write — same machine that runs migrations.

Expected diff size today:

  • Bidcliq, Swivel are the known cases: 'buying' → 'sales'. Both are member-registered, both have agent_capabilities_snapshot rows that infer sales. Two flips minimum.
  • Possible additional flips: any other member-registered agent saved before #3498 (fix(registry): server-side agent-type resolution at registration) whose snapshot disagrees with its stored type. Order of magnitude: < 10 across the current member fleet.
  • If the dry-run reports > 25 flips, stop and investigate before running for real. Something else is going on (probe drift, mass mis-classification, classification-inference change). Operator escalates rather than absorbing the diff.

Where output goes:

  • Stdout from the script — capture to a file for the run record (e.g., 2026-04-29-staging-backfill.log).
  • Paste the captured stdout into a comment on this PR after the staging run, and into the prod-deployment ticket / runbook entry after the prod run.
  • Future audit ("when did Bidcliq flip?") relies on this paper trail until the audit-log table from #3550 (ops(registry): backfill writes type-reclassification audit log) lands.

Procedure:

# 1. Dry-run on staging
DATABASE_URL=<staging> npx tsx server/scripts/backfill-member-agent-types.ts --dry-run | tee 2026-04-29-staging-dryrun.log

# 2. Eyeball the diff. Expected ≈ Bidcliq + Swivel + 0–8 long-tail.
#    If >25 disagreements or any unexpected agent_url surfaces, STOP.

# 3. Real run on staging
DATABASE_URL=<staging> npx tsx server/scripts/backfill-member-agent-types.ts | tee 2026-04-29-staging-real.log

# 4. Spot-check the public registry — Bidcliq + Swivel render as sales now.

# 5. Repeat 1–4 against prod with prod DATABASE_URL.

Pre-run checklist:

  • Dry-run output captured.
  • Dry-run diff matches expected set ± documented surprises (no unexpected agent_url, no > 25 flips, every flip explainable).
  • Surprises documented in PR comment / ticket before proceeding.
  • Real run only after staging run validated and prod backup window aligns with normal practice.
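The stop conditions in the runbook could also be checked mechanically against a parsed dry-run diff. The sketch below is hypothetical: the script's stdout format is not specified here, so `PlannedFlip`, `checkDryRun`, and the `expectedUrls` set (the known cases plus any documented surprises) are illustrative names only.

```typescript
// Hypothetical gate over a parsed dry-run diff; not part of the backfill script.
interface PlannedFlip { agentUrl: string; from: string; to: string }
interface GateResult { proceed: boolean; reasons: string[] }

const MAX_FLIPS = 25; // stop threshold taken from the runbook

function checkDryRun(flips: PlannedFlip[], expectedUrls: Set<string>): GateResult {
  const reasons: string[] = [];
  // More than 25 flips means something else is going on: stop and investigate.
  if (flips.length > MAX_FLIPS) {
    reasons.push(`${flips.length} flips exceeds the ${MAX_FLIPS}-flip stop threshold`);
  }
  // Any agent_url outside the expected or documented set is also a stop.
  for (const flip of flips) {
    if (!expectedUrls.has(flip.agentUrl)) {
      reasons.push(`unexpected agent_url: ${flip.agentUrl}`);
    }
  }
  return { proceed: reasons.length === 0, reasons };
}
```

The gate only reports; whether to proceed after documenting a surprise stays an operator decision, as in the checklist.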

Export

resolveAgentTypes is now exported from server/src/routes/member-profiles.ts so the script can reuse it. Pushing the abstraction up rather than duplicating it across the script + the write path.

Test plan

  • New: server/tests/unit/crawler-type-update-policy.test.ts — pins the promote/disagreement matrix. 5/5 pass.
  • npx tsc --noEmit -p server/tsconfig.json — clean.
  • Pre-commit hooks green.
  • Operator: run --dry-run on staging per the runbook above.

Stack ordering

Recommended merge order from #3538: 3540 → 3542 → 3543 → 3541. Backfill ships last so it's the most-deliberate step — runs against the wire-corrected codebase from #3540 and the docs-explained surface from #3542 / #3543.

Out of scope

…reclassify-on-disagreement

Refs #3538.

PR #3498 added `resolveAgentTypes()` server-side, but it only runs on writes
(POST/PUT to /api/me/member-profile). Rows saved before #3498 never get
re-evaluated. The crawler's type-update path at `crawler.ts:580` only wrote
back when the stored type was missing — once any non-unknown value was set,
the row was frozen.

This is the cleanup for Problem 1 in #3538.

## Crawler type-update policy (crawler.ts)

Old: write back only when no stored type and inferred is non-unknown.

New:
- Promote when (stored is missing OR stored is 'unknown') AND inferred is
  non-unknown. Same intent as before, broadened to cover the 'unknown' case
  that was previously frozen.
- Log a warning on disagreement (stored non-unknown != inferred non-unknown).
  Do NOT auto-flip — single probes can be wrong; auto-flipping would corrupt
  good rows on a transient bad probe. Operator runs the backfill explicitly
  to reconcile.

## Backfill script (server/scripts/backfill-member-agent-types.ts)

Walks every `member_profiles` row, calls `resolveAgentTypes()` on its
`agents[]`, writes back any agent whose stored type disagrees with the
snapshot's inferred type. Idempotent. Has a `--dry-run` mode.

```
npx tsx server/scripts/backfill-member-agent-types.ts --dry-run
npx tsx server/scripts/backfill-member-agent-types.ts
```

## Export

`resolveAgentTypes` is now exported from `member-profiles.ts` so the script
can reuse it. The backfill is the same logic as the write path; pushing the
abstraction up rather than duplicating it.

## Test plan

- New: `server/tests/unit/crawler-type-update-policy.test.ts` — pins the
  promote/disagreement matrix. 5/5 pass.
- `npx tsc --noEmit -p server/tsconfig.json` — clean.

## Operator note

Run `--dry-run` first on staging to see the diff, then again on prod.
Bidcliq and Swivel ('buying' but actually sales) are the known cases.
@bokelley
Contributor

Issue #3550 proposes a type_reclassification_log table (or reuse of an existing audit-log pattern) — same surface as this PR (backfill-member-agent-types.ts + crawler disagreement-log path). You've already noted it as out of scope in the PR body, so this is just the cross-link: consider folding before merge or confirm as follow-up post-merge.


Generated by Claude Code

@EmmaLouise2018 EmmaLouise2018 merged commit 29ea128 into main Apr 30, 2026
13 checks passed
@EmmaLouise2018 EmmaLouise2018 deleted the EmmaLouise2018/registry-backfill-stale-types branch April 30, 2026 03:27
EmmaLouise2018 added a commit that referenced this pull request Apr 30, 2026
Append-only audit trail for agent type transitions. Captures every flip
from the three writer paths (backfill_script, crawler_promote,
member_write) so future audits answer with a row, not a stdout-grep.

No FK to agents — historical record survives agent deletion.

Refs #3550, #3538, #3541.
EmmaLouise2018 added a commit that referenced this pull request Apr 30, 2026
#3541 + 457_agent_verification_badges_per_version both landed on main while
this PR was open, so 457 is now occupied. Migration runner explicitly
errors on duplicate version numbers (server/src/db/migrate.ts:80) — would
crash on next deploy.

Renames the migration file and the doc cross-reference in the helper
module's header comment. No test/code references the version number, so
no other call sites change.
EmmaLouise2018 added a commit that referenced this pull request Apr 30, 2026
…r-agent timing

#3541 has merged, so resolveAgentTypes is statically imported from
member-profiles.ts (the dynamic-import workaround is gone). Adds a
load-bearing silent-corruption guard: a transient probe failure on an
agent that already had a snapshot row never overwrites that row back to
NULL — decideWrite() routes probe_failed/dns_failed to "preserve". Adds
per-agent elapsed_ms + slowest-N report so operators see the slow tail.
Fails loud on missing DATABASE_URL. Test coverage: success / probe-failed
(timeout) / DNS-failed / discovery_error routing / silent-corruption
guard / dry-run / per-agent timing — 18/18 pass.
bokelley pushed a commit that referenced this pull request May 1, 2026
…e transitions (closes #3550) (#3567)

* feat(registry): add type_reclassification_log migration

Append-only audit trail for agent type transitions. Captures every flip
from the three writer paths (backfill_script, crawler_promote,
member_write) so future audits answer with a row, not a stdout-grep.

No FK to agents — historical record survives agent deletion.

Refs #3550, #3538, #3541.

* feat(registry): add insertTypeReclassification DB helper

Single-call insert helper for the type_reclassification_log table.
Idempotent only at the row level — deduplication is the caller's
responsibility. On insert failure we log and swallow: the audit log is
observability, not a write barrier, and a failed log insert must not
roll back the caller's primary intent.

Refs #3550.

* feat(registry): crawler disagreement path writes audit log row

When the type-update policy decides 'disagreement' (stored non-unknown
differs from inferred non-unknown), also write a type_reclassification_log
row with source='crawler_promote' and notes={decision: 'logged_only_no_promote'}.

The crawler still does not auto-flip — the disagreement event itself is
what the audit log captures. Operator runs the backfill to flip
explicitly. See #3538.

Refs #3550.

* feat(registry): member-write resolveAgentTypes flips emit audit log rows

After resolveAgentTypes runs at any of the three call sites (POST create,
PUT bulk-update, admin update), diff against the pre-resolve agent array
and write a type_reclassification_log row per flipped agent. source is
'member_write', member_id is the workos_organization_id (or profile id
on the admin path).

Pulls the diff/log logic into a new exported helper
(logResolvedTypeChanges) so the three sites stay tight.

Refs #3550.

* feat(registry): backfill script writes audit log rows in real mode

Every real-mode flip from backfill-member-agent-types.ts now writes a
type_reclassification_log row with source='backfill_script' and a
generated run_id (backfill-<unix-ms>). Dry-run skips the audit log —
no writes, no audit rows.

run_id is also echoed in the script's summary so an operator can answer
"what did the 2026-04-29 backfill change?" with a single SELECT.

Refs #3550.

* test(registry): pin type_reclassification_log helper + crawler audit hook

- type-reclassification-log-db.test.ts: 6 tests covering canonical insert
  shape, null-padding for omitted optionals, JSONB serialization, explicit
  null oldType (first-classification), error swallow (audit log must never
  block caller), and per-source acceptance.
- crawler-type-update-policy.test.ts: extends with 3 tests pinning that
  the disagreement branch writes source='crawler_promote' to the audit
  log, while the agree/promote branches do not.

Plus changeset.

Refs #3550.

* docs(registry): pin resolveAgentTypes returns-new-array invariant in docstring

Pre-review nit: logResolvedTypeChanges captures `before` arrays by reference
at three call sites and diffs against the resolved array. Whole audit-log
diff depends on resolveAgentTypes returning a new array, never mutating in
place. Pinning the contract in the docstring so a future refactor that
switches to in-place mutation surfaces the constraint at the source rather
than silently zeroing out audit log entries. Closes #3550.

* fix(registry): renumber type_reclassification_log migration 457 -> 459

#3541 + 457_agent_verification_badges_per_version both landed on main while
this PR was open, so 457 is now occupied. Migration runner explicitly
errors on duplicate version numbers (server/src/db/migrate.ts:80) — would
crash on next deploy.

Renames the migration file and the doc cross-reference in the helper
module's header comment. No test/code references the version number, so
no other call sites change.
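The returns-new-array invariant pinned in the docstring commit above is easy to see in a toy example. This is a sketch under stated assumptions, not the repository code: `diffTypes`, `resolveCopy`, and `resolveInPlace` are hypothetical stand-ins for logResolvedTypeChanges and resolveAgentTypes, and the diff assumes the resolver preserves order and length.

```typescript
interface Agent { url: string; type: string }

// Diff by position against a `before` array that was captured by reference.
function diffTypes(before: Agent[], after: Agent[]): string[] {
  const changed: string[] = [];
  after.forEach((agent, i) => {
    const prev = before[i];
    if (prev && prev.type !== agent.type) {
      changed.push(`${agent.url}: ${prev.type} -> ${agent.type}`);
    }
  });
  return changed;
}

// Honors the contract: returns a new array of new objects.
function resolveCopy(agents: Agent[]): Agent[] {
  return agents.map((a) => ({ ...a, type: "sales" }));
}

// Violates the contract: mutates the input and returns the same array.
function resolveInPlace(agents: Agent[]): Agent[] {
  for (const a of agents) a.type = "sales";
  return agents;
}

const beforeCopy: Agent[] = [{ url: "https://a.example", type: "buying" }];
const viaCopy = diffTypes(beforeCopy, resolveCopy(beforeCopy)); // one entry

const beforeMutated: Agent[] = [{ url: "https://a.example", type: "buying" }];
const viaMutation = diffTypes(beforeMutated, resolveInPlace(beforeMutated)); // empty: the flip is invisible
```

With in-place mutation, `before` and `after` are the same objects, so the diff finds nothing and no audit row would be written.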
EmmaLouise2018 added a commit that referenced this pull request May 1, 2026
…eout

Closes #3551 (Problem 2 quick-win for #3538).

77% of agents in the public registry currently render type='unknown'. Before
designing the full retry-with-backoff system, ship a one-shot script that
re-probes every currently-unknown agent with an extended 30s timeout. The
output tells us whether unknown is mostly transient or mostly dead — informs
the full Problem 2 PR's scope.

## Script

server/scripts/reprobe-unknown-agents.ts:
- Selects every agent with snapshot.inferred_type='unknown' or missing snapshot
- Re-probes with 30s timeout (vs current 10s in crawler.ts:548)
- Reuses the crawler's probe + write helpers — no bypass
- Calls resolveAgentTypes from member-profiles.ts (#3541) so member_profiles.agents[] picks up new types
- Reports {still_unknown, newly_classified by type, probe_failed, dns_failed}
- Idempotent. --dry-run mode for staging validation before real runs.

## Test

server/tests/unit/reprobe-unknown-agents.test.ts pins the report-shape contract
across success/failure/dns-fail cases.

## Operator note

Same protocol as the #3541 backfill: dry-run on staging, capture stdout,
eyeball the diff, then real run. Repeat on prod. Operator paste-back of the
final report into a PR comment so the full Problem 2 PR can scope from real
data.

## Out of scope

- Full retry-with-backoff loop with last_probe_attempt_at tracking
- Mark-as-dead semantics for permanently unreachable agents
Both depend on this script's output to scope correctly.
EmmaLouise2018 added a commit that referenced this pull request May 1, 2026
…r-agent timing

#3541 has merged, so resolveAgentTypes is statically imported from
member-profiles.ts (the dynamic-import workaround is gone). Adds a
load-bearing silent-corruption guard: a transient probe failure on an
agent that already had a snapshot row never overwrites that row back to
NULL — decideWrite() routes probe_failed/dns_failed to "preserve". Adds
per-agent elapsed_ms + slowest-N report so operators see the slow tail.
Fails loud on missing DATABASE_URL. Test coverage: success / probe-failed
(timeout) / DNS-failed / discovery_error routing / silent-corruption
guard / dry-run / per-agent timing — 18/18 pass.
EmmaLouise2018 added a commit that referenced this pull request May 1, 2026
…eout (#3558)

* feat(registry): one-shot re-probe of unknown agents with extended timeout

Closes #3551 (Problem 2 quick-win for #3538).

77% of agents in the public registry currently render type='unknown'. Before
designing the full retry-with-backoff system, ship a one-shot script that
re-probes every currently-unknown agent with an extended 30s timeout. The
output tells us whether unknown is mostly transient or mostly dead — informs
the full Problem 2 PR's scope.

## Script

server/scripts/reprobe-unknown-agents.ts:
- Selects every agent with snapshot.inferred_type='unknown' or missing snapshot
- Re-probes with 30s timeout (vs current 10s in crawler.ts:548)
- Reuses the crawler's probe + write helpers — no bypass
- Calls resolveAgentTypes from member-profiles.ts (#3541) so member_profiles.agents[] picks up new types
- Reports {still_unknown, newly_classified by type, probe_failed, dns_failed}
- Idempotent. --dry-run mode for staging validation before real runs.

## Test

server/tests/unit/reprobe-unknown-agents.test.ts pins the report-shape contract
across success/failure/dns-fail cases.

## Operator note

Same protocol as the #3541 backfill: dry-run on staging, capture stdout,
eyeball the diff, then real run. Repeat on prod. Operator paste-back of the
final report into a PR comment so the full Problem 2 PR can scope from real
data.

## Out of scope

- Full retry-with-backoff loop with last_probe_attempt_at tracking
- Mark-as-dead semantics for permanently unreachable agents
Both depend on this script's output to scope correctly.

* fix(registry): reprobe — static import + silent-corruption guard + per-agent timing

#3541 has merged, so resolveAgentTypes is statically imported from
member-profiles.ts (the dynamic-import workaround is gone). Adds a
load-bearing silent-corruption guard: a transient probe failure on an
agent that already had a snapshot row never overwrites that row back to
NULL — decideWrite() routes probe_failed/dns_failed to "preserve". Adds
per-agent elapsed_ms + slowest-N report so operators see the slow tail.
Fails loud on missing DATABASE_URL. Test coverage: success / probe-failed
(timeout) / DNS-failed / discovery_error routing / silent-corruption
guard / dry-run / per-agent timing — 18/18 pass.

* fix(test): vi.mock member-profiles to bypass WorkOS init in reprobe test

Previous shim was a no-op due to ESM import hoisting — imports run before
module-body statements, so member-profiles.ts (transitively imported via
the script's static import of resolveAgentTypes) constructed WorkOS before
WORKOS_API_KEY got set. Passed locally where the env was already set;
failed in CI where it wasn't.

Mock member-profiles directly so WorkOS init never triggers. The script's
behavior around resolveAgentTypes is exercised through the mock — unit
tests pin the behavior, not the implementation.

18/18 reprobe tests pass.
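The silent-corruption guard in the reprobe commits above amounts to routing failure outcomes away from the write path. A minimal sketch of that routing follows; the `ProbeOutcome` and `WriteAction` shapes and the name `decideWriteSketch` are assumptions, and the real decideWrite() signature may differ.

```typescript
// Hypothetical outcome and action shapes; only the "preserve" routing is from the commit message.
type ProbeOutcome =
  | { kind: "classified"; inferredType: string }
  | { kind: "probe_failed" }
  | { kind: "dns_failed" };

type WriteAction =
  | { action: "write"; inferredType: string }
  | { action: "preserve" };

function decideWriteSketch(outcome: ProbeOutcome): WriteAction {
  switch (outcome.kind) {
    // Transient failures must never overwrite an existing snapshot row.
    case "probe_failed":
    case "dns_failed":
      return { action: "preserve" };
    // Only a successful classification produces a write.
    case "classified":
      return { action: "write", inferredType: outcome.inferredType };
  }
}
```

Modeling the outcome as a discriminated union lets the compiler flag any new outcome kind that has no routing yet.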
EmmaLouise2018 added a commit that referenced this pull request May 1, 2026
Changesets are append-only history. The previous commit on this branch
deleted this file when removing the script — that's wrong. The original
changeset describes what landed in #3541 and stays as historical record;
the new migration changeset (backfill-member-agent-types-migration.md)
describes the follow-up.