Rename crawler UA to drop 'bot' suffix #366

Merged
simonsmallchua merged 3 commits into main from crawler-ua-rename
Apr 28, 2026
@simonsmallchua simonsmallchua commented Apr 28, 2026

Summary

  • Renames the crawler user-agent from HoverBot/1.0 (+https://www.goodnative.co/hover) to Hover/1.0 (+https://www.goodnative.co/hover) to bypass naive substring filters that block the word "bot" (Amazon today; the rest of row 3 in issue #365, "Crawler reliability action plan (from #319 investigation)", going forward).
  • Updates the robots.txt parser comments and the per-bot test fixtures (User-agent: Hover instead of User-agent: HoverBot). The bot-name extraction in internal/crawler/robots.go already derives the name from the UA string, so the rename automatically routes per-bot rules to hover.
  • Refs #365, "Crawler reliability action plan (from #319 investigation)" (row 3, "UA rename"). This is the smallest, lowest-risk item in the action plan, so it ships first.
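
As a rough illustration of why the rename carries through to per-bot robots.txt rules, here is a minimal sketch of deriving a bot name from a UA string. The helper `botNameFromUA` is hypothetical and is not the code in internal/crawler/robots.go; it assumes the name is the product token before the first `/`, space, or `(`, lowercased.

```go
package main

import "strings"

// botNameFromUA is a hypothetical helper (not the project's implementation).
// It returns the lowercased product token that precedes the first '/',
// whitespace, or '(' in the user-agent string.
func botNameFromUA(ua string) string {
	ua = strings.TrimSpace(ua)
	if i := strings.IndexAny(ua, "/ \t("); i >= 0 {
		ua = ua[:i]
	}
	return strings.ToLower(ua)
}
```

Under that assumption, `Hover/1.0 (+https://www.goodnative.co/hover)` yields `hover` and the old string yields `hoverbot`, so no matching logic has to change when the UA changes.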

Probe results (HoverBot/1.0 vs Hover/1.0)

| Domain | Old UA | New UA |
| --- | --- | --- |
| amazon.com | 503 | 200 |
| target.com.au | 403 | 403 |
| woolworths.com.au | 403 | 403 |

Amazon flips as expected. The two AU retailers do not flip — both respond Server: AkamaiGHost with akaalb_* ALB cookies and 403 any non-browser UA, including the Googlebot string. That's full Akamai Bot Manager, not a "bot" substring filter, so they belong to row 1 of #365 (WAF detection / pre-flight) plus row 3(b) (customer-side allowlist outreach), not row 3(a) (this PR). Detail in #365 comment.
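
The two signals mentioned above (the `Server: AkamaiGHost` header and `akaalb_*` cookies) could be checked with something like the sketch below. The function names are hypothetical, and the check is only a heuristic: it suggests an Akamai front end, not proof that Bot Manager is the component returning the 403.

```go
package main

import (
	"net/http"
	"strings"
)

// hasAkamaiSignals is a hypothetical heuristic. It reports true when the
// Server header equals "AkamaiGHost" (case-insensitive) or when any cookie
// name starts with "akaalb_".
func hasAkamaiSignals(server string, cookieNames []string) bool {
	if strings.EqualFold(strings.TrimSpace(server), "AkamaiGHost") {
		return true
	}
	for _, name := range cookieNames {
		if strings.HasPrefix(name, "akaalb_") {
			return true
		}
	}
	return false
}

// responseHasAkamaiSignals applies the same heuristic to an HTTP response,
// reading cookie names from its Set-Cookie headers.
func responseHasAkamaiSignals(resp *http.Response) bool {
	names := make([]string, 0)
	for _, c := range resp.Cookies() {
		names = append(names, c.Name)
	}
	return hasAkamaiSignals(resp.Header.Get("Server"), names)
}
```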

Test plan

  • go test ./internal/crawler/... passes locally (existing robots tests updated, all green).
  • gofmt -w and goimports -w on all touched files; pre-commit format/lint/security gates passed.
  • Curl probe against amazon.com returns 200 with the new UA (was 503).
  • Curl probe against target.com.au and woolworths.com.au reproduced — confirmed Akamai-fingerprinted, deferred to row 1/3(b).
  • CI green on this PR.
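
For readers who want to reproduce the probe logic offline, the sketch below models a naive "bot" substring filter with an in-process test server and sends one request per UA. The filter is an assumption made for illustration; how Amazon actually filters is not known from this PR, and `naiveFilter`, `probe`, and `probeLocal` are hypothetical names.

```go
package main

import (
	"net/http"
	"net/http/httptest"
	"strings"
)

// naiveFilter is a stand-in for a substring-based filter: it answers 503
// when the User-Agent contains "bot" (case-insensitive), otherwise 200.
func naiveFilter(w http.ResponseWriter, r *http.Request) {
	if strings.Contains(strings.ToLower(r.UserAgent()), "bot") {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe sends one GET with the given User-Agent and returns the status code.
func probe(client *http.Client, url, ua string) (int, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	req.Header.Set("User-Agent", ua)
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// probeLocal runs probe against a throwaway local server using naiveFilter.
func probeLocal(ua string) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(naiveFilter))
	defer srv.Close()
	return probe(srv.Client(), srv.URL, ua)
}
```

Against this model, the old UA is rejected and the new one passes, which mirrors the amazon.com row of the probe table. A filter that rejects every non-browser UA would return the same status for both strings.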

Summary by CodeRabbit

  • Changed

    • Crawler user-agent updated from "HoverBot/1.0" to "Hover/1.0" to improve compatibility with strict bot filters while keeping the contact URL.
    • Robots.txt matching now recognizes the refined "Hover" agent identity when applying per-agent rules, ensuring intended crawl rules are applied.
  • Documentation

    • Changelog updated to reflect the user-agent rename and the adjusted matching behavior.

coderabbitai Bot commented Apr 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bf87a54f-c61a-4826-94b9-d52d73f9df24

📥 Commits

Reviewing files that changed from the base of the PR and between a694c9b and cb0a3b5.

📒 Files selected for processing (1)
  • CHANGELOG.md
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

Walkthrough

The crawler user-agent identifier was renamed from HoverBot/1.0 to Hover/1.0. Default config, inline comments, tests, and changelog were updated to reflect the new name; parsing behavior and test expectations remain unchanged.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Changelog / Docs: CHANGELOG.md | Added an Unreleased entry documenting the user-agent rename to Hover/1.0 and referenced the issue. |
| Default configuration: internal/crawler/config.go | Default UserAgent in DefaultConfig() changed from HoverBot/1.0 (+https://www.goodnative.co/hover) to Hover/1.0 (+https://www.goodnative.co/hover). |
| Crawler comments: internal/crawler/robots.go | Inline comments updated to refer to Hover (example user-agent); no logic changes. |
| Tests: internal/crawler/robots_test.go | Test fixtures and userAgent inputs updated from HoverBot to Hover; expected parsing results unchanged. |
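
To make the `User-agent: Hover` fixture change concrete, here is a simplified sketch of per-agent group selection in a robots.txt body. It is not the parser in internal/crawler/robots.go: `disallowsFor` is a hypothetical function that only collects Disallow paths, ignores Allow patterns, Crawl-delay, and wildcards, and falls back to the `*` group when no group names the bot.

```go
package main

import (
	"bufio"
	"strings"
)

// disallowsFor is a hypothetical, simplified matcher. It returns the Disallow
// paths of the group whose User-agent equals botName (case-insensitive),
// or those of the "*" group when no such group exists.
func disallowsFor(robots, botName string) []string {
	groups := map[string][]string{}
	var agents []string // user-agents of the group currently being read
	inRules := false    // true once the current group has seen a rule line
	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // strip comments
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		switch key {
		case "user-agent":
			if inRules { // a User-agent line after rules starts a new group
				agents, inRules = nil, false
			}
			name := strings.ToLower(val)
			agents = append(agents, name)
			if _, seen := groups[name]; !seen {
				groups[name] = nil // remember the group even if it has no Disallow
			}
		case "allow":
			inRules = true
		case "disallow":
			inRules = true
			if val == "" {
				continue // an empty Disallow blocks nothing
			}
			for _, a := range agents {
				groups[a] = append(groups[a], val)
			}
		}
	}
	if rules, ok := groups[strings.ToLower(botName)]; ok {
		return rules
	}
	return groups["*"]
}
```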

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰
I nibbled at the user-agent today,
Slipped "Bot" away to let "Hover" play.
Soft hops across the robots' field,
A quieter name, the same shield.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and concisely summarizes the primary change: renaming the crawler user-agent from 'HoverBot' to 'Hover' by dropping the 'bot' suffix. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

supabase Bot commented Apr 28, 2026

Updates to Preview Branch (crawler-ua-rename) ↗︎

Deployments Status Updated
Database Tue, 28 Apr 2026 10:25:33 UTC
Services Tue, 28 Apr 2026 10:25:33 UTC
APIs Tue, 28 Apr 2026 10:25:33 UTC

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

Tasks Status Updated
Configurations Tue, 28 Apr 2026 10:25:35 UTC
Migrations Tue, 28 Apr 2026 10:25:36 UTC
Seeding Tue, 28 Apr 2026 10:25:38 UTC
Edge Functions Tue, 28 Apr 2026 10:25:38 UTC


github-actions Bot commented Apr 28, 2026

Release Versions

App patch: v0.33.11 → v0.33.12

Changelog

Changed

Fixed

  • Job throughput could collapse to ~0.3 tasks/s after a same-domain
    cancel-and-restart, while concurrent jobs on other domains kept running at 5–8
    tasks/s. Root cause was stale per-job broker state from the previous run
    leaking into the new dispatcher loop.

    • RemoveJobKeys now also clears the cancelled job's field from every
      per-host hover:dom:flight:* HASH. Without this every cancel left a
      leftover counter that drifted unbounded across restarts.
    • On worker boot, scan hover:dom:flight:* and drop fields whose job is not
      in the active Postgres set. Hard SIGKILL bypasses the graceful-shutdown
      drain, so the dispatcher's increment runs but the worker's pacer.Release
      decrement never does — dom:flight has no dedicated reconciler, so drift
      used to accumulate forever.
    • RunningCounters.Reconcile is now atomic: a single Lua EVAL replaces the
      previous DEL + HSET pipeline. Concurrent Increment / Decrement could
      land between the two commands and have its update silently lost when the
      rewrite landed, freezing dispatch for any job whose counter floated up to
      its concurrency cap until the next 120s reconcile.
  • Job status pill stayed on "Starting up" forever because UpdateJobStatus had
    no callers in the production graph. The dispatcher now flips
    pending → running on the first successful publish for a job via a new
    JobManager.MarkJobRunning (guarded UPDATE, idempotent across worker
    restarts).
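
The atomic Reconcile described in the Fixed list can be pictured with the sketch below. The Lua body and the Go helper are assumptions for illustration, not the project's actual script: `reconcileScript` deletes the hash and rewrites it in one EVAL, and the hypothetical `reconcileArgs` flattens authoritative counts into the field/value list passed as ARGV. Dropping zero counts is a choice made for this sketch.

```go
package main

import (
	"sort"
	"strconv"
)

// reconcileScript is an assumed Lua body. Redis executes a script atomically,
// so no concurrent Increment or Decrement can land between the DEL and the HSET.
// Note: unpack has an argument limit, so very large hashes would need chunking.
const reconcileScript = `
redis.call('DEL', KEYS[1])
if #ARGV > 0 then
  redis.call('HSET', KEYS[1], unpack(ARGV))
end
return #ARGV / 2
`

// reconcileArgs is a hypothetical helper that turns per-job counts into
// alternating field/value strings, sorted by job ID for determinism.
// Jobs with a non-positive count are omitted.
func reconcileArgs(counts map[string]int64) []string {
	jobs := make([]string, 0, len(counts))
	for job, n := range counts {
		if n > 0 {
			jobs = append(jobs, job)
		}
	}
	sort.Strings(jobs)
	args := make([]string, 0, 2*len(jobs))
	for _, job := range jobs {
		args = append(args, job, strconv.FormatInt(counts[job], 10))
	}
	return args
}
```

With a Redis client, the script would be sent via EVAL with the counter hash as KEYS[1] and the result of `reconcileArgs` as ARGV.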

  • Dispatcher now self-heals against unknown future drift classes in the per-job
    hover:running counter. When CanDispatch keeps refusing dispatch for a
    single job for longer than REDIS_DISPATCH_STUCK_THRESHOLD_S (default 30s)
    while the job's schedule ZSET still has due tasks, the dispatcher fires an
    immediate RunningCounters.Reconcile from the authoritative Redis PEL via the
    worker pool's new TriggerReconcile hook. Triggers are rate-limited to one
    per 2× threshold per job and collapse onto any in-flight reconcile via
    TryLock, so a genuinely-at-capacity job can't drive a reconcile burst. This
    is a safety net layered on top of the existing 120s reconcileLoop, not a
    replacement. It closes the gap that PR #362 ("Fix per-job broker state leak
    on cancel/restart") left for future drift classes we haven't yet identified.
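
The self-heal rule above (refused for longer than the threshold while tasks are due, at most one trigger per 2× threshold, collapsing onto an in-flight reconcile) can be sketched as follows. All type and method names are hypothetical, timestamps are plain Unix seconds to keep the example small, and the real dispatcher may structure this differently.

```go
package main

import "sync"

// jobState tracks one job's current refusal streak and last trigger time.
type jobState struct {
	refusing     bool  // true while dispatch keeps being refused
	refusedSince int64 // unix seconds when the current streak began
	triggered    bool  // true once a reconcile has been triggered
	lastTrigger  int64 // unix seconds of the most recent trigger
}

// stuckDetector is a hypothetical sketch of the stuck-dispatch safety net.
type stuckDetector struct {
	thresholdS  int64 // assumed to mirror REDIS_DISPATCH_STUCK_THRESHOLD_S
	mu          sync.Mutex
	jobs        map[string]*jobState
	reconcileMu sync.Mutex // held while a reconcile is in flight
}

func newStuckDetector(thresholdS int64) *stuckDetector {
	return &stuckDetector{thresholdS: thresholdS, jobs: map[string]*jobState{}}
}

// observe records one dispatch attempt and reports whether a reconcile
// should be triggered for the job at time nowS.
func (d *stuckDetector) observe(jobID string, refused, hasDue bool, nowS int64) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	st := d.jobs[jobID]
	if st == nil {
		st = &jobState{}
		d.jobs[jobID] = st
	}
	if !refused || !hasDue {
		st.refusing = false // streak broken: dispatched, or nothing due
		return false
	}
	if !st.refusing {
		st.refusing, st.refusedSince = true, nowS
		return false
	}
	if nowS-st.refusedSince <= d.thresholdS {
		return false // not stuck for longer than the threshold yet
	}
	if st.triggered && nowS-st.lastTrigger < 2*d.thresholdS {
		return false // rate limit: one trigger per 2x threshold per job
	}
	st.triggered, st.lastTrigger = true, nowS
	return true
}

// triggerReconcile runs fn unless another reconcile is already in flight.
func (d *stuckDetector) triggerReconcile(fn func()) bool {
	if !d.reconcileMu.TryLock() {
		return false
	}
	defer d.reconcileMu.Unlock()
	fn()
	return true
}
```

`sync.Mutex.TryLock` requires Go 1.18 or later. Because a genuinely at-capacity job also looks "refused", the 2× threshold rate limit is what keeps it from driving a reconcile burst in this sketch.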

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 33-36: The changelog line claiming the rename to `Hover/1.0`
bypasses naive substring filters for all listed sites overstates the cause;
update the sentence that begins "Crawler user-agent renamed from `HoverBot/1.0`
to `Hover/1.0`..." to separate the cases: state that Amazon (or other examples
that actually match) used substring-based blocking and was affected by the "bot"
substring, while target.com.au and woolworths.com.au continue to return 403 for
non-browser UAs generally; preserve the contact URL and the note about
robots.txt parser matching under `User-agent: Hover` and reference `#365`.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a0bfc02-387a-4113-b31f-5cef609be126

📥 Commits

Reviewing files that changed from the base of the PR and between 498c8d7 and 6c83514.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • internal/crawler/config.go
  • internal/crawler/robots.go
  • internal/crawler/robots_test.go

Comment thread CHANGELOG.md Outdated
codecov Bot commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@github-actions

🐝 Review App Deployed

Homepage: https://hover-pr-366.fly.dev
Dashboard: https://hover-pr-366.fly.dev/dashboard

@simonsmallchua simonsmallchua merged commit 84e833a into main Apr 28, 2026
24 of 27 checks passed
@simonsmallchua simonsmallchua deleted the crawler-ua-rename branch April 28, 2026 10:53
@coderabbitai coderabbitai Bot mentioned this pull request May 11, 2026
5 tasks