Rename crawler UA to drop 'bot' suffix #366
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 Walkthrough
The crawler user-agent identifier was renamed from
Changes
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Updates to Preview Branch (crawler-ua-rename) ↗︎
Tasks are run on every commit but only new migration files are pushed.
View logs for this Workflow Run ↗︎.
Release Versions
App patch:
Changelog
Changed
Fixed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 33-36: The changelog line claiming the rename to `Hover/1.0`
bypasses naive substring filters for all listed sites overstates the cause;
update the sentence that begins "Crawler user-agent renamed from `HoverBot/1.0`
to `Hover/1.0`..." to separate the cases: state that Amazon (or other examples
that actually match) used substring-based blocking and was affected by the "bot"
substring, while target.com.au and woolworths.com.au continue to return 403 for
non-browser UAs generally; preserve the contact URL and the note about
robots.txt parser matching under `User-agent: Hover` and reference `#365`.
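The distinction this comment draws can be sketched in Go. The two policy functions below are hypothetical stand-ins, not code from this repository or from any real WAF: `substringPolicy` models a naive filter that rejects any UA containing "bot", and `browserOnlyPolicy` is a deliberately crude model of a site that rejects every non-browser UA. Real bot managers rely on many more signals than the UA string.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// The two UA strings compared in this PR.
const (
	oldUA = "HoverBot/1.0 (+https://www.goodnative.co/hover)"
	newUA = "Hover/1.0 (+https://www.goodnative.co/hover)"
)

// policy maps a User-Agent string to the HTTP status a site would return.
type policy func(userAgent string) int

// substringPolicy (hypothetical) blocks any UA containing "bot", case-insensitively.
func substringPolicy(userAgent string) int {
	if strings.Contains(strings.ToLower(userAgent), "bot") {
		return http.StatusForbidden
	}
	return http.StatusOK
}

// browserOnlyPolicy (hypothetical, simplified) allows only UAs that carry a
// browser product token and rejects everything else, whatever its name.
func browserOnlyPolicy(userAgent string) int {
	for _, token := range []string{"Chrome/", "Firefox/", "Safari/"} {
		if strings.Contains(userAgent, token) {
			return http.StatusOK
		}
	}
	return http.StatusForbidden
}

// flips reports whether the rename turns a 403 into a 200 under policy p.
func flips(p policy, before, after string) bool {
	return p(before) == http.StatusForbidden && p(after) == http.StatusOK
}

func main() {
	fmt.Println("substring filter flips:", flips(substringPolicy, oldUA, newUA))
	fmt.Println("browser-only policy flips:", flips(browserOnlyPolicy, oldUA, newUA))
}
```

Under these assumptions only the substring filter changes its answer after the rename, which is why the changelog should not attribute all three sites' behaviour to the "bot" substring.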
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3a0bfc02-387a-4113-b31f-5cef609be126
📒 Files selected for processing (4)
- CHANGELOG.md
- internal/crawler/config.go
- internal/crawler/robots.go
- internal/crawler/robots_test.go
Codecov Report
✅ All modified and coverable lines are covered by tests.
🐝 Review App Deployed
Homepage: https://hover-pr-366.fly.dev
Summary
- Rename the crawler UA from `HoverBot/1.0 (+https://www.goodnative.co/hover)` to `Hover/1.0 (+https://www.goodnative.co/hover)` to bypass naive substring filters that block the word "bot" (Amazon today; the rest of row 3 of issue #365, "Crawler reliability action plan (from #319 investigation)", going forward).
- Per-bot robots.txt rules now match under the new name (`User-agent: Hover` instead of `User-agent: HoverBot`). The bot-name extraction in `internal/crawler/robots.go` already derives the name from the UA string, so the rename automatically routes per-bot rules to `hover`.

Probe results (`HoverBot/1.0` vs `Hover/1.0`)
Amazon flips as expected. The two AU retailers do not flip: both respond with `Server: AkamaiGHost` and `akaalb_*` ALB cookies, and return 403 for any non-browser UA, including the Googlebot string. That is full Akamai Bot Manager, not a "bot" substring filter, so they belong to row 1 of #365 (WAF detection / pre-flight) plus row 3(b) (customer-side allowlist outreach), not row 3(a) (this PR). Detail in the #365 comment.

Test plan
- `go test ./internal/crawler/...` passes locally (existing robots tests updated, all green).
- Ran `gofmt -w` and `goimports -w` on all touched files; pre-commit format/lint/security gates passed.

Summary by CodeRabbit
Changed
Documentation
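The PR summary says the bot-name extraction derives the robots.txt name from the UA string, so the rename routes per-bot rules to `hover`. A minimal sketch of that idea follows; `botToken` is a hypothetical helper written for illustration, not the actual function in `internal/crawler/robots.go`, and it assumes the name is the lowercased product token before the first `/` or space.

```go
package main

import (
	"fmt"
	"strings"
)

// botToken (hypothetical) returns the lowercased product token of a
// User-Agent string, i.e. everything before the first '/' or space.
// The real extraction in internal/crawler/robots.go may differ.
func botToken(userAgent string) string {
	product := strings.TrimSpace(userAgent)
	if i := strings.IndexAny(product, "/ "); i >= 0 {
		product = product[:i]
	}
	return strings.ToLower(product)
}

func main() {
	for _, ua := range []string{
		"HoverBot/1.0 (+https://www.goodnative.co/hover)",
		"Hover/1.0 (+https://www.goodnative.co/hover)",
	} {
		// The token is what a robots.txt "User-agent:" group would be matched against.
		fmt.Printf("%s -> %s\n", ua, botToken(ua))
	}
}
```

With this assumption the old UA yields `hoverbot` and the new one yields `hover`, so no separate constant for the robots.txt name needs to change when the UA string does.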