Skip to content

Drop Worker-id suffix from crawler UA #384

Merged
simonsmallchua merged 3 commits into main from work/exciting-bartik-1e43af on May 11, 2026

Conversation

@simonsmallchua
Contributor

@simonsmallchua simonsmallchua commented May 11, 2026

Summary

  • crawler.New(config, id...) had a dead branch that appended Worker-<id> to the configured UserAgent. No production caller passes an ID — all three call sites (cmd/worker/main.go, cmd/app/main.go, internal/jobs/manager.go) use crawler.New(crawlerConfig), so the suffix has never been emitted.
  • Removed the suffix branch plus the now-unused id ...string variadic, crawlerID local, and the never-read id field on the Crawler struct.
  • The signature change is non-breaking for this repo (no caller passed an ID); behaviour is unchanged in production.
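To make the removed branch concrete, here is a minimal before/after sketch of the two constructor shapes. It is not the real internal/crawler code: Config, Crawler, newWithID, and the space-separated Worker-<id> format are hypothetical, and the Colly collector (configured via colly.UserAgent(config.UserAgent) after this PR) is reduced to a stored string.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for the crawler config; only the
// UserAgent field named in the PR is modelled.
type Config struct {
	UserAgent string
}

// Crawler is a pared-down, hypothetical version of the real struct. The
// real one wraps a Colly collector; here we only keep the UA it would send.
type Crawler struct {
	userAgent string
}

// newWithID approximates the removed behaviour: an optional variadic ID
// appended a Worker-<id> suffix. The " Worker-%s" format is an assumption.
func newWithID(config Config, id ...string) *Crawler {
	ua := config.UserAgent
	if len(id) > 0 && id[0] != "" {
		ua = fmt.Sprintf("%s Worker-%s", ua, id[0])
	}
	return &Crawler{userAgent: ua}
}

// New mirrors the post-PR shape: config only, User-Agent used verbatim.
func New(config Config) *Crawler {
	return &Crawler{userAgent: config.UserAgent}
}

func main() {
	cfg := Config{UserAgent: "HoverBot/1.0 (+https://goodnative.co)"}

	// Every production call site passes only the config, so both
	// constructors yield the configured string unchanged.
	fmt.Println(New(cfg).userAgent)
	fmt.Println(newWithID(cfg).userAgent)

	// The foot-gun: one extra argument silently changes the published UA.
	fmt.Println(newWithID(cfg, "3").userAgent)
}
```

Run as-is, the first two lines print HoverBot/1.0 (+https://goodnative.co) and the third prints the same string followed by Worker-3, which is the silent mutation that dropping the variadic parameter rules out at compile time.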

Why

docs/research/2025-10/crawling-best-practice/issue-6-user-agent-rotation.md already notes the Worker-N suffix isn't used. Leaving the branch in place was a foot-gun — any future caller passing an ID would silently mutate the UA we publish on /bot and rely on for robots.txt identification.

Test plan

  • gofmt -w internal/crawler/crawler.go
  • go build ./...
  • go test ./internal/crawler/... (passes, 15.3s)
  • Spot-check /bot page renders HoverBot/1.0 (+https://goodnative.co) (no behavioural change expected — production UA was already this exact string)
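The manual /bot spot-check confirms the published string; the header actually sent on requests could also be pinned automatically. A minimal sketch, assuming the wire-level header is what matters: it uses the standard library's net/http and net/http/httptest as a stand-in for the Colly collector, and fetchWithUA and observedUA are hypothetical helpers, not functions in internal/crawler.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchWithUA is a hypothetical stand-in for the Colly collector: it sends
// one GET request with the given User-Agent header and drains the body.
func fetchWithUA(url, ua string) error {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("User-Agent", ua)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

// observedUA starts a throwaway local server, makes one request with ua,
// and returns the User-Agent header the server actually received.
func observedUA(ua string) (string, error) {
	// Buffered so the handler never blocks; the channel also gives a
	// clean happens-before edge between handler and caller.
	seen := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.UserAgent()
	}))
	defer srv.Close()

	if err := fetchWithUA(srv.URL, ua); err != nil {
		return "", err
	}
	return <-seen, nil
}

func main() {
	const configured = "HoverBot/1.0 (+https://goodnative.co)"
	got, err := observedUA(configured)
	if err != nil {
		panic(err)
	}
	// Expect the header on the wire to equal the configured string exactly.
	fmt.Println(got == configured, got)
}
```

In a real suite this would sit in a _test.go file using testing.T rather than main; pointing the actual crawler.New collector at the httptest server would turn it into a true regression test for the Colly configuration.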

Summary by CodeRabbit

  • Refactor

    • Simplified crawler initialization: per-crawler identifiers removed and worker-specific User-Agent suffixes dropped.
    • Crawler now consistently uses the User-Agent defined in configuration, ensuring uniform request headers across workers.
  • Documentation

    • Changelog updated to reflect the User-Agent behavior change and removal of the prior per-worker ID behavior.

@coderabbitai
Contributor

coderabbitai Bot commented May 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: abf0515c-1c74-4b7a-aa5e-055a15264bf2

📥 Commits

Reviewing files that changed from the base of the PR and between 9c4d551 and c4cb283.

📒 Files selected for processing (1)
  • CHANGELOG.md
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md

📝 Walkthrough

The PR removes the per-crawler id field and optional id parameter from the New constructor. The Crawler now uses config.UserAgent directly instead of generating worker-specific User-Agent strings with a Worker-<id> suffix.

Changes

Crawler ID and User-Agent Simplification

| Layer | File(s) | Summary |
|---|---|---|
| Struct Definition | internal/crawler/crawler.go | Removed id string field from the exported Crawler struct. |
| Constructor API and User-Agent Handling | internal/crawler/crawler.go | Updated New signature to accept only config (removed variadic id parameter); Colly collector now uses colly.UserAgent(config.UserAgent) instead of generating worker-suffixed User-Agent strings. |
| Struct Initialization | internal/crawler/crawler.go | Removed id field assignment from the &Crawler{...} struct literal in the constructor. |
| Changelog | CHANGELOG.md | Added a ### Changed note documenting that the crawler user-agent is exactly config.UserAgent and the dead Worker-<id> branch was removed. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • Good-Native/hover#366: Modifies the crawler's User-Agent handling in internal/crawler by updating the default config.UserAgent value, directly related to the User-Agent simplification in this PR.

Poem

🐰 I hopped through code with nimble feet,
I pruned the id—no more repeat,
The UserAgent now is plain and true,
Config speaks once, no Worker crew,
A tidy hop — the build says "woo!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Drop Worker-id suffix from crawler UA' directly and accurately summarizes the main change: removing the Worker- suffix from the crawler's User-Agent. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@supabase

supabase Bot commented May 11, 2026

Updates to Preview Branch (work/exciting-bartik-1e43af)

| Deployments | Status | Updated |
|---|---|---|
| Database | | Mon, 11 May 2026 22:46:26 UTC |
| Services | | Mon, 11 May 2026 22:46:26 UTC |
| APIs | | Mon, 11 May 2026 22:46:26 UTC |

Tasks are run on every commit but only new migration files are pushed.
Close and reopen this PR if you want to apply changes from existing seed or migration files.

| Tasks | Status | Updated |
|---|---|---|
| Configurations | | Mon, 11 May 2026 22:46:28 UTC |
| Migrations | | Mon, 11 May 2026 22:46:30 UTC |
| Seeding | | Mon, 11 May 2026 22:46:31 UTC |
| Edge Functions | | Mon, 11 May 2026 22:46:31 UTC |

@codecov

codecov Bot commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@github-actions
Contributor

github-actions Bot commented May 11, 2026

Release Versions

App patch: v0.34.10 → v0.34.11

Changelog

Changed

  • Crawler user agent is now always exactly config.UserAgent. Dropped the dead
    Worker-<id> suffix branch in crawler.New along with the unused variadic ID
    parameter and struct field.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Line 34: The inline code span in CHANGELOG.md currently contains a leading
space inside the backticks (` Worker-<id>`) which triggers markdownlint MD038;
edit the line so the backticks enclose the exact token Worker-<id> (i.e., change
`` ` Worker-<id>` `` to `` `Worker-<id>` ``) and if you intended spacing for the
surrounding text, move the extra space outside the backticks; this touches the
text mentioning `Worker-<id>` and the surrounding phrase referencing
`crawler.New`.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3162d1f2-8099-4308-8f6d-5e0bc35a948b

📥 Commits

Reviewing files that changed from the base of the PR and between 528352e and 9c4d551.

📒 Files selected for processing (1)
  • CHANGELOG.md

Comment thread: CHANGELOG.md (outdated)
@github-actions
Contributor

🐝 Review App Deployed

Homepage: https://hover-pr-384.fly.dev
Dashboard: https://hover-pr-384.fly.dev/dashboard

@simonsmallchua simonsmallchua merged commit b5bacde into main May 11, 2026
21 checks passed
@simonsmallchua simonsmallchua deleted the work/exciting-bartik-1e43af branch May 11, 2026 23:06
simonsmallchua added a commit that referenced this pull request May 11, 2026

Labels

None yet

Projects

None yet

1 participant