feat(security): scraper honeypot route on /hs/code namespace #38
Merged
New page at /hs/code/00000000 returns a 200 with a clearly-marked
placeholder ("RESERVED FOR TESTING — NOT A REAL COMMODITY CODE",
duty 0.00%); robots: noindex,nofollow keeps honest crawlers out of
the index.
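
A minimal sketch of the noindex,nofollow setting described above, assuming the page uses the Next.js App Router `metadata` export. The local `RobotsDirective` type and the `honeypotRobots` and `robotsContent` names are hypothetical stand-ins so the snippet runs without the framework. In the real page the object would presumably sit under `metadata.robots`.

```typescript
// Local stand-in for the `robots` field of a Next.js metadata object
// (hypothetical type name; the real page would use the framework's own type).
type RobotsDirective = { index: boolean; follow: boolean };

// Both flags off: honest crawlers neither index the page nor follow its links.
const honeypotRobots: RobotsDirective = { index: false, follow: false };

// Helper (hypothetical) that spells out the directive string quoted in the PR.
function robotsContent(r: RobotsDirective): string {
  const index = r.index ? "index" : "noindex";
  const follow = r.follow ? "follow" : "nofollow";
  return `${index},${follow}`;
}

console.log(robotsContent(honeypotRobots));
```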
A hidden bait link is seeded on /hs (position:absolute; left:-9999px;
tabIndex=-1; aria-hidden; rel="nofollow") so DOM-parsing scrapers
follow it but visual users, keyboard users, screen readers, and
honest crawlers stay out of the trap.
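
The bait-link attributes listed above could be collected into a props object along these lines. This is an illustrative sketch, not the code from `app/hs/page.tsx`: the `BaitLinkProps` type and `baitLinkProps` function are hypothetical names, and only the attribute values come from the PR text.

```typescript
// Props for a hidden anchor; in a TSX page they could be spread onto <a {...props} />.
type BaitLinkProps = {
  href: string;
  rel: string;
  tabIndex: number;
  "aria-hidden": boolean;
  style: { position: "absolute"; left: string };
};

function baitLinkProps(href: string): BaitLinkProps {
  return {
    href,
    rel: "nofollow",       // honest crawlers are asked not to follow the link
    tabIndex: -1,          // removed from the keyboard tab order
    "aria-hidden": true,   // hidden from screen readers
    style: { position: "absolute", left: "-9999px" }, // off-screen for visual users
  };
}

const props = baitLinkProps("/hs/code/00000000");
console.log(props.href, props.style.left);
```

The link stays in the DOM, so a scraper that parses raw HTML and follows every `href` still reaches the trap.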
middleware.ts emits the marker
[ScrapeGuard] HONEYPOT path=/hs/code/00000000 ip=… ua=…
before the existing HS rate-limit check, so a trap-trigger is logged
even when the IP is already over its 10/5min HS budget. No auto-block
this sprint — evidence collection first, auto-block in a follow-up
once false-positive rate is measured.
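
The ordering described above (log the honeypot hit first, then run the rate-limit check) can be sketched framework-free as follows. The `Req` and `Decision` types, the `handleHsRequest` function, the `overBudget` callback and the 429 status are assumptions for illustration. Only the marker format and the log-before-limit ordering come from the PR.

```typescript
const HONEYPOT_PATH = "/hs/code/00000000";

type Req = { path: string; ip: string; ua: string };
type Decision = { status: number; logs: string[] };

// Builds the marker line for a honeypot hit, or null for any other path.
function honeypotMarker(req: Req): string | null {
  if (req.path !== HONEYPOT_PATH) return null;
  return `[ScrapeGuard] HONEYPOT path=${req.path} ip=${req.ip} ua=${req.ua}`;
}

// `overBudget` stands in for the existing HS rate-limit check (hypothetical signature).
function handleHsRequest(req: Req, overBudget: (ip: string) => boolean): Decision {
  const logs: string[] = [];
  const marker = honeypotMarker(req);
  if (marker !== null) logs.push(marker); // evidence only, no auto-block
  // The rate limit runs second, so an over-budget IP is still logged above.
  if (overBudget(req.ip)) return { status: 429, logs }; // 429 is an assumed status
  return { status: 200, logs };
}
```

Because the marker is pushed before the budget check, a trap hit from an already-throttled IP still leaves a log line.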
app/sitemap.ts now filters /hs/code/00000000 out of hsSubheadingRoutes
as defence-in-depth (the real WCO HS 2022 dataset uses 6-digit codes,
so the literal `00000000` could not appear anyway).
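
A sketch of the sitemap filter, assuming `hsSubheadingRoutes` is an array of URL strings. The PR does not show its actual shape, and `excludeHoneypot` is a hypothetical helper name.

```typescript
const HONEYPOT_ROUTE = "/hs/code/00000000";

// Drops the honeypot URL from a list of route URLs (assumed to be plain strings).
function excludeHoneypot(routes: string[]): string[] {
  return routes.filter((url) => !url.endsWith(HONEYPOT_ROUTE));
}

// Example route list; the 6-digit code is an arbitrary illustrative value.
const hsSubheadingRoutes = [
  "https://www.freightutils.com/hs/code/010121",
  "https://www.freightutils.com/hs/code/00000000",
];
console.log(excludeHoneypot(hsSubheadingRoutes).length);
```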
FAULT 5: CHANGELOG.md + lib/changelog-data.ts entries added for
2026-05-16. No new API, no nav/sitemap inclusion, no MCP surface
touched — honeypot is deliberately scoped to the website surface.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SoapyRED added a commit that referenced this pull request on May 16, 2026
Bumps Last-updated 9 May → 16 May. Captures the 17 PRs landed across 2026-05-13..2026-05-16 (PR #25 through PR #41) plus the 14 May infra changes that didn't have their own PR (Cloudflare disconnect, Upstash PAYG, IndexNow live).

Sections refreshed:
- Sprint cadence 13–16 May (new): full PR list with one-liner per PR.
- Platform: MCP v2.1.0 → v2.1.1; route count 36 → 38.
- Infrastructure changes (new): CF Workers disconnected 14 May, CF DNS-only / Vercel firewall is sole edge security, Upstash PAYG $20 cap, CLAUDE.md at root encodes FAULT 5 + FAULT 14, IndexNow workflow live.
- Data integrity status (new): table for ULD / Airlines / ADR / Containers / UN-LOCODE / HS / Vehicles / Customs-duty. ULD + Airlines + ADR verified: true; the other 5 verified: false pending allowlist extension (specific domains enumerated).
- Scraper defence status (new): PR #31 / #32 / #33 / #38 live, Phases 3+4 deferred to runbook, Phase 2 skipped.
- Edge firewall: scoped to Vercel-only (CF inert now).
- Distribution surfaces: table with current download counts, Smithery score, MCP Registry STALE flag, Glama description STALE flag.
- Weekly digest CLI (new): six FAULT 14 invariants summarised; points at scripts/weekly-digest/README.md for the full spec.
- Vercel Analytics: 30-day baseline updated (3,311 visitors / 6,070 PV / 69% bounce / SG 73%).
- First validated user signals: Tom (CEVA) preserved + Simon's team organic adoption added per 16 May report.
- What's blocked / What's next / Red flags: updated to reflect today's reality: vehicles+customs SHIPPED (#39 #40), weekly digest SHIPPED (#41), Make.com Town Hall 21 May 4PM BST queued, CEVA→WFS transition complete with week 2 of induction pending.
- Canonical references: added pointers to scripts/weekly-digest/ and the IndexNow workflow.

No CHANGELOG entry: internal doc, not user-visible. Per the prompt.

Co-authored-by: SoapyRED <soapyred@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Scraper-detection honeypot scoped to the `/hs/code` namespace.
- `app/hs/code/00000000/page.tsx` returns a 200 with a clearly-marked placeholder ("RESERVED FOR TESTING — NOT A REAL COMMODITY CODE", duty 0.00%). `robots: noindex,nofollow` keeps honest crawlers out of the index.
- `app/hs/page.tsx` seeds a hidden bait link (`position:absolute; left:-9999px; tabIndex=-1; aria-hidden; rel="nofollow"`): visible to DOM-parsing scrapers, not to visual users, keyboard users, screen readers, or honest crawlers.
- `middleware.ts` emits `[ScrapeGuard] HONEYPOT path=/hs/code/00000000 ip=… ua=…` before the existing 10/5min HS rate-limit check, so a trap-trigger is logged even if the IP is already over budget.
- `app/sitemap.ts` filters `/hs/code/00000000` out of `hsSubheadingRoutes` as defence-in-depth (real WCO HS 2022 codes are 6 digits, so `00000000` could not appear anyway).

FAULT 5 checklist applied
- `lib/changelog-data.ts` entry added so `/changelog` renders it
- `generateMetadata()` / static `metadata` on the new page (noindex,nofollow; excluded from sitemap and `lint:seo-titles` scope)

Test plan
- `curl -s {preview}/hs/code/00000000 | grep -E "(RESERVED|noindex)"`: confirm 200 + fake entry + noindex meta
- `curl -s {preview}/sitemap.xml | grep "00000000"`: confirm zero matches
- `curl -s {preview}/hs | grep "00000000"`: confirm bait link is in the DOM
- `node scripts/smoke-test.mjs {preview}`: full smoke test PASS
- https://www.freightutils.com

🤖 Generated with Claude Code
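
The three curl checks in the test plan could also be expressed as a pure function over already-fetched response bodies, roughly as below. This is a sketch under assumptions: `PreviewBodies` and `testPlanFailures` are hypothetical names, fetching is left to the caller, and the first check is slightly stricter than the `grep -E` alternation because it requires both strings.

```typescript
// Response bodies fetched from the preview deployment by the caller.
type PreviewBodies = { honeypot: string; sitemap: string; hsIndex: string };

// Returns a list of human-readable failures; an empty list means all checks pass.
function testPlanFailures(b: PreviewBodies): string[] {
  const failures: string[] = [];
  if (!b.honeypot.includes("RESERVED")) failures.push("honeypot page: placeholder text missing");
  if (!b.honeypot.includes("noindex")) failures.push("honeypot page: noindex missing");
  if (b.sitemap.includes("00000000")) failures.push("sitemap: honeypot URL present");
  if (!b.hsIndex.includes("00000000")) failures.push("/hs: bait link missing");
  return failures;
}
```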