feat(security): scraper honeypot route on /hs/code namespace #38

Merged: SoapyRED merged 1 commit into main from feat/scraper-honeypot on May 16, 2026

@SoapyRED
Owner

Summary

Scraper-detection honeypot scoped to the /hs/code namespace.

  • New static page at app/hs/code/00000000/page.tsx returns a 200 with a clearly-marked placeholder ("RESERVED FOR TESTING — NOT A REAL COMMODITY CODE", duty 0.00%). robots: noindex,nofollow keeps honest crawlers out of the index.
  • app/hs/page.tsx seeds a hidden bait link (position:absolute; left:-9999px; tabIndex=-1; aria-hidden; rel="nofollow") — visible to DOM-parsing scrapers, not to visual users / keyboard users / screen readers / honest crawlers.
  • middleware.ts emits [ScrapeGuard] HONEYPOT path=/hs/code/00000000 ip=… ua=… before the existing 10/5min HS rate-limit check, so a trap-trigger is logged even if the IP is already over budget.
  • app/sitemap.ts filters /hs/code/00000000 out of hsSubheadingRoutes as defence-in-depth (real WCO HS 2022 codes are 6 digits — 00000000 could not appear anyway).
  • No auto-block this sprint. Evidence collection first, auto-block in a follow-up sprint once we have false-positive-rate data.
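
The ordering described for middleware.ts (log the trap hit first, then run the rate-limit check) can be sketched as below. This is a minimal illustration, not the code in this PR: `HsRequest`, `guardHsRequest`, `isOverBudget`, and `log` are hypothetical names, and the rate limiter is stubbed as a callback.

```typescript
// Path and log format come from the PR description; every other name is hypothetical.
const HONEYPOT_PATH = "/hs/code/00000000";

interface HsRequest {
  path: string;
  ip: string;
  ua: string;
}

type Decision = "allow" | "rate_limited";

// Builds the marker line in the format quoted in the summary.
function honeypotLogLine(req: HsRequest): string {
  return `[ScrapeGuard] HONEYPOT path=${req.path} ip=${req.ip} ua=${req.ua}`;
}

// isOverBudget stands in for the existing 10/5min HS rate-limit check.
function guardHsRequest(
  req: HsRequest,
  isOverBudget: (ip: string) => boolean,
  log: (line: string) => void,
): Decision {
  // Log before the budget check so a throttled IP still leaves evidence.
  if (req.path === HONEYPOT_PATH) {
    log(honeypotLogLine(req));
  }
  // No auto-block: the honeypot hit itself never changes the decision.
  return isOverBudget(req.ip) ? "rate_limited" : "allow";
}
```

Because the log call sits above the budget check, a request that ends up rate-limited still produces the marker line.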

FAULT 5 checklist applied

  • CHANGELOG.md entry added (2026-05-16, Security tag)
  • lib/changelog-data.ts entry added so /changelog renders it
  • generateMetadata() / static metadata on the new page (noindex,nofollow — excluded from sitemap and lint:seo-titles scope)
  • No new API surface, no nav/footer/MCP/Postman/README/npm changes — scope is the website surface only
  • Honeypot path NOT in sitemap (defensive filter + dataset already excludes 8-digit codes)
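
For the static-metadata item, a page-level metadata object might look roughly like the sketch below. The `robots: { index, follow }` shape follows the Next.js Metadata convention; the title string and the `robotsContent` helper are hypothetical and only illustrate how the two flags map to the `noindex` / `nofollow` directives. In a real page the object would typically be exported and typed with `Metadata` from `next`, which is omitted here to keep the sketch self-contained.

```typescript
// Hypothetical title; the robots flags mirror the noindex,nofollow setting in the PR.
const metadata = {
  title: "Reserved test code",
  robots: { index: false, follow: false },
};

// Illustrative helper: turns the two flags into a robots directive string.
function robotsContent(robots: { index: boolean; follow: boolean }): string {
  const indexPart = robots.index ? "index" : "noindex";
  const followPart = robots.follow ? "follow" : "nofollow";
  return `${indexPart}, ${followPart}`;
}
```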

Test plan

  • Preview deploy READY (auto)
  • curl -s {preview}/hs/code/00000000 | grep -E "(RESERVED|noindex)" — confirm 200 + fake entry + noindex meta
  • curl -s {preview}/sitemap.xml | grep "00000000" — confirm zero matches
  • curl -s {preview}/hs | grep "00000000" — confirm bait link is in the DOM
  • node scripts/smoke-test.mjs {preview} — full smoke test PASS
  • Post-merge: same checks against https://www.freightutils.com
  • Post-merge: Sentry quiet for 10 min after deploy
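
The three curl checks can also be expressed as one function over the fetched response bodies, which is easier to rerun after merge. This is a sketch under assumptions: `PreviewBodies` and `checkHoneypotDeploy` are hypothetical names, and fetching the bodies is left to the caller. It is slightly stricter than the `grep -E "(RESERVED|noindex)"` line, which matches if either string is present. Note that grepping a body does not by itself prove the status code, so the 200 needs a separate check.

```typescript
// Hypothetical container for the three response bodies named in the test plan.
interface PreviewBodies {
  honeypotHtml: string; // body of {preview}/hs/code/00000000
  sitemapXml: string;   // body of {preview}/sitemap.xml
  hsIndexHtml: string;  // body of {preview}/hs
}

const HONEYPOT_CODE = "00000000";

// Returns a list of human-readable failures; an empty list means all checks passed.
function checkHoneypotDeploy(bodies: PreviewBodies): string[] {
  const failures: string[] = [];
  if (bodies.honeypotHtml.indexOf("RESERVED") === -1) {
    failures.push("honeypot page is missing the RESERVED placeholder");
  }
  if (bodies.honeypotHtml.indexOf("noindex") === -1) {
    failures.push("honeypot page is missing the noindex directive");
  }
  if (bodies.sitemapXml.indexOf(HONEYPOT_CODE) !== -1) {
    failures.push("sitemap must not list the honeypot path");
  }
  if (bodies.hsIndexHtml.indexOf(HONEYPOT_CODE) === -1) {
    failures.push("/hs is missing the bait link");
  }
  return failures;
}
```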

🤖 Generated with Claude Code

New page at /hs/code/00000000 returns a 200 with a clearly-marked
placeholder ("RESERVED FOR TESTING — NOT A REAL COMMODITY CODE",
duty 0.00%); robots: noindex,nofollow keeps honest crawlers out of
the index.

A hidden bait link is seeded on /hs (position:absolute; left:-9999px;
tabIndex=-1; aria-hidden; rel="nofollow") so DOM-parsing scrapers
follow it but visual users, keyboard users, screen readers, and
honest crawlers stay out of the trap.
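
As an illustration of the attribute set listed above, the bait link could be rendered like this. It is a sketch, not the markup in app/hs/page.tsx: `renderBaitLink` and the link text are hypothetical, and the attributes are written in their HTML spelling (`tabindex`) rather than the React prop spelling (`tabIndex`).

```typescript
// Attribute values mirror the commit message; the rendering helper is hypothetical.
const baitLinkAttrs = {
  href: "/hs/code/00000000",
  rel: "nofollow",          // tells honest crawlers not to follow
  tabindex: "-1",           // keeps the link out of the keyboard tab order
  "aria-hidden": "true",    // hides it from screen readers
  style: "position:absolute;left:-9999px", // moves it off-screen for visual users
};

// Serialises the attributes in insertion order into a single anchor tag.
function renderBaitLink(attrs: Record<string, string>, text: string): string {
  const attrText = Object.keys(attrs)
    .map((name) => `${name}="${attrs[name]}"`)
    .join(" ");
  return `<a ${attrText}>${text}</a>`;
}
```

The link stays in the DOM, so a scraper that extracts every `href` without honouring `rel` or rendering layout will still request it.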

middleware.ts emits the marker

  [ScrapeGuard] HONEYPOT path=/hs/code/00000000 ip=… ua=…

before the existing HS rate-limit check, so a trap-trigger is logged
even when the IP is already over its 10/5min HS budget. No auto-block
this sprint — evidence collection first, auto-block in a follow-up
once false-positive rate is measured.

app/sitemap.ts now filters /hs/code/00000000 out of hsSubheadingRoutes
as defence-in-depth (the real WCO HS 2022 dataset uses 6-digit codes,
so the literal `00000000` could not appear anyway).
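
The defensive filter can be sketched as follows. The entry shape (`{ url }`), `isTrapUrl`, and `withoutHoneypot` are assumptions for illustration; the actual structure of hsSubheadingRoutes is not shown in this PR.

```typescript
// Path from the commit message; the entry shape and helper names are hypothetical.
const TRAP_PATH = "/hs/code/00000000";

interface SitemapEntry {
  url: string;
}

// True when the URL ends with the honeypot path.
function isTrapUrl(url: string): boolean {
  return url.slice(-TRAP_PATH.length) === TRAP_PATH;
}

// Drops the honeypot entry if it ever appears; otherwise returns the routes unchanged.
function withoutHoneypot(routes: SitemapEntry[]): SitemapEntry[] {
  return routes.filter((route) => !isTrapUrl(route.url));
}
```

With a 6-digit dataset the filter is a no-op today; it only matters if an 8-digit code is ever introduced.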

FAULT 5: CHANGELOG.md + lib/changelog-data.ts entries added for
2026-05-16. No new API, no nav/sitemap inclusion, no MCP surface
touched — honeypot is deliberately scoped to the website surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot commented May 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| freighttools | Ready | Preview, Comment | May 16, 2026 6:56am |

SoapyRED merged commit 2375f78 into main on May 16, 2026
2 checks passed
SoapyRED added a commit that referenced this pull request May 16, 2026
Bumps Last-updated 9 May → 16 May. Captures the 17 PRs landed across
2026-05-13..2026-05-16 (PR #25 through PR #41) plus the 14 May infra
changes that didn't have their own PR (Cloudflare disconnect, Upstash
PAYG, IndexNow live).

Sections refreshed:
- Sprint cadence 13–16 May (new): full PR list with one-liner per PR.
- Platform: MCP v2.1.0 → v2.1.1; route count 36 → 38.
- Infrastructure changes (new): CF Workers disconnected 14 May, CF DNS-
  only / Vercel firewall is sole edge security, Upstash PAYG $20 cap,
  CLAUDE.md at root encodes FAULT 5 + FAULT 14, IndexNow workflow live.
- Data integrity status (new): table for ULD / Airlines / ADR / Containers
  / UN-LOCODE / HS / Vehicles / Customs-duty. ULD + Airlines + ADR
  verified: true; the other 5 verified: false pending allowlist
  extension (specific domains enumerated).
- Scraper defence status (new): PR #31 / #32 / #33 / #38 live, Phases
  3+4 deferred to runbook, Phase 2 skipped.
- Edge firewall: scoped to Vercel-only (CF inert now).
- Distribution surfaces: table with current download counts, Smithery
  score, MCP Registry STALE flag, Glama description STALE flag.
- Weekly digest CLI (new): six FAULT 14 invariants summarised; points
  at scripts/weekly-digest/README.md for the full spec.
- Vercel Analytics: 30-day baseline updated (3,311 visitors / 6,070
  PV / 69% bounce / SG 73%).
- First validated user signals: Tom (CEVA) preserved + Simon's team
  organic adoption added per 16 May report.
- What's blocked / What's next / Red flags: updated to reflect today's
  reality — vehicles+customs SHIPPED (#39 #40), weekly digest SHIPPED
  (#41), Make.com Town Hall 21 May 4PM BST queued, CEVA→WFS transition
  complete with week 2 of induction pending.
- Canonical references: added pointers to scripts/weekly-digest/ and
  the IndexNow workflow.

No CHANGELOG entry, per the prompt: this is an internal doc, not user-visible.

Co-authored-by: SoapyRED <soapyred@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>