Remove ^Amazon CloudFront crawler signature #602
Merged
CloudFront's dominant traffic pattern at the origin is reverse-proxy fetches on behalf of real end users (cache miss / origin shield), not crawler activity. Classifying it as a crawler causes silent analytics/auth/rate-limit bugs for any site hosted behind CloudFront.

The signature has flip-flopped (added 2020, removed 2020, re-added 2023 with anecdotal "getting tons of this lately" justification; see #392, #410, #504). The 2020 removal had clear technical reasoning that still applies. Removing again, with a regression guard in devices.txt to lock in the decision.

`^Amazon Simple Notification Service Agent$` and `^Amazon-Route53-Health-Check-Service` remain in the list; those are unambiguously bots.

Fixes #594.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fixture line in devices.txt would be easy to delete by accident in a future change. Replacing it with a named test that explicitly documents the decision and the issue history, so that a future re-add PR fails with a clearly named assertion and prompts a deliberate choice instead of a silent revert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
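As a rough illustration of the kind of named guard this commit describes, the sketch below fails with an explanatory message if any crawler pattern matches the `Amazon CloudFront` UA. It is plain PHP rather than the project's actual PHPUnit test; `crawlerPatterns()` and `guardNotACrawler()` are hypothetical names, and the two-entry pattern list is a stand-in for the real list in `src/Fixtures/Crawlers.php`.

```php
<?php
// Hypothetical stand-in for the crawler list; the real one is in src/Fixtures/Crawlers.php.
function crawlerPatterns(): array
{
    return [
        '^Amazon Simple Notification Service Agent$',
        '^Amazon-Route53-Health-Check-Service',
    ];
}

// Throws if any pattern matches a UA that was deliberately declassified,
// so a re-add fails loudly with the reason attached.
function guardNotACrawler(string $userAgent, array $patterns, string $reason): void
{
    foreach ($patterns as $pattern) {
        if (preg_match('/' . $pattern . '/i', $userAgent) === 1) {
            throw new LogicException(
                "'$userAgent' matched crawler pattern '$pattern'. $reason"
            );
        }
    }
}

guardNotACrawler(
    'Amazon CloudFront',
    crawlerPatterns(),
    'CloudFront proxies real visitors; see #594 before re-adding.'
);
echo "guard passed\n";
```

Carrying the reason in the failure message is the point of the design: whoever re-adds the signature sees the issue reference at the moment the check fails, rather than having to dig through history.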
…front # Conflicts: # raw/Crawlers.json
Summary
- Remove `'^Amazon CloudFront'` from `src/Fixtures/Crawlers.php`
- Regenerate `raw/Crawlers.{txt,json}` via `php export.php`
- Remove `Amazon CloudFront` fixture from `tests/data/user_agent/crawlers.txt`
- Add `Amazon CloudFront` to `tests/data/user_agent/devices.txt` as a regression guard

Why
Fixes #594.
CloudFront is a reverse proxy / CDN. The `Amazon CloudFront` UA appears at the origin almost exclusively when CloudFront is fetching on behalf of a real end user (cache miss / origin shield / cache fill). For any site hosted behind CloudFront, which covers a large share of AWS deployments, classifying it as a crawler causes silent analytics, auth, and rate-limit bugs.

The signature has flip-flopped: it was added in 2020, removed in 2020, and re-added in 2023 (see #392, #410, #504).
The 2020 removal's reasoning still applies. The 2023 re-addition was anecdotal and didn't analyze whether the traffic was actually crawler-like vs the user's own CloudFront proxying real visitors.
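To make the misclassification concrete, here is a minimal sketch of how an anchored signature such as `^Amazon CloudFront` behaves once patterns are joined into a single alternation and matched against the User-Agent the origin receives. The helper `looksLikeCrawler()` and the reduced three-entry pattern list are illustrative assumptions, not the library's actual code.

```php
<?php
// Hypothetical, reduced pattern list; the real list lives in src/Fixtures/Crawlers.php.
$withCloudFront = [
    '^Amazon CloudFront',
    '^Amazon Simple Notification Service Agent$',
    '^Amazon-Route53-Health-Check-Service',
];

// The same list with the CloudFront signature removed.
$withoutCloudFront = array_values(array_diff($withCloudFront, ['^Amazon CloudFront']));

function looksLikeCrawler(string $userAgent, array $patterns): bool
{
    // Join all patterns into one case-insensitive alternation.
    $regex = '/(' . implode('|', $patterns) . ')/i';
    return preg_match($regex, $userAgent) === 1;
}

// The UA the origin sees when CloudFront fetches on a visitor's behalf.
$ua = 'Amazon CloudFront';

var_dump(looksLikeCrawler($ua, $withCloudFront));    // bool(true)
var_dump(looksLikeCrawler($ua, $withoutCloudFront)); // bool(false)
```

With the signature present, every request proxied under this UA is flagged, regardless of who the end user is; with it removed, the SNS and Route53 entries still match their own agents.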
Other CDNs (Fastly, Akamai, Bunny) aren't listed by their generic proxy UAs either, so this is also an inconsistency.
`Cloudflare-AlwaysOnline` stays: that's a specific stale-cache-serving feature, not a generic proxy UA.

The two unambiguously-bot Amazon entries remain:
- `^Amazon Simple Notification Service Agent$` (SNS HTTP deliveries)
- `^Amazon-Route53-Health-Check-Service` (Route53 health probes)

The regression guard in `devices.txt` will fail the test suite if anyone re-adds the signature in the future without first removing the guard, prompting a fresh design discussion rather than another silent flip.

Test plan
- `php export.php` regenerates `raw/Crawlers.{txt,json}` cleanly
- `vendor/bin/phpunit`: all 18 tests pass (2,202,834 assertions)
- No `CloudFront` references anywhere in `src/Fixtures/`, `raw/`, or `tests/data/user_agent/crawlers.txt`

🤖 Generated with Claude Code