fix(cdp): bypass bot filter for non-browser $lib events #60557
Merged
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
nodejs/src/cdp/templates/_transformations/bot-detection/bot-detection.template.test.ts:331-345
**Duplicate test scenario**
This test is functionally identical to the existing `'should filter out known bot user agent'` test on line 40 — same Googlebot UA, no `$lib`, same `DEFAULT_INPUTS`, same expectation of `execResult` being falsy. Both implicitly exercise the "missing `$lib` → `is_browser_traffic = true`" path. The conservative-fallback intent is already captured by the in-code comment on `is_browser_traffic`, so this test adds no new coverage and could be removed or, preferably, folded into the `it.each` above by adding an `undefined` (or absent) `$lib` case alongside `'web'` and `'js'`.
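A minimal sketch of the suggested fold follows. It is written as a plain table-driven check instead of a Jest `it.each`, so it runs on its own. `isBrowserTraffic` is a hypothetical stand-in for the template's `is_browser_traffic` condition (`$lib` in `{web, js}` or missing) and is not the actual template code.

```typescript
// Hypothetical stand-in for the template's `is_browser_traffic` condition:
// a `$lib` of 'web' or 'js', or no `$lib` at all, counts as browser traffic.
function isBrowserTraffic(lib: string | undefined): boolean {
    return lib === undefined || lib === 'web' || lib === 'js'
}

// One table covers both explicit browser values and the missing-`$lib`
// fallback, replacing a separate test that only exercised the missing case.
const browserLibCases: Array<string | undefined> = ['web', 'js', undefined]

for (const lib of browserLibCases) {
    // Each row would map to one `it.each` case expecting the bot-UA event to be dropped.
    if (!isBrowserTraffic(lib)) {
        throw new Error(`expected $lib=${String(lib)} to count as browser traffic`)
    }
}
```

In the real test file, the same three rows would presumably feed the existing `it.each`, with each row asserting that the Googlebot UA event is filtered out.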
Reviews (1): Last reviewed commit: "fix(cdp): narrow bot-filter bypass to cu..."
meikelmosby approved these changes on Jun 1, 2026.
Problem
The `Filter Bot Events` transformation (`template-bot-detection`) drops events from any source whose user agent matches PostHog's curated bot UA list (`nodejs/src/cdp/hog-transformations/bots/bots.ts`) or whose IP matches the curated datacenter list (`nodejs/assets/bot-ips.txt`). This silently kills server-side captures:
- `$ai_generation` and other LLM observability events from `posthog-python` and `posthog-node` (UAs `python-httpx` and `axios/*` both substring-match the curated bot UA list).
- `$identify` / custom events from a Python/Node backend.
- Webhook captures (`posthog-webhook`).
- Mobile SDK events (`posthog-ios`, `posthog-android`) whose HTTP-client UAs can substring-match entries on the curated list.

These all look like "bots" by UA, but the curated lists are tuned for browser traffic. A backend HTTP client is a server, not a crawler.
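The sketch below illustrates how substring matching flags backend HTTP clients. The list entries are hypothetical and do not come from the real `bots.ts`, and the case-insensitive comparison is an assumption made for the sketch. It only shows why a UA-only check cannot tell a server SDK apart from a crawler.

```typescript
// Hypothetical excerpt of a curated bot-UA list; the real entries live in
// nodejs/src/cdp/hog-transformations/bots/bots.ts and may differ.
const CURATED_BOT_UA_SUBSTRINGS: readonly string[] = ['googlebot', 'bingbot', 'python', 'axios']

// Substring match against the list (assumed case-insensitive for this sketch).
function matchesCuratedBotList(userAgent: string): boolean {
    const ua = userAgent.toLowerCase()
    return CURATED_BOT_UA_SUBSTRINGS.some((entry) => ua.includes(entry))
}

// Example UA strings: two backend HTTP clients and one desktop browser.
const examples: Record<string, string> = {
    pythonSdk: 'python-httpx/0.27.0',
    nodeSdk: 'axios/1.7.2',
    chrome: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
}

for (const [name, ua] of Object.entries(examples)) {
    console.log(`${name}: flagged as bot = ${matchesCuratedBotList(ua)}`)
}
```

With these assumed entries, both backend clients are flagged and the Chrome UA is not. Nothing in the UA string alone separates a legitimate server-side capture from crawler noise.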
Surfaced via a support ticket where a customer's `$ai_generation` events never reached LLM analytics, despite running the transformation in the default config (no custom patterns).
Changes
Gate the two curated-list calls (`isKnownBotUserAgent` and `isKnownBotIp`) on `is_browser_traffic`, defined as `$lib` in `{web, js}` or `$lib` missing. Customer-configured `customBotPatterns` and `customIpPrefixes` continue to apply to every event regardless of `$lib`, so the customer's explicit configuration is preserved.
No new inputs, no schema changes.
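Below is a minimal TypeScript sketch of the gating described above. The real template is written in Hog. `BotCheckEvent`, `BotCheckInputs`, `shouldDrop`, the stubbed list helpers, and the substring/prefix matching for the custom inputs are all hypothetical. Only the structure follows the PR description: custom patterns always apply, and the curated lists run only when `is_browser_traffic` is true.

```typescript
// Minimal event shape for this sketch (hypothetical).
interface BotCheckEvent {
    lib?: string // the event's `$lib` property, possibly absent
    userAgent: string
    ip: string
}

// Hypothetical inputs mirroring the template's customBotPatterns / customIpPrefixes.
interface BotCheckInputs {
    customBotPatterns: string[]
    customIpPrefixes: string[]
}

// Stubs standing in for the curated-list helpers; the real lists are much larger.
function isKnownBotUserAgent(ua: string): boolean {
    return ua.toLowerCase().includes('googlebot')
}
function isKnownBotIp(ip: string): boolean {
    return ip.startsWith('203.0.113.') // documentation range used as a fake datacenter block
}

// `$lib` in {web, js}, or missing, counts as browser traffic.
function isBrowserTraffic(lib: string | undefined): boolean {
    return lib === undefined || lib === 'web' || lib === 'js'
}

function shouldDrop(event: BotCheckEvent, inputs: BotCheckInputs): boolean {
    // Customer-configured patterns apply to every event, whatever its `$lib`.
    const ua = event.userAgent.toLowerCase()
    if (inputs.customBotPatterns.some((pattern) => ua.includes(pattern.toLowerCase()))) {
        return true
    }
    if (inputs.customIpPrefixes.some((prefix) => event.ip.startsWith(prefix))) {
        return true
    }
    // Curated lists are browser-tuned, so they only run for browser traffic.
    if (isBrowserTraffic(event.lib)) {
        return isKnownBotUserAgent(event.userAgent) || isKnownBotIp(event.ip)
    }
    return false
}

const noCustomConfig: BotCheckInputs = { customBotPatterns: [], customIpPrefixes: [] }
const googlebotUa = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

console.log(shouldDrop({ lib: 'posthog-python', userAgent: googlebotUa, ip: '198.51.100.7' }, noCustomConfig)) // false: kept
console.log(shouldDrop({ lib: 'web', userAgent: googlebotUa, ip: '198.51.100.7' }, noCustomConfig)) // true: dropped
console.log(shouldDrop({ userAgent: googlebotUa, ip: '198.51.100.7' }, noCustomConfig)) // true: missing $lib treated as browser
```

The custom checks run before the `$lib` gate on purpose. Placing the gate at the top of the function would reintroduce the rejected first draft, where non-browser events skipped customer configuration entirely.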
How did you test this code?
I tested this manually by running the code before and after the change, sending events, and checking whether they landed.
Added jest cases to `bot-detection.template.test.ts`:
- `$lib` in `{posthog-python, posthog-node, posthog-webhook, posthog-ios}` with bot UA -> event kept
- `$lib` with bot IP -> event kept
- … `$lib` …
- … `$lib` …
- `$lib` in `{web, js}` or missing -> event dropped

All 24 tests in the file pass locally via `hogli test`.
Automatic notifications
Docs update
No docs in scope.
🤖 Agent context
Authored by Claude Code (Opus 4.7) on Daniel's machine. Triage started from a support ticket where `$ai_generation` events never appeared in LLM analytics for a customer; Eli Reisman traced the drop to their bot-detection transformation; Radu Raicea asked me to follow up.
Three approaches considered:
1. `event.event LIKE '$ai_%'` carve-out: solves the reported case but leaves the same silent-drop bug for server-side identify / custom events / webhook captures.
2. … (`nodejs/src/ingestion/ai/pipelines/ai-event-subpipeline.ts`): too broad, kills legit PII-scrubbing and property normalization for AI events.
3. `$lib`-based gate inside the template: matches the curated lists' actual scope (browser-tuned heuristics), covers `$ai_*` via `posthog-python`/`posthog-node`, and also fixes server-side identify / custom / webhook traffic and mobile traffic.

Verified `$lib` conventions before relying on them: posthog-js -> `'web'`, posthog-js-lite -> `'js'`, every server SDK -> `'posthog-<lang>'`, mobile SDKs -> `'posthog-ios'`/`'posthog-android'`/`'posthog-react-native'`/`'posthog-flutter'`. There is no ingestion-side normalization of `$lib`. `posthog-langchain` is not a `$lib` value (langchain events ride `posthog-python` with `$ai_framework='langchain'`).

Scope decision: the gate covers only the curated lists. Customer-configured `customBotPatterns` and `customIpPrefixes` still apply to every event. The initial draft skipped the entire function for non-browser `$lib`, which would have silently changed customer config behaviour; it was narrowed after a second pass.

Missing `$lib` is treated as browser (curated lists still run); that's the conservative choice for raw HTTP clients hitting `/capture/` without identifying themselves. Forging `$lib=posthog-python` to bypass the filter is theoretically possible, but the existing filter is already trivially bypassed by changing the UA string; the template targets crawler/datacenter noise, not adversarial abuse.