v0.3.0
Highlights
This release is a data-quality pass driven by ten end-to-end scans of major B2B
SaaS sites (HubSpot, Salesforce, Adobe, 6sense, Demandbase, ZoomInfo, Clearbit,
Notion, Linear, Airtable, monday.com, ClickUp, Loom, Gong, Mutiny). Every change
below is a real misattribution or omission observed in production traffic, not
speculative cleanup.
Tooling
`scripts/normalize.js` now detects cross-file collisions. When the same
cookie name or domain key appears in two or more actor files, the script
reports each occurrence with file name, attributed company, category, and
consent burden, and explicitly marks `COMPANY MISMATCH` cases as bugs. The
alphabetical merge order in `loadPlaybill()` means later files silently
overwrite earlier ones, which had been hiding wrong attributions for
high-traffic vendors. The check ran on the existing data and surfaced 109
cookie and 344 domain collisions as a backlog for future passes.
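The overwrite problem and the collision check can be illustrated with a minimal sketch. The file names, the `cookies` object shape, and the helper names `mergeAlphabetically` and `findCollisions` are hypothetical; this is not the code in `scripts/normalize.js` or `loadPlaybill()`, only an illustration of the behavior described above.

```javascript
// Hypothetical actor files keyed by file name; the real data layout may differ.
const actorFiles = {
  "datadog.json": {
    cookies: { _dd_s: { company: "Datadog", category: "analytics", burden: "required" } },
  },
  "datadome.json": {
    cookies: { _dd_s: { company: "DataDome", category: "security", burden: "contested" } },
  },
};

// Alphabetical merge: an entry from a later file silently replaces an earlier one.
function mergeAlphabetically(files) {
  const merged = {};
  for (const name of Object.keys(files).sort()) {
    Object.assign(merged, files[name].cookies);
  }
  return merged;
}

// Record every occurrence of each key, then report keys seen in two or more files.
function findCollisions(files) {
  const seen = new Map();
  for (const name of Object.keys(files).sort()) {
    for (const [key, entry] of Object.entries(files[name].cookies)) {
      if (!seen.has(key)) seen.set(key, []);
      seen.get(key).push({ file: name, ...entry });
    }
  }
  const collisions = [];
  for (const [key, occurrences] of seen) {
    if (occurrences.length < 2) continue;
    const companies = new Set(occurrences.map((o) => o.company));
    collisions.push({ key, occurrences, companyMismatch: companies.size > 1 });
  }
  return collisions;
}

const merged = mergeAlphabetically(actorFiles);
const collisions = findCollisions(actorFiles);
```

In this toy data, `datadome.json` sorts after `datadog.json`, so the merged `_dd_s` entry ends up attributed to DataDome, and the collision report flags the key with `companyMismatch: true`.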
Data — wrong attributions corrected
- `_dd_s` → Datadog Browser SDK (was DataDome bot protection). These are two
  different vendors with confusingly similar `_dd_*` prefixes. `_dd_s` is the
  primary session cookie of the Datadog RUM agent; DataDome's persistent cookie
  is `datadome` and its test cookies use `_dd_cookie_test_*`. Category moves
  from `security/contested` to `analytics/required`: every site running
  Datadog RUM was previously mis-audited as running bot protection.
- `_gd_session` / `_gd_svisitor` / `_gd_visitor` → 6sense Visitor ID (was
  "Google Analytics Debug" / `minimal`). Confirmed across 6sense's own site,
  Airtable, Gong, and others: the cookie value matches the `svisitor=`
  parameter on `b.6sc.co` beacons in the same scan window. New burden:
  `required_strict`. The `_gd_visitor` short variant was added.
- `_ttp` → TikTok Pixel (was Kakao Pixel). `_ttp` is TikTok's primary
  pixel cookie; Kakao uses `_kp_clk` / `_kawlt`. Burden upgraded to
  `required_strict`.
- `cb_user_id` / `cb_group_id` / `cb_anonymous_id` → HubSpot (Clearbit)
  Reveal (was Cxense via an over-greedy `cb_*` pattern). The pattern is
  deleted; explicit Clearbit entries take its place. Cxense (Piano DMP) uses
  `cX_*` (capital X), not `cb_`.
- `ar_debug` → Pinterest Conversion Tag (was Google). The Chrome
  Attribution Reporting API debug cookie is set per-advertiser by their pixel,
  not by Google directly; the canonical setter is Pinterest's tag.
- `cg_uuid`, `greencolumnart.com` → CHEQ AI Technologies (was
  "GreenColumnArt", category `advertising`). `greencolumnart.com` is a CHEQ
  cloak domain (the `ch=cheq4ppc` URL marker is the tell). Category moved to
  `fingerprinting/required_strict`. CHEQ rotates per-tenant cloak domains
  (`obs.<random>`, `ob.<random>`) explicitly to evade tracker blocklists, so
  the entry includes a note about path signatures (`/ct`, `/mon`,
  `/tracker/tc_imp.gif`, `/i/<hex>.js`) for matcher-side detection.
- `a.usbrowserspeed.com`, `usbrowserspeed.com` → Experian (Tapad)
  Cross-Device Identity (was New Relic Browser Speed Test, plus a separate
  USBrowserSpeed cookie attribution). Same cloak pattern as CHEQ: the URL
  itself contains `purpose=Retargeting + ID Resolution`, which New Relic
  doesn't do. Tapad was acquired by Experian in 2020. Category moved to
  `fingerprinting/required_strict`.
- `px.ads.linkedin.com` → LinkedIn (Microsoft) Insight Tag (was Roku /
  DataXu / Roku OneView). DataXu's real domains are `w55c.net` and
  `dxlive.com`; this entry was simply wrong.
- `laboratory-anonymous-id` → HubSpot Laboratory (was "Various / Lab/Testing
  Tools"). This is HubSpot's internal A/B testing framework.
- `bb_*` over-greedy Blackboard pattern removed. It was matching every
  cookie starting with `bb_` as Blackboard LMS, including monday.com's
  internal cookies. No replacement: the correct fix for the surfacing
  cookies needs site-by-site investigation, and a wrong attribution is worse
  than no attribution.
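As a rough illustration of what matcher-side detection from the CHEQ path signatures could look like, here is a minimal sketch. The function name `looksLikeCheqCloak`, the regular expressions, the reading of `ch=cheq4ppc` as a query parameter, and the rule for combining signals are all assumptions made for this example; the matcher was not changed in this release and may never work this way.

```javascript
// Path signatures taken from the entry's note; the regexes are one possible reading of them.
const CHEQ_PATH_SIGNATURES = [
  /^\/ct$/,
  /^\/mon$/,
  /^\/tracker\/tc_imp\.gif$/,
  /^\/i\/[0-9a-f]+\.js$/i, // "/i/<hex>.js"
];

// Assumption: cloak hosts start with "ob." or "obs." as in obs.<random> / ob.<random>.
const CHEQ_HOST_PREFIX = /^obs?\./;

function looksLikeCheqCloak(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  const markerHit = url.searchParams.get("ch") === "cheq4ppc";
  const hostHit = CHEQ_HOST_PREFIX.test(url.hostname);
  const pathHit = CHEQ_PATH_SIGNATURES.some((re) => re.test(url.pathname));
  // Short paths such as "/ct" are weak on their own, so require the host prefix too.
  return markerHit || (hostHit && pathHit);
}
```

Requiring two weak signals together (host prefix plus path) keeps a generic path like `/ct` on an unrelated site from being flagged, while the explicit URL marker is treated as sufficient by itself.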
Data — cross-file duplicates resolved
The losing entry was deleted in each pair so the merger picks the better
attribution. Notable cases:
- `snap.licdn.com` (kept advertising / "LinkedIn Insight Tag", deleted social)
- `alb.reddit.com` (added bare hostname under advertising / Reddit Pixel;
  was previously only matched as `social/contested` via a wrong entry that
  understated the consent burden of conversion pixels)
- `fast.wistia.com` (3-way collision: kept data_leak, deleted analytics + social)
- `tag.demandbase.com` (kept analytics / "Demandbase ABM" with reverse-IP note,
  deleted marketing)
- `secure.adnxs.com` (kept fingerprinting, deleted advertising)
- `js-agent.newrelic.com` (default agent reclassified as `analytics/required`
  RUM, not `session_recording/contested`; replay is a separately licensed
  feature using `/replay/` paths, which retain their entry)
Data — new entries
Cookies (previously unmatched in production scans):
- `_dd_s`, `dd_anonymous_id` (Datadog Browser SDK)
- `_biz_uid`, `_biz_nA`, `_biz_pendingA`, `_biz_flagsA` (Adobe / Bizible / Marketo Measure)
- `pxcts`, `_pxde` (HUMAN / PerimeterX bot defense; extends the existing family)
- `_zitok` (ZoomInfo WebSights first-party visitor token)
- `ttcsid`, `ttcsid_*` (TikTok Conversion Source ID)
- `dicbo_id` (Outbrain click ID)
- `__q_state_*` (Qualified Conversational Sales; pattern with workspace hash)
- `Indr*` (Unify base64-encoded workspace-prefixed cookie variants)
- `mutiny.user.*` (Mutiny B2B website personalization; reverse-IP firmographic)
- `tracking-preferences` (Twilio Segment consent storage)
- `g_state` (Google Sign-In / GSI state)
- `tcm` (Transcend Consent Manager)
- `cookiehub` (CookieHub CMP)
- `hubspot_id_sent` (HubSpot internal flag)
- `cloudfront_viewer_country` (AWS CloudFront geo header echoed to cookie)
- `sequelUserId`, `sequelSessionId`, `sequel-consent` (Sequel.io B2B virtual events)
- `atlCohort`, `atl_session`, `atl_xid.current`, `atl_xid.ts`, `atlUserHash`,
  `__Host-psifi.*` (Atlassian cross-product tracking; covers Loom, Trello,
  and other Atlassian-acquired properties)
- `_otPreferencesSynced`, `ovtc_*` (OneTrust virtual tag capture for SPA
  navigation re-evaluation)
Domains:
- `www.google.com/ccm/collect`, `/rmkt/collect`, `/pagead/1p-user-list`,
  `/pagead/1p-conversion`, `/gmp/conversion`: Google Ads first-party endpoints
  used as consent-mode workarounds. Each carries a note explaining the
  1p-naming and the role in the Privacy Sandbox transition.
- `pagead2.googlesyndication.com/ccm/collect`: Google Customer Match Connect
  (correctly distinguished from AdSense, which is the bare-hostname entry).
- `fls.doubleclick.net`: Floodlight conversion tracking (subdomain match
  catches per-advertiser hosts like `14611606.fls.doubleclick.net`).
- `sgtm-amer.hubspot.com`, `sgtm-emea.hubspot.com`, `sgtm-apac.hubspot.com`:
  HubSpot's hosted server-side GTM proxies, used to tunnel GA4 hits through
  HubSpot's first-party domain.
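The domain entries above mix bare hostnames, subdomain matches, and hostname-plus-path keys. One way such keys could be resolved is sketched below. The function `matchDomainSketch`, the entry table, and the longest-key-wins rule are assumptions for illustration and are not taken from the project's matcher.

```javascript
// Hypothetical entry table: a key is a hostname, optionally followed by a path prefix.
const domainEntries = {
  "fls.doubleclick.net": "Floodlight conversion tracking",
  "www.google.com/ccm/collect": "Google Ads first-party endpoint",
  "pagead2.googlesyndication.com": "AdSense",
  "pagead2.googlesyndication.com/ccm/collect": "Google Customer Match Connect",
};

function matchDomainSketch(rawUrl, entries) {
  const url = new URL(rawUrl);
  let best = null;
  for (const [key, label] of Object.entries(entries)) {
    const slash = key.indexOf("/");
    const host = slash === -1 ? key : key.slice(0, slash);
    const path = slash === -1 ? "" : key.slice(slash);
    // Exact host, or any subdomain of it (e.g. 14611606.fls.doubleclick.net).
    const hostOk = url.hostname === host || url.hostname.endsWith("." + host);
    // Bare-hostname keys match any path; path keys match that path or anything below it.
    const pathOk =
      path === "" || url.pathname === path || url.pathname.startsWith(path + "/");
    if (!hostOk || !pathOk) continue;
    // Assumed tie-break: the longest key is the most specific, so a path entry
    // beats the bare-hostname entry for the same host.
    if (best === null || key.length > best.key.length) best = { key, label };
  }
  return best;
}
```

With this rule, a request to `pagead2.googlesyndication.com/ccm/collect` resolves to the path-specific entry, while any other path on that host falls back to the bare-hostname entry.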
Notes
- The `munchkin.marketo.net` entry now documents that Marketo's bundled JS
  ships JavaScript errors (which can include form values, URL params, and
  DOM content) to Adobe's Sentry tenant `o209747.ingest.us.sentry.io`. This
  was observed across multiple Marketo customer sites, including Adobe's own
  `business.adobe.com`, Airtable, and Gong. It is invisible to most
  privacy reviews and worth surfacing in audit reports.
- Several reverse-IP firmographic vendors (6sense, Demandbase, ZoomInfo,
  Clearbit, Mutiny, Leadfeeder/Dealfront) carry notes about their legal
  posture: company-level identification from IP creates personal data under
  GDPR even on sites with no consent banner.
- The matcher itself was not changed in this release. A separate
  `urlParamPattern` matching feature (for detecting enrichment vendors via
  GA4 `up.db_*` / `up.*_6si` user-property leaks) is on the backlog.
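Since the `urlParamPattern` feature is only on the backlog, the following is a speculative sketch of what parameter-name matching might look like, not a description of any existing code. The rule table, the `vendorGuess` labels, the helper names, and the example parameter names are all hypothetical.

```javascript
// Hypothetical rules: globs over query-parameter names (GA4 user properties use "up.<name>").
const urlParamRules = [
  { glob: "up.db_*", vendorGuess: "Demandbase (assumed from the db_ prefix)" },
  { glob: "up.*_6si", vendorGuess: "6sense (assumed from the _6si suffix)" },
];

// Convert a glob with "*" wildcards into an anchored regular expression.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Return every query-parameter name in the URL that matches one of the rules.
function findParamLeaks(rawUrl, rules) {
  const names = [...new URL(rawUrl).searchParams.keys()];
  const hits = [];
  for (const rule of rules) {
    const re = globToRegExp(rule.glob);
    for (const name of names) {
      if (re.test(name)) {
        hits.push({ param: name, glob: rule.glob, vendorGuess: rule.vendorGuess });
      }
    }
  }
  return hits;
}

// Example beacon with made-up user-property names.
const leakHits = findParamLeaks(
  "https://www.google-analytics.com/g/collect?v=2&en=page_view" +
    "&up.db_industry=Software&up.company_6si=Acme",
  urlParamRules
);
```

Matching on parameter names rather than values means the check would not need to inspect the firmographic data itself, only notice that a vendor-shaped property was attached to the hit.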
Compatibility
No breaking API changes. Two semantic changes that downstream consumers may
notice:
- Sites using `_dd_s` will now appear in `analytics/required` rollups instead
  of `security/contested`. If you bucket consent-burden levels into UI bands,
  this will shift Datadog deployments from "minimal concern" toward "consent
  required", which matches reality.
- The `bb_*` and `cb_*` pattern removals mean cookies that previously matched
  via these patterns will now return `null` from `matchCookie()`. If any
  consumer relied on the old (wrong) attributions, those calls now correctly
  surface as unmatched. Affected real-world cookies were re-attributed where
  the actual vendor was identifiable (Clearbit's `cb_*` family), or left
  unmatched where it wasn't (`bb_visitor_id`).
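The before/after behavior for consumers can be pictured with a small sketch. `matchCookieSketch` and the two lookup tables are hypothetical stand-ins written for this example; the real `matchCookie()` signature and internals are not shown in these notes and may differ.

```javascript
// Assumed state before this release: no explicit Clearbit entries, two greedy prefixes.
const exactBefore = {};
const prefixesBefore = { cb_: "Cxense", bb_: "Blackboard" };

// Assumed state after this release: explicit Clearbit entries, greedy prefixes removed.
const exactAfter = {
  cb_user_id: "HubSpot (Clearbit) Reveal",
  cb_group_id: "HubSpot (Clearbit) Reveal",
  cb_anonymous_id: "HubSpot (Clearbit) Reveal",
};
const prefixesAfter = {};

// Exact names take priority; otherwise fall back to prefix patterns; otherwise null.
function matchCookieSketch(name, exact, prefixes) {
  if (Object.prototype.hasOwnProperty.call(exact, name)) return exact[name];
  for (const [prefix, company] of Object.entries(prefixes)) {
    if (name.startsWith(prefix)) return company;
  }
  return null;
}

const cbBefore = matchCookieSketch("cb_user_id", exactBefore, prefixesBefore);
const cbAfter = matchCookieSketch("cb_user_id", exactAfter, prefixesAfter);
const bbBefore = matchCookieSketch("bb_visitor_id", exactBefore, prefixesBefore);
const bbAfter = matchCookieSketch("bb_visitor_id", exactAfter, prefixesAfter);
```

A consumer that treats `null` as "unmatched" needs no code change; one that assumed every `bb_*` cookie resolved to a vendor should add a null check.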
Full Changelog: v0.2.0...v0.3.0