hey, before building it i wanted to check if there'd be interest in an optional Firefox-based stealth scraper backend under gpt_researcher/scraper/, parallel to firecrawl / browser / web_base_loader / tavily_extract.
motivation: a growing share of research-relevant pages sit behind Cloudflare, Akamai, Datadome, or hCaptcha. relevant open issues are #1685 (anybrowse MCP for Cloudflare), #1081 (Crunchbase Cloudflare block), #1602 (Firecrawl returns empty results), #1404 (self-hosted Firecrawl returns 0). a Firefox build with stealth patches applied at the C++ source level should cover those cases, since there are no injected JS shims for anti-bot scripts to detect.
the backend would wrap feder-cr/invisible_playwright, which drives a patched Firefox 150 (feder-cr/invisible_firefox, MPL-2, same license as Firefox upstream). selected when the user sets SCRAPER=invisible_firefox. optional dependency, only imported when selected. no change to defaults.
also opened a draft PR #1786 with an RFC stub in docs/docs/proposals/ so the proposal has somewhere concrete to land.
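to make the "optional dependency, only imported when selected" part concrete, here is a rough sketch of how the opt-in import could work. everything in it is an assumption for illustration: the `OPTIONAL_SCRAPERS` registry, the `load_optional_scraper` helper, and the module/class names are hypothetical placeholders, not the actual layout of gpt_researcher or the API of invisible_playwright.

```python
import importlib
import os

# Hypothetical mapping: SCRAPER value -> (module to import, attribute to fetch).
# The module and class names below are placeholders, not a real package API.
OPTIONAL_SCRAPERS = {
    "invisible_firefox": ("invisible_firefox_scraper", "InvisibleFirefoxScraper"),
}


def load_optional_scraper(name=None, registry=None):
    """Return the class for an optional backend, or None when `name` is not
    an optional backend, so the caller can fall back to the built-in ones."""
    if registry is None:
        registry = OPTIONAL_SCRAPERS
    if name is None:
        # Read the selection from the environment only when not passed in.
        name = os.environ.get("SCRAPER", "")
    if name not in registry:
        return None

    module_name, attr = registry[name]
    try:
        # The import happens here, and only for a backend that was selected.
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"SCRAPER={name} requires the optional package '{module_name}'"
        ) from exc
    return getattr(module, attr)
```

with this shape, users who never set SCRAPER=invisible_firefox never trigger the import, and users who set it without installing the extra get an error that names the missing package instead of a bare traceback.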