bpc-fetch

Bypass paywall sites — search, discover, and batch fetch articles as Markdown.

中文文档 • 936 Sites • Features • Install • Usage • Credits

What is this?

A command-line tool that fetches full-text articles from 936 paywalled news sites and saves them as clean Markdown with images. It replicates the bypass logic of the Bypass Paywalls Clean browser extension, but runs headlessly — no browser extension needed.

Supported Sites

936 sites across 40+ countries. Highlights:

Category	Sites
Financial	The Economist, Financial Times, Bloomberg, WSJ, Reuters, Forbes, Business Insider
US News	New York Times, Washington Post, LA Times, Chicago Tribune, Politico
UK/EU	The Telegraph, The Times, Der Spiegel, Le Monde, El País, Corriere della Sera
Tech/Science	Wired, The Atlantic, Nature, Science, Scientific American, MIT Tech Review
Magazines	The New Yorker, Vanity Fair, Vogue, National Geographic, Esquire
German	76 sites (FAZ, Handelsblatt, Süddeutsche Zeitung...)
French	69 sites (Le Figaro, Libération, Les Echos...)
More	Netherlands 30, Italy 28, Spain 26, Belgium 22, Australia 39...

Run bpc-fetch sites to see the full list.

Features

Full bypass coverage — Replicates all BPC extension strategies: custom User-Agent, Googlebot/Bingbot spoofing, referer manipulation, Playwright JS interception, archive.org fallback
Auto fallback chain — Each URL tries the optimal strategy, degrades gracefully until content is retrieved
Article discovery — Find recent articles via RSS, sitemap, or browser-rendered homepage
Cross-site crawl — Search + time filter + batch download in one command
Agent-friendly — JSON stdout, stderr progress, next_command hints in every response
Windows exe — Single-file distribution via PyInstaller, auto-downloads Chromium on first run

Installation

pip (recommended)

pip install bpc-fetch
playwright install chromium

From source

git clone https://github.com/user/bpc-fetch.git
cd bpc-fetch
pip install -e .
playwright install chromium

Windows exe

Download bpc-fetch.exe from Releases, then:

bpc-fetch.exe install-browser
bpc-fetch.exe doctor

Usage

# Check setup
bpc-fetch doctor

# List supported sites
bpc-fetch sites --filter economist

# Discover today's articles from a site
bpc-fetch discover economist.com --since today

# Fetch a single article
bpc-fetch fetch "https://www.economist.com/leaders/2024/01/01/example" --out-dir ./articles

# Batch fetch from URL list
bpc-fetch batch --file urls.txt --out-dir ./articles

# Cross-site crawl: keyword + time range
bpc-fetch crawl "AI regulation" --sites economist.com,ft.com --since 7d --out-dir ./ai-articles

Output format

article-title/
├── article-title.md      # YAML frontmatter + full text + image refs
└── images/
    ├── img_000_abc1.jpg
    └── img_001_def2.png

Agent integration

All commands output JSON. Use --compact for minimal output:

bpc-fetch discover ft.com --since today --compact
# → {"ok": true, "domain": "ft.com", "count": 15, "articles": [...], "next_command": "bpc-fetch batch ..."}

Bypass Strategies

Strategy	Sites	Method
`ua:custom`	7	Custom User-Agent string (Liskov, Google-InspectionTool, etc.)
`ua:googlebot`	85	Googlebot User-Agent
`ua:facebookbot`	5	Facebook crawler UA
`referer:google`	2	Google referer header
`block_js`	425	Playwright blocks paywall scripts via `Page.route()`
`archive`	274	Fetch from archive.org/archive.is
`cookies`	138	Access without tracking cookies

Building Windows exe

pip install pyinstaller
python build/build_win.py
# Output: dist/bpc-fetch.exe

Credits

This tool is built on top of the bypass logic from:

Bypass Paywalls Clean by magnolia1234 — the original browser extension that provides the site database and bypass strategies. All credit for the paywall bypass research goes to the BPC project maintainers.
LINUX DO — Community support and feedback.

License

MIT — see LICENSE.

The data/sites.js file is from the Bypass Paywalls Clean project (MIT License).

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
assets		assets
data		data
packaging		packaging
src/bpc_fetch		src/bpc_fetch
.cli-health-ignore		.cli-health-ignore
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
README_CN.md		README_CN.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

bpc-fetch

What is this?

Supported Sites

Features

Installation

pip (recommended)

From source

Windows exe

Usage

Output format

Agent integration

Bypass Strategies

Building Windows exe

Credits

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

bpc-fetch

What is this?

Supported Sites

Features

Installation

pip (recommended)

From source

Windows exe

Usage

Output format

Agent integration

Bypass Strategies

Building Windows exe

Credits

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages