╔═╗┌─┐┬─┐┌─┐┌─┐┌─┐╔═╗┬─┐┌─┐
╚═╗│ ├┬┘├─┤├─┘├─╚╗╠═╝├┬┘│ │
╚═╝└─┘┴└─┴ ┴┴ └─╚╝╩ ┴└─└─┘
Professional Web Scraping Toolkit
A fast, lightweight CLI tool for web scraping and data extraction. Built with Python — no browser required.
ScrapePro Lite scrapes static HTML pages — blogs, news sites, e-commerce product pages, job boards, directories, and any server-rendered website.
Best for:
- News sites & blogs
- E-commerce product pages
- Job listing sites
- Business directories
- Wikipedia & documentation sites
- Any server-rendered HTML
Not for: JavaScript-heavy sites like YouTube, Twitter/X, Instagram, or modern SPAs. For those, check out ScrapePro Full.
- Smart scraping — auto-detects page structure and extracts all useful data
- CSS selector and XPath extraction
- Table extraction from any HTML tables
- Article text extraction (strips nav, ads, footers)
- Metadata extraction (meta tags, OpenGraph, JSON-LD)
- Site crawling with configurable depth
- Change detection for monitoring
- Export to JSON, CSV, XLSX, Markdown, and SQLite
- Rate limiting and polite crawling (respects robots.txt)
- Retry logic with exponential backoff
- Beautiful terminal output with progress bars
- Auto-detection of page types (e-commerce, articles, job listings)
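
To illustrate how depth-limited, robots.txt-aware crawling from the list above can work, here is a minimal sketch using only the standard library. It is not ScrapePro Lite's actual code: `crawl`, `get_links`, and the `robots_txt` parameter are hypothetical names chosen for this example, and the robots.txt rules are passed in as text so the sketch runs without network access.

```python
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


def crawl(start_url, get_links, robots_txt="", max_depth=2, user_agent="*"):
    """Breadth-first crawl limited to max_depth link hops from start_url.

    get_links(url) is a hypothetical callback that returns the hrefs
    found on a page; a real crawler would fetch and parse the page here.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())  # rules supplied as text, no network call

    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []

    while queue:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # skip paths the site disallows
        visited.append(url)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for href in get_links(url):
            link = urljoin(url, href)
            # stay on the starting host and never queue a URL twice
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Breadth-first order means pages closer to the start URL are visited first, so a small depth limit covers the most directly linked pages.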
pip install -r requirements.txt

python main.py scrape https://example.com

python main.py scrape https://news.ycombinator.com --css ".titleline > a"
python main.py scrape https://example.com --css "h2.article-title"

python main.py scrape https://example.com --xpath "//h1/text()"
python main.py scrape https://example.com --xpath "//div[@class='price']/text()"

python main.py scrape "https://en.wikipedia.org/wiki/Python_(programming_language)" --tables
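
As a rough idea of what table extraction involves, the sketch below turns every HTML table into a list of rows using the standard library's `html.parser`. It is an illustration under simplifying assumptions (no nested tables, no `rowspan`/`colspan` handling), not the tool's own parser, which according to the dependency list builds on beautifulsoup4 and lxml. `TableParser` and `extract_tables` are hypothetical names.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished and in-progress tables
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # join the text fragments and trim surrounding whitespace
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None


def extract_tables(html):
    """Return all tables found in the HTML string."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each table comes back as a plain list of lists, which maps directly onto CSV or spreadsheet rows.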
python main.py scrape https://example.com --links
python main.py scrape https://example.com --images
python main.py scrape https://example.com --metadata
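
For readers curious what metadata extraction covers, here is a small standard-library sketch that collects named meta tags, OpenGraph properties, and JSON-LD blocks from a page. It is an assumption-laden illustration, not the implementation behind `--metadata`; `MetadataParser` and `extract_metadata` are hypothetical names, and the output shape is made up for this example.

```python
import json
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Gather <meta name=...>, <meta property="og:..."> and JSON-LD scripts."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.open_graph = {}
        self.json_ld = []
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("content") is not None:
            prop = attrs.get("property") or ""
            if prop.startswith("og:"):
                self.open_graph[prop] = attrs["content"]
            elif attrs.get("name"):
                self.meta[attrs["name"]] = attrs["content"]
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # ignore malformed JSON-LD blocks


def extract_metadata(html):
    """Return a dict with 'meta', 'open_graph' and 'json_ld' entries."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {
        "meta": parser.meta,
        "open_graph": parser.open_graph,
        "json_ld": parser.json_ld,
    }
```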
python main.py scrape https://example.com/blog/post-1 --text

python main.py crawl https://example.com --depth 3

python main.py scrape https://example.com
python main.py export json
python main.py export csv
python main.py export xlsx
python main.py export md
python main.py export sqlite

python main.py schedule https://example.com --interval 60

python main.py compare snapshot_1.json snapshot_2.json

python main.py --demo

lite/
├── main.py # CLI entry point
├── scraper.py # Core scraping engine
├── parsers.py # Specialized page-type parsers
├── exporters.py # Export to JSON/CSV/XLSX/MD/SQLite
├── config.py # Configuration management
├── test_scraper.py # 35 tests
├── requirements.txt # Dependencies
└── README.md # This file
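
The layout above lists `exporters.py` as the module that writes JSON, CSV, XLSX, Markdown, and SQLite output. As a hedged illustration of what such exporters can look like, the sketch below covers the three formats available in the standard library. The function names, the `items` table, and the single-column SQLite schema are assumptions for this example, not the contents of `exporters.py`.

```python
import csv
import json
import sqlite3


def export_json(records, path):
    """Write a list of dicts as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2, ensure_ascii=False)


def export_csv(records, path):
    """Write a list of dicts as CSV; missing keys become empty cells."""
    fieldnames = sorted({key for record in records for key in record})
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def export_sqlite(records, path, table="items"):
    """Store each record as a JSON string in a one-column table (assumed schema)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (data TEXT NOT NULL)')
        conn.executemany(
            f'INSERT INTO "{table}" (data) VALUES (?)',
            [(json.dumps(record),) for record in records],
        )
        conn.commit()
    finally:
        conn.close()
```

Storing records as JSON text keeps the SQLite sketch schema-free; a real exporter might instead create one column per field.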
pip install pytest
pytest test_scraper.py -v

- requests: HTTP client
- beautifulsoup4 + lxml: HTML parsing
- rich: Beautiful terminal output
- click: CLI framework
- openpyxl: Excel export
- fake-useragent: User agent rotation
- retrying: Retry logic
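
The retry logic mentioned above is handled by a third-party package in this project. To show the idea behind retrying with exponential backoff, here is a minimal standard-library sketch. `retry_with_backoff` and its parameters are hypothetical and do not reflect the `retrying` package's API or the tool's actual settings.

```python
import time


def retry_with_backoff(func, max_attempts=4, base_delay=1.0, max_delay=30.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Call func(), retrying on failure with delays of 1x, 2x, 4x, ... base_delay.

    The sleep function is injectable so the behaviour can be tested
    without actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # double the delay on each failed attempt, capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay)
```

A production version would usually add random jitter to the delay so that many clients do not retry at the same moment.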
Need JavaScript rendering for YouTube, Twitter, or modern SPAs? ScrapePro Full adds Playwright-powered headless browser rendering while keeping all the same features.
MIT