A Python tool for scraping articles from Substack publications for research and content analysis.
This scraper extracts article metadata (date, author, headline, URL, subheading) from any Substack publication and exports it to CSV format. Designed for journalists, researchers, and data analysts studying media narratives and content patterns.
The output is a CSV file with the following columns:
| date | author_byline | headline | url | subheading |
|---|---|---|---|---|
| 2025-12-10 | Author Name | Article Title | https://... | Brief description |
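
As a rough illustration of the output format, the sketch below writes one row with the columns listed in the table, using the built-in `csv` module. The function name `write_articles_csv` and the sample row are hypothetical and are not taken from the repository's script; the actual code may build its rows differently.

```python
import csv
import io

# Column order taken from the table above.
FIELDNAMES = ["date", "author_byline", "headline", "url", "subheading"]


def write_articles_csv(articles, out_file):
    """Write a list of article dicts to an open text file object as CSV.

    Hypothetical helper for illustration, not the repository's function.
    """
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES)
    writer.writeheader()
    for article in articles:
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({name: article.get(name, "") for name in FIELDNAMES})


# Made-up sample row mirroring the placeholder values in the table.
sample = [
    {
        "date": "2025-12-10",
        "author_byline": "Author Name",
        "headline": "Article Title",
        "url": "https://example.com/p/article-title",
        "subheading": "Brief description",
    }
]

# An in-memory buffer keeps the example free of file system side effects.
buffer = io.StringIO()
write_articles_csv(sample, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a buffer, opening it with `open(path, "w", newline="", encoding="utf-8")` avoids blank lines between rows on Windows.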
Works with any Substack publication:
- Drop Site News: https://www.dropsitenews.com
- Zeteo: https://zeteo.com
- Any custom domain running on Substack
- To use it: copy the code from the Python file, set the publication URL, rename the output file, and run the script.
- Scrapes all articles from any Substack publication
- Extracts: publication date, author byline, headline, URL, and subheading/description
- Exports to clean CSV format
- Respectful scraping with built-in delays
- Fully commented code for learning and customization
- Journalism research: Analyze coverage patterns across independent media
- Content analysis: Study narrative framing and topic trends
- Media monitoring: Track publication output over time
- Academic research: Dataset creation for discourse analysis
External packages (install required):
- `requests`: HTTP requests to API endpoints
- `beautifulsoup4`: HTML parsing for author extraction
Built-in packages (no install needed):
- `csv`: CSV file writing
- `time`: Rate limiting delays
- `urllib.parse`: URL construction
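
To show how `urllib.parse` and `time` might fit together, here is a minimal sketch of paginated fetching with a delay between requests. Everything specific in it is an assumption: the `/api/v1/archive` path, the `sort`, `offset`, and `limit` parameter names, and the helper names `build_page_url` and `iter_pages` are illustrative and not confirmed by this README. The fetch step is passed in as a function so the example runs without network access.

```python
import time
from urllib.parse import urlencode, urljoin

# ASSUMPTION: endpoint path and query parameter names are illustrative only.
ARCHIVE_PATH = "/api/v1/archive"


def build_page_url(base_url, offset, limit=12):
    """Build the URL for one page of results (hypothetical helper)."""
    query = urlencode({"sort": "new", "offset": offset, "limit": limit})
    return urljoin(base_url, ARCHIVE_PATH) + "?" + query


def iter_pages(fetch_page, base_url, limit=12, delay_seconds=1.0):
    """Yield pages until fetch_page returns an empty list.

    fetch_page is any callable that takes a URL and returns a list of items.
    """
    offset = 0
    while True:
        page = fetch_page(build_page_url(base_url, offset, limit))
        if not page:
            break
        yield page
        offset += len(page)
        # Pause between requests so the server is not hammered.
        time.sleep(delay_seconds)


def demo_fetch(url):
    """Stand-in for a real HTTP call: two items on the first page, then none."""
    return [{"title": "First"}, {"title": "Second"}] if "offset=0" in url else []


# The delay is set to zero only so the demo finishes instantly.
for page in iter_pages(demo_fetch, "https://www.dropsitenews.com", delay_seconds=0):
    print(len(page), "items")
```

With `requests` installed, a real `fetch_page` could be something like `lambda url: requests.get(url, timeout=30).json()`, assuming the endpoint returns a JSON list.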
- The code has been generated by Perplexity.
- The code has been tested. The repo includes two sample CSVs as output.
MIT