architmeta/substack-scraper

Substack Scraper

A Python tool for scraping articles from Substack publications for research and content analysis.

Overview

This scraper extracts article metadata (date, author, headline, URL, subheading) from any Substack publication and exports it to a CSV file. It is designed for journalists, researchers, and data analysts studying media narratives and content patterns.

Output Format

CSV with the following columns:

date        author_byline  headline       url          subheading
2025-12-10  Author Name    Article Title  https://...  Brief description
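
To make the column layout above concrete, here is a minimal sketch of writing rows in that shape with the standard `csv` module. The helper name `write_articles` and the sample values are hypothetical placeholders and are not taken from the repository's script; only the five column names come from this README.

```python
import csv
import io

# Column order taken from the table above.
FIELDNAMES = ["date", "author_byline", "headline", "url", "subheading"]


def write_articles(rows, fileobj):
    """Write article dicts to an open text file object as CSV (hypothetical helper)."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({name: row.get(name, "") for name in FIELDNAMES})


# Placeholder values only, mirroring the sample row above.
sample = [{
    "date": "2025-12-10",
    "author_byline": "Author Name",
    "headline": "Article Title",
    "url": "https://example.com/p/article-title",
    "subheading": "Brief description",
}]

# An in-memory buffer keeps the example free of side effects.
buffer = io.StringIO()
write_articles(sample, buffer)
csv_text = buffer.getvalue()
```

When writing to a real file, opening it with `open(path, "w", newline="", encoding="utf-8")` avoids blank lines between rows on Windows.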

Examples

Works with any Substack publication:

  • Drop Site News: https://www.dropsitenews.com
  • Zeteo: https://zeteo.com
  • Any custom domain running on Substack

Recommended Use

  • Copy the code from the Python file, set the publication URL, rename the output CSV file, and run the script.

Features

  • Scrapes all articles from any Substack publication
  • Extracts: publication date, author byline, headline, URL, and subheading/description
  • Exports to clean CSV format
  • Respectful scraping with built-in delays
  • Fully commented code for learning and customization
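
The "built-in delays" feature above can be pictured as a paginated loop that pauses after every request. The sketch below is an illustration under stated assumptions, not the repository's actual code: `collect_all`, `fetch_page`, the page size of 12, and the one-second delay are all hypothetical. In practice `fetch_page` would wrap an HTTP call made with `requests`; here a fake in-memory archive stands in so the example runs offline.

```python
import time


def collect_all(fetch_page, page_size=12, delay_seconds=1.0, sleep=time.sleep):
    """Collect items from a paginated source, pausing after each request.

    fetch_page(offset, limit) must return a list; an empty list ends the loop.
    """
    items = []
    offset = 0
    while True:
        batch = fetch_page(offset, page_size)
        if not batch:
            break
        items.extend(batch)
        offset += len(batch)
        sleep(delay_seconds)  # be polite: wait before the next request
    return items


# Fake archive of 30 posts standing in for a real publication.
fake_archive = [{"title": f"Post {i}"} for i in range(30)]


def fake_fetch(offset, limit):
    """Return one page of the fake archive (hypothetical stand-in for an HTTP call)."""
    return fake_archive[offset:offset + limit]


# Record the pauses instead of actually sleeping, so the demo runs instantly.
pauses = []
posts = collect_all(fake_fetch, page_size=12, delay_seconds=1.0, sleep=pauses.append)
```

Passing `sleep` as a parameter is a design choice for testability; leaving it at the default uses `time.sleep` and produces real delays.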

Use Cases

  • Journalism research: Analyze coverage patterns across independent media
  • Content analysis: Study narrative framing and topic trends
  • Media monitoring: Track publication output over time
  • Academic research: Dataset creation for discourse analysis
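
As one possible starting point for the media-monitoring use case, the exported CSV can be summarized with the standard library alone. This sketch assumes the `date` column holds ISO-style `YYYY-MM-DD` strings, as in the sample row of the output format; `articles_per_month` and the inline data are hypothetical.

```python
import csv
import io
from collections import Counter


def articles_per_month(fileobj):
    """Count rows per YYYY-MM, assuming ISO-formatted values in the 'date' column."""
    reader = csv.DictReader(fileobj)
    counts = Counter(row["date"][:7] for row in reader if row.get("date"))
    # Sort by month so the result reads chronologically.
    return dict(sorted(counts.items()))


# Inline placeholder data in the documented column layout.
sample_csv = io.StringIO(
    "date,author_byline,headline,url,subheading\n"
    "2025-11-03,Author A,First,https://example.com/p/first,One\n"
    "2025-12-01,Author B,Second,https://example.com/p/second,Two\n"
    "2025-12-10,Author A,Third,https://example.com/p/third,Three\n"
)

monthly = articles_per_month(sample_csv)
```

For a real export, replace the `io.StringIO` object with `open("your_file.csv", newline="", encoding="utf-8")`, where the file name is whatever was chosen when running the scraper.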

Dependencies

External packages (install required):

  • requests - HTTP requests to API endpoints
  • beautifulsoup4 - HTML parsing for author extraction

Built-in packages (no install needed):

  • csv - CSV file writing
  • time - Rate limiting delays
  • urllib.parse - URL construction
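
To illustrate the URL-construction role of `urllib.parse`, here is a small sketch that builds a paginated archive URL from a publication's base address. The `api/v1/archive` path and the `sort`, `offset`, and `limit` query parameters are assumptions about the endpoint, not something this README confirms, and `build_archive_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urljoin


def build_archive_url(base_url, offset=0, limit=12):
    """Build a paginated archive URL for a publication (hypothetical helper).

    ASSUMPTION: the path and query parameter names below are illustrative
    and may not match the endpoint the actual script calls.
    """
    query = urlencode({"sort": "new", "offset": offset, "limit": limit})
    # Normalize the trailing slash so urljoin appends the path to the site root.
    root = base_url.rstrip("/") + "/"
    return urljoin(root, "api/v1/archive") + "?" + query


first_page = build_archive_url("https://zeteo.com")
third_page = build_archive_url("https://www.dropsitenews.com/", offset=24)
```

Building the query with `urlencode` rather than string concatenation keeps values correctly escaped if other parameters are added later.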

Transparency

  • The code has been generated by Perplexity.
  • The code has been tested. The repo includes two sample CSVs as output.

License

MIT

About

Use this script to scrape all articles from a Substack website's archive page
