Substack Scraper

Tools to scrape articles from a Substack publication and convert them into Markdown for a Hugo website.

Prerequisites

Python 3
substack-api library

Recommended Setup (using venv):

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
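Before running the scripts, you may want to confirm that the environment is ready. The snippet below is a minimal sketch and is not part of the repository. `missing_prerequisites` is a hypothetical helper. It assumes the substack-api package is importable as `substack_api`, so check that module name against requirements.txt.

```python
import importlib.util
import sys

# Assumption: substack-api is importable as "substack_api"; adjust to match requirements.txt.
REQUIRED_MODULES = ["substack_api"]


def missing_prerequisites(modules=REQUIRED_MODULES, min_python=(3, 0)):
    """Return a list of human-readable problems; an empty list means ready to go."""
    problems = []
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    for name in modules:
        # find_spec returns None when the module cannot be found on the import path
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name} (run: pip install -r requirements.txt)")
    return problems


if __name__ == "__main__":
    for issue in missing_prerequisites():
        print(issue)
```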

Usage

1. Scrape an Article

Use scrape_substack.py to fetch an article by its URL. This saves the raw content and metadata to a JSON file in the scraped_data directory.

# Activate the venv first, or call its interpreter directly as shown here
./venv/bin/python3 scrape_substack.py <SUBSTACK_ARTICLE_URL>

Example:

./venv/bin/python3 scrape_substack.py https://{your_substack_url}.substack.com/p/{your_article_id}
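Article URLs follow the pattern `https://<publication>.substack.com/p/<slug>`. The following minimal sketch validates and splits such a URL. It assumes the segment after /p/ is the post's slug. `parse_substack_url` is a hypothetical helper, not a function from scrape_substack.py, and it deliberately rejects custom domains.

```python
from __future__ import annotations

from urllib.parse import urlparse

SUFFIX = ".substack.com"


def parse_substack_url(url: str) -> tuple[str, str]:
    """Split https://<publication>.substack.com/p/<slug> into (publication, slug).

    Raises ValueError for URLs that do not follow that pattern.
    Custom domains are not handled in this sketch.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()  # hostname strips the port and lowercases
    if parsed.scheme not in ("http", "https") or not host.endswith(SUFFIX):
        raise ValueError(f"not a *.substack.com URL: {url!r}")
    publication = host[: -len(SUFFIX)]
    # Ignore empty segments so a trailing slash is tolerated; the query string is not part of path
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2 or parts[0] != "p":
        raise ValueError(f"expected a /p/<slug> article path: {url!r}")
    return publication, parts[1]
```

A parsed slug like this is one natural key for naming the JSON file in scraped_data/. Whether scrape_substack.py does this is not documented here.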

2. Process to Markdown

Use process_letter_json.py to convert the scraped JSON files in scraped_data/ into Hugo-ready Markdown files in output/. The script cleans the HTML, removes embedded widgets, and formats the front matter.

./venv/bin/python3 process_letter_json.py scraped_data
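Hugo reads page metadata from a front matter block at the top of each Markdown file. The sketch below shows what formatting the front matter can involve by emitting a YAML block. `hugo_front_matter`, `to_markdown_page`, and the chosen fields (title, date, slug, draft) are assumptions. The fields that process_letter_json.py actually writes may differ.

```python
import json
from datetime import datetime, timezone


def hugo_front_matter(title: str, date: datetime, slug: str, draft: bool = False) -> str:
    """Build a YAML front matter block (between --- fences) for a Hugo page."""
    lines = [
        "---",
        # json.dumps yields a double-quoted string that is also valid YAML,
        # so titles containing colons or quotes stay well-formed.
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"date: {date.isoformat()}",
        f"slug: {json.dumps(slug)}",
        f"draft: {'true' if draft else 'false'}",
        "---",
    ]
    return "\n".join(lines) + "\n"


def to_markdown_page(title: str, date: datetime, slug: str, body_markdown: str) -> str:
    """Prepend front matter to an already-converted Markdown body."""
    return hugo_front_matter(title, date, slug) + "\n" + body_markdown.strip() + "\n"


if __name__ == "__main__":
    print(to_markdown_page("Letter #1: Hello", datetime(2024, 1, 5, tzinfo=timezone.utc),
                           "letter-1", "Body text."))
```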

The generated Markdown files are written to the output/ directory. You can move them into your Hugo content directory (e.g. content/posts/letters) as needed.
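You can also script the move. The following sketch uses only the standard library. `move_to_hugo` is a hypothetical helper, and its default destination reuses the content/posts/letters example above. Existing files are skipped unless overwrite=True, so re-running it does not clobber posts you have already edited in Hugo.

```python
import shutil
from pathlib import Path


def move_to_hugo(output_dir="output", hugo_section="content/posts/letters", overwrite=False):
    """Move generated .md files from output_dir into a Hugo content section.

    Returns the list of destination paths that were written.
    """
    src = Path(output_dir)
    dest = Path(hugo_section)
    dest.mkdir(parents=True, exist_ok=True)  # create the section if it does not exist yet
    moved = []
    for md_file in sorted(src.glob("*.md")):
        target = dest / md_file.name
        if target.exists():
            if not overwrite:
                continue  # keep the existing post untouched
            target.unlink()
        shutil.move(str(md_file), str(target))
        moved.append(target)
    return moved
```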

Agent Prompt

If you are using an AI agent and want to automate this flow, you can use the following prompt:

Task: Scrape and process this Substack article: [INSERT_URL_HERE]

Steps:

Run python3 scrape_substack.py [INSERT_URL_HERE] to fetch the data into scraped_data/.

Run python3 process_letter_json.py scraped_data to convert it to Markdown in output/.

Verify the generated markdown file in output/.
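You can also chain the same two steps without an agent. This is a minimal sketch that assumes you run it from the repository root. `pipeline_commands` and `run_pipeline` are hypothetical helpers that invoke the two commands above. They use the current interpreter (sys.executable), so running the sketch from the activated venv uses the venv's Python.

```python
import subprocess
import sys


def pipeline_commands(url, python=sys.executable, data_dir="scraped_data"):
    """Return the scrape and process commands, in order, as argv lists."""
    return [
        [python, "scrape_substack.py", url],
        [python, "process_letter_json.py", data_dir],
    ]


def run_pipeline(url, python=sys.executable):
    """Run the scrape step, then the process step."""
    for argv in pipeline_commands(url, python):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed scrape stops the run before processing starts.
        subprocess.run(argv, check=True)
```

For example, `run_pipeline("https://example.substack.com/p/my-first-letter")` scrapes that article and then processes everything in scraped_data/. As with the agent prompt, check the result in output/ afterwards.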

awangdev/substack_scraper

Folders and files

Latest commit

History

Repository files navigation

Substack Scraper

Prerequisites

Usage

1. Scrape an Article

2. Process to Markdown

Agent Prompt

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Sponsor this project

Uh oh!

Packages 0

Languages

Packages