Tools to scrape articles from a Substack publication and convert them into Markdown for a Hugo website.
Requirements:
- Python 3
- substack-api library
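To confirm both requirements before running the scripts, you can use a small check like the one below. It is only a sketch. It assumes that the substack-api package is importable as substack_api, and that module name is an assumption rather than something this README states. Run it with the same interpreter you will use for the scripts (for example ./venv/bin/python3).

```python
import importlib.util
import sys


def check_requirements(module_name: str = "substack_api") -> dict:
    """Report whether this interpreter is Python 3+ and whether a module can be found.

    The default "substack_api" is an assumed import name for the substack-api
    package; change it if your installed package exposes a different module.
    """
    return {
        "python3": sys.version_info.major >= 3,
        # find_spec returns None when the module cannot be located.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```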
Recommended Setup (using venv):
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Use scrape_substack.py to fetch an article by its URL. This saves the raw content and metadata to a JSON file in the scraped_data directory.
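This README does not document the JSON layout or the file naming. As an illustration only, the sketch below shows one way a scraper could take the article slug from a URL of the form https://<publication>.substack.com/p/<slug> and write content plus metadata to scraped_data/. The helper names and the field names (url, title, body_html) are hypothetical and are not taken from scrape_substack.py.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Return <slug> from a https://<publication>.substack.com/p/<slug> URL (hypothetical helper)."""
    # urlparse drops the query string, so tracking parameters do not affect the slug.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "p":
        raise ValueError(f"not a Substack article URL: {url}")
    return parts[1]


def save_scraped(url: str, title: str, body_html: str, out_dir: str = "scraped_data") -> Path:
    """Write hypothetical article fields to <out_dir>/<slug>.json and return the file path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{slug_from_url(url)}.json"
    record = {"url": url, "title": title, "body_html": body_html}
    path.write_text(json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Naming files by slug keeps re-scrapes of the same article idempotent, because they overwrite one file instead of piling up duplicates.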
# Ensure venv is active or use direct path
./venv/bin/python3 scrape_substack.py <SUBSTACK_ARTICLE_URL>

Example:
./venv/bin/python3 scrape_substack.py https://{your_substack_url}.substack.com/p/{your_article_id}

Use process_letter_json.py to convert the scraped JSON files (in scraped_data) into Hugo-ready Markdown files (in output). This script cleans the HTML, removes widgets, and formats the front matter.
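Hugo reads YAML front matter from a block delimited by --- lines at the top of each Markdown file. The sketch below shows one way to build such a block from scraped fields. It is not the logic of process_letter_json.py, and the chosen fields (title, date, draft) and the helper names are assumptions.

```python
import json


def hugo_front_matter(title: str, date: str, draft: bool = False) -> str:
    """Build a YAML front matter block for a Hugo content file (hypothetical helper).

    json.dumps yields a double-quoted string that YAML also accepts, so titles
    containing ':' or quotes stay valid YAML.
    """
    lines = [
        "---",
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"date: {date}",
        f"draft: {'true' if draft else 'false'}",
        "---",
    ]
    return "\n".join(lines) + "\n"


def to_markdown_file(title: str, date: str, body_md: str) -> str:
    """Join front matter and a Markdown body into the contents of one file."""
    return hugo_front_matter(title, date) + "\n" + body_md.strip() + "\n"
```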
./venv/bin/python3 process_letter_json.py scraped_data

The generated Markdown files are written to the output/ directory and can be moved to your Hugo content directory (e.g. content/posts/letters) as needed.
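You can script the move. The sketch below copies every .md file from output/ into a Hugo content section. The destination content/posts/letters is only the example given above. The sketch copies rather than moves so that the originals stay in output/, and that choice is made here, not by the repository.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def copy_to_hugo(src_dir: str = "output", dest_dir: str = "content/posts/letters") -> list[Path]:
    """Copy generated Markdown files into a Hugo content directory.

    Files with the same name in dest_dir are overwritten. Returns the destination paths.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # Only top-level .md files are copied; other files in src_dir are ignored.
    for md_file in sorted(Path(src_dir).glob("*.md")):
        target = dest / md_file.name
        shutil.copy2(md_file, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```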
If you are using an AI agent and want to automate this flow, you can use the following prompt:
Task: Scrape and process this Substack article:
[INSERT_URL_HERE]

Steps:
- Run python3 scrape_substack.py [INSERT_URL_HERE] to fetch the data into scraped_data/.
- Run python3 process_letter_json.py scraped_data to convert it to Markdown in output/.
- Verify the generated Markdown file in output/.
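The same three steps can also run without an agent. The sketch below is a hypothetical wrapper, not part of this repository. It calls both scripts in the order the prompt gives, using the interpreter that runs the wrapper (so launch it with ./venv/bin/python3). It then lists the Markdown files in output/ that did not exist before the run. Because it compares file names, a file overwritten in place will not be reported as new.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def markdown_files(directory: str = "output") -> set[str]:
    """Return the names of .md files in directory, or an empty set if it is missing."""
    path = Path(directory)
    if not path.is_dir():
        return set()
    return {p.name for p in path.glob("*.md")}


def run_pipeline(url: str, python: str = sys.executable) -> list[str]:
    """Scrape one article, convert scraped_data/ to Markdown, and return newly created file names."""
    before = markdown_files()
    # check=True raises CalledProcessError if either script exits with a non-zero status.
    subprocess.run([python, "scrape_substack.py", url], check=True)
    subprocess.run([python, "process_letter_json.py", "scraped_data"], check=True)
    return sorted(markdown_files() - before)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python3 run_pipeline.py <SUBSTACK_ARTICLE_URL>")
    new_files = run_pipeline(sys.argv[1])
    if not new_files:
        print("No new Markdown files in output/ (existing files may have been overwritten).")
    for name in new_files:
        print(f"Generated: output/{name}")
```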