Node tool to scrape and transform articles into Markdown for local reading

chrisodicho/article-archiver

Article Archiver

This library converts online articles and blog posts into local Markdown, preserving only:

  • article content
  • media assets
  • metadata

The heavy lifting around scraping is done with Cypress, and the scraped content is then enhanced with Mozilla Readability.


Getting Started

⚠️ This library is under development and is not expected to work until the TODOs below are completed ⚠️

Installation

npm install -g article-archiver

Usage

npx article-archiver <urls>
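
The `<urls>` placeholder presumably takes one or more article URLs. As a rough sketch of how such arguments could be checked before any scraping starts, assuming the URLs are passed as separate, space-delimited arguments (the `parseUrls` helper is hypothetical and not part of this library):

```javascript
// Hypothetical helper: not part of article-archiver's actual API.
// Splits raw CLI arguments into valid http(s) URLs and rejected inputs.
function parseUrls(args) {
  const valid = [];
  const rejected = [];
  for (const arg of args) {
    try {
      const url = new URL(arg); // throws a TypeError on malformed input
      if (url.protocol === "http:" || url.protocol === "https:") {
        valid.push(url.href);
      } else {
        rejected.push(arg); // parses, but is not a web page URL
      }
    } catch {
      rejected.push(arg);
    }
  }
  return { valid, rejected };
}

// process.argv.slice(2) holds the arguments that follow the script name.
const { valid, rejected } = parseUrls(process.argv.slice(2));
console.log("to archive:", valid);
console.log("skipped:", rejected);
```

Rejecting bad input up front would keep a single mistyped URL from failing a whole batch partway through.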

Architecture

[Architecture diagram]

TODO

  • set up Cypress
  • configure Cypress to scrape URLs
  • implement code cleaner and enhancer
  • implement Readability
  • wire up scraper to enhancer
  • set up HTTP server for tmp files
  • set up website-scraper
  • wire up archiver to save local assets to tmp folder
  • set up UTF-8 and Turndown transformers
  • wire up transformer to merge metadata and write to output
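
The last item, merging metadata and writing the output, could look something like the sketch below. It is only an illustration: the README does not specify an output format, so the YAML-style front matter, the `toFrontMatter` and `writeArticle` helpers, and the sample data are all assumptions. It uses only Node's built-in modules.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: renders flat metadata as a YAML-style front matter block.
function toFrontMatter(meta) {
  const lines = Object.entries(meta).map(
    // JSON.stringify quotes and escapes each value, so colons or quotes in a title stay safe.
    ([key, value]) => `${key}: ${JSON.stringify(String(value))}`
  );
  return ["---", ...lines, "---"].join("\n");
}

// Hypothetical helper: prepends the metadata to the Markdown body and writes the file.
function writeArticle(outputDir, slug, meta, markdownBody) {
  fs.mkdirSync(outputDir, { recursive: true });
  const filePath = path.join(outputDir, `${slug}.md`);
  const contents = `${toFrontMatter(meta)}\n\n${markdownBody.trim()}\n`;
  fs.writeFileSync(filePath, contents, "utf8");
  return filePath;
}

// Example with made-up data, written to a throwaway temp directory.
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), "article-archiver-"));
const saved = writeArticle(
  outDir,
  "example-post",
  { title: "Example: a post", source: "https://example.com/post" },
  "# Example\n\nBody text.\n"
);
console.log(fs.readFileSync(saved, "utf8"));
```

Keeping the metadata in a front matter block means the article body stays plain Markdown while the source URL and title travel with the file.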