WWW: Archive old aboutcode.org websites #147

@mjherzog

Description

Before we publish www.aboutcode.org as a "production" website (on DreamHost), we need to archive a copy of the current WordPress-based website.
The objective is to download the content as a set of HTML (or similar?) files that we can keep for future reference, likely stored as an archive in this repo. We will not need to operate the website after archiving; we just need to be able to view the page content.

My Gemini-Google search surfaced two primary FOSS options to create this archive:

Wget
Best For: Advanced users who need a powerful command-line tool for precise mirroring.
Features: With the --mirror option, Wget can create a complete local copy of a site’s directory structure. It is versatile, supporting the HTTP, HTTPS, and FTP protocols.

Command Example: wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.com
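As a starting point, the Wget command above could be wrapped in a small script like the sketch below. The target URL, the `aboutcode-archive` output directory, the `--wait=1` delay, and the `DRY_RUN` switch are assumptions added for illustration, not settings from this issue. By default the script only prints the command so it can be reviewed before anything is downloaded.

```shell
#!/bin/sh
# Sketch: mirror a site with Wget, printing the command first by default.
# SITE_URL, ARCHIVE_DIR, and DRY_RUN are illustrative names and defaults.
set -eu

SITE_URL="${SITE_URL:-https://www.aboutcode.org/}"
ARCHIVE_DIR="${ARCHIVE_DIR:-aboutcode-archive}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it

mirror_site() {
  url="$1"
  dest="$2"
  # --mirror            recursive download with timestamping
  # --convert-links     rewrite links so pages work when opened locally
  # --adjust-extension  save HTML pages with an .html suffix
  # --page-requisites   also fetch the CSS, images, and scripts pages need
  # --no-parent         never ascend above the starting URL
  # --wait=1            pause one second between requests (assumed courtesy)
  # --directory-prefix  write everything under the given directory
  set -- wget \
    --mirror \
    --convert-links \
    --adjust-extension \
    --page-requisites \
    --no-parent \
    --wait=1 \
    --directory-prefix="$dest" \
    "$url"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

mirror_site "$SITE_URL" "$ARCHIVE_DIR"
```

Running it with `DRY_RUN=0` performs the actual download; the other two variables can be overridden the same way.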

Some relevant links are:

HTTrack
Best For: Creating a functional, offline mirror of a website with its original link structure intact.
Features: It crawls a site recursively, downloading pages, images, and scripts, and converts absolute links to relative ones for offline browsing. It is available for Windows (WinHTTrack), Linux, and Android.
Note: It is highly effective for static sites but may struggle with modern, JavaScript-heavy dynamic content.
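For comparison with the Wget example, a command-line HTTrack run might look like the sketch below. It follows the basic form from the HTTrack documentation (`-O` for the output path, a `+` filter to stay on the site's domain, `-v` for verbose output). The URL, the `aboutcode-httrack` output directory, and the `DRY_RUN` switch are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: mirror a site with HTTrack, printing the command first by default.
# SITE_URL, SITE_DOMAIN, OUT_DIR, and DRY_RUN are illustrative names and defaults.
set -eu

SITE_URL="${SITE_URL:-https://www.aboutcode.org/}"
SITE_DOMAIN="${SITE_DOMAIN:-aboutcode.org}"
OUT_DIR="${OUT_DIR:-aboutcode-httrack}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = run it

httrack_mirror() {
  url="$1"
  dest="$2"
  domain="$3"
  # -O <dir>        where the mirror and HTTrack's logs are written
  # +*.<domain>/*   filter: follow links on this domain and its subdomains
  # -v              verbose progress output
  set -- httrack "$url" -O "$dest" "+*.${domain}/*" -v
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

httrack_mirror "$SITE_URL" "$OUT_DIR" "$SITE_DOMAIN"
```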

Please start with Wget and see what we can get. :-)
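One rough way to judge what a Wget run produced is to count the saved HTML pages and list the ones that still mention the live host. With `--convert-links`, links to files that were not downloaded are left pointing at the original site, so such references hint at pages that may be missing from the archive. WordPress metadata such as canonical link tags can also match, so treat the list as a prompt for manual review. The directory and host names below are assumptions.

```shell
#!/bin/sh
# Sketch: quick sanity check of a mirrored site.
# ARCHIVE_DIR and LIVE_HOST are illustrative defaults, not values from the issue.
set -eu

ARCHIVE_DIR="${ARCHIVE_DIR:-aboutcode-archive}"
LIVE_HOST="${LIVE_HOST:-www.aboutcode.org}"

# Print the number of .html files under the given directory.
count_html_pages() {
  find "$1" -type f -name '*.html' | wc -l | tr -d ' '
}

# Print the .html files under DIR that still contain the string HOST.
# grep exits non-zero when nothing matches, so "|| true" keeps set -e quiet.
files_referencing_host() {
  find "$1" -type f -name '*.html' -exec grep -lF -- "$2" {} + || true
}

if [ -d "$ARCHIVE_DIR" ]; then
  echo "HTML pages saved: $(count_html_pages "$ARCHIVE_DIR")"
  echo "Pages still mentioning $LIVE_HOST:"
  files_referencing_host "$ARCHIVE_DIR" "$LIVE_HOST"
else
  echo "No archive directory found at $ARCHIVE_DIR"
fi
```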

Labels

website: For issues and PRs for www.aboutcode.org
