Freeze-dry: web page conservation
Freeze-dry stores a web page as it is shown in the browser. It takes the DOM and returns it as an
HTML string, after having inlined external resources such as images and stylesheets (as data: URLs).
It also ensures the snapshot is static and completely offline: all scripts are removed, and any attempt at internet connectivity is blocked by adding a content security policy. The resulting HTML document is a static, self-contained snapshot of the page.
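The two mechanisms described above, inlining subresources and blocking network access, can be sketched as follows. This is a minimal illustration, not freeze-dry's source: toDataUrl and offlineCspMetaTag are hypothetical names, the fetched subresource is assumed to be available as raw bytes with a known MIME type, and the policy string is an assumed example rather than the one freeze-dry actually inserts.

```javascript
// Hypothetical helper (not freeze-dry's API): encode fetched bytes as a data: URL.
function toDataUrl(bytes, mimeType) {
  // btoa expects a "binary string" with one character per byte.
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// Hypothetical helper: a Content-Security-Policy <meta> tag that keeps a
// snapshot offline. The exact policy is an assumption for illustration.
function offlineCspMetaTag() {
  const policy = [
    "default-src 'none'",           // block everything not listed below (scripts, fetch/XHR, frames)
    'img-src data:',                // inlined images
    'media-src data:',              // inlined audio/video
    'font-src data:',               // inlined fonts
    "style-src data: 'unsafe-inline'", // inlined stylesheets and style attributes
  ].join('; ');
  return `<meta http-equiv="Content-Security-Policy" content="${policy}">`;
}
```

With default-src 'none', any directive not listed (script-src, connect-src, and so on) falls back to allowing nothing, so only resources already embedded as data: URLs or inline styles can load.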
For more details about how exactly this works, see src/Readme.md.
const html = await freezeDry(document, options)
The options argument is optional, and even the document argument can be omitted, in which case it
defaults to window.document. Possible options are:
timeout (number): Maximum time (in milliseconds) spent on fetching the page's subresources. The resulting HTML will only have successfully fetched subresources inlined.
docUrl (string): Overrides the document's URL. This influences the expansion of relative URLs, and is useful when the document was constructed dynamically (e.g. using DOMParser).
addMetadata (boolean): If true (the default), a <meta> and a <link> tag will be added to the returned HTML, noting the document's URL and the time of snapshotting (that is, the current time).
The metadata mimics the HTTP headers defined for the Memento protocol. The added tags look like this:
<meta http-equiv="Memento-Datetime" content="Sat, 18 Aug 2018 18:02:20 GMT">
<link rel="original" href="https://example.com/main/page.html">
keepOriginalAttributes (boolean): If true (the default), preserves the original value of an element attribute whose URLs are inlined, by recording it in a new data-original-... attribute. For example, <img src="bg.png"> would become <img src="data:..." data-original-src="bg.png">. Note that this is an unstandardised workaround to keep the URLs of subresources available; unfortunately, URLs inside stylesheets are still lost.
now (Date): Overrides the snapshot time (only relevant when addMetadata is true). Mainly intended for testing purposes.
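The effect of the addMetadata, now, keepOriginalAttributes and docUrl options can be illustrated with a few small helpers. This is a minimal sketch, not freeze-dry's implementation: the names escapeAttr, mementoTags, inlineAttribute and resolveSubresourceUrl are hypothetical, and attributes are modelled as a plain object instead of a DOM element so the sketch runs outside a browser.

```javascript
// Hypothetical helpers illustrating the documented options; not freeze-dry's source.

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// addMetadata / now: build Memento-style tags. Date#toUTCString() produces
// the HTTP date format used in the example, e.g. "Sat, 18 Aug 2018 18:02:20 GMT".
function mementoTags(docUrl, now = new Date()) {
  return [
    `<meta http-equiv="Memento-Datetime" content="${now.toUTCString()}">`,
    `<link rel="original" href="${escapeAttr(docUrl)}">`,
  ].join('\n');
}

// keepOriginalAttributes: replace an attribute's URL with an inlined data: URL,
// optionally recording the old value as data-original-<name>.
// The plain object stands in for a DOM element's attributes.
function inlineAttribute(attributes, name, dataUrl, keepOriginalAttributes = true) {
  const result = { ...attributes };
  if (keepOriginalAttributes) {
    result[`data-original-${name}`] = attributes[name];
  }
  result[name] = dataUrl;
  return result;
}

// docUrl: relative URLs are expanded against the document URL (or its override).
function resolveSubresourceUrl(relativeUrl, docUrl) {
  return new URL(relativeUrl, docUrl).href;
}
```

For instance, passing a fixed Date for 18 August 2018, 18:02:20 UTC to mementoTags reproduces the two tags of the Memento example, and inlineAttribute({ src: 'bg.png' }, 'src', dataUrl) yields both src and data-original-src, as in the <img> example.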
Note that the resulting string can easily be several megabytes when pages contain images, videos, fonts, etcetera.
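One reason for the size: assuming subresources are inlined as base64 data: URLs, every 3 bytes of binary data become 4 characters of text, roughly a one-third increase over the original files. The hypothetical helpers below, given for illustration only, estimate the encoded length of a resource and measure the final HTML string in UTF-8 bytes.

```javascript
// Base64 emits 4 output characters for every 3 input bytes (rounded up,
// with '=' padding), so inlined binaries grow by about a third.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Size in bytes of an HTML string when encoded as UTF-8 (e.g. when saved to disk).
function utf8ByteLength(html) {
  return new TextEncoder().encode(html).length;
}
```

For example, base64Length(3000000) is 4000000, so a 3 MB video alone adds about 4 MB of text to the snapshot.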