Danny Lin edited this page Jan 20, 2024 · 28 revisions

Introduction to WebScrapBook

WebScrapBook is a browser extension that captures web pages faithfully, in various archive formats and with customizable configurations, for future retrieval, organization, annotation, and editing. This project inherits from the legacy Firefox add-on ScrapBook X.

Features

  1. Capture faithfully: A web page shown in the browser can be captured without losing any subtle detail. Metadata such as source URL and timestamp are also recorded.
  2. Customizable capture: WebScrapBook can save a selected area of a page, save the source page (before it is processed by scripts), or save a page as a bookmark. How images, audio, video, fonts, frames, styles, scripts, etc. are captured is also customizable. A web page can be saved as a folder, a ZIP-based archive file (HTZ or MAFF), or a single HTML file.
  3. Page editing: A web page can be highlighted, annotated, or edited before or after a capture.
  4. Organizable collections: Captured pages can be organized in the browser sidebar using one or more scrapbooks, and each scrapbook holds a hierarchical tree structure to organize data items. Notes in HTML or Markdown format can also be created and managed. (*)
  5. Fulltext searching: Each scrapbook can be further indexed for a feature-rich search (by title, full text, comment, source URL, create time, modify time, etc.). (*)
  6. Remote access: Captured data can be hosted on a central backend server and read or edited from other devices. Alternatively, a scrapbook can generate a static site index and be distributed as a static web site. (*)
  7. Mobile support: WebScrapBook supports mobile browsers such as Firefox for Android and Kiwi Browser. You can capture and edit web pages from a mobile phone or tablet.
  8. Legacy ScrapBook support: Scrapbooks created from legacy ScrapBook or ScrapBook X can be converted into WebScrapBook-compliant format for usage. (*)
  • (*) Some or all functionality of a starred feature above requires a collaborating backend server to be running, which can be easily set up using PyWebScrapBook.
  • An HTZ or MAFF archive file can be viewed using the built-in archive page viewer, using PyWebScrapBook or other assistant tools, or by opening the index page after unzipping.
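
The "open the index page after unzipping" route can be sketched in Python. This is a minimal sketch, not WebScrapBook's own viewer logic. It assumes that an HTZ keeps `index.html` at the archive root and that a MAFF keeps an `index.*` page inside each top-level folder; the function name `find_index_pages` and the accepted suffix list are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed set of page suffixes; adjust if an archive uses another index type.
PAGE_SUFFIXES = {".html", ".htm", ".xhtml"}


def find_index_pages(archive_path):
    """Return the member names in a ZIP-based archive that look like index pages.

    Assumptions (not taken from the WebScrapBook source):
    - HTZ: the entry page is index.html at the archive root.
    - MAFF: each top-level folder holds one page named index.<suffix>.
    """
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()

    pages = []
    for name in names:
        if name.endswith("/"):
            continue  # skip directory entries
        path = PurePosixPath(name)
        depth = len(path.parts)
        if depth == 1 and path.name == "index.html":
            pages.append(name)  # HTZ-style root index
        elif depth == 2 and path.stem == "index" and path.suffix in PAGE_SUFFIXES:
            pages.append(name)  # MAFF-style per-folder index
    return pages
```

After extracting the archive with `zipfile.ZipFile(...).extractall(target_dir)`, any returned member can be opened in a browser from `target_dir`.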

Installation

WebScrapBook is available for Chromium-based browsers (Google Chrome, Edge, Opera, Vivaldi, Brave, etc.) and Firefox-based browsers (Firefox for Desktop or Android, Tor Browser, etc.). Go to the extension store of the corresponding browser and install this extension.

Known mobile browsers that support installation of the extension:

  • Android: Firefox for Android, Kiwi Browser
  • iOS: none

Installation from source code

You can also install this extension from source code, as long as the browser supports it.

Generate an installation package from the source code

  1. Download the latest source code from this repository and unpack.
  2. Run build/pack.cmd (on Windows) or build/pack.sh (on POSIX/Linux).
  3. This generates dist/WebScrapBook.zip (for Chromium) and dist/WebScrapBook.xpi (for Firefox) for installation.
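
As a rough check on the steps above, the following Python sketch picks the pack script for the current platform and reports which of the two packages are still missing. The script and package paths come from the steps above. The helper names `pack_script` and `missing_packages` are hypothetical and not part of the repository, and the sketch does not run the build itself.

```python
import os
from pathlib import Path

# Package paths as listed in step 3, keyed by target browser family.
PACKAGES = {
    "Chromium": "dist/WebScrapBook.zip",
    "Firefox": "dist/WebScrapBook.xpi",
}


def pack_script(os_name=os.name):
    """Return the pack script from step 2 for the given os.name value."""
    return "build/pack.cmd" if os_name == "nt" else "build/pack.sh"


def missing_packages(repo_dir):
    """Return the browser families whose package file is absent under repo_dir."""
    repo = Path(repo_dir)
    return [
        browser
        for browser, rel_path in PACKAGES.items()
        if not (repo / rel_path).is_file()
    ]
```

An empty list from `missing_packages(".")`, called from the unpacked source folder after the pack script has finished, suggests that both packages were generated.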

Install in a Chromium-based browser

  1. Make sure the browser supports installation from a package.
    • Known supported: Chrome, Edge, Brave, Kiwi
  2. Go to the extension management page.
  3. Enable Developer mode.
  4. On the extension management page, load the .zip package file generated in the section above (through a button such as Load Package, or by dragging and dropping the file onto the page).

Install in a Firefox-based browser

  1. Make sure the browser supports installation of an unsigned add-on. (Consult this document for more details.)
    • Known NOT supported: Firefox Release, Firefox Beta
    • Known supported: Firefox ESR, Firefox Nightly, Firefox Developer Edition, Waterfox, Tor Browser
  2. Set the config xpinstall.signatures.required to false. (Type about:config in the address bar to enter the config page.)
  3. Load the .xpi package file generated in the section above in a tab (through the Install Add-on From File command on the add-on management page, through File > Open File from the main menu, or by dragging and dropping the file onto the tab list).