Releases: ArchiveBox/ArchiveBox

v0.7.1: Minor new features, bugfixes, and new dependency versions

04 May 05:53

Get this release via pip, docker, brew, or dpkg (apt ppa update delayed).

# Get it with Pip on any OS (`amd64`, `arm64`, `arm/v7`)
pip install --upgrade 'archivebox==0.7.1'
# Get it with Docker on any OS (`amd64`, `arm64`, `arm/v7`)
docker pull archivebox/archivebox:0.7.1
# Get it with brew on macOS (`amd64`, `arm64`)
brew tap archivebox/archivebox
brew install archivebox
# Get it with apt on Ubuntu/Debian based systems (`any`)
wget 'https://github.com/ArchiveBox/debian-archivebox/raw/main/archivebox-0.7.1.deb'
apt install ./archivebox-0.7.1.deb
# OR
dpkg -i ./archivebox-0.7.1.deb

Note: unlike 0.6.2, this release is not packaged using "proper" Debian techniques; it is just a wrapper that runs pip install archivebox with a few extras. This is because ArchiveBox relies on some binary and dynamic dependencies (node, chrome, playwright, ffmpeg, yt-dlp, etc.) which aren't allowed in Debian packages.

(Launchpad apt PPA update coming eventually; packaging for apt has gotten harder lately.)


# Then run this to upgrade an existing collection data dir to 0.7.1
cd ~/path/to/data/dir
archivebox init

What's Changed

Lots of bugfixes, speedups, and small convenience features.


Full Changelog: v0.6.2...v0.7.1

v0.6.2: >10x performance gain, new Admin UI & CLI features, and more

10 Apr 12:24

New features

  • new ArchiveResult log in the admin web UI, with full editing ability of individual extractor outputs + list of outputs under each Snapshot admin entry
  • ability to save multiple snapshots of the same URL over time using new Re-snapshot button
  • add init --quick and server --quick-init options to quickly update the db version without doing a full re-init (for users with large archive collections this will make version upgrades a lot faster / less painful)
  • add new archivebox setup command and archivebox init --setup flag to aid in automatically installing dependencies and creating a superuser during initial setup
  • new SNAPSHOTS_PER_PAGE=40 and MEDIA_MAX_SIZE=750m config options
  • allow hotlinking directly to specific extractor output on the snapshot detail page using URL #hash e.g. /archive/<timestamp>/index.html#git
  • add ability to view the snapshot matching a given URL by visiting /archive/https://example.com/some/url -> redirects to -> /archive/<timestamp>/index.html (also works without scheme /archive/example.com)
  • #660 add ability to tag URLs while adding them via the web UI and via the CLI using archivebox add --tag=tag1,tag2,tag3 ...
  • #659 add back ability to override visual styling with custom HTML and CSS using new config option CUSTOM_TEMPLATES_DIR
  • ability to add and remove multiple tags at once from the snapshot admin using autocompleting dropdown
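
The new CLI options above can be combined into a short upgrade-and-tag routine. The following is a minimal sketch, not an official script: the `run` and `upgrade_and_tag` helpers are hypothetical names, `https://example.com` is a placeholder URL, and applying the new options through `archivebox config --set` is an assumption about how you manage config. By default it only prints the commands (a dry run); set `DRY_RUN=0` to execute them from inside a data directory.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Hypothetical routine exercising the options listed above.
upgrade_and_tag() {
    # Update the db version without doing a full re-init.
    run archivebox init --quick
    # Apply the new config options (values taken from the list above).
    run archivebox config --set SNAPSHOTS_PER_PAGE=40
    run archivebox config --set MEDIA_MAX_SIZE=750m
    # Tag a URL while adding it (placeholder URL).
    run archivebox add --tag=tag1,tag2,tag3 'https://example.com'
}
```

Calling `upgrade_and_tag` with no overrides prints the four commands for review; `DRY_RUN=0 upgrade_and_tag` runs them.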

Enhancements

  • lots of performance improvements! (in testing with 100k entries, the main index was brought down from 10-14 second load times to ~110ms once cache warms up)
  • full text search now works on the public snapshot list
  • dates and times are now localized to your browser's timezone instead of showing in UTC
  • integrity and correctness improvements to readability, mercury, warc, and other extractors
  • video subtitles and description are now added to the full-text search index as well (including youtube's autogenerated transcripts in all languages)
  • log all errors with full tracebacks to new data/logs/errors.log file (so users no longer have to run in --debug mode to see error details)
  • better archivebox schedule logging and changed logfile location to ./logs/schedule.log
  • better docker-compose setup experience with sonic config example in docker-compose.yml
  • add Django Debug Toolbar + djdt_flamegraph for developers to profile UI performance
  • add --overwrite flag support to archivebox schedule, archived urls get added similarly to add --overwrite
  • #644 remove bootstrap and jquery network requests to CDNs by inlining them instead
  • #647 allow filtering by ArchiveResult status in the Snapshot admin UI to select only links that have been archived or not archived
  • #550 kill all orphan child processes after each extractor finishes to prevent dangling chromium/node subprocesses and memory leaks
  • 3276434 add new SEARCH_BACKEND_TIMEOUT config option to tune amount of time search backend can take before it gives up
  • more diagnostic info added to the Snapshot admin view including most recent status code, content type, detected server, etc
  • make the order of the table columns, layout, and spacing the same on the public view and private view (also remove DataTable, we're not using it)
  • better snapshot grid page (faster load times, nicer CSS for tags and cards, more actions supported and metadata shown)
  • added Cache-Control headers to dramatically speed up load times by caching favicons, screenshots, etc. in browsers/upstreams
  • new project releases page https://releases.archivebox.io and demo url https://demo.archivebox.io
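
One way to check the Cache-Control change listed above is to inspect the response headers of a static asset. This is only a sketch under stated assumptions: `cache_control_of` is a hypothetical helper, and the host and asset path in the usage comment are placeholders rather than documented ArchiveBox endpoints. It reads raw HTTP headers on stdin and prints the Cache-Control value if one is present.

```shell
#!/bin/sh
# Hypothetical helper: read raw HTTP response headers on stdin and print
# the value of the first Cache-Control header (matched case-insensitively).
cache_control_of() {
    tr -d '\r' | awk '
        tolower(substr($0, 1, 14)) == "cache-control:" {
            sub(/^[^:]*:[ \t]*/, "")   # drop the header name and leading spaces
            print
            exit
        }'
}

# Usage (placeholder host and path; substitute a real snapshot asset):
#   curl -sI 'http://localhost:8000/archive/<timestamp>/favicon.ico' | cache_control_of
```

An empty result means the response carried no Cache-Control header for that asset.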

Bugfixes

  • #673 fix searching by URL substring in Snapshot admin list
  • #658 fix Snapshot admin action buttons not working in Safari and some other browsers
  • #678 fix AssertionError when archivebox would attempt to archive with CHROME_BINARY=None when Chrome was not found on the host system
  • #654 fix some issues with sonic attempting to index massive text blobs or binary blobs on some pages and hanging
  • #674 fix UTF-8 encoding problems with file reading/writing on Windows (supporting a Python pkg on Windows is unreasonably painful y'all)
  • #433 fix deleted items sometimes reappearing on next import/update
  • #473 fix issue preventing use of archivebox python API inside raw REPL (not using archivebox shell)
  • fix stdin/stdout/stderr handling for some edge cases in Docker/Docker-Compose


v0.5.6: Bugfixes and packaging improvements

09 Feb 14:25
9766ea2
  • add ARMv7 and ARMv8 CPU support for apt / deb distribution on Launchpad PPA
  • fix nodesource apt repo not supported on i386 b90afc8
  • fix handling of skipped ArchiveResult entries with null output 0aea5ed
  • catch exception on import of old index.json into ArchiveResult 171bbeb
  • move debsign to release not build 66fb5b2
  • skip tests during debian build a32eac3
  • fix emptystrings in cmd_version causing exception a49884a
  • automate deb dist better and bump version 0e6ac39
  • fix assertion 6705354
  • change wording of db not found error 683a087

v0.5.4: New Snapshot detail UI, lots of bugfixes, speed improvements, and limit media downloads to 750mb by default

01 Feb 08:11

Thank you contributors who helped with the 181 commits in this release!
@cdvv7788, @jdcaballerov, @thedanbob, @aggroskater, @mAAdhaTTah, @mario-campos, @mikaelf

  • fix migration failing due to null cmd_versions in older archives a3008c8
  • Publish minor & major versions to DockerHub and set up CodeQL codeql-analysis.yml c5b7d9f, bbb6cc8
  • fix DATABASE_NAME posixpath, and dependencies dict bug 02bdb3b, 5c7842f
  • use relative imports for .util to fix windows import clash 72e2c7b
  • fix COOKIES_FILE config param breaking in wget ef7711f
  • Refactor should_save_extractor methods to accept overwrite parameter 5420903
  • Fix issue #617 by using mark_safe in combination with format_html … 1989275
  • make permission chowning on docker start less fancy, respect PUID/PGID #635
  • add createsuperuser flag to server command 39ec77e
  • fix file icon styling and use the db exclusively for rendering them, instead of the filesystem f004058, 7d8fe66, 5c54bcc, 534ead2
  • limit youtubedl download size to 750m and stop splitting out audio files 3227f54
  • also search url, timestamp, tags on public index 8a4edb4
  • fix trailing slash problems and wget not detecting download path 9764a8e
  • add response status code to headers.json c089501
  • fix singlefile path used for sonic 24e2493
  • cleanup template layout in filesystem, new snapshot detail page UI


v0.5.3: New grid UI, full-text search, oneshot subcommand, Pocket API and Wallabag importers, bugfixes, and packaging improvements

06 Jan 19:46

v0.4.24: Packaging improvements, UI improvements, and bugfixes

03 Dec 16:57
b186e98

Last stable version for the v0.4 branch. It contains numerous final fixes and improvements to v0.4 before the leap to v0.5.

v0.4.21: Better Node dependency version checking and sdist PATH fixes

18 Aug 23:44

v0.4.17: Bugfixes and CLI experience improvements

18 Aug 13:50
  • Fix bugs with parsing long URLs as paths
  • html-encoded URLs
  • new generic HTML parser
  • new --init and --overwrite flags on add
  • improve stdout and hints
  • fix Pull title button
  • other small bugfixes

v0.4.16: Fix issue with readability archiving timing out

18 Aug 06:15

A minor bugfix release for the Readability archive method, so that a timeout no longer kills the whole archiving process.

v0.4.15: Add support for scheduled archiving in docker

18 Aug 06:03
  • fix a bug where invalid URLs were parsed and imported, causing the whole archive process to crash
  • add support for scheduled archiving in docker
docker run -v $PWD:/data archivebox schedule --foreground --every=day --depth=1 'https://getpocket.com/users/USERNAME/feed/all'
# docker-compose.yml

version: '3.7'

services:
  archivebox:
    image: nikisweeting/archivebox:latest
    command: schedule --foreground --every=day --depth=1 'https://getpocket.com/users/USERNAME/feed/all'
    environment:
      - USE_COLOR=True
      - SHOW_PROGRESS=False
    volumes:
      - ./data:/data