Releases: ArchiveBox/ArchiveBox
v0.7.1: Minor new features, bugfixes, and new dependency versions
Get this release via pip, docker, brew, or dpkg (apt ppa update delayed).
```bash
# Get it with Pip on any OS (`amd64`, `arm64`, `arm/v7`)
pip install --upgrade 'archivebox==0.7.1'

# Get it with Docker on any OS (`amd64`, `arm64`, `arm/v7`)
docker pull archivebox/archivebox:0.7.1

# Get it with brew on macOS (`amd64`, `arm64`)
brew tap archivebox/archivebox
brew install archivebox

# Get it with apt on Ubuntu/Debian based systems (`any`)
wget 'https://github.com/ArchiveBox/debian-archivebox/raw/main/archivebox-0.7.1.deb'
apt install ./archivebox-0.7.1.deb
# OR
dpkg -i ./archivebox-0.7.1.deb
```

Note: this is not packaged using "proper" Debian techniques like 0.6.2 was. Instead, it's just a wrapper that executes `pip install archivebox` with a few extras. This is because ArchiveBox relies on some binary and dynamic dependencies (node, chrome, playwright, ffmpeg, yt-dlp, etc.) that aren't allowed in Debian packages.
(The Launchpad apt PPA update is coming eventually; packaging for apt has gotten harder lately.)
```bash
# Then run this to upgrade an existing collection data dir to 0.7.1
cd ~/path/to/data/dir
archivebox init
```

What's Changed
Lots of bugfixes, speedups, and small convenience features.
- fix bookmarklet script by @dryrain39 in #708
- point to master image, not latest by @FiddlyRumpus in #739
- Docs: Improve spelling on readme by @Namdrib in #766
- Exempt /add route from CSRF by @tjhorner in #777
- Bump ws from 5.2.2 to 5.2.3 by @dependabot in #784
- Discard Referer header from iframe and link to original URL by @Inndy in #799
- Update setup.sh in #804
- Fix Pinboard RSS parsing valid links as `None` by @overhacked in #822
- healthcheck endpoint by @ajgon in #873
- Update README.md by @adamwolf in #884
- Fixes Add button behavior on Safari by @adamwolf in #886
- Tweak JS so Safari can choose admin actions by @adamwolf in #885
- Avoid KeyError on Pocket API parser by @bltavares in #843
- (#847) Decode error output hints to string if needed by @TheCakeIsNaOH in #904
- Change logfile open to write mode only by @tuupola in #906
- Fix #725 - correctly parse tags on json import by @hannah98 in #908
- Bump ansi-regex from 5.0.0 to 5.0.1 by @dependabot in #910
- Bump jszip from 3.6.0 to 3.7.1 by @dependabot in #909
- Added TAG_SEPARATOR_PATTERN option for splitting tags by @hannah98 in #911
- Fix typo: volumes section in docker-compose.yml should use array notation by @akhilleusuggo in #918
- Fix broken URI fragment in README.md by @xfq in #942
- Fix typo in README.md by @hyfen in #932
- Fix bin_version: set LANG=C when calling executables to avoid parsing localized output by @pellaeon in #936
- Fix arch installation command by @CrazyPython in #923
- Update pywb entrypoint by @kusold in #961
- Fix missing input redirection in a hint text by @rossvor in #967
- improve title extractor by @prnake in #924
- Bump node-fetch from 2.6.1 to 2.6.7 by @dependabot in #969
- Add PikaPods as commercial hosting option by @m3nu in #974
- Attempted to warn on #984 and #1014 by @turian in #1020
- Method typo? by @EsEnZeT in #1048
- Added standalone dockerfile instructions by @turian in #1023
- Add missing migration 0021 by @turian in #1027
- get setup.sh to run on FreeBSD again (13.x) by @mwestza in #1068
- Warn on broken steps, use yt-dlp to avoid youtube-dl errors, and don't crash on bad UTF-8 by @turian in #1026
- Add SINGLEFILE_ARGS to control single-file arguments by @notevenaperson in #1021
- Support for Reverse Proxy authentication backends (like authelia) by @ajgon in #866
- Bump moment from 2.29.3 to 2.29.4 by @dependabot in #1081
- Install the CodeSee workflow. by @codesee-maps in #1103
- Revert "Install the CodeSee workflow." by @pirate in #1104
- add systemd config by @fa0311 in #1115
- add CHROME_TIMEOUT args by @fa0311 in #1120
- add explicitly specify --headless=new by @fa0311 in #1123
- Add missing closing quote to style attribute by @tejr in #1128
- Fix for Issue #1008 by @dcalano in #1131
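For Docker Compose users, the new options from #911 (`TAG_SEPARATOR_PATTERN`) and #1120 (`CHROME_TIMEOUT`) can presumably be set as environment variables, like other ArchiveBox config options. The fragment below is a minimal sketch, not an official example: the `[,;]` pattern and the `60` timeout are illustrative values chosen here, and `CHROME_TIMEOUT` is assumed to be measured in seconds.

```yaml
# docker-compose.yml (sketch): option values below are illustrative assumptions
services:
    archivebox:
        image: archivebox/archivebox:0.7.1
        environment:
            # regex used to split a tag string into separate tags (#911);
            # this example pattern splits on commas or semicolons
            - TAG_SEPARATOR_PATTERN=[,;]
            # timeout passed along to Chrome (#1120); unit assumed to be seconds
            - CHROME_TIMEOUT=60
        volumes:
            - ./data:/data
```

Using a regex rather than a fixed separator character lets one setting accept several delimiters at once.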
New Contributors
- @dryrain39 made their first contribution in #708
- @FiddlyRumpus made their first contribution in #739
- @Namdrib made their first contribution in #766
- @tjhorner made their first contribution in #777
- @Inndy made their first contribution in #799
- @ajgon made their first contribution in #873
- @TheCakeIsNaOH made their first contribution in #904
- @tuupola made their first contribution in #906
- @akhilleusuggo made their first contribution in #918
- @xfq made their first contribution in #942
- @hyfen made their first contribution in #932
- @pellaeon made their first contribution in #936
- @CrazyPython made their first contribution in #923
- @kusold made their first contribution in #961
- @rossvor made their first contribution in #967
- @prnake made their first contribution in #924
- @m3nu made their first contribution in #974
- @turian made their first contribution in #1020
- @EsEnZeT made their first contribution in #1048
- @mwestza made their first contribution in #1068
- @notevenaperson made their first contribution in #1021
- @codesee-maps made their first contribution in #1103
- @fa0311 made their first contribution in #1115
- @tejr made their first contribution in #1128
- @dcalano made their first contribution in #1131
Full Changelog: v0.6.2...v0.7.1
v0.6.2: >10x performance gain, new Admin UI & CLI features, and more
New features
- new ArchiveResult log in the admin web UI, with full editing ability of individual extractor outputs + list of outputs under each Snapshot admin entry
- ability to save multiple snapshots of the same URL over time using the new `Re-snapshot` button
- add `init --quick` and `server --quick-init` options to quickly update the db version without doing a full re-init (for users with large archive collections this will make version upgrades a lot faster / less painful)
- add new `archivebox setup` command and `archivebox init --setup` flag to aid in automatically installing dependencies and creating a superuser during initial setup
- new `SNAPSHOTS_PER_PAGE=40` and `MEDIA_MAX_SIZE=750m` config options
- allow hotlinking directly to specific extractor output on the snapshot detail page using a URL `#hash`, e.g. `/archive/<timestamp>/index.html#git`
- add ability to view the snapshot matching a given URL by visiting `/archive/https://example.com/some/url` -> redirects to -> `/archive/<timestamp>/index.html` (also works without the scheme: `/archive/example.com`)
- #660 add ability to tag URLs while adding them via the web UI and via the CLI using `archivebox add --tag=tag1,tag2,tag3 ...`
- #659 add back ability to override visual styling with custom HTML and CSS using new config option `CUSTOM_TEMPLATES_DIR`
- ability to add and remove multiple tags at once from the snapshot admin using an autocompleting dropdown
Enhancements
- lots of performance improvements! (in testing with 100k entries, the main index was brought down from 10-14 second load times to ~110ms once cache warms up)
- full text search now works on the public snapshot list
- dates and times are now localized to your browser's timezone instead of showing in UTC
- integrity and correctness improvements to readability, mercury, warc, and other extractors
- video subtitles and description are now added to the full-text search index as well (including youtube's autogenerated transcripts in all languages)
- log all errors with full tracebacks to new `data/logs/errors.log` file (so users no longer have to run in `--debug` mode to see error details)
- better `archivebox schedule` logging and changed logfile location to `./logs/schedule.log`
- better docker-compose setup experience with sonic config example in `docker-compose.yml`
- add Django Debug Toolbar + `djdt_flamegraph` for developers to profile UI performance
- add `--overwrite` flag support to `archivebox schedule`; archived URLs get added similarly to `add --overwrite`
- #644 remove Bootstrap and jQuery network requests to CDNs by inlining them instead
- #647 allow filtering by ArchiveResult status in the Snapshot admin UI to select only links that have been archived or not archived
- #550 kill all orphan child processes after each extractor finishes to prevent dangling chromium/node subprocesses and memory leaks
- 3276434 add new `SEARCH_BACKEND_TIMEOUT` config option to tune amount of time search backend can take before it gives up
- more diagnostic info added to the Snapshot admin view including most recent status code, content type, detected server, etc
- make the order of the table columns, layout, and spacing the same on the public view and private view (also remove DataTable, we're not using it)
- better snapshot grid page (faster load times, nicer CSS for tags and cards, more actions supported and metadata shown)
- added `Cache-Control` headers to dramatically speed up load times by caching favicons, screenshots, etc. in browsers/upstreams
- new project releases page https://releases.archivebox.io and demo url https://demo.archivebox.io
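The new config options in this release can presumably also be supplied through the environment in a Docker Compose setup. The fragment below is a minimal sketch under that assumption: `SNAPSHOTS_PER_PAGE=40` and `MEDIA_MAX_SIZE=750m` are the values quoted in the feature list, while the `archivebox/archivebox:0.6.2` image tag, the `SEARCH_BACKEND_TIMEOUT` value, and the `CUSTOM_TEMPLATES_DIR` path are hypothetical examples.

```yaml
# docker-compose.yml (sketch): image tag, SEARCH_BACKEND_TIMEOUT value,
# and CUSTOM_TEMPLATES_DIR path are hypothetical
services:
    archivebox:
        image: archivebox/archivebox:0.6.2
        environment:
            # number of snapshots listed per page in the web UI
            - SNAPSHOTS_PER_PAGE=40
            # size cap for media downloads
            - MEDIA_MAX_SIZE=750m
            # how long the search backend may take before giving up (illustrative value)
            - SEARCH_BACKEND_TIMEOUT=90
            # directory of custom HTML/CSS templates that override default styling (#659)
            - CUSTOM_TEMPLATES_DIR=/data/templates
        volumes:
            - ./data:/data
```

Placing `CUSTOM_TEMPLATES_DIR` under `/data` keeps the templates inside the mounted volume, so they survive container rebuilds.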
Bugfixes
- #673 fix searching by URL substring in Snapshot admin list
- #658 fix Snapshot admin action buttons not working in Safari and some other browsers
- #678 fix `AssertionError` raised when ArchiveBox attempted to archive with `CHROME_BINARY=None` because Chrome was not found on the host system
- #654 fix some issues with sonic attempting to index massive text blobs or binary blobs on some pages and hanging
- #674 fix UTF-8 encoding problems with file reading/writing on Windows (supporting a Python pkg on Windows is unreasonably painful y'all)
- #433 fix deleted items sometimes reappearing on next import/update
- #473 fix issue preventing use of archivebox python API inside raw REPL (not using archivebox shell)
- fix stdin/stdout/stderr handling for some edge cases in Docker/Docker-Compose
v0.5.6: Bugfixes and packaging improvements
- add ARMv7 and ARMv8 CPU support for `apt`/`deb` distribution on Launchpad PPA
- fix nodesource apt repo not supported on i386 b90afc8
- fix handling of skipped ArchiveResult entries with null output 0aea5ed
- catch exception on import of old index.json into ArchiveResult 171bbeb
- move debsign to release not build 66fb5b2
- skip tests during debian build a32eac3
- fix emptystrings in cmd_version causing exception a49884a
- automate deb dist better and bump version 0e6ac39
- fix assertion 6705354
- change wording of db not found error 683a087
v0.5.4: New Snapshot detail UI, lots of bugfixes, speed improvements, and limit media downloads to 750mb by default
Thank you contributors who helped with the 181 commits in this release!
@cdvv7788, @jdcaballerov, @thedanbob, @aggroskater, @mAAdhaTTah, @mario-campos, @mikaelf
- fix migration failing due to null cmd_versions in older archives a3008c8
- Publish, minor, & major version to DockerHub and add set up CodeQL codeql-analysis.yml c5b7d9f, bbb6cc8
- fix DATABASE_NAME posixpath, and dependencies dict bug 02bdb3b, 5c7842f
- use relative imports for `.util` to fix windows import clash 72e2c7b
- fix `COOKIES_FILE` config param breaking in wget ef7711f
- Refactor `should_save_extractor` methods to accept `overwrite` parameter 5420903
- Fix issue #617 by using mark_safe in combination with format_html … 1989275
- make permission chowning on docker start less fancy, respect PUID/PGID #635
- add createsuperuser flag to server command 39ec77e
- fix files icons styling and use the db exclusively for rendering them, instead of filesystem f004058, 7d8fe66, 5c54bcc, 534ead2
- limit youtubedl download size to 750m and stop splitting out audio files 3227f54
- also search url, timestamp, tags on public index 8a4edb4
- fix trailing slash problems and wget not detecting download path 9764a8e
- add response status code to headers.json c089501
- fix singlefile path used for sonic 24e2493
- cleanup template layout in filesystem, new snapshot detail page UI
v0.5.3: New grid UI, full-text search, oneshot subcommand, Pocket API and Wallabag importers, bugfixes, and packaging improvements
- ArchiveResult moved to SQLite3 DB for performance @cdvv7788
- lots of assorted bugfixes and improvements courtesy of @cdvv7788 and @jdcaballerov
- new full-text search support with ripgrep and sonic courtesy of @jdcaballerov
- new `archivebox oneshot` command for downloading a single site without starting a whole collection
- new Pocket API importer courtesy of @mAAdhaTTah
- new Wallabag importer courtesy of @ehainry
- new extractor options on Add page courtesy of @BlipRanger
- new apt/deb/homebrew/pip packaging setup into separate repos under new Github Org https://github.com/ArchiveBox
- new official PPA and Docker Hub accounts https://hub.docker.com/r/archivebox/archivebox (with automatic armv7 builds courtesy of @chrismeller)
- new Snapshot grid view courtesy of @jdcaballerov

v0.4.24: Packaging improvements, UI improvements, and bugfixes
Last stable version for the v0.4 branch; contains numerous last fixes and improvements to v0.4 before the leap to v0.5.
v0.4.21: Better Node dependency version checking and sdist PATH fixes
v0.4.17: Bugfixes and CLI experience improvements
- Fix bugs with parsing long URLs as paths
- html-encoded URLs
- new generic HTML parser
- new `--init` and `--overwrite` flags on `add`
- improve stdout and hints
- fix Pull title button
- other small bugfixes
v0.4.16: Fix issue with readability archiving timing out
A minor bugfix release for the Readability archive method to prevent a Readability timeout from killing the whole archiving process.
v0.4.15: Add support for scheduled archiving in docker
- fix a bug where invalid URLs were parsed and imported, causing the whole archive process to crash
- add support for scheduled archiving in docker
```bash
docker run -v $PWD:/data archivebox schedule --foreground --every=day --depth=1 'https://getpocket.com/users/USERNAME/feed/all'
```

```yaml
# docker-compose.yml
version: '3.7'

services:
    archivebox:
        image: nikisweeting/archivebox:latest
        command: schedule --foreground --every=day --depth=1 'https://getpocket.com/users/USERNAME/feed/all'
        environment:
            - USE_COLOR=True
            - SHOW_PROGRESS=False
        volumes:
            - ./data:/data
```

