Enhancement: Use the same URL layout as Archive.org for viewing ArchiveBox Snapshots https://archive.org/web/<URL>
#1085
Labels
expected: next release
size: medium
status: backlog
Work is planned someday but is not the highest priority at the moment
touches: API/CLI/user interface
touches: docs
type: enhancement
why: functionality
Intended to improve ArchiveBox functionality or features
To visit an archived version of a website (or archive it automatically) on Archive.org, one can just visit
http://web.archive.org/web/https://example.com/
and it will redirect to
http://web.archive.org/web/20230116145642/https://example.com/
(or whatever the most recent snapshot timestamp is).
To really embody the tagline "ArchiveBox is a self-hosted version of archive.org" we should properly support their URL scheme too.
e.g.
https://demo.archivebox.io/web/https://example.com
should redirect to the most recent snapshot
https://demo.archivebox.io/web/20230116145642/https://example.com
- timestamps in unix format 1673919713 or the Archive.org-style 20230116145642 format, and truncated forms 2023, 202301, 20230116
- https://demo.archivebox.io/01ARZ3NDEKTSV4RRFFQ69G5FAV/...
- 2023 matches 202301, 20230116, 20230116145642 automatically, and 01AN4Z07BY matches 01AN4Z07BY79KA1307SR9X4MV3 automatically
Full spec:
https://demo.archivebox.io/web/<SLUG>
where SLUG can be:
- an original URL, with or without scheme, e.g. https://example.com/index.html, example.com/index.html
  ➡️ redirect to most recent snapshot for that URL: https://demo.archivebox.io/web/20230116145642/https://example.com/index.html
- an ArchiveBox snapshot UUID in ulid/spec format 01AN4Z07BY79KA1307SR9X4MV3/index.html or timestamp prefix 01AN4Z07BY/index.html
  ➡️ redirect to that exact snapshot: https://demo.archivebox.io/web/20230116145642/https://example.com/index.html
- an ArchiveBox snapshot timestamp in YYYYMMDDHHMMSS, shortened forms like YYYYMM, or unix timestamp format, e.g. 20230116145642/index.html or 202301161456/index.html, 202301/index.html, 1673919713/index.html
  ➡️ redirect to most recent snapshot matching that prefix: https://demo.archivebox.io/web/20230116145642/https://example.com/index.html
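The slug rules above could be sketched in Python roughly as follows. This is only an illustration, not ArchiveBox's implementation: classify_slug, most_recent_match, the three regexes, and the order in which they are tried are all hypothetical. In particular, it assumes a 10-digit value starting with 19 or 20 should be read as a truncated YYYYMMDDHH timestamp rather than a unix timestamp, which is just one possible way to resolve that clash.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical patterns; the check order below is an assumption.
# ULID or ULID prefix: starts with "01", Crockford base32 (no I, L, O, U), 10 to 26 chars.
ULID_RE = re.compile(r"^01[0-9A-HJKMNP-TV-Z]{8,24}$")
# Archive.org-style timestamp or a truncated form: YYYY, YYYYMM, ..., YYYYMMDDHHMMSS.
DATE_RE = re.compile(r"^(?:19|20)\d{2}(?:\d{2}){0,5}$")
# Unix timestamp in seconds: exactly 10 digits starting with 1.
UNIX_RE = re.compile(r"^1\d{9}$")


def classify_slug(slug: str) -> Tuple[str, str, str]:
    """Return (kind, key, path) for the part of the URL after /web/."""
    # Only the first path segment can be a ULID or timestamp;
    # anything else is treated as an original URL, with or without scheme.
    head, _, rest = slug.partition("/")
    if ULID_RE.match(head):
        return ("ulid", head, rest)
    if DATE_RE.match(head):
        return ("timestamp", head, rest)
    if UNIX_RE.match(head):
        return ("unix", head, rest)
    return ("url", slug, "")


def most_recent_match(prefix: str, keys: Iterable[str]) -> Optional[str]:
    """Pick the newest stored key that starts with the given prefix.

    Full-length YYYYMMDDHHMMSS strings and ULIDs both sort
    lexicographically by time, so max() returns the most recent match.
    """
    matches = [key for key in keys if key.startswith(prefix)]
    return max(matches) if matches else None
```

With these helpers, classify_slug("202301/index.html") yields a timestamp prefix plus the path index.html, and most_recent_match can then resolve that prefix against the stored snapshot timestamps before issuing the redirect.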
Subtasks:
- ulid field + migration to coalesce old uuid and timestamp fields into new ulid format (+ asserts all snapshot timestamps are valid and are between 1900 and 2100 AD)
- xxxx-xxxx-xxxxxxx format, add ULID diagram in docs breaking it down into timestamp and randomness
- 0, 1, 2, htt to make prefix-matching faster and less error prone (avoids clashing with 199x*/20** year, 1* unix timestamp, 01* ULIDs, or http(s?) URL slug prefixes)
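For the ULID breakdown and the timestamp sanity check, a minimal sketch might look like the following. Per the ulid/spec layout, a ULID is 26 Crockford base32 characters: the first 10 encode a 48-bit millisecond timestamp and the last 16 encode 80 bits of randomness. The function names (ulid_millis, ulid_datetime, assert_valid_snapshot_time) are hypothetical and not part of ArchiveBox.

```python
from datetime import datetime, timezone

# Crockford base32 alphabet used by ULIDs (excludes I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_millis(ulid: str) -> int:
    """Decode the 10-character timestamp part of a ULID to unix milliseconds."""
    millis = 0
    for char in ulid[:10].upper():
        # Each character contributes 5 bits; index() raises ValueError on invalid input.
        millis = millis * 32 + CROCKFORD.index(char)
    return millis


def ulid_datetime(ulid: str) -> datetime:
    """Return the ULID's embedded timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ulid_millis(ulid) / 1000, tz=timezone.utc)


def assert_valid_snapshot_time(moment: datetime) -> None:
    """Sanity check from the subtask: the year must fall between 1900 and 2100."""
    assert 1900 <= moment.year <= 2100, f"implausible snapshot time: {moment}"


# Example: the ULID used earlier in this issue.
assert_valid_snapshot_time(ulid_datetime("01AN4Z07BY79KA1307SR9X4MV3"))
```

Because only the first 10 characters carry the time, the same decoding works for a bare timestamp prefix such as 01AN4Z07BY.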