Enhancement: Use the same URL layout as Archive.org for viewing ArchiveBox Snapshots https://archive.org/web/<URL> #1085

Open
pirate opened this issue Jan 17, 2023 · 0 comments
Labels: expected: next release · size: medium · status: backlog (Work is planned someday but is not the highest priority at the moment) · touches: API/CLI/user interface · touches: docs · type: enhancement · why: functionality (Intended to improve ArchiveBox functionality or features)

pirate commented Jan 17, 2023

To visit an archived version of a website (or archive it automatically) on Archive.org, one can just visit http://web.archive.org/web/https://example.com/ and it will redirect to http://web.archive.org/web/20230116145642/https://example.com/ (or whatever the most recent snapshot timestamp is).

To really embody the tagline "ArchiveBox is a self-hosted version of archive.org", we should properly support their URL scheme too.

e.g.

  • https://demo.archivebox.io/web/https://example.com should redirect to the most recent snapshot https://demo.archivebox.io/web/20230116145642/https://example.com

    • note: support both the ArchiveBox-style timestamp in unix timestamp format, e.g. 1673919713, and the Archive.org-style 20230116145642 format, including truncated forms 2023, 202301, 20230116
    • note: also support visiting snapshots using the ULID-format UUID instead of the timestamp as the slug, e.g. https://demo.archivebox.io/01ARZ3NDEKTSV4RRFFQ69G5FAV/...
    • note: support auto prefix-matching slugs so that 2023 matches 202301, 20230116, 20230116145642 automatically, and 01AN4Z07BY matches 01AN4Z07BY79KA1307SR9X4MV3 automatically
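
A minimal Python sketch of the prefix-matching idea in the notes above. The helper names `to_wayback_timestamp` and `match_prefix` are hypothetical (not existing ArchiveBox functions), and the plain list of slugs stands in for whatever database lookup a real implementation would use.

```python
from datetime import datetime, timezone


def to_wayback_timestamp(unix_ts: float) -> str:
    """Format a unix timestamp as an Archive.org-style YYYYMMDDHHMMSS string (UTC)."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%Y%m%d%H%M%S")


def match_prefix(prefix: str, slugs: list[str]) -> list[str]:
    """Return every known slug that starts with the prefix, sorted newest first."""
    return sorted((s for s in slugs if s.startswith(prefix)), reverse=True)


# Hypothetical snapshot slugs, used only for illustration.
timestamps = ["20221231000000", "20230116145642", to_wayback_timestamp(1673919713)]
ulids = ["01AN4Z07BY79KA1307SR9X4MV3", "01ARZ3NDEKTSV4RRFFQ69G5FAV"]

print(match_prefix("2023", timestamps))    # both 2023 snapshots, newest first
print(match_prefix("01AN4Z07BY", ulids))   # the single ULID with that prefix
```

Because both fixed-width timestamps and ULIDs sort lexicographically in time order, the first match is also the most recent snapshot, which is what the redirect needs.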

    Full spec:

https://demo.archivebox.io/web/<SLUG> where SLUG can be:
- an original URL, with or without scheme, e.g. https://example.com/index.html or example.com/index.html ➡️ redirect to the most recent snapshot of that URL, e.g. https://demo.archivebox.io/web/20230116145642/https://example.com/index.html
- an ArchiveBox snapshot UUID in ulid/spec format, e.g. 01AN4Z07BY79KA1307SR9X4MV3/index.html, or just its timestamp prefix, e.g. 01AN4Z07BY/index.html ➡️ redirect to that exact snapshot https://demo.archivebox.io/web/20230116145642/https://example.com/index.html
- an ArchiveBox snapshot timestamp in YYYYMMDDHHMMSS format, a shortened form like YYYYMM, or unix timestamp format, e.g. 20230116145642/index.html, 202301161456/index.html, 202301/index.html, or 1673919713/index.html ➡️ redirect to the most recent snapshot matching that prefix https://demo.archivebox.io/web/20230116145642/https://example.com/index.html
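
One possible way to tell the three SLUG kinds apart, sketched in Python. `classify_slug` is a hypothetical helper, and the rules it applies (a digit run starting with 19 or 20 is a date prefix, any other 10-digit run is a unix timestamp, a dot in the first path segment means a scheme-less URL) are assumptions made for illustration, not decisions taken in this issue.

```python
# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = set("0123456789ABCDEFGHJKMNPQRSTVWXYZ")
# Digit counts of YYYY, YYYYMM, YYYYMMDD, ... YYYYMMDDHHMMSS.
DATE_LENGTHS = {4, 6, 8, 10, 12, 14}


def classify_slug(slug: str) -> str:
    """Return 'url', 'ulid', 'date', 'unix', or 'unknown' for a /web/<SLUG> value."""
    if slug.startswith(("http://", "https://")):
        return "url"
    head = slug.split("/", 1)[0]  # first path segment only
    if head.isascii() and head.isdigit():
        if head[:2] in ("19", "20") and len(head) in DATE_LENGTHS:
            return "date"
        if len(head) == 10:
            return "unix"
        return "unknown"
    if head and len(head) <= 26 and set(head.upper()) <= CROCKFORD:
        return "ulid"  # full ULID or a prefix of one
    if "." in head:
        return "url"  # scheme-less URL such as example.com/index.html
    return "unknown"


for example in (
    "https://example.com/index.html",
    "example.com/index.html",
    "01AN4Z07BY79KA1307SR9X4MV3/index.html",
    "01AN4Z07BY/index.html",
    "20230116145642/index.html",
    "1673919713/index.html",
):
    print(example, "->", classify_slug(example))
```

Under these assumed rules a 10-digit unix timestamp beginning with 19 or 20 would be read as a date prefix, so a real implementation would need an explicit tie-break or the disambiguation page for those cases.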

Subtasks:

  • add a derived ulid field + migration to coalesce the old uuid and timestamp fields into the new ulid format (+ assert that all snapshot timestamps are valid and fall between 1900 and 2100 AD)
  • update the admin and index UI to show the ULID instead of the old UUID4 xxxx-xxxx-xxxxxxx format, and add a ULID diagram in the docs breaking it down into timestamp and randomness
  • create disambiguation page to show all the matching results for a given SLUG if it's the prefix for multiple possible snapshots
  • reject newly created Snapshot UUIDs that begin with 0, 1, 2, or htt to make prefix-matching faster and less error-prone (avoids clashing with 199x*/20** year, 1* unix timestamp, 01* ULID, or http(s?) URL slug prefixes)
  • add docs examples on how to truly "self-host your own archive.org", and add a side-by-side screenshot of URL bar examples for visiting snapshots on Archive.org and demo.archivebox.io
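
For the first subtask, here is a rough Python sketch of how a ULID could be derived from an existing snapshot's timestamp and UUID. It follows the ulid/spec layout (48-bit millisecond timestamp followed by 80 bits of randomness, encoded as 26 Crockford base32 characters). `derive_ulid` is a hypothetical name, and reusing the low 80 bits of the old UUID as the randomness part is an assumption, not something this issue specifies.

```python
import uuid

# Crockford base32 alphabet, in ascending ASCII order.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def derive_ulid(unix_ts: float, old_uuid: uuid.UUID) -> str:
    """Combine a unix timestamp and an existing UUID into a 26-character ULID string."""
    ms = int(unix_ts * 1000)
    if not 0 <= ms < 2**48:
        raise ValueError("timestamp does not fit in the 48-bit ULID time field")
    randomness = old_uuid.int & ((1 << 80) - 1)  # assumption: keep the UUID's low 80 bits
    value = (ms << 80) | randomness              # 128-bit ULID as one integer
    chars = []
    for _ in range(26):                          # 26 chars x 5 bits covers 128 bits
        chars.append(ALPHABET[value & 31])
        value >>= 5
    return "".join(reversed(chars))


snapshot_ulid = derive_ulid(1673919713, uuid.uuid4())
print(snapshot_ulid)        # 26 characters, starting with "01"
print(snapshot_ulid[:10])   # timestamp part, usable as a prefix slug
```

The first 10 characters depend only on the timestamp and the remaining 16 only on the randomness, which is the split the proposed docs diagram would illustrate. A ULID's time field cannot represent dates before 1970, so timestamps in the 1900 to 1969 part of the asserted range would need separate handling.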


@pirate pirate added the status: idea-phase Work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet label Jan 17, 2023
@pirate pirate changed the title Feature Request: Support Archive.org snapshot URL layout schema 1:1 Enhancement: Support https://archive.org/web/<URL> snapshot URL schema 1:1 Jan 17, 2023
@pirate pirate added size: medium why: functionality Intended to improve ArchiveBox functionality or features touches: docs status: backlog Work is planned someday but is not the highest priority at the moment touches: API/CLI/user interface type: enhancement expected: next release why: incentives and removed status: idea-phase Work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet labels Jan 17, 2023
@pirate pirate changed the title Enhancement: Support https://archive.org/web/<URL> snapshot URL schema 1:1 Enhancement: Use the same URL layout as Archive.org for viewing ArchiveBox Snapshots https://archive.org/web/<URL> Jun 13, 2023