Feature Request: Automatically rewrite URLs to use alternative frontends for difficult-to-archive sites (e.g. using benbusby/farside) #1319

Open
2 of 6 tasks
pirate opened this issue Jan 12, 2024 · 0 comments
Labels
expected: maybe someday; size: hard; status: idea-phase (work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet); touches: API/CLI/user interface; touches: configuration; touches: data/schema/architecture; type: enhancement; why: functionality (intended to improve ArchiveBox functionality or features)

Type

  • General question or discussion
  • Propose a brand new feature
  • Request modification of existing behavior or design

What is the problem that your feature request solves

Sites like Facebook, Instagram, Twitter, TikTok, etc. are difficult to archive: they frequently block bot traffic or require a logged-in session just to view content.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

Many alternative frontends exist that display social media content with less clutter and in a more easily archivable way. e.g.

ArchiveBox should be configurable to rewrite URLs for sites the user chooses so that they point to alternative frontends.
Ideally this would be a general solution for URL rewriting and cleanup that can take over from URL_ALLOWLIST/URL_DENYLIST and also handle merging duplicate URLs.
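
A rough sketch of what rule-based rewriting plus duplicate merging could look like, purely as an illustration. The `URL_REWRITE_RULES` setting and both function names are hypothetical and do not exist in ArchiveBox; the twitter.com to nitter.net mapping is taken from the workaround in this issue. It assumes Python 3.9+.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical setting (not a real ArchiveBox option):
# maps a source hostname to the alternative frontend that should replace it.
URL_REWRITE_RULES = {
    "twitter.com": "nitter.net",
}


def rewrite_url(url: str, rules: dict = URL_REWRITE_RULES) -> str:
    """Swap the hostname if a rule matches it; otherwise return the URL unchanged."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Treat "www.twitter.com" the same as "twitter.com".
    target = rules.get(host.removeprefix("www."))
    if target is None:
        return url
    # Keep an explicit port if one was given (any user:pass@ prefix is dropped).
    netloc = target if parts.port is None else f"{target}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def rewrite_and_merge(urls, rules: dict = URL_REWRITE_RULES) -> list:
    """Rewrite every URL, then drop duplicates while keeping first-seen order."""
    seen = set()
    merged = []
    for url in urls:
        rewritten = rewrite_url(url.strip(), rules)
        if rewritten and rewritten not in seen:
            seen.add(rewritten)
            merged.append(rewritten)
    return merged
```

Because the rule is matched against the parsed hostname rather than the raw string, an unrelated host such as `nottwitter.com` is left alone, and a twitter.com link and its nitter.net equivalent collapse into a single entry.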

What hacks or alternative solutions have you tried to solve the problem?

Manually replacing URL fragments before piping them into archivebox:

cat urls.txt | perl -pe 's/twitter\.com/nitter.net/gm' | archivebox add
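
One caveat with a plain text substitution like this is that it also rewrites `twitter.com` wherever it appears in a line, for example inside a query string on another site. A host-aware variant might look like the following sketch; `swap_host` and `filter_lines` are hypothetical helper names, not part of ArchiveBox.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host(line: str, old: str = "twitter.com", new: str = "nitter.net") -> str:
    """Replace the host only when it is exactly `old` or a subdomain of `old`."""
    url = line.strip()
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == old or host.endswith("." + old):
        # SplitResult is a namedtuple, so _replace returns a modified copy.
        return urlunsplit(parts._replace(netloc=new))
    return url


def filter_lines(lines):
    """Yield the rewritten form of every non-blank input line."""
    for line in lines:
        if line.strip():
            yield swap_host(line)
```

Wrapped in a small script that prints `filter_lines(sys.stdin)` line by line, this could take the place of the perl step in the pipeline above.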

How badly do you want this new feature?

  • It's an urgent deal-breaker, I can't live without it
  • It's important to add it in the near-mid term future
  • It would be nice to have eventually

pirate added this to the v0.🌈 milestone on Jan 12, 2024