Feature Request: a web clipper #1203

Open
4 of 9 tasks
berezovskyi opened this issue Aug 8, 2023 · 3 comments

Labels
expected: unlikely unless contributed (This change is unlikely to be made unless someone contributes a PR for review.)

Comments

berezovskyi commented Aug 8, 2023

Type

  • General question or discussion
  • Propose a brand new feature
  • Request modification of existing behavior or design

What is the problem that your feature request solves

Being able to clip webpage contents that are hard to fetch using ArchiveBox (captchas, datacenter IP blocks, authentication).

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

Evernote Web Clipper did it perfectly.

What hacks or alternative solutions have you tried to solve the problem?

Using Evernote Web Clipper for pages that ArchiveBox cannot archive. Tried Joplin and it seems to do the job too.

How badly do you want this new feature?

  • It's an urgent deal-breaker, I can't live without it
  • It's important to add it in the near-mid term future
  • It would be nice to have eventually

  • I'm willing to contribute dev time / money to fix this issue
  • I like ArchiveBox so far / would recommend it to a friend
  • I've had a lot of difficulty getting ArchiveBox set up

berezovskyi (Author) commented

I started looking into how this could be done technically, and my first idea is to take an OSS clipper extension and fork it to suit AB's needs, e.g. https://github.com/go-shiori/shiori-web-ext

Regarding the upload, I think the best way would be to allow AB to import WARCs (also see #160). Then, perhaps, an extension like https://github.com/machawk1/warcreate could be used without any changes or with a minimal one (to automatically upload the WARC).

gerroon commented Dec 15, 2023

This would be so awesome! Joplin has a good web clipper. Trilium's web clipper is OK.

pirate (Member) commented Dec 15, 2023

In the meantime, as a workaround if you urgently need this: any files placed into the snapshot folder (./archive/<timestamp>/) will be respected by ArchiveBox. So if you have any external WARC, PNG, PDF, etc. files, you can drag them into the snapshot folder manually or create a small script to place them in there.

If you overwrite the existing files or use the default names ArchiveBox uses, it will even display them properly in the UI as part of the snapshot.

I try to respect the UNIX "everything is a file" mentality, and may even move towards supporting more pure filesystem-based manipulation of the archives in future releases.
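
The "small script" workaround described above could look roughly like the following. This is only a minimal sketch, not part of ArchiveBox: the function name `place_in_snapshot` and its parameters are hypothetical, and it assumes nothing beyond what the comment states, namely that files copied into `./archive/<timestamp>/` are picked up. Which destination file names the UI treats as "default" is not spelled out in this thread, so the script leaves the name up to the caller.

```python
import shutil
from pathlib import Path


def place_in_snapshot(archive_root, timestamp, source, dest_name=None, overwrite=False):
    """Copy an externally captured file into <archive_root>/<timestamp>/.

    Hypothetical helper for illustration only; ArchiveBox does not ship it.
    dest_name lets the caller pick the file name inside the snapshot folder
    (for example, one of the names ArchiveBox already uses there).
    """
    snapshot_dir = Path(archive_root) / str(timestamp)
    if not snapshot_dir.is_dir():
        # Only write into snapshot folders that already exist.
        raise FileNotFoundError(f"no snapshot folder at {snapshot_dir}")

    source = Path(source)
    target = snapshot_dir / (dest_name or source.name)
    if target.exists() and not overwrite:
        # Refuse to clobber an existing output unless explicitly asked to.
        raise FileExistsError(f"{target} already exists; pass overwrite=True")

    shutil.copy2(source, target)  # copy2 also preserves modification times
    return target
```

A call such as `place_in_snapshot("./archive", "<timestamp>", "clip.pdf")` would copy a manually clipped PDF into the snapshot folder under its own name. To have it take the place of a file ArchiveBox already wrote, pass that file's name as `dest_name` together with `overwrite=True`, after checking an existing snapshot folder for the exact name.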

pirate added the "expected: unlikely unless contributed" label Jan 5, 2024