Feature Request: Browser extension to submit either all history or certain URLs to a given ArchiveBox instance #577

Open
adamwolf opened this issue Dec 9, 2020 · 33 comments
Labels
good first ticket help wanted size: medium status: idea-phase Work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet touches: urls/views/API

Comments

@adamwolf
Contributor

adamwolf commented Dec 9, 2020

Hi folks!

After adding the little bookmarklet, I'd like to add another extension. Once the API is closer, would you rather see an Android/iOS "share to" app extension, or a Chrome extension to quickly submit a URL to your ArchiveBox?

(Of course, if these are both things you don't like, just let me know! :)

@pirate
Member

pirate commented Dec 9, 2020

Yeah for sure, that would be great! We can easily expose an /add endpoint for those. I don't have any Android/iOS app dev experience, so that's definitely something we could use help with.

@pirate pirate added size: medium help wanted status: idea-phase Work is tentatively approved and is being planned / laid out, but is not ready to be implemented yet labels Dec 9, 2020
@pirate
Member

pirate commented Jan 23, 2021

Copying @CodingSpiderFox's message from duplicate ticket here:

Type

  • General question or discussion
  • Propose a brand new feature
  • Request modification of existing behavior or design

What is the problem that your feature request solves

I don't want to manually type the URLs in my shell or run the export script regularly because I tend to forget it, and I also want to save my pages right away. Also, I want ArchiveBox running on my NAS and not on my local computer.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

I want to have a plugin for at least Firefox and Chrome where I can

  • configure the URL of my archivebox on my local network and my credentials for my archivebox
  • have two modes:
  • a) it logs every URL I visited automatically to my archivebox and archivebox saves it right away
  • b) a button in the addons toolbar that I can click which submits the current open URL in the current tab (only the current tab) to my archivebox and archivebox saves it right away

How badly do you want this new feature?

  • It's an urgent deal-breaker, I can't live without it
  • It's important to add it in the near-mid term future
  • It would be nice to have eventually

  • I'm willing to contribute dev time / money to fix this issue
  • I like ArchiveBox so far / would recommend it to a friend
  • I've had a lot of difficulty getting ArchiveBox set up

@pirate pirate changed the title Question: extension options? Feature Request: Browser extension to submit either all history or certain URLs to a given ArchiveBox instance Jan 23, 2021
@adamwolf
Contributor Author

Hi! I haven't followed this project as closely as I have in the past, but I keep seeing it in headlines... good work!

Is there an /add or equivalent API endpoint? No worries if not... I'm a little overbooked with billable work at the moment but if there isn't one yet, is there a particular ticket that tracks that? I could subscribe to that so I know when to get started on this.

@pirate
Member

pirate commented Jan 23, 2021

There is an /add endpoint now, but it's the one used by the UI so it requires a CSRF token which is a pain for API-style usage. No ticket for fixing that yet, but I'll be sure to post back here once I stabilize that endpoint more.

I'm also a bit swamped with my day job right now, but I haven't forgotten about this.

@adamwolf
Contributor Author

No problem! Do not rush to implement this for my sake! :) Thanks for all your work.

@pirate
Member

pirate commented Mar 10, 2021

Ideally a browser extension for ArchiveBox should be releasable cross-platform with minimal effort on the packaging side (something equivalent to FPM in the Debian packaging world).

Some of my research so far:

So far this seems like the best place to get started: https://www.emailthis.me/open-source/extension-boilerplate
Their sample extension is quite close to what the ArchiveBox extension UI would need.

If anyone wants to take a crack at this, PRs are welcome! In theory an extension that submits a POST to http://<user configurable archivebox host>/add? could be accomplished in <200 LOC.
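
As a rough illustration of the request such an extension would send, here is a minimal sketch written in Python rather than extension JavaScript. The trailing-slash path `/add/`, the form field names `url` and `depth`, and the helper name `build_add_request` are assumptions for illustration and are not confirmed anywhere in this thread. The request is only constructed, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_add_request(base_url: str, urls: list[str], depth: int = 0) -> Request:
    """Build (but do not send) a form-encoded POST to an ArchiveBox-style /add/ endpoint."""
    # Assumed field names: "url" (newline-separated URLs) and "depth".
    form = urlencode({"url": "\n".join(urls), "depth": str(depth)}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/add/",  # user-configurable host, assumed path
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_add_request("http://archivebox.local:8000", ["https://example.com"])
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing the request to `urllib.request.urlopen`, together with whatever authentication the instance requires.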

@voarsh2

voarsh2 commented Mar 15, 2021

This extension would be great.
Besides submitting URLs with a click, it could also offer automatic submission of browser history (as an option that can be turned on).

@pirate
Member

pirate commented Apr 1, 2021

@layderv has written a sample extension for Firefox: https://github.com/layderv/archivefox

(x-posting this here)

@layderv

layderv commented Apr 1, 2021

Would it be useful to add it to the repo's readme? Is there any useful, missing feature?

@LennyPenny

LennyPenny commented Apr 1, 2021

I think it would be cool to have an optional mode in this extension that will just queue every page you visit to be archived

edit: oh nvm #577 (comment) already mentions that

@voarsh2

voarsh2 commented Apr 2, 2021

> @layderv has written a sample extension for Firefox: https://github.com/layderv/archivefox
>
> (x-posting this here)

Cool, except I'm on Chrome.

@rastacalavera

So I installed the addon, but my instance is on a Raspberry Pi, not my host computer. It looks like the addon and the instance need to be on the same machine? Is that correct? Or can I put in the URL with the port number and /add at the end?
image

@layderv

layderv commented May 19, 2021

@rastacalavera the addon's repository is probably the best place to ask this. I didn't add that feature, but if you show me how you use it manually, I can see how to add it

@tjhorner
Contributor

Hey @pirate, I can work on this if you'd like. I'm not well-versed in Python/Django, so I'd appreciate it if you could add the API endpoint for adding URLs to archive. (Else, I can totally try it myself, doesn't seem too difficult!) How would authentication work? I think for now a simple shared secret that's defined in the config would be fine.

I'll work on the browser extension for now. Since archiving all your history would probably take up way too much space and not be very useful (for e.g. Gmail, Google Photos, other auth'd services), I think the best way to determine which sites to archive would be:

  • Don't archive any sites by default
  • Users can manually archive the current page (or links) from the context menu
  • Users can add domains/regexes to auto-archive from settings
  • If the extension notices a user browsing a certain domain often, it will ask them if they'd like to archive it or not. If they choose yes, then it'll retroactively archive the history (going back some amount of days; not forever) and any future visit to that domain
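
The domain/regex rule from the list above could be sketched roughly as follows. This is an illustrative Python sketch, not code from the extension: the function name `should_auto_archive` and the rule format (a set of bare domains plus a list of regex strings) are assumptions. With no rules configured it returns False, which matches the "don't archive any sites by default" point.

```python
import re
from urllib.parse import urlsplit


def should_auto_archive(url: str, domains: set[str], patterns: list[str]) -> bool:
    """Return True if the URL matches a configured domain or regex rule."""
    host = (urlsplit(url).hostname or "").lower()
    # A domain rule matches the domain itself and any of its subdomains.
    if any(host == d or host.endswith("." + d) for d in domains):
        return True
    # A regex rule is searched against the full URL.
    return any(re.search(p, url) for p in patterns)


rule_domains = {"example.com"}
rule_patterns = [r"^https://news\.example\.org/\d{4}/"]
print(should_auto_archive("https://blog.example.com/post", rule_domains, rule_patterns))
print(should_auto_archive("https://mail.example.net/inbox", rule_domains, rule_patterns))
```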

So as to not accidentally DoS your ArchiveBox instance, matched URLs would be buffered and submitted in batches, every 10 minutes or so. But if the user closes their browser while there are buffered URLs, it would submit them immediately before closing.
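
A minimal sketch of that buffering idea, in Python for illustration only. The class name `UrlBatcher`, the injected `submit` callable, and the explicit `now` timestamps are all assumptions; a real extension would use the browser's alarm/timer and shutdown hooks instead.

```python
from typing import Callable


class UrlBatcher:
    """Buffer URLs and hand them to `submit` in batches at a fixed interval."""

    def __init__(self, submit: Callable[[list[str]], None],
                 interval_seconds: float = 600.0, now: float = 0.0) -> None:
        self._submit = submit
        self._interval = interval_seconds
        self._pending: dict[str, None] = {}  # dict keeps insertion order and de-duplicates
        self._last_flush = now

    def add(self, url: str) -> None:
        self._pending.setdefault(url, None)

    def tick(self, now: float) -> None:
        """Call periodically; flushes once the interval has elapsed."""
        if now - self._last_flush >= self._interval:
            self.flush(now)

    def flush(self, now: float) -> None:
        """Submit everything buffered right away (e.g. when the browser closes)."""
        if self._pending:
            self._submit(list(self._pending))
            self._pending.clear()
        self._last_flush = now


batches: list[list[str]] = []
batcher = UrlBatcher(batches.append, interval_seconds=600)
batcher.add("https://example.com/a")
batcher.add("https://example.com/a")  # duplicate, ignored
batcher.tick(now=100)   # too early, nothing submitted
batcher.tick(now=600)   # interval elapsed, one batch submitted
print(batches)
```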

What do you think?

@tjhorner
Contributor

I've got something working pretty well! Here are some screenshots:

image

image

image

And here is the repo: https://github.com/tjhorner/archivebox-exporter

All that's left is to implement the actual API call to ArchiveBox (and some config fields for pointing to the right domain). Let me know if you want to take care of implementing that server-side or if you're fine with me handling it.

@tjhorner
Contributor

tjhorner commented Jul 1, 2021

Just an update: I forked ArchiveBox and added a temporary API endpoint just for the extension. You can see that branch here: https://github.com/tjhorner/archivebox/tree/temporary-add-api (More info on how to set it up here: https://github.com/tjhorner/archivebox-exporter/wiki/Setup)

I submitted the extension to both the Chrome and Firefox web stores, and I'll post another comment here when they both pass review. Once ArchiveBox gets a more official API, I'll be glad to update the extension to support that instead of this weird hacky solution I've come up with, hah. But for now I think this is a decent solution.

@voarsh2

voarsh2 commented Jul 2, 2021

> Just an update: I forked ArchiveBox and added a temporary API endpoint just for the extension. You can see that branch here: https://github.com/tjhorner/archivebox/tree/temporary-add-api (More info on how to set it up here: https://github.com/tjhorner/archivebox-exporter/wiki/Setup)
>
> I submitted the extension to both the Chrome and Firefox web stores, and I'll post another comment here when they both pass review. Once ArchiveBox gets a more official API, I'll be glad to update the extension to support that instead of this weird hacky solution I've come up with, hah. But for now I think this is a decent solution.

Awesome, really pumped to try this!
Hopefully I'll have some time in the next few days.

@pirate
Member

pirate commented Jul 2, 2021

@tjhorner have you tried using the existing POST /core/snapshot/add/ (archivebox/core/admin.py:382) endpoint to add new URLs? I believe the only potential blocker is the CSRF token requirement, which we can probably remove with a @csrf_exempt decorator on that view handler function.

Either way, I should have time to take a closer look in the upcoming weeks and help put whatever you need into ArchiveBox master to get this working.

As a side note, I pass on a subset of the donations that ArchiveBox gets to dependencies we use and other crucial projects in the ecosystem. If one or more user-contributed extensions get reliable and feature-complete enough that we can direct people to them in the README, I'd be happy to pass on some of our $ support to those projects! It's small amounts right now (<$100/mo) but hopefully as the project grows it will become more significant.

@tjhorner
Contributor

tjhorner commented Jul 2, 2021

> I believe the only potential blocker is the CSRF token requirement

Yep, I ran into that when trying to use that endpoint in my testing. I was thinking of how the extension would authenticate with ArchiveBox, and I decided an API key would be the best solution. But I just did another test and it turns out that since the extension has permission to access user data on their ArchiveBox instance, it will send the sessionid cookie along with the request, so as long as the user is signed in and the session remains active (and since SESSION_SAVE_EVERY_REQUEST is set, it should automatically renew), the extension should be authorized.

So, TL;DR: yep, it seems all that's needed is to exempt that view from CSRF, since authentication is shared with the browser session.

I decorated the API view in my branch with @method_decorator(csrf_exempt, name='dispatch') and it worked just fine. I'll decorate the existing /add path with that and see if the extension can successfully make requests to that.

@pirate
Member

pirate commented Jul 2, 2021

Ok, in the future we will likely have to build some infrastructure to authenticate the extension with ArchiveBox and issue it a dedicated bearer token key with CSRF-free endpoints (likely with a broader push towards building a real REST API). For now that should be ok though.
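
To make the bearer-token idea concrete, here is a small standard-library sketch in Python. It is only an illustration under stated assumptions: the helper names `issue_token` and `is_authorized` are hypothetical, and nothing here reflects how ArchiveBox actually implements or plans to implement authentication.

```python
import hashlib
import hmac
import secrets


def issue_token() -> tuple[str, str]:
    """Return (token, digest): the token goes to the extension, the digest is stored server-side."""
    token = secrets.token_urlsafe(32)
    return token, hashlib.sha256(token.encode()).hexdigest()


def is_authorized(authorization_header: str, stored_digest: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header value against the stored digest."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    candidate = hashlib.sha256(token.strip().encode()).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate, stored_digest)


token, digest = issue_token()
print(is_authorized(f"Bearer {token}", digest))
```

Because the token travels in a header the extension sets explicitly, rather than in a cookie the browser attaches automatically, an endpoint checked this way would not depend on CSRF protection in the same way the session-based /add view does.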

If you want to PR that decorator change you made against dev I can review and merge it into the 0.6.3 release candidate, though I can't promise that release will go out in the next couple weeks (I have a lot of travel and non-tech projects coming up). If it takes me any longer than 2 weeks then I can probably roll a micro-release with only your change and some other small bugfixes and save the other things on the 0.6.3 TODO list for later, as having this extension would be a huge usability win for many ArchiveBox users.

For anyone who wants to use this early, see instructions here on how to run the ArchiveBox pre-release dev version on your machine:
https://github.com/ArchiveBox/ArchiveBox#install-and-run-a-specific-github-branch

@tjhorner
Contributor

tjhorner commented Jul 2, 2021

Just added the CSRF exempt decorator to AddView in this branch. I modified the extension to use that route and it works like a charm! I'll submit a PR with that change against dev. In the meantime I'll update the extension setup instructions and push an update to the Chrome/FF stores with this change.

@tjhorner
Contributor

tjhorner commented Jul 2, 2021

The extension's now published on the Chrome and FF webstores! Give it a try and let me know what you think. Make sure you're running the dev branch of ArchiveBox (instructions here).

Bug reports and feature requests welcome, just make a new issue on the repo: https://github.com/tjhorner/archivebox-exporter/issues

@voarsh2

voarsh2 commented Jul 2, 2021

Quick question, when I use docker to build from "dev" branch, am I actually building from this branch: https://github.com/tjhorner/archivebox/tree/temporary-add-api?
E.G: docker build -t archivebox:dev https://github.com/tjhorner/ArchiveBox.git#temporary-add-api

@tjhorner
Contributor

tjhorner commented Jul 2, 2021

@voarsh2 No, you should be building from ArchiveBox/ArchiveBox#dev. I updated the instructions on the wiki to reflect that. If there are other places it needs to be updated let me know :)

Edit: also make sure you have the latest version of the extension. It should be 1.2.0

@voarsh2

voarsh2 commented Jul 2, 2021

> No, you should be building from ArchiveBox/ArchiveBox#dev. I updated the instructions on the wiki to reflect that. If there are other places it needs to be updated let me know :)

Ah okay, I also thought my way above made sense since it's not in the ArchiveBox project yet....

so: docker build -t archivebox:dev https://github.com/ArchiveBox/ArchiveBox.git#dev ?
If I am pulling from the official repo, how are your changes from your repo applied exactly? I assume I'm missing something....

@tjhorner
Contributor

tjhorner commented Jul 2, 2021

I ended up going a different route by utilizing the existing /add/ endpoint, just disabling CSRF checks there. I submitted a PR earlier (#777) and it's now in dev here. It's a short term solution but it works for now. Once there's a fully fleshed out REST API with proper authorization and stuff, the extension will move to that.

In the very earliest version of the extension you would have needed to build from my fork, yes, but no longer.

edit: if you have any further questions please ask them in the discussions section of the repo; I don't want to clutter this issue too much 😅

@pirate
Member

pirate commented Jul 17, 2021

One thing I'd like to do is push extension users away from "archive every page I visit" by default. Archives rapidly lose value that way, and people will end up just disabling the tool or deleting large swaths of their archive if that's the default for long periods of time. One-click archiving using a button in the navbar is always better than saving all browser history by default. Curation is really important, and the archives will hold more value on both a decades and centuries timescale if they are limited to pages deemed worthy of saving.

I'm not proposing removing the "all history" feature, just not making it the default, because despite what people think initially, it's really not a great idea long-term to save everything you visit.

https://youtu.be/7eoz_EU6-wQ?t=1387

@voarsh2

voarsh2 commented Jul 17, 2021

> I'm not proposing removing the "all history" feature, just not making it the default, because despite what people think initially, it's really not a great idea long-term to save everything you visit.

I think, it's clear archive all is not on unless you make it so......

I will tag browsing history as an inbox to sort later....

@pirate
Member

pirate commented Jul 17, 2021

Yes, that is the case for @tjhorner's extension right now, but there are comments on reddit asking to make it the default, so I'm linking those people here for an explanation. I also want to stress it here for the other people developing extensions, there are 3 in the works right now last I counted.

@mAAdhaTTah
Contributor

@pirate If there are extensions in the works, would it be worth picking up the REST API? Is that ready to start, or should we wait until the worker rearch w/ Huey is done?

@pirate
Member

pirate commented Jul 19, 2021

I think a minimal API can be worked on before the Huey refactor, as the user-facing API is going to be relatively stable even with the change to the internals. Maybe just these things to start:

  • /api/core/snapshot/ GET, POST, PUT
  • /api/core/snapshot/<id> GET, PATCH, DELETE
  • /api/core/archiveresult/ GET, POST
  • /api/core/archiveresult/<id> GET, PATCH, DELETE
  • /api/core/tag/ GET, POST, PUT
  • /api/core/tag/<id> GET, PATCH, DELETE
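
The endpoint list above can be written down as a small lookup table. This is a framework-free Python sketch for illustration only; the function name `allowed_methods` and the regex-based matching are assumptions, and a real implementation would use the routing of whichever API framework is chosen.

```python
import re

# Methods proposed for collection endpoints, e.g. /api/core/snapshot/
COLLECTION_METHODS = {
    "snapshot": {"GET", "POST", "PUT"},
    "archiveresult": {"GET", "POST"},
    "tag": {"GET", "POST", "PUT"},
}
# Methods proposed for item endpoints, e.g. /api/core/snapshot/<id>
ITEM_METHODS = {"GET", "PATCH", "DELETE"}

_ROUTE = re.compile(r"^/api/core/(?P<resource>[a-z]+)/(?P<id>[^/]+)?$")


def allowed_methods(path: str) -> set[str]:
    """Return the HTTP methods proposed for a path, or an empty set if unknown."""
    match = _ROUTE.match(path)
    if not match or match["resource"] not in COLLECTION_METHODS:
        return set()
    if match["id"] is None:
        return set(COLLECTION_METHODS[match["resource"]])
    return set(ITEM_METHODS)


print(sorted(allowed_methods("/api/core/snapshot/")))
print(sorted(allowed_methods("/api/core/tag/42")))
```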

and this bonus escape hatch endpoint to do everything else not possible with the above ^:

  • /api/cli/<command> POST (simulate running any archivebox CLI command with a given dict of args and kwargs to populate the CLI flags and args)
    e.g. /api/cli/add POST {urls: 'https://example.com', depth: 1, extractors: ['wget', 'media', 'screenshot'], ...}
    or /api/cli/schedule POST {urls: 'https://example.com', depth: 1, every: 'day', ...}
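
One way to picture the escape-hatch endpoint is as a translation from the JSON payload to a CLI argument list. The Python sketch below is an assumption-laden illustration: the function name `payload_to_argv` is hypothetical, `urls` is treated as positional, and every other key is turned mechanically into a `--key=value` flag, which may not match the real option names of the archivebox CLI.

```python
def payload_to_argv(command: str, payload: dict) -> list[str]:
    """Translate a JSON-style payload into an argv list for `archivebox <command>`."""
    argv = ["archivebox", command]
    positional: list[str] = []
    for key, value in payload.items():
        if key == "urls":
            # URLs are passed as positional arguments after all flags.
            positional.extend([value] if isinstance(value, str) else value)
        elif isinstance(value, bool):
            if value:
                argv.append(f"--{key}")  # boolean True becomes a bare flag
        elif isinstance(value, (list, tuple)):
            argv.append(f"--{key}=" + ",".join(map(str, value)))
        else:
            argv.append(f"--{key}={value}")
    return argv + positional


print(payload_to_argv("add", {
    "urls": "https://example.com",
    "depth": 1,
    "extractors": ["wget", "media", "screenshot"],
}))
```

Returning a list rather than a single shell string means the arguments could be handed to `subprocess.run` without going through a shell, which sidesteps quoting problems with user-supplied URLs.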

I'm leaning towards using FastAPI for the API instead of DRF. I like the pydantic type-based API definitions better than DRF's serializers but I could be convinced either way.

@adamwolf
Contributor Author

adamwolf commented Jul 19, 2021 via email

@mAAdhaTTah
Contributor

I am using FastAPI on a side project and like it a lot but I think the integration with the way Archivebox loads Django will be complicated. Django Ninja appears to have a lot of the same trappings as FastAPI, so I'd be inclined to go with that rather than try to shoehorn FastAPI into the current Django integration.

I would be willing to work on this too–I'm trying to consume ArchiveBox for displaying my reading on my site and pulling it from the SQLite file directly is turning out to be a bit annoying.

Development

No branches or pull requests

8 participants