heads up...tracking a problem with archive.today and also wget options #35
Comments
Unfortunately, archive.today (or whatever its alias of the day may be) generally seems like an unreliable service for using as a backend. It's not intended to be used except through a browser. So don't be surprised if it doesn't work sometimes. If a change has been made to it that requires a change in this code, we can do that. For Wget, you'll have to be more specific than "it does not like the option." Obviously it works for me and always has. |
(use-package org-web-tools) produces a timeout in the archive.is function, followed by this error in Messages for the wget function:
The following then fixes the wget params:

(use-package org-web-tools
  :config
  (setq org-web-tools-archive-wget-options
        (delete "--execute robots=off" org-web-tools-archive-wget-options))
  (add-to-list 'org-web-tools-archive-wget-options "-e robots=off"))

My wget man page still shows both the -e and --execute options as valid, but apparently they are not.
|
I tried (setq org-web-tools-attach-archive-fn #'org-web-tools-archive--wget-tar) to skip the archive.is attempts completely, but it's still trying archive.is. I'm new to elisp, so I'm probably missing something important. |
If those Wget options don't work on your Wget version, I don't know what to suggest other than to not use them. Hopefully you won't need them, but be aware of their purpose. Maybe there is a new, alternative option syntax in your Wget version? I recommend using the customization system rather than |
I just recognized something: the option string Wget complains about includes both "--execute" and "robots=off". Try putting them in separate entries. But I can't explain why my Wget doesn't complain about that option. |
Indeed, adding "--execute" and "robots=off" as their own customize entries seems to have solved the issue with wget archiving. I'm still not able to get the archive.is-based archiving working, but the above is a suitable workaround. |
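For anyone who would rather do this from their init file than through the customize UI, here is a minimal sketch of the same workaround. It assumes that org-web-tools-archive-wget-options is a list of strings and that each element is handed to Wget as one command-line argument, which is what the behavior in this thread suggests; I have not checked it against every version of the package.

```elisp
;; Sketch only: split the combined "--execute robots=off" entry into two
;; separate list elements, leaving every other option untouched.
;; Assumes each list element is passed to Wget as a single argument.
(with-eval-after-load 'org-web-tools
  (setq org-web-tools-archive-wget-options
        (mapcan (lambda (opt)
                  (if (equal opt "--execute robots=off")
                      ;; Order matters: the flag first, then its value.
                      (list "--execute" "robots=off")
                    (list opt)))
                org-web-tools-archive-wget-options)))
```

mapcan is built in since Emacs 26.1; on older versions, (apply #'append (mapcar ...)) gives the same result. The reason for rewriting the list in place rather than using delete plus add-to-list is that add-to-list prepends, so two separate calls would be easy to get in the wrong order.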
I think there may be a bug in Wget, because I recently noticed this problem when calling it from outside of Emacs. I guess we have to work around it in Emacs.
archive.is doesn't seem to provide zip archives at all anymore. I can't even download them through a browser, and I couldn't find any explanation on its "blog" where people ask questions. In one case I tried to use Wget on the archive.is HTML view (because the page I was trying to save rendered most of its content with JavaScript, so Wget on the actual site was useless), but the downloaded page had about 90% of the content missing, even though it displayed correctly in a browser. Archiving contemporary web pages is mostly a disaster. I guess if you are serious about it, you'd better look into WARC or WebRecorder tools, something like that, but those are much more complicated, and AFAIK they require specialized "playback" tools. Imagine what people are going to have to do a few decades from now, running ancient browsers in ancient VMs just to render a newspaper article of the day. Or, almost as bad, looking at image-based archives of newspapers, like microfilm from before the digital age. It seems like no one ever knows when to say, "Stop, that's complicated enough. Just because we could doesn't mean that we should." |
Y, thanks for the confirmation. I feel ya' on future-state stuff. |
@nrvale0 Were you able to solve the 1) archive.today and 2) wget params problems? I have the same in #52 |
Does this look right?
UPDATE: wget error solved with the following:
but archive.today always fails... |
This is still an issue on |
@deadcombo Thanks for reminding me. I've pushed a fix to |
Just a heads up...
For some reason archive.today requests are failing (no, not using Cloudflare) and then the backup wget is failing because it does not like the '--execute robots=off' option.
I'm going to try to solve the archive.today problem first but I'll race ya! ;)