If I add anything to the [ARCHIVE_METHOD_OPTIONS] section, the wget archive method throws an exception. Here is my current ArchiveBox.conf:
[GENERAL_CONFIG]
TIMEOUT = 120
[SERVER_CONFIG]
SECRET_KEY = <super_secret>
[ARCHIVE_METHOD_TOGGLES]
SAVE_ARCHIVE_DOT_ORG = FALSE
SAVE_PDF = FALSE
SAVE_SCREENSHOT = FALSE
SAVE_MEDIA = FALSE
[ARCHIVE_METHOD_OPTIONS]
COOKIES_FILE = /home/winteriscariot/cookies.txt
WGET_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
[DEPENDENCY_CONFIG]
CHROME_BINARY = /usr/bin/chromium
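To see the raw text values in the [ARCHIVE_METHOD_OPTIONS] section, you can parse the file with Python's standard configparser. This is a minimal sketch that assumes ArchiveBox.conf is plain INI syntax as shown above. It is not ArchiveBox's own config loader, which may convert or validate values further. `read_method_options` and `SAMPLE_CONF` are hypothetical names used only for illustration.

```python
import configparser

# Hypothetical sample mirroring the [ARCHIVE_METHOD_OPTIONS] section above.
SAMPLE_CONF = """
[ARCHIVE_METHOD_OPTIONS]
COOKIES_FILE = /home/winteriscariot/cookies.txt
WGET_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
"""


def read_method_options(text: str) -> dict:
    """Return the raw key/value pairs from [ARCHIVE_METHOD_OPTIONS]."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; configparser lowercases keys by default
    parser.read_string(text)
    return dict(parser["ARCHIVE_METHOD_OPTIONS"])


if __name__ == "__main__":
    for key, value in read_method_options(SAMPLE_CONF).items():
        # repr() makes stray quotes or whitespace visible
        print(f"{key} = {value!r}")
```

Note that configparser keeps the surrounding double quotes as part of the WGET_USER_AGENT value. Whether ArchiveBox strips them is not shown here.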
Even with just one option (such as COOKIES_FILE), it still fails. It does NOT fail if I remove both COOKIES_FILE and WGET_USER_AGENT from ArchiveBox.conf. With the config above, I get the following exception about halfway through the wget step:
$ archivebox add https://www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released/
[i] [2021-01-10 20:20:08] ArchiveBox v0.5.3: archivebox add https://www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released/
> /mnt/storage/Archive
[+] [2021-01-10 20:20:08] Adding 1 links to index (crawl depth=0)...
> Saved verbatim input to sources/1610310008-import.txt
> Parsed 1 URLs from input (Plain Text)
> Found 1 new URLs not already in index
[*] [2021-01-10 20:20:08] Writing 1 links to main index...
√ /mnt/storage/Archive/index.sqlite3
[▶] [2021-01-10 20:20:08] Starting archiving of 1 snapshots in index...
[+] [2021-01-10 20:20:08] "www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released"
https://www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released/
> ./archive/1610310008.636074
> title
> favicon
> wget
! Failed to archive link: Exception: Exception in archive_methods.save_wget(Link(url=https://www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released/))
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/archivebox/extractors/__init__.py", line 108, in archive_link
    result = method_function(link=link, out_dir=out_dir)
  File "/usr/lib/python3.9/site-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/archivebox/extractors/wget.py", line 115, in save_wget
    return ArchiveResult(
  File "<string>", line 12, in __init__
  File "/usr/lib/python3.9/site-packages/archivebox/index/schema.py", line 46, in __post_init__
    self.typecheck()
  File "/usr/lib/python3.9/site-packages/archivebox/index/schema.py", line 57, in typecheck
    assert all(isinstance(arg, str) and arg for arg in self.cmd)
AssertionError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/bin/archivebox", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.9/site-packages/archivebox/cli/__init__.py", line 129, in main
    run_subcommand(
  File "/usr/lib/python3.9/site-packages/archivebox/cli/__init__.py", line 69, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd) # type: ignore
  File "/usr/lib/python3.9/site-packages/archivebox/cli/archivebox_add.py", line 85, in main
    add(
  File "/usr/lib/python3.9/site-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/archivebox/main.py", line 593, in add
    archive_links(new_links, overwrite=False, **archive_kwargs)
  File "/usr/lib/python3.9/site-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/archivebox/extractors/__init__.py", line 173, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/usr/lib/python3.9/site-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/archivebox/extractors/__init__.py", line 122, in archive_link
    raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format(
Exception: Exception in archive_methods.save_wget(Link(url=https://www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released/))
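The line "The above exception was the direct cause of the following exception" is how Python prints an exception raised with `raise ... from ...`. That means the second traceback is only a wrapper, and the underlying failure is the AssertionError at schema.py line 57. That assertion requires every element of the command list (`self.cmd`) to be a non-empty str. The sketch below re-implements just that condition so you can try it in isolation. `find_bad_args` is a hypothetical helper. The sample command lists are made up for illustration and are not the command ArchiveBox actually builds for wget.

```python
from pathlib import Path
from typing import Any, List, Tuple


def passes_typecheck(cmd: List[Any]) -> bool:
    # Same condition as the assertion in archivebox/index/schema.py, line 57
    return all(isinstance(arg, str) and arg for arg in cmd)


def find_bad_args(cmd: List[Any]) -> List[Tuple[int, Any]]:
    """Hypothetical helper: (index, value) for each element the check rejects."""
    return [(i, arg) for i, arg in enumerate(cmd) if not (isinstance(arg, str) and arg)]


if __name__ == "__main__":
    # Illustrative command lists only; not ArchiveBox's real wget invocation.
    ok = ["wget", "--timeout=120", "https://example.com/"]
    bad = ["wget", "--load-cookies", Path("/tmp/cookies.txt"), "", "https://example.com/"]
    print(passes_typecheck(ok))   # True
    print(passes_typecheck(bad))  # False
    # Flags index 2 (a Path, not a str) and index 3 (an empty string)
    print(find_bad_args(bad))
```

An empty string or a non-string value (such as a pathlib.Path) anywhere in the list is enough to fail the check. The traceback alone doesn't show which element triggered it.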
I'm running an up-to-date Arch Linux install (updated this morning to try to fix this), and I installed ArchiveBox via pip; the version is v0.5.3.
wget version (the default from the Arch repos):
$ wget --version
GNU Wget 1.20.3 built on linux-gnu.
-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls
+ntlm +opie +psl +ssl/gnutls
Wgetrc:
/etc/wgetrc (system)
Locale:
/usr/share/locale
Compile:
gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc"
-DLOCALEDIR="/usr/share/locale" -I. -I../lib -I../lib
-D_FORTIFY_SOURCE=2 -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS
-DNDEBUG -march=x86-64 -mtune=generic -O2 -pipe -fno-plt
Link:
gcc -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS -DNDEBUG
-march=x86-64 -mtune=generic -O2 -pipe -fno-plt
-Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -lpcre2-8 -luuid
-lidn2 -lnettle -lgnutls -lz -lpsl ftp-opie.o gnutls.o http-ntlm.o
../lib/libgnu.a /usr/lib/libunistring.so
Unfortunately, I'm not familiar enough with Python to debug this myself, or even to tell whether it's a bug in ArchiveBox or a config or dependency issue. Is there something obvious I should be looking at?
Any help would be awesome, thanks!