Question: Adding any options to [ARCHIVE_METHOD_OPTIONS] causes wget to fail #611

@winteriscariot

If I add anything to the [ARCHIVE_METHOD_OPTIONS] section it causes wget to throw an exception. Here is my current ArchiveBox.conf:

[GENERAL_CONFIG]
TIMEOUT = 120

[SERVER_CONFIG]
SECRET_KEY = <super_secret>

[ARCHIVE_METHOD_TOGGLES]
SAVE_ARCHIVE_DOT_ORG = FALSE
SAVE_PDF = FALSE
SAVE_SCREENSHOT = FALSE
SAVE_MEDIA = FALSE

[ARCHIVE_METHOD_OPTIONS]
COOKIES_FILE = /home/winteriscariot/cookies.txt
WGET_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"

[DEPENDENCY_CONFIG]
CHROME_BINARY = /usr/bin/chromium
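One way to check what values a parser actually sees for the [ARCHIVE_METHOD_OPTIONS] section is to load it with Python's standard `configparser`. This is only a sketch: it assumes the file is read as a plain INI file, which this post does not confirm, and it inlines just the relevant section rather than reading the real ArchiveBox.conf.

```python
import configparser

# Only the section under suspicion, copied from the config above.
CONF_TEXT = """
[ARCHIVE_METHOD_OPTIONS]
COOKIES_FILE = /home/winteriscariot/cookies.txt
WGET_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep option names upper-case instead of lower-casing them
parser.read_string(CONF_TEXT)

# Every value comes back as a plain string; repr() makes stray quotes visible.
options = dict(parser["ARCHIVE_METHOD_OPTIONS"])
for name, value in options.items():
    print(f"{name} = {value!r}")
```

With `configparser`, the double quotes around the user agent are kept as part of the value, so it may be worth trying the option without quotes as well.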

If I have just one option (such as COOKIES_FILE) it still fails. It does NOT fail if I remove both COOKIES_FILE and WGET_USER_AGENT from ArchiveBox.conf. With the above config I get the following exception, thrown about halfway through the wget process:

$ archivebox add https://www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released/
[i] [2021-01-10 20:20:08] ArchiveBox v0.5.3: archivebox add https://www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released/
    > /mnt/storage/Archive

[+] [2021-01-10 20:20:08] Adding 1 links to index (crawl depth=0)...
    > Saved verbatim input to sources/1610310008-import.txt
    > Parsed 1 URLs from input (Plain Text)                                                                                                                                                 
    > Found 1 new URLs not already in index

[*] [2021-01-10 20:20:08] Writing 1 links to main index...
    √ /mnt/storage/Archive/index.sqlite3                                                                                                                                                    

[▶] [2021-01-10 20:20:08] Starting archiving of 1 snapshots in index...

[+] [2021-01-10 20:20:08] "www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released"
    https://www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released/
    > ./archive/1610310008.636074
      > title
      > favicon                                                                                                                                                                             
      > wget                                                                                                                                                                                
    ! Failed to archive link: Exception: Exception in archive_methods.save_wget(Link(url=https://www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released/))               

Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/archivebox/extractors/__init__.py", line 108, in archive_link
    result = method_function(link=link, out_dir=out_dir)
  File "/usr/lib/python3.9/site-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/archivebox/extractors/wget.py", line 115, in save_wget
    return ArchiveResult(
  File "<string>", line 12, in __init__
  File "/usr/lib/python3.9/site-packages/archivebox/index/schema.py", line 46, in __post_init__
    self.typecheck()
  File "/usr/lib/python3.9/site-packages/archivebox/index/schema.py", line 57, in typecheck
    assert all(isinstance(arg, str) and arg for arg in self.cmd)
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/archivebox", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.9/site-packages/archivebox/cli/__init__.py", line 129, in main
    run_subcommand(
  File "/usr/lib/python3.9/site-packages/archivebox/cli/__init__.py", line 69, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/usr/lib/python3.9/site-packages/archivebox/cli/archivebox_add.py", line 85, in main
    add(
  File "/usr/lib/python3.9/site-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/archivebox/main.py", line 593, in add
    archive_links(new_links, overwrite=False, **archive_kwargs)
  File "/usr/lib/python3.9/site-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/archivebox/extractors/__init__.py", line 173, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/usr/lib/python3.9/site-packages/archivebox/util.py", line 112, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/archivebox/extractors/__init__.py", line 122, in archive_link
    raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format(
Exception: Exception in archive_methods.save_wget(Link(url=https://www.ghacks.net/2021/01/10/password-manager-keepass-2-47-has-been-released/))
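The assertion at the bottom of the first traceback (`all(isinstance(arg, str) and arg for arg in self.cmd)`) passes only if every element of the command list is a non-empty string. The sketch below applies the same check to a made-up command list to show which kinds of values would trip it. `find_invalid_args` and the example `cmd` are hypothetical; the real wget command that ArchiveBox builds is not shown in this post.

```python
from pathlib import Path


def find_invalid_args(cmd):
    """Return (index, value) pairs that would fail the typecheck assertion."""
    return [
        (index, arg)
        for index, arg in enumerate(cmd)
        if not (isinstance(arg, str) and arg)
    ]


# Hypothetical command list: a Path object stands in for an option value
# that was not converted to a string before being added to the command.
cmd = [
    "wget",
    "--load-cookies",
    Path("/home/winteriscariot/cookies.txt"),
    "https://example.com",
]

bad = find_invalid_args(cmd)
print(bad)
```

Any entry reported here (a non-string such as a `Path`, `None`, or an empty string) would be enough to raise the bare `AssertionError` seen above.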

I'm running an up-to-date Arch Linux install (updated this morning to try to fix this). I installed ArchiveBox via pip, and it is version 0.5.3.

wget version (just the default from the Arch repos):

$ wget --version
GNU Wget 1.20.3 built on linux-gnu.

-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls 
+ntlm +opie +psl +ssl/gnutls 

Wgetrc: 
    /etc/wgetrc (system)
Locale: 
    /usr/share/locale 
Compile: 
    gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/etc/wgetrc" 
    -DLOCALEDIR="/usr/share/locale" -I. -I../lib -I../lib 
    -D_FORTIFY_SOURCE=2 -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS 
    -DNDEBUG -march=x86-64 -mtune=generic -O2 -pipe -fno-plt 
Link: 
    gcc -I/usr/include/p11-kit-1 -DHAVE_LIBGNUTLS -DNDEBUG 
    -march=x86-64 -mtune=generic -O2 -pipe -fno-plt 
    -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -lpcre2-8 -luuid 
    -lidn2 -lnettle -lgnutls -lz -lpsl ftp-opie.o gnutls.o http-ntlm.o 
    ../lib/libgnu.a /usr/lib/libunistring.so 

Unfortunately I'm not familiar enough with Python to debug this myself, or even to tell whether this is a bug in ArchiveBox or a config or dependency issue. Is there something obvious that I should be looking at?

Any help would be awesome, thanks!
