Bug: Exception in archive_methods.save_readability due to bytes string being passed to hint #706

Closed
Valporaena opened this issue Apr 15, 2021 · 4 comments
Labels: size: easy · status: wip (work is in progress / has already been partially completed) · type: bug report

@Valporaena

I'm encountering the same problem that @jrruethe described some time ago. It seemed to have been solved, but it reappeared on my setup after I installed the latest update and ran the archivebox setup command.

  1. Ran archivebox update (several times; it reproduces)
  2. On a specific link it crashes, giving the following output:
[√] [2021-04-15 10:56:49] "The Long War on Objectivity       | The New Republic"
    https://newrepublic.com/article/158497/long-war-objectivity
    √ ./archive/1617309812.979884
      > readability
    ! Failed to archive link: Exception: Exception in archive_methods.save_readability(Link(url=https://newrepublic.com/article/158497/long-war-objectivity))

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/archivebox/extractors/__init__.py", line 114, in archive_link
    log_archive_method_finished(result)
  File "/usr/lib/python3/dist-packages/archivebox/logging_util.py", line 435, in log_archive_method_finished
    hints = hints if isinstance(hints, (list, tuple)) else hints.split('\n')
TypeError: a bytes-like object is required, not 'str'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/archivebox", line 11, in <module>
    load_entry_point('archivebox==0.6.2', 'console_scripts', 'archivebox')()
  File "/usr/lib/python3/dist-packages/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/usr/lib/python3/dist-packages/archivebox/cli/__init__.py", line 80, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
  File "/usr/lib/python3/dist-packages/archivebox/cli/archivebox_update.py", line 119, in main
    update(
  File "/usr/lib/python3/dist-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/main.py", line 783, in update
    archive_links(to_archive, overwrite=overwrite, **archive_kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/extractors/__init__.py", line 181, in archive_links
    archive_link(to_archive, overwrite=overwrite, methods=methods, out_dir=Path(link.link_dir))
  File "/usr/lib/python3/dist-packages/archivebox/util.py", line 114, in typechecked_function
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/archivebox/extractors/__init__.py", line 130, in archive_link
    raise Exception('Exception in archive_methods.save_{}(Link(url={}))'.format(
Exception: Exception in archive_methods.save_readability(Link(url=https://newrepublic.com/article/158497/long-war-objectivity))
ArchiveBox v0.6.2
Cpython Linux Linux-5.4.0-71-generic-x86_64-with-glibc2.29 x86_64
IN_DOCKER=False DEBUG=False IS_TTY=True TZ=UTC SEARCH_BACKEND_ENGINE=ripgrep

[i] Dependency versions:
 √  ARCHIVEBOX_BINARY     v0.6.2          valid     /usr/bin/archivebox                                                         
 √  PYTHON_BINARY         v3.8.5          valid     /usr/bin/python3.8                                                          
 √  DJANGO_BINARY         v2.2.12         valid     /usr/lib/python3/dist-packages/django/bin/django-admin.py                   
 √  CURL_BINARY           v7.68.0         valid     /usr/bin/curl                                                               
 √  WGET_BINARY           v1.20.3         valid     /usr/bin/wget                                                               
 √  NODE_BINARY           v10.19.0        valid     /usr/bin/node                                                               
 √  SINGLEFILE_BINARY     v0.3.16         valid     ./node_modules/single-file/cli/single-file                                  
 √  READABILITY_BINARY    v0.0.2          valid     ./node_modules/readability-extractor/readability-extractor                  
 √  MERCURY_BINARY        v1.0.0          valid     ./node_modules/@postlight/mercury-parser/cli.js                             
 √  GIT_BINARY            v2.25.1         valid     /usr/bin/git                                                                
 -  YOUTUBEDL_BINARY      -               disabled  /usr/bin/youtube-dl                                                         
 √  CHROME_BINARY         v89.0.4389.114  valid     /usr/bin/chromium-browser                                                   
 √  RIPGREP_BINARY        v11.0.2         valid     /usr/bin/rg                                                                 

[i] Source-code locations:
 √  PACKAGE_DIR           23 files        valid     /usr/lib/python3/dist-packages/archivebox                                   
 √  TEMPLATES_DIR         3 files         valid     /usr/lib/python3/dist-packages/archivebox/templates                         
 -  CUSTOM_TEMPLATES_DIR  -               disabled                                                                              

[i] Secrets locations:
 -  CHROME_USER_DATA_DIR  -               disabled                                                                              
 -  COOKIES_FILE          -               disabled                                                                              

[i] Data locations:
 √  OUTPUT_DIR            14 files        valid     /home/.../archivebox                                                     
 √  SOURCES_DIR           27 files        valid     ./sources                                                                   
 √  LOGS_DIR              1 files         valid     ./logs                                                                      
 √  ARCHIVE_DIR           9024 files      valid     ./archive                                                                   
 √  CONFIG_FILE           291.0 Bytes     valid     ./ArchiveBox.conf                                                           
 √  SQL_INDEX             105.7 MB        valid     ./index.sqlite3             
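
For readers following the first traceback above: the TypeError comes from calling `.split('\n')` with a `str` separator on a value that is `bytes`. The sketch below reproduces that error in isolation and shows one possible defensive normalization. The helper name `normalize_hints` is hypothetical; this is not ArchiveBox's code, and it is not necessarily what the eventual fix does.

```python
def normalize_hints(hints):
    """Hypothetical helper: return hints as a list of str lines.

    Accepts bytes, str, or a list/tuple of either, and decodes bytes
    before splitting so that a str separator can be used safely.
    """
    items = list(hints) if isinstance(hints, (list, tuple)) else [hints]
    lines = []
    for item in items:
        if isinstance(item, bytes):
            # Decode leniently; extractor output may not be clean UTF-8.
            item = item.decode("utf-8", errors="replace")
        lines.extend(str(item).split("\n"))
    return lines


# Reproduce the failure from the traceback: bytes.split() with a str separator.
try:
    b"first hint\nsecond hint".split("\n")
except TypeError as err:
    reproduced_error = str(err)

# The normalized version handles the same input without raising.
fixed = normalize_hints(b"first hint\nsecond hint")
```

Decoding with `errors="replace"` is a design choice for log output, where showing a slightly mangled hint is preferable to crashing the whole update run.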
@pirate
Member

pirate commented Apr 15, 2021

It's a different error, unrelated, but thanks for reporting, I'll fix it.

@pirate added the labels size: easy, type: bug report, and status: wip on Apr 15, 2021
@pirate changed the title from "Bug: Exception: Exception in archive_methods.save_readability - again" to "Bug: Exception in archive_methods.save_readability due to bytes string being passed to hint" on Apr 15, 2021
@Valporaena
Author

Oh, my bad. Looked very similar, but I'm completely untrained in these things - shouldn't have presumed it was related.

@pirate
Member

pirate commented May 10, 2022

I think I fixed it in d581a50. If you still see this issue in the next release, comment back and I'll reopen it.

@pirate closed this as completed on May 10, 2022
@mike-greenmmd

Is there a plan to merge this into main / master?
I'm getting this exact issue using the docker compose method of running archivebox:
https://raw.githubusercontent.com/ArchiveBox/ArchiveBox/master/docker-compose.yml
which is pulling the master branch
