Skip to content

ValueError: URL contains control codes "... /big<br />\n/2013/ ..." #369

@SpiritQuaddicted

Description

@SpiritQuaddicted

Scraping some site I ended up with this error when wpull tried to --convert-links after downloading:

INFO Converting links in file ‘scrubbed’ (type=html).
ERROR Fatal exception.
Traceback (most recent call last):
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/application/app.py", line 152, in run
    yield from pipeline.process()
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 194, in process
    yield from self._process_one_worker()
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
    task.result()
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 119, in process
    item = yield from self.process_one(_worker_id=worker_id)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 103, in process_one
    yield from task.process(item)
  File "/usr/lib/python3.6/asyncio/coroutines.py", line 210, in coro
    res = func(*args, **kw)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/application/tasks/conversion.py", line 69, in process
    converter.convert_by_record(session.url_record)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/converter.py", line 97, in convert_by_record
    filename, temp_filename, base_url=url_record.url)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/converter.py", line 161, in convert
    self._convert_element(element, is_xhtml=is_xhtml)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/converter.py", line 181, in _convert_element
    new_value = self._convert_plain(link_info)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/converter.py", line 230, in _convert_plain
    url_info = URLInfo.parse(url, encoding=self._encoding)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/url.py", line 128, in parse
    raise ValueError('URL contains control codes: {}'.format(ascii(url)))
ValueError: URL contains control codes: 'http://i53.fastpic.ru/big<br />\n/2013/scrubbed/scrubbed/scrubbed.jpg'
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.
INFO Exiting with status 1.

This is most likely a malformed URL from a messageboard.

$ python --version
Python 3.6.2
$ ~/.local/bin/wpull3 --version
2.0.1

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions