While scraping a site, wpull crashed with the following error when it ran the --convert-links step after downloading:
INFO Converting links in file ‘scrubbed’ (type=html).
ERROR Fatal exception.
Traceback (most recent call last):
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/application/app.py", line 152, in run
    yield from pipeline.process()
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 194, in process
    yield from self._process_one_worker()
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
    task.result()
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 119, in process
    item = yield from self.process_one(_worker_id=worker_id)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 103, in process_one
    yield from task.process(item)
  File "/usr/lib/python3.6/asyncio/coroutines.py", line 210, in coro
    res = func(*args, **kw)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/application/tasks/conversion.py", line 69, in process
    converter.convert_by_record(session.url_record)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/converter.py", line 97, in convert_by_record
    filename, temp_filename, base_url=url_record.url)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/converter.py", line 161, in convert
    self._convert_element(element, is_xhtml=is_xhtml)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/converter.py", line 181, in _convert_element
    new_value = self._convert_plain(link_info)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/converter.py", line 230, in _convert_plain
    url_info = URLInfo.parse(url, encoding=self._encoding)
  File "/home/scrubbed/.local/lib/python3.6/site-packages/wpull/url.py", line 128, in parse
    raise ValueError('URL contains control codes: {}'.format(ascii(url)))
ValueError: URL contains control codes: 'http://i53.fastpic.ru/big<br />\n/2013/scrubbed/scrubbed/scrubbed.jpg'
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.
INFO Exiting with status 1.
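The cause is most likely a malformed URL in a message board post: the link text contains an embedded <br /> tag followed by a newline, which URLInfo.parse rejects as control codes. A single bad link like this aborts the whole link conversion step instead of being skipped.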
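One possible way to handle this, sketched below, is to leave a link unchanged when it fails to parse rather than letting the ValueError propagate. This is only an illustration and not wpull's actual code: has_control_codes, convert_link, and strict_parse are hypothetical names, and strict_parse is a stand-in for a strict URL parser that raises ValueError on malformed input.

```python
import re

# ASCII C0 control characters plus DEL. This imitates the kind of check
# that produced the error above; it is not wpull's implementation.
_CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')


def has_control_codes(url):
    """Return True if the URL contains ASCII control characters."""
    return bool(_CONTROL_CHARS.search(url))


def strict_parse(url):
    """Hypothetical strict parser: reject URLs with control characters."""
    if has_control_codes(url):
        raise ValueError(
            'URL contains control codes: {}'.format(ascii(url)))
    return url


def convert_link(url, parse=strict_parse):
    """Return the parsed link, or the original text if parsing fails."""
    try:
        return parse(url)
    except ValueError:
        # Keep the malformed link as-is instead of aborting the whole run.
        return url


bad = 'http://i53.fastpic.ru/big<br />\n/2013/scrubbed/scrubbed/scrubbed.jpg'
print(has_control_codes(bad))    # the newline counts as a control code
print(convert_link(bad) == bad)  # malformed link is passed through
```

With this approach the converter would still rewrite every well-formed link and would simply skip the broken one, which seems preferable to exiting with status 1 after the download has already finished.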
$ python --version
Python 3.6.2
$ ~/.local/bin/wpull3 --version
2.0.1