UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long) #82

Closed
ivan opened this issue Mar 21, 2014 · 5 comments

Contributor

ivan commented Mar 21, 2014

Running !a http://www.haskell.org/pipermail/haskell/ in #archivebot resulted in this:

Starting GetItemFromQueue for Item 
ERROR Fatal exception.
Traceback (most recent call last):
  File "/usr/lib/python3.4/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 176, in _process_input
    yield self._process_url_item(url_item)
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 520, in run
    next = self.yield_point.get_result()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 409, in get_result
    return self.runner.pop_result(self.key).result()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 129, in result
    raise_exc_info(self.__exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 574, in inner
    self.set_result(key, result)
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 500, in set_result
    self.run()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 529, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/engine.py", line 203, in _process_url_item
    yield self._processor.process(url_item)
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 520, in run
    next = self.yield_point.get_result()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 409, in get_result
    return self.runner.pop_result(self.key).result()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 129, in result
    raise_exc_info(self.__exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 574, in inner
    self.set_result(key, result)
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 500, in set_result
    self.run()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 529, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/hook.py", line 227, in process
    raise tornado.gen.Return((yield session.process()))
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 520, in run
    next = self.yield_point.get_result()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 409, in get_result
    return self.runner.pop_result(self.key).result()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 129, in result
    raise_exc_info(self.__exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 574, in inner
    self.set_result(key, result)
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 500, in set_result
    self.run()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 529, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/processor.py", line 241, in process
    is_done = yield self._process_one()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 520, in run
    next = self.yield_point.get_result()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 409, in get_result
    return self.runner.pop_result(self.key).result()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/concurrent.py", line 129, in result
    raise_exc_info(self.__exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 574, in inner
    self.set_result(key, result)
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 500, in set_result
    self.run()
  File "/home/archivebot/.local/lib/python3.4/site-packages/tornado/gen.py", line 531, in run
    yielded = self.gen.send(next)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/processor.py", line 291, in _process_one
    is_done = self._handle_response(response)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/hook.py", line 291, in _handle_response
    return super()._handle_response(response)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/processor.py", line 392, in _handle_response
    return self._handle_document(response)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/processor.py", line 407, in _handle_document
    self._scrape_document(self._request, response)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/hook.py", line 326, in _scrape_document
    super()._scrape_document(request, response)
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/processor.py", line 478, in _scrape_document
    scraper, scrape_info
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/processor.py", line 523, in _process_scrape_info
    if self._should_fetch_reason(url_info, url_record)[0]:
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/hook.py", line 248, in _should_fetch_reason
    record_info_dict = url_record.to_dict()
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/database.py", line 117, in to_dict
    'url_info': self.url_info.to_dict(),
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/database.py", line 100, in url_info
    return URLInfo.parse(self.url, encoding=self.url_encoding or 'utf8')
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/url.py", line 148, in parse
    cls.normalize_hostname(url_split_result.hostname),
  File "/home/archivebot/.local/lib/python3.4/site-packages/wpull/url.py", line 158, in normalize_hostname
    return hostname.encode('idna').decode('ascii')
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)
Finished WgetDownload for Item 
chfoo added the bug label Mar 21, 2014
chfoo self-assigned this Mar 22, 2014
chfoo added a commit that referenced this issue Mar 22, 2014
chfoo removed their assignment Mar 27, 2014

nsapa commented Mar 30, 2014

On the likeness.com job:

ERROR Fatal exception.
Traceback (most recent call last):
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/engine.py", line 181, in _process_input
    yield self._process_url_item(url_item)
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 520, in run
    next = self.yield_point.get_result()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 409, in get_result
    return self.runner.pop_result(self.key).result()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/concurrent.py", line 129, in result
    raise_exc_info(self.__exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 574, in inner
    self.set_result(key, result)
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 500, in set_result
    self.run()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 529, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/engine.py", line 211, in _process_url_item
    yield self._processor.process(url_item)
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 520, in run
    next = self.yield_point.get_result()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 409, in get_result
    return self.runner.pop_result(self.key).result()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/concurrent.py", line 129, in result
    raise_exc_info(self.__exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 574, in inner
    self.set_result(key, result)
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 500, in set_result
    self.run()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 529, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/hook.py", line 234, in process
    raise tornado.gen.Return((yield session.process()))
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 520, in run
    next = self.yield_point.get_result()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 409, in get_result
    return self.runner.pop_result(self.key).result()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/concurrent.py", line 129, in result
    raise_exc_info(self.__exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 574, in inner
    self.set_result(key, result)
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 500, in set_result
    self.run()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 529, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/processor.py", line 246, in process
    is_done = yield self._process_one()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 520, in run
    next = self.yield_point.get_result()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 409, in get_result
    return self.runner.pop_result(self.key).result()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/concurrent.py", line 129, in result
    raise_exc_info(self.__exc_info)
  File "<string>", line 3, in raise_exc_info
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/stack_context.py", line 302, in wrapped
    ret = fn(*args, **kwargs)
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 574, in inner
    self.set_result(key, result)
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 500, in set_result
    self.run()
  File "/home/archivebot/.local/lib/python3.2/site-packages/tornado/gen.py", line 531, in run
    yielded = self.gen.send(next)
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/processor.py", line 296, in _process_one
    is_done = self._handle_response(response)
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/hook.py", line 298, in _handle_response
    return super()._handle_response(response)
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/processor.py", line 400, in _handle_response
    return self._handle_document(response)
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/processor.py", line 415, in _handle_document
    self._scrape_document(self._request, response)
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/hook.py", line 333, in _scrape_document
    super()._scrape_document(request, response)
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/processor.py", line 486, in _scrape_document
    scraper, scrape_info
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/processor.py", line 531, in _process_scrape_info
    if self._should_fetch_reason(url_info, url_record)[0]:
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/hook.py", line 255, in _should_fetch_reason
    record_info_dict = url_record.to_dict()
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/item.py", line 95, in to_dict
    'url_info': self.url_info.to_dict(),
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/item.py", line 78, in url_info
    return URLInfo.parse(self.url, encoding=self.url_encoding or 'utf8')
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/url.py", line 148, in parse
    cls.normalize_hostname(url_split_result.hostname),
  File "/home/archivebot/.local/lib/python3.2/site-packages/wpull/url.py", line 158, in normalize_hostname
    return hostname.encode('idna').decode('ascii')
  File "/usr/lib/python3.2/encodings/idna.py", line 167, in encode
    result.extend(ToASCII(label))
  File "/usr/lib/python3.2/encodings/idna.py", line 73, in ToASCII
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

Contributor Author

ivan commented Mar 30, 2014

About a minute into ~/.local/bin/wpull --delete-after --page-requisites -r --span-hosts https://likeness.com/p/keubydeZPcs/Maono_Seattle, I see:

INFO Fetched ‘https://foursquare.com/item/5143b884e4b0da66d2a62413’: 200 OK. Length: None [text/html; charset=utf-8].
ERROR Fatal exception.
Traceback (most recent call last):
  File "/usr/lib/python3.4/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/at/.local/lib/python3.4/site-packages/wpull/engine.py", line 178, in _process_input
    url_info = URLInfo.parse(url_record.url, encoding=url_encoding)
  File "/home/at/.local/lib/python3.4/site-packages/wpull/url.py", line 158, in parse
    cls.normalize_hostname(url_split_result.hostname),
  File "/home/at/.local/lib/python3.4/site-packages/wpull/url.py", line 168, in normalize_hostname
    return hostname.encode('idna').decode('ascii')
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)
INFO FINISHED.
INFO Duration: 0:00:48. Speed: 65.9 KiB/s.
INFO Downloaded: 50 files, 3.0 MiB.
INFO Exiting with status 2.

Contributor Author

ivan commented Mar 30, 2014

With --debug:

DEBUG Found URLs: inline=8 linked=6
DEBUG Marking URL https://likeness.com/signup status done.
DEBUG End session for URLRecord(url='https://likeness.com/signup', status='in_progress', try_count=0, level=1, top_url='https://likeness.com/p/keubydeZPcs/Maono_Seattle', status_code=None, referrer='https://likeness.com/p/keubydeZPcs/Maono_Seattle', inline=None, link_type='html', url_encoding='utf-8', post_data=None, filename=None) URLInfo(scheme='https', netloc='likeness.com', path='/signup', query=None, fragment='', username=None, password=None, hostname='likeness.com', port=443, raw='https://likeness.com/signup', encoding='utf-8').
DEBUG Get next URL todo.
DEBUG Return record URLRecord(url='http://maono.springhillnorthwest.c.../', status='in_progress', try_count=0, level=1, top_url='https://likeness.com/p/keubydeZPcs/Maono_Seattle', status_code=None, referrer='https://likeness.com/p/keubydeZPcs/Maono_Seattle', inline=None, link_type='html', url_encoding='utf-8', post_data=None, filename=None).
ERROR Fatal exception.
Traceback (most recent call last):
  File "/usr/lib/python3.4/encodings/idna.py", line 165, in encode
    raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/at/.local/lib/python3.4/site-packages/wpull/engine.py", line 178, in _process_input
    url_info = URLInfo.parse(url_record.url, encoding=url_encoding)
  File "/home/at/.local/lib/python3.4/site-packages/wpull/url.py", line 158, in parse
    cls.normalize_hostname(url_split_result.hostname),
  File "/home/at/.local/lib/python3.4/site-packages/wpull/url.py", line 168, in normalize_hostname
    return hostname.encode('idna').decode('ascii')
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)
DEBUG Stopping. force=True
DEBUG Stopping. force=True
INFO FINISHED.

Member

chfoo commented Mar 30, 2014

I forgot about Unicode decomposition:

>>> import wpull.url
>>> wpull.url.URLInfo.parse('http://maono.springhillnorthwest.c…')
URLInfo(scheme='http', netloc='maono.springhillnorthwest.c…', path='/', query=None, fragment='', username=None, password=None, hostname='maono.springhillnorthwest.c...', port=80, raw='http://maono.springhillnorthwest.c…', encoding='utf-8')

I think this might be a bug in Python.
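
To make the decomposition step concrete, here is a minimal sketch using only the standard library. It assumes that the trailing character in the hostname above is U+2026 HORIZONTAL ELLIPSIS, which is what the output suggests; the `example.c` hostname and the `idna_fails` helper are made up for illustration and are not part of wpull.

```python
import unicodedata

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS, a single code point

# NFKC (the normalization used by nameprep) turns the ellipsis into
# three ASCII full stops, so one label becomes "c" plus empty labels.
normalized = unicodedata.normalize("NFKC", "example.c" + ELLIPSIS)
print(repr(normalized))  # 'example.c...'


def idna_fails(hostname):
    """Return True if the stdlib 'idna' codec rejects hostname."""
    try:
        hostname.encode("idna")
    except UnicodeError:
        return True
    return False


# Re-encoding the already-normalized hostname hits the empty labels
# between the consecutive dots and is rejected.
print(idna_fails(normalized))      # True
print(idna_fails("example.com"))   # False
```

This would explain why the failure only shows up when a previously normalized URL is parsed a second time: the first pass may silently accept the ellipsis, and the second pass sees the three dots.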

Member

chfoo commented Mar 30, 2014

I reported the bug at http://bugs.python.org/issue21103. I can easily work around this issue though.
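
For readers hitting the same error, one possible shape for such a workaround is sketched below. This is not the change that went into wpull; it is a hypothetical stand-in for `normalize_hostname` that assumes the caller is prepared to catch `ValueError` and skip the URL instead of letting the `UnicodeError` abort the whole job.

```python
def normalize_hostname(hostname):
    """Return the IDNA (ASCII) form of hostname or raise ValueError.

    Hypothetical sketch: the name mirrors the method in the traceback,
    but the body is an assumption, not wpull's actual code.
    """
    try:
        ascii_name = hostname.encode("idna").decode("ascii")
    except UnicodeError as error:
        raise ValueError("invalid hostname: {!r}".format(hostname)) from error

    # Re-check the encoded result, because a character such as U+2026
    # can be normalized into dots and leave empty labels behind.
    labels = ascii_name.split(".")
    if labels and labels[-1] == "":
        labels = labels[:-1]  # tolerate a single trailing root dot
    if not labels or any(not 0 < len(label) < 64 for label in labels):
        raise ValueError("invalid hostname: {!r}".format(hostname))
    return ascii_name


if __name__ == "__main__":
    for candidate in ("example.com", "example.c\u2026"):
        try:
            print(normalize_hostname(candidate))
        except ValueError as error:
            print("skipped:", error)
```

Validating the output as well as catching the exception means the bad hostname is rejected on the first parse, regardless of whether the codec itself accepts the ellipsis.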
