exceptions with bad parsing #9
[D 120826 11:32:46 existing:67] Q0 getting content for 4c3edf3a8229cd http://www.dafont.com/
[D 120827 11:13:41 existing:67] Q0 getting content for 4c3edf3a8229cd http://www.dafont.com/
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "scripts/readability/existing.py", line 65, in fetch_content
read = ReadUrl.parse(url)
File "/home/rharding/src/bookie/bookie/lib/readable.py", line 171, in parse
if not document.readable:
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/readable.py", line 426, in readable
return tounicode(self._readable)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/readable.py", line 431, in _readable
if self.candidates:
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/readable.py", line 419, in candidates
doc = self.doc
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/readable.py", line 409, in doc
doc = self.orig.html
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/utils.py", line 55, in __get__
value = self.fget(inst)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/document.py", line 93, in html
return self._parse(self.orig_html)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/document.py", line 80, in _parse
doc = build_doc(html)
File "/home/rharding/src/bookie/local/lib/python2.7/site-packages/breadability/document.py", line 54, in build_doc
page_unicode = page.decode(enc, 'replace')
TypeError: decode() argument 1 must be string, not None
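The last frame suggests that `enc` was None when `page.decode(enc, 'replace')` ran, i.e. no encoding was detected for this page. A minimal sketch of one possible guard follows. The helper name `decode_page` and the UTF-8 fallback are assumptions for illustration and are not taken from breadability's code:

```python
def decode_page(page, enc=None, fallback="utf-8"):
    """Decode raw page bytes to text, tolerating a missing encoding.

    Hypothetical helper: `enc` stands for whatever the encoding
    detection step returned, which may be None.
    """
    if not isinstance(page, bytes):
        # Already text, nothing to decode.
        return page
    # Fall back to an assumed default when no encoding was detected,
    # and replace undecodable bytes instead of raising.
    return page.decode(enc or fallback, "replace")
```

With a guard like this, a page with no detectable encoding would be decoded as UTF-8 with replacement characters rather than raising the TypeError above. Whether UTF-8 is the right default for this case is a judgment call.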