Page (possibly UTF content) makes plugins/web/titleSnarfer breaks (Uncaught exception) #1359
Comments
Hi, this is caused by a bug in Python 2's […]. You should also upgrade to Python 3 if possible. Thanks for the report.
Hi there. The page source declares 'meta charset="UTF-8"', yet the ".encode('utf-8')" change still failed. I searched but couldn't find a Python 2 function that retrieves the encoding of a URL from its source (like requests.get(url).encoding in Python 3) to investigate further. What I have now changes this block:
To:
As I said, I had to add a new import to Limnoria, lxml.html :( Thanks!
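For readers looking for a way to read the declared encoding out of the page source itself, here is a minimal Python 3 sketch using only the standard library. It is not the Web plugin's code: the names `sniff_charset` and `decode_page` are hypothetical, and the regular expression is a simplification that only looks for a `charset=` value inside a `<meta>` tag near the top of the document.

```python
import re

# Simplified pattern (an assumption, not a full HTML parser): finds
# charset=... inside a <meta ...> tag, with or without quotes.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def sniff_charset(raw, default="utf-8"):
    """Return the charset declared in a <meta> tag, or `default`."""
    match = _META_CHARSET.search(raw[:4096])  # declarations sit near the top
    if match is None:
        return default
    return match.group(1).decode("ascii")


def decode_page(raw):
    """Decode page bytes with the declared charset, falling back to UTF-8."""
    charset = sniff_charset(raw)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The page declared a codec name Python does not know.
        return raw.decode("utf-8", errors="replace")
```

Decoding once, right after the download, means everything handed to the parser afterwards is text rather than a mix of bytes and text.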
Could you post the new error message?
There's […]
Hi ProgVal.
With the new version of the code block I'm testing, I got "AttributeError: 'NoneType' object has no attribute 'text'" in "title = utils.str.normalizeWhitespace(pageparse.find(".//title").text)", but this one is easier to understand and fix. Thanks!
The Web plugin already does... And everything we're dealing with here is either ASCII or UTF-8, so the problem's not there. I'm sorry, but I don't know what to do other than upgrading to Python 3.
Nice! Thanks for the explanation.
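The AttributeError quoted above is what happens when `find(".//title")` matches nothing and returns None. A minimal sketch of guarding against that follows. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml.html (an assumption made so the example is self-contained; ElementTree only accepts well-formed markup), and `extract_title` is a hypothetical helper, not part of Limnoria.

```python
import xml.etree.ElementTree as ET


def extract_title(markup):
    """Return the whitespace-normalized <title> text, or None if absent."""
    root = ET.fromstring(markup)
    node = root.find(".//title")
    # find() returns None when nothing matches, and .text is None for an
    # empty element, so check both before touching the string.
    if node is None or node.text is None:
        return None
    return " ".join(node.text.split())
```

For example, `extract_title("<html><body/></html>")` returns None instead of raising, so the caller can decide what to do with a page that has no title.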
Backport fixes for the Web plugin [1][2][3]. [1] progval/Limnoria#1371 [2] progval/Limnoria#1362 [3] progval/Limnoria#1359 Submitted by: DanDare (GitHub: Rodrigo-NH, via IRC) git-svn-id: svn+ssh://svn.freebsd.org/ports/head@513446 35697150-7ecd-e111-bb59-0022644237b5
Hi. While trying to find out why titleSnarfer won't return the page title for http://lastsummer.de/creating-custom-packages-on-freebsd, I found this in the logs:
In this block from plugins/Web/plugin.py:
It seems that 'text' is the whole page to be parsed by HTMLParser. Anyway, changing line 166 (in my copy of plugin.py) from
parser.feed(text)
to
parser.feed(text.encode('utf-8'))
fixed the problem for this specific page, while other pages (as far as I tested) keep working as usual.
I can't tell what the underlying problem is or how relevant it might be; I'm reporting it here in case this example is useful.
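For comparison with the `.encode('utf-8')` workaround described above: in Python 3, `html.parser.HTMLParser.feed` takes text (str), so the usual pattern is to decode the downloaded bytes once and feed the resulting string. The sketch below illustrates that pattern only. `TitleParser` and `snarf_title` are hypothetical names, and this is not the code in plugins/Web/plugin.py.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element it sees."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def snarf_title(raw, charset="utf-8"):
    """Decode page bytes once, then parse text; return the title or None."""
    text = raw.decode(charset, errors="replace")
    parser = TitleParser()
    parser.feed(text)
    parser.close()
    title = " ".join("".join(parser.parts).split())
    return title or None
```

Here the `charset` parameter defaults to UTF-8, which is an assumption; a real caller would pass whatever encoding the server or the page declares.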
The current (running) version of this Limnoria is installed on 2018-12-22T04-00-46, running on Python 2.7.15 (default, Dec 20 2018, 01:13:53) [GCC 4.2.1 Compatible FreeBSD Clang 6.0.0 (tags/RELEASE_600/final 326565)]. The newest versions available online are 2019.01.27 (in testing), 2018.12.19 (in master).
Salute!