[Parse lxml ERR] line 1045: Tag nav invalid
[Article parse ERR] http://www.cnet.com/products/apple-ipad-march-2012/
You must download and parse an article before parsing it!
Traceback (most recent call last):
  File "crawler.py", line 30, in <module>
    a.nlp()
  File "/root/.virtualenvs/cnet-crawler/local/lib/python2.7/site-packages/newspaper/article.py", line 276, in nlp
    raise ArticleException()
newspaper.article.ArticleException
I'm not using the concurrent version, and I'm not building a newspaper from a URL. Instead, I have a list of all the articles and I build a new Article from each of them.
I've seen this error before. In my experience, it occurs when the HTML you are trying to parse is too malformed for lxml to handle (here the error is complaining about a <nav> tag).
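One way to keep a crawl going past pages like this is to catch the exception per article instead of letting it stop the whole script. The following is only a minimal sketch, assuming each item exposes `download()`, `parse()`, `nlp()` and a `url` attribute the way newspaper's `Article` does. The names `process_articles` and `exc_type` are hypothetical and not part of newspaper.

```python
def process_articles(articles, exc_type=Exception):
    """Run download/parse/nlp on each article, collecting failures.

    `articles` is any iterable of Article-like objects (assumption: each has
    download(), parse(), nlp() and a `url` attribute).
    `exc_type` is the exception class to treat as "skip this article".
    Returns (ok, failed), where failed is a list of (url, error) pairs.
    """
    ok, failed = [], []
    for article in articles:
        try:
            article.download()
            article.parse()
            # In the traceback above, this is the call that raises when the
            # earlier parse step did not succeed.
            article.nlp()
        except exc_type as err:
            # Record the failure and move on to the next article.
            failed.append((article.url, err))
        else:
            ok.append(article)
    return ok, failed
```

With newspaper, this could probably be called as `process_articles([Article(u) for u in urls], exc_type=ArticleException)`, importing `ArticleException` from `newspaper.article` as shown in the traceback. The `failed` list then tells you which URLs lxml could not handle, so you can inspect them separately.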