You must download and parse an article before parsing it #52

bitliner · 2014-05-30T13:48:36Z

Here is the stack trace:

[Parse lxml ERR] line 1045: Tag nav invalid
[Article parse ERR] http://www.cnet.com/products/apple-ipad-march-2012/
You must download and parse an article before parsing it!
Traceback (most recent call last):
  File "crawler.py", line 30, in <module>
    a.nlp()
  File "/root/.virtualenvs/cnet-crawler/local/lib/python2.7/site-packages/newspaper/article.py", line 276, in nlp
    raise ArticleException()
newspaper.article.ArticleException

I'm not using the concurrent version, and I'm not building a newspaper from a URL. Instead, I have a list of all the article URLs and build a new Article from each one.
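One way to stop a single bad page from aborting the whole run is to process each URL independently and record the failures. The sketch below assumes a crawler that loops over a list of URLs. `process_urls` and `make_article` are hypothetical names, not part of newspaper, and the broad `except` is a deliberate choice for this sketch rather than anything the library prescribes:

```python
from typing import Callable, Iterable, List, Protocol, Tuple


class ArticleLike(Protocol):
    """Anything exposing the download/parse/nlp steps, e.g. newspaper's Article."""

    def download(self) -> None: ...
    def parse(self) -> None: ...
    def nlp(self) -> None: ...


def process_urls(
    urls: Iterable[str],
    make_article: Callable[[str], ArticleLike],
) -> Tuple[List[ArticleLike], List[Tuple[str, str]]]:
    """Hypothetical helper: run download -> parse -> nlp on each URL in order.

    A failure in any step is recorded and the loop moves on, so one
    malformed page does not abort the whole crawl.
    """
    done: List[ArticleLike] = []
    failed: List[Tuple[str, str]] = []
    for url in urls:
        try:
            article = make_article(url)
            article.download()
            article.parse()
            # newspaper's nlp() raises ArticleException when parse() did not
            # succeed, which is the error in the traceback above.
            article.nlp()
        except Exception as exc:  # broad on purpose: record and skip this URL
            failed.append((url, repr(exc)))
            continue
        done.append(article)
    return done, failed
```

With newspaper installed, `make_article` would simply be `newspaper.Article`, since its constructor takes the URL. The `failed` list then shows which pages (like the CNET one above) need a different parsing strategy.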


codelucas · 2014-05-31T19:03:37Z

I'll test this on my computer and get back to ya.

I've seen this error before. In my experience it occurs when the HTML you are trying to parse is much too "deformed" for lxml (here the error is complaining about a <nav> tag).

http://stackoverflow.com/questions/4967103/beautifulsoup-and-lxml-html-what-to-prefer

BeautifulSoup is preferred for non-well-formed HTML. (You have the option of parsing with either lxml or BeautifulSoup, but lxml is much faster.)
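The trade-off described here (a strict parser that chokes on malformed markup vs. a lenient one that tolerates it) can be sketched as "try strict first, fall back to lenient". This sketch assumes it is acceptable to swap in standard-library stand-ins: `xml.etree.ElementTree` plays the strict parser and `html.parser.HTMLParser` the forgiving one. Neither is lxml or BeautifulSoup, and this is not how newspaper itself selects a parser. `extract_text` and `_TextCollector` are hypothetical names:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Hypothetical lenient pass: collects text and does not raise on bad nesting."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())


def extract_text(markup: str) -> str:
    """Hypothetical helper: try a strict parser first, fall back to a lenient one."""
    try:
        root = ET.fromstring(markup)  # strict: unclosed or mismatched tags raise
        return " ".join(t.strip() for t in root.itertext() if t.strip())
    except ET.ParseError:
        collector = _TextCollector()
        collector.feed(markup)
        collector.close()  # flush any buffered text
        return " ".join(collector.chunks)
```

For example, `extract_text("<div><nav>Menu<p>Story text</div>")` fails the strict pass because `<nav>` and `<p>` are never closed, and the lenient pass still recovers the text.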

Casyfill · 2016-09-02T15:11:37Z

But how can I switch to the BeautifulSoup parser? I can't find any documentation on that.

cesarandreslopez · 2016-09-23T15:34:59Z

Same here. I'm not sure how to switch the parser to BeautifulSoup.

go2dmny · 2017-02-21T07:20:31Z

Any update? Also looking for a solution

codelucas closed this as completed Jun 14, 2014
