Feed retrieved by urllib2 is sometimes truncated #1

wettenhj opened this Issue May 17, 2013 · 0 comments


None yet
1 participant

wettenhj commented May 17, 2013

I have experienced truncation of feeds retrieved by urllib2 as described here:
and here:

The behaviour from feedparser's point of view is this:

Python 2.7.3 (v2.7.3:70274d53c1dd, Apr 9 2012, 20:32:06)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> import feedparser
>>> doc = feedparser.parse("")
>>> if doc.bozo:
...     raise doc.bozo_exception
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
xml.sax._exceptions.SAXParseException: <unknown>:137:14: unclosed token
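
For anyone trying to reproduce this without the original feed, here is a minimal sketch of how a response cut off in the middle of a tag produces the same kind of SAX error. The helper name `well_formedness_error` and the tiny Atom document are made up for illustration; only the standard-library `xml.sax` module is used, and the snippet should run on both Python 2.7 and Python 3.

```python
import xml.sax


def well_formedness_error(data):
    """Return None if `data` (bytes) parses as well-formed XML,
    otherwise return the parser's error message as a string."""
    try:
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return str(exc)
    return None


# Hypothetical stand-in for the real feed.
COMPLETE = b'<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title></feed>'

# Simulate truncation: keep everything up to and including "</ti",
# i.e. stop part-way through the closing </title> tag.
TRUNCATED = COMPLETE[:COMPLETE.index(b"</title>") + 4]

print(well_formedness_error(COMPLETE))
print(well_formedness_error(TRUNCATED))
```

The complete document parses cleanly, while the truncated one reports an "unclosed token" error, which is consistent with the body being cut short rather than the feed itself being malformed.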

The feed content that triggers the error above is generated dynamically by a Node.js application. If I instead serve the same feed content (saved as a static document) from an Apache web server, the problem does not occur, so it may be a timing issue, i.e. Node.js pausing part-way through serving the Atom feed. One timing issue which could affect urllib2 is case 3 in this question:

The truncation could be avoided by replacing use of the "urllib2" module in with use of the "requests" module, as described here:
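
As a rough sketch of that workaround from the caller's side (not a patch to feedparser itself), the feed can be downloaded with "requests" and the raw bytes handed to `feedparser.parse`. The function name `parse_feed_with_requests` and the `http_get` / `parse` parameters are hypothetical; they exist only so the fetch and parse steps can be swapped out when testing. Both `requests` and `feedparser` are third-party packages and are assumed to be installed.

```python
def parse_feed_with_requests(url, timeout=30, http_get=None, parse=None):
    """Fetch `url` with requests and parse the body with feedparser.

    `http_get` and `parse` are hypothetical hooks for testing; by default
    they resolve to requests.get and feedparser.parse.
    """
    if http_get is None:
        import requests  # third-party
        http_get = requests.get
    if parse is None:
        import feedparser  # third-party
        parse = feedparser.parse

    response = http_get(url, timeout=timeout)
    # Fail loudly on HTTP errors instead of parsing an error page.
    response.raise_for_status()
    # response.content is the full body as bytes; feedparser.parse
    # accepts the document itself as well as a URL.
    return parse(response.content)
```

Usage would be along the lines of `doc = parse_feed_with_requests(feed_url)` followed by the same `doc.bozo` check as in the session above. Whether this avoids the truncation in every case is an assumption based on the behaviour reported here, not something this sketch verifies.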