Feed retrieved by urllib2 is sometimes truncated #1

wettenhj opened this Issue May 17, 2013 · 0 comments


I have experienced truncation of feeds retrieved by urllib2 as described here:
and here:

The behaviour from feedparser's point of view is this:

Python 2.7.3 (v2.7.3:70274d53c1dd, Apr 9 2012, 20:32:06)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import feedparser
>>> doc = feedparser.parse("")
>>> if doc.bozo:
...     raise doc.bozo_exception
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
xml.sax._exceptions.SAXParseException: <unknown>:137:14: unclosed token

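To confirm that the fetched body itself is cut short, independently of feedparser, something like the following could be used. It is only a rough sketch: the helper name find_xml_error is made up for illustration, and it assumes the raw feed bytes have already been captured by whatever fetch method is under test.

```python
import xml.sax


def find_xml_error(feed_bytes):
    """Return None if feed_bytes is well-formed XML, else the parse error.

    find_xml_error is a hypothetical helper, not part of feedparser.
    A body that was cut off mid-element fails the well-formedness check.
    """
    try:
        # A bare ContentHandler ignores all content; we only care whether
        # the parser reaches the end of the document without a fatal error.
        xml.sax.parseString(feed_bytes, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc
    return None


complete = b"<feed><entry><title>a</title></entry></feed>"
truncated = complete[:25]  # simulate a response that stopped part-way

print(find_xml_error(complete))
print(find_xml_error(truncated))
```

Running the same check on bytes fetched with urllib2 and on bytes fetched another way (e.g. curl) should show whether the truncation happens during retrieval or during parsing.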
The feed content that triggers the error above is generated dynamically by a Node.js application. If I instead save the same feed content to a static document and serve it from an Apache web server, the problem goes away. So it may be a timing issue, e.g. Node.js pausing part-way through serving the Atom feed. One timing issue which could affect urllib2 is case 3 in this question:

The truncation could be avoided by replacing the use of the "urllib2" module in feedparser.py with the "requests" module, as described here:


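Until feedparser.py itself changes, a possible caller-side workaround is to fetch the body with requests and hand the bytes to feedparser.parse, which accepts feed content as well as a URL. This is a minimal sketch, not a patch to feedparser: the function name parse_feed_via_requests and the http_get/parser parameters are assumptions added here so the fetch and parse steps can be swapped out, and the 30-second timeout is an arbitrary choice.

```python
def parse_feed_via_requests(url, timeout=30, http_get=None, parser=None):
    """Fetch a feed with requests, then parse the downloaded bytes.

    Hypothetical helper: http_get and parser default to requests.get and
    feedparser.parse, and exist only so they can be replaced in tests.
    """
    if http_get is None:
        import requests  # third-party; imported lazily
        http_get = requests.get
    if parser is None:
        import feedparser  # third-party; imported lazily
        parser = feedparser.parse

    response = http_get(url, timeout=timeout)
    # Fail loudly on HTTP errors instead of parsing an error page as a feed.
    response.raise_for_status()
    # response.content is the full body as bytes; feedparser handles the
    # encoding detection itself, so the bytes are passed through unchanged.
    return parser(response.content)
```

The result can then be checked exactly as in the session above, i.e. by testing doc.bozo and inspecting doc.bozo_exception.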