An RSS/Atom feed parsing layer for lxml.objectify in Python

Feedreader

A universal feed parser designed to operate on top of the lxml interface.

This is a VERY rough readme, and this project is very early in development. It is, however, used to power Lifestrm.com.

Our mission was simple:

  • Don't write an XML parser (we use lxml)
  • Keep it transparent, but allow easy access to underlying objects.
  • Support as many services as possible, and make accessing their media easy.

Features

  • RSS 2.0 (incl. media enclosures)
  • Atom 1.0 (incl. link enclosures)
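
The two formats carry attachments differently: RSS 2.0 uses an `<enclosure>` element inside each item, while Atom 1.0 uses a `<link rel="enclosure">` element inside each entry. The sketch below shows where that data lives. It uses only the standard library rather than feedreader or lxml, and the sample feeds and the helper names `rss_enclosures` and `atom_enclosures` are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feeds, trimmed to the parts that matter here.
RSS_SAMPLE = """<rss version="2.0"><channel>
  <title>Example</title>
  <item>
    <title>Episode 1</title>
    <enclosure url="http://www.domain.com/ep1.mp3" type="audio/mpeg" length="1234"/>
  </item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <entry>
    <title>Episode 1</title>
    <link rel="alternate" href="http://www.domain.com/ep1"/>
    <link rel="enclosure" href="http://www.domain.com/ep1.ogg" type="audio/ogg"/>
  </entry>
</feed>"""


def rss_enclosures(xml_text):
    """Return (url, type) pairs for every RSS <enclosure> element."""
    root = ET.fromstring(xml_text)
    return [(e.get("url"), e.get("type")) for e in root.iter("enclosure")]


def atom_enclosures(xml_text):
    """Return (href, type) pairs for every Atom link with rel="enclosure"."""
    root = ET.fromstring(xml_text)
    return [
        (link.get("href"), link.get("type"))
        for link in root.iter(ATOM_NS + "link")
        if link.get("rel") == "enclosure"
    ]


print(rss_enclosures(RSS_SAMPLE))
print(atom_enclosures(ATOM_SAMPLE))
```

Note that Atom elements live in the `http://www.w3.org/2005/Atom` namespace, so lookups must be namespace-qualified, whereas core RSS 2.0 elements have no namespace.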

Installation

Usage

There are several ways to parse a feed, depending on whether you have a URL, a string, or an open file:

from feedreader.parser import from_url
parsed = from_url('http://www.domain.com/rss.xml')

from feedreader.parser import from_string
with open('my.rss', 'r') as f:
    parsed = from_string(f.read())

from feedreader.parser import from_file
with open('my.rss', 'r') as f:
    parsed = from_file(f)

Once a feed has been parsed, you can access the supported elements through a natural property syntax:

>>> parsed.title
My feed title
>>> parsed.link
http://www.domain.com/rss.xml
>>> parsed.published
datetime.datetime(2009, 8, 13, 2, 53, 11, 867908)
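
Returning a datetime for published implies some date normalization, because RSS 2.0 dates follow RFC 822 while Atom 1.0 timestamps follow RFC 3339. The following is a minimal sketch of how such normalization could be done with the standard library in Python 3. The function name `parse_feed_date` is hypothetical, and this is not necessarily how feedreader does it; in particular, this sketch returns timezone-aware values, whereas the output shown above is a naive datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Parse an RSS (RFC 822) or Atom (RFC 3339) date string.

    Hypothetical helper for illustration only.
    """
    text = text.strip()
    try:
        # RSS 2.0 style, e.g. "Thu, 13 Aug 2009 02:53:11 GMT"
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Atom 1.0 style, e.g. "2009-08-13T02:53:11.867908Z".
        # Older Python versions do not accept a trailing "Z",
        # so rewrite it as an explicit UTC offset first.
        parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if parsed.tzinfo is not None:
        # Normalize aware values to UTC so results are comparable.
        parsed = parsed.astimezone(timezone.utc)
    return parsed


print(parse_feed_date("Thu, 13 Aug 2009 02:53:11 GMT"))
print(parse_feed_date("2009-08-13T02:53:11.867908Z"))
```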

For the entries in a feed, you may use the entries accessor:

>>> parsed.entries
[<Entry ...>, <Entry ...>, <Entry ...>]

Each entry supports similar common attributes:

>>> parsed.entries[0].title
My Article Name
>>> parsed.entries[0].link
http://www.domain.com/my-article-name

In keeping with our goal of allowing access to the underlying XML, feedreader is a simple proxy. While we provide accessors for many attributes that are common across feeds, you can still reach any other XML element just as easily:

>>> parsed.myUnsupportedXMLTag
(Fill me in with whatever lxml would return)
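
The proxy idea can be illustrated with a small standalone sketch. This is not feedreader's actual code: the class name `ElementProxy` and the sample XML are assumptions, and it uses the standard library's ElementTree in Python 3 instead of lxml.objectify (which already exposes child elements as attributes). It shows a curated accessor taking precedence, with every other attribute falling through to the underlying XML element.

```python
import xml.etree.ElementTree as ET


class ElementProxy:
    """Hypothetical wrapper; not feedreader's actual class."""

    def __init__(self, element):
        self._element = element

    @property
    def title(self):
        # Curated accessor: return cleaned-up text for a common field.
        return self._element.findtext("title", default="").strip()

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so the
        # curated properties above always take precedence.
        if name.startswith("_"):
            # Avoid recursing on private names such as _element.
            raise AttributeError(name)
        child = self._element.find(name)
        if child is None:
            raise AttributeError(name)
        # Hand back the raw XML element, untouched.
        return child


SAMPLE = """<channel>
  <title> My feed title </title>
  <myUnsupportedXMLTag flavor="custom">raw value</myUnsupportedXMLTag>
</channel>"""

feed = ElementProxy(ET.fromstring(SAMPLE))
print(feed.title)                              # curated accessor
print(feed.myUnsupportedXMLTag.text)           # raw element text
print(feed.myUnsupportedXMLTag.get("flavor"))  # raw element attribute
```

The design choice here is that the wrapper never hides anything: tags without a dedicated accessor are still reachable by name, and what comes back is the parser's own element object.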