<meta name="author" content="Jeff Hodges (http://somethingsimilar.com)"><title>Universal Feed Parser in Ruby</title></head><body>
<h2>Introduction to rFeedParser, the Universal Feed Parser in Ruby</h2>
<p>rFeedParser is a translation of Mark Pilgrim's <a href="http://feedparser.org/">Universal Feed Parser</a> from Python into Ruby. It has nearly the exact same behavior.</p>
<p><pre> require 'rubygems'
 require 'rfeedparser'
 fp = FeedParser.parse("some-feed-filepath-or-url")
 fp['entries'].each{ |e| puts e['title'] }
</pre></p>
- <p><strong>Notes on Time/Date Storage:</strong>
-The storage of dates in the *_parsed items (such as, updated_parsed,
-created_parsed, etc.) are in the Python 9-tuple format. While this is
-great for passing the date tests, Ruby has no good way of understanding
-these things. As of version <code>0.9.94</code>, there is now a "_time" key
-next to every key containing a Python datetime 9-tuple that contains a Ruby
-Time object. So, if you see <code>updated</code> or
-<code>updated_parsed</code>, you can also find a <code>updated_time</code>
-right next door. The use of the <code>py2rtime</code> is deprecated and no
-longer accessible from the top level. You can still find it at
-<code>FeedParser.py2rtime</code></p>
- <p>More documentation can be found over at the <a href="http://feedparser.org/docs/">Feed Parser documentation</a> site.</p>
+ <p><strong>Notes on Time/Date Storage:</strong> Dates in the
+ *_parsed items (such as updated_parsed and created_parsed)
+ are stored in the Python 9-tuple format. While
+ this is great for passing the date tests, Ruby has no good way
+ of understanding these things. As of version
+ <code>0.9.94</code>, every key containing a Python datetime
+ 9-tuple has a "_time" key next to it that contains a Ruby
+ Time object. So, if you see <code>updated</code> or
+ <code>updated_parsed</code>, you can also find an
+ <code>updated_time</code> right next door.
+ <code>py2rtime</code> is deprecated and no longer accessible
+ from the top level. You can still find it at
+ <code>FeedParser.py2rtime</code>.</p>
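<p>To make the relationship between a <code>*_parsed</code> 9-tuple and its <code>*_time</code> neighbor concrete, here is a minimal sketch using only the Ruby standard library. The helper name <code>nine_tuple_to_time</code> and the sample entry are hypothetical, and the sketch assumes the tuple is expressed in UTC; it is not rFeedParser's own <code>py2rtime</code> implementation.</p>

```ruby
require 'time'

# Hypothetical helper, not part of rFeedParser: converts a Python-style
# 9-tuple (year, month, day, hour, minute, second, weekday, yearday, isdst)
# into a Ruby Time. Assumes the tuple is expressed in UTC.
def nine_tuple_to_time(tuple)
  year, month, day, hour, minute, second = tuple.first(6)
  Time.utc(year, month, day, hour, minute, second)
end

# A stand-in for one parsed entry; the values here are made up.
entry = { 'updated_parsed' => [2007, 5, 1, 12, 30, 45, 1, 121, 0] }

# Mirror the "_time" convention: store a Time next to the 9-tuple.
entry['updated_time'] = nine_tuple_to_time(entry['updated_parsed'])

puts entry['updated_time'].iso8601
```

<p>Only the first six fields are needed to build the <code>Time</code>; the weekday, day-of-year and DST flag can be derived from the resulting object.</p>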
+ <p>More documentation can be found over at the <a
+ href="http://feedparser.org/docs/">Feed Parser
+ documentation</a> site.</p>
- <p>There is one extra thing. You can subclass <code>StrictFeedParser</code> or <code>LooseFeedParser</code> and pass it to <code>FeedParser#parse</code> with the keys <code>:strict</code> and <code>:loose</code> and your subclass will be used to parse the feeds in its respective case.</p>
+ <p>There is one extra thing. You can subclass
+ <code>StrictFeedParser</code> or <code>LooseFeedParser</code>
+ and pass it to <code>FeedParser#parse</code> under the key
+ <code>:strict</code> or <code>:loose</code>, and your subclass
+ will be used to parse feeds in the respective case.</p>
- <p>The latest release can is <a href="http://rubyforge.org/frs/?group_id=3309">0.9.94</a> and you can track new releases, send bug reports or just yell at me belligerently over at the <a href="http://rubyforge.org/projects/rfeedparser/">rFeedParser project page</a>.</p>
+ <p>The latest release is <a
+ href="http://rubyforge.org/frs/?group_id=3309">0.9.94</a> and
+ you can track new releases, send bug reports or just yell at
+ me belligerently over at the <a
+ href="http://rubyforge.org/projects/rfeedparser/">rFeedParser
+ project page</a>.</p>
<h2>Dependencies and Installation</h2>
- <p>Currently, rFeedParser is a bit of a Frankenstein's monster. It depends on <a href="http://code.whytheluckystiff.net/hpricot/">Hpricot</a>, <a href="http://rubyforge.org/projects/char-encodings/">Character-Encodings</a>, <a href="http://as.rubyonrails.com/">ActiveSupport</a>, <a href="http://rubyforge.org/projects/ruby-htmltools">HTMLTools</a>, <a href="http://htmlentities.rubyforge.org/">HTMLEntities</a>, and the not-quite-easy-to-get <a href="http://www.yoshidam.net/Ruby.html#xmlparser">Ruby bindings</a> of the <a href="http://expat.sourceforge.net/">Expat XML Parser</a>.
-The first four are easy to install (they all come as gems, and the
-dependencies are built in to the rfeedparser gem), while the Expat
-bindings are less so. If you are on a Ubuntu or Debian
-system, you can simply install the <code>libxml-parser-ruby1.8</code> package. Others will have to download the source code from <a href="http://www.yoshidam.net/Ruby.html#xmlparser">Yoshida Masato's site</a> and compile it yourself (Sorry). I do have a gem for <a href="http://rubyforge.org/projects/xmlparser">xmlparser</a> which includes some patches from Debian, but there have been difficulties compiling it on certain systems, specifically some MacOSX 10.4 boxes. For some reason, it compiles fine on mine but not others. I'd love some help figuring out this Works for Me™ bug.</p>
+ <p>Currently, rFeedParser is a bit of a Frankenstein's
+ monster. It depends on <a
+ href="http://code.whytheluckystiff.net/hpricot/">Hpricot</a>,
+ <a href="http://rubyforge.org/projects/char-encodings/">Character-Encodings</a>,
+ <a href="http://as.rubyonrails.com/">ActiveSupport</a>, <a
+ href="http://rubyforge.org/projects/ruby-htmltools">HTMLTools</a>, and
+ <a href="http://htmlentities.rubyforge.org/">HTMLEntities</a>,
+ all available as gems.
+ In addition, bindings to a native XML parser are
+ recommended. The easiest thing to do is install the libxml2
+ bindings, which are available on Debian-based systems in the
+ <tt>libxml-ruby1.8</tt> package. But you can also use the
+ expat bindings in <tt>libxml-parser-ruby1.8</tt>. (confused
dependencies are all being used because they allow for (mostly) UTF-8
safe parsing and speed out the wazoo. rFeedParser runs at roughly the
@@ -58,7 +87,9 @@ the curious, rFeedParser's XML test cases are exactly the same XML
files that the original Universal Feed Parser is tested against. It
currently succeeds on 98.7% of the tests, with some of the failures
coming from "superficial" problems ("would pass if this little twiddly
-bit that doesn't mean anything was different"). I hope to correct these
+bit that doesn't mean anything was different"). Others are encoding-specific,
+so if you really care about getting those EBCDIC- or MacTurkish-encoded
+feeds right, please submit a patch. I hope to correct these
in the near future. I should point out here that this large of a
success rate over such a mature and complex testing environment makes
rFeedParser, by far, the most viable of the current crop of Ruby feed