Provides an interface on top of nokogiri to parse XML files like Apache Digester.

at-point/sawtooth

                                                               __
                                                   _____....--' .'
                                      _..___...---'._ o      -`(
                       _             | |  _         \   .--.  `\
    ___  __ ___      _| |_ ___   ___ | |_| |__      |   \   \ `|
   / __|/ _` \ \ /\ / / __/ _ \ / _ \| __| '_ \     |o o |  |  |
   \__ \ (_| |\ V  V /| || (_) | (_) | |_| | | |     \___'.-`.  '.
   |___/\__,_| \_/\_/  \__\___/ \___/ \__|_| |_|          |   `---'
  '^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^'

A companion for nokogiri to parse XML files by rules, similar to Apache Commons Digester.

Converting XML structures into Ruby is often an unsatisfying task: you have to choose between implementing a SAX parser (for speed) and using nokogiri features like CSS selectors (for ease of use). At its base, sawtooth parses documents using SAX, but it provides an interface for specifying rules that handle the document.

require 'open-uri'
require 'sawtooth'

rules = Sawtooth.rules do
  before { |doc| doc << [] }                  # 1. create an array for all news items

  on 'rss/channel/item' do
    on_start  { |doc| doc << Hash.new }       # 2. on an item create hash
    on_finish { |doc| doc.parent << doc.pop } # 3. when closing an item, pop from stack and
  end                                         #    append to parent array (from step 1.)

  on_text 'rss/channel/item/*'                # 4. add contents to hash
end

result = rules.parse(URI.open('http://rss.cnn.com/rss/edition.rss')).root
p result #=> [{ 'title' => 'Some CNN News...', 'guid' =>, ...}, ...]

This sample shows the DSL exposed to create the XML parsing rules for an RSS feed.
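
To make steps 1 to 4 of the sample more concrete, here is a minimal plain-Ruby sketch of the underlying idea: a stack of values driven by start, text and end events, with rules matched against the path of open elements. It is an illustration only and not sawtooth's implementation. The method name `collect_items`, the `ITEM_PATH` constant and the hand-written `events` array (standing in for what a SAX parser would emit) are all hypothetical.

```ruby
# Hypothetical, simplified model of rule-driven SAX handling.
# This is NOT sawtooth's code; names and event format are assumptions.
ITEM_PATH = %w[rss channel item].freeze

def collect_items(events)
  stack = [[]] # step 1: an array that will hold all news items
  path  = []   # names of the elements that are currently open

  events.each do |type, value|
    case type
    when :start
      path.push(value)
      stack.push({}) if path == ITEM_PATH # step 2: new hash per item
    when :text
      # step 4: text directly inside a child of rss/channel/item
      inside_item_child = path.size == ITEM_PATH.size + 1 &&
                          path.first(ITEM_PATH.size) == ITEM_PATH
      if inside_item_child
        key = path.last
        stack.last[key] = stack.last.fetch(key, '') + value
      end
    when :end
      if path == ITEM_PATH # step 3: pop the item, append to parent array
        item = stack.pop
        stack.last << item
      end
      path.pop
    end
  end

  stack.first
end

# Hand-written events for a tiny feed with two items.
events = [
  [:start, 'rss'], [:start, 'channel'],
  [:start, 'title'], [:text, 'Feed'], [:end, 'title'],
  [:start, 'item'],
  [:start, 'title'], [:text, 'First'], [:end, 'title'],
  [:start, 'guid'], [:text, 'a1'], [:end, 'guid'],
  [:end, 'item'],
  [:start, 'item'],
  [:start, 'title'], [:text, 'Second'], [:end, 'title'],
  [:end, 'item'],
  [:end, 'channel'], [:end, 'rss']
]

result = collect_items(events)
p result
```

The channel-level title is ignored because its path is not below `rss/channel/item`, so the result contains one hash per item with the element names as keys.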
