Tag: v0.2.0

Mida

Description

A Microdata parser and extractor library for Ruby. It is based on the latest published version of the Microdata Specification, dated 5th April 2011.

Installation

Mida keeps RubyGems up-to-date with its latest version, so installing is as easy as:

gem install mida

Requirements:

  • Nokogiri

Usage

The following examples assume that you have required mida and open-uri.

Extracting Microdata from a page

All the Microdata is extracted from a page when a new Mida::Document instance is created.

To extract all the Microdata from a webpage:

url = 'http://example.com'
# Assign the block's return value so that doc is still in scope afterwards
doc = open(url) {|f| Mida::Document.new(f, url)}

The top-level Items will be held in an array accessible via doc.items.

To simply list all the top-level Items that have been found:

puts doc.items

Searching

If you want to search for an Item that has a specific itemtype/vocabulary this can be done with the search method.

To return all the Items that use one of Google's Review vocabularies:

doc.search(%r{http://data-vocabulary\.org.*?review.*?}i)
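To get a feel for what that pattern accepts, the following sketch applies it to a few itemtype strings using plain Ruby, without Mida. The sample itemtypes and the helper name matching_itemtypes are made up for illustration; only the regular expression comes from the example above.

```ruby
# Keep only the itemtype strings that the pattern matches.
# (Hypothetical helper, not part of Mida.)
def matching_itemtypes(itemtypes, pattern)
  itemtypes.select { |itemtype| itemtype =~ pattern }
end

# The pattern used in the search example above.
review_pattern = %r{http://data-vocabulary\.org.*?review.*?}i

# Illustrative itemtype strings, chosen only to exercise the pattern.
sample_itemtypes = [
  'http://data-vocabulary.org/Review',
  'http://data-vocabulary.org/Review-aggregate',
  'http://data-vocabulary.org/Person',
  'http://example.com/review'
]

puts matching_itemtypes(sample_itemtypes, review_pattern)
```

Because the pattern is case-insensitive and unanchored at the end, it accepts any data-vocabulary.org itemtype that contains "review" anywhere after the host name.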

Inspecting an Item

Each Item is a Mida::Item instance and has three main methods of interest: type, properties and id.

To find out the itemtype of the Item:

puts doc.items.first.type

To find out the itemid of the Item:

puts doc.items.first.id

Properties are returned as a hash of name/value pairs. Each value is an array whose elements are either String or Mida::Item instances.

To see the properties of the Item:

puts doc.items.first.properties
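Since a property value can itself be an Item, walking the hash usually means recursing. The following is a minimal sketch of that, assuming only the shape described above. So that it runs without the gem, it uses an anonymous Struct as a stand-in for Mida::Item; the helper name describe_item and the sample data are hypothetical.

```ruby
# Return an array of indented lines describing an item and any nested items.
# Anything that responds to #properties is treated as an item; every other
# value is assumed to be a String. (Hypothetical helper, not part of Mida.)
def describe_item(item, depth = 0)
  indent = '  ' * depth
  lines = ["#{indent}#{item.type}"]
  item.properties.each do |name, values|
    values.each do |value|
      if value.respond_to?(:properties)
        lines << "#{indent}  #{name}:"
        lines.concat(describe_item(value, depth + 2))
      else
        lines << "#{indent}  #{name}: #{value}"
      end
    end
  end
  lines
end

# Stand-in for Mida::Item exposing only type, id and properties.
stand_in = Struct.new(:type, :id, :properties)

rating = stand_in.new('http://data-vocabulary.org/Rating', nil,
                      { 'value' => ['4'] })
review = stand_in.new('http://data-vocabulary.org/Review', nil,
                      { 'itemreviewed' => ['Example Cafe'],
                        'rating' => [rating] })

puts describe_item(review)
```

With real data the same helper could be called as describe_item(doc.items.first), provided the Item behaves as described above.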

Working with Vocabularies

Mida allows you to define vocabularies, so that input data can be constrained to match expected patterns. By default a generic vocabulary (Mida::Vocabulary::Generic) is registered which will match against any itemtype with any number of properties.

If you want to specify a vocabulary you create a class derived from Mida::VocabularyDesc and use itemtype, has_one, has_many and types to describe the vocabulary.

As an example the following describes a subset of Google's Review vocabulary:

class Review < Mida::VocabularyDesc
  itemtype %r{http://data-vocabulary\.org/review}
  has_one 'itemreviewed'
  has_one 'rating'
end

To register the above Vocabulary use:

Mida::Vocabulary.register(Review)

Now, if Mida is parsing some input and matches the Review itemtype, it will only allow the specified properties and will reject any that don't have the correct number of values. It will also set Item#vocabulary accordingly, e.g.

doc.items.first.vocabulary      # => Review
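The effect of such constraints can be pictured with a toy model in plain Ruby. This is not Mida's implementation, and it does not use Mida's API: the helper constrain_properties, the :one/:many symbols and the sample data are all hypothetical. It also assumes that has_one means exactly one value and has_many means one or more.

```ruby
# Toy model only, not Mida's code. spec maps each allowed property name to
# :one (assumed: exactly one value, like has_one) or :many (assumed: one or
# more values, like has_many). Properties that are not named in spec, or
# that have the wrong number of values, are dropped.
def constrain_properties(properties, spec)
  properties.select do |name, values|
    case spec[name]
    when :one  then values.length == 1
    when :many then values.length >= 1
    else false # property is not part of the vocabulary
    end
  end
end

# Mirrors the two has_one declarations in the Review example above.
review_spec = { 'itemreviewed' => :one, 'rating' => :one }

# Illustrative input: 'rating' has too many values, 'colour' is unknown.
raw_properties = {
  'itemreviewed' => ['Example Cafe'],
  'rating'       => ['4', '5'],
  'colour'       => ['red']
}

p constrain_properties(raw_properties, review_spec)
```

How Mida itself treats a property with too many values may differ in detail; the sketch only illustrates the general idea of filtering against a declared vocabulary.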

Bugs/Feature Requests

If you find a bug or want to make a feature request, please report it on the Mida project's issue tracker on GitHub.

License

Copyright © 2011 Lawrence Woodman. This software is licensed under the MIT License. Please see the file, LICENSE.rdoc, for details.
