Mida

Description

A Microdata parser and extractor library for Ruby. It is based on the latest published version of the Microdata Specification, dated 5th April 2011.

Installation

Mida keeps RubyGems up-to-date with its latest version, so installing is as easy as:

gem install mida

Requirements:

Nokogiri

Command Line Usage

To use the command line tool, supply it with the URLs or filenames that you would like to be parsed (by default each item is output as YAML):

mida http://lawrencewoodman.github.io/mida/news/

If you want to search for specific types, use the -t switch followed by a regular expression:

mida -t /person/i http://lawrencewoodman.github.io/mida/news/

For more information look at mida's help:

mida -h

Library Usage

The following examples assume that you have required mida and open-uri.

Extracting Microdata from a page

All the Microdata is extracted from a page when a new Mida::Document instance is created.

To extract all the Microdata from a webpage:

url = 'http://example.com'
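# Assign the block's result so that doc is still in scope afterwards.
# URI.open comes from open-uri; plain open no longer fetches URLs on Ruby 3.0+.
doc = URI.open(url) {|f| Mida::Document.new(f, url)}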

The top-level Items will be held in an array accessible via doc.items.

To simply list all the top-level Items that have been found:

puts doc.items

Searching

If you want to search for an Item that has a specific itemtype/vocabulary this can be done with the search method.

To return all the Items that use one of Google’s Review vocabularies:

doc.search(%r{http://data-vocabulary\.org.*?review.*?}i)
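
To get a feel for which itemtypes that pattern would select, the regular expression can be tried against a few strings in plain Ruby, without Mida. This is only a sketch: the itemtype URLs below are illustrative assumptions, not values taken from a real page, and the variable names are hypothetical.

```ruby
# The same pattern that is passed to doc.search above
review_pattern = %r{http://data-vocabulary\.org.*?review.*?}i

# Hypothetical itemtype values, chosen only for illustration
itemtypes = [
  'http://data-vocabulary.org/Review',
  'http://data-vocabulary.org/Review-aggregate',
  'http://data-vocabulary.org/Person',
  'http://schema.org/Review'
]

# Keep only the itemtypes that the pattern matches
matching = itemtypes.select { |type| type =~ review_pattern }
puts matching
```

Because of the i flag the match is case-insensitive, so both Review and Review-aggregate are selected, while itemtypes from other vocabularies or hosts are not.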

Inspecting an Item

Each Item is a Mida::Item instance and has four main methods of interest: type, vocabulary, properties and id.

To find out the itemtype of the Item:

puts doc.items.first.type

To find out the itemid of the Item:

puts doc.items.first.id

Properties are returned as a hash containing name/value pairs. Each value will be an array of either String or Mida::Item instances.

To see the properties of the Item:

puts doc.items.first.properties
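
Since a property value can itself be an Item, walking the hash usually means recursing into nested Items. The following is a minimal sketch of that idea. To stay runnable without the gem it uses a stand-in Struct (item_stub, an assumption) that only mimics the type and properties methods described above; the helper describe_properties and the sample data are hypothetical as well.

```ruby
# Stand-in for Mida::Item so the sketch runs without the gem (an assumption;
# it only mimics the type and properties methods described above)
item_stub = Struct.new(:type, :properties)

# Recursively collect "name: value" lines, indenting nested Items
def describe_properties(properties, indent = 0)
  lines = []
  properties.each do |name, values|
    values.each do |value|
      if value.respond_to?(:properties)
        # Nested Item: show its type, then descend into its properties
        lines << "#{' ' * indent}#{name}: (#{value.type})"
        lines.concat(describe_properties(value.properties, indent + 2))
      else
        # Plain String value
        lines << "#{' ' * indent}#{name}: #{value}"
      end
    end
  end
  lines
end

# Hypothetical sample data shaped like a properties hash
rating = item_stub.new('http://data-vocabulary.org/Rating', { 'value' => ['4'] })
review = item_stub.new(
  'http://data-vocabulary.org/Review',
  { 'itemreviewed' => ['Romeo Pizza'], 'rating' => [rating] }
)

report = describe_properties(review.properties)
puts report
```

With a real document, the same helper could presumably be given doc.items.first.properties instead of the stand-in data.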

Working with Vocabularies

Mida allows you to define vocabularies, so that input data can be constrained to match expected patterns. By default a generic vocabulary (Mida::GenericVocabulary) is registered which will match against any itemtype with any number of properties.

If you want to specify a vocabulary you create a class derived from Mida::Vocabulary. As an example the following describes a subset of Google’s Review vocabulary:

class Rating < Mida::Vocabulary
  itemtype %r{http://data-vocabulary.org/rating}i
  has_one 'best'
  has_one 'worst'
  has_one 'value'
end

class Review < Mida::Vocabulary
  itemtype %r{http://data-vocabulary.org/review}i
  has_one 'itemreviewed'
  has_one 'rating' do
    extract Rating, Mida::DataType::Text
  end
end

When you create a subclass of Mida::Vocabulary it automatically registers the Vocabulary.

Now if Mida is parsing some input and manages to match against the Review itemtype, it will only allow the specified properties and will reject any that don’t have the correct number. It will also set Item#vocabulary accordingly, e.g.

doc.items.first.vocabulary      # => Review

If you want to include the properties of another vocabulary you can use include_vocabulary:

class Thing < Mida::Vocabulary
  itemtype %r{http://example.com/vocab/thing}i
  has_one 'name', 'description'
end

class Book < Mida::Vocabulary
  itemtype %r{http://example.com/vocab/book}i
  include_vocabulary Thing
  has_one 'title', 'author'
end

class Collection < Mida::Vocabulary
  itemtype %r{http://example.com/vocab/collection}i
  has_many 'item' do
    extract Thing
  end
end

In the above, a Book given as an item of a Collection would be accepted, because Book includes the Thing vocabulary.

Bugs/Feature Requests

If you find a bug or want to make a feature request, please report it at the Mida project’s issue tracker on GitHub.
