Description: Fork of A Ruby Information Extraction Library
Homepage: http://rubyforge.org/projects/ariel
Clone URL: git://github.com/jashmenn/ariel.git
= Ariel release 0.1.0

== About - Ariel: A Ruby Information Extraction Library
Ariel is a library that allows you to extract information from semi-structured
documents (such as websites). It is different to existing tools because rather
than expecting the developer to write rules to extract the desired information,
Ariel will use a small number of labeled examples to generate and learn
effective extraction rules. It is developed by Alex Bradbury and released under
the MIT license. Ariel was started as a Google Summer of Code project mentored
by Austin Ziegler in 2006.

== Install
gem install ariel

== Announcement

I'm happy to announce the release of Ariel 0.1.0, the result of my Summer of
Code work. This release should be easy to use, very functional, and hopefully
useful - so it's worth trying out. I've put a lot of effort into writing clear
and straightforward documentation to get you started, so take a look at the
docs available at http://ariel.rubyforge.org. In particular, flick through the
tutorial and quick start guide. If you're interested, you may also want to take
a look at the theory page where I've made a good start on describing the method
Ariel uses to learn extraction rules. If you have any problems or find any bugs,
just send me an email or add it to the issue tracker (see link below). Enjoy.
See the FAQ for a vim snippet to make labeling examples a little easier.

== Quickstart/Basic usage

* <tt>require 'ariel'</tt>
* Define a structure for the information you wish to extract: 
    structure = Ariel::Node::Structure.new do |r|
      r.item :title
      r.item :body
      r.list :comments do |c|
        c.list_item :comment do |d|
          d.item :author
          d.item :body
        end
      end
    end
* Collect a few examples of the sort of document you wish to extract information
  from (pages from the same website for instance).
* Label each example with tags such as <l:title>, <l:comment> and so on in the
  relevant places.
* <tt>Ariel.learn structure, labeled_file1, labeled_file2, labeled_file3</tt>
* Find the documents you want to extract information from.
* <tt>extractions = Ariel.extract structure, unlabeled_file1, unlabeled_file2</tt>
* <tt>extractions[0].search('comments/*/body').each {|e| puts e.extracted_text}</tt> =>
  "Great stuff, loving it", "I love life", ...
* <tt>extractions[0].at('comments/34') => nil</tt> (there is no 34th comment; #at
  returns the first result rather than an array of matches).
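
To make the labeling step more concrete, here is a minimal sketch in plain Ruby of what a labeled example might look like for the structure defined above. It does not use Ariel and is not how Ariel learns rules. The sample page, the helper names +labeled_regions+ and +strip_labels+, and the closing <tt></l:...></tt> tags are all assumptions made for illustration; check the tutorial for the exact label syntax.

```ruby
# Hypothetical labeled page. The content and the closing </l:...> tags are
# assumptions for illustration only.
LABELED = <<~HTML
  <html><body>
  <h1><l:title>Ariel released</l:title></h1>
  <p><l:body>Version 0.1.0 is out.</l:body></p>
  <ul><l:comments>
  <li><l:comment><b><l:author>Ann</l:author></b>: <l:body>Great stuff, loving it</l:body></l:comment></li>
  <li><l:comment><b><l:author>Bob</l:author></b>: <l:body>I love life</l:body></l:comment></li>
  </l:comments></ul>
  </body></html>
HTML

# Return the text enclosed by each <l:NAME>...</l:NAME> pair in +text+.
# The match is non-greedy and may span several lines.
def labeled_regions(text, name)
  tag = Regexp.escape(name.to_s)
  text.scan(%r{<l:#{tag}>(.*?)</l:#{tag}>}m).flatten
end

# Remove every label tag, leaving the document as it looked before labeling.
def strip_labels(text)
  text.gsub(%r{</?l:\w+>}, '')
end

# Roughly the regions that a path like 'comments/*/body' refers to:
# the body inside each labeled comment.
comment_bodies = labeled_regions(LABELED, :comment).flat_map do |comment|
  labeled_regions(comment, :body)
end

puts labeled_regions(LABELED, :title).inspect
puts comment_bodies.inspect
```

Each label name matches an item declared in the structure, so the nesting of the tags mirrors the nesting of the <tt>r.item</tt> and <tt>r.list</tt> calls.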


== Credits
Ariel is developed by Alex Bradbury as a Google Summer of Code project under the
mentoring of Austin Ziegler.

== Links
SVN Repository: http://rubyforge.org/projects/ariel
Issue tracker: http://code.google.com/p/ariel/issues/
Documentation/homepage: http://ariel.rubyforge.org
RDoc: http://ariel.rubyforge.org/rdoc/