<?xml version="1.0" encoding="UTF-8"?>
<commit>
  <added type="array">
    <added>
      <filename>LICENSE</filename>
    </added>
    <added>
      <filename>ariel.gemspec</filename>
    </added>
    <added>
      <filename>bin/ariel</filename>
    </added>
    <added>
      <filename>examples/raa/labeled/highline.html</filename>
    </added>
  </added>
  <modified type="array">
    <modified>
      <diff>@@ -0,0 +1,98 @@
+= Ariel release 0.0.1
+
+== Install
+gem install ariel
+
+== Announcement
+This is the first public release of Ariel - A Ruby Information Extraction
+Library. See my previous post, ruby-talk:200140[http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/200140]
+for more background information. This release supports defining a tree document
+structure and learning rules to extract each node of this tree. Handling of list
+extraction and learning is not yet implemented, and is the next immediate
+priority. See the examples directory included in this release and below for
+discussion of the included examples. Rule learning is functional, and appears to
+work well, but many refinements are possible. Look out for more updates and
+new releases shortly.
+
+== About Ariel
+Ariel intends to assist in extracting information from semi-structured
+documents including (but not in any way limited to) web pages. Although you
+may use libraries such as Hpricot or Rubyful Soup, or even plain Regular
+Expressions to achieve the same goal, Ariel approaches the problem very
+differently. Ariel relies on the user labeling examples of the data they
+want to extract, and then finds patterns across several such labeled
+examples in order to produce a set of general rules for extracting this
+information from any similar document. It uses the MIT license.
+
+== Examples
+This release includes two examples in the examples directory (which should now
+be in the directory to which rubygems installed ariel). The first is the
+google_calculator directory (inspired by Justin Bailey's post to my Ariel
+progress report). The structure is very simple: a calculation is extracted from
+the page, and then the actual result is extracted from that calculation. Three
+labeled examples are included. Ariel reads each of these, tokenizes them,
+and extracts each label. Four sets of rules are learnt:
+1. Rules to locate the start of the calculation in the original document.
+2. Rules to locate the end of the calculation in the original document (applied
+   from the end of the document).
+3. Rules to locate the start of the result of the calculation from the
+   extracted calculation.
+4. Rules to locate the end of the result of the calculation from the extracted
+   calculation (applied from the end of the calculation).
+
+Take note of 3 and 4 - this is the advantage of treating a document as a tree in
+this way. Deeply nested elements can be located by generating a series of simple
+rules, rather than generating a rule with complexity that increases at each
+level. Sets of rules are generated because it may not be possible to generate a
+single rule that will catch all cases. A rule is found that matches as many of
+the examples as possible (and fails on the rest); the matched examples are then
+removed and a rule is found that matches as many of the rest as possible, and so on.
+When it comes to applying these learnt rules, the rules are applied in order
+until there is a rule that matches.
+
+To see this example for yourself just execute structure.rb in the
+examples/google_calculator directory to create a locally writable
+structure.yaml. Then do:
+  ariel -D -m learn -s structure.yaml -d /path/to/examples/google_calculator/labeled
+
+You'll have to wait a while (see my note about performance below). At the end,
+the learnt rules will be printed in YAML format, and structure.yaml will be
+updated to include these rules. Apply these learnt rules to some unlabeled
+documents by doing:
+  ariel -D -m extract -s structure.yaml -d /path/to/examples/google_calculator/unlabeled
+
+You should see the results of a successful extraction printed to your terminal,
+such as this one:
+
+  Results for unlabeled/2:
+  calculation: 3.5 U.S. dollars = 1.8486241 British pounds
+  result: 1.8486241 British pounds
+
+The second example (raa) learns rules using just 2 labeled examples. This is probably
+fewer than I'd recommend in most cases, but as it works... This example consists
+of project entries in the Ruby Application Archive. The structure of the page is
+very flat, so all rules are applied to the full page. Rules are learnt and
+applied as shown above. The structure.yaml files included in the examples
+directories already include rules generated by Ariel; use these if you just want
+to see extraction working.
+
+Note: The interface demonstrated by ariel above is not very flexible or
+friendly; it just serves as a demonstration for the moment.
+
+== Performance
+Generating rules takes quite a long time. It is always going to be an intensive
+operation, but there are some very simple and obvious improvements in efficiency
+that can be made. For a start, the rule candidate refining process currently
+re-applies the same rules over and over every time the remaining rule candidates
+are ranked. This is where most time is spent, and caching these should make a
+big difference. This will definitely be implemented. Other performance
+enhancements are surely possible, but my focus at this time is to get
+something that works.
+
+== Credits
+Ariel is developed by Alex Bradbury as a Google Summer of Code project under the
+mentoring of Austin Ziegler.
+
+== Links
+Watch my development through the Subversion repository at http://rubyforge.org/projects/ariel
+I've also just started using the tracker at http://code.google.com/p/ariel/</diff>
      <filename>README</filename>
    </modified>
    <modified>
      <diff>@@ -1,12 +1,12 @@
 --- &amp;id001 !ruby/object:Ariel::StructureNode 
 children: 
-  :short_description: !ruby/object:Ariel::StructureNode 
+  :version_history: !ruby/object:Ariel::StructureNode 
     children: {}
 
     meta: !ruby/object:OpenStruct 
       table: 
+        :name: :version_history
         :node_type: :not_list
-        :name: :short_description
     parent: *id001
     ruleset: !ruby/object:Ariel::RuleSet 
       end_rules: 
@@ -14,65 +14,64 @@ children:
         direction: :back
         landmarks: 
         - - &lt;/td&gt;
-        - - Category
-        - - &lt;/td&gt;
       start_rules: 
       - !ruby/object:Ariel::Rule 
         direction: :forward
         landmarks: 
         - - &lt;td&gt;
-  :current_version: !ruby/object:Ariel::StructureNode 
+        - - Versions
+        - - &lt;td&gt;
+  :short_description: !ruby/object:Ariel::StructureNode 
     children: {}
 
     meta: !ruby/object:OpenStruct 
       table: 
+        :name: :short_description
         :node_type: :not_list
-        :name: :current_version
     parent: *id001
     ruleset: !ruby/object:Ariel::RuleSet 
       end_rules: 
       - !ruby/object:Ariel::Rule 
         direction: :back
         landmarks: 
-        - - &lt;/p&gt;
-        - - table
-        - - &lt;/p&gt;
+        - - &lt;/td&gt;
+        - - Category
+        - - &lt;/td&gt;
       start_rules: 
       - !ruby/object:Ariel::Rule 
         direction: :forward
         landmarks: 
-        - - /
-        - - caption
-        - - /
-  :category: !ruby/object:Ariel::StructureNode 
+        - - &lt;td&gt;
+  :current_version: !ruby/object:Ariel::StructureNode 
     children: {}
 
     meta: !ruby/object:OpenStruct 
       table: 
+        :name: :current_version
         :node_type: :not_list
-        :name: :category
     parent: *id001
     ruleset: !ruby/object:Ariel::RuleSet 
       end_rules: 
       - !ruby/object:Ariel::Rule 
         direction: :back
         landmarks: 
-        - - &lt;/td&gt;
-        - - Status
-        - - &lt;/td&gt;
+        - - &lt;/p&gt;
+        - - table
+        - - &lt;/p&gt;
       start_rules: 
       - !ruby/object:Ariel::Rule 
         direction: :forward
         landmarks: 
-        - - &lt;td&gt;
-        - - &lt;td&gt;
+        - - /
+        - - caption
+        - - /
   :homepage: !ruby/object:Ariel::StructureNode 
     children: {}
 
     meta: !ruby/object:OpenStruct 
       table: 
-        :node_type: :not_list
         :name: :homepage
+        :node_type: :not_list
     parent: *id001
     ruleset: !ruby/object:Ariel::RuleSet 
       end_rules: 
@@ -89,36 +88,35 @@ children:
         - - &quot;&gt;&quot;
         - - rubyforge
         - - &quot;&gt;&quot;
-  :owner: !ruby/object:Ariel::StructureNode 
+  :category: !ruby/object:Ariel::StructureNode 
     children: {}
 
     meta: !ruby/object:OpenStruct 
       table: 
+        :name: :category
         :node_type: :not_list
-        :name: :owner
     parent: *id001
     ruleset: !ruby/object:Ariel::RuleSet 
       end_rules: 
       - !ruby/object:Ariel::Rule 
         direction: :back
         landmarks: 
-        - - &lt;/a&gt;
-        - - id
-        - - &lt;/a&gt;
+        - - &lt;/td&gt;
+        - - Status
+        - - &lt;/td&gt;
       start_rules: 
       - !ruby/object:Ariel::Rule 
         direction: :forward
         landmarks: 
-        - - &quot;&gt;&quot;
-        - - com
-        - - &quot;&gt;&quot;
+        - - &lt;td&gt;
+        - - &lt;td&gt;
   :name: !ruby/object:Ariel::StructureNode 
     children: {}
 
     meta: !ruby/object:OpenStruct 
       table: 
-        :node_type: :not_list
         :name: :name
+        :node_type: :not_list
     parent: *id001
     ruleset: !ruby/object:Ariel::RuleSet 
       end_rules: 
@@ -133,36 +131,36 @@ children:
         - - &quot;-&quot;
         - - RAA
           - &quot;-&quot;
-  :license: !ruby/object:Ariel::StructureNode 
+  :owner: !ruby/object:Ariel::StructureNode 
     children: {}
 
     meta: !ruby/object:OpenStruct 
       table: 
+        :name: :owner
         :node_type: :not_list
-        :name: :license
     parent: *id001
     ruleset: !ruby/object:Ariel::RuleSet 
       end_rules: 
       - !ruby/object:Ariel::Rule 
         direction: :back
         landmarks: 
-        - - &lt;/td&gt;
-        - - Dependency
-        - - &lt;/td&gt;
+        - - &lt;/a&gt;
+        - - id
+        - - &lt;/a&gt;
       start_rules: 
       - !ruby/object:Ariel::Rule 
         direction: :forward
         landmarks: 
-        - - &lt;td&gt;
-        - - License
-        - - &lt;td&gt;
-  :version_history: !ruby/object:Ariel::StructureNode 
+        - - &quot;&gt;&quot;
+        - - Owner
+        - - &quot;&gt;&quot;
+  :license: !ruby/object:Ariel::StructureNode 
     children: {}
 
     meta: !ruby/object:OpenStruct 
       table: 
+        :name: :license
         :node_type: :not_list
-        :name: :version_history
     parent: *id001
     ruleset: !ruby/object:Ariel::RuleSet 
       end_rules: 
@@ -170,14 +168,16 @@ children:
         direction: :back
         landmarks: 
         - - &lt;/td&gt;
+        - - Dependency
+        - - &lt;/td&gt;
       start_rules: 
       - !ruby/object:Ariel::Rule 
         direction: :forward
         landmarks: 
         - - &lt;td&gt;
-        - - Versions
+        - - License
         - - &lt;td&gt;
 meta: !ruby/object:OpenStruct 
   table: 
-    :node_type: :not_list
     :name: :root
+    :node_type: :not_list</diff>
      <filename>examples/raa/structure.yaml</filename>
    </modified>
    <modified>
      <diff>@@ -12,12 +12,11 @@ require 'ariel/example_document_loader'
 require 'ariel/rule_set'
 
 if $DEBUG
-  require 'breakpoint'
-  require 'logger'
+#  require 'logger'
 
-  DEBUGLOG = Logger.new(File.open('debug.log', 'wb'))
-  DEBUGLOG.datetime_format = &quot; \010&quot;
-  DEBUGLOG.progname = &quot;\010\010\010&quot;
+#  DEBUGLOG = Logger.new(File.open('debug.log', 'wb'))
+#  DEBUGLOG.datetime_format = &quot; \010&quot;
+#  DEBUGLOG.progname = &quot;\010\010\010&quot;
 
   def debug(message)
      p message
@@ -40,7 +39,7 @@ end
 #
 # When working with Ariel, your workflow might look something like this:
 # 1. Define a structure for the data you wish to extract. For example:
-#    
+#
 #     @structure = Ariel::StructureNode.new do |r|
 #       r.article do |a|
 #         a.title</diff>
      <filename>lib/ariel.rb</filename>
    </modified>
    <modified>
      <diff>@@ -52,7 +52,6 @@ module Ariel
       end
       tokenstream.rewind
       regex = self.label_regex(name.to_s)[re_index]
-      p regex
       debug &quot;Seeking #{name.to_s} of type #{type}&quot;
       nesting_level=0
       tokenstream.each do |token|</diff>
      <filename>lib/ariel/label_utils.rb</filename>
    </modified>
  </modified>
  <removed type="array">
    <removed>
      <filename>bin/ariel.rb</filename>
    </removed>
  </removed>
  <parents type="array">
    <parent>
      <id>5e45781e0c614a1f10793e60f852cdc6fd98769e</id>
    </parent>
  </parents>
  <author>
    <name>Alex Bradbury</name>
    <email>rforge @nospam@ tekcentral.org</email>
  </author>
  <url>http://github.com/jashmenn/ariel/commit/d17cf0f0d258c1d1334a8949d61cef43142bdfdf</url>
  <id>d17cf0f0d258c1d1334a8949d61cef43142bdfdf</id>
  <committed-date>2006-08-09T07:55:50-07:00</committed-date>
  <authored-date>2006-08-09T07:55:50-07:00</authored-date>
  <message>Readying for release, added extra labeled example for raa, moved ariel.rb-&gt;ariel (why did I give it the .rb extension in the first place?). Added LICENSE (MIT), wrote a gemspec.</message>
  <tree>3be692ed85158191617da0d0984b0c34329fba2a</tree>
  <committer>
    <name>Alex Bradbury</name>
    <email>rforge @nospam@ tekcentral.org</email>
  </committer>
</commit>
