Initial commit.

0 parents commit 2e581b7907540f2ebae032c1e8b1e53a9758d264 @aarongough committed Oct 15, 2010
Showing with 153 additions and 0 deletions.
  1. +55 −0 Readme.rdoc
  2. +37 −0 node_extensions.rb
  3. +30 −0 parser.rb
  4. +31 −0 sexp_parser.treetop
55 Readme.rdoc
@@ -0,0 +1,55 @@
+= Treetop S-Expression Parser
+
+This is a very small & simple demonstration parser built using {Treetop}[http://treetop.rubyforge.org/]. It is meant solely to teach the very basics of implementing a grammar in Treetop, not as an actual parser for S-Expressions (which is why it's not packaged as a gem, and why it has no tests).
+
+A real parser *should* improve in several main areas:
+
+* Test coverage: Unit tests are vital for parsers.
+* Performance: This example is extremely slow compared to simpler parsing methods; a real parser should be performant.
+* Error reporting: Even the best written parser is a complete pain if it doesn't have intelligent error reporting.
+
+=== The Core Grammar
+
+For easy reference the core grammar of the parser is reproduced here:
+
+ grammar Sexp
+
+ rule expression
+ space? '(' body ')' space? <Expression>
+ end
+
+ rule body
+ (expression / identifier / float / integer / string / space )* <Body>
+ end
+
+ rule integer
+ ('+' / '-')? [0-9]+ <IntegerLiteral>
+ end
+
+ rule float
+ ('+' / '-')? [0-9]+ (('.' [0-9]+) / ('e' [0-9]+)) <FloatLiteral>
+ end
+
+ rule string
+ '"' ([^"\\] / "\\" . )* '"' <StringLiteral>
+ end
+
+ rule identifier
+ [a-zA-Z\=\*] [a-zA-Z0-9_\=\*]* <Identifier>
+ end
+
+ rule space
+ [\s]+
+ end
+
+ end
+
+=== More Info
+
+For more info visit the {Treetop website}[http://treetop.rubyforge.org/], and have a read through the source-code of this parser (don't worry, there isn't much of it!).
+
+=== Author & Credits
+
+Author:: {Aaron Gough}[mailto:aaron@aarongough.com]
+
+Copyright (c) 2010 {Aaron Gough}[http://thingsaaronmade.com/] ({thingsaaronmade.com}[http://thingsaaronmade.com/]), released under the MIT license
37 node_extensions.rb
@@ -0,0 +1,37 @@
+module Sexp
+ class IntegerLiteral < Treetop::Runtime::SyntaxNode
+ def to_array
+ return self.text_value.to_i
+ end
+ end
+
+ class StringLiteral < Treetop::Runtime::SyntaxNode
+ def to_array
+ return eval self.text_value
+ end
+ end
+
+ class FloatLiteral < Treetop::Runtime::SyntaxNode
+ def to_array
+ return self.text_value.to_f
+ end
+ end
+
+ class Identifier < Treetop::Runtime::SyntaxNode
+ def to_array
+ return self.text_value.to_sym
+ end
+ end
+
+ class Expression < Treetop::Runtime::SyntaxNode
+ def to_array
+ return self.elements[0].to_array
+ end
+ end
+
+ class Body < Treetop::Runtime::SyntaxNode
+ def to_array
+ return self.elements.map {|x| x.to_array}
+ end
+ end
+end
30 parser.rb
@@ -0,0 +1,30 @@
+require 'treetop'
+
+require File.expand_path(File.join(File.dirname(__FILE__), 'node_extensions.rb'))
+
+class Parser
+
+ Treetop.load(File.expand_path(File.join(File.dirname(__FILE__), 'sexp_parser.treetop')))
+ @@parser = SexpParser.new
+
+ def self.parse(data)
+ tree = @@parser.parse(data)
+
+ if(tree.nil?)
+ raise Exception, "Parse error at offset: #{@@parser.index}"
+ end
+
+ self.clean_tree(tree)
+
+ return tree.to_array
+ end
+
+ private
+
+ def self.clean_tree(root_node)
+ return if(root_node.elements.nil?)
+ root_node.elements.delete_if{|node| node.class.name == "Treetop::Runtime::SyntaxNode" }
+ root_node.elements.each {|node| self.clean_tree(node) }
+ end
+
+end
31 sexp_parser.treetop
@@ -0,0 +1,31 @@
+grammar Sexp
+
+ rule expression
+ space? '(' body ')' space? <Expression>
+ end
+
+ rule body
+ (expression / identifier / float / integer / string / space )* <Body>
+ end
+
+ rule integer
+ ('+' / '-')? [0-9]+ <IntegerLiteral>
+ end
+
+ rule float
+ ('+' / '-')? [0-9]+ (('.' [0-9]+) / ('e' [0-9]+)) <FloatLiteral>
+ end
@timcharper
timcharper added a line comment Jul 5, 2012

Hi Aaron,

Thank you for your blog post tutorial on how to use treetop.

I wasn't able to get this code to work as is... I wonder if there have been updates to Treetop since? Parsing integers works, but when I use a decimal, it's as if the integer rule takes precedence, parses the number, and then errors out on reaching the decimal point.

Transposing the integer and float rules fixed it and allowed me to parse the sexp example on your blog post.

I'll send a pull request if you like.
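
The symptom described above is what PEG ordered choice would produce if `integer` were tried before `float`: the first alternative that matches wins, so `1` is consumed and nothing can match the remaining `.5`. The following is a minimal sketch in plain Ruby, not Treetop itself. The regular expressions only approximate the grammar's `integer` and `float` rules, and the names `ordered_choice` and `tokenize` are hypothetical.

```ruby
require 'strscan'

# Approximations of the grammar's integer and float rules (assumption:
# regular expressions stand in for the Treetop rules here).
integer = /[+-]?[0-9]+/
float   = /[+-]?[0-9]+(?:\.[0-9]+|e[0-9]+)/

# Try each alternative in order and commit to the first one that matches,
# the way a PEG ordered choice does. Returns [name, text] or nil.
def ordered_choice(scanner, alternatives)
  alternatives.each do |name, pattern|
    text = scanner.scan(pattern)
    return [name, text] if text
  end
  nil
end

# Repeatedly apply the ordered choice; returns the matched tokens and
# whatever input was left unconsumed.
def tokenize(input, alternatives)
  scanner = StringScanner.new(input)
  tokens = []
  while (token = ordered_choice(scanner, alternatives))
    tokens << token
  end
  [tokens, scanner.rest]
end

float_first   = [[:float, float], [:integer, integer]]
integer_first = [[:integer, integer], [:float, float]]

p tokenize("1.5", float_first)
p tokenize("1.5", integer_first)
```

With `float` listed first, as in the `body` rule of this commit, `1.5` is consumed as a single float. With `integer` listed first, only `1` is consumed and `.5` is left over, which a full parser would report as a failure at the decimal point.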

@aarongough
Owner
aarongough added a line comment Jul 16, 2012
@timcharper
timcharper added a line comment Jul 16, 2012

Now that I've learned Treetop and revisited this, I question why I wasn't able to get it to work as is. The order of float / integer is clearly specified in the rule above. I'll take a second look and see why I ran into it.

The pull request is unrelated. I found that Treetop provides its own failure reason, and it turns out to be pretty helpful. I can understand the hesitation to pull something in without testing it yourself. However, I think if you look at the diff you'll see that it's a very small change.
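
Treetop's generated parsers expose `failure_reason`, `failure_line` and `failure_column` alongside the `index` used in parser.rb. As a rough, dependency-free illustration of why a line and column are more useful than a bare offset, here is a sketch in plain Ruby. `line_and_column` and `describe_failure` are hypothetical helpers, not part of Treetop or of this commit.

```ruby
# Hypothetical helper: convert a 0-based character offset into a
# 1-based [line, column] pair.
def line_and_column(input, offset)
  consumed = input[0, offset]
  line = consumed.count("\n") + 1
  last_newline = consumed.rindex("\n")
  # Column counts characters since the last newline (or since the start).
  column = last_newline ? offset - last_newline : offset + 1
  [line, column]
end

# Hypothetical helper: build a message that is easier to act on than
# "Parse error at offset: N".
def describe_failure(input, offset)
  line, column = line_and_column(input, offset)
  "Parse error at line #{line}, column #{column} (offset #{offset})"
end

source = "(a\n(b ?)"
puts describe_failure(source, 6)
```

In the sample input the `?` sits at offset 6, which is the fourth character of the second line.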

@timcharper
timcharper added a line comment Jul 16, 2012

I went back to try to reproduce my error, and I think I was deceived into thinking it was a float parse error. I understand how it happened and what tripped me up, but it's probably too lengthy to describe.

The code here works just fine.

@aarongough
Owner
aarongough added a line comment Jul 17, 2012
+
+ rule string
+ '"' ([^"\\] / "\\" . )* '"' <StringLiteral>
+ end
+
+ rule identifier
+ [a-zA-Z\=\*] [a-zA-Z0-9_\=\*]* <Identifier>
+ end
+
+ rule space
+ [\s]+
+ end
+
+end
