Add a Real Parser. Closes #229 and closes #225. #235
Changes from 38 commits
```diff
@@ -7,12 +7,25 @@ require 'rubygems/package_task'
 
 task :default => 'test'
 
-Rake::TestTask.new(:test) do |t|
+Rake::TestTask.new(:lax_test) do |t|
   t.libs << '.' << 'lib' << 'test'
   t.test_files = FileList['test/liquid/**/*_test.rb']
+  t.options = 'lax'
   t.verbose = false
 end
 
+Rake::TestTask.new(:strict_test) do |t|
+  t.libs << '.' << 'lib' << 'test'
+  t.test_files = FileList['test/liquid/**/*_test.rb']
+  t.verbose = false
+end
```

> Why does this one not have …

```diff
+
+desc 'runs test suite with both strict and lax parsers'
+task :test do
+  Rake::Task['lax_test'].invoke
+  Rake::Task['strict_test'].invoke
+end
+
 gemspec = eval(File.read('liquid.gemspec'))
 Gem::PackageTask.new(gemspec) do |pkg|
   pkg.gem_spec = gemspec
```
```diff
@@ -27,9 +40,13 @@ namespace :benchmark do
 
   desc "Run the liquid benchmark"
```

> Can we change the description, like the other one?
>
> Also, I think the lax parser should be the default for the performance benchmarks, since that's the one that is going to be used mostly?

```diff
   task :run do
-    ruby "./performance/benchmark.rb"
+    ruby "./performance/benchmark.rb strict"
   end
 
+  desc "Run the liquid benchmark with lax parsing"
+  task :lax do
+    ruby "./performance/benchmark.rb lax"
+  end
 end
```
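The Rakefile hands a mode word to each run: `t.options = 'lax'` on the lax test task (Rake passes `options` through to the test run, and an explicit `TESTOPTS` overrides it), and `strict` or `lax` as an argument to `performance/benchmark.rb`. How those scripts read the word is not part of this diff. The sketch below assumes the word arrives in `ARGV` and falls back to `:lax`, the default the review suggests for benchmarks; `error_mode_from_args` and `VALID_ERROR_MODES` are hypothetical names.

```ruby
# Hypothetical helper (not part of Liquid): choose a parser mode from
# command-line words such as "strict" or "lax".
VALID_ERROR_MODES = %i[strict lax warn].freeze

def error_mode_from_args(args, default: :lax)
  # Keep only recognised mode names, in the order they were given;
  # anything else (e.g. other test-runner flags) is ignored.
  requested = args.map(&:to_sym) & VALID_ERROR_MODES
  requested.first || default
end
```

A script would call it as `error_mode_from_args(ARGV)`, so `ruby ./performance/benchmark.rb strict` selects `:strict` and a bare run selects `:lax`.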
`@@ -0,0 +1,64 @@`

```ruby
require "strscan"
module Liquid
  class Lexer
    SPECIALS = {
      '|' => :pipe,
      '.' => :dot,
      ':' => :colon,
      ',' => :comma,
      '[' => :open_square,
      ']' => :close_square,
      '(' => :open_round,
      ')' => :close_round
    }
```

> What's the reasoning here for having symbols for all those special chars instead of just using the chars themselves in the code?
>
> So that things are consistent and the token type is always a symbol, and because symbols are fast to create and are more memory efficient. I might do a performance test on them later.
>
> No need to, just curious. But I agree, it's kind of cleaner.

```ruby
    IDENTIFIER = /[\w\-?!]+/
    SINGLE_STRING_LITERAL = /'[^\']*'/
    DOUBLE_STRING_LITERAL = /"[^\"]*"/
    NUMBER_LITERAL = /-?\d+(\.\d+)?/
    COMPARISON_OPERATOR = /==|!=|<>|<=?|>=?|contains/

    def initialize(input)
      @ss = StringScanner.new(input)
    end

    def tokenize
      @output = []

      loop do
```

> I don't like infinite loops because it's not immediately obvious when it terminates.
>
> ```ruby
> while !@ss.eos?
> end
> ```
>
> I think I used to have that, but the problem is it needs to consume whitespace before checking for eos.
>
> Could consume it before the loop though, no?

```ruby
        @ss.skip(/\s*/)

        tok = case
        when @ss.eos? then nil
        when t = @ss.scan(COMPARISON_OPERATOR) then [:comparison, t]
        when t = @ss.scan(SINGLE_STRING_LITERAL) then [:string, t]
        when t = @ss.scan(DOUBLE_STRING_LITERAL) then [:string, t]
        when t = @ss.scan(NUMBER_LITERAL) then [:number, t]
        when t = @ss.scan(IDENTIFIER) then [:id, t]
```

> Are the most common cases in usual Liquid code first here? Also I'm not sure if I'm a fan of using …
>
> They are in the right order such that they will parse correctly. For example, comparison can be … As for the …

```ruby
        else
          c = @ss.getch
          if s = SPECIALS[c]
            [s,c]
          else
            raise SyntaxError, "Unexpected character #{c}."
          end
        end

        unless tok
          @output << [:end_of_string]
          return @output
        end
```

> `@output << [:end_of_string]` returns the resulting …
>
> ```ruby
> return @output.push([:end_of_string]) unless tok
> ```

```ruby
        @output << tok
      end
    end

    protected
    def lex_specials
      c = @ss.getch
      if s = SPECIALS[c]
        return Token.new(s,c)
```

> What's …

```ruby
      end

      raise SyntaxError, "Unexpected character #{c}."
    end
  end
end
```
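The review thread on `loop do` proposes `while !@ss.eos?`, notes that whitespace must be skipped before the end-of-input check, and suggests skipping it before the loop. The sketch below follows that shape: whitespace is skipped once before the loop and again after every token, so `eos?` is accurate each time it is tested. It copies the token patterns and their order from the diff. `LoopSketchLexer` and `LexError` are hypothetical names, and it raises its own `LexError` instead of Liquid's `SyntaxError` so that it runs standalone. The final `output << [:end_of_string]` relies on the point raised in the review: `Array#<<` returns the array itself.

```ruby
require "strscan"

class LexError < StandardError; end

# Hypothetical sketch: the Lexer from the diff with `loop do` replaced by
# `until @ss.eos?` (equivalent to `while !@ss.eos?`).
class LoopSketchLexer
  SPECIALS = {
    '|' => :pipe, '.' => :dot, ':' => :colon, ',' => :comma,
    '[' => :open_square, ']' => :close_square,
    '(' => :open_round, ')' => :close_round
  }.freeze

  IDENTIFIER            = /[\w\-?!]+/
  SINGLE_STRING_LITERAL = /'[^\']*'/
  DOUBLE_STRING_LITERAL = /"[^\"]*"/
  NUMBER_LITERAL        = /-?\d+(\.\d+)?/
  COMPARISON_OPERATOR   = /==|!=|<>|<=?|>=?|contains/

  def initialize(input)
    @ss = StringScanner.new(input)
  end

  def tokenize
    output = []
    @ss.skip(/\s*/)          # leading whitespace, so eos? is accurate on entry
    until @ss.eos?
      output << next_token
      @ss.skip(/\s*/)        # whitespace after each token
    end
    output << [:end_of_string] # Array#<< returns the array
  end

  private

  # Order matters: COMPARISON_OPERATOR runs before IDENTIFIER so `contains`
  # is an operator, and NUMBER_LITERAL runs before IDENTIFIER because
  # IDENTIFIER's character class also matches digits and '-'.
  def next_token
    if (t = @ss.scan(COMPARISON_OPERATOR)) then [:comparison, t]
    elsif (t = @ss.scan(SINGLE_STRING_LITERAL)) then [:string, t]
    elsif (t = @ss.scan(DOUBLE_STRING_LITERAL)) then [:string, t]
    elsif (t = @ss.scan(NUMBER_LITERAL)) then [:number, t]
    elsif (t = @ss.scan(IDENTIFIER)) then [:id, t]
    else
      c = @ss.getch
      type = SPECIALS[c]
      raise LexError, "Unexpected character #{c}." unless type
      [type, c]
    end
  end
end
```

For example, `LoopSketchLexer.new("x contains 'a'").tokenize` yields an `:id`, a `:comparison`, and a `:string` token followed by `[:end_of_string]`.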
`@@ -0,0 +1,95 @@`

```ruby
module Liquid
  # This class is used by tags to parse themselves
  # it provides helpers and encapsulates state
```

> I don't think we need that comment

```ruby
  class Parser
    def initialize(input)
      l = Lexer.new(input)
      @tokens = l.tokenize
      @p = 0 # pointer to current location
    end

    def jump(point)
      @p = point
    end

    def consume(type = nil)
      token = @tokens[@p]
      if type && token[0] != type
        raise SyntaxError, "Expected #{type} but found #{@tokens[@p]}"
      end
      @p += 1
      token[1]
    end

    # Only consumes the token if it matches the type
    # Returns the token's contents if it was consumed
    # or false otherwise.
    def consume?(type)
      token = @tokens[@p]
      return false unless token && token[0] == type
      @p += 1
      token[1]
    end

    # Like consume? Except for an :id token of a certain name
    def id?(str)
      token = @tokens[@p]
      return false unless token && token[0] == :id
      return false unless token[1] == str
      @p += 1
      token[1]
    end

    def look(type, ahead = 0)
```

> Minor thing, but …
>
> That's tricky. In parsing and compilers the concept is called lookahead, so the operation is abbreviated …
>
> ah cool! no worries

```ruby
      tok = @tokens[@p + ahead]
      return false unless tok
      tok[0] == type
    end

    # === General Liquid parsing functions ===
```

> Meh

```ruby

    def expression
      token = @tokens[@p]
      if token[0] == :id
        variable_signature
      elsif [:string, :number].include? token[0]
        consume
        token[1]
      elsif token.first == :open_round
        consume
        first = expression
        consume(:dot)
        consume(:dot)
        last = expression
        consume(:close_round)
        "(#{first}..#{last})"
      else
        raise SyntaxError, "#{token} is not a valid expression."
      end
    end

    def argument
      str = ""
      # might be a keyword argument (identifier: expression)
      if look(:id) && look(:colon, 1)
        str << consume << consume << ' '
      end

      str << expression
    end

    def variable_signature
      str = consume(:id)
      if look(:open_square)
        str << consume
        str << expression
        str << consume(:close_square)
      end
      if look(:dot)
        str << consume
        str << variable_signature
      end
      str
    end
  end
end
```
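The `Parser` helpers form a small recursive-descent parser over the token array: `consume` always advances (optionally checking the type), `consume?` and `id?` advance only on a match, and `look(type, ahead)` peeks without moving, which is how `argument` recognises a `name:` keyword argument two tokens ahead. The sketch below reproduces a subset of those helpers over a hand-built token array so it runs without the lexer; `ParserSketch` and `ParseError` are hypothetical names. One deliberate difference: in the diff, `variable_signature` appends to the string returned by `consume(:id)`, which is the same object stored in `@tokens`, so the sketch starts from a copy and leaves the tokens intact (relevant if `jump` is used to re-read them).

```ruby
class ParseError < StandardError; end

# Hypothetical sketch of a subset of the Parser helpers from the diff,
# operating on a pre-lexed token array.
class ParserSketch
  def initialize(tokens)
    @tokens = tokens
    @p = 0 # index of the current token
  end

  # Advance one token, optionally checking its type; returns its text.
  def consume(type = nil)
    token = @tokens[@p]
    raise ParseError, "Unexpected end of input" unless token
    if type && token[0] != type
      raise ParseError, "Expected #{type} but found #{token.inspect}"
    end
    @p += 1
    token[1]
  end

  # Lookahead: check the type of the token `ahead` positions away
  # without consuming anything.
  def look(type, ahead = 0)
    token = @tokens[@p + ahead]
    !token.nil? && token[0] == type
  end

  def expression
    token = @tokens[@p]
    case token && token[0]
    when :id then variable_signature
    when :string, :number then consume
    else raise ParseError, "#{token.inspect} is not a valid expression."
    end
  end

  # Start from a copy so that << does not modify strings held in @tokens.
  def variable_signature
    str = consume(:id).dup
    str << consume << expression << consume(:close_square) if look(:open_square)
    str << consume << variable_signature if look(:dot)
    str
  end

  # A keyword argument ("name: expr") is recognised with two-token lookahead.
  def argument
    str = +""
    str << consume << consume << " " if look(:id) && look(:colon, 1)
    str << expression
  end
end
```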
```diff
@@ -1,10 +1,26 @@
 module Liquid
   class Tag
-    attr_accessor :nodelist
+    attr_accessor :nodelist, :options
 
+    def self.new_with_options(tag_name, markup, tokens, options)
+      # Forgive me Matz for I have sinned.
+      # I have forsaken the holy idioms of Ruby and used Class#allocate.
+      # I fulfilled my mandate by maintaining API compatibility and performance,
+      # even though it may displease your Lordship.
+      #
+      # In all seriousness though, I can prove to a reasonable degree of certainty
```

> The mathematician in me does not like the misuse of the word "prove" here. Can we change that to "argue" or "show" or something? :-) Or maybe remove that whole block of comments. It's funny, but doesn't say anything.

```diff
+      # that setting options before calling initialize is required to maintain API compatibility.
+      # I tried doing it without it and not only did I break compatibility, it was much slower.
+      new_tag = self.allocate
+      new_tag.options = options
+      new_tag.send(:initialize, tag_name, markup, tokens)
+      new_tag
```

> Can you explain why it's necessary to sin?
>
> Basically the only way to maintain API compatibility and not mess everything up was to set the options attribute before calling initialize. The reason is that the parsing of tags is done in the initialize method, and the options need to get passed down before the parse.
>
> We should really redefine all of Liquid's Tag subclasses to take an optional options argument, then deprecate initialize without an options argument so that we can remove support for it in the next major release. We can check the method's arity to give a deprecation warning.

```diff
+    end
+
     def initialize(tag_name, markup, tokens)
       @tag_name = tag_name
       @markup = markup
+      @options ||= {} # needs || because might be set before initialize
       parse(tokens)
     end
 
```
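The review thread explains why `new_with_options` assigns `options` before running `initialize`: the tag parses its markup inside `initialize`, so the options must already be set. The sketch below isolates that allocate-then-initialize pattern and adds the arity-based deprecation check one reviewer proposes. `SketchTag`, `ModernTag`, `build_tag`, and `parsed_mode` are hypothetical names, and `parse` here only records the selected mode instead of parsing tokens.

```ruby
# Hypothetical names throughout; none of these classes exist in Liquid.
class SketchTag
  attr_accessor :options
  attr_reader :parsed_mode

  # Same pattern as Tag.new_with_options: create the object without running
  # #initialize, assign options, then run #initialize (which parses).
  def self.new_with_options(tag_name, markup, tokens, options)
    tag = allocate
    tag.options = options
    tag.send(:initialize, tag_name, markup, tokens) # #initialize is private
    tag
  end

  def initialize(tag_name, markup, tokens)
    @tag_name = tag_name
    @markup = markup
    @options ||= {} # keep options that new_with_options already assigned
    parse(tokens)
  end

  private

  # Stand-in for real parsing: records which mode the options selected.
  def parse(_tokens)
    @parsed_mode = @options[:error_mode] || :lax
  end
end

# The style the review proposes: options as an optional fourth argument.
class ModernTag < SketchTag
  def initialize(tag_name, markup, tokens, options = {})
    @options = options
    super(tag_name, markup, tokens)
  end
end

# Arity check from the review: a three-argument #initialize is the legacy
# signature, so warn and fall back to the allocate-based constructor.
def build_tag(klass, tag_name, markup, tokens, options)
  if klass.instance_method(:initialize).arity == 3
    warn "#{klass}#initialize without an options argument is deprecated"
    klass.new_with_options(tag_name, markup, tokens, options)
  else
    klass.new(tag_name, markup, tokens, options)
  end
end
```

`ModernTag#initialize` has arity `-4` (one optional argument), so `build_tag` constructs it directly, while `SketchTag` takes the legacy path with a warning.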
```diff
@@ -22,5 +38,20 @@ def render(context)
     def blank?
       @blank || true
     end
+
+    def switch_parse(markup)
```

> Hm, the name of that method is not super obvious to me. Can we call it …

```diff
+      case @options[:error_mode] || Template.error_mode
+      when :strict then strict_parse(markup)
+      when :lax then lax_parse(markup)
+      when :warn
+        begin
+          return strict_parse(markup)
+        rescue SyntaxError => e
+          @warnings ||= []
+          @warnings << e
+          return lax_parse(markup)
+        end
+      end
+    end
   end # Tag
 end # Liquid
```
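`switch_parse` selects a parser from `@options[:error_mode]`, falling back to `Template.error_mode`; in `:warn` mode it tries the strict parser first, records the `SyntaxError`, and falls back to the lax parser. The sketch below isolates that dispatch. `ErrorModeSketch`, `StrictParseError`, and both toy grammars for `<item> in <collection>` are hypothetical. Unlike the diff, it raises on an unrecognised mode: a `case` with no matching `when` and no `else` evaluates to `nil`, so in the diff an unknown mode silently skips parsing.

```ruby
class StrictParseError < StandardError; end

# Hypothetical sketch of the error-mode dispatch in Tag#switch_parse.
class ErrorModeSketch
  STRICT_SYNTAX = /\A\s*(\w+)\s+in\s+(\w+)\s*\z/ # whole markup must match
  LAX_SYNTAX = /(\w+)\s+in\s+(\w+)/             # trailing junk is ignored

  attr_reader :result, :warnings

  def initialize(markup, error_mode)
    @warnings = []
    @result = switch_parse(markup, error_mode)
  end

  private

  def switch_parse(markup, error_mode)
    case error_mode
    when :strict then strict_parse(markup)
    when :lax then lax_parse(markup)
    when :warn
      begin
        strict_parse(markup)
      rescue StrictParseError => e
        @warnings << e # keep the strict failure, then fall back
        lax_parse(markup)
      end
    else
      raise ArgumentError, "unknown error mode: #{error_mode.inspect}"
    end
  end

  def strict_parse(markup)
    match = STRICT_SYNTAX.match(markup)
    raise StrictParseError, "Invalid markup: #{markup.inspect}" unless match
    match.captures
  end

  def lax_parse(markup)
    match = LAX_SYNTAX.match(markup)
    match ? match.captures : []
  end
end
```

With markup such as `"item in items junk!"`, `:strict` raises, `:lax` returns `["item", "items"]`, and `:warn` returns the same captures while recording one warning.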
```diff
@@ -47,19 +47,7 @@ class For < Block
     Syntax = /\A(#{VariableSegment}+)\s+in\s+(#{QuotedFragment}+)\s*(reversed)?/o
 
     def initialize(tag_name, markup, tokens)
-      if markup =~ Syntax
-        @variable_name = $1
-        @collection_name = $2
-        @name = "#{$1}-#{$2}"
-        @reversed = $3
-        @attributes = {}
-        markup.scan(TagAttributes) do |key, value|
-          @attributes[key] = value
-        end
-      else
-        raise SyntaxError.new("Syntax Error in 'for loop' - Valid syntax: for [item] in [collection]")
-      end
-
+      switch_parse(markup)
       @nodelist = @for_block = []
       super
     end
```
```diff
@@ -127,6 +115,43 @@ def render(context)
       result
     end
 
+    protected
+
+    def lax_parse(markup)
+      if markup =~ Syntax
+        @variable_name = $1
+        @collection_name = $2
+        @name = "#{$1}-#{$2}"
+        @reversed = $3
+        @attributes = {}
+        markup.scan(TagAttributes) do |key, value|
+          @attributes[key] = value
+        end
+      else
+        raise SyntaxError.new("Syntax Error in 'for loop' - Valid syntax: for [item] in [collection]")
+      end
+    end
+
+    def strict_parse(markup)
+      p = Parser.new(markup)
```

> I kinda stop every time I see …
>
> Fair point, however the parser methods are called a lot and it would look really ugly with a longer name.

```diff
+      @variable_name = p.consume(:id)
+      raise SyntaxError, "For loops require an 'in' clause" unless p.id?('in')
+      @collection_name = p.expression
+      @name = "#{@variable_name}-#{@collection_name}"
+      @reversed = p.id?('reversed')
+
+      @attributes = {}
+      while p.look(:id) && p.look(:colon, 1)
+        unless attribute = p.id?('limit') || p.id?('offset')
+          raise SyntaxError, "Invalid attribute in for loop. Valid attributes are limit and offset"
+        end
+        p.consume
+        val = p.expression
+        @attributes[attribute] = val
+      end
+      p.consume(:end_of_string)
+    end
+
     private
 
     def render_else(context)
```
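`For#strict_parse` accepts `<variable> in <expression>`, an optional `reversed`, then any number of `limit:` or `offset:` attributes, and rejects leftover input through `consume(:end_of_string)`. The sketch below walks the same grammar over a hand-built token array so it runs on its own. `ForMarkupSketch` and `ForSyntaxError` are hypothetical names, expressions are reduced to a single `:id` or `:number` token, and `reversed` comes back as `true`/`false` rather than the matched string.

```ruby
class ForSyntaxError < StandardError; end

# Hypothetical sketch of the grammar in For#strict_parse:
#   <id> in <expr> [reversed] [(limit|offset): <expr>]...
class ForMarkupSketch
  ATTRIBUTES = %w[limit offset].freeze

  def initialize(tokens)
    @tokens = tokens
    @p = 0
  end

  def parse
    variable = consume(:id)
    raise ForSyntaxError, "For loops require an 'in' clause" unless id?("in")
    collection = expression
    reversed = id?("reversed") ? true : false
    attributes = {}
    # Two-token lookahead: an attribute is an :id followed by a :colon.
    while look(:id) && look(:colon, 1)
      name = consume(:id)
      unless ATTRIBUTES.include?(name)
        raise ForSyntaxError, "Invalid attribute in for loop. Valid attributes are limit and offset"
      end
      consume(:colon)
      attributes[name] = expression
    end
    consume(:end_of_string) # anything left over is a syntax error
    { variable: variable, collection: collection, reversed: reversed, attributes: attributes }
  end

  private

  def look(type, ahead = 0)
    token = @tokens[@p + ahead]
    !token.nil? && token[0] == type
  end

  def consume(type)
    token = @tokens[@p]
    unless token && token[0] == type
      raise ForSyntaxError, "Expected #{type} but found #{token.inspect}"
    end
    @p += 1
    token[1]
  end

  # Consumes an :id token only if its text equals `name`.
  def id?(name)
    return false unless look(:id) && @tokens[@p][1] == name
    @p += 1
    name
  end

  def expression
    look(:number) ? consume(:number) : consume(:id)
  end
end
```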
> Can we include the new template-local options thing in the docs too?
>
> Done.