Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Real Parser. Closes #229 and closes #225. #235

Merged
merged 57 commits into from
Aug 30, 2013
Merged
Show file tree
Hide file tree
Changes from 38 commits
Commits
Show all changes
57 commits
Select commit Hold shift + click to select a range
2332d86
Slow lexer and parser scaffold.
trishume Jul 24, 2013
ee14775
Replace hand-coded lexer with faster hacky lexer.
trishume Jul 24, 2013
61a6deb
Descriptive comment for lexer
trishume Jul 24, 2013
76272a1
Bring back the lexer
trishume Jul 24, 2013
b20a594
Better lexer
trishume Jul 24, 2013
84be895
Fancy StringScanner based lexer
trishume Jul 24, 2013
f43e973
Basic expression parsing
trishume Jul 24, 2013
4da7b36
New variable parser!
trishume Jul 25, 2013
0453d7e
Fix benchmarks to use only valid liquid.
trishume Jul 25, 2013
17d818b
Fix profiler
trishume Jul 25, 2013
6738266
Unfinished if statement parser.
trishume Jul 26, 2013
24ddaf1
Test a different lexer architechture
trishume Jul 26, 2013
c0b9d53
Revert "Test a different lexer architechture"
trishume Jul 26, 2013
8896b55
Parsing for if statements
trishume Jul 26, 2013
8b1dff9
Allow ! in identifiers like Ruby
trishume Jul 26, 2013
87b8ee7
Add error mode switching
trishume Jul 26, 2013
4dc9cc0
Add back tests for lax parsing
trishume Jul 26, 2013
83e71ac
Add lexer tests and fixes
trishume Jul 26, 2013
1b43bf5
Add parser tests
trishume Jul 26, 2013
bacacf2
Remove the Token class from the lexer in favour of less smart but fas…
airhorns Jul 26, 2013
bf53e51
Inline Parser#next_token to avoid method dispatch
airhorns Jul 26, 2013
a892e69
Hopefully fix CI build
trishume Jul 26, 2013
8dcf44e
Faster token creation, hopefully.
trishume Jul 29, 2013
be4a04e
Merged array_tokens into recursive-parsing
trishume Jul 29, 2013
bc76c0d
Collapse float and int into 'number'
trishume Jul 29, 2013
c8bd0b9
Catch easy cases
trishume Jul 29, 2013
d5d41a8
Make previous commit work
trishume Jul 29, 2013
8f4b398
Abstract parser switching into tag
trishume Jul 29, 2013
525e1ff
Add range support
trishume Jul 29, 2013
8ca0098
Fixed ranges and added for loop parser
trishume Jul 29, 2013
3b3961b
Use lax mode by default so nothing breaks
trishume Jul 29, 2013
346e92a
Describe error modes in Readme
trishume Jul 29, 2013
1458396
Fix benchmark
trishume Jul 30, 2013
84f0c1b
Initial options passing
trishume Jul 30, 2013
c5afdc5
Shuffle logic around.
trishume Jul 30, 2013
f6eacbf
Add prayer for forgiveness.
trishume Jul 31, 2013
8242312
Run test suite with both parsers
trishume Aug 1, 2013
ace12e2
Hopefully fix CI on Rubinius
trishume Aug 2, 2013
48f50ee
Remove unused lex_specials method
trishume Aug 2, 2013
15b53b7
Make stuff nicer
trishume Aug 2, 2013
6cde983
More little fixes and changed default benchmark
trishume Aug 2, 2013
eb68a75
Hopefully fix CI by improving multi-suite runner.
trishume Aug 16, 2013
047900d
Proper warning support
trishume Aug 19, 2013
324d26d
Consistent lack of periods in syntax errors.
trishume Aug 19, 2013
0beb4a4
Add handy context to strict parser error messages.
trishume Aug 19, 2013
14a1752
Merge branch 'master' into recursive-parsing
trishume Aug 22, 2013
93fcd56
Broken warnings implementation.
trishume Aug 22, 2013
b0cba52
Fix warnings and make tags a proper syntax tree.
trishume Aug 22, 2013
77db92d
Better testing of warn mode.
trishume Aug 22, 2013
5bdfb62
Remove old warning method
trishume Aug 22, 2013
86ba2f4
Fix error message 1.8 compatibility
trishume Aug 22, 2013
dd3196b
Consistency in warnings.
trishume Aug 22, 2013
c94b5e8
Use attr_reader for warnings.
trishume Aug 22, 2013
e305edc
Remove extra comment
trishume Aug 27, 2013
26eb9a0
Merge pull request #244 from Shopify/proper-parse-warnings
trishume Aug 27, 2013
1fa029a
Simplify lexer logic.
trishume Aug 27, 2013
7b52dfc
Clean up lexer logic
trishume Aug 27, 2013
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -47,4 +47,22 @@ For standard use you can just pass it the content of a file and call render with
@template.render('name' => 'tobi') # => "hi tobi"
```

### Error Modes

Setting the error mode of Liquid lets you specify how strictly you want your templates to be interpreted.
Normally the parser is very lax and will accept almost anything without error. Unfortunately this can make
it very hard to debug and can lead to unexpected behaviour.

Liquid also comes with a stricter parser that can be used when editing templates to give better error messages
when templates are invalid. You can enable this new parser like this:

```ruby
Liquid::Template.error_mode = :strict # Raises a SyntaxError when invalid syntax is used
Liquid::Template.error_mode = :warn # Adds errors to template.errors but continues as normal
Liquid::Template.error_mode = :lax # The default mode, accepts almost anything.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we include the new template-local options thing in the docs too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

```

It is recommended that you enable `:strict` or `:warn` mode on new apps to stop invalid templates from being created.
It is also recommended that you use it in the template editors of existing apps to give editors better error messages.

[![Build Status](https://secure.travis-ci.org/Shopify/liquid.png)](http://travis-ci.org/Shopify/liquid)
21 changes: 19 additions & 2 deletions Rakefile
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,25 @@ require 'rubygems/package_task'

task :default => 'test'

Rake::TestTask.new(:test) do |t|
Rake::TestTask.new(:lax_test) do |t|
t.libs << '.' << 'lib' << 'test'
t.test_files = FileList['test/liquid/**/*_test.rb']
t.options = 'lax'
t.verbose = false
end

Rake::TestTask.new(:strict_test) do |t|
t.libs << '.' << 'lib' << 'test'
t.test_files = FileList['test/liquid/**/*_test.rb']
t.verbose = false
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this one not have t.options = 'strict' like the other one has with lax? If we ever change the default, this test will not do what we expect?


desc 'runs test suite with both strict and lax parsers'
task :test do
Rake::Task['lax_test'].invoke
Rake::Task['strict_test'].invoke
end

gemspec = eval(File.read('liquid.gemspec'))
Gem::PackageTask.new(gemspec) do |pkg|
pkg.gem_spec = gemspec
Expand All @@ -27,9 +40,13 @@ namespace :benchmark do

desc "Run the liquid benchmark"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change the description, like the other one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think the lax parser should be the default for the performance benchmarks, since that's the one that is going to be used mostly?

task :run do
ruby "./performance/benchmark.rb"
ruby "./performance/benchmark.rb strict"
end

desc "Run the liquid benchmark with lax parsing"
task :lax do
ruby "./performance/benchmark.rb lax"
end
end


Expand Down
2 changes: 2 additions & 0 deletions lib/liquid.rb
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,8 @@ module Liquid
end

require "liquid/version"
require 'liquid/lexer'
require 'liquid/parser'
require 'liquid/drop'
require 'liquid/extensions'
require 'liquid/errors'
Expand Down
4 changes: 2 additions & 2 deletions lib/liquid/block.rb
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ def parse(tokens)

# fetch the tag from registered blocks
if tag = Template.tags[$1]
new_tag = tag.new($1, $2, tokens)
new_tag = tag.new_with_options($1, $2, tokens, @options || {})
@blank &&= new_tag.blank?
@nodelist << new_tag
else
Expand Down Expand Up @@ -80,7 +80,7 @@ def block_name

def create_variable(token)
token.scan(ContentOfVariable) do |content|
return Variable.new(content.first)
return Variable.new(content.first, @options)
end
raise SyntaxError.new("Variable '#{token}' was not properly terminated with regexp: #{VariableEnd.inspect} ")
end
Expand Down
3 changes: 2 additions & 1 deletion lib/liquid/document.rb
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
module Liquid
class Document < Block
# we don't need markup to open this block
def initialize(tokens)
def initialize(tokens, options = {})
@options = options
parse(tokens)
end

Expand Down
64 changes: 64 additions & 0 deletions lib/liquid/lexer.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
require "strscan"
module Liquid
class Lexer
SPECIALS = {
'|' => :pipe,
'.' => :dot,
':' => :colon,
',' => :comma,
'[' => :open_square,
']' => :close_square,
'(' => :open_round,
')' => :close_round
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning here for having symbols for all those special chars instead of just using the chars themselves in the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that things are consistent and the token type is always a symbol and because symbols are fast to create and are more memory efficient. I might do a performance test on them later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to, just curious. But I agree, it's kind of cleaner.

IDENTIFIER = /[\w\-?!]+/
SINGLE_STRING_LITERAL = /'[^\']*'/
DOUBLE_STRING_LITERAL = /"[^\"]*"/
NUMBER_LITERAL = /-?\d+(\.\d+)?/
COMPARISON_OPERATOR = /==|!=|<>|<=?|>=?|contains/

def initialize(input)
@ss = StringScanner.new(input)
end

def tokenize
@output = []

loop do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like infinite loops because it's not immediately obvious when it terminates.

while !@ss.eos?
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I used to have that but the problem is it needs to consume whitespace before checking for eos.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could consume it before the loop though, no?

@ss.skip(/\s*/)

tok = case
when @ss.eos? then nil
when t = @ss.scan(COMPARISON_OPERATOR) then [:comparison, t]
when t = @ss.scan(SINGLE_STRING_LITERAL) then [:string, t]
when t = @ss.scan(DOUBLE_STRING_LITERAL) then [:string, t]
when t = @ss.scan(NUMBER_LITERAL) then [:number, t]
when t = @ss.scan(IDENTIFIER) then [:id, t]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the most common cases in usual Liquid code first here?

Also I'm not sure if I'm a fan of using then, I'd prefer newlines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are in the right order such that they will parse correctly. For example, comparison can be contains which is also a valid identifier so it has to parse comparisons first.

As for the thens, there are just so many cases and the body of the cases is so small that splitting it up might make it harder to read.

else
c = @ss.getch
if s = SPECIALS[c]
[s,c]
else
raise SyntaxError, "Unexpected character #{c}."
end
end

unless tok
@output << [:end_of_string]
return @output
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@output << [:end_of_string]

Returns the resulting @output already.

  return @output.push([:end_of_string]) unless tok 

@output << tok
end
end

protected
def lex_specials
c = @ss.getch
if s = SPECIALS[c]
return Token.new(s,c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's Token? Didn't we remove that? Why is this still here (and why does it not break any tests)?

end

raise SyntaxError, "Unexpected character #{c}."
end
end
end
95 changes: 95 additions & 0 deletions lib/liquid/parser.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
module Liquid
# This class is used by tags to parse themselves
# it provides helpers and encapsulates state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need that comment

class Parser
def initialize(input)
l = Lexer.new(input)
@tokens = l.tokenize
@p = 0 # pointer to current location
end

def jump(point)
@p = point
end

def consume(type = nil)
token = @tokens[@p]
if type && token[0] != type
raise SyntaxError, "Expected #{type} but found #{@tokens[@p]}"
end
@p += 1
token[1]
end

# Only consumes the token if it matches the type
# Returns the token's contents if it was consumed
# or false otherwise.
def consume?(type)
token = @tokens[@p]
return false unless token && token[0] == type
@p += 1
token[1]
end

# Like consume? Except for an :id token of a certain name
def id?(str)
token = @tokens[@p]
return false unless token && token[0] == :id
return false unless token[1] == str
@p += 1
token[1]
end

def look(type, ahead = 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor thing, but peek is a more standard name for this kind of method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's tricky. In parsing and compilers the concept is called lookahead so the operation is abbreviated look.
To resolve the dilemma I choose look because that is what the code already uses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah cool! no worries

tok = @tokens[@p + ahead]
return false unless tok
tok[0] == type
end

# === General Liquid parsing functions ===
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh


def expression
token = @tokens[@p]
if token[0] == :id
variable_signature
elsif [:string, :number].include? token[0]
consume
token[1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consume already returns token[1], doesn't it?

elsif token.first == :open_round
consume
first = expression
consume(:dot)
consume(:dot)
last = expression
consume(:close_round)
"(#{first}..#{last})"
else
raise SyntaxError, "#{token} is not a valid expression."
end
end

def argument
str = ""
# might be a keyword argument (identifier: expression)
if look(:id) && look(:colon, 1)
str << consume << consume << ' '
end

str << expression
end

def variable_signature
str = consume(:id)
if look(:open_square)
str << consume
str << expression
str << consume(:close_square)
end
if look(:dot)
str << consume
str << variable_signature
end
str
end
end
end
33 changes: 32 additions & 1 deletion lib/liquid/tag.rb
Original file line number Diff line number Diff line change
@@ -1,10 +1,26 @@
module Liquid
class Tag
attr_accessor :nodelist
attr_accessor :nodelist, :options

def self.new_with_options(tag_name, markup, tokens, options)
# Forgive me Matz for I have sinned.
# I have forsaken the holy idioms of Ruby and used Class#allocate.
# I fulfilled my mandate by maintaining API compatibility and performance,
# even though it may displease your Lordship.
#
# In all seriousness though, I can prove to a reasonable degree of certainty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mathematician in me does not like the misuse of the word "prove" here. Can we change that to "argue" or "show" or something? :-) Or maybe remove that whole blog of comments. It's funny, but doesn't say anything.

# that setting options before calling initialize is required to maintain API compatibility.
# I tried doing it without it and not only did I break compatibility, it was much slower.
new_tag = self.allocate
new_tag.options = options
new_tag.send(:initialize, tag_name, markup, tokens)
new_tag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why it's necessary to sin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically the only way to maintain API compatibility and not mess everything up was to set the options attribute before calling initialize. The reason is that the parsing of tags is done in the initialize method and the options needs to get passed down before the parse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should really redefine all liquid's Tag subclasses to take an optional options argument, then deprecate initialize without an options argument so that we can remove support for it in the next major release. We can check the method's arity to give a deprecation warning.

end

def initialize(tag_name, markup, tokens)
@tag_name = tag_name
@markup = markup
@options ||= {} # needs || because might be set before initialize
parse(tokens)
end

Expand All @@ -22,5 +38,20 @@ def render(context)
def blank?
@blank || true
end

def switch_parse(markup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the name of that method is not super obvious to me. Can we call it select_parser_from_options or something more obvious?

case @options[:error_mode] || Template.error_mode
when :strict then strict_parse(markup)
when :lax then lax_parse(markup)
when :warn
begin
return strict_parse(markup)
rescue SyntaxError => e
@warnings ||= []
@warnings << e
return lax_parse(markup)
end
end
end
end # Tag
end # Liquid
51 changes: 38 additions & 13 deletions lib/liquid/tags/for.rb
Original file line number Diff line number Diff line change
Expand Up @@ -47,19 +47,7 @@ class For < Block
Syntax = /\A(#{VariableSegment}+)\s+in\s+(#{QuotedFragment}+)\s*(reversed)?/o

def initialize(tag_name, markup, tokens)
if markup =~ Syntax
@variable_name = $1
@collection_name = $2
@name = "#{$1}-#{$2}"
@reversed = $3
@attributes = {}
markup.scan(TagAttributes) do |key, value|
@attributes[key] = value
end
else
raise SyntaxError.new("Syntax Error in 'for loop' - Valid syntax: for [item] in [collection]")
end

switch_parse(markup)
@nodelist = @for_block = []
super
end
Expand Down Expand Up @@ -127,6 +115,43 @@ def render(context)
result
end

protected

def lax_parse(markup)
if markup =~ Syntax
@variable_name = $1
@collection_name = $2
@name = "#{$1}-#{$2}"
@reversed = $3
@attributes = {}
markup.scan(TagAttributes) do |key, value|
@attributes[key] = value
end
else
raise SyntaxError.new("Syntax Error in 'for loop' - Valid syntax: for [item] in [collection]")
end
end

def strict_parse(markup)
p = Parser.new(markup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda stop everytime I see p thinking it's the p method, but it's probably fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, however the parser methods are called a lot and it would look really ugly with a longer name.

@variable_name = p.consume(:id)
raise SyntaxError, "For loops require an 'in' clause" unless p.id?('in')
@collection_name = p.expression
@name = "#{@variable_name}-#{@collection_name}"
@reversed = p.id?('reversed')

@attributes = {}
while p.look(:id) && p.look(:colon, 1)
unless attribute = p.id?('limit') || p.id?('offset')
raise SyntaxError, "Invalid attribute in for loop. Valid attributes are limit and offset"
end
p.consume
val = p.expression
@attributes[attribute] = val
end
p.consume(:end_of_string)
end

private

def render_else(context)
Expand Down
Loading