Add a Real Parser. Closes #229 and closes #225. #235

Merged
merged 57 commits into from Aug 30, 2013

Projects

None yet

8 participants

@trishume
Contributor

@fw42 @RichardBlair @boourns @tobi
This massive pull request contains a brand new parser and lexer for variable tags and if statements that can be easily extended to work with all other tags. It uses proper parsing so it is both easier to understand and less fraught with quirks and bugs. It gives real syntax errors that tell the user what they did wrong.

The parser and lexer are made as utilities that can easily be used in other tags.

This fixes many problems with the liquid parser including closing #229 and #225. It makes the variable tag sane and
tells users what they did wrong instead of silently doing whatever the heck it wants.

Backwards Compatibility

I also added a system for determining the parsing mode. By default it is set to a compatibility mode where if a syntax error occurs it will add an error message and then fall back on the old parser instead of crashing. It can also be set to use either only the old parser or only the new parser.

Shopify could be configured to use the warning mode so that existing templates would continue working but it would show syntax warnings in the theme editor. In Liquid 2.6 the warning mode should be the default but in Liquid 3.0 the strict mode could be the default.

Tests

I added many tests for both the lexer and parser classes as well as fixing tests to expect syntax errors. I kept the tests for the lax parser and added a helper that sets the parsing mode to lax. Interestingly, there were even tests
that asserted that certain bugs like #225 existed.

Internals

Two new classes were added Liquid::Parser and Liquid::Lexer. The lexer is a hand-written one using StringScanner and the parser is a hand-written LL(k) recursive descent parser.

Tags can use the parser class as a utility to do their own parsing. Here is an example of using the parser class:

p = Parser.new(markup)
# Could be just filters with no input
@name = p.look(:pipe) ? '' : p.expression
while p.consume?(:pipe)
  filtername = p.consume(:id)
  filterargs = p.consume?(:colon) ? parse_filterargs(p) : []
  @filters << [filtername, filterargs]
end
p.consume(:end_of_string)

Examples

Here are some examples of what it fixes:

{{ variable.: }}
old => ""
new => Liquid::SyntaxError: Expected id but found <colon: ':'>

{% assign foo = 1 + 2 %}{{ foo }}
old => "1"
new => Liquid::SyntaxError: Unexpected character +

{% if true && false %} wrong {% endif %}
old => "wrong"
new => Liquid::SyntaxError: Unexpected character &

{{---E(R[\'\"\'\"\'(+=EH%*^(@#^%$)?||?\eE,PUZE:::~~~~``}}
old => ""
new => Liquid::SyntaxError: Unexpected character (.

Benchmarks

I added a new benchmarking command rake benchmark:lax which runs the benchmarks with the old parser.

Old parser (rake benchmark:lax):

Rehearsal ------------------------------------------------
parse:         3.840000   0.050000   3.890000 (  4.026889)
parse & run:   8.110000   0.050000   8.160000 (  8.154465)
-------------------------------------- total: 12.050000sec

                   user     system      total        real
parse:         3.410000   0.020000   3.430000 (  3.436315)
parse & run:   7.950000   0.040000   7.990000 (  8.140029)

New parser (rake benchmark:run):

Rehearsal ------------------------------------------------
parse:         5.560000   0.040000   5.600000 (  5.745558)
parse & run:  10.800000   0.040000  10.840000 ( 10.930402)
-------------------------------------- total: 16.440000sec

                   user     system      total        real
parse:         5.450000   0.020000   5.470000 (  5.575437)
parse & run:  10.630000   0.040000  10.670000 ( 10.783157)

The new parser is a bit slower. According to basic profiling the performance issues are mostly in the lexer, which I could try writing in Ragel but I think the added complexity of Ragel outweighs the performance benefits.

@trishume
Contributor

The CI seems to break but the tests pass fine on my computer. I will look into why they fail later. I think I just have to require "stringscanner" in the lexer.

@fw42
Member
fw42 commented Jul 26, 2013

Haven't looked at the code yet, but it sounds awesome from the description. Although "fast" might be more important than "awesome" :-(

@fw42
Member
fw42 commented Jul 26, 2013

cc @hornairs for performance discussion

@airhorns
Member

Herm. I don't think we can accept a perf hit for cleaner code either. That sucks tho cause this is glorious.

I tried removing the Token class and inlining some Parser methods and shaved a bit of time off, see recursive-parsing...array_tokens :

oration ~/C/liquid  (array_tokens*) ➜  rake benchmark:run
/opt/boxen/rbenv/versions/1.9.3-p448/bin/ruby ./performance/benchmark.rb
Rehearsal ------------------------------------------------
parse:         4.300000   0.020000   4.320000 (  4.329285)
parse & run:   9.470000   0.030000   9.500000 (  9.499332)
-------------------------------------- total: 13.820000sec

                   user     system      total        real
parse:         4.260000   0.010000   4.270000 (  4.263429)
parse & run:   9.380000   0.030000   9.410000 (  9.409803)
oration ~/C/liquid  (array_tokens) ➜  git co recursive-parsing
oration ~/C/liquid  (recursive-parsing) ➜  rake benchmark:run
/opt/boxen/rbenv/versions/1.9.3-p448/bin/ruby ./performance/benchmark.rb
Rehearsal ------------------------------------------------
parse:         5.240000   0.020000   5.260000 (  5.259000)
parse & run:  10.710000   0.040000  10.750000 ( 10.755690)
-------------------------------------- total: 16.010000sec

                   user     system      total        real
parse:         5.170000   0.010000   5.180000 (  5.183431)
parse & run:  10.310000   0.020000  10.330000 ( 10.325539)

but as you said @trishume it looks like a lot of the time is spent doing Regexp#===. Is the order in which the regexps are scanned for determined by token precedence? Could we mess with that order to get more "hits" earlier on in the case statement? Also, I think it might be worth exploring Ragel cause I believe there are already a few grammars hanging out on the internet and you did the awesome work to let two versions co-exist, which I think was most of the battle.

Awesome job @trishume

EDIT: added stuff about method inlining

@trishume
Contributor

@hornairs I don't think the Regexps can be rearanged that much. I used to have them in optimal order and it was a little bit faster but then I kept finding problems like floating point numbers being lexed as an integer and then a dot and another integer. Right now they are in the right order for dependencies.

I also tried putting all the smaller regexes into a bigger one and then disambiguating with capture groups but that wasn't any faster.

Also keep in mind that this is not only about cleaner code. It also has the significant benefit of telling theme editors where they made a mistake rather than silently doing the wrong thing.

I'll see what I can do about optimizing it some more.

@nickpearson
Contributor

I've never written a complex parser before so I'm not sure this is practical, but I thought it'd be worth throwing out there. Is it possible with the new parser to look at a Liquid segment/tag before parsing it? If so, it may save on performance to parse a segment/tag with the existing regex parser if it can be known beforehand that the segment/tag uses proper syntax.

I realize this adds an extra regex check for every segment/tag, but if the cost of this check pays off in a high enough percentage of cases (where the Liquid code is very simple) by using the faster parser, it may be worth exploring.

For example, to check for simple, valid segments:

r = /\{\{ *\w+(?: *\| *\w+)* *\}\}/

# matches
"{{some_var}}"                       =~ r #=> 0
"{{some_var|upcase}}"                =~ r #=> 0
"{{ some_var | upcase }}"            =~ r #=> 0
"{{ some_var | upcase | downcase }}" =~ r #=> 0

# non-matches
"{{'val'}}"           =~ r #=> nil
"{{ array[0] }}"      =~ r #=> nil
"{{ var | test: 1 }}" =~ r #=> nil
"{{ var | test 1 }}"  =~ r #=> nil
"{{ var | test: }}"   =~ r #=> nil

This regex will match the simplest segments: a variable reference with zero or more filters with no filter parameters. My guess is that a large chunk of the invalid Liquid syntax is in filter parameters (the other large chunk being in tag attributes). Something similar might be practical for simple tags, though it would be need to be a bit more complex to also match a high percentage of tags.

If this is worth looking into, it may also be worth knowing what percentage of Liquid segments in Shopify templates are without any filter parameters. I just used this to check all the templates in my app:

simple = 0
total = 0
Template.all.each do |template|
  segments = template.text.to_s.scan(/\{\{.*?\}\}/)
  total += segments.size
  simple += segments.select do |segment|
    segment =~ /\{\{ *\w+(?: *\| *\w+)* *\}\}/
  end.size
end
puts "#{simple}/#{total} = #{simple/total.to_f}"

And by the way, awesome work on this @trishume. I'm hoping the performance hit can be worked out or at least be shown to be worth it.

@fw42 fw42 and 1 other commented on an outdated diff Jul 27, 2013
lib/liquid/lexer.rb
+ end
+
+ def next_token
+ consume_whitespace
+ return if @ss.eos?
+
+ case
+ when t = @ss.scan(COMPARISON_OPERATOR) then Token[:comparison, t]
+ when t = @ss.scan(SINGLE_STRING_LITERAL) then Token[:string, t]
+ when t = @ss.scan(DOUBLE_STRING_LITERAL) then Token[:string, t]
+ when t = @ss.scan(FLOAT_LITERAL) then Token[:float, t]
+ when t = @ss.scan(INTEGER_LITERAL) then Token[:integer, t]
+ when t = @ss.scan(IDENTIFIER) then Token[:id, t]
+ else
+ lex_specials
+ end
@fw42
fw42 Jul 27, 2013 Member

Maybe I don't get it, but this whole case block looks like it's scanning the same string 6 times in the worst case. Shouldn't that be possible with only one iteration?

@trishume
trishume Jul 29, 2013 Contributor

Tobi, Harry and I talked about that, I'm going to try something out.

@fw42 fw42 and 1 other commented on an outdated diff Jul 27, 2013
lib/liquid/lexer.rb
@@ -0,0 +1,92 @@
+require "strscan"
+module Liquid
+ class Token
+ attr_accessor :type, :contents
+ def initialize(*args)
+ @type, @contents = args
+ end
+
+ def self.[](*args)
@fw42
fw42 Jul 27, 2013 Member

What is this for exactly? Is that just syntax sugar (aka one additional method call) or is it necessary?

@trishume
trishume Jul 29, 2013 Contributor

You're right. Harry says using arrays instead of the Token object is faster and this might be why. I like the syntactic sugar but if getting rid of it means I get to keep token objects then I will.

I'll try this and benchmark it.

@trishume
Contributor

Ok, update on performance improvements:

  • Using a single regex instead of multiple regexes is not faster.
  • Deciding which regex to use based on the first character is not faster.
  • Array tokens are faster. Thanks @hornairs.
  • Making float and int together into number is slightly faster.
  • Using a regex to catch easy cases is faster. Thanks @nickpearson.

Recent benchmarks

New parser:

Rehearsal ------------------------------------------------
parse:         4.020000   0.030000   4.050000 (  4.042116)
parse & run:   9.600000   0.040000   9.640000 (  9.821808)
-------------------------------------- total: 13.690000sec

                   user     system      total        real
parse:         3.980000   0.020000   4.000000 (  3.998113)
parse & run:   9.960000   0.040000  10.000000 ( 10.198723)

Old parser:

Rehearsal ------------------------------------------------
parse:         3.620000   0.040000   3.660000 (  3.808780)
parse & run:   8.480000   0.040000   8.520000 (  8.626910)
-------------------------------------- total: 12.180000sec

                   user     system      total        real
parse:         3.510000   0.020000   3.530000 (  3.530844)
parse & run:   8.360000   0.040000   8.400000 (  8.497708)

How about those benchmarks @hornairs?

@fw42
Member
fw42 commented Jul 29, 2013

What are we benchmarking here? Only the variables and if tags? How much slower is the benchmark going to be when we do this for all the tags? Also, how much faster is it going to be if we remove the "which parser to use?" check?

@fw42 fw42 commented on an outdated diff Jul 29, 2013
lib/liquid/tags/if.rb
- condition = Condition.new($1, $2, $3)
+ def parse_condition(markup)
+ case Template.error_mode
@fw42
fw42 Jul 29, 2013 Member

Is this check happening for every tag? Can't we somehow check that only once when beginning to parse?

@trishume
Contributor

@fw42 The which parser to use check takes almost no time. I don't think it will get that much slower moving to other tags because if statements and variable tags are two of the most common tags, they are also some of the tags with the most complex parsing logic. I'm going to add new parsing code for for tags as well and see what that does.

@trishume
Contributor

Ok I added for loop support and ranges which made things a tiny bit slower but now all the tags that are both common and complex are covered so benchmarks shouldn't get significantly worse from now on. Even now the benchmarks aren't that bad, especially when you consider that you can fall back on lax mode during normal template rendering and only use strict mode when editing templates to make sure they are correct.

@fw42 I checked and the error mode flag checking does not affect performance of the normal parser. In lax mode the parser is just as fast as normal.

Updated benchmarks

New parser:

Rehearsal ------------------------------------------------
parse:         4.620000   0.040000   4.660000 (  4.834257)
parse & run:  10.750000   0.050000  10.800000 ( 10.944449)
-------------------------------------- total: 15.460000sec

                   user     system      total        real
parse:         4.520000   0.020000   4.540000 (  4.693500)
parse & run:  10.740000   0.060000  10.800000 ( 10.820457)

Old parser:

Rehearsal ------------------------------------------------
parse:         3.570000   0.040000   3.610000 (  3.601569)
parse & run:   8.310000   0.040000   8.350000 (  8.491684)
-------------------------------------- total: 11.960000sec

                   user     system      total        real
parse:         3.540000   0.020000   3.560000 (  3.565765)
parse & run:   8.230000   0.040000   8.270000 (  8.408360)
@trishume
Contributor

@tobi and I discussed this and we have decided to put it in lax mode by default for now but document the other mode in the Readme for developers. Further pull requests can add the ability to set the error mode per-template or bump up the default error mode.

I think this is ready to merge. Any more code review @fw42 @hornairs?

@fw42
Member
fw42 commented Jul 29, 2013

lax mode is the old one? So nothing changes? What's the point of merging that (and running it on Shopify)? We should at least log the warnings so we can begin with fixing templates, shouldn't we?

@trishume
Contributor

The difference at the moment is that it is possible to enable better error checking. This is much better for new projects.

It is possible that apps could treat things in the .errors array as failure even if the template renders properly.
In order to use it in Shopify it needs the additional step of refactoring a bunch of things so that per-template error_mode options are possible. With the current architecture it has to be global. Once we have that we can enable warn mode in the template editor and use lax mode for normal theme rendering because it is faster.

@nickpearson
Contributor

Even with lax mode being the default, I think this would be really useful for template validation. In my apps, I only allow a template to be saved if it doesn't contain any Liquid syntax errors (which is a fairly limited check at present).

It would be great to use the new parser for validating syntax errors during admin operations (e.g., modifying a template) even if the lax mode is used for runtime.

@trishume
Contributor

The question is should I wait to merge this till the error_mode can be refactored to be per-template or merge it now?
Error mode is not that useful globally since the main use is setting it differently depending on where the template is being used.

@fw42
Member
fw42 commented Jul 29, 2013

@trishume: Hm, I'm not sure if we should merge stuff into Liquid master that is not (yet) used at Shopify. True, it might be useful to other people. But in the end, it's an additional (unused) piece of code which WE need to maintain. How about we wait until we need this and have the pieces in place that make use of this?

@jduff
Member
jduff commented Jul 30, 2013

👍 to the comment from @fw42

Code in master should be vetted by real production use (at Shopify).

This is amazing though and we should continue exploring it. If we can get the performance improved enough I don't see why we couldn't use it. You have made significant improvements already, I'm sure there are more to be had.

@trishume
Contributor

Ok I added a way to specify the parsing mode per-template. This means that you can now do things like enable strict mode in the theme editor but not the main template rendering code.

Like this:

Liquid::Template.parse(code, :error_mode => :strict)

The code to do it is ugly but I did what I had to to maintain API compatibility. I included a prayer to Matz for good measure. 🙏

@fw42
Member
fw42 commented Jul 31, 2013

Can we make sure that all tests are run with both parsers?

@fw42
Member

Do we need Template.error_mode at all if we have options hashes now?

@trishume
Contributor

Template.error_mode is still useful to globally switch an app over such as when developing a new app. In fact It is possible that we can have tests run with both parsers, but it would be a monumental amount of effort and would look terribly ugly unless we use Template.error_mode

@fw42
Member
fw42 commented Jul 31, 2013

Maybe we can somehow create two rake targets in the Rakefile, one for tests running with the old parser, one for the new one (and then one, which runs those two)? So we don't have to uglify all the test code?

@yerich
yerich commented Jul 31, 2013

Do you have to use regex in your lexer? Using less elegant but faster methods may result in performance benefits. For instance:

n = 500000

Benchmark.bmbm do |x|
  x.report("regex") { n.times do ; "==".scan(/==|!=|<>|<=?|>=?|contains/); end }
  x.report("if") { n.times do ; ["==", "!=", "<>", "<=", ">=", "<", ">", "contains"].include?("=="); end }
end
Rehearsal -----------------------------------------
regex   2.090000   0.050000   2.140000 (  2.140008)
if      1.300000   0.010000   1.310000 (  1.309568)
-------------------------------- total: 3.450000sec

            user     system      total        real
regex   2.080000   0.050000   2.130000 (  2.132636)
if      1.240000   0.020000   1.260000 (  1.254238)

As you can see, using Array::include? instead of a regex is roughly 80% more fast. I wonder if similar improvements can be found in other regexes. Obviously doing this is less convenient than using scan, but if performance is a factor...

Edit: made a stupid.

@fw42
Member
fw42 commented Jul 31, 2013

Yeah, that might be worth looking into. The expressions are not that complex. Maybe we can do the scan() without using Regexp.

@trishume
Contributor

The thing is I am using the StringScanner class from the standard library which is an optimized way to scan through strings written in C. It only accepts regexes as matching patterns.

If I ditched that library then any regex match would be absurdly slow since Ruby does not have the ability to match regexes starting at a certain index AFAIK without the StringScanner library.

@trishume
Contributor
trishume commented Aug 1, 2013

Ok so I can get the test suite to run with both parsers but it inexplicably breaks CI only on Rubinius 2.0...

@trishume
Contributor
trishume commented Aug 2, 2013

@fw42 should I make it so that running the tests with the lax parser is a separate task so that CI runs properly?
Then when someone makes changes to the old parser they can run the tests manually. Also that way the test suite is faster.

@fw42
Member
fw42 commented Aug 2, 2013

Hm I think by default, all tests should run with both parsers (and all Ruby versions we support). Liquid tests take about 100ms in total to run on my machine, so no big woop.

@trishume
Contributor
trishume commented Aug 2, 2013

What about CI with Rubinius? I think it fails because my test/default task is not an instance of Rake::TestTask.
I'm not sure how to fix that problem because to run the test suite twice I need something other than a testtask.

In reality though I have no idea why the CI bulid fails for Rubinius and no idea what to do about it.

@fw42
Member
fw42 commented Aug 2, 2013

I think you can tell TravisCI which tasks/commands to run. I don't think it knows anything about rake. Take a look at the ".travis.yml" file.

@trishume
Contributor
trishume commented Aug 2, 2013

The travis.yml file says it just runs rake test but according to the CI output it is using some weird rake_test_loader.rb file on Rubinius which throws an exception before it reaches Liquid.

@trishume
Contributor
trishume commented Aug 2, 2013

Wait I misread something, this might be easier to solve than I thought.

@trishume
Contributor
trishume commented Aug 2, 2013

Hmmmm, I think that Rubinius has a bug which causes ARGV.pop to insert a nil into the array.

@fw42
Member
fw42 commented Aug 2, 2013

Where is ARGV.pop coming from? Why did this bug not happen before if your code is not actually touched?

@trishume
Contributor
trishume commented Aug 2, 2013

I misread stuff and my code was touched. What seems to be happening is a nil is getting into the argument array somehow. I use ARGV.pop to remove my mode argument from the ARGV array so that it doesn't get misinterpreted by minitest or rake or anything.

@trishume
Contributor
trishume commented Aug 2, 2013

Ok I figured out why it only breaks on Rubinius. My code gets run as part of this ARGV.select block as one of the require statements: https://github.com/jimweirich/rake/blob/master/lib/rake/rake_test_loader.rb

Every other ruby interpreter is OK with it when I modify the ARGV array while it is selecting over it but Rubinius fails and gives a nil. The problem is that I have to pass parameters in to the tests in order to specify the parser but things will fail if I don't remove the parameter from the ARGV list.

I'm going to look into using an environment variable instead to avoid modifying ARGV

@fw42 fw42 commented on the diff Aug 2, 2013
README.md
@@ -47,4 +47,22 @@ For standard use you can just pass it the content of a file and call render with
@template.render('name' => 'tobi') # => "hi tobi"
```
+### Error Modes
+
+Setting the error mode of Liquid lets you specify how strictly you want your templates to be interpreted.
+Normally the parser is very lax and will accept almost anything without error. Unfortunately this can make
+it very hard to debug and can lead to unexpected behaviour.
+
+Liquid also comes with a stricter parser that can be used when editing templates to give better error messages
+when templates are invalid. You can enable this new parser like this:
+
+```ruby
+Liquid::Template.error_mode = :strict # Raises a SyntaxError when invalid syntax is used
+Liquid::Template.error_mode = :warn # Adds errors to template.errors but continues as normal
+Liquid::Template.error_mode = :lax # The default mode, accepts almost anything.
@fw42
fw42 Aug 2, 2013 Member

Can we include the new template-local options thing in the docs too?

@trishume
trishume Aug 2, 2013 Contributor

Done.

@fw42 fw42 commented on an outdated diff Aug 2, 2013
t.verbose = false
end
+Rake::TestTask.new(:strict_test) do |t|
+ t.libs << '.' << 'lib' << 'test'
+ t.test_files = FileList['test/liquid/**/*_test.rb']
+ t.verbose = false
+end
@fw42
fw42 Aug 2, 2013 Member

Why does this one not have t.options = 'strict' like the other one has with lax? If we ever change the default, this test will not do what we expect?

@fw42 fw42 commented on an outdated diff Aug 2, 2013
@@ -27,9 +40,13 @@ namespace :benchmark do
desc "Run the liquid benchmark"
@fw42
fw42 Aug 2, 2013 Member

Can we change the description, like the other one?

@fw42
fw42 Aug 2, 2013 Member

Also, I think the lax parser should be the default for the performance benchmarks, since that's the one that is going to be used mostly?

@fw42 fw42 commented on the diff Aug 2, 2013
lib/liquid/lexer.rb
@@ -0,0 +1,64 @@
+require "strscan"
+module Liquid
+ class Lexer
+ SPECIALS = {
+ '|' => :pipe,
+ '.' => :dot,
+ ':' => :colon,
+ ',' => :comma,
+ '[' => :open_square,
+ ']' => :close_square,
+ '(' => :open_round,
+ ')' => :close_round
+ }
@fw42
fw42 Aug 2, 2013 Member

What's the reasoning here for having symbols for all those special chars instead of just using the chars themselves in the code?

@trishume
trishume Aug 2, 2013 Contributor

So that things are consistent and the token type is always a symbol and because symbols are fast to create and are more memory efficient. I might do a performance test on them later.

@fw42
fw42 Aug 2, 2013 Member

No need to, just curious. But I agree, it's kind of cleaner.

@fw42 fw42 commented on an outdated diff Aug 2, 2013
lib/liquid/lexer.rb
+ end
+ end
+
+ unless tok
+ @output << [:end_of_string]
+ return @output
+ end
+ @output << tok
+ end
+ end
+
+ protected
+ def lex_specials
+ c = @ss.getch
+ if s = SPECIALS[c]
+ return Token.new(s,c)
@fw42
fw42 Aug 2, 2013 Member

What's Token? Didn't we remove that? Why is this still here (and why does it not break any tests)?

@fw42 fw42 commented on an outdated diff Aug 2, 2013
lib/liquid/parser.rb
@@ -0,0 +1,95 @@
+module Liquid
+ # This class is used by tags to parse themselves
+ # it provides helpers and encapsulates state
@fw42
fw42 Aug 2, 2013 Member

I don't think we need that comment

@fw42 fw42 commented on an outdated diff Aug 2, 2013
lib/liquid/parser.rb
+ # Like consume? Except for an :id token of a certain name
+ def id?(str)
+ token = @tokens[@p]
+ return false unless token && token[0] == :id
+ return false unless token[1] == str
+ @p += 1
+ token[1]
+ end
+
+ def look(type, ahead = 0)
+ tok = @tokens[@p + ahead]
+ return false unless tok
+ tok[0] == type
+ end
+
+ # === General Liquid parsing functions ===
@fw42 fw42 commented on an outdated diff Aug 2, 2013
lib/liquid/parser.rb
+
+ def look(type, ahead = 0)
+ tok = @tokens[@p + ahead]
+ return false unless tok
+ tok[0] == type
+ end
+
+ # === General Liquid parsing functions ===
+
+ def expression
+ token = @tokens[@p]
+ if token[0] == :id
+ variable_signature
+ elsif [:string, :number].include? token[0]
+ consume
+ token[1]
@fw42
fw42 Aug 2, 2013 Member

consume already returns token[1], doesn't it?

@fw42 fw42 commented on an outdated diff Aug 2, 2013
lib/liquid/tag.rb
@@ -1,10 +1,26 @@
module Liquid
class Tag
- attr_accessor :nodelist
+ attr_accessor :nodelist, :options
+
+ def self.new_with_options(tag_name, markup, tokens, options)
+ # Forgive me Matz for I have sinned.
+ # I have forsaken the holy idioms of Ruby and used Class#allocate.
+ # I fulfilled my mandate by maintaining API compatibility and performance,
+ # even though it may displease your Lordship.
+ #
+ # In all seriousness though, I can prove to a reasonable degree of certainty
@fw42
fw42 Aug 2, 2013 Member

The mathematician in me does not like the misuse of the word "prove" here. Can we change that to "argue" or "show" or something? :-) Or maybe remove that whole blog of comments. It's funny, but doesn't say anything.

@fw42 fw42 commented on an outdated diff Aug 2, 2013
lib/liquid/tag.rb
@@ -22,5 +38,20 @@ def render(context)
def blank?
@blank || true
end
+
+ def switch_parse(markup)
@fw42
fw42 Aug 2, 2013 Member

Hm, the name of that method is not super obvious to me. Can we call it select_parser_from_options or something more obvious?

@fw42 fw42 commented on an outdated diff Aug 2, 2013
lib/liquid/tags/if.rb
end
- @blocks.push(block)
- @nodelist = block.attach(Array.new)
+ condition
+ end
+
+ def strict_parse(markup)
+ p = Parser.new(markup)
+
+ condition = parse_comparison(p)
+
+ while op = (p.id?('and') || p.id?('or'))
+ new_cond = parse_comparison(p)
+ new_cond.send(op.to_sym, condition)
@fw42
fw42 Aug 2, 2013 Member

send can take method names as a string, no need to to_sym it.

@fw42
Member
fw42 commented Aug 2, 2013

I am a tiny little bit concerned about the two parsers having slightly different semantics. Is there an easy way for the warn-parser to show a warning when stuff like the weird "silently eats logic" thing is used?

@trishume
Contributor
trishume commented Aug 2, 2013

@fw42 not sure what you mean by different semantics. It is only invalid syntax that is caught.
In :warn mode the parser will add a warning when invalid syntax is used and then continue with the old parser.

@fw42
Member
fw42 commented Aug 2, 2013

Right, sorry, I misunderstood something. The new one will complain when using "&&" anyways. Forgot that. All good.

@trishume trishume was assigned Aug 6, 2013
@trishume
Contributor

Ok I just fixed CI. This could be merged and theoretically it would have no effect on anything unless someone enabled strict parsing. @fw42

@fw42
Member
fw42 commented Aug 19, 2013

Great job Tristan, but I still stand by what I said earlier. As long as we don't use this in Shopify (or at least plan to use it soon), we shouldn't merge it yet.

@fw42
Member
fw42 commented Aug 19, 2013

That being said, I have nothing against the idea of actually using this for template validation in the theme editor in Shopify (but not for storefront rendering). Feel free to get a Shopify PR ready (or talk to someone who might - let me know if you can't find anyone). @hornairs, @jduff, any comments?

@trishume
Contributor

I'm looking into it. It will likely take a bit of work since right now there is no provision for saving the template successfully but also showing an error message, both in the API and Admin.

@fw42
Member
fw42 commented Aug 19, 2013

If you want to discuss Shopify-related issues, please don't use the public Liquid repository.

@trishume
Contributor

Ok. I just fixed up warnings and error messages so that things will work for situations where warnings will be used to debug templates. The syntax errors now have extra context since they don't include line numbers.

@trishume
Contributor

I have seen the wisdom of not merging until it is used. I started work on actually using it and discovered that in order to get warnings out of it I have to render and not just parse.

Now I will try to remedy this by allowing warnings to be retrieved without rendering the template.

@fw42
Member
fw42 commented Aug 27, 2013

Can someone else besides me (one of @arthurnn, @Sirupsen, @hornairs, @csfrancis, ... maybe) please review this too please? (preferably today)

👍 from me, but this is pretty big, so I want a second opinion. For reference, see also Shopify/shopify#7885.

@sirupsen sirupsen commented on the diff Aug 27, 2013
lib/liquid/lexer.rb
+ @ss = StringScanner.new(input)
+ end
+
+ def tokenize
+ @output = []
+
+ loop do
+ @ss.skip(/\s*/)
+
+ tok = case
+ when @ss.eos? then nil
+ when t = @ss.scan(COMPARISON_OPERATOR) then [:comparison, t]
+ when t = @ss.scan(SINGLE_STRING_LITERAL) then [:string, t]
+ when t = @ss.scan(DOUBLE_STRING_LITERAL) then [:string, t]
+ when t = @ss.scan(NUMBER_LITERAL) then [:number, t]
+ when t = @ss.scan(IDENTIFIER) then [:id, t]
@sirupsen
sirupsen Aug 27, 2013 Member

Are the most common cases in usual Liquid code first here?

Also I'm not sure if I'm a fan of using then, I'd prefer newlines.

@trishume
trishume Aug 27, 2013 Contributor

They are in the right order such that they will parse correctly. For example, comparison can be contains which is also a valid identifier so it has to parse comparisons first.

As for the thens, there are just so many cases and the body of the cases is so small that splitting it up might make it harder to read.

@sirupsen sirupsen and 1 other commented on an outdated diff Aug 27, 2013
lib/liquid/lexer.rb
+ ')' => :close_round
+ }
+ IDENTIFIER = /[\w\-?!]+/
+ SINGLE_STRING_LITERAL = /'[^\']*'/
+ DOUBLE_STRING_LITERAL = /"[^\"]*"/
+ NUMBER_LITERAL = /-?\d+(\.\d+)?/
+ COMPARISON_OPERATOR = /==|!=|<>|<=?|>=?|contains/
+
+ def initialize(input)
+ @ss = StringScanner.new(input)
+ end
+
+ def tokenize
+ @output = []
+
+ loop do
@sirupsen
sirupsen Aug 27, 2013 Member

I don't like infinite loops because it's not immediately obvious when it terminates.

while !@ss.eos?
end
@trishume
trishume Aug 27, 2013 Contributor

I think I used to have that but the problem is it needs to consume whitespace before checking for eos.

@sirupsen
sirupsen Aug 27, 2013 Member

Could consume it before the loop though, no?

@sirupsen sirupsen commented on an outdated diff Aug 27, 2013
lib/liquid/lexer.rb
+ when t = @ss.scan(DOUBLE_STRING_LITERAL) then [:string, t]
+ when t = @ss.scan(NUMBER_LITERAL) then [:number, t]
+ when t = @ss.scan(IDENTIFIER) then [:id, t]
+ else
+ c = @ss.getch
+ if s = SPECIALS[c]
+ [s,c]
+ else
+ raise SyntaxError, "Unexpected character #{c}"
+ end
+ end
+
+ unless tok
+ @output << [:end_of_string]
+ return @output
+ end
@sirupsen
sirupsen Aug 27, 2013 Member
@output << [:end_of_string]

Returns the resulting @output already.

  return @output.push([:end_of_string]) unless tok 
@sirupsen sirupsen commented on the diff Aug 27, 2013
lib/liquid/parser.rb
+ token = @tokens[@p]
+ return false unless token && token[0] == type
+ @p += 1
+ token[1]
+ end
+
+ # Like consume? Except for an :id token of a certain name
+ def id?(str)
+ token = @tokens[@p]
+ return false unless token && token[0] == :id
+ return false unless token[1] == str
+ @p += 1
+ token[1]
+ end
+
+ def look(type, ahead = 0)
@sirupsen
sirupsen Aug 27, 2013 Member

Minor thing, but peek is a more standard name for this kind of method

@trishume
trishume Aug 27, 2013 Contributor

That's tricky. In parsing and compilers the concept is called lookahead so the operation is abbreviated look.
To resolve the dilemma I choose look because that is what the code already uses.

@sirupsen
sirupsen Aug 27, 2013 Member

ah cool! no worries

@sirupsen sirupsen commented on the diff Aug 27, 2013
lib/liquid/tag.rb
@@ -1,10 +1,21 @@
module Liquid
class Tag
- attr_accessor :nodelist
+ attr_accessor :nodelist, :options
+ attr_reader :warnings
+
+ def self.new_with_options(tag_name, markup, tokens, options)
+ # Forgive me Matz for I have sinned. I know this code is weird
+ # but it was necessary to maintain API compatibility.
+ new_tag = self.allocate
+ new_tag.options = options
+ new_tag.send(:initialize, tag_name, markup, tokens)
+ new_tag
@sirupsen
sirupsen Aug 27, 2013 Member

Can you explain why it's necessary to sin?

@trishume
trishume Aug 27, 2013 Contributor

Basically the only way to maintain API compatibility and not mess everything up was to set the options attribute before calling initialize. The reason is that the parsing of tags is done in the initialize method and the options needs to get passed down before the parse.

@dylanahsmith
dylanahsmith Feb 27, 2014 Member

We should really redefine all liquid's Tag subclasses to take an optional options argument, then deprecate initialize without an options argument so that we can remove support for it in the next major release. We can check the method's arity to give a deprecation warning.

@sirupsen sirupsen commented on the diff Aug 27, 2013
lib/liquid/tags/for.rb
+ if markup =~ Syntax
+ @variable_name = $1
+ @collection_name = $2
+ @name = "#{$1}-#{$2}"
+ @reversed = $3
+ @attributes = {}
+ markup.scan(TagAttributes) do |key, value|
+ @attributes[key] = value
+ end
+ else
+ raise SyntaxError.new("Syntax Error in 'for loop' - Valid syntax: for [item] in [collection]")
+ end
+ end
+
+ def strict_parse(markup)
+ p = Parser.new(markup)
@sirupsen
sirupsen Aug 27, 2013 Member

I kinda stop everytime I see p thinking it's the p method, but it's probably fine

@trishume
trishume Aug 27, 2013 Contributor

Fair point, however the parser methods are called a lot and it would look really ugly with a longer name.

@sirupsen
Member

Overall it looks like excellent work @trishume! It seems clean, the Liquid parsing errors are going to be amazing. 👍 Huge props.

@sirupsen

edit: nvm

@trishume
Contributor

Ready to merge? CI is passing, it is ready to be used in Shopify, any last comments? :shipit:

@sirupsen
Member

🐳 🐳 🐳

@fw42
Member
fw42 commented Aug 30, 2013

@trishume trishume merged commit 0e41c2c into master Aug 30, 2013

1 check passed

default The Travis CI build passed
Details
@trishume trishume deleted the recursive-parsing branch Aug 30, 2013
@trishume
Contributor

Yay! 🎊

@tgjones tgjones referenced this pull request in dotliquid/dotliquid May 15, 2014
Closed

Allow non-global template configuration #94

@trishume trishume removed their assignment May 15, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment