Add a Real Parser. Closes #229 and closes #225. #235

Merged
merged 57 commits into from Aug 30, 2013

8 participants
@trishume
Contributor

trishume commented Jul 26, 2013

@fw42 @RichardBlair @boourns @tobi
This massive pull request contains a brand new parser and lexer for variable tags and if statements that can be easily extended to work with all other tags. It uses proper parsing so it is both easier to understand and less fraught with quirks and bugs. It gives real syntax errors that tell the user what they did wrong.

The parser and lexer are made as utilities that can easily be used in other tags.

This fixes many problems with the liquid parser including closing #229 and #225. It makes the variable tag sane and tells users what they did wrong instead of silently doing whatever the heck it wants.

Backwards Compatibility

I also added a system for determining the parsing mode. By default it is set to a compatibility mode where if a syntax error occurs it will add an error message and then fall back on the old parser instead of crashing. It can also be set to use either only the old parser or only the new parser.
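The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: the class name `ModeSketchTag`, the method names, and the two stand-in parsers are all assumptions made up for the example.

```ruby
# Hypothetical illustration of the three parsing modes; not the PR's actual code.
class SketchSyntaxError < StandardError; end

class ModeSketchTag
  attr_reader :warnings

  def initialize(error_mode)
    @error_mode = error_mode # :lax, :warn or :strict
    @warnings = []
  end

  def parse_with_error_mode(markup)
    case @error_mode
    when :strict
      strict_parse(markup) # syntax errors propagate to the caller
    when :lax
      lax_parse(markup)    # old behaviour only
    else # :warn
      begin
        strict_parse(markup)
      rescue SketchSyntaxError => e
        @warnings << e.message # record the problem...
        lax_parse(markup)      # ...then fall back to the old parser
      end
    end
  end

  private

  # Stand-in for the new parser: accepts only a bare identifier.
  def strict_parse(markup)
    stripped = markup.strip
    raise SketchSyntaxError, "Unexpected markup: #{stripped}" unless stripped =~ /\A\w+\z/
    stripped
  end

  # Stand-in for the old parser: silently takes the first word it can find.
  def lax_parse(markup)
    markup[/\w+/].to_s
  end
end
```

In the warning mode the caller still gets a result, and can inspect `warnings` afterwards to show the user what went wrong.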

Shopify could be configured to use the warning mode so that existing templates would continue working but it would show syntax warnings in the theme editor. In Liquid 2.6 the warning mode should be the default but in Liquid 3.0 the strict mode could be the default.

Tests

I added many tests for both the lexer and parser classes as well as fixing tests to expect syntax errors. I kept the tests for the lax parser and added a helper that sets the parsing mode to lax. Interestingly, there were even tests that asserted that certain bugs like #225 existed.

Internals

Two new classes were added: Liquid::Parser and Liquid::Lexer. The lexer is a hand-written one using StringScanner and the parser is a hand-written LL(k) recursive descent parser.

Tags can use the parser class as a utility to do their own parsing. Here is an example of using the parser class:

p = Parser.new(markup)
# Could be just filters with no input
@name = p.look(:pipe) ? '' : p.expression
while p.consume?(:pipe)
  filtername = p.consume(:id)
  filterargs = p.consume?(:colon) ? parse_filterargs(p) : []
  @filters << [filtername, filterargs]
end
p.consume(:end_of_string)
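To make the helper methods used above more concrete, here is a minimal sketch of a token-stream class offering `look`, `consume` and `consume?`. It assumes tokens are `[type, contents]` pairs; the class name `MiniParser`, the `ParseError` class and the error wording are hypothetical and not taken from Liquid::Parser.

```ruby
# Hypothetical token-stream helper; tokens are [type, contents] pairs.
class MiniParser
  class ParseError < StandardError; end

  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  # True if the token `ahead` positions from the current one has the given type.
  def look(type, ahead = 0)
    tok = @tokens[@pos + ahead]
    !tok.nil? && tok[0] == type
  end

  # Require a token of the given type, advance past it and return its contents.
  def consume(type)
    tok = @tokens[@pos]
    unless tok && tok[0] == type
      raise ParseError, "Expected #{type} but found #{tok ? tok[0] : 'nothing'}"
    end
    @pos += 1
    tok[1]
  end

  # Like consume, but returns false instead of raising when the type differs.
  def consume?(type)
    return false unless look(type)
    consume(type)
  end
end

# Tokens for the markup "name | upcase"
p = MiniParser.new([[:id, "name"], [:pipe, "|"], [:id, "upcase"], [:end_of_string]])
name = p.consume(:id)
filters = []
filters << p.consume(:id) while p.consume?(:pipe)
p.consume(:end_of_string)
```

Because every grammar rule is written in terms of these three calls, a mismatch surfaces as an error naming the expected and actual token types instead of being silently skipped.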

Examples

Here are some examples of what it fixes:

{{ variable.: }}
old => ""
new => Liquid::SyntaxError: Expected id but found <colon: ':'>

{% assign foo = 1 + 2 %}{{ foo }}
old => "1"
new => Liquid::SyntaxError: Unexpected character +

{% if true && false %} wrong {% endif %}
old => "wrong"
new => Liquid::SyntaxError: Unexpected character &

{{---E(R[\'\"\'\"\'(+=EH%*^(@#^%$)?||?\eE,PUZE:::~~~~``}}
old => ""
new => Liquid::SyntaxError: Unexpected character (.

Benchmarks

I added a new benchmarking command rake benchmark:lax which runs the benchmarks with the old parser.

Old parser (rake benchmark:lax):

Rehearsal ------------------------------------------------
parse:         3.840000   0.050000   3.890000 (  4.026889)
parse & run:   8.110000   0.050000   8.160000 (  8.154465)
-------------------------------------- total: 12.050000sec

                   user     system      total        real
parse:         3.410000   0.020000   3.430000 (  3.436315)
parse & run:   7.950000   0.040000   7.990000 (  8.140029)

New parser (rake benchmark:run):

Rehearsal ------------------------------------------------
parse:         5.560000   0.040000   5.600000 (  5.745558)
parse & run:  10.800000   0.040000  10.840000 ( 10.930402)
-------------------------------------- total: 16.440000sec

                   user     system      total        real
parse:         5.450000   0.020000   5.470000 (  5.575437)
parse & run:  10.630000   0.040000  10.670000 ( 10.783157)

The new parser is a bit slower. According to basic profiling the performance issues are mostly in the lexer, which I could try writing in Ragel but I think the added complexity of Ragel outweighs the performance benefits.

Contributor

trishume commented Jul 26, 2013

The CI seems to break but the tests pass fine on my computer. I will look into why they fail later. I think I just have to require "strscan" in the lexer.

Member

fw42 commented Jul 26, 2013

Haven't looked at the code yet, but it sounds awesome from the description. Although "fast" might be more important than "awesome" :-(

Member

fw42 commented Jul 26, 2013

cc @hornairs for performance discussion

Contributor

airhorns commented Jul 26, 2013

Herm. I don't think we can accept a perf hit for cleaner code either. That sucks tho cause this is glorious.

I tried removing the Token class and inlining some Parser methods and shaved a bit of time off, see recursive-parsing...array_tokens :

oration ~/C/liquid  (array_tokens*) ➜  rake benchmark:run
/opt/boxen/rbenv/versions/1.9.3-p448/bin/ruby ./performance/benchmark.rb
Rehearsal ------------------------------------------------
parse:         4.300000   0.020000   4.320000 (  4.329285)
parse & run:   9.470000   0.030000   9.500000 (  9.499332)
-------------------------------------- total: 13.820000sec

                   user     system      total        real
parse:         4.260000   0.010000   4.270000 (  4.263429)
parse & run:   9.380000   0.030000   9.410000 (  9.409803)
oration ~/C/liquid  (array_tokens) ➜  git co recursive-parsing
oration ~/C/liquid  (recursive-parsing) ➜  rake benchmark:run
/opt/boxen/rbenv/versions/1.9.3-p448/bin/ruby ./performance/benchmark.rb
Rehearsal ------------------------------------------------
parse:         5.240000   0.020000   5.260000 (  5.259000)
parse & run:  10.710000   0.040000  10.750000 ( 10.755690)
-------------------------------------- total: 16.010000sec

                   user     system      total        real
parse:         5.170000   0.010000   5.180000 (  5.183431)
parse & run:  10.310000   0.020000  10.330000 ( 10.325539)

but as you said @trishume it looks like a lot of the time is spent doing Regexp#===. Is the order in which the regexps are scanned for determined by token precedence? Could we mess with that order to get more "hits" earlier on in the case statement? Also, I think it might be worth exploring Ragel cause I believe there are already a few grammars hanging out on the internet and you did the awesome work to let two versions co-exist, which I think was most of the battle.

Awesome job @trishume

EDIT: added stuff about method inlining

Contributor

trishume commented Jul 26, 2013

@hornairs I don't think the Regexps can be rearranged that much. I used to have them in optimal order and it was a little bit faster but then I kept finding problems like floating point numbers being lexed as an integer and then a dot and another integer. Right now they are in the right order for dependencies.

I also tried putting all the smaller regexes into a bigger one and then disambiguating with capture groups but that wasn't any faster.

Also keep in mind that this is not only about cleaner code. It also has the significant benefit of telling theme editors where they made a mistake rather than silently doing the wrong thing.

I'll see what I can do about optimizing it some more.
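The ordering constraint can be reproduced with a small StringScanner sketch. The token names and regexes below are assumptions chosen for illustration and are not the patterns used in lib/liquid/lexer.rb.

```ruby
require "strscan"

# Illustrative patterns only; the real lexer's regexes may differ.
integer = /-?\d+/
float   = /-?\d+\.\d+/
dot     = /\./

# Try each [type, regex] rule in order and emit the first match at each position.
def lex(source, rules)
  ss = StringScanner.new(source)
  tokens = []
  until ss.eos?
    ss.skip(/\s+/)
    break if ss.eos?
    type, _regex = rules.find { |_, regex| ss.scan(regex) }
    raise ArgumentError, "Unexpected character #{ss.getch}" unless type
    tokens << [type, ss.matched]
  end
  tokens
end

# Integer rule first: the float is split into three tokens.
int_first = lex("3.14", [[:integer, integer], [:dot, dot], [:float, float]])
# Float rule first: the float is kept as a single token.
float_first = lex("3.14", [[:float, float], [:integer, integer], [:dot, dot]])
```

With the integer rule first, `int_first` holds an integer, a dot and another integer, which is the mis-lexing described above; trying the more specific float rule first avoids it.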

Contributor

nickpearson commented Jul 26, 2013

I've never written a complex parser before so I'm not sure this is practical, but I thought it'd be worth throwing out there. Is it possible with the new parser to look at a Liquid segment/tag before parsing it? If so, it may save on performance to parse a segment/tag with the existing regex parser if it can be known beforehand that the segment/tag uses proper syntax.

I realize this adds an extra regex check for every segment/tag, but if the cost of this check pays off in a high enough percentage of cases (where the Liquid code is very simple) by using the faster parser, it may be worth exploring.

For example, to check for simple, valid segments:

r = /\{\{ *\w+(?: *\| *\w+)* *\}\}/

# matches
"{{some_var}}"                       =~ r #=> 0
"{{some_var|upcase}}"                =~ r #=> 0
"{{ some_var | upcase }}"            =~ r #=> 0
"{{ some_var | upcase | downcase }}" =~ r #=> 0

# non-matches
"{{'val'}}"           =~ r #=> nil
"{{ array[0] }}"      =~ r #=> nil
"{{ var | test: 1 }}" =~ r #=> nil
"{{ var | test 1 }}"  =~ r #=> nil
"{{ var | test: }}"   =~ r #=> nil

This regex will match the simplest segments: a variable reference with zero or more filters with no filter parameters. My guess is that a large chunk of the invalid Liquid syntax is in filter parameters (the other large chunk being in tag attributes). Something similar might be practical for simple tags, though it would need to be a bit more complex to also match a high percentage of tags.

If this is worth looking into, it may also be worth knowing what percentage of Liquid segments in Shopify templates are without any filter parameters. I just used this to check all the templates in my app:

simple = 0
total = 0
Template.all.each do |template|
  segments = template.text.to_s.scan(/\{\{.*?\}\}/)
  total += segments.size
  simple += segments.select do |segment|
    segment =~ /\{\{ *\w+(?: *\| *\w+)* *\}\}/
  end.size
end
puts "#{simple}/#{total} = #{simple/total.to_f}"

And by the way, awesome work on this @trishume. I'm hoping the performance hit can be worked out or at least be shown to be worth it.

lib/liquid/lexer.rb
when t = @ss.scan(IDENTIFIER) then Token[:id, t]
else
lex_specials
end

@fw42

fw42 Jul 27, 2013

Member

Maybe I don't get it, but this whole case block looks like it's scanning the same string 6 times in the worst case. Shouldn't that be possible with only one iteration?

@trishume

trishume Jul 29, 2013

Contributor

Tobi, Harry and I talked about that, I'm going to try something out.

lib/liquid/lexer.rb
@type, @contents = args
end
def self.[](*args)

@fw42

fw42 Jul 27, 2013

Member

What is this for exactly? Is that just syntax sugar (aka one additional method call) or is it necessary?

@trishume

trishume Jul 29, 2013

Contributor

You're right. Harry says using arrays instead of the Token object is faster and this might be why. I like the syntactic sugar but if getting rid of it means I get to keep token objects then I will.

I'll try this and benchmark it.
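A rough way to compare the two token representations is sketched below. `SketchToken` is a hypothetical stand-in modelled on the snippet under review, and the timings depend entirely on the Ruby version and machine, so no particular result is claimed.

```ruby
require "benchmark"

# Hypothetical stand-in for the Token class discussed above.
class SketchToken
  attr_reader :type, :contents

  def initialize(*args)
    @type, @contents = args
  end

  # Syntactic sugar: SketchToken[:id, "foo"] is one extra method call around new.
  def self.[](*args)
    new(*args)
  end
end

n = 200_000
object_time = Benchmark.realtime { n.times { SketchToken[:id, "foo"] } }
array_time  = Benchmark.realtime { n.times { [:id, "foo"] } }
puts format("Token objects: %.3fs, arrays: %.3fs", object_time, array_time)
```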

Contributor

trishume commented Jul 29, 2013

Ok, update on performance improvements:

  • Using a single regex instead of multiple regexes is not faster.
  • Deciding which regex to use based on the first character is not faster.
  • Array tokens are faster. Thanks @hornairs.
  • Making float and int together into number is slightly faster.
  • Using a regex to catch easy cases is faster. Thanks @nickpearson.
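The last bullet, catching easy cases with a regex, could look roughly like this. The pattern is adapted from the one suggested earlier in the thread but applied to the markup inside the braces; the names `simple_variable` and `parse_simple` are hypothetical, and the actual fast path in this pull request may differ.

```ruby
# Matches "name" optionally followed by "| filter" pairs with no filter arguments.
simple_variable = /\A\s*(\w+)((?:\s*\|\s*\w+)*)\s*\z/

# Returns [name, filters] via the fast path, or nil when the full parser is needed.
parse_simple = lambda do |markup|
  match = simple_variable.match(markup)
  match && [match[1], match[2].scan(/\w+/)]
end

fast = parse_simple.call(" some_var | upcase | downcase ")
slow = parse_simple.call("var | test: 1")
```

Here `fast` is handled without lexing at all, while `slow` comes back as nil, signalling that the markup should go through the lexer and parser.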

Recent benchmarks

New parser:

Rehearsal ------------------------------------------------
parse:         4.020000   0.030000   4.050000 (  4.042116)
parse & run:   9.600000   0.040000   9.640000 (  9.821808)
-------------------------------------- total: 13.690000sec

                   user     system      total        real
parse:         3.980000   0.020000   4.000000 (  3.998113)
parse & run:   9.960000   0.040000  10.000000 ( 10.198723)

Old parser:

Rehearsal ------------------------------------------------
parse:         3.620000   0.040000   3.660000 (  3.808780)
parse & run:   8.480000   0.040000   8.520000 (  8.626910)
-------------------------------------- total: 12.180000sec

                   user     system      total        real
parse:         3.510000   0.020000   3.530000 (  3.530844)
parse & run:   8.360000   0.040000   8.400000 (  8.497708)

How about those benchmarks @hornairs?

Member

fw42 commented Jul 29, 2013

What are we benchmarking here? Only the variables and if tags? How much slower is the benchmark going to be when we do this for all the tags? Also, how much faster is it going to be if we remove the "which parser to use?" check?

lib/liquid/tags/if.rb
condition = Condition.new($1, $2, $3)
def parse_condition(markup)
case Template.error_mode

@fw42

fw42 Jul 29, 2013

Member

Is this check happening for every tag? Can't we somehow check that only once when beginning to parse?

Contributor

trishume commented Jul 29, 2013

@fw42 The "which parser to use" check takes almost no time. I don't think it will get much slower when moving to other tags, because if statements and variable tags are two of the most common tags and also have some of the most complex parsing logic. I'm going to add new parsing code for "for" tags as well and see what that does.

Member

fw42 commented Aug 19, 2013

Great job Tristan, but I still stand by what I said earlier. As long as we don't use this in Shopify (or at least plan to use it soon), we shouldn't merge it yet.

Member

fw42 commented Aug 19, 2013

That being said, I have nothing against the idea of actually using this for template validation in the theme editor in Shopify (but not for storefront rendering). Feel free to get a Shopify PR ready (or talk to someone who might - let me know if you can't find anyone). @hornairs, @jduff, any comments?

Contributor

trishume commented Aug 19, 2013

I'm looking into it. It will likely take a bit of work since right now there is no provision for saving the template successfully but also showing an error message, both in the API and Admin.

Member

fw42 commented Aug 19, 2013

If you want to discuss Shopify-related issues, please don't use the public Liquid repository.

Contributor

trishume commented Aug 19, 2013

Ok. I just fixed up warnings and error messages so that things will work for situations where warnings will be used to debug templates. The syntax errors now have extra context since they don't include line numbers.

Contributor

trishume commented Aug 22, 2013

I have seen the wisdom of not merging until it is used. I started work on actually using it and discovered that in order to get warnings out of it I have to render and not just parse.

Now I will try to remedy this by allowing warnings to be retrieved without rendering the template.

Member

fw42 commented Aug 27, 2013

Can someone besides me (one of @arthurnn, @sirupsen, @hornairs, @csfrancis, ... maybe) please review this too? (preferably today)

👍 from me, but this is pretty big, so I want a second opinion. For reference, see also Shopify/shopify#7885.

when t = @ss.scan(SINGLE_STRING_LITERAL) then [:string, t]
when t = @ss.scan(DOUBLE_STRING_LITERAL) then [:string, t]
when t = @ss.scan(NUMBER_LITERAL) then [:number, t]
when t = @ss.scan(IDENTIFIER) then [:id, t]

@sirupsen

sirupsen Aug 27, 2013

Member

Are the most common cases in usual Liquid code first here?

Also I'm not sure if I'm a fan of using then, I'd prefer newlines.

@trishume

trishume Aug 27, 2013

Contributor

They are in the order required for them to lex correctly. For example, a comparison can be contains, which is also a valid identifier, so the lexer has to try comparisons first.

As for the thens, there are just so many cases and the body of the cases is so small that splitting it up might make it harder to read.

lib/liquid/lexer.rb
def tokenize
@output = []
loop do

@sirupsen

sirupsen Aug 27, 2013

Member

I don't like infinite loops because it's not immediately obvious when it terminates.

while !@ss.eos?
end

@trishume

trishume Aug 27, 2013

Contributor

I think I used to have that but the problem is it needs to consume whitespace before checking for eos.

@sirupsen

sirupsen Aug 27, 2013

Member

Could consume it before the loop though, no?
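One way to get an explicit termination condition while still handling whitespace is sketched here: skip whitespace once before the loop and again after each token. This is only an illustration of the suggestion; it recognizes just identifiers and pipes, and the method body is an assumption rather than the lexer's real code.

```ruby
require "strscan"

# Sketch of a tokenize loop with an explicit termination condition.
def tokenize(source)
  ss = StringScanner.new(source)
  output = []
  ss.skip(/\s*/) # consume leading whitespace before the first eos? check
  until ss.eos?
    if (t = ss.scan(/\w+/))
      output << [:id, t]
    elsif (t = ss.scan(/\|/))
      output << [:pipe, t]
    else
      raise ArgumentError, "Unexpected character #{ss.getch}"
    end
    ss.skip(/\s*/) # consume trailing whitespace so the next eos? check is accurate
  end
  output << [:end_of_string]
end
```

Whitespace-only input never enters the loop, so the result is just the end-of-string marker.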

lib/liquid/lexer.rb
unless tok
@output << [:end_of_string]
return @output
end

@sirupsen

sirupsen Aug 27, 2013

Member
@output << [:end_of_string]

Returns the resulting @output already.

  return @output.push([:end_of_string]) unless tok 
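The point about the return value can be checked in isolation: both `Array#<<` and `Array#push` return the array they were called on, so the append and the return can share one expression. The variable names below are illustrative only.

```ruby
output = [[:id, "name"]]

# Appending returns the receiver itself, not the appended element.
result = (output << [:end_of_string])
same_object = result.equal?(output)

pushed = output.push([:extra])
same_after_push = pushed.equal?(output)
```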
token[1]
end
def look(type, ahead = 0)

@sirupsen

sirupsen Aug 27, 2013

Member

Minor thing, but peek is a more standard name for this kind of method

@trishume

trishume Aug 27, 2013

Contributor

That's tricky. In parsing and compilers the concept is called lookahead so the operation is abbreviated look.
To resolve the dilemma I choose look because that is what the code already uses.

@sirupsen

sirupsen Aug 27, 2013

Member

ah cool! no worries

new_tag = self.allocate
new_tag.options = options
new_tag.send(:initialize, tag_name, markup, tokens)
new_tag

@sirupsen

sirupsen Aug 27, 2013

Member

Can you explain why it's necessary to sin?

@trishume

trishume Aug 27, 2013

Contributor

Basically the only way to maintain API compatibility and not mess everything up was to set the options attribute before calling initialize. The reason is that the parsing of tags is done in the initialize method and the options need to be passed down before the parse.
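The ordering problem can be seen in a stripped-down sketch. `OptionsTag`, `new_with_options` and `parsed_in_mode` are hypothetical names invented for this example; only the allocate, assign, then initialize sequence mirrors the snippet under review.

```ruby
# Hypothetical tag class; parsing happens inside initialize, as described above.
class OptionsTag
  attr_accessor :options
  attr_reader :parsed_in_mode

  # Build the instance first, attach options, then run initialize by hand.
  def self.new_with_options(tag_name, markup, options)
    new_tag = allocate
    new_tag.options = options
    new_tag.send(:initialize, tag_name, markup)
    new_tag
  end

  def initialize(tag_name, markup)
    @tag_name = tag_name
    parse(markup) # runs during construction...
  end

  private

  def parse(_markup)
    # ...so options must already be set by the time we get here.
    @parsed_in_mode = options ? options[:error_mode] : :unknown
  end
end

with_options = OptionsTag.new_with_options("if", "x", { error_mode: :strict })
without_options = OptionsTag.new("if", "x")
```

Calling plain `new` keeps the existing two-argument signature working, but the parse step then has no options to consult.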

@dylanahsmith

dylanahsmith Feb 27, 2014

Member

We should really redefine all of Liquid's Tag subclasses to take an optional options argument, then deprecate initialize without an options argument so that we can remove support for it in the next major release. We can check the method's arity to give a deprecation warning.
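A sketch of the arity check, assuming the current initialize signature takes exactly three arguments (tag_name, markup, tokens) as in the snippet above. The two classes and the lambda name are hypothetical.

```ruby
# Hypothetical tag classes used only to illustrate the arity check.
class OldStyleTag
  def initialize(tag_name, markup, tokens); end
end

class NewStyleTag
  def initialize(tag_name, markup, tokens, options = {}); end
end

needs_deprecation_warning = lambda do |klass|
  # An arity of exactly 3 means three required arguments and nothing optional,
  # so there is no slot for an options argument.
  klass.instance_method(:initialize).arity == 3
end

old_style = needs_deprecation_warning.call(OldStyleTag)
new_style = needs_deprecation_warning.call(NewStyleTag)
```

Methods with optional arguments report a negative arity, so `NewStyleTag` (arity -4) is not flagged.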

end
def strict_parse(markup)
p = Parser.new(markup)

@sirupsen

sirupsen Aug 27, 2013

Member

I kinda stop every time I see p thinking it's the p method, but it's probably fine

@trishume

trishume Aug 27, 2013

Contributor

Fair point, however the parser methods are called a lot and it would look really ugly with a longer name.

Member

sirupsen commented Aug 27, 2013

Overall it looks like excellent work @trishume! It seems clean, the Liquid parsing errors are going to be amazing. 👍 Huge props.

Member

sirupsen commented on lib/liquid/lexer.rb in 1fa029a Aug 27, 2013

edit: nvm

Contributor

trishume commented Aug 30, 2013

Ready to merge? CI is passing, it is ready to be used in Shopify, any last comments? :shipit:

Member

sirupsen commented Aug 30, 2013

🐳 🐳 🐳

Member

fw42 commented Aug 30, 2013

trishume added a commit that referenced this pull request Aug 30, 2013

Merge pull request #235 from Shopify/recursive-parsing
Add a Real Parser. Closes #229 and closes #225.

@trishume trishume merged commit 0e41c2c into master Aug 30, 2013

1 check passed

default The Travis CI build passed
Details

@trishume trishume deleted the recursive-parsing branch Aug 30, 2013

Contributor

trishume commented Aug 30, 2013

Yay! 🎊

@trishume trishume removed their assignment May 15, 2014
