Skip to content

Add support for whitespace control character allowing for ERB like syntax#743

Closed
evulse wants to merge 8 commits intoShopify:masterfrom
evulse:whitespace-dash-feature
Closed

Add support for whitespace control character allowing for ERB like syntax#743
evulse wants to merge 8 commits intoShopify:masterfrom
evulse:whitespace-dash-feature

Conversation

@evulse
Copy link
Copy Markdown
Contributor

@evulse evulse commented Apr 13, 2016

This is a continuation of #215 and #216 . It contains all PR changes that were needed for @nickpearson to complete his PR. @fw42

Comment thread lib/liquid/template.rb Outdated
source = source.gsub(white_space[0], white_space[1])
end
end

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For sake of readability, your block parameter might be better as:

WhiteSpaceTrimTags.each do |(regexp, string)|
  source = source.gsub(regexp, string)

@fw42
Copy link
Copy Markdown
Contributor

fw42 commented Apr 14, 2016

Hmm not sure why Travis CI didn't run your tests here..

Comment thread lib/liquid/template.rb Outdated

if source =~ WhiteSpaceTrimCheck
WhiteSpaceTrimTags.each do |(regexp, string)|
source = source.gsub(regexp, string)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I'm a bit concerned that this is essentially doing 4 passes (once for each regexp) over the entire source. I can't imagine that to not be a considerable performance hit.

@tobi
Copy link
Copy Markdown
Member

tobi commented Apr 15, 2016

Like I said in the other ticket. We really need the white space work to be included in the existing parser tokenizer. Liquid is used in many super high performance situations. 1% slowdown can easily lead to 100s of thousands in hardware costs for Shopify alone. Performance matters a lot and multiple passes are the death of performance

@evulse
Copy link
Copy Markdown
Contributor Author

evulse commented Apr 15, 2016

New changes bring it down to the tokenizer + whitespace pass but your right no reason it can't be a part of the tokenizer and just skip the whitespace during that step.

@evulse
Copy link
Copy Markdown
Contributor Author

evulse commented Apr 15, 2016

The overhead of parsing the white space to the parser tokenizer creates more performance issues than the current proposed solution of just clearing it out in one pass first.

@evulse
Copy link
Copy Markdown
Contributor Author

evulse commented Apr 15, 2016

I think the only way to handle white space as part of the parser tokenizer without a massive performance hit is to move away from the current regex solution and in to a stream parser and hand off tokens as they are found. If the parser remembers the previous token and uses logic based on the next token to decide how to process it then the stream only ever needs to move in one direction and only ever need to store a single previous token in memory at any one stage.

I've attempted regex look forwards and look backs as part of the parser and it just adds way to much performance loss.

@evulse
Copy link
Copy Markdown
Contributor Author

evulse commented Apr 15, 2016

I've come to the conclusion that this solution is in the top 95% percentile of all possible solutions. I went down the path of changing the tokenizer but to be honest it was starting to feel like this was no longer the same project. The simplicity of the tokenizer is core to the library and as many people use it, changing the tokenizer so heavily would no doubt introduce breaking changes for people who run custom regex for syntax changes.

Honestly I think the double regex solution above is the perfect balance of simplicity and performance. The regex passes have better performance separated than what they do have combined. For any liquid not using the new syntax it has a very performant regex check to avoid the full regex scan.

@nickpearson
Copy link
Copy Markdown
Contributor

One of the proposed solutions that crops up in the comments of these whitespace issues is to allow newlines inside of Liquid tags and outputs. I had tried this in the past but always got a Liquid::SyntaxError: Variable '{{' was not properly terminated with regexp: /\}\}/ error.

However, I just tried the following on the master branch, and it works:

str = "
a{{
  'b'
}}c{%
  if true
%}d{%
  endif 
%}e
".strip
Liquid::Template.parse(str).render
#=> "abcde"

That's of course a contrived example, but it shows that whitespace can be placed inside Liquid tags/outputs where it will be swallowed automatically. Here's a real-world example:

<ul>{%
  for product in products %}
    <li>{{ product.title }}</li>{%
  endfor %}
</ul>

which, with two products, will render as:

<ul>
    <li>Shirt</li>
    <li>Pants</li>
</ul>

Here's the code to run in a console if you want to try it out:

html = "
<ul>{%
  for product in products %}
    <li>{{ product.title }}</li>{%
  endfor %}
</ul>
".strip
data = { 'products' => [{ 'title' => 'Shirt' }, { 'title' => 'Pants' }] }
puts Liquid::Template.parse(html).render(data)

The syntax isn't as pretty as the {{- var -}} and {%- if x -%}, but the important part is that it works.

Is this behavior something that "just works," or is this something Liquid can commit to supporting officially? Official support might be the best option because there's no performance hit (no code changes needed). The only thing to add, I think, would be tests.

@dylanahsmith
Copy link
Copy Markdown
Contributor

@nickpearson Yes, newlines in tags is officially supported. It was intentionally added in #324 along with tests.

I would still like liquid to support the ERB like syntax, since I think it makes liquid code more readable for template writers. I think we can address Shopify performance concerns with this feature in liquid-c

Comment thread lib/liquid/template.rb Outdated
return [] if source.to_s.empty?

if source =~ WhiteSpaceTrimCheck
source = source.gsub(WhiteSpaceTrimTags, WhiteSpaceTrimReplace)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will affect the content of raw tags as well

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will also affect line numbers, which will be confusing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll added some tests for this. This really means the logic needs to be in the parser.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think Liquid::BlockBody#parse is the right place for this logic.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to work something out. Currently the whitespace would be in the previous token so either need to bring the whitespace in to the current token which kills the performance of the tokenizer regex split or create logic that looks back and modifies the stack if it finds a whitespace control symbol.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had a look at the liquid-c parser. I think based on that its better to change the tokenizer first as honestly this is where it should be handled to get the outer white space in the same token. I'll just tweak the performance the best I can in ruby as the c side won't have an issue looking at it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the whitespace would be in the previous token so either need to bring the whitespace in to the current token which kills the performance of the tokenizer regex split or create logic that looks back and modifies the stack if it finds a whitespace control symbol.

In the ruby implementation, I was thinking that BlockBody#parse can modify the last item in the nodelist to rstrip whitespace to trim the previous token.

Trimming the following token might be simplest to handle with the help of the tokenizer by setting a flag on the tokenizer to tell it to lstrip the next token. That avoids the complication of interacting with child tags which may or may not be blocks.

liquid-c is able to handle things in the tokenizer more easily, since it lazily tokenizes in a way that allows trimming to be turned off while parsing the raw tag. When matching a TOKEN_RAW token it actually need to read-ahead to find where it ends, so could easily check for an additional dash and trim whitespace from the end. When matching a tag or variable in tokenizer_next, it could easily check for a dash preceding the end of the tag, then set a flag on the tokenizer to ltrim whitespace on the next token.

@evulse
Copy link
Copy Markdown
Contributor Author

evulse commented Apr 16, 2016

The new change is slightly slower than the previous approach however I think it still has some optimisations I can do. It does however pass all tests and handles raw tags correctly.

Also I don't generally write in ruby so anything that can be written in a better syntax can you code review

Comments?

@evulse
Copy link
Copy Markdown
Contributor Author

evulse commented Apr 16, 2016

Also by slower I mean marginally, it could actually end up being faster once I do some more tweaks.

Comment thread lib/liquid/block.rb Outdated
@@ -1,9 +1,16 @@
module Liquid
class Block < Tag
IsTag = /^#{TagStart}/o
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to be working off a really old commit. See how different the file is now: https://github.com/Shopify/liquid/blob/master/lib/liquid/block.rb

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That explains the merge conflict message. Updating my fork now as your right its from 2013.

@evulse
Copy link
Copy Markdown
Contributor Author

evulse commented Apr 16, 2016

Closing due to old commit

@evulse evulse closed this Apr 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants