Add support for whitespace control character. #30

Merged
merged 8 commits into from Aug 11, 2016

@evulse evulse commented Jun 27, 2016

This matches up with Shopify/liquid#746 to resolve issues Shopify/liquid#216, Shopify/liquid#215, Shopify/liquid#214, Shopify/liquid#194, Shopify/liquid#171, Shopify/liquid#162

This pull request is designed to be standalone and will pass all tests without its counterpart, Shopify/liquid#746. However, Shopify/liquid#746 requires this pull request and will be updated with this dependency once it is merged.

@fw42
Contributor

fw42 commented Jun 27, 2016

ping @Shopify/liquid

@evulse
Author

evulse commented Jun 27, 2016

This adds support for the {{- and {%- syntax, which will lstrip!, and for -}} and -%}, which will rstrip!

@evulse
Author

evulse commented Jun 27, 2016

Now matches up to Shopify/liquid#773

@tobi
Member

tobi commented Jun 27, 2016

I'd love to avoid the cycle through the Ruby VM to call rstrip!/lstrip!.

Ideally we should parse white-space as a special token, then modify the token to be marked for render-skipping if {%- or -%} are encountered.

@tobi
Member

tobi commented Jun 27, 2016

I see now that this would make it harder to match with Shopify/liquid#746 - alright, let's not overthink it. Thanks!

@evulse
Author

evulse commented Jun 27, 2016

Yeah, I would have liked to avoid it, but the trade-off isn't worth it at this stage. Keeping the functions in line on both sides makes maintenance easier. Also, because I'm pulling back the last token if needed, it would require converting from a Ruby object and then back again, whereas this approach keeps it clean and takes only a few lines.

@@ -56,6 +56,10 @@ static VALUE rb_block_parse(VALUE self, VALUE tokens, VALUE options)
    case TOKEN_RAW:
    {
        VALUE str = rb_enc_str_new(token.str, token.length, utf8_encoding);

        if(token.trim_whitespace)
            rb_funcall(str, rb_intern("lstrip!"), 0);
Contributor

@pushrax pushrax Jun 28, 2016

Why not find the offset after any whitespace (see lstrip_offset) and create the string with that offset? Avoids a memmove.
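
A minimal sketch of that suggestion, written in plain C without the Ruby API: skip the leading whitespace first, then build the string from the adjusted pointer and length, so nothing has to be shifted afterwards. The `span_t` struct and `skip_leading_space` helper are hypothetical names for illustration, not liquid-c's actual code, and `isspace` is assumed to be a close enough stand-in for what `lstrip!` removes (Ruby's definition differs slightly). In the extension, the resulting pointer and length would presumably be handed to `rb_enc_str_new`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical view of a token's bytes: a pointer plus a length, no copy. */
typedef struct {
    const char *str;
    size_t length;
} span_t;

/* Return the span with leading ASCII whitespace skipped.
 * Nothing is moved or copied; only the start pointer and length change. */
static span_t skip_leading_space(const char *str, size_t length)
{
    assert(str != NULL);
    const char *cur = str;
    const char *end = str + length;

    while (cur < end && isspace((unsigned char)*cur))
        cur++;

    span_t out = { cur, (size_t)(end - cur) };
    return out;
}
```

Because the string object is created from the already-trimmed span, there is no in-place strip on the Ruby side and therefore no memmove of the remaining bytes.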

Contributor

I'm now thinking that it could be a good idea to make read_while respect the codepoint length (see other usages of read_while for what it does).

@pushrax
Contributor

pushrax commented Jun 28, 2016

Makes sense and looks pretty good overall.

const char *start = token.str + 2;
long length = token.length - 4;

if (token.str[2] == '-') {
Contributor

You can use start[0] or *start instead

@evulse
Author

evulse commented Jul 7, 2016

Alright, we are passing again. Detection of the whitespace control characters has now been pushed back into the tokenizer, so there is no need to push and pop any existing values.

@tobi
Member

tobi commented Jul 7, 2016

really nice work Mike.

  • tobi
    CEO Shopify


VALUE str = rb_enc_str_new(token_start, end - token_start, utf8_encoding);
if(token.rstrip)
    rb_funcall(str, intern_rstrip, 0);
Contributor

What do you think about adding a reverse_read_while if the lstrip part is using read_while anyway?
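
One way the idea could look, as a sketch only: a forward scanner and its mirror image that walks the end pointer backwards. The thread does not show liquid-c's real `read_while` signature, so both functions below are hypothetical versions that take a `[start, end)` range and a `ctype`-style predicate.

```c
#include <assert.h>
#include <ctype.h>

/* Forward scan: advance from start while pred holds, never past end. */
static const char *read_while(const char *start, const char *end, int (*pred)(int))
{
    assert(start <= end);
    while (start < end && pred((unsigned char)*start))
        start++;
    return start;
}

/* Backward scan: pull end back while pred holds for the byte just before it.
 * Returns the new exclusive end, never earlier than start. */
static const char *reverse_read_while(const char *start, const char *end, int (*pred)(int))
{
    assert(start <= end);
    while (end > start && pred((unsigned char)end[-1]))
        end--;
    return end;
}
```

With both helpers, the raw token's bounds can be narrowed on each side before the Ruby string is created, so neither `lstrip!` nor `rstrip!` needs to be called through the VM.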

Author

Yeah, I was actually thinking of doing exactly that. I only stuck with this while refactoring so I could confirm I didn't introduce anything new. I should have some time tonight to pull this out and bring it up to where the lstrip sits.

@pushrax
Contributor

pushrax commented Jul 7, 2016

Definitely the better solution, nice.

@evulse
Author

evulse commented Jul 8, 2016

OK, the change away from using Ruby's rstrip is done, which makes this pretty final. As such, I've just run the final performance benchmark.

Current

Calculating -------------------------------------
          parse:     80.921  (± 3.7%) i/s -      4.851k in  60.028478s
    parse & run:     32.196  (± 3.1%) i/s -      1.932k in  60.069380s

This feature

Calculating -------------------------------------
          parse:     80.791  (± 3.7%) i/s -      4.844k in  60.031186s
    parse & run:     32.109  (± 3.1%) i/s -      1.926k in  60.046058s

The results show that the performance impact of this feature is minimal: the differences are well within the reported error margins.

@evulse
Author

evulse commented Jul 12, 2016

@pushrax @tobi @dylanahsmith Where to from here? We would love to be able to start using this


while (cursor < last) {
    if (*cursor++ != '{')
        continue;

    char c = *cursor++;
    char w = *cursor++;
Contributor

This can read past the end of the string. E.g. tokenizing "{{" results in a token with "{{\u0000". The while (cursor < last) check only allows us to read two characters before having to do another bounds check to see if we can read another byte.
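
The point about bounds can be sketched as follows, assuming the tokenizer works on a half-open range `[cursor, last)`. The `opener_t` struct and `scan_opener` function are hypothetical and only illustrate checking the remaining length before each read; they are not the code that was merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of inspecting a possible tag opener. */
typedef struct {
    bool is_tag;      /* true for "{{" or "{%" */
    bool strip;       /* true when the opener is followed by '-' */
    size_t consumed;  /* bytes belonging to the opener: 0, 2 or 3 */
} opener_t;

/* Inspect the bytes in [cursor, last) without reading at or past last. */
static opener_t scan_opener(const char *cursor, const char *last)
{
    opener_t r = { false, false, 0 };
    assert(cursor <= last);

    /* Two bytes are needed before the opener can be recognised at all. */
    if (last - cursor < 2 || cursor[0] != '{')
        return r;
    if (cursor[1] != '{' && cursor[1] != '%')
        return r;

    r.is_tag = true;
    r.consumed = 2;

    /* The third byte needs its own bounds check. */
    if (last - cursor >= 3 && cursor[2] == '-') {
        r.strip = true;
        r.consumed = 3;
    }
    return r;
}
```

For the input "{{" the third byte is never touched, so no trailing NUL ends up in the token.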

@dylanahsmith
Contributor

There were some corner cases that you missed, as noted above, but overall this is looking very good now. Nice work.

@evulse
Author

evulse commented Jul 13, 2016

@dylanahsmith is this what you were thinking?


while (cursor < last) {
    if (*cursor++ != '{')
        continue;

    char c = *cursor++;
    if (cursor <= last && *cursor == '-') {
        cursor++;
        token->rstrip = 1;
Contributor

If the following if (c != '%' && c != '{') branch is taken, then this change won't be reversed.
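
A sketch of one way to avoid the stale flag, assuming a simplified tokenizer that only looks for the first tag opener in a buffer: the flag is written only after the opener has been confirmed, so the "not a tag" branch has nothing to undo. The `raw_token_t` struct and `scan_raw` function are hypothetical and much smaller than the real tokenizer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal token: the raw text before the first tag opener. */
typedef struct {
    size_t raw_length; /* bytes of raw text before the opener */
    bool rstrip;       /* strip trailing whitespace from the raw text */
} raw_token_t;

static raw_token_t scan_raw(const char *str, size_t length)
{
    raw_token_t tok = { length, false };
    const char *cursor = str;
    const char *last = str + length;

    assert(str != NULL);
    while (cursor < last) {
        if (*cursor++ != '{')
            continue;
        if (cursor == last)
            break;                  /* lone '{' at the end: all raw text */

        char c = *cursor;
        if (c != '%' && c != '{')
            continue;               /* not a tag: no flag was set, nothing to undo */
        cursor++;

        /* Only now, with the opener confirmed, look for '-'. */
        tok.rstrip = (cursor < last && *cursor == '-');
        tok.raw_length = (size_t)(cursor - 2 - str);
        break;
    }
    return tok;
}
```

With this ordering, text such as "{x- {{ a }}" leaves `rstrip` unset, because "{x-" is never treated as a tag opener.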

@evulse
Author

evulse commented Jul 14, 2016

I think just shifting this removes the issue. Previously, the flow would have allowed {.- where . is any character. After the shift, I think only {%- and {{- are allowed as the tag, so the only flows are:

  • These 3 characters are at the end of a string and become a raw token
  • These 3 characters are at the start and become a token or variable and run their own loops
  • This is the last token and as such there will be no other tag following that will trigger the lstrip

@evulse
Author

evulse commented Jul 18, 2016

Is there anything else needed for this to progress?

@dylanahsmith
Contributor

It looks like https://github.com/Shopify/liquid-c/pull/30/files#r70563502 still hasn't been addressed. That was my last concern.

…bles set these based on their behaviour to raw tags
@evulse
Author

evulse commented Jul 20, 2016

OK, I've added a reset after this line so it will only be set if it is found.

@evulse
Author

evulse commented Aug 1, 2016

OK, this and the Ruby version look like they are ready to go. What do we need to do to get this moving?


5 participants