Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Liquid::C::Expression to optimize expression evaluation #60

Merged
merged 2 commits into from
Oct 15, 2020

Conversation

dylanahsmith
Copy link
Contributor

@dylanahsmith dylanahsmith commented Sep 24, 2020

Depends on #59 and Shopify/liquid#1300

Problem

In #59, expressions were compiled to ruby constants or lookup ruby objects (Liquid::VariableLookup & Liquid::RangeLookup) that are evaluated with context_evaluate. This was done in that PR for simplicity, but they should really be compiled to more granular VM instructions for rendering performance and for serializability (for future caching of VM compiled templates).

Solution

I replaced the previous code for strict parsing expressions and variable lookups to instead emit VM code, which is now compiled directly into Liquid::C::BlockBody objects when compiling variables. Expressions are also often parsed and evaluated independent of Liquid::Variable in tags, so I introduced a Liquid::C::Expression object that gets returned from Liquid::Expression.parse if the expression includes a variable lookup and strict parsing of it succeeds. I ended up doing both of these in the same commit, since I also leverage the Liquid::C::Expression object for filter keyword argument parsing to handle automatic cleanup on a parse exception.

Note that a couple of direct uses of Liquid::Variable.new (in assign and echo tags) no longer leverage liquid-c for parsing, which slightly impacts parse speed. I'll embed those tags in the liquid VM code in a follow-up PR rather than introducing a Liquid::C::Variable class. This PR significantly speeds up rendering, as shown in the benchmarks, which more than makes up for this parsing regression.

Since Liquid::Variable.new is no longer monkey patched and Liquid::Expression.parse no longer turns a comparable object, I had to update the liquid-c variable unit tests so they are tested through normal template parsing and rendering like an integration test. I also stopped testing against the liquid gem's variable unit tests for the same reasons.

To help with detecting and debugging VM stack underflows or overflows, I added some assertions that are enabled by default for CI and gem development but are disabled by default for gem installation.

Benchmarks

Before this PR (on the #59 branch)

              parse:    167.388  (± 2.4%) i/s -      1.680k in  10.043356s
             render:    173.196  (± 3.5%) i/s -      1.734k in  10.022222s
     parse & render:     80.409  (± 1.2%) i/s -    808.000  in  10.050380s

after

              parse:    162.660  (± 2.5%) i/s -      1.632k in  10.040800s
             render:    231.388  (± 4.3%) i/s -      2.323k in  10.059088s
     parse & render:     86.735  (± 1.2%) i/s -    872.000  in  10.056157s

Copy link
Contributor

@macournoyer macournoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Just some questions.

test/unit/expression_test.rb Show resolved Hide resolved
break;
case OP_RENDER_VARIABLE_RESCUE:
args->node_line_number = (unsigned int)*const_ptr++;
args->ip = ip;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ip and const_ptr are saved because next instruction is assumed to be calling a Ruby function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR just moves this case to keep the rendering instruction handling together, since they shouldn't be used in the Liquid::C::Expression code.

This was added in the last PR (#59) and described as follows

Another design choice I made for this PR is to do error handling for the whole block body render, rather than for each variable render. This way we can reduce the state saving cost of rb_rescue from liquid code that doesn't encounter variable render errors. In order to recover from the exception, we just restore the stack size to what it was at the start of the block body render and iterate the instructions to jump just past the variable write.

In order to iterate the instructions to the end of the variable render on an exception, we need to save the instruction position at the start of variable rendering.

I'll add some comments to the code to make this clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also move code around in the last PR so that the diff is easier to read in this PR.

@dylanahsmith dylanahsmith force-pushed the vm-variable branch 3 times, most recently from 818b8a0 to 6f561d6 Compare October 7, 2020 20:45
Base automatically changed from vm-variable to master October 7, 2020 21:28
ext/liquid_c/expression.c Show resolved Hide resolved
ext/liquid_c/expression.c Outdated Show resolved Hide resolved
@@ -25,12 +26,23 @@ void vm_assembler_gc_mark(vm_assembler_t *code)
switch (*ip++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we mark the ops that need to be marked and call liquid_vm_next_instruction in vm.c to move to the next instruction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to explicitly handle all cases here to make sure we actually mark all ruby constants. Otherwise, it would be easy to accidentally miss a new instruction that has a ruby constant argument, where failing to GC mark that constant could end up being a subtle and hard to debug bug.

Perhaps we could use an assertion to ensure that liquid_vm_next_instruction moves the instruction and constant pointers consistently with this function. However, that brings the annoyance #ifndef NDEBUG specific state.

Instead, what I want to do in the near future is get rid of all non-ruby constants from the constants buffer. That will simplify GC marking, serialization and deserialization. However, that would add copying overhead for OP_WRITE_RAW, which makes sense for serialization & deserialization but is unnecessary overhead without serialization & deserialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get rid of all non-ruby constants from the constants buffer

Where do you plan to put them? It looks like probably interleaved into the code buffer given the implementation of PUSH_INT*, which sounds good to me overall. It does make it more difficult to include medium size non-ruby constants inline. I mentioned in another comment that perhaps we could skip the conversion through RString in some cases, but I also wouldn't want to have strings in the code buffer. (in the OP_LOOKUP_COMMAND we can just use an enum, but I can imagine potentially wanting to skip RString for some drop methods and filter names).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wanted to add them to the instructions buffer if they can be used in their serialized form.

I was planning on using an immediate string argument for OP_WRITE_RAW, since it seems rare that we would re-use the string and we won't want the deserialization overhead.

For filter, variable and key names, it is actually quite likely that we will re-use the object, so I think a table lookup would make sense.

Longer term, I'm more interested in what is most efficient, where we can re-use code to make things easier.

Copy link
Contributor

@pushrax pushrax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool stuff!

ext/liquid_c/expression.c Outdated Show resolved Hide resolved
if (rstring_eq(key, "size") || rstring_eq(key, "first") || rstring_eq(key, "last"))
vm_assembler_add_lookup_command(code, key);
else
vm_assembler_add_lookup_const_key(code, key);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels like we could eventually avoid RString here entirely, but it's pretty minor.

result = empty_string;
break;
}
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very ugly and fast approach 💯

ext/liquid_c/parser.c Outdated Show resolved Hide resolved
rb_enc_raise(utf8_encoding, cLiquidSyntaxError, "[:%s] is not a valid expression", symbol_names[p.cur.type]);

return expr;
vm_assembler_add_push_nil(code);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there cases that could hit this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above rb_enc_raise calls are NO_RETURN, so I guess this is just dead code. I'll remove this to avoid confusion.

}
if (is_command) {
Check_Type(key, T_STRING);
ID intern_key = rb_intern(RSTRING_PTR(key));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it seems like we could avoid the conversion through RString here as mentioned in another comment. Definitely not a blocker, tiny optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could optimize commands for this case, but we would still need to pass a string as the key to rb_funcall(object, id_aref, 1, key) if the object is a hash.

Check_Type(key, T_STRING);
ID intern_key = rb_intern(RSTRING_PTR(key));
if (rb_respond_to(object, intern_key)) {
VALUE next_object = rb_funcall(object, rb_intern(RSTRING_PTR(key)), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate rb_intern here (was in the original code, probably my fault)

@@ -25,12 +26,23 @@ void vm_assembler_gc_mark(vm_assembler_t *code)
switch (*ip++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get rid of all non-ruby constants from the constants buffer

Where do you plan to put them? It looks like probably interleaved into the code buffer given the implementation of PUSH_INT*, which sounds good to me overall. It does make it more difficult to include medium size non-ruby constants inline. I mentioned in another comment that perhaps we could skip the conversion through RString in some cases, but I also wouldn't want to have strings in the code buffer. (in the OP_LOOKUP_COMMAND we can just use an enum, but I can imagine potentially wanting to skip RString for some drop methods and filter names).


def test_find_dynamic_variable
context = Liquid::Context.new({"x" => "y", "y" => 42})
expr = Liquid::C::Expression.strict_parse('[x]')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought [x] on its own was supposed to be a syntax error, and that it required an identifier in front. It looks like this will make {{ [x] }} possible, which is currently disallowed

Liquid::Template.parse("{{ [x] }}", "x" => "y", "y" => 42, error_mode: :strict).render
...
# => Liquid::SyntaxError (Liquid syntax error: [:open_square, "["] is not a valid expression in "{{ [x] }}")

I'm not necessarily opposed to making it allowed, but we should get consensus from @Shopify/guardians-of-the-liquid and modify this in Ruby too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was looking at what was possible with Liquid::Expression.parse("[x]"). I didn't realize we considered a dynamic variable lookup to be a strict parse error. I change it back to being a strict parse error, since I didn't intend to change it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll also note that if you try to parse and render it in lax mode, it returns an empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~/src/liquid(master)$ bundle exec irb -Ilib -rliquid
irb> Liquid::Template.parse('{{ [x] }}').render({ 'x' => 'y', 'y' => 42 })
=> "42"
irb> context = Liquid::Context.new({ 'x' => 'y', 'y' => 42 })
irb> expression = Liquid::Expression.parse('[x]')
irb> context.evaluate(expression)
=> 42

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it was returning an empty string for you since you were passing the variable values as an argument to parse, when they are an argument to render

Copy link
Contributor

@pushrax pushrax Oct 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, obviously 🤦

Well, maybe we should make this work on the strict parser then. Because it works in prod with the lax fallback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it looks like this has been the behaviour strict parse behaviour for variables in liquid-c for a long time now. I think since variable parsing was added to liquid-c (see how TOKEN_OPEN_SQUARE is allowed at the start of parse_variable in the PR that introduced the function https://github.com/Shopify/liquid-c/pull/13/files#diff-307861feddcb5da5f0af73741d9fdea2R93)

So I'm hesitant to change it. Especially since it could be a breaking change for strictly parsed code. I'm also unsure if it was really the intention of the strict parser to remove any features, so we could consider whether this should be considered a strict parser bug in the liquid gem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, in the end it seems we should add support for this to the Ruby strict parser.

vm_assembler_add_push_const(code, keyword_args);

if (const_keyword_args != Qnil) {
rb_hash_freeze(const_keyword_args);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are going to have to always freeze or always dup the hash. Otherwise, it is too easy to test a filter using dynamic keyword arguments and then get errors in production when a frozen hash is passed in for constant keyword arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #88 to remove the constant keyword arguments optimization on master. We can always add it back later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants