Implement Liquid::C::Expression to optimize expression evaluation #60

dylanahsmith · 2020-09-24T13:27:14Z

~~Depends on #59 and Shopify/liquid#1300~~

Problem

In #59, expressions were compiled to Ruby constants or Ruby lookup objects (Liquid::VariableLookup & Liquid::RangeLookup) that are evaluated with context_evaluate. This was done in that PR for simplicity, but they should really be compiled to more granular VM instructions, both for rendering performance and for serializability (for future caching of VM-compiled templates).

Solution

I replaced the previous code for strict parsing of expressions and variable lookups so that it instead emits VM code, which is now compiled directly into Liquid::C::BlockBody objects when compiling variables. Expressions are also often parsed and evaluated independently of Liquid::Variable in tags, so I introduced a Liquid::C::Expression object that gets returned from Liquid::Expression.parse if the expression includes a variable lookup and strict parsing of it succeeds. I ended up doing both of these in the same commit, since I also leverage the Liquid::C::Expression object for filter keyword argument parsing to handle automatic cleanup on a parse exception.
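For readers unfamiliar with the approach, here is a minimal pure-Ruby sketch of compiling a dotted variable lookup into granular instructions and evaluating them on an operand stack. It is a toy model only: the opcode names, `compile_lookup`, and `evaluate` are hypothetical and are not liquid-c's API, and the rule that a hash key takes precedence over a command is an assumption of the sketch.

```ruby
# Toy model only: opcode names and helpers are illustrative, not liquid-c's API.
COMMANDS = %w[size first last].freeze

# Compile a dotted lookup such as "product.tags.size" into a flat instruction list.
def compile_lookup(source)
  name, *keys = source.split(".")
  code = [[:find_static_var, name]]
  keys.each do |key|
    op = COMMANDS.include?(key) ? :lookup_command : :lookup_const_key
    code << [op, key]
  end
  code
end

# Evaluate the instruction list against a hash of variables using an operand stack.
def evaluate(code, variables)
  stack = []
  code.each do |op, arg|
    case op
    when :find_static_var
      stack.push(variables[arg])
    when :lookup_const_key
      object = stack.pop
      stack.push(object.is_a?(Hash) ? object[arg] : nil)
    when :lookup_command
      object = stack.pop
      # Assumption of this sketch: a hash containing the key wins over the command.
      if object.is_a?(Hash) && object.key?(arg)
        stack.push(object[arg])
      elsif object.respond_to?(arg)
        stack.push(object.public_send(arg))
      else
        stack.push(nil)
      end
    else
      raise ArgumentError, "unknown instruction: #{op.inspect}"
    end
  end
  stack.pop
end

code = compile_lookup("product.tags.size")
p code
p evaluate(code, { "product" => { "tags" => %w[new sale] } })
```

Because the compiled form is a flat list of small instructions and plain values rather than a tree of lookup objects, it is also the kind of structure that lends itself to serialization.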

Note that a couple of direct uses of Liquid::Variable.new (in assign and echo tags) no longer leverage liquid-c for parsing, which slightly impacts parse speed. I'll embed those tags in the liquid VM code in a follow-up PR rather than introducing a Liquid::C::Variable class. This PR significantly speeds up rendering, as shown in the benchmarks, which more than makes up for this parsing regression.

Since Liquid::Variable.new is no longer monkey patched and Liquid::Expression.parse no longer returns a comparable object, I had to update the liquid-c variable unit tests so they are tested through normal template parsing and rendering like an integration test. I also stopped testing against the liquid gem's variable unit tests for the same reasons.

To help with detecting and debugging VM stack underflows or overflows, I added some assertions that are enabled by default for CI and gem development but are disabled by default for gem installation.
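As an illustration of the idea, the following Ruby sketch models a stack whose overflow and underflow checks can be switched off. The class name and the `assertions:` flag are hypothetical; in the C extension such checks would typically be compiled out rather than toggled at runtime.

```ruby
# Hypothetical sketch: the class and flag names are illustrative, not liquid-c's.
class CheckedStack
  class StackError < StandardError; end

  def initialize(capacity, assertions: true)
    @values = []
    @capacity = capacity
    @assertions = assertions
  end

  def push(value)
    check(@values.size < @capacity, "stack overflow")
    @values.push(value)
  end

  def pop
    check(!@values.empty?, "stack underflow")
    @values.pop
  end

  def size
    @values.size
  end

  private

  # Raises only when assertions are enabled; otherwise the violation goes unnoticed.
  def check(condition, message)
    raise StackError, message if @assertions && !condition
  end
end

stack = CheckedStack.new(2)
stack.push(1)
stack.push(2)
begin
  stack.push(3)
rescue CheckedStack::StackError => e
  puts "caught: #{e.message}"
end
```

With `assertions: false`, the same misuse silently succeeds, which is why having the checks on in CI and development is useful for catching code generation bugs early.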

Benchmarks

Before this PR (on the #59 branch)

              parse:    167.388  (± 2.4%) i/s -      1.680k in  10.043356s
             render:    173.196  (± 3.5%) i/s -      1.734k in  10.022222s
     parse & render:     80.409  (± 1.2%) i/s -    808.000  in  10.050380s

After this PR

              parse:    162.660  (± 2.5%) i/s -      1.632k in  10.040800s
             render:    231.388  (± 4.3%) i/s -      2.323k in  10.059088s
     parse & render:     86.735  (± 1.2%) i/s -    872.000  in  10.056157s
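To put the two runs side by side, this small Ruby snippet computes the relative change in iterations per second for each benchmark, using only the figures listed above. The helper name `percent_change` is illustrative.

```ruby
# Iterations per second copied from the benchmark output above.
before = { "parse" => 167.388, "render" => 173.196, "parse & render" => 80.409 }
after  = { "parse" => 162.660, "render" => 231.388, "parse & render" => 86.735 }

# Relative change from old to new, in percent, rounded to one decimal place.
def percent_change(old, new)
  ((new - old) / old * 100).round(1)
end

before.each do |name, old|
  puts format("%-15s %+.1f%%", name, percent_change(old, after[name]))
end
```

The parse change is close in size to the reported error margins, so it should be read as a small regression at most.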

macournoyer

Looking good! Just some questions.

test/unit/expression_test.rb

macournoyer · 2020-09-25T18:23:04Z

ext/liquid_c/vm.c

+                break;
+            case OP_RENDER_VARIABLE_RESCUE:
+                args->node_line_number = (unsigned int)*const_ptr++;
+                args->ip = ip;


Are ip and const_ptr saved because the next instruction is assumed to be calling a Ruby function?

This PR just moves this case to keep the rendering instruction handling together, since they shouldn't be used in the Liquid::C::Expression code.

This was added in the last PR (#59) and described as follows:

Another design choice I made for this PR is to do error handling for the whole block body render, rather than for each variable render. This way we can reduce the state saving cost of rb_rescue from liquid code that doesn't encounter variable render errors. In order to recover from the exception, we just restore the stack size to what it was at the start of the block body render and iterate the instructions to jump just past the variable write.

In order to iterate the instructions to the end of the variable render on an exception, we need to save the instruction position at the start of variable rendering.
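A rough Ruby model of that recovery strategy is sketched below. Everything in it is hypothetical (the opcode names, `LookupError`, and raising on an undefined variable are inventions for the example), and Ruby's `begin`/`rescue` does not have the setup cost that motivates the design in C; the sketch only shows the control flow: one rescue around the whole block, the saved instruction position, the stack restore, and the skip to just past the variable write.

```ruby
# Toy model: opcode names and LookupError are illustrative, not liquid-c's.
class LookupError < StandardError; end

def render_block(code, variables)
  output = +""
  stack = []
  base_stack_size = stack.size # stack size at the start of the block body render
  ip = 0
  variable_start = nil # instruction position saved when a variable render begins

  begin
    while ip < code.size
      op, arg = code[ip]
      ip += 1
      case op
      when :write_raw
        output << arg
      when :render_variable_rescue
        variable_start = ip
      when :find_variable
        raise LookupError, "undefined variable #{arg}" unless variables.key?(arg)
        stack.push(variables[arg])
      when :pop_write
        output << stack.pop.to_s
      else
        raise ArgumentError, "unknown instruction: #{op.inspect}"
      end
    end
  rescue LookupError => e
    # Drop any partially built operands, then skip to just past the variable write.
    stack.pop(stack.size - base_stack_size)
    ip = variable_start
    ip += 1 until code[ip - 1][0] == :pop_write
    output << "[#{e.message}]"
    retry # resume the loop at the instruction after the failed variable render
  end
  output
end

code = [
  [:write_raw, "Hi "],
  [:render_variable_rescue, 1],
  [:find_variable, "name"],
  [:pop_write],
  [:write_raw, "!"],
]
puts render_block(code, { "name" => "Liquid" })
puts render_block(code, {})
```

The sketch assumes well-formed code in which every variable render ends with a write instruction; without that, the skip loop would run off the end of the instruction list.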

I'll add some comments to the code to make this clearer.

I'll also move code around in the last PR so that the diff is easier to read in this PR.

ext/liquid_c/expression.c

peterzhu2118 · 2020-10-08T14:37:44Z

ext/liquid_c/vm_assembler.c

@@ -25,12 +26,23 @@ void vm_assembler_gc_mark(vm_assembler_t *code)
        switch (*ip++) {


Can we mark the ops that need to be marked and call liquid_vm_next_instruction in vm.c to move to the next instruction?

I want to explicitly handle all cases here to make sure we actually mark all Ruby constants. Otherwise, it would be easy to accidentally miss a new instruction that has a Ruby constant argument, where failing to GC mark that constant could end up being a subtle and hard-to-debug bug.
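The argument for an exhaustive switch can be modelled in a few lines of Ruby. The opcode names and the number of constant slots each one consumes are assumptions made for the sketch, not liquid-c's actual layout; the point is that an opcode with no explicit rule fails loudly instead of being skipped silently.

```ruby
# Toy model: opcodes and their constant-slot counts are assumptions of this sketch.
def ruby_constants_to_mark(instructions, constants)
  marked = []
  const_index = 0
  instructions.each do |op|
    case op
    when :leave, :pop_write, :push_nil
      # No constant argument, nothing to mark.
    when :write_raw
      # Two non-Ruby constants (assumed: pointer and length); skipped, not marked.
      const_index += 2
    when :push_const, :lookup_const_key, :lookup_command
      # One Ruby object constant that the GC must be told about.
      marked << constants[const_index]
      const_index += 1
    else
      # A new opcode without a rule is an error rather than a silent omission.
      raise ArgumentError, "no GC mark rule for #{op.inspect}"
    end
  end
  marked
end

constants = ["<raw text pointer>", 5, "title", "size"]
p ruby_constants_to_mark([:write_raw, :lookup_const_key, :lookup_command, :leave], constants)
```

In C, the equivalent safety net would come from handling every enum value in the switch so that a newly added opcode cannot be forgotten.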

Perhaps we could use an assertion to ensure that liquid_vm_next_instruction moves the instruction and constant pointers consistently with this function. However, that brings the annoyance of #ifndef NDEBUG-specific state.

Instead, what I want to do in the near future is get rid of all non-ruby constants from the constants buffer. That will simplify GC marking, serialization and deserialization. However, that would add copying overhead for OP_WRITE_RAW, which makes sense for serialization & deserialization but is unnecessary overhead without serialization & deserialization.

get rid of all non-ruby constants from the constants buffer

Where do you plan to put them? It looks like probably interleaved into the code buffer given the implementation of PUSH_INT*, which sounds good to me overall. It does make it more difficult to include medium size non-ruby constants inline. I mentioned in another comment that perhaps we could skip the conversion through RString in some cases, but I also wouldn't want to have strings in the code buffer. (in the OP_LOOKUP_COMMAND we can just use an enum, but I can imagine potentially wanting to skip RString for some drop methods and filter names).

Yeah, I wanted to add them to the instructions buffer if they can be used in their serialized form.

I was planning on using an immediate string argument for OP_WRITE_RAW, since it seems rare that we would re-use the string and we won't want the deserialization overhead.

For filter, variable and key names, it is actually quite likely that we will re-use the object, so I think a table lookup would make sense.

Longer term, I'm more interested in what is most efficient, where we can re-use code to make things easier.
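The two storage strategies discussed in this thread can be contrasted with a small Ruby sketch: raw text stored inline as an immediate argument, and names interned in a shared table that instructions reference by index. The class and method names are hypothetical and describe a plan under discussion, not code that exists in this PR.

```ruby
# Hypothetical sketch of the two strategies; not liquid-c's assembler.
class SketchAssembler
  attr_reader :instructions, :names

  def initialize
    @instructions = []
    @names = []        # shared table of distinct names
    @name_indexes = {} # name => position in @names
  end

  # Raw text is stored inline (an immediate argument); it is rarely repeated.
  def add_write_raw(text)
    @instructions << [:write_raw, text]
  end

  # Names are interned in a table, so repeated uses share a single entry.
  def add_lookup_key(name)
    index = @name_indexes.fetch(name) do
      @names << name
      @name_indexes[name] = @names.size - 1
    end
    @instructions << [:lookup_key, index]
  end
end

assembler = SketchAssembler.new
assembler.add_write_raw("Price: ")
assembler.add_lookup_key("product")
assembler.add_lookup_key("price")
assembler.add_lookup_key("product")
p assembler.names
p assembler.instructions
```

Repeated names cost one table entry plus a small index per use, while the raw text needs no table lookup or separate object when the instructions are loaded.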

pushrax

Very cool stuff!

ext/liquid_c/expression.c

pushrax · 2020-10-07T23:14:40Z

ext/liquid_c/parser.c

+            if (rstring_eq(key, "size") || rstring_eq(key, "first") || rstring_eq(key, "last"))
+                vm_assembler_add_lookup_command(code, key);
+            else
+                vm_assembler_add_lookup_const_key(code, key);


Feels like we could eventually avoid RString here entirely, but it's pretty minor.

pushrax · 2020-10-07T23:16:10Z

ext/liquid_c/parser.c

+                        result = empty_string;
+                    break;
+            }
+            break;


Very ugly and fast approach 💯

ext/liquid_c/parser.c

pushrax · 2020-10-08T06:01:56Z

ext/liquid_c/parser.c

-        rb_enc_raise(utf8_encoding, cLiquidSyntaxError, "[:%s] is not a valid expression", symbol_names[p.cur.type]);
-
-    return expr;
+    vm_assembler_add_push_nil(code);


Are there cases that could hit this?

The above rb_enc_raise calls are NO_RETURN, so I guess this is just dead code. I'll remove this to avoid confusion.

pushrax · 2020-10-09T03:02:58Z

ext/liquid_c/variable_lookup.c

-            }
+    if (is_command) {
+        Check_Type(key, T_STRING);
+        ID intern_key = rb_intern(RSTRING_PTR(key));


Yeah it seems like we could avoid the conversion through RString here as mentioned in another comment. Definitely not a blocker, tiny optimization.

We could optimize commands for this case, but we would still need to pass a string as the key to rb_funcall(object, id_aref, 1, key) if the object is a hash.

pushrax · 2020-10-09T03:03:56Z

ext/liquid_c/variable_lookup.c

+        Check_Type(key, T_STRING);
+        ID intern_key = rb_intern(RSTRING_PTR(key));
+        if (rb_respond_to(object, intern_key)) {
+            VALUE next_object = rb_funcall(object, rb_intern(RSTRING_PTR(key)), 0);


Duplicate rb_intern here (was in the original code, probably my fault)

pushrax · 2020-10-09T03:06:43Z

ext/liquid_c/vm_assembler.c

@@ -25,12 +26,23 @@ void vm_assembler_gc_mark(vm_assembler_t *code)
        switch (*ip++) {


get rid of all non-ruby constants from the constants buffer

Where do you plan to put them? It looks like probably interleaved into the code buffer given the implementation of PUSH_INT*, which sounds good to me overall. It does make it more difficult to include medium size non-ruby constants inline. I mentioned in another comment that perhaps we could skip the conversion through RString in some cases, but I also wouldn't want to have strings in the code buffer. (in the OP_LOOKUP_COMMAND we can just use an enum, but I can imagine potentially wanting to skip RString for some drop methods and filter names).

pushrax · 2020-10-09T03:24:15Z

test/unit/expression_test.rb

+
+  def test_find_dynamic_variable
+    context = Liquid::Context.new({"x" => "y", "y" => 42})
+    expr = Liquid::C::Expression.strict_parse('[x]')


I thought [x] on its own was supposed to be a syntax error, and that it required an identifier in front. It looks like this will make {{ [x] }} possible, which is currently disallowed

Liquid::Template.parse("{{ [x] }}", "x" => "y", "y" => 42, error_mode: :strict).render ... # => Liquid::SyntaxError (Liquid syntax error: [:open_square, "["] is not a valid expression in "{{ [x] }}")

I'm not necessarily opposed to making it allowed, but we should get consensus from @Shopify/guardians-of-the-liquid and modify this in Ruby too.

Oh, I was looking at what was possible with Liquid::Expression.parse("[x]"). I didn't realize we considered a dynamic variable lookup to be a strict parse error. I'll change it back to being a strict parse error, since I didn't intend to change it here.

I’ll also note that if you try to parse and render it in lax mode, it returns an empty string.

~/src/liquid(master)$ bundle exec irb -Ilib -rliquid
irb> Liquid::Template.parse('{{ [x] }}').render({ 'x' => 'y', 'y' => 42 })
=> "42"
irb> context = Liquid::Context.new({ 'x' => 'y', 'y' => 42 })
irb> expression = Liquid::Expression.parse('[x]')
irb> context.evaluate(expression)
=> 42

Oh, it was returning an empty string for you since you were passing the variable values as an argument to parse, when they are an argument to render.

Ah, obviously 🤦

Well, maybe we should make this work on the strict parser then. Because it works in prod with the lax fallback.

Actually, it looks like this has been the strict parse behaviour for variables in liquid-c for a long time now, I think since variable parsing was added to liquid-c (see how TOKEN_OPEN_SQUARE is allowed at the start of parse_variable in the PR that introduced the function: https://github.com/Shopify/liquid-c/pull/13/files#diff-307861feddcb5da5f0af73741d9fdea2R93).

So I'm hesitant to change it. Especially since it could be a breaking change for strictly parsed code. I'm also unsure if it was really the intention of the strict parser to remove any features, so we could consider whether this should be considered a strict parser bug in the liquid gem.

I agree, in the end it seems we should add support for this to the Ruby strict parser.

dylanahsmith · 2020-10-14T19:14:44Z

ext/liquid_c/variable.c

-                vm_assembler_add_push_const(code, keyword_args);
+
+            if (const_keyword_args != Qnil) {
+                rb_hash_freeze(const_keyword_args);


I think we are going to have to always freeze or always dup the hash. Otherwise, it is too easy to test a filter using dynamic keyword arguments and then get errors in production when a frozen hash is passed in for constant keyword arguments.

I've opened #88 to remove the constant keyword arguments optimization on master. We can always add it back later.
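The failure mode described above is easy to reproduce in plain Ruby. The filter below is hypothetical; it simply mutates its keyword-argument hash, which works with a hash built at render time but raises when handed a frozen, precompiled one.

```ruby
# Hypothetical filter that mutates its keyword-argument hash.
def truncate_filter(input, options)
  length = options.delete("length") || 10
  input[0, length]
end

dynamic_args = { "length" => 3 }         # built fresh at render time
constant_args = { "length" => 3 }.freeze # shared, precompiled constant

puts truncate_filter("liquid", dynamic_args)

begin
  truncate_filter("liquid", constant_args)
rescue FrozenError => e
  puts "frozen hash: #{e.class}"
end

# Always passing a mutable copy makes both cases behave the same way.
puts truncate_filter("liquid", constant_args.dup)
```

Either always freezing or always passing a mutable hash gives filter authors one behaviour to test against; mixing the two is what makes the bug surface only in production.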

dylanahsmith requested review from macournoyer and pushrax September 24, 2020 13:27

dylanahsmith mentioned this pull request Sep 24, 2020

Only use MethodLiteral in condition expressions Shopify/liquid#1300

Merged

dylanahsmith force-pushed the vm-expression branch from df377cf to afae259 on September 25, 2020 15:27

macournoyer approved these changes Sep 25, 2020

dylanahsmith force-pushed the vm-expression branch from d7d9ee0 to 900c227 on September 25, 2020 19:49

dylanahsmith mentioned this pull request Sep 25, 2020

Avoid an expression match on a dynamic variable #61

Merged

dylanahsmith force-pushed the vm-variable branch from ea00f8b to 5a5b34a on September 25, 2020 22:27

dylanahsmith force-pushed the vm-expression branch from 900c227 to 23b8e12 on September 25, 2020 22:35

dylanahsmith force-pushed the vm-variable branch from 5a5b34a to 9d8a3d7 on September 28, 2020 20:49

dylanahsmith force-pushed the vm-expression branch from fe51bc8 to 5dc1ca1 on September 28, 2020 20:50

dylanahsmith mentioned this pull request Sep 30, 2020

Compile variable nodes into the Liquid::C::BlockBody VM code #59

Merged

dylanahsmith force-pushed the vm-variable branch from 9d8a3d7 to 239b210 on September 30, 2020 19:27

dylanahsmith force-pushed the vm-expression branch from 5dc1ca1 to 5cee268 on September 30, 2020 19:38

dylanahsmith force-pushed the vm-variable branch from 239b210 to eeae3a1 on September 30, 2020 19:57

dylanahsmith force-pushed the vm-expression branch from 5cee268 to 56096af on September 30, 2020 20:26

dylanahsmith force-pushed the vm-variable branch from eeae3a1 to a2c628d on October 5, 2020 15:29

dylanahsmith mentioned this pull request Oct 5, 2020

Extract a vm_assembler_t struct from block_body_t #69

Merged

dylanahsmith force-pushed the vm-variable branch from a2c628d to 25be6e4 on October 6, 2020 18:45

dylanahsmith mentioned this pull request Oct 6, 2020

Set Context#initialize instance variables before squashing assigns Shopify/liquid#1307

Merged

dylanahsmith force-pushed the vm-expression branch from 56096af to 1fef4af on October 6, 2020 21:15

dylanahsmith force-pushed the vm-variable branch 3 times, most recently from 818b8a0 to 6f561d6 on October 7, 2020 20:45

Base automatically changed from vm-variable to master on October 7, 2020 21:28

dylanahsmith force-pushed the vm-expression branch from 1fef4af to 2855e4a on October 7, 2020 21:34

dylanahsmith requested a review from peterzhu2118 October 7, 2020 21:34

peterzhu2118 reviewed Oct 8, 2020

peterzhu2118 approved these changes Oct 8, 2020

dylanahsmith force-pushed the vm-expression branch from 2e17c30 to 14e0e9a on October 8, 2020 16:38

dylanahsmith mentioned this pull request Oct 8, 2020

Add support for serializing liquid templates along with their VM code #77

Open

pushrax reviewed Oct 9, 2020

pushrax approved these changes Oct 9, 2020

dylanahsmith commented Oct 14, 2020

dylanahsmith mentioned this pull request Oct 14, 2020

Always pass a mutable keyword arguments hash to filters #88

Merged

dylanahsmith force-pushed the vm-expression branch from 6e7d0d2 to 6525790 on October 15, 2020 10:57

dylanahsmith added 2 commits October 15, 2020 11:57

Implement Liquid::C::Expression to optimize expression evaluation

6603033

Add debug assertions for VM stack operations

e5c3c4a

dylanahsmith force-pushed the vm-expression branch from 6525790 to e5c3c4a on October 15, 2020 15:59

dylanahsmith merged commit 7f0db41 into master Oct 15, 2020

dylanahsmith deleted the vm-expression branch October 15, 2020 16:01

This was referenced Oct 19, 2020

Fix strict parsing of find variable with a name expression Shopify/liquid#1317

Merged

Start compiling tags into liquid VM code #96

Open

Fix lookup on variable with literal name #98

Merged
