Skip to content

Internals: backtracking

Nico Williams edited this page Aug 1, 2023 · 8 revisions

Pervasive Bactracking

In jq every expression, in every context, can be a generator, and backtracking is an essential part of how generators work.

Not every expression generates any number of values other than 1. For example, null, false, true, numeric literals, and string literals that do no interpolation all produce just one value -- not zero, not two, not more, just one. But expressions involving ,, .[], range, and/or recursion can generate multiple values.

empty, of course, generates no values -- it backtracks!

How | works

The pipe operator, |, simply concatenates the AST of the left-hand side with the right-hand side's:

Exp '|' Exp {
  $$ = block_join($1, $3);
} |

How can it be that simple?! It's not magic. It has to do with how the jq bytecode VM works.

Backtracking in the jq bytecode interpreter

The VM executes instructions in order, from the first to the last, just as one would expect. At any point in the execution the value at the top of the jq stack is the value that corresponds to . at that point in the program's execution. Each instruction may manipulate the stack in some way, possibly consume the . value, and if it consumes it, push a new one.

The BACKTRACK opcode reverse program flow, backtracking the execution of the bytecode to the nearest stack save point.

Generators create save points when they still have values to yield, and on backtrack they resume the VM's forward execution order, but with a new value (the next output of the generator).

Generators can also initiate backtracking when they run out of outputs. They can also produce a final output without creating a save point, in which case on backtrack that generator will not be resumed as it's no longer there.

The , operator corresponds to the FORK opcode. FORK instructions have an associated jump target: the address of the first instruction of the right-hand side of the ,. When a FORK is executed it creates a save point. When backtracking FORK simply jumps to the right-hand side's first instruction without creating a save point.

Resumption of the whole program

When a function or program outputs a value it executes a RET instruction, and that creates a save point. For the top-level program the RET instruction causes the C function jq_next() to return, though the state of the VM and its stack remain as-is, saved in the jq_state structure. When the application calls jq_next() again the VM initiates backtracking, which causes the nearest save point's generator to be resumed -- if there are no save points, or if none of them do anything other than backtrack, then jq_next() will output a jv_invalid() to indicate that the top-level program has stopped producing outputs.

When jq_next() returns jv_invalid() to indicate that the program is exhausted, the application can reset the program state and execute it again with a new input.

Thus the jq processor simply does that: it loops over inputs running the jq program for each input, and for each input it loops over jq_next().

Back to |

So then joining two blocks of jq program then simply causes the output of the one on the left to be at the top of the stack when the one on the right begins executing.

| is not where the interesting magic for generators and backtracking is! The magic is as described above, in: a) stack save points, b) FORK, EACH, and RANGE, c) in BACKTRACK and RET and any instruction that can initiate backtracking, d) in builtin C-coded functions that can return a jv_invalid() (backtrack) or jv_invalid_with_msg() (error).

Clone this wiki locally