
Internals: jq Path Expressions

Nico Williams edited this page Jul 11, 2023 · 1 revision

In jq, a "path expression" is an expression that can be passed to the special builtin function path/1 to match paths in the input value. (The path/1 function emits a JSON representation of each matching path.) Specifically, a path expression consists entirely of: sub-expressions such as .a, .[0], .[], and so on, which traverse an object or an array, or many objects/arrays (e.g., .a[0].b[].c.d); if/then/elif/else expressions in which all sub-expressions other than the conditional are themselves path expressions; and/or calls to functions whose bodies are path expressions.

path/1 is byte-coded like this in src/builtin.c:

    struct bytecoded_builtin builtin_def_1arg[] = {
      {"path", BLOCK(gen_op_simple(PATH_BEGIN),
                     gen_call("arg", gen_noop()),
                     gen_op_simple(PATH_END))},
    };

which means that it evaluates its argument expression bracketed by PATH_BEGIN and PATH_END instructions.

The PATH_BEGIN instruction causes the jq VM to push the current path-matching state on the stack and begin a new path-matching state where each array or object traversal in . will be recorded in jq->path.

The PATH_END instruction causes the jq VM to produce the jq->path value as the new value at the top of the stack and restore the previous path-matching state.
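The save/restore behaviour of PATH_BEGIN and PATH_END can be illustrated with a toy model. This is only a sketch: struct toy_vm, struct path_state, and the toy_* functions are hypothetical names invented for this example and are not jq's actual data structures or API. It assumes the saved states live on a small fixed-size stack and that path components are plain strings.

```c
#include <assert.h>
#include <string.h>

#define MAX_PATH_LEN 16
#define MAX_SAVED 8

/* Hypothetical stand-in for the VM's path-matching state. */
struct path_state {
  int tracking;                          /* nonzero while inside path(...) */
  int len;                               /* number of recorded components */
  const char *components[MAX_PATH_LEN];  /* recorded keys/indices as strings */
};

/* Hypothetical toy VM: the current state plus a stack of saved states. */
struct toy_vm {
  struct path_state cur;
  struct path_state saved[MAX_SAVED];
  int nsaved;
};

void toy_vm_init(struct toy_vm *vm) {
  memset(vm, 0, sizeof(*vm));
}

/* Models PATH_BEGIN: save the current state and start a fresh, empty path. */
void toy_path_begin(struct toy_vm *vm) {
  assert(vm->nsaved < MAX_SAVED);
  vm->saved[vm->nsaved++] = vm->cur;
  memset(&vm->cur, 0, sizeof(vm->cur));
  vm->cur.tracking = 1;
}

/* Models a traversal such as .a or .[0]: recorded only while tracking. */
void toy_traverse(struct toy_vm *vm, const char *key) {
  if (vm->cur.tracking) {
    assert(vm->cur.len < MAX_PATH_LEN);
    vm->cur.components[vm->cur.len++] = key;
  }
}

/* Models PATH_END: yield the recorded path and restore the saved state. */
struct path_state toy_path_end(struct toy_vm *vm) {
  assert(vm->nsaved > 0);
  struct path_state result = vm->cur;
  vm->cur = vm->saved[--vm->nsaved];
  return result;
}
```

In this model, calling toy_path_begin, then toy_traverse for "a" and "b", then toy_path_end yields a two-component path and leaves the VM in whatever state it had before toy_path_begin, which mirrors how path/1 can be used inside a larger program without disturbing the enclosing state.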

There are also SUBEXP_BEGIN and SUBEXP_END instructions, which push an "exit" from the path-matching state. They bracket sub-expressions that are not themselves part of the path expression, such as the condition of a conditional (if conditional_expression_here then ...) or the index in an index expression (e.g., .[index_expression_here]), among others. Such sub-expressions contribute neither to the building of the matching path nor to the detection of invalid path expressions. Hence the two instructions' names: they delimit sub-expressions of path expressions.
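One way to picture the effect of SUBEXP_BEGIN and SUBEXP_END is a nesting counter that suspends recording while it is nonzero. The sketch below is a toy model under that assumption; struct toy_tracker and the toy_* functions are hypothetical names made up for illustration, and the counter is an assumption of this example rather than something this page states about the VM. The demo function mimics evaluating something shaped like path(.a[.i]), where .i is the index sub-expression and "0" is a placeholder for the value it produces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PATH 16

/* Hypothetical tracker: records components only outside sub-expressions. */
struct toy_tracker {
  int subexp_depth;                      /* > 0 while inside a sub-expression */
  size_t len;                            /* number of recorded components */
  const char *components[TOY_MAX_PATH];
};

void toy_tracker_init(struct toy_tracker *t) {
  memset(t, 0, sizeof(*t));
}

/* Models SUBEXP_BEGIN: leave path-matching until the matching end. */
void toy_subexp_begin(struct toy_tracker *t) {
  t->subexp_depth++;
}

/* Models SUBEXP_END: return to the enclosing level. */
void toy_subexp_end(struct toy_tracker *t) {
  assert(t->subexp_depth > 0);
  t->subexp_depth--;
}

/* Models a traversal: ignored while inside any sub-expression. */
void toy_record(struct toy_tracker *t, const char *key) {
  if (t->subexp_depth == 0) {
    assert(t->len < TOY_MAX_PATH);
    t->components[t->len++] = key;
  }
}

/* Mimics path(.a[.i]): .a is recorded, the traversal .i inside the index
 * sub-expression is not, and the resulting index (placeholder "0") is. */
size_t toy_demo_index(struct toy_tracker *t) {
  toy_record(t, "a");
  toy_subexp_begin(t);
  toy_record(t, "i");   /* inside the sub-expression: not recorded */
  toy_subexp_end(t);
  toy_record(t, "0");   /* the index actually traversed */
  return t->len;
}
```

The counter (rather than a boolean flag) lets sub-expressions nest in this model: recording resumes only once every toy_subexp_begin has been matched by a toy_subexp_end.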
