[STAL-1960] Implement ddsa NamedCapture #391

jasonforal · 2024-06-06T12:42:27Z

What problem are you trying to solve?

When a tree-sitter query is run, it generates "match"es, which are a collection of "captures" (which are nodes tagged by a string name). Because some rules may want to implement a majority of their logic within JavaScript (as opposed to via the tree-sitter query itself), we need to be able to handle a large number of matches and captures performantly.

What is your solution?

Unlike all of our other JavaScript abstractions (which are classes with constructors), a "NamedCapture" is a plain object with only primitive values (no functions, getters, etc.). This means that we can (ergonomically) create these objects performantly from a v8 ObjectTemplate.

For this reason, we don't define the class in JavaScript. Rather, we just use capture.js to write JSDoc type annotations.

(Note: While it's true that we could also use an ObjectTemplate for our JavaScript classes, it would make the code much less maintainable, because it is much easier to change the JavaScript representation in JavaScript instead of via the v8 API--especially when it comes to functions and non-trivial objects).

Single vs Multi Capture

The most common capture will be a SingleCapture. This is generated by a query like

(variable_declarator
    name: (identifier) @varName
    value: (number) @numVal
)

In this case, @varName would be a SingleCapture, and @numVal would also be a SingleCapture.

However, some queries can be written like this:

(variable_declarator
    name: (identifier) @varName
    value: (array
         (number) @numVal
         (number) @numVal
         (number) @numVal
    )

@varName would still be a SingleCapture. However, @numVal would be a MultiCapture because the capture name is re-used across multiple query capture nodes.

SingleCapture contains a single v8 smi as the nodeId.
MultiCapture contains multiple v8 smis as the nodeIds.

Why a Uint32Array?

When there is a MultiCapture, we represent this as a Uint32Array, not a standard array ([]). The reason for this is performance. While both would end up storing uint32s, there is no way to pre-allocate a JavaScript array without having it be treated as "holey", which triggers v8 de-optimizations. For example, if we used the v8 equivalent of const captureIds = new Array(2):

d8> %DebugPrint(new Array(2))
DebugPrint: 0x1ab00047f7d: [JSArray]
 - map: 0x01ab0028ccdd <Map[16](HOLEY_SMI_ELEMENTS)> [FastProperties]

...it would still be a holey array because it is sparse upon initialization.

We could use an empty generic array, and then just do the equivalent of [].push(1, 2, 3, /* ... */), but then we risk triggering additional allocations and memcpy if the array needs to be resized.

Instead, by using a Uint32Array, we can preallocate exactly the amount of space needed, using just one allocation, and have the exact same indexing/iterable interface that a generic array provides.

Why a string name, not an integer id?

We use a string name for captures.

Elsewhere, ddsa uses integers for performing. So why not here? Because we are creating these v8 objects entirely from the v8 API, we can guarantee the use of a kInternalized string, which does not allocate. This simplifies our implementation, because even though we know all of a rule's capture names ahead of time, we don't have to deal with parsing them and creating a map from id to string.

Alternatives considered

What the reviewer should know

This is purely the (isolated) JavaScript implementation of a "capture"
- This doesn't implement the "match", as described in the description intro. That will be in a future PR.
- The translation between the tree-sitter query Rust bindings to this JavaScript object will be implemented in a future PR that switches the use of stella over to ddsa.

jasonforal requested a review from juli1 June 6, 2024 12:56

juli1 approved these changes Jun 7, 2024

View reviewed changes

jasonforal force-pushed the jf/STAL-1960 branch from 01ecc91 to 30af728 Compare June 7, 2024 15:19

Base automatically changed from jf/STAL-1960 to main June 7, 2024 15:35

Add ddsa NamedCapture

5bc7b5a

jasonforal force-pushed the jf/STAL-1960-2 branch from 94ec7ee to 5bc7b5a Compare June 7, 2024 15:39

jasonforal merged commit 40c96f6 into main Jun 7, 2024
58 of 62 checks passed

jasonforal deleted the jf/STAL-1960-2 branch June 7, 2024 15:39

jasonforal mentioned this pull request Jun 7, 2024

[STAL-1960] Implement ddsa QueryMatch + bridge #393

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[STAL-1960] Implement ddsa NamedCapture #391

[STAL-1960] Implement ddsa NamedCapture #391

jasonforal commented Jun 6, 2024 •

edited

Loading

[STAL-1960] Implement ddsa NamedCapture #391

[STAL-1960] Implement ddsa NamedCapture #391

Conversation

jasonforal commented Jun 6, 2024 • edited Loading

What problem are you trying to solve?

What is your solution?

Single vs Multi Capture

Why a Uint32Array?

Why a string name, not an integer id?

Alternatives considered

What the reviewer should know

jasonforal commented Jun 6, 2024 •

edited

Loading