Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[STAL-1960] Implement ddsa NamedCapture #391

Merged
merged 1 commit into from
Jun 7, 2024
Merged

[STAL-1960] Implement ddsa NamedCapture #391

merged 1 commit into from
Jun 7, 2024

Conversation

jasonforal
Copy link
Collaborator

@jasonforal jasonforal commented Jun 6, 2024

What problem are you trying to solve?

When a tree-sitter query is run, it generates "match"es, which are a collection of "captures" (which are nodes tagged by a string name). Because some rules may want to implement a majority of their logic within JavaScript (as opposed to via the tree-sitter query itself), we need to be able to handle a large number of matches and captures performantly.

What is your solution?

Unlike all of our other JavaScript abstractions (which are classes with constructors), a "NamedCapture" is a plain object with only primitive values (no functions, getters, etc.). This means that we can (ergonomically) create these objects performantly from a v8 ObjectTemplate.

For this reason, we don't define the class in JavaScript. Rather, we just use capture.js to write JSDoc type annotations.

(Note: While it's true that we could also use an ObjectTemplate for our JavaScript classes, it would make the code much less maintainable, because it is much easier to change the JavaScript representation in JavaScript instead of via the v8 API--especially when it comes to functions and non-trivial objects).

Single vs Multi Capture

The most common capture will be a SingleCapture. This is generated by a query like

(variable_declarator
    name: (identifier) @varName
    value: (number) @numVal
)

In this case, @varName would be a SingleCapture, and @numVal would also be a SingleCapture.

However, some queries can be written like this:

(variable_declarator
    name: (identifier) @varName
    value: (array
         (number) @numVal
         (number) @numVal
         (number) @numVal
    )

@varName would still be a SingleCapture. However, @numVal would be a MultiCapture because the capture name is re-used across multiple query capture nodes.

SingleCapture contains a single v8 smi as the nodeId.
MultiCapture contains multiple v8 smis as the nodeIds.

Why a Uint32Array?

When there is a MultiCapture, we represent this as a Uint32Array, not a standard array ([]). The reason for this is performance. While both would end up storing uint32s, there is no way to pre-allocate a JavaScript array without having it be treated as "holey", which triggers v8 de-optimizations. For example, if we used the v8 equivalent of const captureIds = new Array(2):

d8> %DebugPrint(new Array(2))
DebugPrint: 0x1ab00047f7d: [JSArray]
 - map: 0x01ab0028ccdd <Map[16](HOLEY_SMI_ELEMENTS)> [FastProperties]

...it would still be a holey array because it is sparse upon initialization.

We could use an empty generic array, and then just do the equivalent of [].push(1, 2, 3, /* ... */), but then we risk triggering additional allocations and memcpy if the array needs to be resized.

Instead, by using a Uint32Array, we can preallocate exactly the amount of space needed, using just one allocation, and have the exact same indexing/iterable interface that a generic array provides.

Why a string name, not an integer id?

We use a string name for captures.

Elsewhere, ddsa uses integers for performing. So why not here? Because we are creating these v8 objects entirely from the v8 API, we can guarantee the use of a kInternalized string, which does not allocate. This simplifies our implementation, because even though we know all of a rule's capture names ahead of time, we don't have to deal with parsing them and creating a map from id to string.

Alternatives considered

What the reviewer should know

  • This is purely the (isolated) JavaScript implementation of a "capture"
    • This doesn't implement the "match", as described in the description intro. That will be in a future PR.
    • The translation between the tree-sitter query Rust bindings to this JavaScript object will be implemented in a future PR that switches the use of stella over to ddsa.

@jasonforal jasonforal requested a review from juli1 June 6, 2024 12:56
Base automatically changed from jf/STAL-1960 to main June 7, 2024 15:35
@jasonforal jasonforal merged commit 40c96f6 into main Jun 7, 2024
58 of 62 checks passed
@jasonforal jasonforal deleted the jf/STAL-1960-2 branch June 7, 2024 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants