[STAL-1960] Implement ddsa NamedCapture #391
Merged
What problem are you trying to solve?
When a tree-sitter query is run, it generates "match"es, which are a collection of "captures" (which are nodes tagged by a string name). Because some rules may want to implement a majority of their logic within JavaScript (as opposed to via the tree-sitter query itself), we need to be able to handle a large number of matches and captures performantly.
What is your solution?
Unlike all of our other JavaScript abstractions (which are classes with constructors), a "NamedCapture" is a plain object with only primitive values (no functions, getters, etc.). This means that we can (ergonomically) create these objects performantly from a v8 ObjectTemplate.
For this reason, we don't define the class in JavaScript. Rather, we just use capture.js to write JSDoc type annotations.
(Note: While it's true that we could also use an ObjectTemplate for our JavaScript classes, it would make the code much less maintainable, because it is much easier to change the JavaScript representation in JavaScript instead of via the v8 API--especially when it comes to functions and non-trivial objects).
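As a rough illustration of what JSDoc-only typing for a plain object can look like, here is a minimal sketch. It is not the contents of capture.js: the `name` property and the `isMultiCapture` helper are assumptions made for this example, while `nodeId` and `nodeIds` follow the two shapes described in this PR.

```javascript
/**
 * A capture whose name tags exactly one node.
 * @typedef {Object} SingleCapture
 * @property {string} name   - Capture name (property name assumed for this sketch).
 * @property {number} nodeId - Id of the captured node.
 */

/**
 * A capture whose name tags several nodes.
 * @typedef {Object} MultiCapture
 * @property {string} name         - Capture name (property name assumed for this sketch).
 * @property {Uint32Array} nodeIds - Ids of the captured nodes.
 */

/** @typedef {SingleCapture | MultiCapture} NamedCapture */

/**
 * Hypothetical helper: tells the two shapes apart by checking for `nodeIds`.
 * @param {NamedCapture} capture
 * @returns {capture is MultiCapture}
 */
function isMultiCapture(capture) {
  return "nodeIds" in capture;
}
```

Because the objects carry no methods, nothing here needs to exist at runtime except the helper; the typedefs only inform editors and type checkers.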
Single vs Multi Capture
The most common capture will be a `SingleCapture`. For a query that captures `@varName` and `@numVal` once each, `@varName` would be a `SingleCapture`, and `@numVal` would also be a `SingleCapture`.
However, a query can also reuse a capture name. If `@numVal` tags more than one node, `@varName` would still be a `SingleCapture`, but `@numVal` would be a `MultiCapture` because the capture name is re-used across multiple query capture nodes.
`SingleCapture` contains a single v8 smi as the `nodeId`. `MultiCapture` contains multiple v8 smis as the `nodeIds`.
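The distinction can be sketched in plain JavaScript. This is only an illustration of the resulting object shapes, not how ddsa builds them (the PR creates them through the v8 API). The input format (a list of `[name, nodeId]` pairs standing in for one match), the `name` property, and the function name `toNamedCaptures` are all assumptions for this example.

```javascript
/**
 * Hypothetical sketch: groups one match's (capture name, node id) pairs into
 * SingleCapture / MultiCapture shaped objects.
 * @param {Array<[string, number]>} pairs
 */
function toNamedCaptures(pairs) {
  // First pass: count how many nodes each capture name tags.
  const counts = new Map();
  for (const [name] of pairs) {
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }

  // Second pass: build the objects, preallocating each Uint32Array exactly once.
  const captures = new Map();
  const cursors = new Map();
  for (const [name, nodeId] of pairs) {
    if (counts.get(name) === 1) {
      captures.set(name, { name, nodeId }); // name used once: SingleCapture
      continue;
    }
    let capture = captures.get(name);
    if (capture === undefined) {
      // name re-used across nodes: MultiCapture
      capture = { name, nodeIds: new Uint32Array(counts.get(name)) };
      captures.set(name, capture);
      cursors.set(name, 0);
    }
    const i = cursors.get(name);
    capture.nodeIds[i] = nodeId;
    cursors.set(name, i + 1);
  }
  return [...captures.values()];
}
```

For example, the pairs `[["varName", 3], ["numVal", 7], ["numVal", 9]]` yield one object with `nodeId: 3` and one object whose `nodeIds` holds 7 and 9.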
Why a Uint32Array?
When there is a `MultiCapture`, we represent this as a `Uint32Array`, not a standard array (`[]`). The reason for this is performance. While both would end up storing `uint32`s, there is no way to pre-allocate a JavaScript array without having it be treated as "holey", which triggers v8 de-optimizations. For example, if we used the v8 equivalent of `const captureIds = new Array(2)` and then assigned every index, it would still be a holey array because it is sparse upon initialization.
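To make the trade-off concrete, the sketch below stores the same ids three ways in plain JavaScript: a preallocated generic array, a generic array grown with `push`, and a `Uint32Array`. The node ids are made up, and the code only demonstrates the observable behavior (holes, lengths, values); the internal elements kind that v8 assigns is not visible from ordinary JavaScript.

```javascript
const nodeIds = [11, 42]; // made-up node ids for illustration

// 1. Preallocated generic array: it starts out sparse, so index 0 is a hole.
const preallocated = new Array(nodeIds.length);
const startedSparse = !(0 in preallocated);
for (let i = 0; i < nodeIds.length; i++) {
  preallocated[i] = nodeIds[i]; // filling the holes does not undo the "holey" treatment
}

// 2. Empty generic array grown with push: no holes, but the backing store
//    may be reallocated and copied as it grows.
const grown = [];
for (const id of nodeIds) {
  grown.push(id);
}

// 3. Uint32Array: a single allocation of the exact size, zero-filled (no holes),
//    with the same indexing and iteration interface.
const typed = new Uint32Array(nodeIds.length);
for (let i = 0; i < nodeIds.length; i++) {
  typed[i] = nodeIds[i];
}
```

All three end up holding the same values; the difference lies in how v8 allocates and classifies the backing store.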
We could use an empty generic array, and then just do the equivalent of `[].push(1, 2, 3, /* ... */)`, but then we risk triggering additional allocations and memcpy if the array needs to be resized.
Instead, by using a `Uint32Array`, we can preallocate exactly the amount of space needed, using just one allocation, and have the exact same indexing/iterable interface that a generic array provides.
Why a string name, not an integer id?
We use a string name for captures.
Elsewhere, ddsa uses integer ids for performance. So why not here? Because we are creating these v8 objects entirely from the v8 API, we can guarantee the use of a kInternalized string, which does not allocate. This simplifies our implementation, because even though we know all of a rule's capture names ahead of time, we don't have to deal with parsing them and creating a map from id to string.
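From the rule author's side, string names mean a capture can be looked up directly by the name written in the query. The sketch below is a hypothetical helper, not part of ddsa; it assumes captures are plain objects with a `name` property, as in the shapes described above.

```javascript
/**
 * Hypothetical helper: finds a capture by the name used in the query.
 * With integer ids, a separate id-to-name table would be needed first.
 * @param {Array<{name: string}>} captures
 * @param {string} name
 */
function findCapture(captures, name) {
  for (const capture of captures) {
    if (capture.name === name) {
      return capture;
    }
  }
  return undefined;
}

// Made-up captures for illustration.
const exampleCaptures = [
  { name: "varName", nodeId: 3 },
  { name: "numVal", nodeIds: new Uint32Array([7, 9]) },
];
const numVal = findCapture(exampleCaptures, "numVal");
```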
Alternatives considered
What the reviewer should know