Adding UAST Annotations

Intro

Once you have written the code-to-AST parser, the next step is to write the annotation code. This is the Go code that establishes the rules for transforming the original AST into the UAST (a process we call normalizing). Most of the files related to the Go part of the driver are auto-generated by the bblfsh-sdk init tool, but to translate the native AST to the UAST you need to complete the files referenced in the (auto-generated) driver/main.go file.

Transformations

To translate a native AST to a UAST, Babelfish defines a transformation DSL in Go. It is recommended to read the documentation for this DSL before proceeding with annotations.

The SDK automatically generates the driver/normalizer/annotation.go file with two global variables: Native and Code. Each of them represents a single transformation pass.

Native defines all transformation stages that need to be applied to transform a native AST to an annotated UAST. Each stage is defined as a []Transform (a list of individual transformations) that is applied sequentially to each node to transform it to UAST.

Code defines the last transformation stage that has access to both UAST and the original source code. This stage is used primarily to fix positional information of nodes. See positioner for more information.

`Native` transformation pass

As mentioned above, the Native pass must be defined to transform the native AST to UAST and annotate it.

It usually consists of several stages, for example:

1) Trim metadata from the driver response (ResponseMetadata).

2) Rewrite internal AST node type field to a standard name.

3) Rewrite positional information using standard field names (both node fields and position fields).

4) Annotate native AST nodes with roles.

5) Normalize or fix native AST nodes.

Stage 1 is optional and is necessary only for drivers that emit additional metadata as part of the parse response.

Stages 2 and 3 are either implemented as part of the native driver (which then emits standard field names directly) or performed by the transformer.ObjectToNode helper.

Stages 4 and 5 are always unique to each programming language and do all the transformations from native AST to UAST. Since these two stages are highly related, they are usually fused for each individual node type.

Each stage is isolated in the sense that it's guaranteed that all transformations have been executed for each node when the stage ends. Thus, sometimes it makes sense to move a set of transformations into a separate stage, so they can observe all the changes done by previous stages.

However, for performance reasons, stages 2-5 are usually combined into a single transformation stage, since each stage does a full scan of the AST.
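The trade-off between isolated and fused stages can be illustrated with a small, self-contained model. This is only a sketch and does not use the SDK: `Node`, `Transform`, `Stage`, `RunStages` and the two example transforms are hypothetical stand-ins. It counts how many node visits are needed when two transforms run as separate stages compared to one fused stage.

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for an AST node.
type Node struct {
	Fields   map[string]string
	Roles    []string
	Children []*Node
}

// Transform rewrites a single node in place.
type Transform func(n *Node)

// Stage is a list of transforms applied to every node during one full scan.
type Stage []Transform

// walk calls fn on n and all of its descendants and returns the number of
// nodes visited.
func walk(n *Node, fn func(*Node)) int {
	if n == nil {
		return 0
	}
	fn(n)
	visited := 1
	for _, c := range n.Children {
		visited += walk(c, fn)
	}
	return visited
}

// RunStages runs the stages in order. A stage finishes its scan of the whole
// tree before the next one starts. It returns the total number of node visits.
func RunStages(root *Node, stages []Stage) int {
	visits := 0
	for _, stage := range stages {
		visits += walk(root, func(n *Node) {
			for _, t := range stage {
				t(n)
			}
		})
	}
	return visits
}

// renameType moves the native "internalType" field to the standard "@type".
func renameType(n *Node) {
	if v, ok := n.Fields["internalType"]; ok {
		delete(n.Fields, "internalType")
		n.Fields["@type"] = v
	}
}

// annotateFile adds a "File" role to nodes whose "@type" is "File".
func annotateFile(n *Node) {
	if n.Fields["@type"] == "File" {
		n.Roles = append(n.Roles, "File")
	}
}

// newTree builds a three-node tree: a File with two Ident children.
func newTree() *Node {
	leaf := func() *Node {
		return &Node{Fields: map[string]string{"internalType": "Ident"}}
	}
	return &Node{
		Fields:   map[string]string{"internalType": "File"},
		Children: []*Node{leaf(), leaf()},
	}
}

func main() {
	separate := RunStages(newTree(), []Stage{{renameType}, {annotateFile}})
	fused := RunStages(newTree(), []Stage{{renameType, annotateFile}})
	fmt.Println("node visits with separate stages:", separate)
	fmt.Println("node visits with one fused stage:", fused)
}
```

On this three-node tree the separate stages need six node visits and the fused stage needs three, with the same resulting tree. Fusing works here because the rename runs before the annotation on each node; a transform that must observe changes made to other nodes would still need its own stage.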

Stage 1: Metadata

Drivers may emit additional metadata as part of the parse response. To trim it from the AST, the ResponseMetadata transform can be used:

ResponseMetadata{
    TopLevelIsRootNode: false,
}

Stage 2 and 3: Internal type and positions

For languages whose AST does not contain an internal type field or positional information in its node fields, it is recommended to emit standard field names for these properties directly from the native driver layer. For more information see KeyType, KeyStart, KeyEnd, KeyPos* in the uast package.

For languages that already have a position or an internal type as AST node fields, transformer.ObjectToNode can be used to tell the annotations layer which fields to use.

ObjectToNode{
    InternalTypeKey: "internalType",
    OffsetKey: "startPos",
    EndOffsetKey: "endPos",
}.Mapping()

Stage 4 and 5: Annotations and normalization

Both annotation and normalization are handled with the transformation DSL, specifically the helpers defined in ast.go.

You can read a short introduction to these helpers below.

Annotate a specific node type

The most basic annotation rule is to add a fixed set of roles to every node of a specific type. This can be achieved with the AnnotateType helper:

AnnotateType("File", nil, role.File)

This example annotates all nodes with @type = File with role.File. Note that all helpers assume that the type field has already been renamed from the native type field to the standard field name (@type).

Annotate node and its fields

In most cases annotating node types is not enough to represent AST structures like control flow, loops, functions, classes, etc. These structures usually refer to other nodes via specific fields. For example, an if statement in most languages has a "condition" node, a "body" node and an optional "else" node, and each of them should have distinct roles.

To handle this scenario, the AnnotateType helper can optionally annotate specific node fields if they exist:

AnnotateType("IfStmt",
    ObjRoles{
        "Cond": {role.If, role.Condition},
        "Body": {role.Then, role.Body},
        "Else": {role.Else},
    },
    role.If, role.Statement,
),

This example annotates all nodes of type IfStmt with the If,Statement roles. If the "Cond" field exists it receives the additional If,Condition roles, "Body" receives the Then,Body roles and "Else" receives the Else role. Pretty straightforward.

Note that nodes referenced by "Cond" may already include roles like Identifier,Expression, for example. All role helpers take care of this under the hood by saving the roles before editing the node and appending them to the roles mentioned in the annotation code when writing the node.
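The behavior described above can be modeled in a few lines of plain Go. This is a sketch only, not the SDK implementation: `Node`, `addRoles` and `annotateOptional` are hypothetical names, roles are plain strings, and the exact ordering of new versus saved roles (new roles first, no deduplication) is an assumption based on the wording above.

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for a UAST node.
type Node struct {
	Type   string
	Roles  []string
	Fields map[string]*Node
}

// addRoles puts the new roles first and keeps the roles the node already had
// after them. The ordering and the lack of deduplication are assumptions.
func addRoles(n *Node, roles ...string) {
	merged := make([]string, 0, len(roles)+len(n.Roles))
	merged = append(merged, roles...)
	merged = append(merged, n.Roles...)
	n.Roles = merged
}

// annotateOptional annotates a node of the given type and any of the listed
// fields that are present. Missing fields are silently skipped.
// It reports whether the node matched the type.
func annotateOptional(n *Node, typ string, fieldRoles map[string][]string, roles ...string) bool {
	if n == nil || n.Type != typ {
		return false
	}
	for name, fr := range fieldRoles {
		if child := n.Fields[name]; child != nil {
			addRoles(child, fr...)
		}
	}
	addRoles(n, roles...)
	return true
}

func main() {
	// The condition already carries roles from an earlier annotation.
	cond := &Node{Type: "Ident", Roles: []string{"Identifier", "Expression"}}
	ifStmt := &Node{Type: "IfStmt", Fields: map[string]*Node{
		"Cond": cond,
		"Body": {Type: "BlockStmt"},
		// No "Else" field: it is skipped without an error.
	}}

	annotateOptional(ifStmt, "IfStmt", map[string][]string{
		"Cond": {"If", "Condition"},
		"Body": {"Then", "Body"},
		"Else": {"Else"},
	}, "If", "Statement")

	fmt.Println("IfStmt:", ifStmt.Roles)
	fmt.Println("Cond:  ", cond.Roles)
	fmt.Println("Body:  ", ifStmt.Fields["Body"].Roles)
}
```

In this model the condition keeps its earlier Identifier and Expression roles next to the new If and Condition roles, and the missing "Else" field is ignored.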

The problem with this approach is that it doesn't strictly check that all fields exist, so it allows invalid if statements to be processed. Since we know that every if should have at least a "Cond" and a "Body", we should mark these two fields as required, while only "Else" should be optional.

To achieve this, a FieldRoles object can be passed instead. It allows specifying more options for handling specific node fields and requires that all fields not marked with Opt: true exist:

AnnotateType("IfStmt",
    FieldRoles{
        "Cond": {Roles: role.Roles{role.If, role.Condition}},
        "Body": {Roles: role.Roles{role.Then, role.Body}},
        "Else": {Opt: true, Roles: role.Roles{role.Else}},
    },
    role.If, role.Statement,
),

Note that in this example an Opt: true option is set for the "Else" field, marking it as optional. All other fields are required.
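The difference between required and optional fields can be sketched without the SDK. In the model below, `Node`, `FieldSpec`, `prepend` and `annotateStrict` are hypothetical names, and reporting a missing required field as an error is an assumption made for illustration; the text above only states that such fields must exist.

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for a UAST node.
type Node struct {
	Type   string
	Roles  []string
	Fields map[string]*Node
}

// FieldSpec is a reduced, hypothetical version of the per-field options.
type FieldSpec struct {
	Opt   bool
	Roles []string
}

// prepend returns added followed by existing in a fresh slice.
func prepend(added, existing []string) []string {
	out := make([]string, 0, len(added)+len(existing))
	out = append(out, added...)
	return append(out, existing...)
}

// annotateStrict annotates a node of the given type. Every field without
// Opt: true must be present; otherwise nothing is changed and an error is
// returned. Nodes of other types are left alone.
func annotateStrict(n *Node, typ string, fields map[string]FieldSpec, roles ...string) error {
	if n == nil || n.Type != typ {
		return nil
	}
	// Validate first so that a failing rule never leaves a half-annotated node.
	for name, spec := range fields {
		if n.Fields[name] == nil && !spec.Opt {
			return fmt.Errorf("%s: required field %q is missing", typ, name)
		}
	}
	for name, spec := range fields {
		if child := n.Fields[name]; child != nil {
			child.Roles = prepend(spec.Roles, child.Roles)
		}
	}
	n.Roles = prepend(roles, n.Roles)
	return nil
}

func main() {
	rule := map[string]FieldSpec{
		"Cond": {Roles: []string{"If", "Condition"}},
		"Body": {Roles: []string{"Then", "Body"}},
		"Else": {Opt: true, Roles: []string{"Else"}},
	}
	valid := &Node{Type: "IfStmt", Fields: map[string]*Node{
		"Cond": {Type: "Ident"},
		"Body": {Type: "BlockStmt"},
	}}
	broken := &Node{Type: "IfStmt", Fields: map[string]*Node{
		"Cond": {Type: "Ident"}, // "Body" is missing
	}}
	fmt.Println("valid: ", annotateStrict(valid, "IfStmt", rule, "If", "Statement"))
	fmt.Println("broken:", annotateStrict(broken, "IfStmt", rule, "If", "Statement"))
}
```

The if statement without an "Else" is accepted, while the one without a "Body" is rejected before any role is written.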

Annotate an array field

The DSL makes a strict distinction between node kinds. Thus, annotations made for fields that refer to a node will not work for fields that refer to an array of nodes.

The reason behind this is that all DSL transformations are reversible, and each transformation can reconstruct the tree branch it used to check. Because of this, a transformation needs to know whether it should create a node or an array of nodes.

Regardless of the reason, handling array fields requires using FieldRoles with an Arr: true flag:

AnnotateType("CallExpr",
    FieldRoles{
        "Fun":  {Roles: role.Roles{role.Callee}},
        "Args": {Arr: true, Roles: role.Roles{role.Argument, role.Positional}},
    },
    role.Call, role.Expression,
),

Note that the "Args" field is marked as an array of nodes. Arrays are always optional and are not compatible with the Opt: true flag.
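The node versus array distinction can be modeled as follows. This is a sketch under stated assumptions, not the SDK code: `Node`, `FieldSpec` and `annotateField` are hypothetical, kind mismatches are reported as errors for illustration, and roles on an array field are applied to each element of the array.

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for a UAST node.
// Each entry in Fields holds either a *Node or a []*Node.
type Node struct {
	Type   string
	Roles  []string
	Fields map[string]interface{}
}

// FieldSpec is a reduced, hypothetical version of the per-field options.
type FieldSpec struct {
	Arr   bool
	Opt   bool
	Roles []string
}

// annotateField applies spec to one field of n. The declared kind (node or
// array) must match the kind of the stored value.
func annotateField(n *Node, name string, spec FieldSpec) error {
	if spec.Arr && spec.Opt {
		return fmt.Errorf("field %q: Arr and Opt cannot be combined", name)
	}
	val, ok := n.Fields[name]
	if !ok {
		if spec.Arr || spec.Opt {
			return nil // arrays are always optional
		}
		return fmt.Errorf("field %q: required field is missing", name)
	}
	switch v := val.(type) {
	case *Node:
		if spec.Arr {
			return fmt.Errorf("field %q: expected an array, found a node", name)
		}
		v.Roles = append(v.Roles, spec.Roles...)
	case []*Node:
		if !spec.Arr {
			return fmt.Errorf("field %q: expected a node, found an array", name)
		}
		for _, el := range v {
			el.Roles = append(el.Roles, spec.Roles...)
		}
	default:
		return fmt.Errorf("field %q: unsupported value kind %T", name, val)
	}
	return nil
}

func main() {
	call := &Node{Type: "CallExpr", Fields: map[string]interface{}{
		"Fun":  &Node{Type: "Ident"},
		"Args": []*Node{{Type: "BasicLit"}, {Type: "Ident"}},
	}}
	fmt.Println(annotateField(call, "Fun", FieldSpec{Roles: []string{"Callee"}}))
	fmt.Println(annotateField(call, "Args", FieldSpec{Arr: true, Roles: []string{"Argument", "Positional"}}))
	// Describing the array field as a single node is rejected.
	fmt.Println(annotateField(call, "Args", FieldSpec{Roles: []string{"Argument"}}))
}
```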

Simple normalization: Token field

Node types like identifiers, primitives and literals have a specific field that stores the value of the node: a name in the case of an identifier, a literal value for primitives and the comment text for comments.

The AST normalization stage requires that the token field of all such nodes be renamed to a standard name.

To achieve this, FieldRoles entries provide a Rename option. It sets the new name of the field in the destination node, while the old field name is given as the map key:

AnnotateType("Ident",
    FieldRoles{
        "Name": {Rename: uast.KeyToken},
    },
    role.Identifier, role.Expression,
),

In this example, the Ident node receives the Identifier,Expression roles and also has its "Name" field renamed to uast.KeyToken (@token).

Some nodes may not have a token set by the parser; for example, an if statement might be missing an "if" token. In this case the Add option can be used: it instructs the transformer to create a new field using an Op expression. Thus we can create a synthetic token field by setting Add: true and passing a String("if") operation as the Op:

AnnotateType("IfStmt",
    FieldRoles{
        uast.KeyToken: {Add: true, Op: String("if")},

        "Cond": {Roles: role.Roles{role.If, role.Condition}},
        "Body": {Roles: role.Roles{role.Then, role.Body}},
        "Else": {Opt: true, Roles: role.Roles{role.Else}},
    },
    role.If, role.Statement,
),

Note that this time KeyToken is used as the field name (the map key) instead of as a Rename target, because the transformer creates this field from scratch.
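The effect of Rename and Add on a node's fields can be pictured with plain maps. This is only an illustrative sketch: `renameField` and `addField` are hypothetical helpers, not SDK functions, and refusing to overwrite an existing field is an assumption. The only name taken from the text above is the standard token key, @token.

```go
package main

import "fmt"

// keyToken mirrors the standard token field name mentioned in the text.
const keyToken = "@token"

// renameField moves the value stored under the key from to the key to.
// The source field must exist and the target must not.
func renameField(fields map[string]string, from, to string) error {
	v, ok := fields[from]
	if !ok {
		return fmt.Errorf("field %q not found", from)
	}
	if _, exists := fields[to]; exists {
		return fmt.Errorf("field %q already exists", to)
	}
	delete(fields, from)
	fields[to] = v
	return nil
}

// addField creates a field that the parser did not emit.
func addField(fields map[string]string, key, value string) error {
	if _, exists := fields[key]; exists {
		return fmt.Errorf("field %q already exists", key)
	}
	fields[key] = value
	return nil
}

func main() {
	// Rename: the identifier's name becomes its token.
	ident := map[string]string{"@type": "Ident", "Name": "x"}
	if err := renameField(ident, "Name", keyToken); err != nil {
		fmt.Println("rename failed:", err)
	}

	// Add: the if statement has no token field, so one is synthesized.
	ifStmt := map[string]string{"@type": "IfStmt"}
	if err := addField(ifStmt, keyToken, "if"); err != nil {
		fmt.Println("add failed:", err)
	}

	fmt.Println(ident)
	fmt.Println(ifStmt)
}
```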

Simple annotations: Field constraints

As shown in the previous example, annotation code can specify a custom DSL operation that generates a new field value with Add. The Op option can also be used without the Add flag, in which case it is used like any other DSL operation in both the source and destination shapes.

This operation can be used to specify additional field constraints. Note that since the transformer will use the same operation in both source and destination shapes, it's not possible to change the structure of a field this way.

For example, we can check that a field is set to a specific value by setting the Op field to a constant value operation:

AnnotateType("GenDecl",
    FieldRoles{
        "Tok": {Op: String("VAR")},
    },
    role.Variable, role.Declaration,
),
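With this rule, only GenDecl nodes whose "Tok" field equals "VAR" are annotated as variable declarations. The following self-contained sketch models that matching behavior; `Node` and `annotateIfEqual` are hypothetical and the real DSL works on shapes rather than on a single equality check.

```go
package main

import "fmt"

// Node is a simplified, hypothetical stand-in for a UAST node.
type Node struct {
	Type   string
	Roles  []string
	Fields map[string]string
}

// annotateIfEqual adds roles to n only when n has the given type and the
// named field holds exactly the wanted value. The field itself is left
// untouched, since a constraint cannot change the structure of a field.
func annotateIfEqual(n *Node, typ, field, want string, roles ...string) bool {
	if n == nil || n.Type != typ {
		return false
	}
	if got, ok := n.Fields[field]; !ok || got != want {
		return false
	}
	n.Roles = append(n.Roles, roles...)
	return true
}

func main() {
	varDecl := &Node{Type: "GenDecl", Fields: map[string]string{"Tok": "VAR"}}
	constDecl := &Node{Type: "GenDecl", Fields: map[string]string{"Tok": "CONST"}}

	for _, n := range []*Node{varDecl, constDecl} {
		matched := annotateIfEqual(n, "GenDecl", "Tok", "VAR", "Variable", "Declaration")
		fmt.Println(n.Fields["Tok"], "matched:", matched, "roles:", n.Roles)
	}
}
```

A declaration with a different "Tok" value, such as the "CONST" one in this sketch, would need its own rule with its own constant.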

When in doubt...

You can also ask any question in the project's public Babelfish Slack channel, which is very friendly to newcomers to the project.

Finally, if you really think that there isn't a correspondence in the UAST roles for the native role that you want to map, you can open an issue on the SDK project or fork the Babelfish SDK project on GitHub, add the new role to the file uast/role/role.go and make a PR. Don't expect the role to be added immediately; we're somewhat picky about freely adding roles to the UAST and, depending on the stage of the project, we strive to add the more generalizable roles first before adding exotic or very language-specific ones. If your role falls into the second category, the PR will be tagged as "need-research", which means that it will be re-evaluated when a similar role is needed for other languages (and thus we can see how to generalize it to cover more ground) or when there is a new version of the UAST.
