Syntax Trees

Why Syntax Trees

Suppose you want to find the print token in a BASIC program. You might naively try pattern matching against print. But this would fail, if the program were, say,

10 print "print is tokenized as 186"

because of the print in the string literal. Next you write a regular expression that excludes quoted text. This works for the above, but not for a program like, say,

10 print "print is tokenized as 186": rem print something about "print"
20 data print,186

Now you have to exclude comments and DATA statements as well. Furthermore, when searching for string literals, do you want to include quoted text in comments? Say you have accounted for all that, and someone comes with a program

10 let print = 3
20 print print

Now your algorithm reports there are 3 print tokens, when the programmer intended two of these as variable names. Using print as a variable name is illegal, but the point is the parser should find the error.

The purpose of a syntax tree is to provide information about the content of an expression (in the broadest sense) in a way that does not require complex pattern matching. In other words, if you have the syntax tree, you don't need to worry about any of the above.

S-expressions

S-expressions are a way of writing a syntax tree in textual form. Given the second example above, this parser yields the S-expression

(source_file
  (line
    (linenum) (statement (tok_print) (str)) (statement (tok_rem) (comment_text)))
  (line
    (linenum) (statement (tok_data) (data_literal) (data_int))))

Containment of a bracket pair within another bracket pair indicates a child-parent relationship. For example, since (line...) occurs inside the brackets associated with (source_file...), line is a child of source_file.

Now consider the task of finding all the print tokens in the code. Rather than pattern matching against the actual text, you search for the node type you want, in this case tok_print. This search is a straightforward string comparison. The perfect parser would allow you to find anything you need in this simple way. Of course no parser is perfect, so you may need to use other information, such as looking at the node's parent type, to find what you want.

Walking the Tree

The specifics of walking a syntax tree depend on the parser API. This parser is built using Tree-sitter, which exposes an API that allows you to move a cursor through the tree. For example, using the WASM binding from TypeScript one might have

// assume we have `parser` object and code in the string `mycode`
const tree = parser.parse(mycode);
const curs = tree.walk();
curs.gotoFirstChild(); // should be a line
curs.gotoFirstChild(); // should be a line number
curs.gotoNextSibling(); // should be a statement
curs.gotoFirstChild(); // could be many things
if (curs.nodeType=='tok_print')
  console.log('first statement is print');

Applications will often make use of a standard function to walk the whole tree using cursor functions. As an example, vscode-language-applesoft does it like this:

	walk(syntaxTree: Parser.Tree,visit: (node: Parser.TreeCursor) => WalkerChoice) {
		this.depth = 0;
		const curs = syntaxTree.walk();
		let choice : WalkerChoice = WalkerOptions.gotoChild;
		do {
			if (choice == WalkerOptions.gotoChild && curs.gotoFirstChild()) {
				this.depth++;
				choice = visit(curs);
			}
			else if (choice == WalkerOptions.gotoParentSibling && curs.gotoParent() && curs.gotoNextSibling()) {
				this.depth--;
				choice = visit(curs);
			}
			else if (choice==WalkerOptions.gotoSibling && curs.gotoNextSibling())
				choice = visit(curs);
			else if (curs.gotoNextSibling())
				choice = visit(curs);
			else if (curs.gotoParent()) {
				this.depth--;
				choice = WalkerOptions.gotoSibling;
			}
			else
				choice = WalkerOptions.exit;
		} while (choice!=WalkerOptions.exit);
	}

Note that the visit function will generally need to be bound to some class of which walk is a member, in order to pass information out of the function. In particular, you will have something like

this.walk(tree,this.visit_node.bind(this));

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Syntax Trees

Syntax Trees

Why Syntax Trees

S-expressions

Walking the Tree

Clone this wiki locally