Simple expression evaluator in golang for learning lexers, parsers, ast's and interpreters.
This project doesn't support unary operators, so expression "-1+2" will report us "Missing operand". For the unary operator support a modified version of Shunting-yard algorithm is needed (See this project)
// quote your expression as it might contain shell operator:
- Tokenize (Lexer)
- Convert infix form to postfix form using Shunting-yard algorithm (Parser)
- Construct Abstract syntax tree - AST (Parser)
- Evaluate AST (Interpreter)
lexer.go : Lex()
Purpose of lexer is to turn input into sequence of tokens (lex items) and determine their type while thrashing whitespaces and reporting unrecognized characters.
Lexer on its own doen't know grammar of parsed language so it does't report syntactic errors or semantic errors.
I heavily recommend watching Youtube - Lexical Scanning in Go - Rob Pike as I use these principles.
Given well-formed expression input: "1+2+3", lexer correctly produces sequence "1", "+", "2", "+", "3".
Given not well-formed expression, such as "1)+3(((" lexer doesn't report any errors, but produces sequence "1", ")", "+", "3", "(", "(", "(". Errors will be reported in parsing phase not in lexing phase.
Gived input with unrecognized characters "*12+死+3" lexer produces error:
Lexing error at 4: Invalid symbol: '死'
Our lexer also determines type and position of lexed items, which will come handy later.
Given expression "1+1*(5-5)" lexer produces:
Type: INUMBER, Val: "1", Pos: 1
Type: IADD, Val: "+", Pos: 2
Type: INUMBER, Val: "1", Pos: 3
Type: IMUL, Val: "*", Pos: 4
Type: ILPAR, Val: "(", Pos: 5
Type: INUMBER, Val: "5", Pos: 6
Type: ISUB, Val: "-", Pos: 7
Type: INUMBER, Val: "5", Pos: 8
Type: IRPAR, Val: ")", Pos: 9
Type: EOF, Val: "", Pos: 9
parser.go : toPostfix()
The classical format of expressions "<operand> <operator> <operand>" is called infix form. This form is somewhat troublesome when constructing AST so first we will convert Infox form to Postfix form using Shunting yard algorithm. Postfix form uses rather weird syntax "<operand> <operand> <operator>". The advantage is that we get rid of the need for parentheses (those are required in infix form).
This algorithm goes trough out lexer tokens and using stack data structure and list converts the form. Stack is used as stacked operator store and list for postfix output.
The algorithm is following:
- If current token is number (INUMBER) then put it to output
- If current token is left parenthesis then Push it to stack
- If current token is right parenthesis then Pop stack until we hit left parenthesis and Pop it out
- If current token is operator +,-,*,/ Pop the stack until we find operator with higher or equal precedence (see below) and Push current operator
- When we go trough all tokens, Pop all items from stack to output
Add and Subtract = 1
Multiply and Divide = 2
Constructing Abstract syntax tree - AST
parser.go : constructAst()
AST for expressions is a simple binary tree. Once we have our expression in postfix form, creating AST is trivial. For construction we will go trough our postfix formed lex items and using stack we build a tree.
In the tree we distinguish types of nodes we will use later in interpretation. Nodes containing operator (+-*/) hold no value, just left ,right operand children. Nodes containing numbers have no children (leaves).
Algorithm for AST from postfix form is following:
Go trough all postfix items
- If current item is number, create node with this value and Push it to stack.
- If its operator, create node for it, then Pop the stack and put it to right child, Pop stack again and put it to left child (order is important). Then push new operator node to stack.
When there are no more postfix items, Pop the stack last time, this is your AST root node.
Interpreting (Evaluating AST)
interpreter.go : Interpret(), postOrderTraversal()
Once we have AST the interpreting is childs play. All we need to do is do a recursive Post order traversal of our AST binary tree and execute corresponding operation.
Post order traversal interpretation algorithm:
- Start at root
- If current node is leaf node return its value
- Recurse to left child and store return value
- Recurse to right child and store return value
- Perform arithmetic operation on stored left and right values
- Return result of operation
Result of traversal is calculated value.