# Arithmetic Parser

Creating a simple arithmetic parser in Python involves three steps:

- **Tokenization**: Break the input string into a list of tokens. For arithmetic expressions, the tokens are numbers, parentheses, and the operators `+`, `-`, `*`, `/`.
- **Parsing**: Convert the list of tokens into an abstract syntax tree (AST). In the case of arithmetic expressions, this will involve handling operator precedence and associativity.
- **Evaluation**: Walk through the AST and calculate the result of the expression.

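```python
import re

# Step 1: Tokenization
def tokenize(expression):
    # Match integers, the four operators, and parentheses.
    # Any other character (spaces, letters, ...) is silently skipped.
    return re.findall(r"\d+|[-+*/()]", expression)

# Step 2: Parsing (recursive descent)
# Grammar, from lowest to highest precedence:
#   expression := term (("+" | "-") term)*
#   term       := factor (("*" | "/") factor)*
#   factor     := NUMBER | "(" expression ")"
def parse(tokens):
    def expression(tokens):
        result = term(tokens)
        # The loop folds left to right, so 2-3-4 is (2-3)-4.
        while tokens and tokens[0] in ("+", "-"):
            op = tokens.pop(0)
            if op == "+":
                result += term(tokens)
            else:
                result -= term(tokens)
        return result

    def term(tokens):
        result = factor(tokens)
        while tokens and tokens[0] in ("*", "/"):
            op = tokens.pop(0)
            if op == "*":
                result *= factor(tokens)
            else:
                # Divide directly: multiplying by 1/x can add rounding error
                # (e.g. 49 * (1/49) is 0.9999999999999999, not 1.0).
                result /= factor(tokens)
        return result

    def factor(tokens):
        if tokens[0] == "(":
            tokens.pop(0)  # Remove "("
            result = expression(tokens)
            tokens.pop(0)  # Remove ")"
        else:
            result = float(tokens.pop(0))
        return result

    return expression(tokens)

# Step 3: Evaluation
# Values are computed while parsing, so evaluating is just tokenize + parse.
def evaluate_expression(expression):
    tokens = tokenize(expression)
    return parse(tokens)
```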

```python
evaluate_expression("2+3")  # 5
```

```
5.0
```

```python
# (2+3)*4 should be 20
evaluate_expression("(2+3)*4")
```

```
20.0
```

```python
# Nested parentheses: (2+3)*((4+5)/3) should be 15
evaluate_expression("(2+3)*((4+5)/3)")
```

```
15.0
```

```python
# The tokenizer's regular expression silently skips every character it does
# not recognize, so the words "Valdis" and "RBS" are ignored and
# "2 + Valdis RBS 6" is evaluated as 2 + 6 = 8.
evaluate_expression("2 + Valdis RBS 6")
```

```
8.0
```
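`re.findall` returns only the substrings that match the pattern, so the tokenizer drops anything it does not recognize, which is why the words in the last example disappear. If you would rather reject such input, the tokenizer has to account for every character. The following is a minimal sketch under that assumption; `strict_tokenize` is a hypothetical name, not part of the code above. It matches one token at a time from the current position with `Pattern.match(string, pos)` and raises `ValueError` on the first character that is neither whitespace nor part of a token.

```python
import re

# One token: an integer, one of the four operators, or a parenthesis.
TOKEN = re.compile(r"\d+|[-+*/()]")

def strict_tokenize(expression):
    """Hypothetical stricter tokenizer: unknown characters raise ValueError."""
    tokens = []
    pos = 0
    while pos < len(expression):
        if expression[pos].isspace():
            pos += 1  # whitespace separates tokens but is not a token itself
            continue
        match = TOKEN.match(expression, pos)  # anchored at pos
        if match is None:
            raise ValueError(
                f"unexpected character {expression[pos]!r} at position {pos}"
            )
        tokens.append(match.group())
        pos = match.end()
    return tokens
```

With this variant, `strict_tokenize("2 + Valdis RBS 6")` raises `ValueError` pointing at the `'V'` at position 4 instead of quietly returning `['2', '+', '6']`.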

## Building your own tokenizer without regular expressions

```python
# We could write our own tokenizer without regular expressions, but that takes more work.
# One way is a finite state machine that scans the input one character at a time.
# TODO: implement our own tokenizer without regex
```
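A minimal sketch of the finite-state-machine approach, assuming ASCII digits; `tokenize_fsm` is a hypothetical name. The machine has two states: `"start"` (between tokens) and `"number"` (inside a run of digits). Like `re.findall` with the pattern used in `tokenize()`, it skips any character that is not a digit, operator, or parenthesis, so for ASCII input it should produce the same token list.

```python
DIGITS = "0123456789"
OPERATORS = "+-*/()"

def tokenize_fsm(expression):
    """Split an arithmetic expression into tokens without using regex."""
    tokens = []
    state = "start"  # "start": between tokens, "number": inside a digit run
    current = ""     # digits collected while in the "number" state

    for ch in expression:
        if state == "number":
            if ch in DIGITS:
                current += ch
                continue
            # Any non-digit ends the number: emit it, then handle ch as in "start".
            tokens.append(current)
            current = ""
            state = "start"

        if ch in DIGITS:
            current = ch
            state = "number"
        elif ch in OPERATORS:
            tokens.append(ch)
        # Everything else (spaces, letters, ...) is skipped, as with re.findall.

    if state == "number":
        tokens.append(current)  # flush a number that ends the input
    return tokens
```

Because the output format is the same list of strings, `tokenize_fsm` could replace `tokenize` inside `evaluate_expression` without changing `parse()`.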

## Parser design decisions

- This is a simple recursive-descent parser. It does not handle many edge cases, such as unary minus (`-3`) or decimal numbers (`2.5`), and it does not provide detailed error messages.
- The tokenization step uses a regular expression that captures integers, the operators `+`, `-`, `*`, `/`, and parentheses; every other character is skipped.
- The parsing step turns the token list directly into a number by recursively breaking the expression into terms and factors. Handling `*` and `/` one level below `+` and `-` gives them higher precedence, and each level collects its operands in a loop, so `2-3-4` is evaluated as `(2-3)-4` (left associativity).
- The evaluation step is trivial because no explicit AST is built: each parsing function returns the value of the sub-expression it recognizes, so the tree exists only implicitly in the recursive calls inside `parse()`.
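The three steps listed at the top (tokenize, build an AST, walk it) can also be implemented literally instead of folding parsing and evaluation together. The sketch below assumes nested tuples `(op, left, right)` as AST nodes with floats as leaves; `parse_ast`, `evaluate_ast`, and `OPS` are illustrative names, and like the original it does no error handling.

```python
import operator
import re

# Map each binary operator token to the function that applies it.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def tokenize(expression):
    return re.findall(r"\d+|[-+*/()]", expression)

def parse_ast(tokens):
    """Build a nested-tuple AST: a leaf is a float, an inner node is (op, left, right)."""
    tokens = list(tokens)  # work on a copy so the caller's list is not consumed

    def expression():
        node = term()
        while tokens and tokens[0] in ("+", "-"):
            node = (tokens.pop(0), node, term())  # grows to the left: left-associative
        return node

    def term():
        node = factor()
        while tokens and tokens[0] in ("*", "/"):
            node = (tokens.pop(0), node, factor())
        return node

    def factor():
        token = tokens.pop(0)
        if token == "(":
            node = expression()
            tokens.pop(0)  # discard ")"
            return node
        return float(token)

    return expression()

def evaluate_ast(node):
    """Walk the tree bottom-up: leaves are numbers, inner nodes apply OPS."""
    if isinstance(node, float):
        return node
    op, left, right = node
    return OPS[op](evaluate_ast(left), evaluate_ast(right))
```

Keeping the tree explicit makes precedence and associativity visible: `parse_ast(tokenize("2+3*4"))` gives `('+', 2.0, ('*', 3.0, 4.0))`, and `parse_ast(tokenize("2-3-4"))` gives `('-', ('-', 2.0, 3.0), 4.0)`.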

```python
test_expressions = [
    "2+3",
    "2-3",
    "2*3",
    "6/3",
    "(2+3)*4",
    "(2+3)/5",
    "2+3*4",
    "2*3+4",
]
for expression in test_expressions:
    print(f"{expression} = {evaluate_expression(expression)}")
```

```
2+3 = 5.0
2-3 = -1.0
2*3 = 6.0
6/3 = 2.0
(2+3)*4 = 20.0
(2+3)/5 = 1.0
2+3*4 = 14.0
2*3+4 = 10.0
```
