In [1]:
from IPython.core.display import HTML
with open('../style.css') as file:
    css = file.read()
HTML(css)

# The Shunting Yard Algorithm (Operator Precedence Parsing)

The function $\texttt{toInt}(s)$ tries to convert the string $s$ to an integer.  If this works out, the integer is returned.  Otherwise, the string $s$ is returned unchanged.

In [2]:
def toInt(s):
    try:
        return int(s)   
    except ValueError:
        return s

In [3]:
toInt('123')

123

In [4]:
toInt('**')

'**'

The module `re` provides support for <a href='https://en.wikipedia.org/wiki/Regular_expression'>regular expressions</a>.  These are needed for
<em style="color:blue;">tokenizing</em> a string.

In [5]:
import re

The function $\texttt{tokenize}(s)$ takes a string $s$ representing an arithmetic expression and splits this string into a list of tokens.
The string `regExp` in the implementation below is interpreted as follows:

  - The `r` in front of the apostrophe `'` specifies that the regular expression is defined as a *raw string*.  
    In a *raw string* the backslash does not have to be escaped because it is treated as a literal character.
  - The regular expression is divided into three parts. These parts are separated by the character `|`.  
      1. `[0-9]+` matches a natural number.  For example, it matches `0` or `123`.  It would also match a string like `007`.
         The `+` at the end of the substring `[0-9]+` specifies that one or more characters from the range `[0-9]` have to occur.
      2. `\*\*` matches the operator `**`.
      3. `[()+*/%-]` matches a parenthesis or an arithmetical operator.  Inside a character class the 
         symbol `-` has to be placed last, or it has to be escaped as `\-` as in the implementation of 
         `tokenize` below, since otherwise it would be interpreted as a range operator.

In [6]:
def tokenize(s):
    regExp = r'[0-9]+|\*\*|[()+\-*%/]'
    L = [ toInt(t) for t in re.findall(regExp, s) ]
    return list(reversed(L))

In [7]:
re.findall(r'[0-9]+|\*\*|[()+*%/-]', '11 * 22 * 33**45')

['11', '*', '22', '*', '33', '**', '45']

In [8]:
tokenize('12 * 23 * 34**45')

[45, '**', 34, '*', 23, '*', 12]
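
Two properties of the regular expression are easy to check directly.  The sketch below uses only the standard module `re`; the sample strings are made up for illustration.  First, a `-` placed between two other characters of a character class denotes a range.  Second, `re.findall` silently skips all characters that match none of the alternatives, so `tokenize` does not report invalid characters.

```python
import re

# '-' between '+' and '/' denotes the range of characters from '+' to '/'.
# This range also contains ',' and '.'.
inRange = re.findall(r'[+-/]', 'a,b.c-d')

# With '-' placed last it is an ordinary character.
literal = re.findall(r'[+/-]', 'a,b.c-d')

# The character 'x' matches none of the alternatives and is skipped.
skipped = re.findall(r'[0-9]+|\*\*|[()+*%/-]', '1 + x * 2')

print(inRange)  # [',', '.', '-']
print(literal)  # ['-']
print(skipped)  # ['1', '+', '*', '2']
```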

Given an operator $o$, the expression $\texttt{precedence}(o)$ returns the precedence of the operator
$o$.  If $o_1$ and $o_2$ are different operators and the <em style="color:blue">precedence</em> of $\texttt{o}_1$ is at least as high as the 
<em style="color:blue">precedence</em> of $\texttt{o}_2$, then the expression
$$ a \;\texttt{o}_1\; b \;\texttt{o}_2\; c $$ 
should be evaluated as
$$ (a \;\texttt{o}_1\; b) \;\texttt{o}_2\; c. $$
Otherwise, the expression $a \;\texttt{o}_1\; b \;\texttt{o}_2\; c$ should be evaluated as
$$ a \;\texttt{o}_1\; (b \;\texttt{o}_2\; c). $$ 

In [9]:
def precedence(o):
    Precedence = { '+': 1, '-': 1, '*': 2, '/': 2, '%': 2, '**' : 3 }
    return Precedence[o]
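
Python's own arithmetic groups these operators in the same way, so the rule can be illustrated with ordinary expressions.  The numbers in this small sketch are arbitrary.

```python
# Same precedence, different operators: the left operator is evaluated first.
sameLevel = 7 - 2 + 3
# Higher precedence on the left: (a * b) + c.
leftHigher = 2 * 3 + 4
# Lower precedence on the left: a + (b * c).
leftLower = 2 + 3 * 4

print(sameLevel  == (7 - 2) + 3, sameLevel  == 7 - (2 + 3))  # True False
print(leftHigher == (2 * 3) + 4, leftHigher == 2 * (3 + 4))  # True False
print(leftLower  == 2 + (3 * 4), leftLower  == (2 + 3) * 4)  # True False
```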

The expression `isLeftAssociative(o)` is `True` iff the operator $o$ 
*associates to the left*.  If $o$ *associates to the right*, this expression is `False`.  
If the operator $o$ is unknown, evaluation of the expression results 
in an error.

In [10]:
def isLeftAssociative(o):
    if o in { '+', '-', '*', '/', '%' }:
        return True
    if o in { '**' }:
        return False
    assert False, f'unknown operator {o}'
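
The effect of associativity can again be observed in Python itself, where `-` associates to the left and `**` associates to the right.  The values below are arbitrary examples.

```python
# Left associative: 8 - 3 - 2 is read as (8 - 3) - 2.
left = 8 - 3 - 2
# Right associative: 2 ** 3 ** 2 is read as 2 ** (3 ** 2).
right = 2 ** 3 ** 2

print(left,  (8 - 3) - 2, 8 - (3 - 2))        # 3 3 7
print(right, 2 ** (3 ** 2), (2 ** 3) ** 2)    # 512 512 64
```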

The function `evalBefore(o1, o2)` receives two strings representing arithmetical operators.  It returns `True` if the operator $o_1$ should be evaluated before the operator $o_2$ in an arithmetical expression of the form $a \;\texttt{o}_1\; b \;\texttt{o}_2\; c$.  In order to determine whether $o_1$ should be evaluated before $o_2$, it uses the *precedence* and the *associativity* of the operators.  
Its behavior is specified by the following rules:
- $\texttt{precedence}(o_1) > \texttt{precedence}(o_2) \rightarrow \texttt{evalBefore}(\texttt{o}_1, \texttt{o}_2) = \texttt{True}$,
- $o_1 = o_2 \rightarrow \texttt{evalBefore}(\texttt{o}_1, \texttt{o}_2) = \texttt{isLeftAssociative}(o_1)$,
- $\texttt{precedence}(o_1) = \texttt{precedence}(o_2) \wedge o_1 \not= o_2 \rightarrow \texttt{evalBefore}(\texttt{o}_1, \texttt{o}_2) = \texttt{True}$,
- $\texttt{precedence}(o_1) < \texttt{precedence}(o_2) \rightarrow \texttt{evalBefore}(\texttt{o}_1, \texttt{o}_2) = \texttt{False}$.

In [11]:
def evalBefore(stackOp, nextOp):
    if precedence(stackOp) > precedence(nextOp):
        return True
    if stackOp == nextOp:
        return isLeftAssociative(stackOp)
    if precedence(stackOp) == precedence(nextOp) and stackOp != nextOp:
        return True
    if precedence(stackOp) < precedence(nextOp):
        return False
    assert False, f'incomplete case distinction in evalBefore({stackOp}, {nextOp})'
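
For the operator table used in this notebook, the four rules can be condensed into two cases.  The following sketch is a hypothetical restatement for illustration: the names `PRECEDENCE`, `RIGHT_ASSOCIATIVE` and `evalBeforeSketch` do not occur in the notebook.  It agrees with the rules above only because `**` is the sole operator on its precedence level, so two different operators of equal precedence are always left associative.

```python
PRECEDENCE        = {'+': 1, '-': 1, '*': 2, '/': 2, '%': 2, '**': 3}
RIGHT_ASSOCIATIVE = {'**'}

def evalBeforeSketch(stackOp, nextOp):
    """Return True if stackOp has to be evaluated before nextOp."""
    if PRECEDENCE[stackOp] != PRECEDENCE[nextOp]:
        return PRECEDENCE[stackOp] > PRECEDENCE[nextOp]
    # Equal precedence: evaluate first unless the operator is right associative.
    return stackOp not in RIGHT_ASSOCIATIVE

for o1, o2 in [('*', '+'), ('+', '*'), ('-', '+'), ('-', '-'), ('**', '**'), ('**', '*')]:
    print(f'evalBefore({o1!r}, {o2!r}) = {evalBeforeSketch(o1, o2)}')
```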

In [12]:
%%capture
%run Stack.ipynb

The class `Calculator` has three member variables:
  - the token stack `mTokens` 
  - the operator stack `mOperators`
  - the argument stack `mArguments`
  
The constructor tokenizes the given string and pushes the tokens onto the token stack such that the first token is on top of the token stack.

In [13]:
class Calculator:
    def __init__(self, s):
        self.mTokens    = createStack(tokenize(s))
        self.mOperators = Stack()
        self.mArguments = Stack()    
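
The notebook `Stack.ipynb` is not reproduced in this section.  Judging only from the calls made here (`push`, `pop`, `top`, `isEmpty` and `createStack`), a list-based stand-in could look as follows.  This is an assumption about the interface and not the actual content of `Stack.ipynb`; the names `ListStack` and `createListStack` are hypothetical, and the sketch does not imitate the boxed layout that the real stack uses when it is printed.

```python
class ListStack:
    """Hypothetical list-based stack; the last list element is the top."""
    def __init__(self):
        self.mItems = []

    def push(self, e):
        self.mItems.append(e)

    def pop(self):
        assert not self.isEmpty(), 'pop from empty stack'
        self.mItems.pop()

    def top(self):
        assert not self.isEmpty(), 'top of empty stack'
        return self.mItems[-1]

    def isEmpty(self):
        return self.mItems == []

    def __str__(self):
        return str(self.mItems)

def createListStack(L):
    """Push the elements of L in order, so the last element of L ends up on top."""
    S = ListStack()
    for x in L:
        S.push(x)
    return S

# tokenize returns the tokens reversed, hence the first token ends up on top.
S = createListStack([45, '**', 34])
print(S.top())  # 34
```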

The method `__str__` is used to convert an object of class `Calculator` to a string.

In [14]:
def toString(self):
    return '\n'.join(['_'*50, 
                      'Tokens:    ', str(self.mTokens), 
                      'Arguments: ', str(self.mArguments), 
                      'Operators: ', str(self.mOperators), 
                      '_'*50])

Calculator.__str__ = toString
del toString

In [15]:
Calculator.__repr__ = Calculator.__str__

The function $\texttt{evaluate}(\texttt{self})$ evaluates the expression that is given by the tokens on the token stack `mTokens`.  
There are two phases:
1. The first phase is the <em style="color:blue">reading phase</em>. In this phase
   the tokens are removed from the token stack `mTokens`.  
2. The second phase is the <em style="color:blue">evaluation phase</em>.  In this phase,
   the remaining operators on the operator stack `mOperators` are evaluated.  Note that some operators are already 
   evaluated in the *reading phase*.

We can describe what happens in the *reading phase* using 
<em style="color:blue">rewrite rules</em> that describe how the three stacks `mTokens`, `mArguments` and `mOperators`
are changed in each *step*.  Here, a *step* is one iteration of the first `while`-loop of the function `evaluate`.
The following *rewrite rules* are executed until the token stack `mTokens` is empty.
1. If the token on top of the token stack is an integer, it is removed from the token stack and pushed onto the argument stack.
   The operator stack remains unchanged in this case.  
   $$\begin{array}{lc}
     \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{token} ] & \wedge \\
     \texttt{isInteger}(\texttt{token}) & \Rightarrow \\[0.2cm]
     \texttt{mArguments}' = \texttt{mArguments} + [\texttt{token}] & \wedge \\
     \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\
     \texttt{mOperators}' = \texttt{mOperators}
     \end{array} 
   $$
   Here, the primed variable $\texttt{mArguments}'$ refers to the argument stack after  $\texttt{token}$
   has been pushed onto it.
   
   In the following rules we implicitly assume that the token on top of the token stack is not an integer but 
   rather a parenthesis or a proper operator.  In order to be more concise, we suppress this precondition from the 
   following rewrite rules.
2. If the operator stack is empty, the next token is pushed onto the operator stack.
   $$\begin{array}{lc}
     \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{op} ] & \wedge \\
     \texttt{mOperators} = [] & \Rightarrow \\[0.2cm]
     \texttt{mOperators}' = \texttt{mOperators} + [\texttt{op}] & \wedge \\
     \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\
     \texttt{mArguments}' = \texttt{mArguments} 
     \end{array} 
   $$
3. If the next token is an opening parenthesis, this parenthesis token is pushed onto the operator stack.
   $$\begin{array}{lc}
     \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{'('} ] & \Rightarrow \\[0.2cm]
     \texttt{mOperators}' = \texttt{mOperators} + [\texttt{'('}] & \wedge \\
     \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\
     \texttt{mArguments}' = \texttt{mArguments} 
     \end{array} 
   $$
4. If the next token is a closing parenthesis and the operator on top of the operator stack is an opening parenthesis, then both 
   parentheses are removed.
   $$\begin{array}{lc}
     \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{')'} ] & \wedge \\
     \texttt{mOperators} =\texttt{mOperatorsRest} + [\texttt{'('}]                  & \Rightarrow \\[0.2cm]
     \texttt{mOperators}' = \texttt{mOperatorsRest} & \wedge \\
     \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\
     \texttt{mArguments}' = \texttt{mArguments} 
     \end{array} 
   $$
5. If the next token is a closing parenthesis but the operator on top of the operator stack is not an opening parenthesis, 
   the operator on top of the operator stack is evaluated.  Note that the token stack is not changed in this case.
   $$\begin{array}{lc}
     \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{')'} ] & \wedge \\
     \texttt{mOperators} = \texttt{mOperatorsRest} + [\texttt{op}] & \wedge \\
     \texttt{op} \not= \texttt{'('}                            & \wedge \\
     \texttt{mArguments} = \texttt{mArgumentsRest} + [\texttt{lhs}, \texttt{rhs}] & \Rightarrow \\[0.2cm]
        \texttt{mOperators}' = \texttt{mOperatorsRest} & \wedge \\
         \texttt{mTokens}' = \texttt{mTokens} & \wedge \\
         \texttt{mArguments}' = \texttt{mArgumentsRest} + [\texttt{lhs} \;\texttt{op}\; \texttt{rhs}]
     \end{array} 
   $$
   Here, the expression $\texttt{lhs} \;\texttt{op}\; \texttt{rhs}$ denotes evaluating the operator $\texttt{op}$ with the arguments
   $\texttt{lhs}$ and $\texttt{rhs}$.
6. If the token on top of the operator stack is an opening parenthesis, then the operator on top of the token stack
   is pushed onto the operator stack.
   $$\begin{array}{lc}
     \texttt{mTokens} = \texttt{mTokensRest} + [\texttt{op}] & \wedge \\
     \texttt{op} \not= \texttt{')'}                          & \wedge \\
     \texttt{mOperators} = \texttt{mOperatorsRest} + [\texttt{'('}] & \Rightarrow \\[0.2cm]
     \texttt{mOperators}' = \texttt{mOperators} + [\texttt{op}] & \wedge \\
     \texttt{mTokens}' = \texttt{mTokensRest} & \wedge \\
     \texttt{mArguments}' = \texttt{mArguments}
     \end{array} 
   $$
   
   In the remaining cases neither the token on top of the token stack nor the operator on top of the operator stack can be
   a parenthesis.  The following rules will implicitly assume that this is the case.
7. If the operator on top of the operator stack needs to be evaluated before the operator on top of the token stack,
   the operator on top of the operator stack is evaluated.
      $$\begin{array}{lc}
        \texttt{mTokens} = \texttt{mTokensRest} + [o_2]                                        & \wedge \\
        \texttt{mOperators} = \texttt{mOperatorsRest} + [o_1]                                  & \wedge \\
        \texttt{evalBefore}(o_1, o_2)                                                          & \wedge \\ 
        \texttt{mArguments} = \texttt{mArgumentsRest} + [\texttt{lhs}, \texttt{rhs}]           & \Rightarrow \\[0.2cm]
        \texttt{mOperators}' = \texttt{mOperatorsRest}                                         & \wedge \\
        \texttt{mTokens}' = \texttt{mTokens}                                                   & \wedge \\
        \texttt{mArguments}' = \texttt{mArgumentsRest} + [\texttt{lhs} \;o_1\; \texttt{rhs}]
        \end{array} 
      $$
8. Otherwise, the operator on top of the token stack is pushed onto the operator stack.
   $$\begin{array}{lc}
         \texttt{mTokens} = \texttt{mTokensRest} + [o_2]           & \wedge \\
         \texttt{mOperators} = \texttt{mOperatorsRest} + [o_1]     & \wedge \\
         \neg \texttt{evalBefore}(o_1, o_2)                        & \Rightarrow \\[0.2cm]
        \texttt{mOperators}' = \texttt{mOperators} + [o_2]         & \wedge \\
        \texttt{mTokens}' = \texttt{mTokensRest}                   & \wedge \\
        \texttt{mArguments}' = \texttt{mArguments}
      \end{array} 
    $$
   
In every step of the evaluation phase we 
- remove one operator from the operator stack, 
- remove its arguments from the argument stack, 
- evaluate the operator, and 
- push the result back on the argument stack. 

In [16]:
def evaluate(self):
    while not self.mTokens.isEmpty():
        print(self) # only for debugging
        nextOp = self.mTokens.top(); self.mTokens.pop()
        if isinstance(nextOp, int):
            self.mArguments.push(nextOp)
            continue
        if self.mOperators.isEmpty():
            self.mOperators.push(nextOp)
            continue
        if nextOp == "(":
            self.mOperators.push(nextOp)
            continue
        stackOp = self.mOperators.top()
        if stackOp == "(" and nextOp == ")":
            self.mOperators.pop()
            continue
        if nextOp == ")":
            self.popAndEvaluate()
            self.mTokens.push(nextOp)
            continue
        if stackOp == '(':
            self.mOperators.push(nextOp)
            continue
        if evalBefore(stackOp, nextOp):
            self.popAndEvaluate()
            self.mTokens.push(nextOp)
        else:
            self.mOperators.push(nextOp)
    while not self.mOperators.isEmpty():
        print(self) # only for debugging
        self.popAndEvaluate()
    print(self)
    return self.mArguments.top()
    
Calculator.evaluate = evaluate
del evaluate

The method $\texttt{popAndEvaluate}(\texttt{self})$ removes the two topmost numbers $\texttt{rhs}$ and $\texttt{lhs}$ from the argument stack and 
removes the topmost operator $\texttt{op}$ from the operator stack.  It applies the operator $\texttt{op}$ to these numbers
by computing $\texttt{lhs} \;\texttt{op}\; \texttt{rhs}$
and then pushes this value back on the argument stack.

In [19]:
def popAndEvaluate(self):
    rhs = self.mArguments.top(); self.mArguments.pop()
    lhs = self.mArguments.top(); self.mArguments.pop()
    op  = self.mOperators.top(); self.mOperators.pop()
    result = None
    if op == '+':
        result = lhs + rhs
    if op == '-':
        result = lhs - rhs
    if op == '*':
        result = lhs * rhs
    if op == '/':
        result = lhs // rhs
    if op == '%':
        result = lhs % rhs
    if op == '**':
        result = lhs ** rhs
    assert result is not None, f'ERROR: *** Unknown Operator *** "{op}"'
    self.mArguments.push(result)
    
Calculator.popAndEvaluate = popAndEvaluate
del popAndEvaluate
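
The two phases and the rewrite rules can also be condensed into a single function that works on plain Python lists.  The following is a sketch under stated assumptions and not the `Calculator` of this notebook: the names `evalSketch`, `reduceTop`, `evalBefore`, `APPLY`, `PRECEDENCE` and `RIGHT_ASSOCIATIVE` are made up here, tokens are read from left to right instead of being popped from a token stack, inner `while` loops replace pushing a token back, and unbalanced parentheses raise a `ValueError`.  As in `popAndEvaluate`, the operator `/` is integer division.

```python
import operator
import re

PRECEDENCE        = {'+': 1, '-': 1, '*': 2, '/': 2, '%': 2, '**': 3}
RIGHT_ASSOCIATIVE = {'**'}
APPLY = {'+': operator.add, '-': operator.sub, '*': operator.mul,
         '/': operator.floordiv, '%': operator.mod, '**': operator.pow}

def evalSketch(s):
    """Evaluate the arithmetic expression s using two explicit stacks."""
    tokens = re.findall(r'[0-9]+|\*\*|[()+*%/-]', s)
    arguments, operators = [], []

    def reduceTop():
        # Pop one operator and its two arguments, then push the result.
        rhs = arguments.pop()
        lhs = arguments.pop()
        arguments.append(APPLY[operators.pop()](lhs, rhs))

    def evalBefore(o1, o2):
        if PRECEDENCE[o1] != PRECEDENCE[o2]:
            return PRECEDENCE[o1] > PRECEDENCE[o2]
        return o1 not in RIGHT_ASSOCIATIVE

    # Reading phase.
    for token in tokens:
        if token.isdigit():
            arguments.append(int(token))
        elif token == '(':
            operators.append(token)
        elif token == ')':
            # Evaluate everything back to the matching '('.
            while operators and operators[-1] != '(':
                reduceTop()
            if not operators:
                raise ValueError('unbalanced closing parenthesis')
            operators.pop()
        else:
            while (operators and operators[-1] != '('
                   and evalBefore(operators[-1], token)):
                reduceTop()
            operators.append(token)
    # Evaluation phase.
    while operators:
        if operators[-1] == '(':
            raise ValueError('unbalanced opening parenthesis')
        reduceTop()
    return arguments[-1]

print(evalSketch('1 + 2 * 3'))       # 7
print(evalSketch('2 ** 3 ** 2'))     # 512
print(evalSketch('(3*(2+1)-4)**2'))  # 25
```

Other malformed input, such as a missing operand, is not checked in this sketch.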

In [25]:
C = Calculator('1)*(3*(2+1)-4)**2')

-----
| 2 |
-----
----------
| 2 | ** |
----------
--------------
| 2 | ** | ) |
--------------
------------------
| 2 | ** | ) | 4 |
------------------
----------------------
| 2 | ** | ) | 4 | - |
----------------------
--------------------------
| 2 | ** | ) | 4 | - | ) |
--------------------------
------------------------------
| 2 | ** | ) | 4 | - | ) | 1 |
------------------------------
----------------------------------
| 2 | ** | ) | 4 | - | ) | 1 | + |
----------------------------------
--------------------------------------
| 2 | ** | ) | 4 | - | ) | 1 | + | 2 |
--------------------------------------
------------------------------------------
| 2 | ** | ) | 4 | - | ) | 1 | + | 2 | ( |
------------------------------------------
----------------------------------------------
| 2 | ** | ) | 4 | - | ) | 1 | + | 2 | ( | * |
----------------------------------------------
--------------------------------------------------
| 2 | ** | ) | 4 | - | ) | 1 | + | 2 | ( | * | 3 |
----------

In [26]:
C.evaluate()

__________________________________________________
Tokens:    
------------------------------------------------------------------
| 2 | ** | ) | 4 | - | ) | 1 | + | 2 | ( | * | 3 | ( | * | ) | 1 |
------------------------------------------------------------------
Arguments: 
-
|
-
Operators: 
-
|
-
__________________________________________________
__________________________________________________
Tokens:    
--------------------------------------------------------------
| 2 | ** | ) | 4 | - | ) | 1 | + | 2 | ( | * | 3 | ( | * | ) |
--------------------------------------------------------------
Arguments: 
-----
| 1 |
-----
Operators: 
-
|
-
__________________________________________________
__________________________________________________
Tokens:    
----------------------------------------------------------
| 2 | ** | ) | 4 | - | ) | 1 | + | 2 | ( | * | 3 | ( | * |
----------------------------------------------------------
Arguments: 
-----
| 1 |
-----
Operators: 
-----
| ) |
----

KeyError: ')'
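
The `KeyError` is caused by the unmatched `)` after the `1`.  Since the operator stack is empty when this `)` is read, rule 2 pushes it onto the operator stack.  When `*` is read next, none of the parenthesis rules applies, so `evalBefore(')', '*')` is called and `precedence` fails because `)` is not a key of its dictionary.  One way to reject such input up front would be to check that the parentheses are balanced before evaluating.  The function `parenthesesBalanced` below is a hypothetical helper and not part of the notebook.

```python
import re

def parenthesesBalanced(s):
    """Return True iff every ')' in s closes an earlier '(' and no '(' stays open."""
    depth = 0
    for t in re.findall(r'[0-9]+|\*\*|[()+*%/-]', s):
        if t == '(':
            depth += 1
        elif t == ')':
            depth -= 1
            if depth < 0:  # a ')' without a matching '('
                return False
    return depth == 0

print(parenthesesBalanced('1)*(3*(2+1)-4)**2'))   # False
print(parenthesesBalanced('(1)*(3*(2+1)-4)**2'))  # True
```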