# Parser 5.00 - Grouping Parentheses

The goal of this version of the parser is to be able to alter the default order of evaluation by using grouping parentheses.
Whatever part of an expression they enclose should be evaluated first. Parentheses should also nest, with inner expressions evaluated before the outer ones that contain them.

Grouping parentheses come in pairs, left and right. It is an error to encounter one without the other or in the wrong order.

### Parser

One approach to implementing grouping parentheses would be to use recursion. The idea would be that encountering a left (or open) parenthesis would cause *PEdoparse()* to call itself with the remaining expression as its argument. This would put a new *EOE* on the operator stack, overlaying and “hiding” the original one.

A right (or close) parenthesis would be treated as an end of expression. This would clear off the *EOE* put there by the recursive call, and the return from it would resume processing whatever is still left of the original expression.
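
A rough sketch of the recursive idea, for illustration only. The function `parseRec()` is hypothetical and is not the notebook's *PEdoparse()*: it assumes single-digit operands and binary operators only, does no error checking, and gives each call its own small stack instead of overlaying a new *EOE* on a shared one, which has the same "hiding" effect.

```python
def parseRec(expr, pos=0):
    '''recursive sketch: returns (rpn tokens, position after this sub-expression)'''
    prec = {'+': 60, '-': 60, '*': 70, '/': 70}
    stk = [('EOE', 1)]                         # each call gets its own fresh EOE
    out = []
    while pos < len(expr):
        ch = expr[pos]
        pos += 1
        if ch.isdigit():
            out.append(ch)
        elif ch == '(':
            inner, pos = parseRec(expr, pos)   # recurse on the remaining expression
            out.extend(inner)
        elif ch == ')':
            break                              # end of expression for this call
        elif ch in prec:
            while prec[ch] <= stk[-1][1]:      # pop equal or greater precedence
                out.append(stk.pop()[0])
            stk.append((ch, prec[ch]))
    while stk[-1][0] != 'EOE':                 # clear this call's operators
        out.append(stk.pop()[0])
    return out, pos

rpn, _ = parseRec('2*(3+4)')
print(' '.join(rpn))                           # 2 3 4 + *
```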

That’s the overview, anyway. The implementation might be a bit tricky. For one thing, calling *PEdoparse()* at present initializes the stack to empty, which is not what we would want.

This is not an insurmountable problem, but there is an altogether easier way. We can treat left and right parentheses as low precedence operators.

The only slight difficulty is that a left parenthesis, considered as an operator, can legally appear only in the *wantoperand* state. A right parenthesis can legally appear only in the *wantoperator* state. Encountering either should not change the current state. After finding a left parenthesis we still want an operand, and after a right we still want an operator.
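
The state rule can be sketched in isolation. The helper `checkStates()` below is hypothetical and not part of the parser. It assumes single-character tokens, treats every non-digit, non-parenthesis token as a binary operator, and does not check that parentheses are balanced; it only tracks which state each token is legal in.

```python
def checkStates(tokens):
    '''return True if every token appears in a state where it is legal'''
    wantoperand = True
    for tok in tokens:
        if tok == '(':
            if not wantoperand:
                return False          # '(' is legal only when an operand is expected
            # state unchanged: still want an operand
        elif tok == ')':
            if wantoperand:
                return False          # ')' is legal only when an operator is expected
            # state unchanged: still want an operator
        elif tok.isdigit():
            if not wantoperand:
                return False
            wantoperand = False       # operand found, now want an operator
        else:
            if wantoperand:
                return False
            wantoperand = True        # operator found, now want an operand
    return not wantoperand            # must end in the 'wantoperator' state

print(checkStates(list('(1+2)*3')))   # True
print(checkStates(list('1(+2)')))     # False
print(checkStates(list('()')))        # False
```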

### Evaluator

We don't need to make any changes at all. Any grouping parentheses found are discarded by the parser and never appear in the final parse result.
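
To illustrate, the two lists below are the Reverse Polish forms one would expect for `(1+2)*3` and `1+2*3`. Neither contains a parenthesis; the grouping shows up only in the order of the tokens. The evaluator `evalRpn()` is a hypothetical stand-in for the real one, handling just `B+` and `*` with no range checks.

```python
def evalRpn(rpn):
    '''evaluate a list of ints and operator strings (sketch, no range checks)'''
    stk = []
    for v in rpn:
        if isinstance(v, int):
            stk.append(v)
        else:
            rgt = stk.pop()
            lft = stk.pop()
            if v == 'B+':
                stk.append(lft + rgt)
            elif v == '*':
                stk.append(lft * rgt)
    return stk.pop()

grouped   = [1, 2, 'B+', 3, '*']      # RPN expected for (1+2)*3
ungrouped = [1, 2, 3, '*', 'B+']      # RPN expected for 1+2*3

print(evalRpn(grouped))               # 9
print(evalRpn(ungrouped))             # 7
```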

## Libraries

In [None]:
import glob       # for searching directories

import re         # for regular expressions

## User output

In [None]:
visSep = '-------------'             # visual separator

def UIwriteln(this):
    '''write a single line to output'''
    print( f'{this}\n' )
    
def UIwriteSep():
    '''write a visual separator'''
    UIwriteln( visSep )

def UIshow(tag, value):
    '''write a tagged value to output'''
    UIwriteln( f'{tag}: {value}' )

def UIerror(this):
    '''write an error message to output'''
    UIshow( 'Error', this )

# Tracing

In [None]:
# flags: show trace of processing

showInteract = True          # default for interactive use
showBatch = False            # default for batch use

showTrace = None             # control flag

# Trace Output

def TOshow(mesg, text):
    '''write trace message to output if enabled'''
    if showTrace:
        UIshow( f'{mesg:15s}', text )
        
def TOstring(tag, this):
    '''write a sequence as one space-separated trace line if enabled'''
    if showTrace:
        TOshow( tag, ' '.join([str(e) for e in this]) )

# -----------------------
# Parse Tracing
# -----------------------

def PTshowexpr(this):

    TOshow( 'Parse', visSep )
    TOshow( 'Current Expr', this )

def PTshowparse(ok, res, stk):

    if ok:
        TOstring( 'Current RPN', res )
        TOstring( 'Operator Stack', stk )

def PTshowtoken(this):

    if not this[0] == ' ':
        TOshow( "Found Token", this )

# -----------------------
# Evaluation Tracing
# -----------------------

def ETshowtoken(this):
    
    TOshow( 'Eval', visSep )
    TOshow( 'Current token', this )

def ETshoweval(ok, stk):
    
    if ok:
        TOstring( 'Operand Stack', stk )


# Common

In [None]:
intMax =  4294967295                # 2**32-1, for range checking
intMin = -4294967296                # -(2**32)

# Parser

In [None]:
versionNumber = '5.00'

# operands accepted:
# - decimal integer literals
# - hexadecimal integer literals

# operators accepted:
# - unary negation, plus
# - binary addition, subtraction, multiplication, division
# - grouping parentheses

# errors detected:
# - unrecognized input
# - out of range numeric input
# - malformed expression

# result tuple:
# - (True, [parse])
# - (False, None)

def PEdoparse(this):
       
    # initialize
    
    expr = this                # save to new variable but retain original for error reports
    start = 15                 # error marker column: len('>>> ') + len('^^near here'); advanced as tokens are matched
    token = None               # anything successfully matched
    ok = wantoperand = True    # flags
    result = []                # rpn expression
    stk = [ ('EOE', 1) ]       # operator stack
               
    def parseErr(mesg):
        '''report parse error'''
        UIerror(mesg)
        UIwriteln(f'>>> {this}')
        UIwriteln(f'{"^^near here".rjust(start)}')
        return False
     
    def popGEop(prec):
        '''pop operators of equal or greater precedence'''
        while prec <= stk[-1][1]:
            result.append(stk.pop()[0])
            
    def pushLeft(op, prec):
        '''push left associative operator on stack'''
        popGEop(prec)
        stk.append( (op, prec) )
            
    def popGop(prec):
        '''pop operators of greater precedence'''
        while prec < stk[-1][1]:
            result.append(stk.pop()[0])
            
    def pushRight(op, prec):
        '''push right associative operator on stack'''
        popGop(prec)
        stk.append( (op, prec) )
        
    # clear operators off stack until reaching target operator
    
    def popUntil(op, prec):
        '''clear and check operator stack'''
        popGEop(prec)
        if op == stk.pop()[0]:      # top remaining operator is the one we want to see ?
            return True             # yes...
        
        # ...no
        
        elif op == '(':
            return parseErr('Unmatched right parenthesis')
        elif op == 'EOE':
            return parseErr('Unmatched left parenthesis')
                  
    # convert unsigned literal to internal form
    
    def convertUint(ulit, base):
        
        uint = 0
        
        # isolate the significant portion of 'ulit'
        
        p = re.search('[1-9A-F][0-9A-F]*', ulit.upper())
        
        if p != None:
            for digit in p.group():
                digval = '0123456789ABCDEF'.find(digit)
                if uint <= (intMax - digval)/base:
                    uint =  uint * base + digval
                else:
                    return parseErr(f'\'{ulit}\' is out of range')
        
        result.append(uint)
        return True
    
    # test if expression starts with given regular expression
    
    def startsWith(regex):
        
        nonlocal expr, start, token
        
        p = re.match(regex, expr)
        if p == None:
            return False
        else:
            token = p.group()              # what we matched
            start += len(token)            # update to next match position in original string
            expr = expr[len(token):]       # "chop off" what we matched
            PTshowtoken(token)             # trace
            return True
                 
    # top level main loop
    
    while ok and len(expr):
        
        _ = startsWith('[ ]+')                             # skip leading whitespace
            
        PTshowexpr(expr)                                   # trace
            
        # look for operand
             
        if wantoperand:
            
            if startsWith('[(]'):
                '''left parenthesis ?'''
                stk.append( ('(', 2) )                     # push directly on stack
                
            
            elif startsWith('[-+]'):
                '''unary negation or plus ?'''
                pushRight( 'U' + token, 80 )                # decorate
                
            else:
        
                wantoperand = False                         # flip
            
                if startsWith('0[xX][0-9a-fA-F]+'):
                    '''unsigned hexadecimal literal ?'''
                    ok = convertUint(token, 16)
    
                elif startsWith('[0-9]+'):
                    '''unsigned decimal literal ?'''
                    ok = convertUint(token, 10)
    
                else:
                    '''malformed'''
                    ok = parseErr('Expecting operand')
            
        # look for operator
        
        else:
            
            if startsWith('[)]'):
                ok = popUntil( '(', 4 )
                
            else:
            
                wantoperand = True                                # flip
            
                if startsWith('[*/]'):
                    '''binary multiplication or division ?'''
                    pushLeft( token, 70 )
            
                elif startsWith('[-+]'):
                    '''binary addition or subtraction ?'''
                    pushLeft( 'B' + token, 60 )                   # decorate
                
                else:
                    '''malformed'''
                    ok = parseErr('Expecting operator')
            
        PTshowparse(ok, result, stk )                         # trace
        
    if ok:
        if wantoperand:
            ok = parseErr('Unexpected end of expression')     # must be in 'wantoperator' state   
        else:
            ok = popUntil( 'EOE', 3 )                         # clear operator stack
                   
    return (ok, result if ok else None)                       # done

### How it works

When a left parenthesis is found it is immediately pushed on the operator stack without taking anything off first. Its precedence is deliberately lower than any operator found in the original infix, which prevents any of them from popping it off. Thus anything that was already on the operator stack must stay there until the left parenthesis is removed.

>Note that the left and right parentheses are also regular expression metacharacters, so we again use the character class 'trick' rather than escaping them to 'turn off' that interpretation. Because of the way Python (and many other programming languages) processes strings multiple times before finally using them, it can be irritatingly difficult to determine how many sequential escape characters are needed to accomplish the task.
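
A short comparison of the two spellings, using only the standard `re` module. Both match a literal left parenthesis; the escaped form is written as a raw string here so the backslash reaches the regex engine unchanged.

```python
import re

# character class: '(' inside [...] is a literal, no escaping needed
print(bool(re.match('[(]', '(1+2)')))     # True

# escaped form: the raw string keeps the backslash intact
print(bool(re.match(r'\(', '(1+2)')))     # True

# a bare '(' opens a group that is never closed, so the pattern is rejected
try:
    re.compile('(')
except re.error as e:
    print('re.error:', e)
```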

Removing it is the job of the right parenthesis. It has a slightly higher precedence, but still below anything in the original infix. It clears the operator stack down to what is supposed to be a left parenthesis, then discards both itself and whatever stopped it. Thus anything that appears between the two parentheses is placed into the Reverse Polish before anything that does not.

This is similar enough to the stack clearing we already do after reaching the end of an expression that we introduce a new function, *popUntil()*, to handle both cases.
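
The mechanism can be seen end to end in a stripped-down converter. This is a sketch rather than the parser above: `toRpn()` is a hypothetical name, operands are assumed to be single digits, only the four binary operators are handled, and there is no error checking. The precedences for *EOE*, the parentheses and the final clear are the same ones the parser uses.

```python
def toRpn(expr):
    '''convert single-digit infix to a list of RPN tokens (sketch, no error checks)'''
    prec = {'+': 60, '-': 60, '*': 70, '/': 70}
    stk = [('EOE', 1)]
    out = []

    def popGE(p):
        # move operators of equal or greater precedence to the output
        while p <= stk[-1][1]:
            out.append(stk.pop()[0])

    for ch in expr:
        if ch.isdigit():
            out.append(ch)
        elif ch == '(':
            stk.append(('(', 2))      # pushed directly, nothing popped first
        elif ch == ')':
            popGE(4)                  # clear down to the protecting '('
            stk.pop()                 # discard the '(' itself
        elif ch in prec:
            popGE(prec[ch])
            stk.append((ch, prec[ch]))
    popGE(3)                          # end of expression: clear what is left
    return out

print(' '.join(toRpn('2*(3+4)')))       # 2 3 4 + *
print(' '.join(toRpn('2*3+4')))         # 2 3 * 4 +
print(' '.join(toRpn('((1+2)*3)-4')))   # 1 2 + 3 * 4 -
```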

To recap, we have set the precedences of these four operators to:

- 4: Right parenthesis
- 3: Clear stack
- 2: Left parenthesis
- 1: EOE

*popUntil()* clears the operator stack as far as it can, then checks and discards the topmost remaining operator. Nothing new is placed on the stack.

>We always want to discard a left parenthesis whenever we find one. It doesn’t matter if *EOE* is discarded as well, since we’re done anyway.

*EOE* has the lowest precedence of all, and a left parenthesis just above it. Both are lower than the precedences used to clear the stack, so either will stop the process.

If we find a right parenthesis in the infix, we eventually want to find a left parenthesis on the operator stack. If there is no left parenthesis we’ll find *EOE* instead, and we have a right without a left. After the end of the infix is reached we want to find *EOE*, but if we find a left parenthesis instead we have a left without a right. Since we know what we're looking for, we also know what we didn't find, so we can report the appropriate error.
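
The error logic can be sketched on its own by ignoring everything except the parentheses. The function `checkParens()` is hypothetical; since ordinary operators are skipped, there is nothing to clear, and its inner `popUntil()` only has to check and discard whatever is on top of the stack.

```python
def checkParens(expr):
    '''report unmatched parentheses using a stack guarded by EOE (sketch)'''
    stk = ['EOE']

    def popUntil(want):
        found = stk.pop()             # whatever stopped the clearing
        if found == want:
            return None               # the one we wanted to see
        if want == '(':
            return 'Unmatched right parenthesis'
        return 'Unmatched left parenthesis'

    for ch in expr:
        if ch == '(':
            stk.append('(')
        elif ch == ')':
            err = popUntil('(')
            if err:
                return err
    return popUntil('EOE') or 'OK'

print(checkParens('(1+2)*3'))         # OK
print(checkParens('1+2)*3'))          # Unmatched right parenthesis
print(checkParens('(1+(2*3)'))        # Unmatched left parenthesis
```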

>If you object that there is no reason to use four precedences for all of this when two will do – one for operators we push on the stack to protect it and one for operators we use to remove that protection – well, you’re right. Picky, picky, picky…
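
A quick way to convince yourself of this is to make the four precedences parameters and compare the results. This is a sketch under the same simplifying assumptions as before (single-digit operands, binary operators only, no error checks); `toRpn()` and its parameter names are hypothetical.

```python
def toRpn(expr, eoe, left, clear, right):
    '''single-digit infix to RPN; the four precedences are parameters (sketch)'''
    prec = {'+': 60, '-': 60, '*': 70, '/': 70}
    stk = [('EOE', eoe)]
    out = []

    def popGE(p):
        # move operators of equal or greater precedence to the output
        while p <= stk[-1][1]:
            out.append(stk.pop()[0])

    for ch in expr:
        if ch.isdigit():
            out.append(ch)
        elif ch == '(':
            stk.append(('(', left))
        elif ch == ')':
            popGE(right)
            stk.pop()                 # discard the '('
        elif ch in prec:
            popGE(prec[ch])
            stk.append((ch, prec[ch]))
    popGE(clear)
    return out

expr = '2*(3+4)-5'
four = toRpn(expr, eoe=1, left=2, clear=3, right=4)   # as in the parser
two  = toRpn(expr, eoe=1, left=1, clear=2, right=2)   # only two distinct values
print(four == two, ' '.join(two))     # True 2 3 4 + * 5 -
```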

# Evaluator

In [None]:
# operators handled:
# - unary negation, plus
# - binary addition, subtraction, multiplication, division

# errors detected:
# - out of range
# - division by zero

# return tuple:
# - (True, result)
# - (False, None)

def EEdoeval(rpn):
    
    stk = []
    ok = True
    
    def inRange(ok, val):
        '''range check test result'''
        if ok:
            stk.append( val )
        else:
            UIerror( 'Evaluation result out of range' )
        return ok
    
    def unNeg():
        '''unary negation'''
        arg = stk.pop()
        return inRange( arg != intMin, -arg )
            
    def binAdd():
        '''binary addition'''
        rgt = stk.pop()
        lft = stk.pop()
        
        if lft >= 0:
            return inRange( rgt <= intMax - lft, lft+rgt )       
        else:
            return inRange( rgt >= intMin - lft, lft+rgt )
        
    def binSub():
        '''binary subtraction'''
        rgt = stk.pop()
        lft = stk.pop()
        
        if lft >= 0:
            return inRange( lft - intMax <= rgt, lft-rgt )
        else:
            return inRange( lft - intMin >= rgt, lft-rgt )
        
    def binMul():
        '''binary multiplication'''
        rgt = stk.pop()
        lft = stk.pop()
        
        if lft == 0 or rgt == 0:
            return inRange( True, 0 )
  
        if lft > 0:
            if rgt > 0:
                return inRange( rgt <= intMax / lft, lft*rgt )
            else:
                return inRange( rgt >= intMin / lft, lft*rgt )

        else:
            if rgt > 0:
                return inRange( rgt <= intMin / lft, lft * rgt )
            else:
                return inRange( rgt >= intMax / lft, lft * rgt )
            
    def binDiv():
        '''binary division'''
        rgt = stk.pop()
        lft = stk.pop()
        
        if rgt != 0:
            return inRange( not (lft == intMin and rgt == -1), lft//rgt )      # floored division; only intMin // -1 can exceed intMax
        else:
            UIerror( 'Division by zero' )
            return False
        
    # main loop
        
    for v in rpn:
        
        ETshowtoken(v)
        
        if v == 'U-':          # unary negation ?
            ok = unNeg()
            
        elif v == 'B+':        # binary addition ?
            ok = binAdd()
            
        elif v == 'B-':        # binary subtraction ?
            ok = binSub()
            
        elif v == '*':         # binary multiplication ?
            ok = binMul()
            
        elif v == '/':         # binary division ?
            ok = binDiv()
            
        elif v != 'U+':        # it's probably an operand
            stk.append( v ) 
            
        if not ok:
            return (False, None)
         
        ETshoweval( ok, stk )
            
    return ( True, stk.pop() )


## Running the parser

In [None]:
passCnt = failCnt = 0                       # most useful for test input files, but never any harm

def startUp(flag):
    '''begin execution'''
    global passCnt, failCnt, showTrace
    UIshow( 'Parser', versionNumber )
    passCnt = failCnt = 0
    showTrace = flag
    
def shutDown():
    '''terminate execution'''
    UIwriteSep()
    UIshow( 'Pass', passCnt )
    UIshow( 'Fail', failCnt )
    
# run parser

def parseOne(this):
    '''parse/evaluate one expression'''
    global passCnt, failCnt
    UIwriteSep()
    UIshow( 'Input', this )
    ok, res = PEdoparse( this )
    if ok:
        UIshow( 'Final Parse', res )
        ok, res = EEdoeval( res )
        if ok:
            UIshow( 'Final Eval', res )
    if ok:
        passCnt += 1
    else:
        failCnt += 1

## Interactive use

In [None]:
def parse():
    
    startUp(showInteract)
    while True:
        inp = input( 'Expression: ' )
        UIwriteln( '' )                      # looks better with a blank line here
        if inp[:1].upper() == 'Q':           # slice avoids IndexError on empty input
            break
        elif inp.strip():
            parseOne( inp )
    shutDown()

## Batch processing

In [None]:
testDir = '..\\ParserTest\\'            # directory holding test input files (empty string if same as notebook directory)

# convert current version number to match test file numbers
# - done this way so we can update only the version number and everything still works

def currNum():
    
    head = versionNumber[:len(versionNumber)-3]
    tail = versionNumber[-2:]
    return f'{head:0>2}{tail}'

# make full path name to test file

def makePath(typ, num):
    return f'{testDir}{typ}{num}.txt'

# run one test

def runTest(this):
    
    UIwriteln(f'Parser {versionNumber} vs {this[-12:-4]}')
    
    with open(this) as f:
        data = f.readlines()
    for line in data:
        test = line.strip()
        if test and test[0] != '#':         # skip blank and comment lines
            parseOne(test)
    
# run a test of current or specified version which should succeed
    
def good(num='curr'):
  
    startUp(showBatch)
    runTest(makePath('pass', currNum() if num == 'curr' else num))
    shutDown()
    
# run a test of current or specified version which should fail

def bad(num='curr'):
    
    startUp(showBatch)
    runTest(makePath('fail', currNum() if num == 'curr' else num))
    shutDown()
    
# run regression test against current and all previous test files

def regress():
            
    UIwriteln('PASS tests')
    
    currFn = makePath('pass', currNum())

    startUp(showBatch)
    failed = []
    fnlist = glob.glob(f'{testDir}pass????.txt')
    for fn in fnlist:
        if fn <= currFn:
            atstart = failCnt
            runTest(fn)
            if atstart < failCnt:
                failed.append(fn)               
    shutDown()
    
    UIwriteln('FAIL tests')
    
    currFn = makePath('fail',currNum())

    startUp(showBatch)
    passed = []
    fnlist = glob.glob(f'{testDir}fail????.txt')
    for fn in fnlist:
        if fn <= currFn:
            atstart = passCnt
            runTest(fn)
            if atstart < passCnt:
                passed.append(fn)               
    shutDown()
    
    if not len(failed):
        UIwriteln('All pass tests succeeded')
    else:
        UIwriteln('Pass tests which failed')
        for fn in failed:
            UIwriteln(f'  {fn}')
            
    if not len(passed):
        UIwriteln('All fail tests succeeded')
    else:
        UIwriteln('Fail tests which passed')
        for fn in passed:
            UIwriteln(f'   {fn}')
              

# Testing the parser

In [None]:
parse()       # interactive, one expression at a time

In [None]:
good()        # run current parser against its own pass test. Use good('1234') to run against specific pass test.

In [None]:
bad()         # run current parser against its own fail test. Use bad('5678') to run against specific fail test.

In [None]:
regress()     # run parser against all previous and current tests