# Parser 3.10 - Unary Plus

The goal for this version of the parser is to be able to recognize the unary plus operator.

There may be some question about what a unary plus operator should actually *do*. One possibility is for it to act as an *abs()* function, making negative values positive and leaving positive values alone. This would have the same inability to convert *intMin* to a positive value that arithmetic negation does, but we could handle that the same way we do for arithmetic negation.
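To make that alternative concrete, here is a minimal sketch of what an *abs()*-style unary plus might look like. It assumes the same bounds this notebook defines later in its Common cell. The helper name `unary_plus_as_abs` is hypothetical and is not part of the parser.

```python
intMax = 4294967295      # same bounds as the Common cell: 2**32 - 1
intMin = -4294967296     # -(2**32)

def unary_plus_as_abs(arg):
    '''hypothetical abs()-style unary plus; returns (ok, value)'''
    if arg == intMin:
        # -intMin would be intMax + 1, which is out of range
        return (False, None)
    return (True, -arg if arg < 0 else arg)

print(unary_plus_as_abs(-5))
print(unary_plus_as_abs(7))
print(unary_plus_as_abs(intMin))
```

The only value that needs special handling is *intMin*, exactly as with arithmetic negation.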

Other choices include having unary plus force the 'usual arithmetic conversions' (as in [C](https://en.wikipedia.org/wiki/C_(programming_language))\), convert strings representing numbers to numeric values but leave numeric values unchanged (as in [JavaScript](https://en.wikipedia.org/wiki/JavaScript)), or simply be available as a placeholder for later definition in languages that support [operator overloading](https://en.wikipedia.org/wiki/Operator_overloading) (as in [C++](https://en.wikipedia.org/wiki/C%2B%2B)).

Since later we plan to use *abs()* as an example of a one-argument function, we'll make the simplest possible choice and define unary plus to be a null operator that does not do anything.

Actually we are more interested here in something else: because we already use the **'+'** character for binary addition, we'll need some way to distinguish between its unary and binary meanings in our parser and evaluator.

This is an example of what P. J. Brown in his book [Writing Interactive Compilers and Interpreters](https://www.amazon.com/Writing-Interactive-Compilers-Interpreters-Computing/dp/0471100722) calls a *polymorphic operator*. The term has not caught on since the book was published, but I find it quite useful for symbols which can represent more than one operator.

Polymorphic operators are not uncommon in programming languages. In [BASIC](https://en.wikipedia.org/wiki/BASIC) the symbol **‘+’** is used for both arithmetic addition and string concatenation, and the **‘=’** symbol for both variable assignment and equality comparison. In [C](https://en.wikipedia.org/wiki/C_(programming_language)) the **‘\*’** symbol is used for both arithmetic multiplication and [pointer dereferencing](https://en.wikipedia.org/wiki/Dereference_operator), and the **‘&’** symbol for both [bitwise AND](https://en.wikipedia.org/wiki/Bitwise_operation#AND) and taking the address of a variable or function (e.g., to initialize a pointer). Both BASIC and C use the **‘-’** symbol to mean arithmetic negation as well as arithmetic subtraction.

An alternative to polymorphic operators is to use different symbols for different operations. Some versions of BASIC use **‘+’** for arithmetic addition and **‘&’** for string concatenation. [Pascal](https://en.wikipedia.org/wiki/Pascal_(programming_language)) uses **‘:=’** for variable assignment and **‘=’** for equality comparison. C uses **‘=’** for variable assignment and **‘==’** for equality comparison.

Unique operators for different operations are in many ways helpful for parsing. On the other hand, there are only so many symbols in the [ASCII character set](https://en.wikipedia.org/wiki/ASCII) (a subset of [UTF-8](https://en.wikipedia.org/wiki/UTF-8)). The more operators a computer language supports, the harder it is to come up with unique representations.

The whole point of Reverse Polish notation is that there is absolutely no ambiguity about what to do. Every operator must be unique so no decision is necessary to determine how to handle it. Operators which are polymorphic in the original infix must be made unique in the Reverse Polish.

The only real consideration is that however this is done, the distinction between operands and operators is maintained. The evaluator – or whatever else might handle the finished Reverse Polish – has to be able to tell the difference.

We will use a simple decorating scheme by prefixing each unary operator with the letter *U*. Anything that starts with a letter cannot be a number, and so cannot be mistaken for a numeric operand. The Reverse Polish form of unary plus will be *U+* while that of arithmetic negation will be *U-*.

>At this point we don't actually need to distinguish the arithmetic negation operator from anything else, but it won’t hurt to do so.
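The decorating scheme can be sketched in a few lines. This is only an illustration: the helper names `decorate` and `is_operand` are hypothetical, and the sketch assumes (as the parser below does) that operands are stored as integers and operators as strings.

```python
def decorate(op):
    '''prefix a unary operator symbol with U'''
    return 'U' + op

def is_operand(item):
    '''assumption: operands are integers, operators are strings'''
    return isinstance(item, int)

# Reverse Polish for the infix expression "-3 + +4"
rpn = [3, decorate('-'), 4, decorate('+'), '+']

for item in rpn:
    kind = 'operand' if is_operand(item) else 'operator'
    print(item, kind)
```

After decoration the two meanings of **'+'** are different strings, so whatever consumes the Reverse Polish never has to guess which one it is looking at.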

## Libraries

In [None]:
import glob       # for searching directories

import re         # for regular expressions

## User output

In [None]:
visSep = '-------------'             # visual separator

def UIwriteln(this):
    '''write a single line to output'''
    print( f'{this}\n' )
    
def UIwriteSep():
    '''write a visual separator'''
    UIwriteln( visSep )

def UIshow(tag, value):
    '''write a tagged value to output'''
    UIwriteln( f'{tag}: {value}' )

def UIerror(this):
    '''write an error message to output'''
    UIshow( 'Error', this )

# Tracing

In [None]:
# flags: show trace of processing

showInteract = True          # default for interactive use
showBatch = False            # default for batch use

showTrace = None             # control flag

# Trace Output

def TOshow(mesg, text):
    '''write trace message to output if enabled'''
    if showTrace:
        UIshow( f'{mesg:15s}', text )
        
def TOstring(tag, this):
    
    if showTrace:
        TOshow( tag, ' '.join([str(e) for e in this]) )

# -----------------------
# Parse Tracing
# -----------------------

def PTshowexpr(this):

    TOshow( 'Parse', visSep )
    TOshow( 'Current Expr', this )

def PTshowparse(ok, res, stk):

    if ok:
        TOstring( 'Current RPN', res )
        TOstring( 'Operator Stack', stk )

def PTshowtoken(this):

    if not this[0] == ' ':
        TOshow( "Found Token", this )

# -----------------------
# Evaluation Tracing
# -----------------------

def ETshowtoken(this):
    
    TOshow( 'Eval', visSep )
    TOshow( 'Current token', this )

def ETshoweval(ok, stk):
    
    if ok:
        TOstring( 'Operand Stack', stk )


# Common

In [None]:
intMax =  4294967295                # 2**32-1, for range checking
intMin = -4294967296                # -(2**32)

# Parser

In [None]:
versionNumber = '3.10'

# operands accepted:
# - decimal integer literals
# - hexadecimal integer literals

# operators accepted:
# - unary negation, plus
# - binary addition

# errors detected:
# - unrecognized input
# - out of range numeric input
# - malformed expression

# result tuple:
# - (True, [parse])
# - (False, None)

def PEdoparse(this):
       
    # initialize
    
    expr = this                # save to new variable but retain original for error reports
    start = 15                 # error marker column: len('>>> ') + len('^^near here'), advanced as tokens are consumed
    token = None               # anything successfully matched
    ok = wantoperand = True    # flags
    result = []                # rpn expression
    stk = [ ('EOE', 1) ]       # operator stack
               
    def parseErr(mesg):
        '''report parse error'''
        UIerror(mesg)
        UIwriteln(f'>>> {this}')
        UIwriteln(f'{"^^near here".rjust(start)}')
        return False
     
    def popGEop(prec):
        '''pop operators of equal or greater precedence'''
        while prec <= stk[-1][1]:
            result.append(stk.pop()[0])
            
    def pushLeft(op, prec):
        '''push left associative operator on stack'''
        popGEop(prec)
        stk.append( (op, prec) )
            
    def popGop(prec):
        '''pop operators of greater precedence'''
        while prec < stk[-1][1]:
            result.append(stk.pop()[0])
            
    def pushRight(op, prec):
        '''push right associative operator on stack'''
        popGop(prec)
        stk.append( (op, prec) )
            
    # convert unsigned literal to internal form
    # - all chars in input known to be legal hexadecimal characters
    # - checks that value is within range
    
    def convertUint(ulit, base):
        
        uint = 0
        
        # isolate the significant portion of 'ulit'
        # - this drops any leading prefix and all leading zeroes
        # - if search fails, then input is all zeroes (and so is value)
        
        p = re.search('[1-9A-F][0-9A-F]*', ulit.upper())
        
        if p is not None:
            for digit in p.group():
                digval = '0123456789ABCDEF'.find(digit)
                if uint <= (intMax - digval)/base:
                    uint =  uint * base + digval
                else:
                    return parseErr(f'\'{ulit}\' is out of range')
        
        result.append(uint)
        return True
    
    # test if expression starts with given regular expression
    
    def startsWith(regex):
        
        nonlocal expr, start, token
        
        p = re.match(regex, expr)
        if p is None:
            return False
        else:
            token = p.group()              # what we matched
            start += len(token)            # update to next match position in original string
            expr = expr[len(token):]       # "chop off" what we matched
            PTshowtoken(token)             # trace
            return True
                 
    # top level main loop
    
    while ok and len(expr):
        
        # skip leading whitespace
        
        _ = startsWith('[ ]+')
            
        # trace
        
        PTshowexpr(expr)
            
        # look for operand
             
        if wantoperand:
            
            if startsWith('[-+]'):
                '''unary negation or unary plus ?'''
                pushRight( 'U' + token, 80 )
                
            else:
        
                wantoperand = False         # not a unary operator, so we expect an operator next
            
                if startsWith('0[xX][0-9a-fA-F]+'):
                    '''unsigned hexadecimal literal ?'''
                    ok = convertUint(token, 16)
    
                elif startsWith('[0-9]+'):
                    '''unsigned decimal literal ?'''
                    ok = convertUint(token, 10)
    
                else:
                    '''malformed'''
                    ok = parseErr('Expecting operand')
            
        # look for operator
        
        else:
            
            wantoperand = True
            
            if startsWith('[+]'):
                '''binary addition ?'''
                pushLeft( token, 60 )
                
            else:
                '''malformed'''
                ok = parseErr('Expecting operator')
            
        # trace
        
        PTshowparse(ok, result, stk )
        
    # at end must be in 'wantoperator' state
    # - i.e., last token must be an operand
                    
    if ok and wantoperand:
        ok = parseErr('Unexpected end of expression')
        
    # clear operator stack
    
    if ok:
        popGEop( 3 )
                    
    # done
                    
    return (ok, result if ok else None)

### How it works

In the *wantoperand* state, we simply add a **'+'** character to the character class, which previously held only the **'-'** character. That's all we have to do to get the parser to recognize unary plus. Any match here consists of exactly one of the two characters, never both, since a character class matches a single character.

If we do have a match, the exact character we have is held in *token*. We don't know which one it is, but we don't care. We simply attach a *'U'* (for *U*nary) prefix and push it on the operator stack. Unary negation will also acquire a prefix because of this, but again we don't care. We'll have to adjust the evaluator to match, but that's easy to do.

In general we can extend this technique to any group of operators which share the same precedence and associativity, since every symbol in the character class is pushed by the same call.
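The matching step can be tried in isolation. The following is a minimal sketch, not the parser's own code: the function `match_unary` and its `symbols` parameter are hypothetical, and **'~'** stands in for an imagined third unary operator of the same precedence.

```python
import re

def match_unary(expr, symbols='-+'):
    '''return the decorated unary operator at the start of expr, or None'''
    # '-' comes first so it is a literal inside the character class, not a range
    p = re.match(f'[{symbols}]', expr)
    return None if p is None else 'U' + p.group()

print(match_unary('-5'))
print(match_unary('+5'))
print(match_unary('5'))
print(match_unary('~5', symbols='-+~'))   # hypothetical extra operator
```

Extending the group is just a matter of adding another character to the class. The sketch assumes none of the added symbols has a special meaning inside a character class (such as **']'** or a leading **'^'**).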

# Evaluator

In [None]:
# operators handled:
# - unary negation, plus
# - binary addition

# errors detected:
# - out of range

# return tuple:
# - (True, result)
# - (False, None)

def EEdoeval(rpn):
    
    stk = []
    ok = True
    
    def inRange(ok, val):
        '''range check test result'''
        if ok:
            stk.append( val )
        else:
            UIerror( 'Evaluation result out of range' )
        return ok
    
    def unNeg():
        '''unary negation'''
        arg = stk.pop()
        return inRange( arg != intMin, -arg )                   # watch for the un-negatable
            
    def binAdd():
        '''binary addition'''
        rgt = stk.pop()
        lft = stk.pop()
        
        if lft >= 0:
            return inRange( rgt <= intMax - lft, lft+rgt )      # re-arranged to avoid overflow
        else:
            return inRange( rgt >= intMin - lft, lft+rgt )      # re-arranged to avoid underflow
            
    # main loop
        
    for v in rpn:
        
        ETshowtoken(v)
        
        if v == 'U-':         # unary negation ?
            ok = unNeg()
  
        elif v == '+':        # binary addition ?
            ok = binAdd()
            
        elif v != 'U+':       # it's probably an operand, but maybe not
            stk.append( v )
            
        if not ok:
            return ( False, None )
         
        ETshoweval( ok, stk )
            
    return ( True, stk.pop() )


### How it works

Because unary plus is simply a do-nothing operator supplied strictly for symmetry (and for experience dealing with polymorphism), we don't need to do anything with it. We'll simply ignore any we come across. The main thing we do need to worry about is distinguishing between binary and unary **'+'** symbols, which we already made possible by decorating the unary version during parsing.
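Stripped of range checking and tracing, the dispatch looks roughly like the sketch below. The name `eval_rpn` is hypothetical, and the code is a simplified stand-in for the evaluator above rather than a replacement for it.

```python
def eval_rpn(rpn):
    '''evaluate a Reverse Polish list; no range checks in this sketch'''
    stk = []
    for v in rpn:
        if v == 'U-':              # unary negation
            stk.append(-stk.pop())
        elif v == '+':             # binary addition
            rgt = stk.pop()
            lft = stk.pop()
            stk.append(lft + rgt)
        elif v == 'U+':            # unary plus: nothing to do
            pass
        else:                      # operand
            stk.append(v)
    return stk.pop()

print(eval_rpn([3, 'U-', 4, 'U+', '+']))   # infix: -3 + +4
```

Because *U+* and *+* are distinct strings, the unary form falls through without touching the stack while the binary form pops two operands.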

## Running the parser

In [None]:
passCnt = failCnt = 0                       # most useful for test input files, but never any harm

def startUp(flag):
    '''begin execution'''
    global passCnt, failCnt, showTrace
    UIshow( 'Parser', versionNumber )
    passCnt = failCnt = 0
    showTrace = flag
    
def shutDown():
    '''terminate execution'''
    UIwriteSep()
    UIshow( 'Pass', passCnt )
    UIshow( 'Fail', failCnt )
    
# run parser

def parseOne(this):
    '''parse/evaluate one expression'''
    global passCnt, failCnt
    UIwriteSep()
    UIshow( 'Input', this )
    ok, res = PEdoparse( this )
    if ok:
        UIshow( 'Final Parse', res )
        ok, res = EEdoeval( res )
        if ok:
            UIshow( 'Final Eval', res )
    if ok:
        passCnt += 1
    else:
        failCnt += 1

## Interactive use

In [None]:
def parse():
    
    startUp(showInteract)
    while True:
        inp = input( 'Expression: ' )
        UIwriteln( '' )                      # looks better with a blank line here
        if inp.upper().startswith('Q'):      # startswith() is safe on empty input, unlike [0]
            break
        elif inp.strip():
            parseOne( inp )
    shutDown()

## Batch processing

In [None]:
testDir = '..\\ParserTest\\'            # directory holding test input files (empty string if same as notebook directory)

# convert current version number to match test file numbers
# - done this way so we can update only the version number and everything still works

def currNum():
    
    head = versionNumber[:len(versionNumber)-3]
    tail = versionNumber[-2:]
    return f'{head:0>2}{tail}'

# make full path name to test file

def makePath(typ, num):
    return f'{testDir}{typ}{num}.txt'

# run one test

def runTest(this):
    
    UIwriteln(f'Parser {versionNumber} vs {this[-12:-4]}')
    
    with open(this) as f:
        data = f.readlines()
    for line in data:
        test = line.strip()
        if test and test[0] != '#':         # skip blank and comment lines
            parseOne(test)
    
# run a test of current or specified version which should succeed
    
def good(num='curr'):
  
    startUp(showBatch)
    runTest(makePath('pass', currNum() if num == 'curr' else num))
    shutDown()
    
# run a test of current or specified version which should fail

def bad(num='curr'):
    
    startUp(showBatch)
    runTest(makePath('fail', currNum() if num == 'curr' else num))
    shutDown()
    
# run regression test against current and all previous test files

def regress():
            
    UIwriteln('PASS tests')
    
    currFn = makePath('pass', currNum())

    startUp(showBatch)
    failed = []
    fnlist = glob.glob(f'{testDir}pass????.txt')
    for fn in fnlist:
        if fn <= currFn:
            atstart = failCnt
            runTest(fn)
            if atstart < failCnt:
                failed.append(fn)               
    shutDown()
    
    UIwriteln('FAIL tests')
    
    currFn = makePath('fail',currNum())

    startUp(showBatch)
    passed = []
    fnlist = glob.glob(f'{testDir}fail????.txt')
    for fn in fnlist:
        if fn <= currFn:
            atstart = passCnt
            runTest(fn)
            if atstart < passCnt:
                passed.append(fn)               
    shutDown()
    
    if not len(failed):
        UIwriteln('All pass tests succeeded')
    else:
        UIwriteln('Pass tests which failed')
        for fn in failed:
            UIwriteln(f'  {fn}')
            
    if not len(passed):
        UIwriteln('All fail tests succeeded')
    else:
        UIwriteln('Fail tests which passed')
        for fn in passed:
            UIwriteln(f'   {fn}')
              

# Testing the parser

In [None]:
parse()       # interactive, one expression at a time

In [None]:
good()        # run current parser against its own pass test. Use good('1234') to run against specific pass test.

In [None]:
bad()         # run current parser against its own fail test. Use bad('5678') to run against specific fail test.

In [None]:
regress()     # run parser against all previous and current tests