# Parser 17.00 - Function Type Checking Revisited

We want to add functions which return string values to our set of example functions. They should be capable of accepting one or more arguments, some of which may be optional, just like our existing functions.

This requires some re-thinking. Up to now all our functions could implicitly rely on all their arguments and return values being numeric. This allowed some simplifications we can no longer take for granted. For example, we could type check each argument as soon as it was complete, because we knew they had to be numeric.

>In practice this only meant immediately adding any symbol de-reference operator that was needed.

We can't continue to do that if we don't know what the type of any particular argument is supposed to be. Obviously we're going to have to keep that information somewhere, and (equally obviously) that place is going to be the function dictionary *fncDispatch{}*.

But, given that we have this information, how are we going to use it? Trying to type check each argument as soon as it is complete (terminated either by a comma or the close parenthesis after the last argument) means we would have to keep track of the name of the function currently being parsed and which argument we have just parsed. Not an insurmountable difficulty, but complicated – and things only get worse when we contemplate nested functions.

No, we’re going to go in a different direction. Instead of type checking each argument as soon as it is complete, we are going to wait until the close parenthesis that terminates the function call. Only then will we type check all of a function's arguments at once.

This complicates the machinery of type checking. In this version of the parser we're only going to implement that machinery. Since we're not yet going to add any new functions, the net result will be a parser that does nothing more than the previous version. But it will work a little harder to do so.

## Libraries

In [None]:
import glob       # for searching directories
import math
import random     # for 'random()'

import re         # for regular expressions

## User output

In [None]:
visSep = '-------------'             # visual separator

def UIwriteln(this):
    '''write a single line to output'''
    print( f'{this}\n' )
    
def UIwriteSep():
    '''write a visual separator'''
    UIwriteln( visSep )

def UIshow(tag, value):
    '''write a tagged value to output'''
    UIwriteln( f'{tag}: {value}' )

def UIerror(this):
    '''write an error message to output'''
    UIshow( 'Error', this )

# Tracing

In [None]:
# flags: show trace of processing

showInteract = True          # default for interactive use
showBatch = False            # default for batch use

showTrace = None             # control flag

# Trace Output

def TOshow(mesg, text):
    '''write trace message to output if enabled'''
    if showTrace:
        UIshow( f'{mesg:15s}', text )
        
def TOstring(tag, this):
    
    if showTrace:
        TOshow( tag, ' '.join([str(e) for e in this]) )

# -----------------------
# Parse Tracing
# -----------------------

def PTshowexpr(this):

    TOshow( 'Parse', visSep )
    TOshow( 'Current Expr', this )

def PTshowparse(ok, res, opStk, typStk):

    if ok:
        TOstring( 'Current RPN', res )
        TOstring( 'Operator Stack', opStk )
        TOstring( 'Type Stack', typStk )

def PTshowtoken(this):

    if not this[0] == ' ':
        TOshow( "Found Token", this )

# -----------------------
# Evaluation Tracing
# -----------------------

def ETshowtoken(this):
    
    TOshow( 'Eval', visSep )
    TOshow( 'Current token', this )

def ETshoweval(stk):
    
    TOstring( 'Operand Stack', stk )


# Functions

In [None]:
def functionErr(nam, msg, val):
    UIerror( f'{nam}(): {msg} value: {val}')
    return ( False, None )

def fncAbs(val):
    '''absolute value of val'''
    return (True, abs(val))

def fncMax(args):
    '''max of two or more vals'''
    return (True, max(args))

def fncMin(args):
    '''min of two or more vals'''
    return (True, min(args))

def fncRand():
    '''random decimal'''
    return(True, random.random())

def fncRnd(val):
    '''rounded val'''
    if type(val) is not list:
        return (True, round(val))
    elif type(val[1]) is int:
        return (True, round(val[0], val[1]))
    else:
        return functionErr('ROUND', 'non-integer', val[1])

def fncSgn(val):
    '''sign of val'''
    return (True, 1 if val > 0 else -1 if val < 0 else 0 )

def fncSqt(val):
    '''square root of val'''
    if val >= 0:
        return (True, math.sqrt(val) )
    else:
        return functionErr('SQRT', 'negative', val)
    
# known functions

fncDispatch = {
     'ABS': (fncAbs,  1, 1,    ['number', 'number']),
     'MAX': (fncMax,  2, None, ['number', 'number', 'number']),
     'MIN': (fncMin,  2, None, ['number', 'number', 'number']),
  'RANDOM': (fncRand, 0, 0,    ['number'] ),
   'ROUND': (fncRnd,  1, 2,    ['number', 'number', 'number']),
    'SIGN': (fncSgn,  1, 1,    ['number', 'number']),
    'SQRT': (fncSqt,  1, 1,    ['number', 'number'])
}


We introduce a list of types for each entry in *fncDispatch{}*. These are analogous to the type lists in *_typeChkLst{}* and serve the same purpose. The first (leftmost) entry in each list is the return type of the function. All other entries are argument types, checked in right-to-left order.
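To make the layout concrete, here is a minimal sketch of how such a list is consumed. The helper `splitTypes()` and the `LEFT$` entry are hypothetical and exist only for illustration (the parser defines no string functions yet). The point is just that `pop()` hands back argument types rightmost first, leaving the return type for last.

```python
# Illustration only: 'LEFT$' is a hypothetical string function, not one the parser defines.
typeLists = {
    'ROUND': ['number', 'number', 'number'],      # as in fncDispatch{}
    'LEFT$': ['string', 'string', 'number'],      # assumed: returns string, takes (string, number)
}

def splitTypes(name):
    '''hypothetical helper: return (return type, argument types in checking order)'''
    check = list(typeLists[name])                 # copy so the table itself is not modified
    order = []
    while len(check) > 1:
        order.append(check.pop())                 # pop() removes the rightmost entry first
    return (check[0], order)

retType, argOrder = splitTypes('LEFT$')
print(retType, argOrder)                          # string ['number', 'string']
```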

We deal with optional arguments dynamically within *typeCheck()*. We've retained the entries for minimum and maximum number of arguments to help with this.

>We could also get rid of them and instead rely on the length of the type check lists themselves to tell us this information, with flags on the list entries to indicate which are optional. But overall it's much simpler to leave them be.
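For comparison, the alternative mentioned in the note might look something like the following sketch. The trailing `?` flag and the helper `argLimits()` are assumptions made up for this example; nothing like them exists in the parser.

```python
def argLimits(types):
    '''hypothetical: derive (min, max) argument counts from a flagged type list'''
    args = types[1:]                                        # entry 0 is the return type
    mina = sum(1 for t in args if not t.endswith('?'))      # unflagged entries are required
    maxa = len(args)                                        # flagged or not, each entry is one argument
    return (mina, maxa)

print( argLimits(['number', 'number', 'number?']) )         # ROUND-like: (1, 2)
print( argLimits(['number']) )                              # RANDOM-like: (0, 0)
```

A scheme like this has no obvious way to say "no maximum", which *MAX()* and *MIN()* need, so it would require yet another flag. Keeping the explicit counts avoids that.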

# Parser

In [None]:
# operands accepted:
# - decimal and hexadecimal floating point literals
# - scalar and array numeric variables
# - numeric functions with zero or more arguments
# - string literals w/ escape sequences
# - scalar and array string variables

# operators accepted:
# - unary negation, plus
# - binary addition, subtraction, multiplication, division
# - grouping parentheses
# - logical not, equality, inequality
# - assignment and shortcut assignment
# - prefix and postfix increment and decrement
# - logical short circuit
# - ternary conditional
# - string concatenation, multiplication
# - string logical not, equality, inequality
# - string variables, assignment, shortcut assignment

# errors detected:
# - unrecognized input
# - out of range numeric input
# - malformed expression

# result tuple:
# - (True, [parse])
# - (False, None)

class Parser(object):
    
    VERSIONNUMBER = '17.00'
    
    _FLTMAX =  4294967295                                  # 2**32-1
    _FLTMIN = -4294967296                                  # -(2**32)

    _expMax = {
        'P' : math.log2(_FLTMAX),                          # max base 2 exponent
        'E' : math.log10(_FLTMAX)                          # max base 10 exponent
    }

    _typeChkLst = {                                         # index is left-to-right, check right-to-left
        '?err': [  None,   'tererr'],
         'a2x': [  None,   'argsep'],
         'f2x': [  None,   'fncsym'],
         'n2_': [  None,   'number'],
         'n2n': ['number', 'number'],
         'v2n': ['number', 'numsym'],
        'n?2n': ['number', 'C?-op',  'number'],
        'nn2n': ['number', 'number', 'number'],
        'nt2t': [  None,   'toptyp', 'number'],
         't2t': [  None,   'toptyp'],
        'tm2t': [  None,   'match',  'toptyp'],
        'vt2t': [  None,     None,   'topsym'],
        'vn2n': ['number', 'numsym', 'number'],
        'vn2t': [  None,   'symtyp', 'number'],
        'vn2v': [  None,   'vartyp', 'number']
    }
    
    def __init__(self):
        pass
        
    def doparse(self, this):

        def parseErr(mesg, pos):
            '''report parse error'''
            UIerror(mesg)
            UIwriteln(f'>>> {this}')
            if pos > 0:
                UIwriteln(f'{"^^near here".rjust(pos)}')
            return False

        # initialize

        expr = this                # save to new variable but retain original for error reports
        start = 0                  # tracked so we can report where in an expression an error occurred
        token = None               # anything successfully matched
        ok = wantoperand = True    # flags
        result = []                # rpn expression
        opStk = [ ('EOE', 1) ]     # operator stack
        typStk = []                # type stack
        argStk = []                # function argument count stack
        

        def typeCheck(op, chk):
            '''type check operands'''
            nonlocal argStk
                        
            check = list(self._typeChkLst[chk])                    # list() to avoid aliasing
            
            while len(check) > 1:
                want = check.pop()
                
                # are we checking a function argument separator ?
                
                if want == 'argsep':
                    if len(opStk) < 2 or opStk[-2][0] != 'B(':     # within a function call ?
                        return parseErr('Unexpected comma', 0)
                    else:
                        argStk[-1] += 1                            # one more argument
                        result.pop()                               # remove 'F,' from RPN
                        continue                       
                        
                # are we going to check function argument types ?
                
                elif want == 'fncsym':

                    argcnt = argStk.pop()
                    have, rpnPos, errPos = typStk.pop(-argcnt - 1)

                    if have != 'numsym':
                        return parseErr('Function name expected', errPos)

                    fnc = result[rpnPos - 1]
                    if not fnc in fncDispatch:
                        return parseErr(f'Unknown function name: {fnc}', errPos)
                    
                    mina = fncDispatch[fnc][1]
                    maxa = fncDispatch[fnc][2]
                    if (argcnt < mina) or (maxa != None and argcnt > maxa):
                        return parseErr(f'Bad argument count: {fnc}', errPos)
                    elif argcnt > 1:
                        result.pop()                         # remove 'B('
                        result.append(argcnt)                # argument count
                        result.append('F()')                 # multiple argument function call operator
                         
                    check = list(fncDispatch[fnc][3])        # replace with argument(s) checklist
                    if maxa == None:                         # no maximum ?
                        while argcnt >= len(check):
                            check.append( check[-1] )        # duplicate the last type check
                    elif argcnt < maxa:                      # optional arguments not supplied ?
                        check = check[:-(maxa-argcnt)]       # remove them from checklist
                        
                    continue
                    
                # non-function type check
                        
                have, rpnPos, errPos = typStk.pop()
                
                basetype = 'number' if have.find('num') >= 0 else 'string'
                                    
                # do we want type(s) based on whatever was on top of type stack ?
                
                if want == 'toptyp':
                    check[0] = want = basetype
                    
                elif want == 'topsym':
                    check[0] = want = basetype
                    check[1] = 'numsym' if basetype  == 'number' else 'strsym'
                                   
                elif want == 'symtyp':
                    check[0] = basetype
                    want = 'numsym' if basetype == 'number' else 'strsym'
                    
                elif want == 'vartyp':
                    check[0] = want = 'numsym' if basetype == 'number' else 'strsym'
                                      
                # do we want a type based on previous type ?
                    
                elif want == 'match':
                    want = check[0]
                    
                # no problem ?
                
                if want == have:
                    continue
                
                # did we want a number ?
                
                if want == 'number':
                    
                    if have == 'numsym':
                        '''convert numeric variables to values'''
                        result.insert( rpnPos, 'U*' )
                        
                    elif have.find('num') < 0:
                        return parseErr('Numeric value expected', errPos)
                        
                # did we want a string ?
                
                elif want == 'string':
                    
                    if have == 'strlit':
                        '''convert string literals to values'''
                        result.insert( rpnPos, 'U$')
                        
                    elif have == 'strsym':
                        result.insert( rpnPos, '$U*')
                        
                    elif have.find('str') < 0:
                        return parseErr('String value expected', errPos)
                    
                # did we want a variable ?
                
                elif want == 'numsym':
                    return parseErr('Numeric variable expected', errPos)
                
                elif want == 'strsym':
                    return parseErr('String variable expected', errPos)
                                           
                # did we want a right hand '?' ?
                
                elif want == 'C?-op':
                    if opStk.pop()[0] != 'C?-':
                        return parseErr('":" without "?"', errPos)
                        
                # '?' without ':' ?
                
                elif want == 'tererr':
                    return parseErr('"?" without ":"', errPos)
                    
                # generic everything else
                
                else:
                    return parseErr('Type mismatch', errPos )
                        
            # push result type, RPN operator position, original operator position
                    
            restyp = check.pop()
            if restyp != None:
                typStk.append( (restyp, len(result), errPos) )
                if restyp == 'string' and op != 'Final':
                    '''replace numeric op with string op'''
                    result.pop()
                    result.append( '$' + op )
            return True
 
        def addOperand(op, typ):
            '''add operand to RPN'''
            result.append( op )
            typStk.append( (typ, len(result), start) )
            
        def addOperator():
            '''add operator to RPN'''            
            op, _, chk = opStk.pop()
            result.append( op )
            return typeCheck( op, chk )
            
        def popGEop(prec):
            '''pop operators of equal or greater precedence'''
            ok = True
            while ok and prec <= opStk[-1][1]:
                ok = addOperator()
            return ok
 
        def pushLeft(op, prec, chk):
            '''push left associative operator on stack'''
            if not popGEop(prec):
                return False
            opStk.append( (op, prec, chk) )
            return True

        def popGop(prec):
            '''pop operators of greater precedence'''
            ok = True
            while ok and prec < opStk[-1][1]:
                ok = addOperator()
            return ok
 
        def pushRight(op, prec, chk):
            '''push right associative operator on stack'''
            if not popGop(prec):
                return False
            opStk.append( (op, prec, chk) )
            return True

        def popUntil(op, prec):
            '''clear and check operator stack'''
            if not popGEop(prec):                       # type check failure ?
                return False

            topop = opStk.pop()[0]                      # found match ?
            if op == topop:
                return True
            
            elif op == '(':
                err = 'right parenthesis'
            elif op == '[':
                err = 'right bracket'
            elif topop == '(':
                err = 'left parenthesis'
            elif topop == '[':
                err = 'left bracket'
            else:
                err = 'EOE'
            
            return parseErr( f'Unmatched {err}', start )
   
        # operator dictionaries initialization
        
        _postOps = r'[+]{2}|[-]{2}|\(\)'
        
        _postOp = {
            '++': (90, pushLeft, 'v2n'),
            '--': (90, pushLeft, 'v2n'),
            '()': (90, pushLeft, 'f2x')
        }

        _unOps = '!|[+]{1,2}|[-]{1,2}'

        _unOp =  {
            '-':  (80, pushRight, 'n2n'),
            '+':  (80, pushRight, 'n2n'),
            '!':  (80, pushRight, 't2t'),
            '++': (80, pushRight, 'v2n'),
            '--': (80, pushRight, 'v2n'),
        }

        _binOps = r'[-+*/]=?|[=!]?=|\[|\('

        _binOp = {
            '[':  (90, pushLeft,  'vn2v'),
            '(':  (90, pushLeft,  'f2x'),
            '*':  (70, pushLeft,  'nt2t'),
            '/':  (70, pushLeft,  'nn2n'),
            '+':  (60, pushLeft,  'tm2t'),
            '-':  (60, pushLeft,  'nn2n'),
            '==': (50, pushLeft,  'tm2t'),
            '!=': (50, pushLeft,  'tm2t'),
            '=':  (10, pushRight, 'vt2t'),
            '*=': (10, pushRight, 'vn2t'),
            '/=': (10, pushRight, 'vn2n'),
            '+=': (10, pushRight, 'vt2t'),
            '-=': (10, pushRight, 'vn2n')
        }
        
        _condOps = '&&|[|]{2}|[?:]'
        
        _condOp = {
            '&&': [(40, pushLeft, 'n2_'),  (40, pushLeft, 'n2n')],
            '||': [(30, pushLeft, 'n2_'),  (30, pushLeft, 'n2n')],
            '?':  [(28, pushRight, 'n2_'), (24, pushRight, '?err')],
            ':':  [(26, pushRight, 'n2n'), (24, pushRight, 'n?2n')]
        }
      
        def convertFloat(fplit, base, capgrp):
            '''convert floating point literal to internal form'''
            
            def rangeErr():
                return parseErr(f'\'{fplit}\' is out of range', start)
            
            # collect the features of interest
                   
            p = re.search(capgrp, fplit.upper())
                            
            lint, lfrc, expbas, expsgn, lexp = p.group(1,2,4,5,6)
            
            # convert integer portion (if any)
 
            uint = 0
            if lint:
                p = re.search('[1-9A-F][0-9A-F]*', lint )
                if p != None:
                    for ch in p.group():
                        digval = '0123456789ABCDEF'.find(ch)
                        if uint <= (self._FLTMAX - digval)/base:
                            uint =  uint * base + digval
                        else:
                            return rangeErr()
                    
            # convert fractional portion (if any)
                    
            ufrc = 0
            if lfrc:
                fbase = 1
                for ch in lfrc:
                    digval = '0123456789ABCDEF'.find(ch)
                    fbase *= base
                    ufrc += digval/fbase
        
            if uint == self._FLTMAX and ufrc != 0:
                return rangeErr()
            
            # value so far
            
            uflt = uint + ufrc
 
            # convert exponent portion (if any)
            
            uexp = 0
            if lexp:
                for ch in lexp:
                    digval = '0123456789'.find(ch)
                    if uexp <= (self._expMax[expbas] - digval)/10:
                        uexp =  uexp * 10 + digval
                    else:
                        return rangeErr()
                    
            # adjust value by exponent (if any)
             
            if uexp:
                power = (2 if expbas == 'P' else 10) ** uexp
                if expsgn == '-':
                    uflt /= power
                elif uflt <= self._FLTMAX/power:
                    uflt *= power
                else:
                    return rangeErr()
                    
            addOperand( uflt, 'numlit' )
            return True

        def startsWith(regex):
            '''test if expression starts with given regular expression'''
            nonlocal expr, start, token

            p = re.match(regex, expr)
            if p == None:
                return False
            else:
                token = p.group()              # what we matched
                start += len(token)            # update to next match position in original string
                expr = expr[len(token):]       # "chop off" what we matched
                PTshowtoken(token)             # trace
                return True

        # top level main loop
              
        while ok and len(expr):

            _ = startsWith('[ ]+')                             # skip leading whitespace

            PTshowexpr(expr)                                   # trace

            # look for operand

            if wantoperand:

                if startsWith('[(]'):
                    '''left parenthesis ?'''
                    opStk.append( ('(', 2) )

                elif startsWith(_unOps):
                    '''right unary?'''
                    prec, assoc, check = _unOp[token]
                    assoc( 'U' + token, prec, check )

                else:

                    wantoperand = False                         # flip
                    
                    if startsWith('0[xX]([0-9a-fA-F]+([.][0-9a-fA-F]*)?|[.][0-9a-fA-F]+)([pP][-+]?[0-9]+)?'):
                        '''unsigned hexadecimal literal ?'''
                        ok = convertFloat(token, 16, '0X([0-9A-F]*)[.]?([0-9A-F]*)(([P])([-+])?([0-9]+))?')

                    elif startsWith('([0-9]+([.][0-9]*)?|[.][0-9]+)([eE][-+]?[0-9]+)?'):
                        '''unsigned decimal literal?'''
                        ok = convertFloat(token, 10, '([0-9]*)[.]?([0-9]*)(([E])([-+])?([0-9]+))?')
                        
                    elif startsWith(r'[a-zA-Z][_a-zA-Z0-9]*(\$)?'):
                        '''numeric scalar variable or function name?'''
                        addOperand( token.upper(), 'strsym' if token[-1] == '$' else 'numsym' )
                        
                    elif startsWith(r'"([^"\\]|\\.)*"'):
                        '''string literal?'''
                        addOperand( token, 'strlit')
                        
                    else:
                        '''malformed'''
                        ok = parseErr('Expecting operand', start)

            # look for operator

            else:

                if startsWith(r'[)\]]'):
                    '''expression terminator?'''
                    ok = popUntil( '(' if token == ')' else '[', 4 )
                        
                    
                elif startsWith(_postOps):
                    '''postfix operator?'''
                    prec, assoc, check = _postOp[token]
                    ok = assoc( 'P' + token, prec, check )
                    if ok and token == '()':
                        argStk.append( 0 )
 
                else:
                    
                    wantoperand = True                          # flip

                    if startsWith(_binOps):
                        '''binary operator?'''
                        prec, assoc, check = _binOp[token]
                        ok = assoc( 'B' + token, prec, check )
                        if ok and (token == '[' or token == '('):
                            opStk.append( (token, 2) )
                            if token == '(':
                                argStk.append( 1 )
                                
                    elif startsWith(_condOps):
                        '''conditional operator?'''
                        op = 'C' + token
                        precl, assocl, checkl = _condOp[token][0]
                        precr, assocr, checkr = _condOp[token][1]
                        ok = assocl( op + '+', precl, checkl) and assocr( op + '-', precr, checkr)

                    elif startsWith(','):
                        '''function argument separator?'''
                        ok = pushLeft( 'F,', 6, 'a2x' )

                    else:
                        '''malformed'''
                        ok = parseErr('Expecting operator', start)

            PTshowparse(ok, result, opStk, typStk )               # trace
 
        if ok and wantoperand:
            ok = parseErr('Unexpected end of expression', start ) # must be in 'wantoperator' state   

        if ok:
            ok = popUntil( 'EOE', 3 )                             # clear operator stack
            
        if ok:
            ok = typeCheck('Final', 't2t')                        # final type check

        return (ok, result if ok else None)                       # done
       

### How it works

In previous versions of the parser, the index and type check list for a comma was:

```Python
        'an2_': [  None,   'number', 'argsep' ]
```

The key points were a check that it appeared within the context of a function call, a check that the argument was a number, and putting nothing back on the type stack. So an entry got removed from the type stack but nothing replaced it. Which in turn meant that when it came time to start taking a closer look at the actual function name, nothing related to any argument expression remained on the type stack. We could start by testing that the top remaining element was a *numsym*, and bail out immediately if it wasn't.

In this version the test is:

```Python
         'a2x': [  None,   'argsep']
```

Which simply checks to make sure commas only appear within function calls. The result type of the expression the comma terminated is on top of the type stack, but no check of it is made, nor is it removed.
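Stripped of everything else, the comma check amounts to something like the sketch below. The function `sawComma()` is a simplified stand-in written for this note, not the parser's own code, and the stack contents are what we would expect at the moment the comma in `MAX(1, 2)` is checked (positions are illustrative).

```python
def sawComma(opStk, argStk):
    '''simplified stand-in for the 'argsep' check'''
    if len(opStk) < 2 or opStk[-2][0] != 'B(':    # not directly inside a function call ?
        return False
    argStk[-1] += 1                               # one more argument for the innermost call
    return True

# assumed state when the comma in "MAX(1, 2)" is checked
opStk  = [ ('EOE', 1), ('B(', 90, 'f2x'), ('(', 2) ]
typStk = [ ('numsym', 1, 3), ('numlit', 2, 5), ('numlit', 3, 8) ]
argStk = [ 1 ]

print( sawComma(opStk, argStk), argStk, len(typStk) )    # True [2] 3
print( sawComma([ ('EOE', 1) ], []) )                    # False
```

Note that *typStk[]* is never touched: every argument's entry is still there, waiting for the close parenthesis.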

Recall that the RPN location of an operand is stacked along with its type. In the case of a function call the operand is a function name. We need to recover both it and its position as part of the type checking. But if now we never clear the type stack while collecting function arguments, the particular entry we're looking for is going to be buried under all of them. For a function with *N* arguments, *typStk[]* looks something like:

- (type(N), position(N)) <- type and RPN position of last (rightmost) function argument
- (type(N-1), position(N-1))
- ...
- (type(2), position(2))
- (type(1), position(1))
- (type(name), position(name)) <- type and RPN position of function name

How do we get at it? Fortunately the top element of *argStk* has been recording the number of arguments actually provided to each function all along. That's our *N*. We pop that value and use it as an index to find the exact element in *typStk[]* we want. We pop and unpack it into *have*, *rpnPos* and *errPos*. Then we proceed with the same function call checks we used in earlier versions.
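As a small illustration of that indexing, assume the stacks look as they might just before the call in `MAX(A, 2, 3)` is checked (the tuple values here are illustrative, not captured from a real run):

```python
# assumed state just before the call in "MAX(A, 2, 3)" is checked;
# tuples are (type, RPN position, error position), positions illustrative
typStk = [ ('numsym', 1, 3), ('numsym', 2, 5), ('numlit', 3, 8), ('numlit', 4, 11) ]
argStk = [ 3 ]

argcnt = argStk.pop()                               # N, the number of arguments supplied
have, rpnPos, errPos = typStk.pop(-argcnt - 1)      # the entry buried under N arguments

print( have, rpnPos, errPos )                       # numsym 1 3
print( len(typStk) )                                # 3
```

Since index -1 is the top of a Python list, the function name sits at -(N + 1). With zero arguments that is simply the top of the stack.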

>Note that we actually remove that tuple from *typStk[]*, rather than merely copying its contents. We have to do that anyway at some point. It seems simpler to do that now rather than dealing with it a second time after removing all the type entries above it. We prevent the 'hole' left by the out-of-order pop from causing trouble later by moving the whole *fncsym* check to the top of the type check loop in *typeCheck()*. This allows us to treat this particular pop as a slightly special case distinct from all the checks that follow it.

After completing the same checks and adjustments to the Reverse Polish we did before, we copy the new type check list from the relevant *fncDispatch{}* entry into *check*, our loop control variable. This will prevent the loop from terminating and instead initiate type checking of the function arguments.

The basic idea is that the list *check* should now contain one type check for every argument remaining on the type stack, plus one result type. And for functions with a fixed number of arguments, that's true (it must be so if we've made it this far).

But if there are more type checks than arguments, some optional arguments were not supplied. We force those checks to be skipped by simply chopping them off the end of the check list.
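The chopping can be sketched in isolation as follows. `trimOptional()` is a stand-in written for this note; in the parser the same slice is applied directly to *check*.

```python
def trimOptional(check, maxa, argcnt):
    '''simplified stand-in: drop checks for optional arguments that were not supplied'''
    if argcnt < maxa:
        check = check[:-(maxa - argcnt)]      # slice off one check per missing argument
    return check

roundChk = ['number', 'number', 'number']     # ROUND: return type plus two arguments

print( trimOptional(roundChk, 2, 1) )         # ['number', 'number']
print( trimOptional(roundChk, 2, 2) )         # ['number', 'number', 'number']
```

The `argcnt < maxa` guard matters: with nothing to remove the slice would be `check[:-0]`, which Python reads as `check[:0]`, an empty list.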

>Note we are treating all function arguments as *positional*. All required arguments come first (the minimum number), followed by any optional (up to the maximum number). In the check list, checks are listed in the same order. So checks for optional arguments are at the end of the list.
>
>Function arguments do not *have* to be positional. [Python](https://www.python.org/) and [PHP](https://www.php.net/manual/en/intro-whatis.php) also provide [named parameters](https://en.wikipedia.org/wiki/Named_parameter) as an alternative.

And if there are more arguments than type checks, we duplicate the last type check until we have enough of them.

>This is based on a number of assumptions, chief of which is that only *MAX()* and *MIN()* will ever need to do this. More thought will be needed if additional functions are introduced whose extra arguments cannot be treated uniformly.
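Following the description above, the padding step can be sketched on its own like this. `extendChecks()` is a hypothetical helper for illustration; it repeats the last check until there is one per argument plus the return type.

```python
def extendChecks(check, argcnt):
    '''simplified stand-in: repeat the last check until every argument has one'''
    check = list(check)                       # work on a copy
    while argcnt >= len(check):               # need argcnt checks plus one return type
        check.append( check[-1] )
    return check

maxChk = ['number', 'number', 'number']       # MAX: return type plus two arguments

print( extendChecks(maxChk, 4) )              # ['number', 'number', 'number', 'number', 'number']
print( extendChecks(maxChk, 2) )              # ['number', 'number', 'number']
```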

Since the return type of each function is included in the check list we retrieve from *fncDispatch{}*, we don't need to specify it in *_typeChkLst{}*. In fact we can replace both the *f2n* and *fn2n* entries with *f2x*:

```Python
         'f2x': [  None,   'fncsym']
```

**None** in this case is simply a placeholder. It guarantees *check* has a length greater than one, so the type check loop in *typeCheck()* is entered. If all goes well, the *fncsym* check will replace *check* with the "real" list of type checks.
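The effect of swapping the list out from under the loop can be seen in a stripped-down sketch. `runChecks()` is a stand-in written for this note: it only records which checks would be made, where the real loop pops *typStk[]* and compares types.

```python
def runChecks(check, fncTypes):
    '''simplified stand-in for the typeCheck() loop; returns (checks performed, result type)'''
    check = list(check)
    done = []
    while len(check) > 1:
        want = check.pop()
        if want == 'fncsym':
            check = list(fncTypes)            # swap in the function's own type list
            continue
        done.append(want)                     # the real loop would pop typStk and compare here
    return (done, check.pop())

print( runChecks([None, 'fncsym'], ['number', 'number', 'number']) )
# (['number', 'number'], 'number')
```

The placeholder never reaches the end of the loop. By the time the loop exits, the single entry left in *check* is the function's own return type.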

>So we have successfully made type checking functions more complicated with not much yet to show for it. In the next version we'll try to make all this worthwhile.

# Evaluator

In [None]:
# operators handled:
# - unary negation, plus
# - binary addition, subtraction, multiplication, division
# - numeric and string logical negation, equality and inequality
# - variable name de-reference
# - variable assignment and shortcut assignment
# - prefix and postfix increment and decrement
# - numeric array assignment and de-reference
# - function calls with zero or more arguments
# - string literals w/ escape sequences
# - string addition, multiplication
# - string variable assignment and de-reference

# errors detected:
# - out of range
# - division by zero
# - invalid function arguments

# return tuple:
# - (True, result)
# - (False, None)

class Evaluator(Parser):
    
    def __init__(self):
        self._symTable = dict()
 
    def doeval(self, rpn):
        
        def evalErr(mesg):
            '''report evaluation time error'''
            UIerror(mesg)
            return False
         
        def pushOperand(val):
            '''push operand on stack'''
            stk.append( val )
            return True
 
        def inRange(ok, val):
            '''range check test result'''
            return pushOperand(val) if ok else evalErr( 'Evaluation result out of range' )
        
        def setSkip(down, up):
            '''set skip flags'''
            nonlocal skipLevel, downToken, upToken
            skipLevel = 1
            downToken = down
            upToken = up
            
        def checkSkip(skip, val, down, up):
            if skip:
                setSkip(down, up)
                stk.append(val)
            return True
                
        def logLftAnd(val):
            '''left branch of logical AND'''
            return checkSkip(val == 0, 0, 'C&&-', 'C&&+')
        
        def logLftOr(val):
            '''left branch of logical OR'''
            return checkSkip(val != 0, 1, 'C||-', 'C||+')
                
        def logRgt(val):
            '''right branch of logical AND and OR'''
            return pushOperand(1 if val != 0 else 0)

        def terCond(val):
            '''ternary condition'''
            if not val:
                setSkip('C:+', 'C?+')
            return True
        
        def terTrue(val):
            '''end of ternary true branch'''
            return checkSkip(True, val, 'C:-', 'C:+')
        
#        def terFalse(val):
#           '''end of ternary false branch'''
#           return pushOperand(val)
        
        def binAryNdx(ndx, nam):
            '''create array index'''
            i = int(ndx)
            return inRange(0 <= ndx <= self._FLTMAX, f'{nam}_{i}') if i == ndx else evalErr('Non-integer index')
                     
        def binVal(val, var):
            '''assign value to variable'''
            self._symTable[var] = val
            return pushOperand(val)
        
        def unVal(var):
            '''numeric variable name value'''
            return pushOperand(self._symTable[var] if var in self._symTable else 0)
        
        def unStrVal(var):
            '''string variable name value'''
            return pushOperand(self._symTable[var] if var in self._symTable else '')
        
        def unStr(val):
            '''string literal value'''
            
            def doMnem(matchObj):
                ch = matchObj.group()[1]
                return '\n' if ch == 'n' else '\t' if ch == 't' else ch

            def doHex(matchObj):
                p = matchObj.group()
                h = int(p[2:], 16)
                return chr(h & 0xFF if p[1] == 'x' else h)
                     
            s = val[1:-1]
            t = re.sub(r'\\x[0-9a-fA-F]{1,8}', doHex, s)
            u = re.sub(r'\\u[0-9a-fA-F]{4}', doHex, t)
            v = re.sub(r'\\.', doMnem, u)
            return pushOperand(v)

#        def unPlu(arg):
#           '''unary plus'''
#           return pushOperand( arg )

        def unNeg(arg):
            '''unary negation'''
            return inRange( arg != self._FLTMIN, -arg )
        
        def unNot(arg):
            '''logical not'''
            return pushOperand( not arg )

        def binAdd(rgt, lft):
            '''binary addition'''
            if lft >= 0:
                return inRange( rgt <= self._FLTMAX - lft, lft+rgt )       
            else:
                return inRange( rgt >= self._FLTMIN - lft, lft+rgt )
 
        def binSub(rgt, lft):
            '''binary subtraction'''
            if lft >= 0:
                return inRange( lft - self._FLTMAX <= rgt, lft-rgt )
            elif rgt >= 0:
                return inRange( lft - self._FLTMIN >= rgt, lft-rgt )
            else:
                return inRange( lft <= self._FLTMAX + rgt, lft-rgt )
 
        def binMul(rgt, lft):
            '''binary multiplication'''
            if abs(lft) <= 1 or abs(rgt) <= 1:
                return pushOperand( lft * rgt )

            elif lft > 0:
                if rgt > 0:
                    return inRange( rgt <= self._FLTMAX / lft, lft * rgt )
                else:
                    return inRange( rgt >= self._FLTMIN / lft, lft * rgt )

            elif rgt > 0:
                return inRange( rgt <= self._FLTMIN / lft, lft * rgt )
  
            else:
                return inRange( rgt >= self._FLTMAX / lft, lft * rgt )
 
        def binDiv(rgt, lft):
            '''binary division'''
            if abs(rgt) >= 1:
                return pushOperand( lft/rgt )
            
            elif rgt > 0:
                if lft > 0 :
                    return inRange( lft <= self._FLTMAX * rgt, lft / rgt )
                else:
                    return inRange( lft >= self._FLTMIN * rgt, lft / rgt )
            
            elif rgt < 0:
                if lft > 0:
                    return inRange( lft <= self._FLTMIN * rgt, lft / rgt )
                else:
                    return inRange( lft >= self._FLTMAX * rgt, lft / rgt )

            else:
                return evalErr( 'Division by zero' )
            
        def binEqu(rgt, lft):
            '''logical equality'''
            return pushOperand( lft == rgt )
        
        def binNeq(rgt, lft):
            '''logical inequality'''
            return pushOperand( lft != rgt )
        
        def binShortVal(val, var, op):
            '''shortcut assignment'''
            _ = unVal(var)                           # put the value of the variable on the stack
            if op(val, stk.pop()):                   # perform the arithmetic
                return binVal(stk.pop(), var)        # if successful,  also assign result to variable
            return False
        
        def binAddVal(val, var):
            return binShortVal(val, var, binAdd)
        
        def binSubVal(val, var):
            return binShortVal(val, var, binSub)
        
        def binMulVal(val, var):
            return binShortVal(val, var, binMul)
        
        def binDivVal(val, var):
            return binShortVal(val, var, binDiv)
        
        def unPfxInc(var):
            return binShortVal(1, var, binAdd)
        
        def unPfxDec(var):
            return binShortVal(1, var, binSub)
        
        def unPostFix(val, var):
            '''postfix inc/dec'''
            _ = unVal(var)                             # push current value of variable on stack
            ok = binShortVal(val, var, binAdd)         # update value of variable
            stk.pop()                                  # remove updated value from stack (if error, removes first push)
            return ok
                
        def unPstInc(var):
            return unPostFix(1, var)
            
        def unPstDec(var):
            return unPostFix(-1, var)
        
        def binFnc(val, fnc):
            '''single argument function call'''
            ok, val = fncDispatch[fnc][0](val)
            return pushOperand(val) if ok else False
                
        def multiFnc(cnt):
            '''multiple argument function call'''
            nonlocal stk
            args = stk[-cnt:]
            stk = stk[:-cnt]
            ok, val = fncDispatch[stk.pop()][0](args)               
            return pushOperand(val) if ok else False
        
        def zeroFnc(name):
            '''zero argument function call'''
            ok, val = fncDispatch[name][0]()
            return pushOperand(val) if ok else False
        
        def strAdd(srgt, slft):
            '''string concatenation'''
            return pushOperand(slft + srgt)
        
        def strMul(cnt, strng):
            '''string repeat'''
            cnt = math.floor(cnt)
            res = ''
            while cnt > 0:
                if cnt & 0x01:
                    res += strng
                cnt >>= 1                  # or 'cnt //= 2'
                strng *= 2
            return pushOperand(res)
        
        def strAddVal(val, var):
            '''shortcut string concatenation'''
            _ = unStrVal(var)
            return binVal(stk.pop() + val, var)
        
        def strMulVal(cnt, var):
            '''shortcut string replication'''
            _ = unStrVal(var)
            _ = strMul(cnt, stk.pop())
            return binVal(stk.pop(), var)
        
        # initialize
                                
        unDispatch = {
            'U*': unVal, 'U$': unStr, '$U*': unStrVal,
            'U-': unNeg, 'U+': pushOperand,
            'U!': unNot, '$U!': unNot,
            'U++': unPfxInc,   'U--': unPfxDec,
            'P++': unPstInc,   'P--': unPstDec,
            'F()': multiFnc,   'P()': zeroFnc,
            'C&&+': logLftAnd, 'C&&-': logRgt,
            'C||+': logLftOr,  'C||-': logRgt,
            'C?+': terCond,
            'C:+': terTrue, 'C:-': pushOperand
        }
        
        binDispatch = {
            'B+': binAdd, 'B-': binSub,
            'B+=': binAddVal, 'B-=': binSubVal,
            'B*': binMul, 'B/': binDiv,
            'B*=': binMulVal, 'B/=': binDivVal,
            'B==': binEqu, 'B!=': binNeq,
            'B=': binVal, '$B=': binVal,
            'B[': binAryNdx, 'B(': binFnc,
            '$B+': strAdd, '$B*': strMul,
            '$B==': binEqu, '$B!=': binNeq,
            '$B+=': strAddVal, '$B*=': strMulVal
        }
  
        skipLevel = 0
        downToken = upToken = None
        stk = []
        ok = True

        # main loop
        
        TOstring('Input', rpn)
        
        for v in rpn:

            ETshowtoken(v)

            if skipLevel > 0:
                TOshow('Skip level', f'{skipLevel}')
                if v == downToken:
                    skipLevel -= 1
                elif v == upToken:
                    skipLevel += 1
            
            elif v in binDispatch:
                ok = binDispatch[v](stk.pop(), stk.pop())
                
            elif v in unDispatch:
                ok = unDispatch[v](stk.pop())
                
            else:
                stk.append( v )

            if not ok:
                return ( False, None )

            ETshoweval( stk )

        return ( True, stk.pop() )


### How it works

We change *multiFnc()* to use slice notation to retrieve function arguments. This may be more efficient than popping them one at a time. More importantly, the arguments are passed in the same order and position as they appear in the original infix expression. This order is stable whether or not optional arguments are present.

>That is, the first required argument is always in *args[0]*, the second in *args[1]*, and so on.
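
The slicing step can be tried on its own with a plain list. This is a minimal sketch: the helper `call_multi()`, the function table, and the sample function `sub3` are made up for illustration, and the function simply returns its result instead of the `(ok, value)` tuple used by the real dispatch entries.

```python
def call_multi(stk, cnt, table):
    '''take cnt arguments and the function name off the stack, then call the function'''
    args = stk[-cnt:]          # last cnt items, still in left-to-right infix order
    rest = stk[:-cnt]          # everything below the arguments
    name = rest.pop()          # the function name sits just below its arguments
    return table[name](args), rest

# hypothetical three-argument function: first minus second minus third
table = {'sub3': lambda a: a[0] - a[1] - a[2]}

result, rest = call_multi([7, 'sub3', 10, 2, 3], 3, table)
print(result, rest)            # 5 [7]
```

Note that the slices only behave this way for a count of at least one: `stk[-0:]` is the whole list, which is one reason zero-argument calls need their own handler.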

## Running the parser

In [None]:
passCnt = failCnt = 0                          # most useful for test input files, but never any harm

myParser = myEvaluator = None                  # where we keep instances of our classes

def startUp(flag):
    '''begin execution'''
    global passCnt, failCnt, showTrace
    global myParser, myEvaluator
    if not myParser:
        myParser = Parser()
    if not myEvaluator:
        myEvaluator = Evaluator()
    UIshow( 'Parser', myParser.VERSIONNUMBER )
    passCnt = failCnt = 0
    showTrace = flag
    
def shutDown():
    '''terminate execution'''
    UIwriteSep()
    UIshow( 'Pass', passCnt )
    UIshow( 'Fail', failCnt )
    
# run parser
        
def parseOne(this):
    '''parse/evaluate one expression'''
    global passCnt, failCnt
    UIwriteSep()
    neg = this[0] == '@'
    if neg:
        this = this[1:]
    UIshow( 'Input', this )
    ok, res = myParser.doparse( this )
    if ok:
        UIshow( 'Final Parse', res )
        ok, res = myEvaluator.doeval( res )
        if ok:
            UIshow( 'Final Eval', res )
        if neg:
            ok = not ok
    if ok:
        passCnt += 1
    else:
        failCnt += 1

## Interactive use

In [None]:
def parse():
    
    startUp(showInteract)
    while True:
        inp = input( 'Expression: ' )
        UIwriteln( '' )                      # looks better with a blank line here
        if inp[:1].upper() == 'Q':           # slice avoids an IndexError on empty input
            break
        elif inp.strip():
            parseOne( inp )
    shutDown()

## Batch processing

In [None]:
testDir = '..\\ParserTest\\'            # directory holding test input files (empty string if same as notebook directory)

# convert current version number to match test file numbers
# - done this way so we can update only the version number and everything still works

def currNum():
    
    head = myParser.VERSIONNUMBER[:len(myParser.VERSIONNUMBER)-3]
    tail = myParser.VERSIONNUMBER[-2:]
    return f'{head:0>2}{tail}'

# make full path name to test file

def makePath(typ, num):
    return f'{testDir}{typ}{num}.txt'

# run one test

def runTest(this):
    
    UIwriteln(f'Parser {myParser.VERSIONNUMBER} vs {this[-12:-4]}')
    
    myEvaluator._symTable.clear()
    
    with open(this) as f:
        data = f.readlines()
    for line in data:
        test = line.strip()
        if test and test[0] != '#':         # skip blank and comment lines
            parseOne(test)
    
# run a test of current or specified version which should succeed
    
def good(num='curr'):
  
    startUp(showBatch)
    runTest(makePath('pass', currNum() if num == 'curr' else num))
    shutDown()
    
# run a test of current or specified version which should fail

def bad(num='curr'):
    
    startUp(showBatch)
    runTest(makePath('fail', currNum() if num == 'curr' else num))
    shutDown()
    
# run regression test against current and all previous test files

def regress():
            
    UIwriteln('PASS tests')
    
    startUp(showBatch)                       # must create objects before we can access variables inside them 
    currFn = makePath('pass', currNum())
    failed = []
    fnlist = glob.glob(f'{testDir}pass????.txt')
    for fn in fnlist:
        if fn <= currFn:
            atstart = failCnt
            runTest(fn)
            if atstart < failCnt:
                failed.append(fn)               
    shutDown()
    
    UIwriteln('FAIL tests')
    
    startUp(showBatch)
    currFn = makePath('fail',currNum())
    passed = []
    fnlist = glob.glob(f'{testDir}fail????.txt')
    for fn in fnlist:
        if fn <= currFn:
            atstart = passCnt
            runTest(fn)
            if atstart < passCnt:
                passed.append(fn)                
    shutDown()
    
    if not len(failed):
        UIwriteln('All pass tests succeeded')
    else:
        UIwriteln('Pass tests which failed')
        for fn in failed:
            UIwriteln(f'  {fn}')
            
    if not len(passed):
        UIwriteln('All fail tests succeeded')
    else:
        UIwriteln('Fail tests which passed')
        for fn in passed:
            UIwriteln(f'   {fn}')
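
The version-to-file-name mapping performed by *currNum()* and *makePath()* can be tried on its own. This sketch assumes the version number is a string of the form `'17.00'`; the names `curr_num()` and `make_path()` are stand-ins that take their inputs as parameters instead of reading them from the parser object.

```python
def curr_num(version):
    '''convert a version string such as '17.00' to a four-character test number'''
    head = version[:-3]            # digits before the decimal point
    tail = version[-2:]            # two digits after the decimal point
    return f'{head:0>2}{tail}'     # left-pad the head with zeros to two characters

def make_path(test_dir, typ, num):
    '''build the name of a test file'''
    return f'{test_dir}{typ}{num}.txt'

print(curr_num('17.00'))                          # 1700
print(curr_num('9.05'))                           # 0905
print(make_path('', 'pass', curr_num('17.00')))   # pass1700.txt
```

Because every test number is padded to the same width, file names that share a directory prefix compare as strings in version order, which is what the `fn <= currFn` test in *regress()* relies on.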
              

# Testing the parser

In [None]:
parse()       # interactive, one expression at a time

In [None]:
good('1120')  # run current parser against pass test 1120. Use good() to run against its own pass test.

In [None]:
bad('1120')   # run current parser against fail test 1120. Use bad() to run against its own fail test.

In [None]:
regress()     # run parser against all previous tests