# Parser 9.10 - Assignment Operator

The goal of this version of the parser is to teach it to associate arbitrary numeric values with the symbol names introduced in the previous version.

This brings up a difficulty that has not been an issue prior to this point. So far every operand has either been a numeric literal or represented the number zero. There is no problem combining these any way we like in any expression the parser can handle.

But it is not possible to assign an arbitrary value to a numeric literal. Literals represent constant values that cannot be changed. Therefore the parser must verify that the type of operand being assigned to is a symbol name, and report an error if it is not.

### Type Checking

To digress for a moment, it was an idea about type checking as the solution to the problem of polymorphic operators that originally spurred this project. Take for example the [BASIC](https://en.wikipedia.org/wiki/BASIC) operator **‘+’**. Applied to numeric values, it means arithmetic addition. Applied to strings, it means string concatenation.

Now consider the incomplete expression:

```Python
"abc" > "def" +
```

If the next operand is a string, the **‘+’** should stand for string concatenation and the expression parsed as if written:

```Python
"abc" > ( "def" + "ghi" )
```

If the next operand is a number, the **‘+’** should stand for arithmetic addition and the expression parsed as if written:

```Python
( "abc" > "def" ) + 1
```

When the parser first encounters **‘+’** in the incomplete expression it has no idea what the next operand is going to be. But the conversion to Reverse Polish is meant to remove all ambiguity regarding how to evaluate the expression. How can the parser decide which internal operator to push on the operator stack, the one for arithmetic addition or the one for string concatenation?

While musing on this difficulty it occurred to me that a decision about how to resolve a polymorphic operator could be deferred. There’s no reason that the operator the parser puts into the Reverse Polish has to be the same one it pops from the operator stack. If there was a way for the parser to know the types of the operands a polymorphic operator was going to be applied to, it could substitute a monomorphic Reverse Polish operator at that point.
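A minimal sketch of that deferral, assuming a hypothetical lookup table `MONOMORPHIC` and made-up internal operator names (`B+n` for arithmetic addition, `B+s` for string concatenation). None of these names exist in the parser; they only illustrate substituting an operator once the operand types are known.

```Python
# Hypothetical table: (stacked operator, left type, right type) -> RPN operator
MONOMORPHIC = {
    ('B+', 'number', 'number'): 'B+n',   # arithmetic addition
    ('B+', 'string', 'string'): 'B+s',   # string concatenation
}

def resolve(op, lft_type, rgt_type):
    '''pick the monomorphic operator for the operand types (None if no match)'''
    return MONOMORPHIC.get((op, lft_type, rgt_type))

print(resolve('B+', 'number', 'number'))   # B+n
print(resolve('B+', 'string', 'string'))   # B+s
print(resolve('B+', 'string', 'number'))   # None
```

A result of `None` is the case where the types cannot be used with the popped operator.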

>In retrospect the whole issue of how to resolve **'+'** was never really a problem in the first place. The higher precedence of **‘+’** over **‘>’** means that it will always be applied first. Since either way **‘+’** requires the same operand type on both sides, a string on one side and a number on the other is always an error.

If the types cannot be used with a popped operator, the parser can signal an error. Which is what we will do if the left side of an assignment operator is not a symbol name.

>This is the real reason for introducing type checking now. Since we only have numeric types, the parser does not yet have to deal with that type of polymorphism. Replacing ambiguous operators with explicit Reverse Polish operators will come later.

### Parser

The parser’s basic approach to type checking will be to perform a kind of trial evaluation. When operands in the input expression are placed into the Reverse Polish, they'll also have their types pushed onto a type stack. When operators in the input expression are popped off the operator stack, the type stack will also be popped. A check will be made that the popped types match what the operator can act on. If they do, the result type will be pushed back onto the type stack.
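The idea of a trial evaluation can be sketched on its own, assuming a hypothetical `ARITY` table and a single `number` type. The real parser interleaves this with parsing; for brevity the sketch runs over a finished sequence of types and operators.

```Python
ARITY = {'B+': 2, 'U-': 1}    # hypothetical: operand count per operator

def trial_types(items, arity):
    '''evaluate using types only; return the final type, or None on a mismatch'''
    typ_stk = []
    for item in items:
        if item in arity:
            if len(typ_stk) < arity[item]:
                return None                   # not enough operands
            for _ in range(arity[item]):
                if typ_stk.pop() != 'number':
                    return None               # operand has the wrong type
            typ_stk.append('number')          # push the result type
        else:
            typ_stk.append(item)              # an operand: push its type
    return typ_stk.pop() if len(typ_stk) == 1 else None

print(trial_types(['number', 'number', 'B+', 'U-'], ARITY))   # number
print(trial_types(['number', 'string', 'B+'], ARITY))         # None
```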

This mechanism allows us to alter both operators and operands to whatever we need them to be. Polymorphic operators can be made monomorphic. Operands can be promoted to any necessary type by implicit type conversions.

The previous version of the parser introduced the only implicit type conversion so far. The *U\** symbol de-reference operator converts a symbol name to the value it represents. Aside from the fact that it never appears in the original infix but does in the Reverse Polish, it behaved like any other operator with respect to operator stacking.

We could do that because it was possible to treat every symbol name in the same way. That will no longer do. In an expression like:

```Python
a = b
```

we want the *value* of **b** assigned to the *name* **a**. In other words, we want to de-reference **b** but not **a**.

However we’re not going to know how either **a** or **b** is going to be used until the operator applied to them is popped from the operator stack. By that time they will both be in the Reverse Polish. Hence whether we always or never add the de-reference operator after symbol names at the time we put them in the Reverse Polish, we need a way to find them again later in case we have to change whatever it is we did.

>In practice it seems much easier to add *U\** if we need it than to take it away if we don't. So that's what we'll do.
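Adding the operator after the fact amounts to a list insertion at a remembered position. A small sketch, assuming the position just past **B** was recorded when **B** went into the Reverse Polish (the variable names here are illustrative only):

```Python
rpn = ['A', 'B', 'B=']     # a = b, before any de-reference is added
b_pos = 2                  # index just past 'B', noted when 'B' was appended

rpn.insert(b_pos, 'U*')    # de-reference B but not A
print(' '.join(rpn))       # A B U* B=
```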

## Libraries

In [None]:
import glob       # for searching directories
import math       # for log2() and log10()

import re         # for regular expressions

## User output

In [None]:
visSep = '-------------'             # visual separator

def UIwriteln(this):
    '''write a single line to output'''
    print( f'{this}\n' )
    
def UIwriteSep():
    '''write a visual separator'''
    UIwriteln( visSep )

def UIshow(tag, value):
    '''write a tagged value to output'''
    UIwriteln( f'{tag}: {value}' )

def UIerror(this):
    '''write an error message to output'''
    UIshow( 'Error', this )

# Tracing

In [None]:
# flags: show trace of processing

showInteract = True          # default for interactive use
showBatch = False            # default for batch use

showTrace = None             # control flag

# Trace Output

def TOshow(mesg, text):
    '''write trace message to output if enabled'''
    if showTrace:
        UIshow( f'{mesg:15s}', text )
        
def TOstring(tag, this):
    
    if showTrace:
        TOshow( tag, ' '.join([str(e) for e in this]) )

# -----------------------
# Parse Tracing
# -----------------------

def PTshowexpr(this):

    TOshow( 'Parse', visSep )
    TOshow( 'Current Expr', this )

def PTshowparse(ok, res, opStk, typStk):

    if ok:
        TOstring( 'Current RPN', res )
        TOstring( 'Operator Stack', opStk )
        TOstring( 'Type Stack', typStk )

def PTshowtoken(this):

    if not this[0] == ' ':
        TOshow( "Found Token", this )

# -----------------------
# Evaluation Tracing
# -----------------------

def ETshowtoken(this):
    
    TOshow( 'Eval', visSep )
    TOshow( 'Current token', this )

def ETshoweval(stk):
    
    TOstring( 'Operand Stack', stk )


# Parser

In [None]:
# operands accepted:
# - decimal and hexadecimal floating point literals
# - scalar numeric variables

# operators accepted:
# - unary negation, plus
# - binary addition, subtraction, multiplication, division
# - grouping parentheses
# - logical not, equality, inequality
# - assignment

# errors detected:
# - unrecognized input
# - out of range numeric input
# - malformed expression

# result tuple:
# - (True, [parse])
# - (False, None)

class Parser(object):
    
    VERSIONNUMBER = '9.10'
    
    _FLTMAX =  4294967295                                  # 2**32-1
    _FLTMIN = -4294967296                                  # -(2**32)

    _expMax = {
        'P' : math.log2(_FLTMAX),                          # max base 2 exponent
        'E' : math.log10(_FLTMAX)                          # max base 10 exponent
    }

    _rgtUnOp = '[-+!]'                                     # right associative unary operators
 
    _rgtUnPrec = {                                         # precedence
        '*': 100,
        '-': 80, '+': 80, '!': 80
    }
        
    _lftBinOp = '[-+*/]|==|!='                             # left associative binary operators
        
    _lftBinPrec = {                                        # precedence
        '*': 70, '/': 70,
        '-': 60, '+': 60,
        '==': 50, '!=': 50
    }
    
    _rgtBinOp = '='                                         # right associative binary operators
    
    _rgtBinPrec = { '=': 10 }                               # precedence
    
    _typeChkNdx = {                                         # indices into type check control lists
                                                            # - in precedence order (not important)
        'U-':'n2n', 'U+':'n2n','U!':'n2n',
        'B*':'nn2n', 'B/':'nn2n',
        'B-':'nn2n', 'B+':'nn2n',
        'B==':'nn2n', 'B!=':'nn2n',
        'B=':'vn2n',
        'LoneSymbol':'n2n'
        
    }
    
    _typeChkLst = {                                         # type check control lists
         'n2n': ['number', 'number'],                       # - result type first, then operand types left-to-right
        'nn2n': ['number', 'number', 'number'],             # - operands are checked right-to-left
        'vn2n': ['number', 'numsym', 'number']
    }
    
    def __init__(self):
        pass
           
    def doparse(self, this):

        def parseErr(mesg, pos):
            '''report parse error'''
            UIerror(mesg)
            UIwriteln(f'>>> {this}')
            if pos > 0:
                UIwriteln(f'{"^^near here".rjust(pos)}')
            return False

        # initialize

        expr = this                # save to new variable but retain original for error reports
        start = 15                 # tracked so we can report where in an expression an error occurred
        token = None               # anything successfully matched
        ok = wantoperand = True    # flags
        result = []                # rpn expression
        opStk = [ ('EOE', 1) ]     # operator stack
        typStk = []                # type stack
        

        def typeCheck(op):
            '''type check operands'''            
            check = list(self._typeChkLst[self._typeChkNdx[op]])       # list() to avoid aliasing
            while len(check) > 1:
                want = check.pop()
                have, rpnPos, errPos = typStk.pop()
                              
                # do we want a number ?
                
                if want == 'number':
                    
                    # convert symbols to numbers
                    
                    if have == 'numsym':
                        result.insert( rpnPos, 'U*' )
                        
                # do we want a variable ?
                
                elif want == 'numsym':
                    
                    if have != 'numsym':
                        return parseErr( 'Numeric variable required', errPos )
                        
            # push intermediate result type (no position)
                    
            typStk.append( (check[0], -1, errPos) )
            return True
 
        def addOperand(op, typ):
            '''add operand to RPN'''
            result.append( op )
            typStk.append( (typ, len(result), start) )
            
        def addOperator():
            '''add operator to RPN'''            
            op = opStk.pop()[0]
            result.append( op )
            return typeCheck( op )
            
        def popGEop(prec):
            '''pop operators of equal or greater precedence'''
            ok = True
            while ok and prec <= opStk[-1][1]:
                ok = addOperator()
            return ok
 
        def pushLeft(op, prec):
            '''push left associative operator on stack'''
            if not popGEop(prec):
                return False
            opStk.append( (op, prec) )
            return True

        def popGop(prec):
            '''pop operators of greater precedence'''
            ok = True
            while ok and prec < opStk[-1][1]:
                ok = addOperator()
            return ok
 
        def pushRight(op, prec):
            '''push right associative operator on stack'''
            if not popGop(prec):
                return False
            opStk.append( (op, prec) )
            return True

        def popUntil(op, prec):
            '''clear and check operator stack'''
            if not popGEop(prec):
                return False
            elif op == opStk.pop()[0]:
                return True
            elif op == '(':
                return parseErr('Unmatched right parenthesis', start)
            elif op == 'EOE':
                return parseErr('Unmatched left parenthesis', 0)
       
        def convertFloat(fplit, base, capgrp):
            '''convert floating point literal to internal form'''
            
            def rangeErr():
                return parseErr(f'\'{fplit}\' is out of range', start)
            
            # collect the features of interest
                   
            p = re.search(capgrp, fplit.upper())
                            
            lint, lfrc, expbas, expsgn, lexp = p.group(1,2,4,5,6)
            
            # convert integer portion (if any)
 
            uint = 0
            if lint:
                p = re.search('[1-9A-F][0-9A-F]*', lint )
                if p is not None:
                    for ch in p.group():
                        digval = '0123456789ABCDEF'.find(ch)
                        if uint <= (self._FLTMAX - digval)/base:
                            uint =  uint * base + digval
                        else:
                            return rangeErr()
                    
            # convert fractional portion (if any)
                    
            ufrc = 0
            if lfrc:
                fbase = 1
                for ch in lfrc:
                    digval = '0123456789ABCDEF'.find(ch)
                    fbase *= base
                    ufrc += digval/fbase
        
            if uint == self._FLTMAX and ufrc != 0:
                return rangeErr()
            
            # value so far
            
            uflt = uint + ufrc
 
            # convert exponent portion (if any)
            
            uexp = 0
            if lexp:
                for ch in lexp:
                    digval = '0123456789'.find(ch)
                    if uexp <= (self._expMax[expbas] - digval)/10:
                        uexp =  uexp * 10 + digval
                    else:
                        return rangeErr()
                    
            # adjust value by exponent (if any)
             
            if uexp:
                power = (2 if expbas == 'P' else 10) ** uexp
                if expsgn == '-':
                    uflt /= power
                elif uflt <= self._FLTMAX/power:
                    uflt *= power
                else:
                    return rangeErr()
                    
            addOperand( uflt, 'numlit' )
            return True

        def startsWith(regex):
            '''test if expression starts with given regular expression'''
            nonlocal expr, start, token

            p = re.match(regex, expr)
            if p is None:
                return False
            else:
                token = p.group()              # what we matched
                start += len(token)            # update to next match position in original string
                expr = expr[len(token):]       # "chop off" what we matched
                PTshowtoken(token)             # trace
                return True

        # top level main loop
              
        while ok and len(expr):

            _ = startsWith('[ ]+')                             # skip leading whitespace

            PTshowexpr(expr)                                   # trace

            # look for operand

            if wantoperand:

                if startsWith('[(]'):
                    '''left parenthesis ?'''
                    opStk.append( ('(', 2) )


                elif startsWith(self._rgtUnOp):
                    '''right unary?'''
                    ok = pushRight( 'U' + token, self._rgtUnPrec[token] )

                else:

                    wantoperand = False                         # flip
                    
                    if startsWith('0[xX]([0-9a-fA-F]+([.][0-9a-fA-F]*)?|[.][0-9a-fA-F]+)([pP][-+]?[0-9]+)?'):
                        '''unsigned hexadecimal literal ?'''
                        ok = convertFloat(token, 16, '0X([0-9A-F]*)[.]?([0-9A-F]*)(([P])([-+])?([0-9]+))?')

                    elif startsWith('([0-9]+([.][0-9]*)?|[.][0-9]+)([eE][-+]?[0-9]+)?'):
                        '''unsigned decimal literal?'''
                        ok = convertFloat(token, 10, '([0-9]*)[.]?([0-9]*)(([E])([-+])?([0-9]+))?')
                        
                    elif startsWith('[a-zA-Z][_a-zA-Z0-9]*'):
                        '''numeric scalar variable?'''
                        addOperand( token.upper(), 'numsym')
                        
                    else:
                        '''malformed'''
                        ok = parseErr('Expecting operand', start)

            # look for operator

            else:

                if startsWith('[)]'):
                    ok = popUntil( '(', 4 )

                else:

                    wantoperand = True                                # flip
                        
                    if startsWith(self._lftBinOp):
                        '''left binary?'''
                        ok = pushLeft( 'B' + token, self._lftBinPrec[token] )

                    elif startsWith(self._rgtBinOp):
                        '''right binary?'''
                        ok = pushRight( 'B' + token, self._rgtBinPrec[token])

                    else:
                        '''malformed'''
                        ok = parseErr('Expecting operator', start)

            PTshowparse(ok, result, opStk, typStk )               # trace
 
        if ok and wantoperand:
            ok = parseErr('Unexpected end of expression', start ) # must be in 'wantoperator' state   

        if ok:
            ok = popUntil( 'EOE', 3 )                             # clear operator stack
            
        if ok:
            ok = typeCheck('LoneSymbol')                          # make sure final result is a number

        return (ok, result if ok else None)                       # done

### How it works

We introduce a type stack, *typStk[]*. We also create a new function, *addOperand()*, to add an operand to the Reverse Polish we are building up. Its purpose is simply to make sure we also push three things onto the type stack: the operand's type, its location in the RPN, and its position in the input expression (for error reporting).
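A stripped-down sketch of that bookkeeping, using module-level lists and an explicit input position instead of the parser's own *start* variable (the snake_case names are illustrative, not the parser's):

```Python
result = []     # RPN built so far
typ_stk = []    # one (type, RPN position, input position) entry per operand

def add_operand(op, typ, inp_pos):
    '''append an operand to the RPN and record its type and positions'''
    result.append(op)
    typ_stk.append((typ, len(result), inp_pos))   # position just past the operand

add_operand('A', 'numsym', 0)
add_operand(1.0, 'numlit', 4)
print(typ_stk)   # [('numsym', 1, 0), ('numlit', 2, 4)]
```

Recording the index just past the operand means a later `insert()` at that index lands immediately after it.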

>So far we only have two types, *numlit* for numeric literals and *numsym* for variable names. There will be more.

Assignment is a right associative binary operator. We add a check for it when the parser is looking for an operator.
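To see why associativity matters here, consider a toy converter that knows only operands and `=`. It is a sketch, not the parser's code: the `right_assoc` flag stands in for the choice between popping operators of strictly greater precedence and popping those of equal or greater precedence.

```Python
def to_rpn(tokens, right_assoc):
    '''convert a token list of operands and '=' operators to RPN'''
    out, ops = [], []
    for tok in tokens:
        if tok == '=':
            # left associative: pop the stacked '=' first
            # right associative: leave it stacked
            while ops and not right_assoc:
                out.append(ops.pop())
            ops.append(tok)
        else:
            out.append(tok)
    while ops:                       # end of expression: clear the stack
        out.append(ops.pop())
    return ' '.join(out)

print(to_rpn(['A', '=', 'B', '=', '1'], right_assoc=True))    # A B 1 = =
print(to_rpn(['A', '=', 'B', '=', '1'], right_assoc=False))   # A B = 1 =
```

Only the right associative form assigns the value to **B** before assigning to **A**.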

As with operands, we also have a new function *addOperator()* to handle adding popped operators to the Reverse Polish. This is used by *popGEop()* and *popGop()* to make sure their operands are type checked.

Type checking is handled by *typeCheck()*. Its argument is an operator; its result is whether or not the operator's arguments are type correct. So far it only does two things:

- convert variable names to values everywhere except where one appears on the left side of an assignment operator
- verify that what appears on the left side of an assignment operator is a variable name

Every operator produces a result operand that has its own type; so far these are all numbers. Even though these results will not really exist until evaluation time, we already know what types they will be. We need to push that information onto the type stack so that other operators in the expression have access to it when needed. Unlike operands in the input expression, none of these intermediate result operands has an explicit position in the Reverse Polish.

All operators introduced before this version have taken one or two numeric values and produced a numeric result. The type check function uses the operator it is given as a key into *_typeChkNdx*, which in turn selects one of the lists in *_typeChkLst*. Each list indicates which checks to apply and what the result type is. The function then simply loops through the list to make sure the arguments match (or can be coerced into) what is expected.

>This is now where we silently add the *U\** de-reference operator if needed.

The assignment operator *B=* is different. It checks for a symbol name on its left side and a number on its right. It does not add the de-reference operator to any symbol it finds but does complain if no symbol is found.
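The table-driven check can be tried in isolation. The following is a simplified standalone sketch: it passes the RPN and type stack as arguments, drops the error position, and uses shortened table names (`CHK_NDX`, `CHK_LST`), all of which are assumptions made for the example.

```Python
CHK_NDX = {'B+': 'nn2n', 'B=': 'vn2n'}
CHK_LST = {
    'nn2n': ['number', 'number', 'number'],   # result, left, right
    'vn2n': ['number', 'numsym', 'number'],
}

def type_check(op, rpn, typ_stk):
    '''check the operands of op, adding 'U*' after symbols used as numbers'''
    check = list(CHK_LST[CHK_NDX[op]])        # copy so the table is not consumed
    while len(check) > 1:
        want = check.pop()                    # rightmost operand first
        have, pos = typ_stk.pop()
        if want == 'number' and have == 'numsym':
            rpn.insert(pos, 'U*')             # silently de-reference
        elif want == 'numsym' and have != 'numsym':
            return False                      # assignment target is not a symbol
    typ_stk.append((check[0], -1))            # result type has no RPN position
    return True

rpn = ['A', 'B', 'B+']                        # a + b
print(type_check('B+', rpn, [('numsym', 1), ('numsym', 2)]), ' '.join(rpn))
# True A U* B U* B+

rpn = ['A', 'B', 'B=']                        # a = b
print(type_check('B=', rpn, [('numsym', 1), ('numsym', 2)]), ' '.join(rpn))
# True A B U* B=

rpn = [1.0, 'B', 'B=']                        # 1 = b
print(type_check('B=', rpn, [('numlit', 1), ('numsym', 2)]))
# False
```

Because operands are checked right to left, each insertion happens at a higher index than any position still waiting on the type stack, so the recorded positions stay valid.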

Finally, *doparse()* now ends by directly calling *typeCheck()* with an argument that exists just to be an index to the dictionary of type checks. The only time this call will have any effect is when an expression consists of a lone symbol name. In that case the type *numsym* remains on the type stack because no operator was ever popped that in turn would have popped it. This final type check will cause any lone symbol name to be de-referenced (which is what we want).
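The lone symbol case reduces to a few lines. In this sketch the type stack entry is a simplified `(type, RPN position)` pair rather than the parser's full triple:

```Python
rpn = ['A']                    # the whole expression is just: a
typ_stk = [('numsym', 1)]      # no operator was ever popped to consume this entry

typ, pos = typ_stk.pop()
if typ == 'numsym':            # the final check wants a number
    rpn.insert(pos, 'U*')
print(' '.join(rpn))           # A U*
```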

The function *parseErr()* has been slightly modified. It now accepts a second parameter that indicates approximately where in the input expression a problem lies. Most often this is the same as the *start* variable it has been implicitly using all along, which just records the current parse position plus a small offset for formatting purposes. But now we need to account for *typeCheck()* reporting an error at an earlier position. We could simply reset *start* to that position and call *parseErr()* as we have always done. That would do no harm, since the parse is terminated immediately and *start* is never needed again. But it feels cleaner overall to feed *parseErr()* a position parameter, even if it means slightly more code.
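The formatting offset can be illustrated with a small standalone sketch. It assumes the offset is the length of the `>>> ` prefix plus the length of the marker text, which comes to 15 and matches the initial value of *start*; the helper name `show_error` is made up for the example.

```Python
PROMPT = '>>> '
MARK = '^^near here'

def show_error(expr, offset):
    '''return the two report lines, with the marker starting under expr[offset]'''
    pos = len(PROMPT) + len(MARK) + offset    # column where the marker must end
    return [PROMPT + expr, MARK.rjust(pos)]

for line in show_error('1 + * 2', 4):
    print(line)
```

Right-justifying the marker to that width puts its first caret directly under the offending character.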

In some ways this code is over-built for what it needs to do. There are simpler and more direct ways to accomplish the two tasks it must perform. However an ad hoc approach to type checking quickly becomes unwieldy as new kinds of checks become necessary. With this framework already in place it will be much easier – and cleaner – to add new type checks as we need them. And that prospect makes it worth putting up with a little overkill for the time being.

>By now you may have realized that the parser is starting to deviate considerably from the [classic shunting yard](https://en.wikipedia.org/wiki/Shunting-yard_algorithm) parsing algorithm. That process actually began with the previous version of the parser and its introduction of a de-reference operator that did not exist in the original infix.

>The good news is that the underlying principles of the shunting yard – the two states, the identification of tokens in the input expression and the use of precedence to determine when to move an operator into the Reverse Polish – are never going to be abandoned. The bad news is that we’ve only just begun perverting everything else.

# Evaluator

In [None]:
# operators handled:
# - unary negation, plus
# - binary addition, subtraction, multiplication, division
# - logical negation, equality and inequality
# - variable name de-reference
# - variable value assignment

# errors detected:
# - out of range
# - division by zero

# return tuple:
# - (True, result)
# - (False, None)

class Evaluator(Parser):
    
    def __init__(self):
        self._symTable = dict()
 
    def doeval(self, rpn):
 
        def inRange(ok, val):
            '''range check test result'''
            if ok:
                stk.append( val )
            else:
                UIerror( 'Evaluation result out of range' )
            return ok
        
        def pushOperand(val):
            '''push operand on stack'''
            stk.append( val )
            return True
                     
        def binVal(val, var):
            '''assign value to variable'''
            self._symTable[var] = val
            return pushOperand(val)
        
        def unVal(var):
            '''variable name value'''
            return pushOperand(self._symTable[var] if var in self._symTable else 0)

#        def unPlu(arg):
#           '''unary plus'''
#           return pushOperand( arg )

        def unNeg(arg):
            '''unary negation'''
            return inRange( arg != self._FLTMIN, -arg )
        
        def unNot(arg):
            '''logical not'''
            return pushOperand( not arg )

        def binAdd(rgt, lft):
            '''binary addition'''
            if lft >= 0:
                return inRange( rgt <= self._FLTMAX - lft, lft+rgt )       
            else:
                return inRange( rgt >= self._FLTMIN - lft, lft+rgt )

        def binSub(rgt, lft):
            '''binary subtraction'''
            if lft >= 0:
                return inRange( lft - self._FLTMAX <= rgt, lft-rgt )
            elif rgt >= 0:
                return inRange( lft - self._FLTMIN >= rgt, lft-rgt )
            else:
                return inRange( lft <= self._FLTMAX + rgt, lft-rgt )

        def binMul(rgt, lft):
            '''binary multiplication'''
            if abs(lft) <= 1 or abs(rgt) <= 1:
                return pushOperand( lft * rgt )

            elif lft > 0:
                if rgt > 0:
                    return inRange( rgt <= self._FLTMAX / lft, lft * rgt )
                else:
                    return inRange( rgt >= self._FLTMIN / lft, lft * rgt )

            elif rgt > 0:
                return inRange( rgt <= self._FLTMIN / lft, lft * rgt )
  
            else:
                return inRange( rgt >= self._FLTMAX / lft, lft * rgt )

        def binDiv(rgt, lft):
            '''binary division'''
            if abs(rgt) >= 1:
                return pushOperand( lft/rgt )
            
            elif rgt > 0:
                if lft > 0 :
                    return inRange( lft <= self._FLTMAX * rgt, lft / rgt )
                else:
                    return inRange( lft >= self._FLTMIN * rgt, lft / rgt )
            
            elif rgt < 0:
                if lft > 0:
                    return inRange( lft <= self._FLTMIN * rgt, lft / rgt )
                else:
                    return inRange( lft >= self._FLTMAX * rgt, lft / rgt )

            else:
                UIerror( 'Division by zero' )
                return False
            
        def binEqu(rgt, lft):
            '''logical equality'''
            return pushOperand( lft == rgt )
        
        def binNeq(rgt, lft):
            '''logical inequality'''
            return pushOperand( lft != rgt )
        
        # initialize
                                
        unDispatch = {
            'U*': unVal,
            'U-': unNeg,
            'U+': pushOperand,
            'U!': unNot
        }
        
        binDispatch = {
            'B+': binAdd,
            'B-': binSub,
            'B*': binMul,
            'B/': binDiv,
            'B==': binEqu,
            'B!=': binNeq,
            'B=': binVal
        }
  
        stk = []
        ok = True

        # main loop
        
        TOstring('Input', rpn)
        
        for v in rpn:

            ETshowtoken(v)
            
            if v in binDispatch:
                ok = binDispatch[v](stk.pop(), stk.pop())
                
            elif v in unDispatch:
                ok = unDispatch[v](stk.pop())
                
            else:
                stk.append( v )

            if not ok:
                return ( False, None )

            ETshoweval( stk )

        return ( True, stk.pop() )


### How it works

In contrast to the parser, relatively few changes are made to the evaluator. The function to execute assignment is initialized like any other binary operator. In (a now non-null) *\_\_init\_\_()* we establish *_symTable{}*, an instance variable we will use to associate variable names with the values they represent. The *U\** operator now checks for the existence of a name in the symbol table. If it finds one it pushes the associated value onto the operand stack, otherwise it pushes the default value we assign to all unknown names.
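The lookup with a default can be sketched with a plain dictionary. This version uses `dict.get()` rather than the membership test in the evaluator, but the effect is the same; `sym_table` and `deref` are illustrative names.

```Python
sym_table = {}

def deref(name):
    '''value of a symbol, or 0 for a name that was never assigned'''
    return sym_table.get(name, 0)

sym_table['A'] = 2.5
print(deref('A'))   # 2.5
print(deref('B'))   # 0
```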

The only possibly odd happening is that besides associating a name with a value, the *B=* operator also pushes that same value onto the operand stack. Why does it do that?

Perhaps the easiest way to think of this is as a specific instance of the general [C](https://en.wikipedia.org/wiki/C_(programming_language)) rule that all expressions produce a value. An assignment is an expression, therefore it must yield a value.

>Technically (and non-intuitively) assignment's association of a value with a symbol is only a [side effect](https://en.wikipedia.org/wiki/Side_effect_(computer_science)) in C.

Also, if assignment *didn’t* push a value we’d be trying to pop an empty operand stack at the end of evaluation, which is a fatal error. So there’s that. Is there anything else it’s good for?

Yes. Because the assignment operator is right associative at parse time and pushes a value onto the operand stack at evaluation time, multiple assignment of the same value to more than one symbol is trivial. The expression:

```Python
A = B = C = 1
```

associates each of the three symbols **A**, **B** and **C** with the value one. The parsed Reverse Polish of that expression is:

```Python
A B C 1 B= B= B=
```

At evaluation time the first *B=* associates the value *one* with the symbol **C** and pushes *one* back on the operand stack. The second *B=* does the same thing using **B**, and the third using **A** (after which we retrieve and display *one* as the expression result).
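That sequence can be traced with a toy evaluator that understands only operands and *B=*. It is a sketch with its own local symbol table, not the *Evaluator* class:

```Python
def eval_assignments(rpn):
    '''evaluate an RPN made only of operands and 'B=' operators'''
    sym_table, stk = {}, []
    for tok in rpn:
        if tok == 'B=':
            val, var = stk.pop(), stk.pop()   # right operand first, then left
            sym_table[var] = val
            stk.append(val)                   # assignment yields its value
        else:
            stk.append(tok)
    return stk.pop(), sym_table

print(eval_assignments(['A', 'B', 'C', 1, 'B=', 'B=', 'B=']))
# (1, {'C': 1, 'B': 1, 'A': 1})
```

The order of the dictionary entries shows the order in which the assignments were carried out.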

Another way to look at multiple assignment is that it’s easier to allow it than to forbid it. If it was not legal, we’d have to implement some way to detect and report violations. And that would be a pain.


## Running the parser

In [None]:
passCnt = failCnt = 0                       # most useful for test input files, but never any harm

myParser = myEvaluator = None               # where we keep instances of our classes

def startUp(flag):
    '''begin execution'''
    global passCnt, failCnt, showTrace
    global myParser, myEvaluator 
    if not myParser:
        myParser = Parser()
    if not myEvaluator:
        myEvaluator = Evaluator()
    UIshow( 'Parser', myParser.VERSIONNUMBER )
    passCnt = failCnt = 0
    showTrace = flag
    
def shutDown():
    '''terminate execution'''
    UIwriteSep()
    UIshow( 'Pass', passCnt )
    UIshow( 'Fail', failCnt )
    
# run parser

def parseOne(this):
    '''parse/evaluate one expression'''
    global passCnt, failCnt
    UIwriteSep()
    UIshow( 'Input', this )
    ok, res = myParser.doparse( this )
    if ok:
        UIshow( 'Final Parse', res )
        ok, res = myEvaluator.doeval( res )
        if ok:
            UIshow( 'Final Eval', res )
    if ok:
        passCnt += 1
    else:
        failCnt += 1

## Interactive use

In [None]:
def parse():
    
    startUp(showInteract)
    while True:
        inp = input( 'Expression: ' )
        UIwriteln( '' )                      # looks better with a blank line here
        if inp[:1].upper() == 'Q':           # slice so that empty input does not raise IndexError
            break
        elif inp.strip():
            parseOne( inp )
    shutDown()

## Batch processing

In [None]:
testDir = '..\\ParserTest\\'            # directory holding test input files (empty string if same as notebook directory)

# convert current version number to match test file numbers
# - done this way so we can update only the version number and everything still works

def currNum():
    
    head = myParser.VERSIONNUMBER[:len(myParser.VERSIONNUMBER)-3]
    tail = myParser.VERSIONNUMBER[-2:]
    return f'{head:0>2}{tail}'

# make full path name to test file

def makePath(typ, num):
    return f'{testDir}{typ}{num}.txt'

# run one test

def runTest(this):

    UIwriteln(f'Parser {myParser.VERSIONNUMBER} vs {this[-12:-4]}')
        
    myEvaluator._symTable.clear()           # make sure no variables persist between tests
    
    with open(this) as f:
        data = f.readlines()
    for line in data:
        test = line.strip()
        if test and test[0] != '#':         # skip blank and comment lines
            parseOne(test)
    
# run a test of current or specified version which should succeed
    
def good(num='curr'):
  
    startUp(showBatch)
    runTest(makePath('pass', currNum() if num == 'curr' else num))
    shutDown()
    
# run a test of current or specified version which should fail

def bad(num='curr'):
    
    startUp(showBatch)
    runTest(makePath('fail', currNum() if num == 'curr' else num))
    shutDown()
    
# run regression test against current and all previous test files

def regress():
            
    UIwriteln('PASS tests')
    
    startUp(showBatch)                       # must create objects before we can access variables inside them 
    currFn = makePath('pass', currNum())
    failed = []
    fnlist = glob.glob(f'{testDir}pass????.txt')
    for fn in fnlist:
        if fn <= currFn:
            atstart = failCnt
            runTest(fn)
            if atstart < failCnt:
                failed.append(fn)               
    shutDown()
    
    UIwriteln('FAIL tests')
    
    startUp(showBatch)
    currFn = makePath('fail',currNum())
    passed = []
    fnlist = glob.glob(f'{testDir}fail????.txt')
    for fn in fnlist:
        if fn <= currFn:
            atstart = passCnt
            runTest(fn)
            if atstart < passCnt:
                passed.append(fn)                
    shutDown()
    
    if not len(failed):
        UIwriteln('All pass tests succeeded')
    else:
        UIwriteln('Pass tests which failed')
        for fn in failed:
            UIwriteln(f'  {fn}')
            
    if not len(passed):
        UIwriteln('All fail tests succeeded')
    else:
        UIwriteln('Fail tests which passed')
        for fn in passed:
            UIwriteln(f'   {fn}')
              

So far we have created only one instance each of *Parser* and *Evaluator* to use across all our tests. This has not mattered because the only things about them that persist across tests don't change.

However now that the evaluator includes a symbol dictionary, the possibility arises that if we do nothing a variable name used in more than one test might have a non-zero value at the start of a test. Depending on the order in which batch or interactive tests are executed, this can change the expected result.

To combat this source of possible error we now explicitly clear the symbol dictionary each time we call *runTest()*.

>We could also create a new instance of *Evaluator* before each test, but just clearing the dictionary is simpler and faster.
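The effect of clearing can be seen with a stand-in class that has nothing but a symbol table (`TinyEvaluator` is a made-up name for this sketch):

```Python
class TinyEvaluator:
    '''stand-in for Evaluator: only the symbol table matters here'''
    def __init__(self):
        self._symTable = {}

ev = TinyEvaluator()
ev._symTable['A'] = 5               # left over from a previous test

ev._symTable.clear()                # what runTest() does before each file
print(ev._symTable.get('A', 0))     # 0
```

Clearing in place keeps the same instance, so anything else holding a reference to the evaluator continues to see the emptied table.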

# Testing the parser

In [None]:
parse()       # interactive, one expression at a time

In [None]:
good()        # run current parser against its own pass test. Use good('1234') to run against specific pass test.

In [None]:
bad()         # run current parser against its own fail test. Use bad('5678') to run against specific fail test.

In [None]:
regress()     # run parser against all previous tests