# Parser 6.00 - Conversion to Classes

This version does not add any new capabilities. Instead, it begins the process of converting the parser and evaluator from standalone functions into Python classes.

The main motivation is that the *\_\_init\_\_()* function (method) of a class offers the attractive possibility of paying a one-time initialization cost to set up anything we might want to reuse several times during the lifetime of an instance of the class. Even if we don't use this capability right away, we'll always have the option.
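
As a rough illustration of that one-time cost, here is a hypothetical *TokenMatcher* class (not part of our parser) that compiles two of our token patterns once in *\_\_init\_\_()* and reuses them on every call. Python's *re* module already caches recently used patterns, so the saving in this particular case would be small. The point is the shape of the idea, not a performance claim.

```python
import re

class TokenMatcher:
    '''hypothetical example: compile regular expressions once, reuse them many times'''

    def __init__(self):
        # one-time setup cost, paid when the instance is created
        self._hex = re.compile('0[xX][0-9a-fA-F]+')
        self._dec = re.compile('[0-9]+')

    def match(self, text):
        '''return the leading numeric literal in text, or None'''
        for pattern in (self._hex, self._dec):
            m = pattern.match(text)
            if m:
                return m.group()
        return None

matcher = TokenMatcher()                # patterns are compiled here, once
print(matcher.match('0x1F + 2'))        # 0x1F
print(matcher.match('42 * 3'))          # 42
print(matcher.match('+ 7'))             # None
```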

The main reason for doing the conversion now is that the parser and evaluator are still simple enough that the job is fairly straightforward.

An additional reason is that, as has already been mentioned, part of the purpose of this series is simply to learn how things work in Python in the first place.

While we could convert every set of related functions we've created so far (i.e., all the scaffolding functions) into classes, we'll stick to just the parser and evaluator. That should be sufficient for our purposes.

## Libraries

In [None]:
import glob       # for searching directories

import re         # for regular expressions

## User output

In [None]:
visSep = '-------------'             # visual separator

def UIwriteln(this):
    '''write a single line to output'''
    print( f'{this}\n' )
    
def UIwriteSep():
    '''write a visual separator'''
    UIwriteln( visSep )

def UIshow(tag, value):
    '''write a tagged value to output'''
    UIwriteln( f'{tag}: {value}' )

def UIerror(this):
    '''write an error message to output'''
    UIshow( 'Error', this )

# Tracing

In [None]:
# flags: show trace of processing

showInteract = True          # default for interactive use
showBatch = False            # default for batch use

showTrace = None             # control flag

# Trace Output

def TOshow(mesg, text):
    '''write trace message to output if enabled'''
    if showTrace:
        UIshow( f'{mesg:15s}', text )
        
def TOstring(tag, this):
    '''write a sequence as a space-separated trace message if enabled'''
    if showTrace:
        TOshow( tag, ' '.join([str(e) for e in this]) )

# -----------------------
# Parse Tracing
# -----------------------

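def PTshowexpr(this):
    '''trace the portion of the expression still to be parsed'''
    TOshow( 'Parse', visSep )
    TOshow( 'Current Expr', this )

def PTshowparse(ok, res, stk):
    '''trace the RPN output and operator stack after each step'''
    if ok:
        TOstring( 'Current RPN', res )
        TOstring( 'Operator Stack', stk )

def PTshowtoken(this):
    '''trace a matched token, ignoring whitespace'''
    if this[0] != ' ':
        TOshow( "Found Token", this )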

# -----------------------
# Evaluation Tracing
# -----------------------

def ETshowtoken(this):
    '''trace the RPN element about to be evaluated'''
    TOshow( 'Eval', visSep )
    TOshow( 'Current token', this )

def ETshoweval(ok, stk):
    '''trace the operand stack after each step'''
    if ok:
        TOstring( 'Operand Stack', stk )


# Common

In [None]:
# intMax =  4294967295                # 2**32-1, for range checking
# intMin = -4294967296                # -(2**32)   

At present both the parser and evaluator need to know what these values are. If we want our classes to be self-contained we'd prefer not to leave them dependent on global values external to themselves. Is there another way to specify these and get rid of *Common* altogether?

At first sight, the simplest approach would be to copy these values into both classes and be done with it. However, if these values ever changed, we'd have to remember to change them in both places. This is the sort of requirement that is easy to overlook, and we'd have to document it somewhere (at the very least with a comment) to remind ourselves.

Another approach would be to make only one class know them. One idea might be to implement all range checking in a separate *Limits* class. The apparent drawback is that we'd need to find a way to move all the various checks we've already implemented into it. At first glance this seems more trouble than it's worth, and it would leave us with more classes than we really want.

Since the parser currently has only one range check, a simpler approach might be to let only the *Evaluator* class know about the limits and require the *Parser* class to use the evaluator to check if an integer conversion has gone out of range. Changes to the evaluator would be fairly minimal.
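
A rough sketch of this option follows. All names here are hypothetical. For brevity, Python's built-in *int()* stands in for our digit-by-digit *convertUint()* loop. The only point is that the limits live in one class and the parser asks that class for a verdict.

```python
class Evaluator:
    '''hypothetical sketch: only the evaluator knows the limits'''

    def __init__(self):
        self._INTMAX =  4294967295       # 2**32-1
        self._INTMIN = -4294967296       # -(2**32)

    def uintInRange(self, value):
        '''True if an unsigned conversion result fits the limits'''
        return 0 <= value <= self._INTMAX


class Parser:
    '''hypothetical sketch: the parser borrows the evaluator's range check'''

    def __init__(self, evaluator):
        self._evaluator = evaluator

    def convertUint(self, ulit, base):
        '''convert an unsigned literal, or return None if out of range'''
        value = int(ulit, base)          # int() accepts a '0x' prefix when base is 16
        return value if self._evaluator.uintInRange(value) else None


p = Parser(Evaluator())
print(p.convertUint('4294967295', 10))   # 4294967295
print(p.convertUint('0x100000000', 16))  # None (2**32 is out of range)
```

The cost of this design is that every *Parser* now needs an *Evaluator* to exist first, which couples the two more tightly than the range check alone would justify.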

We could also create a *Limits* base class and derive both the *Parser* and *Evaluator* classes from it, thereby incorporating its class "constants" into them. This has a certain appeal but also a certain inelegance about it as *Limits* itself doesn't really seem foundational.
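
For comparison, a minimal sketch of the *Limits* base class idea might look like the following. The class and method names are hypothetical. Here the limits are class attributes, so they exist without any *\_\_init\_\_()* at all, and both derived classes find them through ordinary inheritance.

```python
class Limits:
    '''hypothetical base class holding shared range "constants"'''
    INTMAX =  4294967295                 # 2**32-1
    INTMIN = -4294967296                 # -(2**32)

class Parser(Limits):
    def fitsUint(self, value):
        return 0 <= value <= self.INTMAX # INTMAX is found on Limits via inheritance

class Evaluator(Limits):
    def canNegate(self, value):
        return value != self.INTMIN      # -INTMIN would exceed INTMAX

print(Parser().fitsUint(2**32))          # False
print(Evaluator().canNegate(-2**32))     # False
print(Parser.INTMAX == Evaluator.INTMAX) # True: one shared definition
```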

Perhaps the most elegant solution would be to create only one class incorporating the range limits as instance variables and two methods, *doparse()* and *doevaluate()*. In a real world application this is probably what we would do. However for learning and expository purposes we'd prefer to keep the parser and evaluator in separate code cells.

>It's actually possible to have one Python class split over multiple notebook cells. While Jupyter notebooks themselves have no built-in way to do this, there are some [informal methods](https://stackoverflow.com/questions/45161393/jupyter-split-classes-in-multiple-cells/45161859).

What we'll do here is define these and any other values needed by both the parser and evaluator in the *Parser* class. Then we'll use it as a base class to derive the *Evaluator* class. This will give the evaluator access to these values without exposing them to anything outside either class (at least in theory).

>We'll define both limit values in the *Parser* class, even though it needs only *\_INTMAX*, to keep them together in case of future changes.

# Parser

In [None]:
# operands accepted:
# - decimal integer literals
# - hexadecimal integer literals

# operators accepted:
# - unary negation, plus
# - binary addition, subtraction, multiplication, division
# - grouping parentheses

# errors detected:
# - unrecognized input
# - out of range numeric input
# - malformed expression

# result tuple:
# - (True, [parse])
# - (False, None)

class Parser(object):

    def __init__(self):
    
        self.VERSIONNUMBER = '6.00'
    
        self._INTMAX =  4294967295                # 2**32-1, for heritable range checking
        self._INTMIN = -4294967296                # -(2**32)   

    def doparse(self, this):

        # initialize

        expr = this                # save to new variable but retain original for error reports
        start = 15                 # tracked so we can report where in an expression an error occurred
                                   # (15 = len('>>> ') + len('^^near here'), so the caret starts under the first character)
        token = None               # anything successfully matched
        ok = wantoperand = True    # flags
        result = []                # rpn expression
        stk = [ ('EOE', 1) ]       # operator stack

        def parseErr(mesg):
            '''report parse error'''
            UIerror(mesg)
            UIwriteln(f'>>> {this}')
            UIwriteln(f'{"^^near here".rjust(start)}')
            return False

        def popGEop(prec):
            '''pop operators of equal or greater precedence'''
            while prec <= stk[-1][1]:
                result.append(stk.pop()[0])

        def pushLeft(op, prec):
            '''push left associative operator on stack'''
            popGEop(prec)
            stk.append( (op, prec) )

        def popGop(prec):
            '''pop operators of greater precedence'''
            while prec < stk[-1][1]:
                result.append(stk.pop()[0])

        def pushRight(op, prec):
            '''push right associative operator on stack'''
            popGop(prec)
            stk.append( (op, prec) )

        # clear operators off stack until reaching target operator

        def popUntil(op, prec):
            '''clear and check operator stack'''
            popGEop(prec)
            if op == stk.pop()[0]:      # top remaining operator is the one we want to see ?
                return True             # yes...

            # ...no

            elif op == '(':
                return parseErr('Unmatched right parenthesis')
            elif op == 'EOE':
                return parseErr('Unmatched left parenthesis')

        # convert unsigned literal to internal form

        def convertUint(ulit, base):

            uint = 0

            # isolate the significant portion of 'ulit'

            p = re.search('[1-9A-F][0-9A-F]*', ulit.upper())

            if p != None:
                for digit in p.group():
                    digval = '0123456789ABCDEF'.find(digit)
                    if uint <= (self._INTMAX - digval)/base:
                        uint =  uint * base + digval
                    else:
                        return parseErr(f'\'{ulit}\' is out of range')

            result.append(uint)
            return True

        # test if expression starts with given regular expression

        def startsWith(regex):

            nonlocal expr, start, token

            p = re.match(regex, expr)
            if p == None:
                return False
            else:
                token = p.group()              # what we matched
                start += len(token)            # update to next match position in original string
                expr = expr[len(token):]       # "chop off" what we matched
                PTshowtoken(token)             # trace
                return True

        # top level main loop

        while ok and len(expr):

            _ = startsWith('[ ]+')                             # skip leading whitespace
            if not expr:                                       # only trailing whitespace was left ?
                break

            PTshowexpr(expr)                                   # trace

            # look for operand

            if wantoperand:

                if startsWith('[(]'):
                    '''left parenthesis ?'''
                    stk.append( ('(', 2) )                     # push directly on stack


                elif startsWith('[-+]'):
                    '''unary negation or plus ?'''
                    pushRight( 'U' + token, 80 )                # decorate

                else:

                    wantoperand = False                         # flip

                    if startsWith('0[xX][0-9a-fA-F]+'):
                        '''unsigned hexadecimal literal ?'''
                        ok = convertUint(token, 16)

                    elif startsWith('[0-9]+'):
                        '''unsigned decimal literal ?'''
                        ok = convertUint(token, 10)

                    else:
                        '''malformed'''
                        ok = parseErr('Expecting operand')

            # look for operator

            else:

                if startsWith('[)]'):
                    ok = popUntil( '(', 4 )

                else:

                    wantoperand = True                                # flip

                    if startsWith('[*/]'):
                        '''binary multiplication or division ?'''
                        pushLeft( token, 70 )

                    elif startsWith('[-+]'):
                        '''binary addition or subtraction ?'''
                        pushLeft( 'B' + token, 60 )                   # decorate

                    else:
                        '''malformed'''
                        ok = parseErr('Expecting operator')

            PTshowparse(ok, result, stk )                         # trace

        if ok:
            if wantoperand:
                ok = parseErr('Unexpected end of expression')     # must be in 'wantoperator' state   
            else:
                ok = popUntil( 'EOE', 3 )                         # clear operator stack

        return (ok, result if ok else None)                       # done

### How it works

Conversion to a class is fairly straightforward.

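We explicitly derive *Parser* from *object*. In Python 3 every class derives from *object* whether we say so or not, so this is redundant but harmless. It mattered only in Python 2, where it distinguished new-style from old-style classes.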
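
A quick check in Python 3 shows that listing *object* as the base class makes no difference. The two class names here are purely illustrative.

```python
class Implicit:                      # no base class listed
    pass

class Explicit(object):              # base class listed explicitly
    pass

print(Implicit.__bases__)            # (<class 'object'>,)
print(Explicit.__bases__)            # (<class 'object'>,)
print(issubclass(Implicit, object))  # True
```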

We use the *\_\_init\_\_()* function to initialize a few instance variables, which we treat as constants. Some we don't want to expose to anything outside the class (or its children), so we prefix them with an underscore. Python doesn't enforce this. A single leading underscore is only a convention signalling that a name is private, and nothing stops outside code from reading or changing it.
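
A single underscore is also the right choice for values a child class must read. A double leading underscore triggers Python's name mangling: the attribute is stored under a name that includes the defining class. A child class then can't find it under the same spelling. The class names below are hypothetical.

```python
class Base:
    def __init__(self):
        self._SINGLE = 1        # convention only: "please don't touch"
        self.__DOUBLE = 2       # name-mangled to self._Base__DOUBLE

class Child(Base):
    def single(self):
        return self._SINGLE     # works: ordinary attribute lookup

    def double(self):
        return self.__DOUBLE    # mangled to self._Child__DOUBLE, which doesn't exist

c = Child()
print(c.single())               # 1
print(c._SINGLE)                # 1: nothing stops outside access
print(c._Base__DOUBLE)          # 2: the mangled name
try:
    c.double()
except AttributeError as e:
    print('AttributeError:', e)
```

Had *Parser* stored its limits as *self.\_\_INTMAX*, the evaluator's references would fail in the same way as *c.double()* above.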

We add a *self* parameter to *doparse()* to receive the class instance, which Python passes implicitly. The call *myParser.doparse(this)* is equivalent to *Parser.doparse(myParser, this)*.
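
The same equivalence in miniature, using a throwaway *Counter* class (hypothetical, not part of the parser):

```python
class Counter:
    def __init__(self):
        self.count = 0

    def bump(self, step):
        self.count += step
        return self.count

c = Counter()
print(c.bump(2))              # 2: Python passes c as self
print(Counter.bump(c, 3))     # 5: the same call with self made explicit
```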

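We change the reference to the global *intMax* in the *convertUint()* function to *self._INTMAX*, the instance's own private copy of the integer maximum value. Because *convertUint()* is nested inside *doparse()*, it sees *self* through the enclosing scope and doesn't need a *self* parameter of its own.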
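
A small sketch of the same scoping, with hypothetical names. The nested *keep()* reads *self* and appends to *kept* from the enclosing method, just as *convertUint()* reads *self.\_INTMAX* and appends to *result*. Mutating a list needs no *nonlocal*. Rebinding a name does, which is why *startsWith()* declares *nonlocal expr, start, token*.

```python
class Outer:
    def __init__(self):
        self._LIMIT = 10

    def run(self, values):
        kept = []

        def keep(v):
            # no 'self' parameter: 'self' and 'kept' come from run()'s scope
            if v <= self._LIMIT:
                kept.append(v)     # mutating, not rebinding, so no 'nonlocal' needed
                return True
            return False

        ok = [keep(v) for v in values]
        return kept, ok

print(Outer().run([3, 12, 7]))     # ([3, 7], [True, False, True])
```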

And that's it.

# Evaluator

In [None]:
# operators handled:
# - unary negation, plus
# - binary addition, subtraction, multiplication, division

# errors detected:
# - out of range
# - division by zero

# return tuple:
# - (True, result)
# - (False, None)

class Evaluator(Parser):
    
    def __init__(self):
        super().__init__()
    
    def doeval(self, rpn):

        stk = []
        ok = True

        def inRange(ok, val):
            '''range check test result'''
            if ok:
                stk.append( val )
            else:
                UIerror( 'Evaluation result out of range' )
            return ok

        def unNeg():
            '''unary negation'''
            arg = stk.pop()
            return inRange( arg != self._INTMIN, -arg )

        def binAdd():
            '''binary addition'''
            rgt = stk.pop()
            lft = stk.pop()

            if lft >= 0:
                return inRange( rgt <= self._INTMAX - lft, lft+rgt )       
            else:
                return inRange( rgt >= self._INTMIN - lft, lft+rgt )

        def binSub():
            '''binary subtraction'''
            rgt = stk.pop()
            lft = stk.pop()

            if lft >= 0:
                return inRange( lft - self._INTMAX <= rgt, lft-rgt )
            else:
                return inRange( lft - self._INTMIN >= rgt, lft-rgt )

        def binMul():
            '''binary multiplication'''
            rgt = stk.pop()
            lft = stk.pop()

            if lft == 0 or rgt == 0:
                return inRange( True, 0 )

            if lft > 0:
                if rgt > 0:
                    return inRange( rgt <= self._INTMAX / lft, lft*rgt )
                else:
                    return inRange( rgt >= self._INTMIN / lft, lft*rgt )

            else:
                if rgt > 0:
                    return inRange( rgt <= self._INTMIN / lft, lft * rgt )
                else:
                    return inRange( rgt >= self._INTMAX / lft, lft * rgt )

        def binDiv():
            '''binary division'''
            rgt = stk.pop()
            lft = stk.pop()

            if rgt != 0:
                # floored division so result is an integer; INTMIN // -1 is the one quotient that exceeds INTMAX
                return inRange( not (lft == self._INTMIN and rgt == -1), lft//rgt )
            else:
                UIerror( 'Division by zero' )
                return False

        # main loop

        for v in rpn:

            ETshowtoken(v)

            if v == 'U-':          # unary negation ?
                ok = unNeg()

            elif v == 'B+':        # binary addition ?
                ok = binAdd()

            elif v == 'B-':        # binary subtraction ?
                ok = binSub()

            elif v == '*':         # binary multiplication ?
                ok = binMul()

            elif v == '/':         # binary division ?
                ok = binDiv()

            elif v != 'U+':        # it's probably an operand
                stk.append( v ) 

            if not ok:
                return (False, None)

            ETshoweval( ok, stk )

        return ( True, stk.pop() )


### How it works

This is even simpler than for the parser.

We derive *Evaluator* from *Parser* so we can get at the integer limit values it contains.

Since those values are instance rather than class variables, we need to make sure *Parser.\_\_init\_\_()* runs. We explicitly call it via *super()* just to make very clear what's going on.

>The *Evaluator* class has nothing in particular to initialize itself. It doesn't actually need an *\_\_init\_\_()* function at all. If omitted, *Parser.\_\_init\_\_()* would still be implicitly called during instantiation. Either way works.
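
The sketch below (class names hypothetical) shows three things. A subclass with no *\_\_init\_\_()* of its own still runs its parent's. An instance variable exists only on instances. A subclass that defines *\_\_init\_\_()* but forgets *super().\_\_init\_\_()* silently loses the parent's setup. It also shows the class-attribute alternative, which needs no *\_\_init\_\_()* at all.

```python
class WithInit:
    def __init__(self):
        self._INTMAX = 4294967295          # instance attribute, created by __init__

class NoOwnInit(WithInit):
    pass                                   # inherits WithInit.__init__

class ForgetsSuper(WithInit):
    def __init__(self):
        pass                               # parent __init__ never runs

class ClassLevel:
    _INTMAX = 4294967295                   # class attribute, exists without any __init__

print(NoOwnInit()._INTMAX)                 # 4294967295: parent __init__ ran anyway
print(hasattr(ForgetsSuper(), '_INTMAX'))  # False: nobody called super().__init__()
print(hasattr(WithInit, '_INTMAX'))        # False: only instances have it
print(ClassLevel._INTMAX)                  # 4294967295: readable from the class itself
```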

We change each reference to a global limit value to the instance's own copy of those limit values.

We add a *self* parameter to *doeval()*.

And that's it.

## Running the parser

In [None]:
passCnt = failCnt = 0                       # most useful for test input files, but never any harm

myParser = myEvaluator = None               # where we keep instances of our classes

def startUp(flag):
    '''begin execution'''
    global passCnt, failCnt, showTrace
    global myParser, myEvaluator
    if not myParser:
        myParser = Parser()
    if not myEvaluator:
        myEvaluator = Evaluator()
    UIshow( 'Parser', myParser.VERSIONNUMBER )
    passCnt = failCnt = 0
    showTrace = flag
    
def shutDown():
    '''terminate execution'''
    UIwriteSep()
    UIshow( 'Pass', passCnt )
    UIshow( 'Fail', failCnt )
    
# run parser

def parseOne(this):
    '''parse/evaluate one expression'''
    global passCnt, failCnt
    UIwriteSep()
    UIshow( 'Input', this )
    ok, res = myParser.doparse( this )
    if ok:
        UIshow( 'Final Parse', res )
        ok, res = myEvaluator.doeval( res )
        if ok:
            UIshow( 'Final Eval', res )
    if ok:
        passCnt += 1
    else:
        failCnt += 1

We now create instances of both the parser and evaluator in the scaffold function *startUp()*. We make the variables referencing them global so we don't have to pass them all around our scaffolding functions.

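It's probably not necessary to check whether an instance already exists before creating one. If we simply overwrote *myParser* and *myEvaluator* with new instances each time *startUp()* is called, Python's automatic garbage collection would reclaim the old, no-longer-referenced ones. Nevertheless, the check is cheap and saves the (small) CPU time and memory of re-creating the instances. One notebook caveat: if we edit and re-run the *Parser* or *Evaluator* cell, the existing instances still belong to the old class definitions. To pick up the changes, we also need to re-run the cell above, which resets both variables to *None*.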
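
Re-running a cell that defines a class creates a brand-new class object, and instances made earlier keep pointing at the old one. The hypothetical stand-in below uses a class attribute for brevity to show this:

```python
class Parser:
    VERSIONNUMBER = '6.00'           # hypothetical stand-in for the real class

myParser = Parser()                  # like startUp(): instance created once

class Parser:                        # cell edited and re-run: a new class object
    VERSIONNUMBER = '6.01'

print(myParser.VERSIONNUMBER)        # 6.00: the old instance still uses the old class
print(Parser().VERSIONNUMBER)        # 6.01
print(isinstance(myParser, Parser))  # False
```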

Now that we have parser and evaluator instances, we must remember to use them when we want to access their data and methods. So *startUp()* reads the version number from *myParser.VERSIONNUMBER*, and *parseOne()* calls *myParser.doparse()* and *myEvaluator.doeval()*.

## Interactive use

In [None]:
def parse():
    
    startUp(showInteract)
    while True:
        inp = input( 'Expression: ' )
        UIwriteln( '' )                      # looks better with a blank line here
        if inp.upper().startswith('Q'):     # startswith() also copes with empty input
            break
        elif inp.strip():
            parseOne( inp )
    shutDown()

## Batch processing

In [None]:
testDir = '..\\ParserTest\\'            # directory holding test input files (empty string if same as notebook directory)

# convert current version number to match test file numbers
# - done this way so we can update only the version number and everything still works

def currNum():
    
    head = myParser.VERSIONNUMBER[:len(myParser.VERSIONNUMBER)-3]
    tail = myParser.VERSIONNUMBER[-2:]
    return f'{head:0>2}{tail}'

# make full path name to test file

def makePath(typ, num):
    return f'{testDir}{typ}{num}.txt'

# run one test

def runTest(this):
    
    UIwriteln(f'Parser {myParser.VERSIONNUMBER} vs {this[-12:-4]}')
    
    with open(this) as f:
        data = f.readlines()
    for line in data:
        test = line.strip()
        if test and test[0] != '#':         # skip blank and comment lines
            parseOne(test)
    
# run a test of current or specified version which should succeed
    
def good(num='curr'):
  
    startUp(showBatch)
    runTest(makePath('pass', currNum() if num == 'curr' else num))
    shutDown()
    
# run a test of current or specified version which should fail

def bad(num='curr'):
    
    startUp(showBatch)
    runTest(makePath('fail', currNum() if num == 'curr' else num))
    shutDown()
    
# run regression test against current and all previous test files

def regress():
            
    UIwriteln('PASS tests')
    
    startUp(showBatch)                       # must create objects before we can access variables inside them 
    currFn = makePath('pass', currNum())
    failed = []
    fnlist = sorted(glob.glob(f'{testDir}pass????.txt'))     # glob order is arbitrary; sort so oldest run first
    for fn in fnlist:
        if fn <= currFn:
            atstart = failCnt
            runTest(fn)
            if atstart < failCnt:
                failed.append(fn)               
    shutDown()
    
    UIwriteln('FAIL tests')
    
    startUp(showBatch)
    currFn = makePath('fail',currNum())
    passed = []
    fnlist = sorted(glob.glob(f'{testDir}fail????.txt'))     # glob order is arbitrary; sort so oldest run first
    for fn in fnlist:
        if fn <= currFn:
            atstart = passCnt
            runTest(fn)
            if atstart < passCnt:
                passed.append(fn)               
    shutDown()
    
    if not failed:
        UIwriteln('All pass tests succeeded')
    else:
        UIwriteln('Pass tests which failed')
        for fn in failed:
            UIwriteln(f'  {fn}')
            
    if not passed:
        UIwriteln('All fail tests succeeded')
    else:
        UIwriteln('Fail tests which passed')
        for fn in passed:
            UIwriteln(f'  {fn}')
              

In previous versions, running the cells containing the parser and evaluator before running this one made their functions available here. Now running those cells only defines the classes, and they must be explicitly instantiated before we can use them.

In particular, *currNum()* needs the value of *VERSIONNUMBER*. Since *startUp()* is where our classes get instantiated, we've rearranged the order of function calls so that the call to *startUp()* comes before any call to *currNum()*.

The same requirement existed when *versionNumber* was a global variable: the cell defining its value had to be run before this cell (it just wasn't quite so obvious).
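
The ordering matters because *VERSIONNUMBER* is set inside *\_\_init\_\_()*, so it exists only on instances and not on the class itself. A minimal stand-in for *Parser* shows this, together with a copy of *currNum()*:

```python
class Parser:
    def __init__(self):
        self.VERSIONNUMBER = '6.00'           # set per instance, as in the Parser cell

print(hasattr(Parser, 'VERSIONNUMBER'))       # False: the class alone doesn't have it
myParser = Parser()                           # what startUp() does first

def currNum():
    head = myParser.VERSIONNUMBER[:len(myParser.VERSIONNUMBER)-3]
    tail = myParser.VERSIONNUMBER[-2:]
    return f'{head:0>2}{tail}'

print(currNum())                              # 0600
```

If *VERSIONNUMBER* were instead a class attribute, *currNum()* could read *Parser.VERSIONNUMBER* without any instance. That is an alternative design, not what this version does.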

# Testing the parser

In [None]:
parse()       # interactive, one expression at a time

In [None]:
good('0500')  # use last version with functional changes

In [None]:
bad('0500')   # use last version with functional changes

In [None]:
regress()     # run parser against all previous tests

Because this version has no new capabilities, there are no new tests for it. We can still run it against any previous version's tests (and should, to make sure we didn't mess anything up by accident).