# Parser 2.00 - Arithmetic Addition Operator

The goal of this version is for the parser to accept and evaluate [infix expressions](https://en.wikipedia.org/wiki/Infix_notation) consisting of only decimal and hexadecimal numeric values and the [binary](https://en.wikipedia.org/wiki/Binary_operation) arithmetic addition operator **‘+’**.

For such a limited aim the code changes are surprisingly extensive. There is also a new facility for tracing parsing and evaluation. This is not necessary for proper operation of the parser or evaluator but can be helpful in understanding what happens (or sadly, doesn’t).

Infix expressions like

```Python
1 + 1
```

or

```Python
1 + 2 + 3
```

have two properties that we are going to take advantage of:

- they start and end with a number


- they have a ‘+’ symbol between each pair of numbers

These may seem too obvious to mention, but the strictly alternating pattern of numbers (operands) and **‘+’** signs (operators) allows us to design the parser as a simple [state machine](https://en.wikipedia.org/wiki/Finite-state_machine). Here it is very simple indeed, as there are only two states: *expecting operand* and *expecting operator*.

We can also use the state concept to decide if an expression is properly formed. The initial state is always *expecting operand*, and if we do not find one the expression is malformed. When a legal expression has been completely parsed the final state is always *expecting operator*, because the last thing that can be legally accepted is an operand. So if the final state is *expecting operand*, the expression is malformed.

Note that by these rules a single operand by itself is properly formed, so everything that we have done up to this point still works.
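
The two-state idea can be tried out independently of the real parser. The following is only a minimal sketch, assuming the input has already been split into tokens; the function name `is_well_formed` and the token-list format are hypothetical and are not used anywhere else in this notebook.

```python
def is_well_formed(tokens):
    """Return True if tokens strictly alternate operand, '+', operand, ...

    Hypothetical helper for illustration only: any token other than '+'
    is treated as an operand.
    """
    want_operand = True                    # initial state: expecting operand
    for tok in tokens:
        is_operator = (tok == '+')
        if want_operand == is_operator:    # wrong kind of token for this state
            return False
        want_operand = not want_operand    # toggle state after each token
    return not want_operand                # must finish in "expecting operator"

print(is_well_formed(['1']))               # True: a single operand is legal
print(is_well_formed(['1', '+', '2']))     # True
print(is_well_formed(['1', '+']))          # False: ends while expecting operand
print(is_well_formed([]))                  # False: no operand found at all
```

The real parser below follows the same pattern, but it recognizes and converts the tokens as it goes instead of receiving them ready-made.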

## Libraries

In [None]:
import glob       # for searching directories

import re         # for regular expressions

## User output

In [None]:
visSep = '-------------'             # visual separator

def UIwriteln(this):
    '''write a single line to output'''
    print( f'{this}\n' )
    
def UIwriteSep():
    '''write a visual separator'''
    UIwriteln( visSep )

def UIshow(tag, value):
    '''write a tagged value to output'''
    UIwriteln( f'{tag}: {value}' )

def UIerror(this):
    '''write an error message to output'''
    UIshow( 'Error', this )

# Tracing

In [None]:
# flags: show trace of processing

showInteract = True          # default for interactive use
showBatch = False            # default for batch use

showTrace = None             # control flag

# Trace Output

def TOshow(mesg, text):
    '''write trace message to output if enabled'''
    if showTrace:
        UIshow( f'{mesg:15s}', text )

# -----------------------
# Parse Tracing
# -----------------------

def PTshowexpr(this):

    TOshow( 'Parse', visSep )
    TOshow( 'Current Expr', this )

def PTshowparse(ok, res):

    if ok:
        TOshow( 'Current Parse', ' '.join([str(e) for e in res]) )

def PTshowtoken(this):

    if not this[0] == ' ':
        TOshow( "Found Token", this )

# -----------------------
# Evaluation Tracing
# -----------------------

def ETshowtoken(this):
    
    TOshow( 'Eval', visSep )
    TOshow( 'Current token', this )


Tracing is not necessary to the proper operation of either the parser or evaluator, but it is quite helpful while a new version is still being debugged. Once a new version is behaving nicely, the main use of tracing is to provide insight into what is going on “under the hood”.
 
By default tracing is active when the parser is being used interactively and suppressed when input is coming from a file. This behavior is controlled by two flags, *showInteract* and *showBatch*, set when the block is run and left alone thereafter. These in turn are used to set the *showTrace* flag during parser test startup.

Tracing does slow operation. This is not a major concern as long as this project is purely a learning exercise. If transplanted for use elsewhere, tracing should probably be removed.

# Common

In [None]:
uintMax = 4294967295                  # 2**32-1, for range checking

We move this value to a global variable because now the evaluator also needs to know what it is.

# Parser

In [None]:
versionNumber = '2.00'

# operands accepted:
# - decimal integer literals
# - hexadecimal integer literals

# operators accepted:
# - binary addition

# errors detected:
# - unrecognized input
# - out of range numeric input
# - malformed expression

# result tuple:
# - (True, [parse])
# - (False, None)

def PEdoparse(this):
       
    # initialize
    
    expr = this                # save to new variable but retain original for error reports
    start = 15                 # column for the error marker: 4 for '>>> ' plus 11 for '^^near here', advanced as tokens are matched
    token = None               # anything successfully matched
    ok = wantoperand = True    # flags
    result = []                # return value is built up here
               
    # report parse error

    def parseErr(mesg):
    
        UIerror(mesg)
        UIwriteln(f'>>> {this}')
        UIwriteln(f'{"^^near here".rjust(start)}')
        return False

    # convert unsigned literal to internal form
    # - all chars in input known to be legal hexadecimal characters
    # - checks that value is within range
    
    def convertUint(ulit, base):
        
        uint = 0
        
        # isolate the significant portion of 'ulit'
        # - this drops any leading prefix and all leading zeroes
        # - if search fails, then input is all zeroes (and so is value)
        
        p = re.search('[1-9A-F][0-9A-F]*', ulit.upper())
        
        if p is not None:
            for digit in p.group():
                digval = '0123456789ABCDEF'.find(digit)
                if uint <= (uintMax - digval) // base:    # integer test for: uint * base + digval <= uintMax
                    uint = uint * base + digval
                else:
                    return parseErr(f'\'{ulit}\' is out of range')
        
        result.append(uint)
        return True
    
    # test if expression starts with given regular expression
    
    def startsWith(regex):
        
        nonlocal expr, start, token
        
        p = re.match(regex, expr)
        if p is None:
            return False
        else:
            token = p.group()              # what we matched
            start += len(token)            # update to next match position in original string
            expr = expr[len(token):]       # "chop off" what we matched
            PTshowtoken(token)             # trace
            return True
                 
    # top level main loop
    
    while ok and len(expr):
        
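        # skip leading whitespace
        # - stop if nothing is left, so that trailing whitespace is not reported as an error
        
        if startsWith('[ ]+') and not expr:
            break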
            
        # trace
        
        PTshowexpr(expr)
            
        # look for operand
             
        if wantoperand: 
        
            # unsigned hexadecimal literal ?
    
            if startsWith('0[xX][0-9a-fA-F]+'):
                ok = convertUint(token, 16)
        
            # unsigned decimal literal ?
    
            elif startsWith('[0-9]+'):
                ok = convertUint(token, 10)
    
            # don't know what it is
    
            else:
                ok = parseErr('Expecting operand')
            
        # look for '+' operator
        
        else:
            
            if not startsWith('[+]'):
                ok = parseErr('Expecting operator')
            
        # trace
        
        PTshowparse(ok, result )
            
        # toggle state
        
        wantoperand = not wantoperand
        
    # at end must be in 'wantoperator' state
    # - i.e., last token must be an operand
                    
    if ok and wantoperand:
        ok = parseErr('Unexpected end of expression')
                    
    # done
                    
    return (ok, result if ok else None)

### How it works

The basic idea is to loop through the input expression. We use *startsWith()* to see if we can match something we're looking for. If we can, we "chop it off", shortening the expression we are parsing.

- whitespace between tokens is ignored


- operands are converted into internal form and saved to a list


- operators are ignored

>Because there is only one operator it can simply and safely be implied in the final result.
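
The "match and chop" idea can be seen in isolation in the sketch below. The helper name `chop` and its return convention are assumptions made for illustration; the real *startsWith()* updates variables in the enclosing function instead of returning them.

```python
import re

def chop(regex, expr):
    """Try to match regex at the start of expr.

    Returns (token, rest) on success and (None, expr) on failure.
    """
    m = re.match(regex, expr)
    if m is None:
        return None, expr
    token = m.group()
    return token, expr[len(token):]         # "chop off" what was matched

expr = '0x1F + 2'
token, expr = chop('0[xX][0-9a-fA-F]+', expr)   # token is '0x1F', expr is ' + 2'
_, expr = chop('[ ]+', expr)                    # whitespace dropped, expr is '+ 2'
token, expr = chop('[+]', expr)                 # token is '+', expr is ' 2'
print(repr(token), repr(expr))
```

Each successful match shortens the remaining expression, so the loop naturally ends when nothing is left.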

If we cannot make any match, or if a conversion overflow occurs, we have to report an error. This is centralized in *parseErr()*, using variable values we have been tracking specifically for this purpose.

>It's not enough simply to report an error occurred. We want to give the user a fighting chance to figure out what happened.
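
To see how the tracked position becomes the "near here" marker, here is a small sketch. The function `caret_line` and the constants are illustrative assumptions; in the parser the equivalent value is the `start` variable, which begins at 15 (4 characters for the `>>> ` prefix plus 11 for the marker text) and grows by the length of every token matched.

```python
PREFIX = '>>> '
MARKER = '^^near here'

def caret_line(consumed):
    """Build the marker line for an error found after 'consumed' characters."""
    width = len(PREFIX) + len(MARKER) + consumed
    return MARKER.rjust(width)              # pad on the left so '^^' lines up

expr = '1 + x'
print(PREFIX + expr)
print(caret_line(4))       # '1 + ' (4 characters) was matched before the error
```

Right-justifying the marker in a field of that width places the `^^` directly under the first character that could not be matched, in this case the `x`.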

The loop is broken when we run out of expression (everything has been chopped off) or encounter an error. We make one last check to see if we're in the proper state. Then we return our success/fail flag and any result.

>We could also have pre-compiled the regular expressions we use, giving us access to some additional functions that in some respects would make the parser as a whole simpler. The main reason we didn't is that we then couldn't keep the parser completely self-contained in one function. In a production version we might very well use a class to get around this.
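
For comparison, here is a rough sketch of what a pre-compiled approach might look like. It relies on the optional `pos` argument of a compiled pattern's `match()` method, which starts matching at an offset so the string never has to be chopped. The names `TOKEN_PATTERNS` and `tokens` are hypothetical, the sketch only splits the input into tokens without checking operand/operator order, and it is not the design used in this notebook.

```python
import re

# compiled once, outside the function (which is what breaks self-containment)
TOKEN_PATTERNS = [
    ('hex',   re.compile('0[xX][0-9a-fA-F]+')),   # must be tried before 'dec'
    ('dec',   re.compile('[0-9]+')),
    ('plus',  re.compile('[+]')),
    ('space', re.compile('[ ]+')),
]

def tokens(expr):
    """Yield (kind, text) pairs; raise ValueError on unrecognized input."""
    pos = 0
    while pos < len(expr):
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(expr, pos)          # match at offset 'pos'
            if m:
                if kind != 'space':               # whitespace is dropped
                    yield kind, m.group()
                pos = m.end()
                break
        else:
            raise ValueError(f'unrecognized input at position {pos}')

print(list(tokens('0x10 + 7')))   # [('hex', '0x10'), ('plus', '+'), ('dec', '7')]
```

Because `pos` is an index into the original string, it could also serve directly as the error position, with no separate bookkeeping.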

Because the character **'+'** used to signify addition is also a [metacharacter](https://www.regular-expressions.info/characters.html) in regular expressions, we have to indicate that we want its literal meaning and not its metacharacter meaning. A [character class](https://www.regular-expressions.info/charclass.html) with only one member is a handy way to do this, as most characters have no special meaning inside a class.
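
A quick check, using only the standard `re` module, shows that the single-member character class and the more common backslash escape match the same thing:

```python
import re

print(re.match('[+]', '+ 2').group())    # character class, prints +
print(re.match(r'\+', '+ 2').group())    # backslash escape, prints +
print(re.match('[+]', '2 + 2'))          # None: no '+' at the start
```

Which form to use is largely a matter of taste; the class form avoids having to think about backslashes inside Python string literals.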

# Evaluator

In [None]:
# errors detected:
# - out of range result

# result tuple:
# - (True, sum)
# - (False, None)

def EEdoeval(this):
    
    res = 0
    for v in this:
        ETshowtoken(v)
        if res <= uintMax - v:
            res += v
        else:
            UIerror('Evaluation result out of range')
            return (False, None)
            
    return (True, res)


Because there is only one operator so far, we can simply add up the input values to get a final result. The only thing we have to check for is overflow. We do this in much the same way as in *convertUint()*, and for exactly the same reasons.
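
The two range checks can be put side by side in a short sketch. The function names below are hypothetical and `UINT_MAX` simply repeats the value of `uintMax`. In both cases the test is rearranged so that the limit is compared against a value that is itself within range, and the out-of-range result is never computed.

```python
UINT_MAX = 4294967295                 # 2**32 - 1, same value as uintMax

def checked_add(total, value):
    """Return total + value, or None if the sum would exceed UINT_MAX."""
    if total <= UINT_MAX - value:     # same test as in EEdoeval()
        return total + value
    return None

def checked_append_digit(total, digval, base):
    """Return total * base + digval, or None if that would exceed UINT_MAX."""
    if total <= (UINT_MAX - digval) // base:   # integer form of the convertUint() test
        return total * base + digval
    return None

print(checked_add(4294967294, 1))              # 4294967295, exactly at the limit
print(checked_add(4294967295, 1))              # None, one past the limit
print(checked_append_digit(0xFFFFFFF, 15, 16)) # 4294967295, i.e. 0xFFFFFFFF
print(checked_append_digit(429496729, 6, 10))  # None, 4294967296 is out of range
```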

## Running the parser

In [None]:
passCnt = failCnt = 0                       # most useful for test input files, but never any harm

def startUp(flag):
    '''begin execution'''
    global passCnt, failCnt, showTrace
    UIshow( 'Parser', versionNumber )
    passCnt = failCnt = 0
    showTrace = flag
    
def shutDown():
    '''terminate execution'''
    UIwriteSep()
    UIshow( 'Pass', passCnt )
    UIshow( 'Fail', failCnt )
    
# run parser

def parseOne(this):
    '''parse/evaluate one expression'''
    global passCnt, failCnt
    UIwriteSep()
    UIshow( 'Input', this )
    ok, res = PEdoparse( this )
    if ok:
        UIshow( 'Final Parse', res )
        ok, res = EEdoeval( res )
        if ok:
            UIshow( 'Final Eval', res )
    if ok:
        passCnt += 1
    else:
        failCnt += 1

## Interactive use

In [None]:
def parse():
    
    startUp(showInteract)
    while True:
        inp = input( 'Expression: ' )
        UIwriteln( '' )                      # looks better with a blank line here
        if inp[:1].upper() == 'Q':           # slice instead of index, so empty input does not raise IndexError
            break
        elif inp.strip():
            parseOne( inp )
    shutDown()

## Batch processing

In [None]:
testDir = '..\\ParserTest\\'            # directory holding test input files (empty string if same as notebook directory)

# convert current version number to match test file numbers
# - done this way so we can update only the version number and everything still works

def currNum():
    
    head = versionNumber[:len(versionNumber)-3]
    tail = versionNumber[-2:]
    return f'{head:0>2}{tail}'

# make full path name to test file

def makePath(typ, num):
    return f'{testDir}{typ}{num}.txt'

# run one test

def runTest(this):
    
    UIwriteln(f'Parser {versionNumber} vs {this[-12:-4]}')
    
    with open(this) as f:
        data = f.readlines()
    for line in data:
        test = line.strip()
        if test and test[0] != '#':         # skip blank and comment lines
            parseOne(test)
    
# run a test of current or specified version which should succeed
    
def good(num='curr'):
  
    startUp(showBatch)
    runTest(makePath('pass', currNum() if num == 'curr' else num))
    shutDown()
    
# run a test of current or specified version which should fail

def bad(num='curr'):
    
    startUp(showBatch)
    runTest(makePath('fail', currNum() if num == 'curr' else num))
    shutDown()
    
# run regression test against current and all previous test files

def regress():
            
    UIwriteln('PASS tests')
    
    currFn = makePath('pass', currNum())

    startUp(showBatch)
    failed = []
    fnlist = glob.glob(f'{testDir}pass????.txt')
    for fn in fnlist:
        if fn <= currFn:
            atstart = failCnt
            runTest(fn)
            if atstart < failCnt:
                failed.append(fn)               
    shutDown()
    
    UIwriteln('FAIL tests')
    
    currFn = makePath('fail',currNum())

    startUp(showBatch)
    passed = []
    fnlist = glob.glob(f'{testDir}fail????.txt')
    for fn in fnlist:
        if fn <= currFn:
            atstart = passCnt
            runTest(fn)
            if atstart < passCnt:
                passed.append(fn)               
    shutDown()
    
    if not len(failed):
        UIwriteln('All pass tests succeeded')
    else:
        UIwriteln('Pass tests which failed')
        for fn in failed:
            UIwriteln(f'  {fn}')
            
    if not len(passed):
        UIwriteln('All fail tests succeeded')
    else:
        UIwriteln('Fail tests which passed')
        for fn in passed:
            UIwriteln(f'   {fn}')
              

# Testing the parser

In [None]:
parse()       # interactive, one expression at a time

In [None]:
good()        # run current parser against its own pass test. Use good('1234') to run against specific pass test.

In [None]:
bad()         # run current parser against its own fail test. Use bad('5678') to run against specific fail test.

In [None]:
regress()     # run parser against all previous and current tests