### Scanner

This notebook contains the scanner of the logic to natural language translator. It reads the characters of the source one at a time and recognizes the symbols they form:

- Procedure `getChar()` reads the next character of global variable `src` into global variable `ch`. When the end of `src` is reached, `ch` is set to `chr(0)`. The current position is maintained in the global variable `pos`.

- Procedure `error(msg)` reports an error at position `pos`.

- Procedure `getOrdinal()` increases the counter variable `ordinalCounter` by one each time it is called and returns an empty string.

- Procedure `getSym()` recognizes the next symbol and assigns it to the global variables `sym` and `val`. When the end of the source is reached, `sym` is set to `EOF`; if `sym` is `IDENT`, `val` is the identifier's name as a string.


In [1]:
def getChar():
    global pos, ch
    if pos < len(src): ch, pos = src[pos], pos + 1
    else: ch, pos = chr(0), pos + 1

def error(msg):
    raise Exception(src + '\n' + (pos - 1) * ' ' + '^ ' + msg)
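
To illustrate how `error(msg)` lays out its message, the following sketch reproduces the same string construction in a hypothetical helper `caret_message` (not part of the notebook) that takes the source and position as parameters instead of reading the globals. Because `getChar()` has already advanced `pos` past the current character, the caret is indented by `pos - 1` spaces so that it lands under the character held in `ch`.

```python
def caret_message(src, pos, msg):
    # Same layout as error(): the source, then a caret under index pos - 1.
    return src + '\n' + (pos - 1) * ' ' + '^ ' + msg

# After getChar() has read the '?' at index 4, pos is 5.
print(caret_message('p ∧ ?', 5, 'unexpected character'))
# p ∧ ?
#     ^ unexpected character
```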

Procedure `getSym()` parses:
    
    symbol ::= (identifier | 'true' | 'false' | '¬' | '∧' | '∨' |
            '→' | '≡' | '∀' | '∃' | '❙' | '•' | '(' | ')' | '=' | 
            '≠' | ',' | '-' | '<' | '>' )
    identifier ::= letter {letter}
    letter ::= 'a' | ... | 'z' | 'A' | ... | 'Z'
    
Identifiers consist of one or more letters, either uppercase or lowercase. A commented-out alternative in `getSym()` restricts identifiers to a single letter.
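
The `identifier` rule and the two keywords can be checked in isolation. The sketch below uses a hypothetical helper `classify` (not part of the scanner) and a regular expression equivalent to `letter {letter}`; it only mirrors the grammar above and is not how `getSym()` itself is written.

```python
import re

LETTERS = re.compile(r'[A-Za-z]+')   # identifier ::= letter {letter}
RESERVED = {'true', 'false'}         # words the grammar lists as separate symbols

def classify(word):
    # Returns 'keyword', 'identifier', or None if word is not a run of letters.
    if LETTERS.fullmatch(word) is None:
        return None
    return 'keyword' if word in RESERVED else 'identifier'

print(classify('p'), classify('isEven'), classify('true'), classify('x1'))
# identifier identifier keyword None
```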

In [3]:
IDENT = 0; TRUE = 1; FALSE = 2; NOT = 3; AND = 4; OR = 5
IMPLIES = 6; EQUIVALENT = 7; INEQUIVALENT = 8; LPAREN = 9
RPAREN = 10; EXIST = 11; FORALL = 12; EQUAL = 13; NOTEQUAL = 14
SUM = 15; BAR = 16; DOT = 17; EOF = 18

KEYWORDS = {'true': TRUE, 'false': FALSE}

ordinalCounter = -1

def getOrdinal():
    global ordinalCounter
    ordinalCounter += 1
    return ''

def getSym():
    global sym, val
    while ch in ' \t\r\n': getChar()
    pos0 = pos
    if 'A' <= ch <= 'Z' or 'a' <= ch <= 'z':
        start = pos - 1
        while ('A' <= ch <= 'Z') or ('a' <= ch <= 'z'): getChar()
        val = src[start: pos - 1]
#         Multiple-letter identifiers allowed
        sym = KEYWORDS[val] if val in KEYWORDS else IDENT
#         Only single-letter identifiers allowed
#         if val in KEYWORDS:
#             sym = KEYWORDS[val]
#         elif len(val) == 1:
#             sym = IDENT
#         else:
#             error('only single letter identifier allowed')
    elif ch in '¬!': getChar(); sym = NOT
    elif ch == '∧': getChar(); sym = AND
    elif ch == '∨': getChar(); sym = OR
    elif ch in '⇒→': getChar(); sym = IMPLIES
    elif ch in '⇔≡': getChar(); sym = EQUIVALENT
    elif ch == '≢': getChar(); sym = INEQUIVALENT
    elif ch == '≠': getChar(); sym = NOTEQUAL
    elif ch == '-':
        getChar()
        if ch == '>': getChar(); sym = IMPLIES
        else: error("'>' expected after '-'")  # a lone '-' is not a symbol
    elif ch == '=':
        getChar()
        if ch == '>': getChar(); sym = IMPLIES
        else: sym = EQUAL  # plain '='; the character after it is left for the next symbol
    elif ch == '<':
        getChar()
        if ch == '=':
            getChar()
            if ch == '>': getChar(); sym = EQUIVALENT
            else: error("'>' expected after '<='")
        else: error("'=' expected after '<'")
    elif ch == '(': getChar(); sym = LPAREN
    elif ch == ')': getChar(); sym = RPAREN
    elif ch == '∀': getChar(); sym = FORALL
    elif ch == '∃': getChar(); sym = EXIST
    elif ch in '❙|': getChar(); sym = BAR
    elif ch in '•·': getChar(); sym = DOT
    elif ch == chr(0): sym = EOF
    else: error('unexpected character')
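
As a way to see which symbols a whole formula yields, here is a minimal stand-alone sketch of a table-driven tokenizer. It is not the notebook's `getSym()`: the names `tokenize`, `MULTI`, `SINGLE`, `WORDS`, and `is_letter` are hypothetical, symbols are reported as strings rather than the integer constants above, and the ASCII spellings are assumed to be exactly `->`, `=>`, and `<=>`.

```python
# Hypothetical stand-alone sketch; none of these names exist in the notebook.
MULTI = {'<=>': 'EQUIVALENT', '->': 'IMPLIES', '=>': 'IMPLIES'}
SINGLE = {
    '¬': 'NOT', '!': 'NOT', '∧': 'AND', '∨': 'OR',
    '→': 'IMPLIES', '⇒': 'IMPLIES', '≡': 'EQUIVALENT', '⇔': 'EQUIVALENT',
    '≢': 'INEQUIVALENT', '≠': 'NOTEQUAL', '=': 'EQUAL',
    '(': 'LPAREN', ')': 'RPAREN', '∀': 'FORALL', '∃': 'EXIST',
    '❙': 'BAR', '|': 'BAR', '•': 'DOT', '·': 'DOT',
}
WORDS = {'true': 'TRUE', 'false': 'FALSE'}

def is_letter(c):
    return 'A' <= c <= 'Z' or 'a' <= c <= 'z'

def tokenize(text):
    """Return a list of (symbol name, spelling) pairs, ending with ('EOF', None)."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c in ' \t\r\n':              # skip whitespace
            i += 1
        elif is_letter(c):              # identifier or keyword: letter {letter}
            j = i
            while j < len(text) and is_letter(text[j]):
                j += 1
            word = text[i:j]
            tokens.append((WORDS.get(word, 'IDENT'), word))
            i = j
        else:
            # Try multi-character spellings first so '=>' is not read as '='.
            for spelling, name in MULTI.items():
                if text.startswith(spelling, i):
                    tokens.append((name, spelling))
                    i += len(spelling)
                    break
            else:
                if c not in SINGLE:
                    raise Exception(text + '\n' + i * ' ' + '^ unexpected character')
                tokens.append((SINGLE[c], c))
                i += 1
    tokens.append(('EOF', None))
    return tokens

print([name for name, _ in tokenize('∀x • p => q')])
# ['FORALL', 'IDENT', 'DOT', 'IDENT', 'IMPLIES', 'IDENT', 'EOF']
```

Checking the multi-character spellings before the single-character table plays the role of the one-character lookahead that `getSym()` performs after reading `-`, `=`, or `<`.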