# Of Scope and Symbol Tables

All modern high-level programming languages have some notion of [scope](https://en.wikipedia.org/wiki/Scope_(computer_science)) applied to the symbols that can appear in a program.  The scope of a program symbol defines the lifetime or visibility of that symbol. Program symbols that are subject to scoping rules include variable and function names and, in object-oriented languages, type and class names, among the many other kinds of symbols that can appear in a programming language.  Scoping evolved as programs became more complex and limiting the visibility of program symbols became necessary in order to avoid name clashes, a common source of bugs.

Early dialects of Fortran and Basic only had global scope in that all program symbols were visible everywhere in a program.  Nowadays, almost all programming languages implement static or lexical scoping, where the scope of a program symbol can be determined by just looking at the program text.  This is in contrast with dynamic scoping, where the scope of a program symbol is determined at runtime.  Here we only look at lexical scoping.

If a symbol is no longer available or accessible then we say that it is *out of scope*.
Consider function-local symbols such as parameters and local variables.  These symbols are only available to the code of the function and are out of scope for all other code.

Perhaps the simplest scope is the *block scope*, delimited in C-like languages by opening and closing curly braces.  All declarations of program symbols within the curly braces are local to that block scope and go out of scope as soon as the thread of execution leaves that block.

The languages we have defined so far only have global scope, that is, all symbols, in particular variable names, are visible everywhere. This allowed us to define variables by just using them.  Once we introduce scope we need a notion of declaration to assert in which scope a symbol is visible.  Here we start with variable declarations, and in later chapters we will introduce additional symbols subject to scoping rules, such as function names.

It is interesting that Python takes a rather different approach to scoping: any variable assigned inside a function is assumed to be local to that function unless it is explicitly declared as a *global* variable.  In our approach we are going to be more traditional and let the user specify explicitly in which scope a variable is visible.
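The Python behavior described above can be seen in a few lines.  This is only an illustrative sketch; the names `counter`, `set_local`, and `bump_global` are made up for the example.

```python
counter = 0  # module-level (global) variable

def set_local():
    # assignment without a 'global' declaration creates a new
    # function-local variable that shadows the global one
    counter = 10
    return counter

def bump_global():
    # the 'global' declaration makes the assignment refer to the
    # module-level variable instead
    global counter
    counter = counter + 1
    return counter

local_result = set_local()
after_local = counter         # the global is untouched by set_local
global_result = bump_global()
after_global = counter        # the global was updated by bump_global

print(local_result, after_local, global_result, after_global)
```

In Cuppa2 the roles are reversed: a plain assignment never creates a variable, and every variable has to be introduced with an explicit declaration.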

In [1]:
# let the notebook access the code folder
import sys
sys.path.insert(1,"code")

# The Cuppa2 Language

We extend our Cuppa1 language with variable declarations of the form,
```
declare x = 10;
```
This particular statement declares the variable `x` in the current scope and initializes it to the value 10. If the current scope is the global (outermost) scope then we call `x` a *global* variable; otherwise it is considered a *local* variable within some scope.  We call this new language Cuppa2.  It is syntactically identical to Cuppa1 with the exception of the `declare` statement.

Here is the grammar of Cuppa2 which of course is identical to the grammar of Cuppa1 with the exception of the addition of the declaration statement under the non-terminal `stmt`,

```Python
    program : stmt_list

    stmt_list : stmt stmt_list
              | empty

    stmt : DECLARE ID opt_init opt_semi
         | ID '=' exp opt_semi
         | GET ID opt_semi
         | PUT exp opt_semi
         | WHILE '(' exp ')' stmt
         | IF '(' exp ')' stmt opt_else
         | '{' stmt_list '}'

    opt_init : '=' exp
             | empty
             
    opt_else : ELSE stmt
             | empty
             
    opt_semi : ';'
             | empty

    exp : exp PLUS exp
        | exp MINUS exp
        | exp TIMES exp
        | exp DIVIDE exp
        | exp EQ exp
        | exp LE exp
        | INTEGER
        | ID
        | '(' exp ')'
        | MINUS exp %prec UMINUS
        | NOT exp

    empty :
```

The parser specification [(`cuppa2_gram.py`)](code/cuppa2_gram.py) and the lexer specification [(`cuppa2_lex.py`)](code/cuppa2_lex.py) for Cuppa2 are available in the [`code`](code) folder.  We can test our parser and lexer code to make sure they work as expected,

In [2]:
from cuppa2_lex import lexer
from cuppa2_gram import parser

parser.parse("declare x = 1; put x", lexer=lexer)

Generating LALR tables


## Cuppa2 Programs

With Cuppa2 we can now write programs that have scoped variable declarations.  Here is a program that uses simple block scopes to limit the visibility of different declarations of the variable `x`,
```C
// declare a global instance of x
declare x = 1;

// create a local block scope
{
    // declare a local instance of x
    declare x = 2;
    // print out the value of local instance
    put x;
}

// create another local block scope
{
    // declare a local instance of x
    declare x = 3;
    // print out the value of local instance
    put x;
}

// print out the value of the global instance
put x;
```
The expected output of this program is,
```
2 3 1
```
What is interesting is that as soon as we introduce scope we can observe the phenomenon of [*variable shadowing*](https://en.wikipedia.org/wiki/Variable_shadowing). As soon as we declare the variable `x` in a local block scope we see that the global instance of `x` is no longer visible to the code in the local scope (look at the output of the corresponding `put` statement) and we say that the global instance of `x` is *shadowed* by the local instance of `x`.

Variable shadowing is the source of many subtle bugs in software where global variables often represent global parameters.  If we are not careful, such a global parameter can easily be shadowed by a carelessly introduced local variable, making that global parameter inaccessible to the code in that local scope.  Large software organizations such as Microsoft have recognized this problem and have introduced coding standards to alleviate this source of bugs.  For example, in Microsoft's [Coding Style Conventions](https://docs.microsoft.com/en-us/windows/win32/stg/coding-style-conventions) all global variables have to start with the prefix `g_` and that prefix is forbidden for local variables, making it impossible for a local variable to shadow a global variable.

Introducing scope also introduces *non-local access* of variables in the sense that scoping forces us to think about reading and modifying the values of variables which are outside of our local scope.  Take a look at the following program which reads and updates the global variable `x` from within a local scope,
```C
// declare global instance of x
declare x = 2;

// start a local scope
{
      declare y = 3;
      // read and update non-local x
      x = y + x;
}

// print out the updated value of x
put x;
```
The expected output of this program is,
```
5
```
In order to deal with scoped programs like the ones above we need a more sophisticated version of our symbol table which up to this point has only been a global dictionary.

# A Symbol Table for supporting Scope

In order to support scoped programs we need something more sophisticated than the symbol table structure that consists of a single global dictionary we have been using so far.  What is perhaps remarkable is that scopes behave like a stack of symbol declarations.  Consider the following program,
```C
declare x = 1;

{
    declare y = 2;
    
    {
        declare z = 3;
        put x + y + z;
    }
}
```
Here we are dealing with three nested scopes and at each scope level we declare a symbol and initialize it.  At the innermost scope we look up all the symbols declared and initialized so far and print out the sum of their values.  The implicit behavior of this program is that if a declaration for a variable cannot be found in the current scope then we search the outer scopes until we reach the global scope.  Once we reach the global scope and we still  cannot find the symbol declaration then we are faced with an error because this means that the variable was not declared properly.


<p align="center">
  <img width="600" height="4500" src="figures/chap07/1/figure.jpg">
</p>
<!-- ![alt text](figures/chap07/1/figure.jpg) -->
<p style="text-align: center;">
Fig. 1: Visualizing nested scopes as a stack of scopes.
</p>

We can visualize the nested scopes as a stack of scopes each holding the appropriate declarations (Figure 1).  Here we modeled the scope stack as a linked list of scope objects. It is now easy to interpret the expression at the innermost scope of our program above: if a variable declaration cannot be found in the scope at the top of the stack we simply walk down the stack until we find the variable declaration in question.  If we reach the bottom of the stack, the global scope, and we still haven't found the desired variable declaration then we are faced with a *variable not declared error*.

With this view of nested scopes as a stack of scopes it is also straightforward to visualize updating the values of non-local variables: walk down the stack until you find the declaration of the variable you want to update and then update it.  It is an error if you do not find a declaration for the variable you want to update.  Try it with this program,
```C
declare x = 2;

{
    declare y = 3;
    x = x + y;
}
put x;
```
The output of this program is `5`.

This gives us the basic outline of our new symbol table design: *a stack of dictionaries*.
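As an aside, Python's standard library already contains a structure with this lookup behavior, `collections.ChainMap`.  The sketch below is not the symbol table built in this chapter; it only uses `ChainMap` as a stand-in to show the stack-of-dictionaries idea on the three-scope program above.

```python
from collections import ChainMap

# global scope with x declared
scopes = ChainMap({"x": 1})

# entering a block pushes a fresh dictionary in front of the others
scopes = scopes.new_child({"y": 2})
scopes = scopes.new_child({"z": 3})

# lookups search the front dictionary first, then the ones behind it
total = scopes["x"] + scopes["y"] + scopes["z"]
print(total)

# leaving a block drops the front dictionary again
scopes = scopes.parents
z_visible = "z" in scopes
print(z_visible)
```

Note that a `ChainMap` always writes to its front dictionary, so it models declarations and lookups but not the non-local update of a variable in an outer scope.  That is one reason to write a dedicated symbol table with its own update operation.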

---
Watch an animation of a program with scoping.  Again, here the stack of dictionaries is modeled as a linked list.

<br>
<a href="http://www.youtube.com/watch?feature=player_embedded&v=uh9Qm5AeVjs">
<img style='border:1px solid #000000' src="movie.jpg" width="120" height="90" />
</a>

---

## Symbol Table Design

Our new [symbol table](https://en.wikipedia.org/wiki/Symbol_table) design as a stack of scopes has to support the following functionality,

* Push a scope.
* Pop a scope.
* Declare a variable.
* Look up a variable.
* Update a variable value.

In addition, our symbol table design needs to implement the language semantics we relied upon when we worked through the examples in the previous section,

* The `declare` statement inserts a variable declaration into the current scope.
* A variable look up returns the value associated with a variable from the current scope or the surrounding scopes.
* Every variable needs to be declared before use.
* No variable can be declared more than once in the current scope.

Here the notion of *current scope* is now easy to understand: it is simply the scope at the top of the scope stack in the symbol table.

The implementation of our symbol table is an object that holds the scope stack and has an appropriate interface to support the functionality mentioned above,
```Python
CURR_SCOPE = 0

class SymTab:

    #-------
    def __init__(self):
        # global scope dictionary must always be present
        self.scoped_symtab = [{}]

    #-------
    def push_scope(self):
        # push a new dictionary onto the stack - stack grows to the left
        self.scoped_symtab.insert(CURR_SCOPE,{})

    #-------
    def pop_scope(self):
        # pop the left most dictionary off the stack
        if len(self.scoped_symtab) == 1:
            raise ValueError("cannot pop the global scope")
        else:
            self.scoped_symtab.pop(CURR_SCOPE)

    #-------
    def declare_sym(self, sym, init):
        # declare the symbol in the current scope: dict @ position 0
        ...

    #-------
    def lookup_sym(self, sym):
        # find the first occurrence of sym in the symtab stack
        # and return the associated value
        ...

    #-------
    def update_sym(self, sym, val):
        # find the first occurrence of sym in the symtab stack
        # and update the associated value
        ...
```
The `...` in the code above means *don't worry about the implementation details right now*. We see that the constructor inserts a member `scoped_symtab` into objects of class `SymTab` and initializes it as a list with a single empty dictionary in it.  This is the dictionary for the global scope.  This dictionary has to always be present and it is an error to try to pop this global dictionary off the stack. Next we have the interface that allows us to manipulate the scope stack.  The function `push_scope` inserts a new dictionary on top of the stack.  In this case the top of the stack is at the list position 0.  We do that so that the top of the stack is always reachable by simply looking at the list position 0.   The constant `CURR_SCOPE` refers to that position in the list.  Next we have the function `pop_scope` that removes the dictionary on the top of the stack.  Following the functions that manipulate the scope in the symbol table we have the interface functions that manipulate the symbols within the symbol table.  Let's look at those more carefully.

First we have the `declare_sym` function,
```Python
def declare_sym(self, sym, init):
    # declare the symbol in the current scope: dict @ position 0
        
    # first we need to check whether the symbol was already declared
    # at this scope
    if sym in self.scoped_symtab[CURR_SCOPE]:
        raise ValueError("symbol {} already declared".format(sym))
        
    # enter the symbol in the current scope
    scope_dict = self.scoped_symtab[CURR_SCOPE]
    scope_dict[sym] = init
```
This function inserts a symbol together with its value in the current scope.  But before it does so it checks whether the symbol already exists in the current scope.  If it does, the function throws an exception according to our rule that *symbols can only be declared once in the current scope*.

Next we have the `lookup_sym` function,
```Python
def lookup_sym(self, sym):
    # find the first occurrence of sym in the symtab stack
    # and return the associated value

    n_scopes = len(self.scoped_symtab)

    for scope in range(n_scopes):
        if sym in self.scoped_symtab[scope]:
            val = self.scoped_symtab[scope].get(sym)
            return val

    # not found
    raise ValueError("{} was not declared".format(sym))
```
This function finds a symbol declaration in our stack of scopes.  Notice that the function iterates through the stack of scopes starting with the current scope (at position 0 in `scoped_symtab`) all the way to the global scope (the last dictionary in the `scoped_symtab` list).  If it finds a symbol declaration in one of the dictionaries it will break out of the loop and return its associated value.  If the function iterates through the whole stack without finding a declaration of the symbol it drops through the loop and throws an exception indicating that the symbol was not declared enforcing our rule that *variables need to be declared before use*.

The last function in the `SymTab` interface is the `update_sym` function,
```Python
def update_sym(self, sym, val):
    # find the first occurrence of sym in the symtab stack
    # and update the associated value

    n_scopes = len(self.scoped_symtab)

    for scope in range(n_scopes):
        if sym in self.scoped_symtab[scope]:
            scope_dict = self.scoped_symtab[scope]
            scope_dict[sym] = val
            return

    # not found
    raise ValueError("{} was not declared".format(sym))
```
The idea here is that we need to find the declaration of a variable closest to our current scope (including the current scope) and update its associated value.  Notice that the code is very similar to the `lookup_sym` function in that it iterates over the stack of scope dictionaries until it finds a declaration of the variable symbol.  As soon as it finds the variable declaration it will update the associated dictionary with the new value and break out of the loop and return.  If it cannot find a declaration record for the variable then it will drop through the loop and throw an exception enforcing the rule that *variables need to be declared before they are used*.

The source code for the Cuppa2 symbol tables can be found in file [`cuppa2_symtab.py`](code/cuppa2_symtab.py) in the [`code`](code) folder.
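To see the pieces working together, here is a compact, self-contained restatement of the class followed by a short session.  It is a sketch assembled from the listings above rather than a copy of `cuppa2_symtab.py`: the loops iterate over the dictionaries directly instead of over index positions, which is intended to behave the same way, and the helper `error_of` is made up for the demonstration.

```python
CURR_SCOPE = 0

class SymTab:
    def __init__(self):
        # the global scope dictionary must always be present
        self.scoped_symtab = [{}]

    def push_scope(self):
        self.scoped_symtab.insert(CURR_SCOPE, {})

    def pop_scope(self):
        if len(self.scoped_symtab) == 1:
            raise ValueError("cannot pop the global scope")
        self.scoped_symtab.pop(CURR_SCOPE)

    def declare_sym(self, sym, init):
        if sym in self.scoped_symtab[CURR_SCOPE]:
            raise ValueError("symbol {} already declared".format(sym))
        self.scoped_symtab[CURR_SCOPE][sym] = init

    def lookup_sym(self, sym):
        # search from the current scope towards the global scope
        for scope_dict in self.scoped_symtab:
            if sym in scope_dict:
                return scope_dict[sym]
        raise ValueError("{} was not declared".format(sym))

    def update_sym(self, sym, val):
        # update the declaration closest to the current scope
        for scope_dict in self.scoped_symtab:
            if sym in scope_dict:
                scope_dict[sym] = val
                return
        raise ValueError("{} was not declared".format(sym))

def error_of(action):
    # hypothetical helper: run action and return the error message, if any
    try:
        action()
    except ValueError as e:
        return str(e)
    return None

symtab = SymTab()
symtab.declare_sym("x", 1)             # declare x = 1;   (global)
symtab.push_scope()                    # {
symtab.declare_sym("x", 2)             #   declare x = 2; (shadows global x)
inner_x = symtab.lookup_sym("x")       #   finds the local x
symtab.update_sym("x", inner_x + 3)    #   x = x + 3;     (updates local x)
inner_after = symtab.lookup_sym("x")
symtab.pop_scope()                     # }
outer_x = symtab.lookup_sym("x")       # the global x is unchanged

redeclare_msg = error_of(lambda: symtab.declare_sym("x", 7))
undeclared_msg = error_of(lambda: symtab.lookup_sym("y"))
pop_msg = error_of(lambda: symtab.pop_scope())

print(inner_x, inner_after, outer_x)
print(redeclare_msg)
print(undeclared_msg)
print(pop_msg)
```

The session exercises each rule from the design list: shadowing in a nested scope, an update that lands on the nearest declaration, and the three error cases.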

# A Cuppa2 Interpreter

With the new symbol table in place we are ready to construct an interpreter for Cuppa2.  It turns out that we can reuse most of the code from our Cuppa1 interpreter: we have to add new code for the `declare` statement and adjust the code for the block and assignment statements, among a few others, to deal with the new scoped symbol table.  As usual the [Cuppa2 front end](code/cuppa2_frontend_gram.py) generates an AST which our `walk` function will then interpret.  One thing to mention about the front end is that we no longer put symbols into the symbol table as we parse them.  We have to delay that until later, when we know more about the scopes in which the various variables appear.  The AST for Cuppa2 includes one additional type of node compared to the Cuppa1 AST: the `declare` node type.  Other than that the ASTs for Cuppa1 and Cuppa2 are identical.  The lexer for Cuppa2 is the same as the lexer discussed for Cuppa1 apart from the new `declare` keyword.

Let's test our front end by processing a simple program that declares a variable and then prints out its value.

In [3]:
from cuppa2_lex import lexer
from cuppa2_frontend import parser
from cuppa2_state import state
from grammar_stuff import dump_AST

In [4]:
parser.parse("declare x = 1; put x", lexer=lexer)
dump_AST(state.AST)


(seq 
  |(declare x 
  |  |(integer 1)) 
  |(seq 
  |  |(put 
  |  |  |(id x)) 
  |  |(nil)))


Moving on to the interpretation tree walker we see that the additional node type in the Cuppa2 AST is reflected in the [interpretation tree walker](code/cuppa2_interp_walk.py),
```Python
def walk(node):
    # node format: (TYPE, [child1[, child2[, ...]]])
    type = node[0]
    
    if type in dispatch_dict:
        node_function = dispatch_dict[type]
        return node_function(node)
    else:
        raise ValueError("walk: unknown tree node type: " + type)

# a dictionary to associate tree nodes with node functions
dispatch_dict = {
    'seq'     : seq,
    'nil'     : nil,
    'declare' : declare_stmt,
    'assign'  : assign_stmt,
    'get'     : get_stmt,
    'put'     : put_stmt,
    'while'   : while_stmt,
    'if'      : if_stmt,
    'block'   : block_stmt,
    'integer' : integer_exp,
    'id'      : id_exp,
    'paren'   : paren_exp,
    '+'       : plus_exp,
    '-'       : minus_exp,
    '*'       : times_exp,
    '/'       : divide_exp,
    '=='      : eq_exp,
    '<='      : le_exp,
    'uminus'  : uminus_exp,
    'not'     : not_exp
}
```
Notice that the dispatch dictionary associates the `declare` node with the `declare_stmt` node function,
```Python
def declare_stmt(node):

    try: # try the declare pattern without initializer
        (DECLARE, name, (NIL,)) = node
        assert_match(DECLARE, 'declare')
        assert_match(NIL, 'nil')

    except ValueError: # try declare with initializer
        (DECLARE, name, init_val) = node
        assert_match(DECLARE, 'declare')
        
        value = walk(init_val)
        state.symbol_table.declare_sym(name, value)

    else: # declare pattern matched
        # when no initializer is present we init with the value 0
        state.symbol_table.declare_sym(name, 0)
```
The `declare_stmt` node function has to deal with two different flavors of the `declare` node: one with an initializer and one without one.  We can see that the node function first attempts to pattern match the `declare` node without an initializer by matching the `nil` tree node in the last argument.  If this pattern is successfully matched execution continues in the `else` clause of the [`try-except` Python statement](https://docs.python.org/3/tutorial/errors.html) where we declare the variable with a default initial value of 0 in the current scope in the symbol table.

Python will throw an exception should the declaration pattern without the initializer fail to match.  In this case the code in the `except` clause of the [`try-except` Python statement](https://docs.python.org/3/tutorial/errors.html) will be executed.  Here we try to match the current AST node against a pattern that assumes we have an initializer.  If this pattern match is successful then we walk the expression tree of the initialization value computing an actual value which we then associate with the variable name during its declaration in the symbol table.
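The matching itself relies on Python's tuple unpacking, which raises a `ValueError` when the shape of the right-hand side does not fit the pattern on the left.  The following sketch isolates that mechanism.  The function `describe_declare` is made up for illustration, and `assert_match` is a simplified stand-in for the helper used in the chapter's code.

```python
def assert_match(actual, expected):
    # simplified stand-in for the chapter's helper of the same name
    if actual != expected:
        raise ValueError("pattern match failed on " + str(actual))

def describe_declare(node):
    try:
        # unpacking succeeds only if the third component is a 1-tuple
        (DECLARE, name, (NIL,)) = node
        assert_match(DECLARE, 'declare')
        assert_match(NIL, 'nil')
    except ValueError:
        # fall back to the pattern with an initializer expression
        (DECLARE, name, init_val) = node
        assert_match(DECLARE, 'declare')
        return (name, 'initializer', init_val)
    else:
        # no exception: the pattern without initializer matched
        return (name, 'default', 0)

without_init = describe_declare(('declare', 'x', ('nil',)))
with_init = describe_declare(('declare', 'y', ('integer', 1)))

print(without_init)
print(with_init)
```

For the second node the nested pattern `(NIL,)` expects exactly one component but `('integer', 1)` has two, so the unpacking fails and control moves to the `except` clause.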

We have other node functions impacted by the new scoping rules and the new symbol table structure.  We start with the assignment statement,
```Python
def assign_stmt(node):

    (ASSIGN, name, exp) = node
    assert_match(ASSIGN, 'assign')
    
    value = walk(exp)
    state.symbol_table.update_sym(name, value)
```
What changed in the `assign_stmt` function from Cuppa1 to Cuppa2 is the fact that in Cuppa2 variables have to be declared before you can assign values to them with the assignment statement.  Therefore the behavior of the assignment statement in Cuppa2 is an update rather than a symbol definition in a dictionary as in Cuppa1.  Furthermore, recall that this update can be a non-local update, that is, we are potentially updating a variable outside of our local scope.

The next function we look at is the `get_stmt` function,
```Python
def get_stmt(node):

    (GET, name) = node
    assert_match(GET, 'get')

    s = input("Value for " + name + '? ')
    
    try:
        value = int(s)
    except ValueError:
        raise ValueError("expected an integer value for " + name)
    
    state.symbol_table.update_sym(name, value)
```
Again we can see that it is the same function as in Cuppa1 with the exception that the symbol table performs an update rather than a symbol definition.

The `block_stmt` function is interesting because here we explicitly manipulate the scope stack of the symbol table reflecting the fact that a block statement in Cuppa2 introduces a new scope,
```Python
def block_stmt(node):
    
    (BLOCK, stmt_list) = node
    assert_match(BLOCK, 'block')
    
    state.symbol_table.push_scope()
    walk(stmt_list)
    state.symbol_table.pop_scope()
```
Here we push a new scope onto the scope stack in the symbol table just before we walk the statement list of the block statement.  Once we are done interpreting the statements within the block statement we pop the scope off the stack implementing the behavior we have intuitively used when we looked at the Cuppa2 programs at the beginning of this chapter.
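As written, `block_stmt` pops the scope only if the walk of the statement list returns normally.  One possible variation, which is not part of the chapter's code, is to guarantee the pop even when an error is raised inside the block.  The sketch below shows the idea on a bare list of dictionaries; the name `new_scope` is an assumption made for this example.

```python
from contextlib import contextmanager

@contextmanager
def new_scope(scope_stack):
    # push a fresh scope; the current scope lives at position 0
    scope_stack.insert(0, {})
    try:
        yield scope_stack[0]
    finally:
        # always restore the stack, even if the block body raised
        scope_stack.pop(0)

scope_stack = [{"x": 2}]   # only the global scope
depth_inside = None

try:
    with new_scope(scope_stack) as local_scope:
        local_scope["y"] = 3
        depth_inside = len(scope_stack)
        # simulate a semantic error while interpreting the block
        raise ValueError("z was not declared")
except ValueError as e:
    print("Error: " + str(e))

print(len(scope_stack))
```

In the notebook this matters little because the symbol table is reset through `state.initialize()` before each run, but it would matter in a setting where interpretation continues after an error.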

This only leaves the `id_exp` node function.  This function retrieves the value of a variable within an expression,
```Python
def id_exp(node):
    
    (ID, name) = node
    assert_match(ID, 'id')
    
    return state.symbol_table.lookup_sym(name)
```
Again this function is identical to the function in Cuppa1 except that we have to deal with the fact that we might be trying to retrieve the value of a non-local variable, a variable not in our current scope.  Therefore this function uses the `lookup_sym` function from our scoped symbol table.

All other node functions remain identical to the node functions in Cuppa1.  This is remarkable in some sense and illustrates an interesting modularity in programming language design.
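To see how these node functions cooperate without the rest of the code base, here is a deliberately small, self-contained tree walker over a hand-built AST.  It is a sketch, not the chapter's interpreter: it handles only a handful of node types, keeps the scope stack in a plain list called `scopes`, collects the values written by `put` in a list called `output`, and assumes that binary operators are stored as `(op, left, right)` tuples.

```python
scopes = [{}]   # stack of dictionaries, current scope at position 0
output = []     # values written by 'put', collected for inspection

def find_scope(name):
    # return the dictionary of the closest scope that declares name
    for scope_dict in scopes:
        if name in scope_dict:
            return scope_dict
    raise ValueError("{} was not declared".format(name))

def walk(node):
    kind = node[0]
    if kind == 'seq':
        walk(node[1])
        walk(node[2])
    elif kind == 'nil':
        pass
    elif kind == 'declare':
        name, init = node[1], node[2]
        if name in scopes[0]:
            raise ValueError("symbol {} already declared".format(name))
        # a missing initializer defaults to 0
        scopes[0][name] = 0 if init == ('nil',) else walk(init)
    elif kind == 'assign':
        # possibly a non-local update
        find_scope(node[1])[node[1]] = walk(node[2])
    elif kind == 'put':
        output.append(walk(node[1]))
    elif kind == 'block':
        scopes.insert(0, {})
        walk(node[1])
        scopes.pop(0)
    elif kind == 'integer':
        return node[1]
    elif kind == 'id':
        # possibly a non-local lookup
        return find_scope(node[1])[node[1]]
    elif kind == '+':
        return walk(node[1]) + walk(node[2])
    else:
        raise ValueError("walk: unknown tree node type: " + kind)

# declare x = 2; { declare y = 3; x = y + x; } put x;
program_ast = \
    ('seq', ('declare', 'x', ('integer', 2)),
     ('seq', ('block',
              ('seq', ('declare', 'y', ('integer', 3)),
               ('seq', ('assign', 'x', ('+', ('id', 'y'), ('id', 'x'))),
                ('nil',)))),
      ('seq', ('put', ('id', 'x')),
       ('nil',))))

walk(program_ast)
print(output)
```

The only places that touch the scope stack are the `declare`, `assign`, `block`, and `id` cases, which mirrors the observation that the remaining node functions carry over unchanged.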

Before we try our new interpreter we should mention that the [state object](code/cuppa2_state.py) in Cuppa2 is virtually identical to the state object in Cuppa1 except that it makes reference to the new scoped symbol table.

Let's try our new interpreter on some of the programs we discussed earlier.

In [5]:
from cuppa2_lex import lexer
from cuppa2_frontend import parser
from cuppa2_state import state
from cuppa2_interp_walk import walk

Here is the first program we discussed and according to our scoping rules the output should be the sequence `2 3 1`.

In [6]:
program =\
'''
// declare a global instance of x
declare x = 1;

// create a local block scope
{
    // declare a local instance of x
    declare x = 2;
    // print out the value of local instance
    put x;
}

// create another local block scope
{
    // declare a local instance of x
    declare x = 3;
    // print out the value of local instance
    put x;
}

// print out the value of the global instance
put x;
'''

state.initialize()
parser.parse(program, lexer=lexer)
walk(state.AST)

> 2
> 3
> 1


Success!

The next program we discussed was a program with non-local variable access.  This program reads and updates the global variable `x` from within a nested local scope.  The expected output is the value `5`.

In [7]:
program =\
'''
// declare global instance of x
declare x = 2;

// start a local scope
{
      declare y = 3;
      // read and update non-local x
      x = y + x;
}

// print out the updated value of x
put x;

'''

state.initialize()
parser.parse(program, lexer=lexer)
walk(state.AST)

> 5


Again, our interpreter works as expected!

Let's try one more.  This program reads the values of various local and non-local variables.  The expected output is `6`.

In [8]:
program = \
'''
declare x = 1;

{
    declare y = 2;

    {
        declare z = 3;
        put x + y + z;
    }
}
'''

state.initialize()
parser.parse(program, lexer=lexer)
walk(state.AST)

> 6


Yup, just as expected!

As we have done for Cuppa1 and our other languages we can wrap the lexing, parsing and tree walking into a convenient [`interp`  function](code/cuppa2_interp.py), which by the way looks identical to the `interp` function in Cuppa1 except that it of course references our new objects,

```Python
from cuppa2_lex import lexer
from cuppa2_frontend import parser
from cuppa2_state import state
from cuppa2_interp_walk import walk

def interp(input_stream):

    # initialize the state object
    state.initialize()

    # build the AST
    parser.parse(input_stream, lexer=lexer)

    # walk the AST
    walk(state.AST)
```
We can call this function to execute our Cuppa2 programs. Let's give it a try,

In [9]:
from cuppa2_interp import interp

program = \
'''
declare x = 1
put x
'''

interp(program)

> 1


# Syntactic vs. Semantic Errors

As the programming languages we are developing here are becoming more complex we can observe that different classes of errors can happen within a program.  Two such classes are *syntactic* and *semantic* errors. We can define these classes of errors as follows,

* Syntactic errors are errors that occur in the structure of a program such as a missing semicolon or a misspelled keyword.

* Semantic errors are errors in the behavior of a program such as using a variable before it was declared.

Syntactic errors can be detected and reported by the parser.  In contrast, programs with semantic errors are usually syntactically correct.  That is, the parser cannot detect these errors and we have to rely on other ways of detecting them.  In the case of Cuppa2 we used the symbol table to enforce some of the semantic rules of the language with regard to variable declarations.
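The same distinction can be observed in Python itself, where parsing and execution are separate steps.  The sketch below uses the built-in functions `compile` and `exec`; the function `classify` and its return strings are made up for the illustration.

```python
def classify(source):
    # step 1: parsing, which is where syntactic errors surface
    try:
        code = compile(source, "<demo>", "exec")
    except SyntaxError:
        return "syntactic error"
    # step 2: execution, which is where this semantic error surfaces
    try:
        exec(code, {})
    except NameError:
        return "semantic error"
    return "ok"

malformed = classify("x = = 1")    # not a well-formed statement
undeclared = classify("y + 1")     # well-formed, but y is never defined
fine = classify("x = 1")

print(malformed, undeclared, fine, sep=", ")
```

The second program passes the parser without complaint; the problem only shows up once the program's behavior is considered.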

A classic example of a semantic error is the [division by zero](https://en.wikipedia.org/wiki/Division_by_zero) error.  It is a semantic error because in general it cannot be detected at parse time by either an interpreter or a compiler. In most programming languages a division by zero results in the termination of the program with an error.  An interpreter will detect this kind of error while executing a program.  However, what sets this error apart from, for example, an undeclared variable error is the fact that a compiler cannot, in general, detect it at all.  This kind of semantic error is due to a faulty calculation in the algorithm a program implements, and since a compiler does not execute the program but simply translates it into a target language, the faulty behavior is passed along to the translated program, which will then terminate when the error occurs.

Another classic example of a semantic error is the [`null` pointer dereference](https://en.wikipedia.org/wiki/Null_pointer) in C.  In C it is illegal to dereference a pointer variable whose value is 0, that is, `NULL`.  Doing so will typically result in a "crash" of the program.  However, in general a C compiler cannot flag such a dereference at compile time because it is usually the result of a mistake in the algorithm a program implements.  Since a compiler does not execute the program but simply translates it, it cannot detect these kinds of errors, and again the faulty behavior is passed along to the translated program, which will then "crash" when this error occurs.

Let's take a look at various syntactic and semantic errors.  The first program contains a misspelled keyword which can easily be flagged by the parser,

In [10]:
program = \
'''
define x = 1
put x
'''

try:
    interp(program)
except Exception as e:
    print("Error: " + str(e))

Syntax error at 'x'
Error: x was not declared


Clearly the parser found the misspelled keyword on the first line of the program right before the variable `x`.  Our parser implementation considers syntax errors recoverable errors and therefore just flags them and keeps going.  Which brings us to the second error message above.  Since there was no legal declaration of the variable `x` its occurrence in the `put` statement is flagged as an error.  Our second program makes this explicit.  Here we insert a semantic error by using a variable without declaring it,

In [11]:
program = \
'''
declare x = 1
put y + x
'''

try:
    interp(program)
except Exception as e:
    print("Error: " + str(e))

Error: y was not declared


Here we are dealing with a semantic error because the parser did not flag it as a syntactic error.  It is an error in the behavior of the program -- using the variable `y` without declaring it.  Again, the hallmark of semantic errors is that they occur in syntactically correct programs.

Let's take a look at the *division by zero*  error in the Cuppa2 interpreter,

In [12]:
program = \
'''
declare x = 0
put 4 / x
'''

try:
    interp(program)
except Exception as e:
    print("Error: " + str(e))

Error: integer division or modulo by zero


In contrast, we find that the Cuppa1 compiler cannot detect this error and passes it along to the Exp1Bytecode virtual machine.

In [13]:
from cuppa1_cc import cc as cuppa1_compiler
from exp1bytecode_interp import interp as bytecode_run

program = \
'''
x = 0
put 4 / x
'''

print("**** compile code ****")
try:
    bytecode = cuppa1_compiler(program)
except Exception as e:
    print("Error: " + str(e))

print("**** run code ****")
try:
    bytecode_run(bytecode)
except Exception as e:
    print("Error: " + str(e))

**** compile code ****
**** run code ****
Error: integer division or modulo by zero


As predicted, the compiler was not able to find the semantic error and passed it along to the abstract machine which found the error when it executed the translated Cuppa1 program.

# Compiling Scoped Code

Compiling scoped code raises a set of issues because most low-level languages do not support scoping.  Since scoping is purely a high-level language construct restricting the visibility of symbols to make programs more robust, the compiler has to somehow simulate scoping in the low-level target language.  As we will see, an effective way to simulate scoping is by cleverly renaming variables so that variables with the same name in different scopes do not clash in the global scope of the target language.

How the actual declarations of variables in the source language get mapped into the target language depends on the target language.
In our case, where we are interested in building a compiler from the Cuppa2 language to Exp1Bytecode, the declarations of variables in Cuppa2 just become assignment statements in our low-level target language.  In assembly code for an actual machine the declaration statements of a high-level language are typically mapped to *symbolic memory locations* of the target machine, and these symbolic memory locations then serve as the variables in the assembly code. Whatever the precise mapping is, it is clear that variable declarations and scope are purely high-level language constructs and it is the job of the compiler to map them into appropriate low-level structures in the target language.

<p align="center">
  <img width="800" height="600" src="figures/chap07/2/figure.jpg">
</p>
<p style="text-align: center;">
Fig. 2: Exp1Bytecode for a Cuppa2 program with only global variables.
</p>


For illustration purposes let's take a look at a couple of examples where we map Cuppa2 programs into Exp1Bytecode. Figure 2 shows a Cuppa2 program that only declares global variables.  Scoping has no impact on the translation of this program into Exp1Bytecode and the translated program will execute correctly.

<p align="center">
  <img width="800" height="600" src="figures/chap07/3/figure.jpg">
</p>
<p style="text-align: center;">
Fig. 3: Naively translating a Cuppa2 program with local scopes into Exp1Bytecode.
</p>

The situation changes drastically as soon as we introduce variables declared in local scopes, as in the Cuppa2 program in Figure 3. Here we show the naive translation of such a program into Exp1Bytecode.  We have color coded the code according to scoping level: *red* for the global scope and *green* for the nested scope.  It is easy to see that the red `x` and the green `x` clash in the global scope of the target language.  Because of this clash the translated program generates incorrect output.


<p align="center">
  <img width="800" height="600" src="figures/chap07/4/figure.jpg">
</p>
<p style="text-align: center;">
Fig. 4: Translating a Cuppa2 program with local scopes into Exp1Bytecode using variable name prefixes to simulate scope.
</p>

We can restore the correct behavior of the translated program by renaming the variables in the translated program so that variables with the same name but declared in different scopes in the source language do not clash in the global scope of the target language.  Here we use a name prefix that is based on the scope where the variable was declared.   If the variable was declared in the global scope then it receives a prefix of `R$` in the target language.  If it was declared in a local scope at the first nesting level then it receives a prefix of `R$$`.  If the variable was declared in local scope at the second nesting level then it receives a prefix of `R$$$`, *etc.*  It doesn't really matter what the prefix is as long as it is easy to calculate in the compiler and does not clash with possible variable names of the source language.  Here we chose a prefix based on the `$` sign since this symbol cannot appear in a variable name in Cuppa2.

Figure 4 shows a working translation of our previously problematic Cuppa2 program.  Notice that the two variables no longer clash in the target language and the correct behavior of the source program is restored in the translated program.
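
The prefix rule described above is simple enough to sketch in a few lines. The helper below is only an illustration: the name `create_prefix` and the convention that level 0 denotes the global scope are assumptions made for this sketch, and the actual compiler code may compute the prefix differently.

```python
def create_prefix(level):
    # Hypothetical helper: level 0 is the global scope, level 1 the first
    # nested block, and so on.  One '$' is added for each level.
    if level < 0:
        raise ValueError("scope level must be non-negative")
    return 'R' + '$' * (level + 1)

# the source name x declared at three different nesting levels
names = [create_prefix(level) + 'x' for level in range(3)]
print(names)  # ['R$x', 'R$$x', 'R$$$x']
```

Since `$` cannot appear in a Cuppa2 variable name, none of the generated names can collide with a name written by the programmer.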

<p align="center">
  <img width="800" height="600" src="figures/chap07/5/figure.jpg">
</p>
<p style="text-align: center;">
Fig. 5: Translating a program with multiple local scopes. 
</p>


Figure 5 illustrates the translation of a Cuppa2 program with multiple local scopes. Both local scopes map their variables to the same `R$$` prefix. This works because the two local scopes are at the same level and can never be active at the same time.  Therefore, it is safe to reuse the variable names due to the scoping.  The only caveat here is that you have to be careful that the variables are properly initialized before you use them in the context of the second scope.  In our compiler we will always do that.  If no initializer exists for a local variable then we initialize the variable with the default value 0.
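
To see why the default initialization matters, consider the following small simulation. It is a sketch only: a plain Python dictionary stands in for the variable store of the abstract machine, and the `store` function is a hypothetical stand-in for the `store` instruction.

```python
# Assumption: a plain dictionary models the single, global variable store
# of the abstract machine.
memory = {}

def store(name, value):
    # hypothetical stand-in for the 'store' instruction
    memory[name] = value

# first local scope: declare x = 2
store('R$$x', 2)

# second local scope at the same level: declare x (no initializer).
# Without an initializing store the value left behind by the first
# scope would still be visible under the reused name.
stale = memory.get('R$$x')

# what the compiler emits instead: an explicit store of the default value 0
store('R$$x', 0)
fresh = memory['R$$x']

print(stale, fresh)  # 2 0
```

Because the target language has only one global name space, the reused name `R$$x` still holds the old value until the second scope overwrites it.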

# A Cuppa2 Compiler

The Cuppa2 compiler shown in Figure 6 has the same architecture as our basic Cuppa1 compiler in Chapter 6.  We have a front end constructing our AST and we use a tree walker for our code generation phase.  Finally, the `output` module turns the generated instruction tuples into actual Exp1bytecode.

<p align="center">
  <img width="800" height="600" src="figures/chap07/6/figure.jpg">
</p>
<p style="text-align: center;">
Fig. 6: Cuppa2 compiler architecture.
</p>


Just to reiterate, compilers do not compute values. Instead they validate the source program as much as they can, making sure that the syntactic structure and the intended behavior of the program are correct, but they do not execute the program.  If the correctness of the source program is established, a compiler generates code for the target machine, which then executes it and exhibits the intended behavior such as computing values and interacting with the user.

The fact that compilers do not compute values but validate the source program has an effect on the symbol table:
rather than storing variable-value pairs, the symbol table acts merely as a record keeper for variables seen/declared in order to enforce semantic rules such as *each variable has to be declared before its use*.
In our case, it is convenient to use the symbol table to compute and store (name, target-name) pairs where the target-name is the name generated by adding a prefix such as `R$$$` that indicates the scope level to which a variable belongs.

## The Symbol Table and Front End

As we mentioned above, the symbol table in compilers does not hold (name, value) pairs but instead is used to validate declarations of variables throughout the program.  In our case we will also use the symbol table to compute and store our name prefixes for Exp1Bytecode variable names.  Here is the [Cuppa2 symbol table](code/cuppa2_cc_symtab.py) modified for our purposes,
```Python
class SymTab:

    #-------
    def __init__(self):
        …
    #-------
    def push_scope(self):
        # push a new dictionary onto the stack - stack grows to the left
        …
    #-------
    def pop_scope(self):
        # pop the left most dictionary off the stack
        …
    #-------
    def declare_sym(self, sym):
        # declare the symbol in the current scope: dict @ position 0
        
        # first we need to check whether the symbol was already declared
        # at this scope
        if sym in self.scoped_symtab[CURR_SCOPE]:
            raise ValueError("symbol {} already declared".format(sym))
        
        # enter the symbol in the current scope
        n_scopes = len(self.scoped_symtab)
        prefix = create_prefix(n_scopes-1)
        scope_dict = self.scoped_symtab[CURR_SCOPE]
        scope_dict[sym] = prefix + sym # value is the prefixed name

    #-------
    def lookup_sym(self, sym):
        # find the first occurrence of sym in the symtab stack
        # and return the associated value
        …
```
Notice that we no longer have an `update_sym` function.  This function is superfluous in the context of a compiler since it does not compute values.  Furthermore, notice that declaring a symbol no longer requires a value but instead the function `declare_sym` computes a prefix based on the current scoping level and uses it as a value in the (name, value) pair in the symbol table.
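
To make the behavior of such a symbol table concrete, here is a minimal, self-contained sketch. It is not the code from `cuppa2_cc_symtab.py`: the class name `MiniSymTab`, the bodies of the stack operations, and the `create_prefix` helper are assumptions written for illustration, guided only by the comments in the listing above.

```python
CURR_SCOPE = 0  # assumption: the current scope is the dictionary at index 0

def create_prefix(level):
    # hypothetical helper: 'R$' for level 0, 'R$$' for level 1, ...
    return 'R' + '$' * (level + 1)

class MiniSymTab:
    def __init__(self):
        # start with a single dictionary representing the global scope
        self.scoped_symtab = [{}]

    def push_scope(self):
        # the stack grows to the left
        self.scoped_symtab.insert(CURR_SCOPE, {})

    def pop_scope(self):
        if len(self.scoped_symtab) == 1:
            raise ValueError("cannot pop the global scope")
        self.scoped_symtab.pop(CURR_SCOPE)

    def declare_sym(self, sym):
        scope_dict = self.scoped_symtab[CURR_SCOPE]
        if sym in scope_dict:
            raise ValueError("symbol {} already declared".format(sym))
        prefix = create_prefix(len(self.scoped_symtab) - 1)
        scope_dict[sym] = prefix + sym  # value is the prefixed name

    def lookup_sym(self, sym):
        # search from the innermost scope outwards
        for scope_dict in self.scoped_symtab:
            if sym in scope_dict:
                return scope_dict[sym]
        raise ValueError("{} was not declared".format(sym))

symtab = MiniSymTab()
symtab.declare_sym('x')
outer = symtab.lookup_sym('x')
symtab.push_scope()
symtab.declare_sym('x')
inner = symtab.lookup_sym('x')
symtab.pop_scope()
after = symtab.lookup_sym('x')
print(outer, inner, after)  # R$x R$$x R$x
```

Looking up `x` inside the nested scope yields the inner target name, and once that scope is popped the same lookup falls back to the global one.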

The [compiler front end](code/cuppa2_cc_frontend.py) is the same front end we used for the interpreter with the exception that here we need to include the [compiler state object](code/cuppa2_cc_state.py) which in turn uses the new symbol table from above.  The compiler front end generates the same AST as the interpreter front end,

In [14]:
from cuppa2_lex import lexer
from cuppa2_cc_frontend import parser
from cuppa2_cc_state import state
from grammar_stuff import dump_AST

In [15]:
parser.parse("declare x = 1; put x", lexer=lexer)
dump_AST(state.AST)


(seq 
  |(declare x 
  |  |(integer 1)) 
  |(seq 
  |  |(put 
  |  |  |(id x)) 
  |  |(nil)))


## The Code Generator

Because the compiler is based on the Cuppa1 compiler we can use most of that code generator but we need to modify the code generation tree walker for declaration, get, and assignment statements as well as for variables that can appear in expressions.  This is not unlike the modifications we had to make to the interpretation tree walker for our Cuppa2 interpreter.  Here is the node function that deals with variable declaration AST nodes,
```Python
def declare_stmt(node):
    
    try: # try the declare pattern without initializer
        (DECLARE, name, (NIL,)) = node
        assert_match(DECLARE, 'declare')
        assert_match(NIL, 'nil')
    
    except ValueError: # try declare with initializer
        (DECLARE, name, init_val) = node
        assert_match(DECLARE, 'declare')
        
        state.symbol_table.declare_sym(name)
        scoped_name = state.symbol_table.lookup_sym(name)
        value = walk(init_val)
        code = [('store', scoped_name, str(value))]

        return code

    else: # declare pattern matched
        # when no initializer is present we init with the value 0
        state.symbol_table.declare_sym(name)
        scoped_name = state.symbol_table.lookup_sym(name)

        code = [('store', scoped_name, '0')]

        return code
```
As we have seen when we developed the interpreter, `declare` nodes come in two flavors: with and without an initializer.  The interesting part here is that both flavors of the declaration statement are translated into `store` statements in the target language.  Note that we declare a symbol in the symbol table to make it known that the symbol was declared at the source language level.  We need to do that in order to be able to enforce the semantic rule that *a variable needs to be declared before using it*.  However, right after declaring the symbol we turn around and ask the symbol table for the associated translated name which now has the scope prefix attached to it.  We use this translated name to generate the `store` instruction tuple.

In assignment and get statements the `lookup_sym` serves double duty as well. By looking up a symbol we enforce the rule that a symbol needs to be declared before use and we also retrieve the translated name of the symbol at the same time. Here are the two relevant node functions,
```Python
def assign_stmt(node):

    (ASSIGN, name, exp) = node
    assert_match(ASSIGN, 'assign')
    
    exp_code = walk(exp)
    scoped_name = state.symbol_table.lookup_sym(name)
    code = [('store', scoped_name, exp_code)]
    
    return code

```
and,
```Python
def get_stmt(node):

    (GET, name) = node
    assert_match(GET, 'get')

    scoped_name = state.symbol_table.lookup_sym(name)
    code = [('input', scoped_name)]

    return code
```
As we can see, assignment statements are mapped into `store` instructions and get statements are mapped into `input` instructions in the same way as in the Cuppa1 compiler, with the exception that now we use the translated name in the target code rather than the source language name.

That leaves variables in expressions to look at.  Here is the node function for variables that can appear in expressions,
```Python
def id_exp(node):
    
    (ID, name) = node
    assert_match(ID, 'id')
    
    scoped_name = state.symbol_table.lookup_sym(name)
    
    return scoped_name

```
This node function simply maps the source name of a variable into the translated name of the variable and again enforces the fact that variables need to be declared before use.  The remainder of the code generator remains unchanged with the exception that we need to insert the `declare_stmt` node function into the dispatch dictionary in an appropriate way. The file [`cuppa2_cc_codegen.py`](code/cuppa2_cc_codegen.py) contains the full code generator.

The [output phase](code/cuppa2_cc_output.py) is exactly the same as the Cuppa1 output phase, but in order to keep things simple we deleted the peephole optimization.


Here are some final thoughts on the Cuppa2 compiler before we move on to testing it. The difference between the Cuppa1 and Cuppa2 languages is the introduction of scope and variable declarations. These are purely high-level language constructs and we see this manifested in that the only thing that really changed in the Cuppa2 compiler compared to the Cuppa1 compiler is how variables are named!
That means the Cuppa2 compiler is completely responsible for enforcing scope. It cannot pass that responsibility on to the underlying abstract machine since this machine has no concept of scope.
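
The following sketch illustrates this division of labor on a deliberately tiny scale. It is not the Cuppa2 code generator: the statement tuples (`'declare'`, `'put'`, `'block'`), the functions `lookup` and `compile_stmts`, and the use of integer initializers only are simplifying assumptions made for this example.

```python
def lookup(scopes, name):
    # search the innermost scope first, then the enclosing scopes
    for scope in scopes:
        if name in scope:
            return scope[name]
    raise ValueError("{} was not declared".format(name))

def compile_stmts(stmts, scopes=None):
    # scopes is a list of dictionaries, innermost scope first
    if scopes is None:
        scopes = [{}]
    code = []
    for stmt in stmts:
        kind = stmt[0]
        if kind == 'declare':
            (_, name, value) = stmt
            if name in scopes[0]:
                raise ValueError("symbol {} already declared".format(name))
            # one '$' per active scope: R$ is global, R$$ the first block, ...
            scopes[0][name] = 'R' + '$' * len(scopes) + name
            code.append(('store', scopes[0][name], str(value)))
        elif kind == 'put':
            (_, name) = stmt
            code.append(('print', lookup(scopes, name)))
        elif kind == 'block':
            (_, body) = stmt
            # a block gets a fresh dictionary that is dropped when the call returns
            code.extend(compile_stmts(body, [{}] + scopes))
        else:
            raise ValueError("unknown statement: {}".format(kind))
    return code

ok = [('declare', 'x', 1),
      ('block', [('declare', 'x', 2), ('put', 'x')]),
      ('put', 'x')]
print(compile_stmts(ok))

# y is declared inside the block but used after the block has ended
bad = [('block', [('declare', 'y', 1)]), ('put', 'y')]
try:
    compile_stmts(bad)
except ValueError as e:
    print("Error: " + str(e))  # Error: y was not declared
```

The second program never reaches the abstract machine. The out-of-scope use of `y` is rejected while generating code, which is the only place where the scope information still exists.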

## Testing the Compiler

In [16]:
from cuppa2_cc import cc as cuppa2_compiler
from exp1bytecode_interp import interp as bytecode_run
from cuppa2_examples import scope1, scope2

In [17]:
print(scope1)


declare x = 1;
{
    declare x = 2;
    put x;
}
{
    declare x = 3;
    put x;
}
put x;



According to the semantics of the Cuppa2 language the expected output for the program is,
```
2
3
1
```
Let's see if we can replicate that.  First, let's look at the translated code,

In [18]:
bytecode = cuppa2_compiler(scope1)
print(bytecode)

	store R$x 1 ;
	store R$$x 2 ;
	print R$$x ;
	store R$$x 3 ;
	print R$$x ;
	print R$x ;
	stop ;



That looks reasonable and running the code in the abstract machine gives us,

In [19]:
bytecode_run(bytecode)

> 2
> 3
> 1


Exactly what we expected!  Let's try the other program,

In [20]:
print(scope2)


declare x = 1;
put x;
{
    x = 2;
}
put x;
{
    x = 3;
}
put x;



Here we expect the output to be,
```
1
2
3
```
Running the compiler and the abstract machine,

In [21]:
bytecode = cuppa2_compiler(scope2)
print(bytecode)

	store R$x 1 ;
	print R$x ;
	store R$x 2 ;
	print R$x ;
	store R$x 3 ;
	print R$x ;
	stop ;



In [22]:
bytecode_run(bytecode)

> 1
> 2
> 3


Exactly what we expected! Both test programs behave according to the scoping rules of Cuppa2 when compiled and run on the abstract machine.

# Summary

Every modern programming language has some notion of scope in order to limit the visibility of symbols within programs.  Here we designed our Cuppa2 language by extending the Cuppa1 language with the `declare` statement, allowing us to declare variables in specific scopes, and by allowing block statements to introduce new scope levels.  In order to support scoping properly we added functionality to our symbol table mechanism in the form of a stack of dictionaries.  As we saw, this works very well since nested scopes can be modeled nicely via a stack.  We built a Cuppa2 interpreter and a compiler, both loosely based on their Cuppa1 equivalents.  It was interesting to observe that we only had to modify the code directly affected by the language design change.  In addition we had to address specific challenges when compiling scoped code since in most cases the target language does not support any kind of scoping.

# Notes

Perhaps one of the earliest references to block structure and scope in programming language design can be found in C.A.R. Hoare's paper,

Hoare, C. A. R. (1973). [*Hints on programming language design*](http://i.stanford.edu/pub/cstr/reports/cs/tr/73/403/CS-TR-73-403.pdf). (No. STAN-CS-73-403). Stanford University, Department of Computer Science.

Kernighan and Ritchie describe the block structure and the effect of variable shadowing (hiding) in section 4.8 in their classic C programming language book,

Kernighan, B. W., Ritchie, D. M. (1988). *The C Programming Language, 2nd Edition*. Prentice-Hall software series.

This is perhaps interesting because the block structure design of our Cuppa family of languages is loosely based on their ideas.

# Exercises

1. Write a Cuppa2 compiler that translates Cuppa2 programs into your favorite programming language. Can you take advantage of scoping rules in the target language?

1. Design and implement a programming language of your own choosing that also supports scoping.

1. Write a Cuppa2 compiler that translates Cuppa2 programs into Java virtual machine code.