# Fourth Project: Semantic Analysis

Once the syntax trees are built, additional analysis can be done by evaluating
attributes on tree nodes to gather semantic information from the
source code that is not easily detected during parsing. This usually includes type
checking, symbol table construction to make sure a variable is declared
before use, and decorating the AST to prepare it for the next compilation phase.

## Type System

First, you will need to define objects that represent the different
builtin data types and record information about their capabilities.

Let's define classes that represent types.  There is a general class used to represent
all types.  Each basic type is then a singleton instance of the type class.
```python
class uCType:
      pass

int_type = uCType("int", ...)
float_type = uCType("float", ...)
char_type = uCType("char", ...)
```
The contents of the type class are entirely up to you.  However, you will minimally need
to encode some information about which operators are supported (+, -, *, etc.) and
about default values.

Once you have defined the built-in types, you will need to make sure they get registered
with any symbol tables or code that checks for type names.

```python
class uCType:
    '''
    Class that represents a type in the uC language.  Basic
    types are declared as singleton instances of this type.
    '''
    def __init__(self, name, binary_ops=None, unary_ops=None,
                 rel_ops=None, assign_ops=None):
        '''
        You must implement yourself and figure out what to store.
        '''
        self.typename = name
        # Build a fresh set per instance so that types never share
        # a mutable default argument.
        self.unary_ops = unary_ops if unary_ops is not None else set()
        self.binary_ops = binary_ops if binary_ops is not None else set()
        self.rel_ops = rel_ops if rel_ops is not None else set()
        self.assign_ops = assign_ops if assign_ops is not None else set()

# Create specific instances of basic types. You will need to add
# appropriate arguments depending on your definition of uCType
IntType = uCType("int",
                 unary_ops   = {"-", "+", "--", "++", "p--", "p++", "*", "&"},
                 binary_ops  = {"+", "-", "*", "/", "%"},
                 rel_ops     = {"==", "!=", "<", ">", "<=", ">="},
                 assign_ops  = {"=", "+=", "-=", "*=", "/=", "%="}
                 )

FloatType = uCType("float",
                   # TODO: Complete
    )
CharType = uCType("char",
                  # TODO: Complete
    )

# Array, Pointer & Function types need to be instantiated for each declaration
class ArrayType(uCType):
    def __init__(self, element_type, size=None):
        """
        element_type: Any of the uCTypes can be used as the array's type. This
                      means that there's support for nested types, like matrices.
        size: Integer with the length of the array.
        """
        self.type = element_type
        self.size = size
        super().__init__(None, unary_ops={"*", "&"}, rel_ops={"==", "!="})

# TODO: Complete
```

In your type checking code, you will need to reference the
above type objects.   Think of how you will want to access
them.

## Symbol table

Then, you will need to define a symbol table that keeps track of
previously declared identifiers.  The symbol table will be consulted
whenever the compiler needs to look up information about variable and
constant declarations.



```python
class SymbolTable(dict):
    """ Class representing a symbol table. It should provide functionality
        for adding and looking up nodes associated with identifiers.
    """
    def __init__(self):
        super().__init__()

    def add(self, name, value):
        self[name] = value

    def lookup(self, name):
        return self.get(name, None)
```

Notice that the semantic analysis will have to manage the symbol table in order to handle the multiple scopes of the program.
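
One possible way to handle nested scopes is to keep a stack of dictionaries, one per open scope. The sketch below is only an illustration under that assumption: `ScopedSymbolTable`, `push_scope`, `pop_scope` and `lookup_local` are hypothetical names that are not part of the provided code, and the stored values are plain strings instead of AST nodes.

```python
class ScopedSymbolTable:
    """Hypothetical scoped symbol table: a stack of dicts, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]              # start with the global scope

    def push_scope(self):
        self.scopes.append({})          # e.g. when entering a function body

    def pop_scope(self):
        self.scopes.pop()               # e.g. when leaving a function body

    def add(self, name, value):
        self.scopes[-1][name] = value   # always declare in the innermost scope

    def lookup_local(self, name):
        # Only the innermost scope: useful to detect redeclarations.
        return self.scopes[-1].get(name)

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


symtab = ScopedSymbolTable()
symtab.add("x", "int")                  # global x
symtab.push_scope()
symtab.add("x", "float")                # shadows the global x
inner = symtab.lookup("x")
symtab.pop_scope()
outer = symtab.lookup("x")
print(inner, outer)
```

With this layout, a "name already defined in this scope" check only needs `lookup_local`, while uses of identifiers go through `lookup`.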

## Visiting the AST

The visitor pattern is often used in compilers to traverse the data structures that represent programs, either syntax
trees or any other intermediate representation. For this purpose, we provide the following `NodeVisitor` class to
allow you to visit the AST. This class was modeled after Python's own AST visiting facilities (the `ast` module of Python 3).

```python
class NodeVisitor:
    """ A base NodeVisitor class for visiting uc_ast nodes.
        Subclass it and define your own visit_XXX methods, where
        XXX is the class name you want to visit with these
        methods.
    """

    _method_cache = None

    def visit(self, node):
        """ Visit a node.
        """

        if self._method_cache is None:
            self._method_cache = {}

        visitor = self._method_cache.get(node.__class__.__name__, None)
        if visitor is None:
            method = 'visit_' + node.__class__.__name__
            visitor = getattr(self, method, self.generic_visit)
            self._method_cache[node.__class__.__name__] = visitor

        return visitor(node)

    def generic_visit(self, node):
        """ Called if no explicit visitor function exists for a
            node. Implements preorder visiting of the node.
        """
        for _, child in node.children():
            self.visit(child)
```

For example, a small visitor for constants can be implemented this way:

```python
class ConstantVisitor(NodeVisitor):
    def __init__(self):
        self.values = []

    def visit_Constant(self, node):
        self.values.append(node.value)
```

This visitor would create a list of values of all the constant nodes encountered below the given node. To use it, simply instantiate the visitor and call its visit method on the node of your choice:

```python
cv = ConstantVisitor()
cv.visit(node)
```

Note that:
*   The `generic_visit()` method will be called for AST nodes for which no `visit_XXX` method was defined.
*   The children of nodes for which a `visit_XXX` method was defined will not be visited - if you need this, call
    `generic_visit()` on the node.
*   The `generic_visit()` method could be implemented more efficiently by defining AST nodes as iterable objects using
    the `__iter__()` method of Python. Feel free to optimize it.
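
The notes above can be tried out on a toy tree. In the sketch below, `Constant` and `BinaryOp` are simplified stand-ins for the real AST classes (only the `children()` method is assumed), the visitor is a compact copy without the method cache, and `OpCounter` is a hypothetical visitor added just for illustration.

```python
class NodeVisitor:
    """Compact copy of the visitor above (method cache omitted for brevity)."""

    def visit(self, node):
        method = "visit_" + node.__class__.__name__
        return getattr(self, method, self.generic_visit)(node)

    def generic_visit(self, node):
        for _, child in node.children():
            self.visit(child)


class Constant:
    def __init__(self, value):
        self.value = value

    def children(self):
        return ()


class BinaryOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def children(self):
        return (("left", self.left), ("right", self.right))


class ConstantVisitor(NodeVisitor):
    def __init__(self):
        self.values = []

    def visit_Constant(self, node):
        self.values.append(node.value)


class OpCounter(NodeVisitor):
    def __init__(self):
        self.count = 0

    def visit_BinaryOp(self, node):
        self.count += 1
        # Without this call the nested operation would never be reached.
        self.generic_visit(node)


# Toy tree for the expression 1 + 2 * 3
tree = BinaryOp("+", Constant(1), BinaryOp("*", Constant(2), Constant(3)))
cv = ConstantVisitor()
cv.visit(tree)
oc = OpCounter()
oc.visit(tree)
print(cv.values, oc.count)
```

`ConstantVisitor` defines no `visit_BinaryOp`, so `generic_visit()` walks through the operations for it; `OpCounter` does define one and therefore has to continue the traversal explicitly.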

## Standardized Errors

To check for semantic errors and to print their description, the following method should be used:

```python
    # Method of your semantic-analysis visitor class; it relies on `import sys`
    # at the top of the module.
    def _assert_semantic(self, condition, msg_code, coord, name="", ltype="", rtype=""):
        """Check condition, if false print selected error message and exit"""
        error_msgs = {
            1: f"{name} is not defined",
            2: f"{ltype} must be of type(int)",
            3: "Expression must be of type(bool)",
            4: f"Cannot assign {rtype} to {ltype}",
            5: f"Assignment operator {name} is not supported by {ltype}",
            6: f"Binary operator {name} does not have matching LHS/RHS types",
            7: f"Binary operator {name} is not supported by {ltype}",
            8: "Break statement must be inside a loop",
            9: "Array dimension mismatch",
            10: f"Size mismatch on {name} initialization",
            11: f"{name} initialization type mismatch",
            12: f"{name} initialization must be a single element",
            13: "Lists have different sizes",
            14: "List & variable have different sizes",
            15: f"conditional expression is {ltype}, not type(bool)",
            16: f"{name} is not a function",
            17: f"no. arguments to call {name} function mismatch",
            18: f"Type mismatch with parameter {name}",
            19: "The condition expression must be of type(bool)",
            20: "Expression must be a constant",
            21: "Expression is not of basic type",
            22: f"{name} does not reference a variable of basic type",
            23: f"\n{name}\nIs not a variable",
            24: f"Return of {ltype} is incompatible with {rtype} function definition",
            25: f"Name {name} is already defined in this scope",
            26: f"Unary operator {name} is not supported",
            27: "Undefined error",
        }
        if not condition:
            msg = error_msgs.get(msg_code)
            print("SemanticError: %s %s" % (msg, coord), file=sys.stdout)
            sys.exit(1)
```

As shown in the function code, each message is associated with a specific error code. The error message is printed together with the given coordinate and is formatted using the optional arguments. The `_assert_semantic` method standardizes the output of the semantic analysis and must be used to pass the automatic tests. Some examples showing how to use the method are provided in the next section.
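
To see the behavior in isolation, the sketch below wraps a reduced copy of the method in a hypothetical `MiniChecker` class that keeps only two of the messages. The coordinate strings (`"@ 1:1"`, `"@ 2:5"`) and the `type(...)` spellings passed as arguments are assumptions made for the example; use whatever your AST nodes and types actually provide.

```python
import sys


class MiniChecker:
    """Reduced stand-in that keeps only two of the messages listed above."""

    def _assert_semantic(self, condition, msg_code, coord, name="", ltype="", rtype=""):
        error_msgs = {
            1: f"{name} is not defined",
            4: f"Cannot assign {rtype} to {ltype}",
        }
        if not condition:
            msg = error_msgs.get(msg_code)
            print("SemanticError: %s %s" % (msg, coord), file=sys.stdout)
            sys.exit(1)


checker = MiniChecker()

# Condition holds: nothing is printed and execution continues.
checker._assert_semantic(True, 1, "@ 1:1", name="x")

# Condition fails: the message is printed and SystemExit is raised.
# It is caught here only so that the example can report the exit code.
try:
    checker._assert_semantic(False, 4, "@ 2:5",
                             ltype="type(int)", rtype="type(float)")
    exit_code = None
except SystemExit as exc:
    exit_code = exc.code

print("exit code:", exit_code)
```

Because the method exits on the first failed check, the analysis reports at most one semantic error per run.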

## Implementing the analysis

Finally, you'll need to write code that walks the AST, decorates it with additional information and enforces a set of
semantic rules as explained by the guidelines below. For walking the AST, use the NodeVisitor class. An initial
implementation of the semantic analysis is provided in the code below.

```python
class Visitor(NodeVisitor):
    '''
    Program visitor class. This class uses the visitor pattern. You need to define methods
    of the form visit_NodeName() for each kind of AST node that you want to process.
    '''
    def __init__(self):
        # Initialize the symbol table
        self.symtab = SymbolTable()

        # Add built-in type names (int, float, char)
        self.typemap = {
            "int": IntType,
            "float": FloatType,
            "char": CharType,
            # TODO
        }

        # TODO: Complete...

    def visit_Program(self, node):
        # Visit all of the global declarations
        for _decl in node.gdecls:
            self.visit(_decl)
        # TODO: Manage the symbol table

    def visit_BinaryOp(self, node):
        # Visit the left and right expression
        self.visit(node.left)
        ltype = node.left.uc_type
        self.visit(node.right)
        rtype = node.right.uc_type
        # TODO:
        # - Make sure left and right operands have the same type
        # - Make sure the operation is supported
        # - Assign the result type to current node

    def visit_Assignment(self, node):
        # visit right side
        self.visit(node.rvalue)
        rtype = node.rvalue.uc_type
        # visit left side (must be a location)
        _var = node.lvalue
        self.visit(_var)
        # ID is the identifier node class of the AST
        if isinstance(_var, ID):
            self._assert_semantic(_var.scope is not None, 1, node.coord, name=_var.name)
        ltype = node.lvalue.uc_type
        # Check that assignment is allowed
        self._assert_semantic(ltype == rtype, 4, node.coord, ltype=ltype, rtype=rtype)
        # Check that assign_ops is supported by the type
        self._assert_semantic(node.op in ltype.assign_ops, 5, node.coord, name=node.op, ltype=ltype)
```

__IMPORTANT:__ The AST you built previously only contains information (like types) at specific nodes. Besides finding the possible remaining errors of the program, the semantic analysis should be used to figure out additional information (such as typing all expressions) which will be useful for code generation, the next compilation phase.  This process is usually called "decorating the AST".

## Guidelines

Additionally, we provide a set of guidelines that can be used to implement each function of the semantic analysis
(type checking, definition checking, etc). Please read them carefully.
   
### Program / Functions

1. Program (`visit_Program`)

Visit all of the global declarations. Record the associated symbol table.

2. Function Definition (`visit_FuncDef`)

Initialize the list of declarations that appear inside loops. Save the reference to the current function.
    
Visit the return type of the Function, the function declaration, the parameters, and the function body.

3. Parameter list (`visit_ParamList`)

Just visit all parameters.

### Declarations / Type

1. Global Declaration (`visit_GlobalDecl`)

Just visit each global declaration. 

2. Declaration (`visit_Decl`)

Visit the types of the declaration (VarDecl, ArrayDecl, FuncDecl). Check if the function or the variable is defined,
otherwise return an error. If there is an initial value defined, visit it.

3. Variable Declaration (`visit_VarDecl`)

First visit the type to adjust the list of types to uCType objects. Then, get the name of variable and make sure it is
not defined in the current scope, otherwise return an error. Next, insert its identifier in the symbol table. Finally,
copy the type to the identifier.

4. Array Declaration (`visit_ArrayDecl`)

First visit the type to adjust the list of types to uCType objects. Array is a modifier type, so append this info to
the ID object. Visit the array dimension if it is defined; otherwise the dimension will be inferred after visiting the
initialization in the Decl object.

5. Function Declaration (`visit_FuncDecl`)

Start by visiting the type. Add the function to the symbol table. Then, visit the arguments. Create the type of the
function using its return type and the type of its arguments.

6. Declaration List (`visit_DeclList`)

Visit all of the declarations that appear inside the statement. Append the declaration to the list of decls in the
current function. This list will be used by the code generation to allocate the variables.

7. Type (`visit_Type`)

Get the matching basic uCType.

### Statements

1. If Block (`visit_If`)

First, visit the condition. Then, check if the conditional expression is of boolean type or return a type error.
Finally, visit the statements related to the *then*, and to the *else* (in case there are any).
    
```c
if(3.1) { } // Error. Conditional expression should be of type boolean.
```

2. For Block (`visit_For`)

First, append the current loop node to the dedicated list attribute used to bind the node to nested break statements.
Then, visit the initialization, the condition and check if the conditional expression is of boolean type or return a
type error. Next, visit the increment (`next`) and the body of the loop (`stmt`).

3. While Block (`visit_While`)

First, append the current loop node to the dedicated list attribute used to bind the node to nested break statements.
Then, visit the condition and check if the conditional expression is of boolean type or return a type error. Finally,
visit the body of the while (`stmt`).

4. Compound Statement (`visit_Compound`)

Visit the list of block items (declarations or statements).

5. Assignment (`visit_Assignment`)

Visit the right side, then the left side. The left side must have been defined previously; otherwise return an error.
Check that the assignment is allowed: the left and right hand sides of an assignment operation must be declared as
the same type, otherwise return a type error. Finally, check that the assignment operator is in the `assign_ops`
supported by the type; attempts to use unsupported operators should result in an error.

Here is the exhaustive list of operators supported by each type:
```python
    # int:
        assign_ops  = {"=", "+=", "-=", "*=", "/=", "%="}

    # float:
        assign_ops = {"=", "+=", "-=", "*=", "/=", "%="}

    # char:
        assign_ops  = {"="}

    # bool:
        assign_ops  = {"="}

    # array:
        assign_ops  = {"="}
```

See the example below:
```c
float f = 0.3;
int j = f;             // Error: Cannot assign type(float) to type(int)
```

6. Break (`visit_Break`)

Check that the Break statement is inside a loop; if it is not, return an error. Then bind it to the current loop node.
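
The loop bookkeeping described for `visit_For`, `visit_While` and `visit_Break` can be sketched with a stack of enclosing loops. Everything below is a toy illustration: the node classes are reduced stand-ins, the attribute names `loops`, `body` and `bind` are hypothetical, and a `SemanticError` exception replaces the print-and-exit behavior of `_assert_semantic` so that the example stays self-contained.

```python
class SemanticError(Exception):
    """Hypothetical exception standing in for _assert_semantic."""


class While:
    def __init__(self, body):
        self.body = body                # toy body: a plain list of statements


class Break:
    def __init__(self):
        self.bind = None                # loop node this break belongs to


class LoopChecker:
    def __init__(self):
        self.loops = []                 # stack of enclosing loop nodes

    def visit(self, node):
        getattr(self, "visit_" + type(node).__name__)(node)

    def visit_While(self, node):
        self.loops.append(node)         # entering the loop
        for stmt in node.body:
            self.visit(stmt)
        self.loops.pop()                # leaving the loop

    def visit_Break(self, node):
        if not self.loops:
            raise SemanticError("Break statement must be inside a loop")
        node.bind = self.loops[-1]      # innermost enclosing loop


inner_break = Break()
inner = While([inner_break])
outer = While([inner])
LoopChecker().visit(outer)

try:
    LoopChecker().visit(Break())        # a break outside of any loop
    stray_accepted = True
except SemanticError:
    stray_accepted = False

print(inner_break.bind is inner, stray_accepted)
```

Popping the loop when its body has been visited is what keeps a `break` placed after a loop from being bound to it by mistake.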

7. Function Call (`visit_FuncCall`)

Verify that the given name is a function, or return an error if it is not. Initialize the node type and name
using the symbol table. Check that the number and the types of the arguments correspond to
the parameters in the function definition or return an error.
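
The three checks for a call can be sketched as a plain function. This is only an illustration: the symbol table is a dictionary of hand-written entries, the keys `kind`, `params` and `return` are hypothetical, types are plain strings, and `SemanticError` replaces the print-and-exit behavior of `_assert_semantic`.

```python
class SemanticError(Exception):
    """Hypothetical exception standing in for _assert_semantic."""


# Toy symbol table: one function `add(int a, int b)` and one variable `x`.
symtab = {
    "add": {"kind": "func", "return": "int",
            "params": [("a", "int"), ("b", "int")]},
    "x": {"kind": "var", "type": "int"},
}


def check_call(name, symtab, arg_types):
    """Return the type of the call, or raise on the errors described above."""
    entry = symtab.get(name)
    if entry is None or entry.get("kind") != "func":
        raise SemanticError(f"{name} is not a function")
    params = entry["params"]
    if len(params) != len(arg_types):
        raise SemanticError(f"no. arguments to call {name} function mismatch")
    for (pname, ptype), atype in zip(params, arg_types):
        if ptype != atype:
            raise SemanticError(f"Type mismatch with parameter {pname}")
    return entry["return"]


def try_call(name, arg_types):
    """Helper for the demo: report the error text instead of propagating."""
    try:
        return check_call(name, symtab, arg_types)
    except SemanticError as err:
        return f"error: {err}"


print(try_call("add", ["int", "int"]))
print(try_call("add", ["int"]))
print(try_call("add", ["int", "float"]))
print(try_call("x", []))
```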

8. Assert (`visit_Assert`)

Visit the expression and verify it is of boolean type or return a type error.

9. Empty Statement (`visit_EmptyStatement`)

Do nothing, just `pass`.

10. Print (`visit_Print`)

Just visit each expression and check if it is of basic type. Return an error otherwise.

11. Read (`visit_Read`)

Visit each name and verify that all identifiers used have been defined and are variables.

12. Return (`visit_Return`)

Visit the expression and check that its type is identical to the return type of the function definition.


### Expressions

1. Constant (`visit_Constant`)

Get the matching uCType and convert the value to the respective type.

2. Identifier (`visit_ID`)

Look for its declaration in the symbol table and bind the ID to it.
Also, initialize the type, kind, and scope attributes.

2. Cast operation (`visit_Cast`)

Visit the expression and the targeted type (`to_type`). Then initialize the node type accordingly.

3. Binary Operation (`visit_BinaryOp`)

Start by visiting each operand of the operation. Verify that both operands have the same
type or return a type error. Verify that the operator of the binary operation is compatible with the operands' type;
attempts to use unsupported operators should result in an error. Binary operations using arithmetic operators
produce a result of the same type as the operands. Binary operations using relational operators produce
a boolean type. Otherwise, return a type error. Set the type of the current node representing the binary operation.
    
Here is the exhaustive list of operators supported by each type:
```python
    # int:
        binary_ops  = {"+", "-", "*", "/", "%"}
        rel_ops     = {"==", "!=", "<", ">", "<=", ">="}

    # float:
        binary_ops = {"+", "-", "*", "/", "%"}
        rel_ops    = {"==", "!=", "<", ">", "<=", ">="}

    # char:
        rel_ops     = {"==", "!=", "&&", "||"}

    # bool:
        rel_ops     = {"==", "!=", "&&", "||"}

    # array:
        rel_ops     = {"==", "!="}

    # string:
        rel_ops     = {"==", "!="}
```
    
For example:
```c
        int a = 2;
        float b = 3.14;

        int c = a + 3;    // OK
        int d = a + b;    // Error.  int + float
        int e = b + 4.5;  // Error.  int = float
        char a[] = "Hello" + "World";     // OK
        char b[] = "Hello" * "World";     // Error: unsupported op *
```
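
The rules for binary operations can be condensed into a small function that returns the result type. The sketch below is an illustration only: types are plain strings, the two tables copy just the `int`, `float` and `bool` rows from the list above, `binary_result_type` is a hypothetical helper, and `SemanticError` replaces the print-and-exit behavior of `_assert_semantic`.

```python
class SemanticError(Exception):
    """Hypothetical exception standing in for _assert_semantic."""


# Operator tables copied from the list above (int, float and bool only).
BINARY_OPS = {
    "int": {"+", "-", "*", "/", "%"},
    "float": {"+", "-", "*", "/", "%"},
}
REL_OPS = {
    "int": {"==", "!=", "<", ">", "<=", ">="},
    "float": {"==", "!=", "<", ">", "<=", ">="},
    "bool": {"==", "!=", "&&", "||"},
}


def binary_result_type(op, ltype, rtype):
    # 1. Both operands must have the same type.
    if ltype != rtype:
        raise SemanticError(
            f"Binary operator {op} does not have matching LHS/RHS types")
    # 2. Arithmetic operators keep the operand type.
    if op in BINARY_OPS.get(ltype, set()):
        return ltype
    # 3. Relational operators produce bool.
    if op in REL_OPS.get(ltype, set()):
        return "bool"
    # 4. Anything else is unsupported for this type.
    raise SemanticError(
        f"Binary operator {op} is not supported by type({ltype})")


def try_op(op, ltype, rtype):
    """Helper for the demo: report the error text instead of propagating."""
    try:
        return binary_result_type(op, ltype, rtype)
    except SemanticError as err:
        return f"error: {err}"


print(try_op("+", "int", "int"))
print(try_op("<", "float", "float"))
print(try_op("+", "int", "float"))
print(try_op("*", "bool", "bool"))
```

In a real visitor the returned value would be stored on the `BinaryOp` node so that parent nodes can read it.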

4. Unary Operation (`visit_UnaryOp`)

Start by visiting the operand of the operation. Verify that the operator is
compatible with the operand type; attempts to use unsupported operators should result in
an error. Set the type of the current node representing the unary operation to the same
type as the operand.

Here is the exhaustive list of operators supported by each type:
```python
    # int:
        unary_ops   = {"-", "+", "--", "++", "p--", "p++"}   # where p stands for postfix

    # float:
        unary_ops  = {"-", "+", "*", "&"}

    # bool:
        unary_ops   = {"!"}
```

5. Expression List (`visit_ExprList`)

Visit each element of the list and verify that identifiers have already been defined or return an error.

6. Array Reference (`visit_ArrayRef`)

Start by visiting the subscript. If the subscript is an identifier, verify that it has
already been defined or return an error. Check that the type of the subscript is integer
or return an error. Visit the name and initialize the type of the node.

See the example below:
```c
int v[2] = {1, 2};
float f = 0.3;
int j = v[f];             // Error: array index must be of type(int)
```
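
A minimal sketch of the subscript check follows, assuming a toy `Array` dataclass (hypothetical, much simpler than `ArrayType`) and plain strings for basic types. `SemanticError` replaces the print-and-exit behavior of `_assert_semantic`, and the spelling of the type in the error text is an assumption.

```python
from dataclasses import dataclass


class SemanticError(Exception):
    """Hypothetical exception standing in for _assert_semantic."""


@dataclass
class Array:
    element: object     # "int", "float", ... or another Array for matrices
    size: int


def array_ref_type(array_type, subscript_type):
    """Type of `name[subscript]`: the element type of the array."""
    if subscript_type != "int":
        raise SemanticError(f"type({subscript_type}) must be of type(int)")
    return array_type.element


matrix = Array(Array("int", 3), 2)       # e.g. int m[2][3]
row = array_ref_type(matrix, "int")      # type of m[i]
cell = array_ref_type(row, "int")        # type of m[i][j]

try:
    array_ref_type(matrix, "float")      # like v[f] in the example above
    float_index_accepted = True
except SemanticError:
    float_index_accepted = False

print(row, cell, float_index_accepted)
```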

7. Initialization List (`visit_InitList`)

Visit each element of the list. If they are scalar (not InitList), verify they are
constants or return an error.
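
Because initialization lists can be nested, the check is naturally recursive. The sketch below uses toy `Constant`, `ID` and `InitList` classes (reduced stand-ins for the AST nodes, with a hypothetical `exprs` attribute) and a `SemanticError` exception in place of `_assert_semantic`.

```python
class SemanticError(Exception):
    """Hypothetical exception standing in for _assert_semantic."""


class Constant:
    def __init__(self, value):
        self.value = value


class ID:
    def __init__(self, name):
        self.name = name


class InitList:
    def __init__(self, exprs):
        self.exprs = exprs


def check_init_list(node):
    for expr in node.exprs:
        if isinstance(expr, InitList):
            check_init_list(expr)        # nested list: check it recursively
        elif not isinstance(expr, Constant):
            raise SemanticError("Expression must be a constant")


# {{1, 2}, {3, 4}} is accepted.
good = InitList([InitList([Constant(1), Constant(2)]),
                 InitList([Constant(3), Constant(4)])])
check_init_list(good)

# {1, x} is rejected because x is not a constant.
bad = InitList([Constant(1), ID("x")])
try:
    check_init_list(bad)
    bad_accepted = True
except SemanticError:
    bad_accepted = False

print(bad_accepted)
```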