# Seventh Project: LLVM-IR Code Generation

In this seventh project, you are going to translate the optimized SSA intermediate
representation uCIR into LLVM IR, the intermediate representation of LLVM that is
partially specified in [LLVM Primer](./doc/llvm_primer.ipynb). LLVM is a set of
production-quality reusable libraries for building compilers. LLVM separates computer
architectures from language issues and simplifies the design and portability of new
compilers. Before you start, carefully study this simplified specification of
LLVM IR, tailored to our needs, to familiarize yourself with its data structures,
addressing modes, and instructions. The full LLVM IR specification is available
[here](https://llvm.org/docs/LangRef.html).

We will try to give you a hand by providing some of the pieces prewritten and by
littering the code with helpful comments. By the time you are done, you will have a
pretty thorough understanding of how executable code is generated for uC programs and
will even gain a little reading familiarity with LLVM IR.

## LLVM Python binding

To carry out this step, we will use the *llvmlite* library: a lightweight LLVM Python
binding for writing JIT compilers. We do not need the full LLVM API; only the IR
builder, optimizer, and JIT compiler APIs are necessary. *llvmlite* is tailored to this
kind of usage and takes the following approach: a small C wrapper around the parts of the LLVM C++
API we need that are not already exposed by the LLVM C API; a ctypes Python wrapper around
the C API; and a pure Python implementation of a subset of the LLVM IR builder. You can
find the documentation and installation instructions for *llvmlite* at http://llvmlite.pydata.org. An
example, extracted from the *llvmlite* documentation, is presented [here](./doc/llvm_example.ipynb).
Study it carefully.


## Generating LLVM-IR Code

The basic idea in this step is similar to the one used by the interpreter: create a
class that walks through the CFG, gets the sequence of instructions inside each basic block,
and triggers a method for each kind of instruction. The difference is that each method
emits LLVM instructions instead of executing the uCIR instruction. The code below contains
the initial class and the methods that interface with LLVM.

The `visit_Program` method visits the CFG of each function and generates the
LLVM instructions with the help of the `LLVMFunctionVisitor` class presented later. Notice that each
function is visited twice: a first time to create the LLVM IR function and its blocks,
then a second time to generate the LLVM instructions. The blocks must exist before the
instructions are emitted because a branch can target a block that appears later in the CFG.
You will have to adapt the `visit_Program` method (feel free to modify it) and the
`LLVMFunctionVisitor` class to allow this two-phase generation. The `execute_ir` method
compiles and executes the program, with or without LLVM optimization passes depending on its
`opt` argument. The `save_ir` method simply writes the IR to a file so that you can
inspect it. Further instructions are contained in the comments.

In [None]:
from llvmlite import ir, binding
from ctypes import CFUNCTYPE, c_int

class LLVMCodeGenerator(NodeVisitor):
    def __init__(self, viewcfg):
        self.viewcfg = viewcfg
        self.binding = binding
        self.binding.initialize()
        self.binding.initialize_native_target()
        self.binding.initialize_native_asmprinter()
        
        self.module = ir.Module(name=__file__)
        self.module.triple = self.binding.get_default_triple()
        
        self.engine = self._create_execution_engine()
        
        # declare external functions
        self._declare_printf_function()
        self._declare_scanf_function()

    def _create_execution_engine(self):
        """
        Create an ExecutionEngine suitable for JIT code generation on
        the host CPU.  The engine is reusable for an arbitrary number of
        modules.
        """
        target = self.binding.Target.from_default_triple()
        target_machine = target.create_target_machine()
        # And an execution engine with an empty backing module
        backing_mod = binding.parse_assembly("")
        return binding.create_mcjit_compiler(backing_mod, target_machine)

    def _declare_printf_function(self):
        voidptr_ty = ir.IntType(8).as_pointer()
        printf_ty = ir.FunctionType(ir.IntType(32), [voidptr_ty], var_arg=True)
        printf = ir.Function(self.module, printf_ty, name="printf")
        self.printf = printf

    def _declare_scanf_function(self):
        voidptr_ty = ir.IntType(8).as_pointer()
        scanf_ty = ir.FunctionType(ir.IntType(32), [voidptr_ty], var_arg=True)
        scanf = ir.Function(self.module, scanf_ty, name="scanf")
        self.scanf = scanf

    def _compile_ir(self):
        """
        Compile the LLVM IR string with the given engine.
        The compiled module object is returned.
        """
        # Create a LLVM module object from the IR
        llvm_ir = str(self.module)
        mod = self.binding.parse_assembly(llvm_ir)
        mod.verify()
        # Now add the module and make sure it is ready for execution
        self.engine.add_module(mod)
        self.engine.finalize_object()
        self.engine.run_static_constructors()
        return mod

    def save_ir(self, output_file):
        output_file.write(str(self.module))
            
    def execute_ir(self, opt, opt_file):
        mod = self._compile_ir()
        
        if opt:
            # apply some optimization passes on module
            pmb = self.binding.create_pass_manager_builder()
            pm = self.binding.create_module_pass_manager()
            
            pmb.opt_level = 0
            if opt == 'ctm' or opt == 'all':
                # Sparse conditional constant propagation and merging
                pm.add_sccp_pass()
                # Merges duplicate global constants together
                pm.add_constant_merge_pass()
                # Combine inst to form fewer, simple inst
                # This pass also does algebraic simplification
                pm.add_instruction_combining_pass()
            if opt == 'dce' or opt == 'all':
                pm.add_dead_code_elimination_pass()
            if opt == 'cfg' or opt  == 'all':
                # Performs dead code elimination and basic block merging
                pm.add_cfg_simplification_pass()
            
            pmb.populate(pm)
            pm.run(mod)
            opt_file.write(str(mod))
            
        # Obtain a pointer to the compiled 'main' - it's the address of its JITed code in memory.
        main_ptr = self.engine.get_function_address('main')
        # To convert an address to an actual callable thing we have to use
        # CFUNCTYPE, and specify the arguments & return type.
        main_function = CFUNCTYPE(c_int)(main_ptr)
        # Now 'main_function' is an actual callable we can invoke
        res = main_function()
        
    def visit_Program(self, node):
        # node.text contains the global instructions of the Program node
        self._generate_global_instructions(node.text)
        # Visit all the function definitions and emit the llvm code from the
        # uCIR code stored inside basic blocks.
        for _decl in node.gdecls:
            if isinstance(_decl, FuncDef):
                # _decl.cfg contains the Control Flow Graph for the function
                bb = LLVMFunctionVisitor(self.module)
                # Visit the CFG to define the Function and Create the Basic Blocks
                bb.visit(_decl.cfg)
                # Visit CFG again to create the instructions inside Basic Blocks
                bb.visit(_decl.cfg)
                if self.viewcfg:
                    dot = binding.get_function_cfg(bb.func)
                    gv = binding.view_dot_graph(dot, _decl.decl.name.name, False)
                    gv.filename = _decl.decl.name.name + ".ll.gv"
                    gv.view()
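
The need for two visits can be illustrated with a minimal sketch, assuming *llvmlite* is installed. The `successors` dictionary is a hypothetical stand-in for the real CFG objects, and the emitted instructions are hard-coded for illustration; only `ir.Function`, `append_basic_block`, and `ir.IRBuilder` come from *llvmlite*. It is not the expected solution, just one way to see why the blocks are created before any branch is emitted.

```python
from llvmlite import ir

i32 = ir.IntType(32)
module = ir.Module(name="two_phase_sketch")
func = ir.Function(module, ir.FunctionType(i32, []), name="main")

# Hypothetical stand-in for the CFG: block label -> successor label (or None).
successors = {"entry": "exit", "exit": None}

# Phase 1: create every LLVM basic block first, so that all labels exist.
blocks = {label: func.append_basic_block(name=label) for label in successors}

# Phase 2: emit instructions; a forward branch can now look up its target.
slot = None
for label, successor in successors.items():
    builder = ir.IRBuilder(blocks[label])
    if label == "entry":
        # Reserve the return-value slot and initialize it to 0.
        slot = builder.alloca(i32, name="retval")
        builder.store(ir.Constant(i32, 0), slot)
    if successor is None:
        builder.ret(builder.load(slot))
    else:
        builder.branch(blocks[successor])

ir_text = str(module)
print(ir_text)
```

Without the first phase, the `entry` block would have no `exit` block object to branch to at the moment its terminator is emitted.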

The code snippet below provides part of the class structure that generates LLVM
instructions for a function, in particular the ```LLVMFunctionVisitor``` class and
its ```build``` method.

For this project, you have to complete the `visit_BasicBlock` and `visit_ConditionBlock`
methods. As explained earlier, those methods will be called twice:
a first time to create the blocks and a second time to create the instructions within
the blocks. You will also need to create one ```_build_``` method per uCIR instruction,
named after its opcode (`_build_return`, `_build_add`, `_build_gt`, `_build_store`, etc.).
As an example, this code snippet also shows how you can map the print instruction
from uCIR to LLVM (```_build_print```) using the ```printf``` function. Use it
as a guide.

In [None]:
def make_bytearray(buf):
    # Make a byte array constant from *buf*.
    b = bytearray(buf)
    n = len(b)
    return ir.Constant(ir.ArrayType(ir.IntType(8), n), b)


class LLVMFunctionVisitor(BlockVisitor):

    def __init__(self, module):
        self.module = module
        self.func = None
        self.builder = None
        self.loc = {}

    def _global_constant(self, builder_or_module, name, value, linkage='internal'):
        # Create a (LLVM module-)global constant named *name* and initialized with *value*.
        if isinstance(builder_or_module, ir.Module):
            mod = builder_or_module
        else:
            mod = builder_or_module.module
        data = ir.GlobalVariable(mod, value.type, name=name)
        data.linkage = linkage
        data.global_constant = True
        data.initializer = value
        data.align = 1
        return data

    def _cio(self, fname, format, *target):
        # Make global constant for string format
        mod = self.builder.module
        fmt_bytes = make_bytearray((format + '\00').encode('ascii'))
        global_fmt = self._global_constant(mod, mod.get_unique_name('.fmt'), fmt_bytes)
        fn = mod.get_global(fname)
        ptr_fmt = self.builder.bitcast(global_fmt, ir.IntType(8).as_pointer())
        return self.builder.call(fn, [ptr_fmt] + list(target))

    def _get_loc(self, target):
        try:
            if target[0] == "%":
                return self.loc[target]
            elif target[0] == "@":
                return self.module.get_global(target[1:])
        except KeyError:
            return None

    def _build_print(self, val_type, target):
        if target:
            # get the object assigned to target
            _value = self._get_loc(target)
            if val_type == 'int':
                self._cio('printf', '%d', _value)
            elif val_type == 'float':
                self._cio('printf', '%.2f', _value)
            elif val_type == 'char':
                self._cio('printf', '%c', _value)
            elif val_type == 'string':
                self._cio('printf', '%s', _value)
        else:
            self._cio('printf', '\n')
            
    def _extract_operation(self, inst):
        _modifier = {}
        _ctype = None
        _aux = inst.split("_")
        _opcode = _aux[0]
        if _opcode not in {"fptosi", "sitofp", "jump", "cbranch", "define"}:
            _ctype = _aux[1]
            for i, _val in enumerate(_aux[2:]):
                if _val.isdigit():
                    _modifier["dim" + str(i)] = _val
                elif _val == "*":
                    _modifier["ptr" + str(i)] = _val
        return _opcode, _ctype, _modifier
    
    def build(self, inst):
        opcode, ctype, modifier = self._extract_operation(inst[0])
        if hasattr(self, "_build_" + opcode):
            args = inst[1:] if len(inst) > 1 else (None,)
            if not modifier:
                getattr(self, "_build_" + opcode)(ctype, *args)
            else:
                getattr(self, "_build_" + opcode + "_")(ctype, *inst[1:], **modifier)
        else:
            print("Warning: No _build_" + opcode + "() method", flush=True)
            
    def visit_BasicBlock(self, block):
        # TODO: Complete
        # Create the LLVM function when visiting its first block
        # First visit of the block should create its LLVM equivalent
        # Second visit should create the LLVM instructions within the block
        pass

    def visit_ConditionBlock(self, block):
        # TODO: Complete
        # Create the LLVM function when visiting its first block
        # First visit of the block should create its LLVM equivalent
        # Second visit should create the LLVM instructions within the block
        pass
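
The dispatch performed by `build` can be tried in isolation with a small sketch. The `MiniBuilder` class below is a hypothetical toy that only records which method was called; the instruction tuples and their operand order are assumptions made for illustration. It follows the same convention as the code above: the opcode string is split on `_`, and a trailing underscore is appended to the method name when a modifier (a dimension or `*`) is present.

```python
class MiniBuilder:
    """Toy stand-in for LLVMFunctionVisitor: it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def _build_add(self, ctype, left, right, target):
        self.calls.append(("add", ctype, left, right, target))

    def _build_load_(self, ctype, source, target, **modifier):
        # Variant with a trailing underscore: reached when the opcode has a modifier.
        self.calls.append(("load_", ctype, source, target, modifier))

    def build(self, inst):
        parts = inst[0].split("_")
        opcode, ctype = parts[0], parts[1]
        modifier = {}
        for i, val in enumerate(parts[2:]):
            if val.isdigit():
                modifier["dim" + str(i)] = val
            elif val == "*":
                modifier["ptr" + str(i)] = val
        suffix = "_" if modifier else ""
        getattr(self, "_build_" + opcode + suffix)(ctype, *inst[1:], **modifier)


demo = MiniBuilder()
demo.build(("add_int", "%2", "%3", "%4"))   # dispatched to _build_add
demo.build(("load_int_*", "%r", "%2"))      # dispatched to _build_load_
print(demo.calls)
```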

## Differences between uCIR and llvm IR

### Accessing an array element

Some uCIR and LLVM IR code generation strategies are different. Consider the example below, which
accesses one element of the matrix ```v```. uCIR treats the matrix as linearized and
computes the index of the element ```v[i][j]``` as ```index = i * ncol + j```, whereas LLVM IR
first selects the row vector ```i``` and then the column element ```j``` within this vector,
which results in two ```getelementptr``` instructions. One strategy that can be used
to translate from uCIR to LLVM IR is to recompute the indices ```i``` and ```j``` from the
calculated ```index``` (```i = index / ncol``` and ```j = index % ncol```), based on the
number of columns (```ncol```), which is known at compile time. The redundant code can then
be eliminated by the LLVM DCE.

```
int main() {
    int v[][] = { {1,2}, {3,4}, {5,6} };
    int x, i, j;
    x = v[i][j];
    return 0;
}
```
**uCIR** without opt (note that this code does not run: the variables `i` and `j` were never initialized):
```
@.const_v.0 = global int[3][2] [[1, 2], [3, 4], [5, 6]]

define int @main ()
entry:
  %1 = alloc int 
  %v = alloc int[3][2] 
  %x = alloc int 
  %i = alloc int 
  %j = alloc int 
  store int[3][2] @.const_v.0 %v 
  %2 = literal int 2 
  %3 = load int %i 
  %4 = mul int %2 %3 
  %5 = load int %j 
  %6 = add int %4 %5 
  %7 = elem int %v %6 
  %8 = load int* %7 
  store int %8 %x 
  %9 = literal int 0 
  store int %9 %1 
  jump label %exit
exit:
  %10 = load int %1 
  return int %10
```
**llvm IR**:
```
; ModuleID = "/Users/marcio/PycharmProjects/mc921/uc/uc_llvm.py"
target triple = "x86_64-apple-darwin19.6.0"
target datalayout = ""

declare i32 @"printf"(i8* %".1", ...) 

declare i32 @"scanf"(i8* %".1", ...) 

@".const_v.0" = constant [3 x [2 x i32]] [[2 x i32] [i32 1, i32 2], [2 x i32] [i32 3, i32 4], [2 x i32] [i32 5, i32 6]], align 16
define i32 @"main"() 
{
entry:
  %"1" = alloca i32, align 4
  %"v" = alloca [3 x [2 x i32]], align 16
  %"x" = alloca i32, align 4
  %"i" = alloca i32, align 4
  %"j" = alloca i32, align 4
  %".2" = bitcast [3 x [2 x i32]]* @".const_v.0" to i8*
  %".3" = bitcast [3 x [2 x i32]]* %"v" to i8*
  call void @"llvm.memcpy.p0i8.p0i8.i64"(i8* %".3", i8* %".2", i64 24, i1 false)
  %".5" = load i32, i32* %"i", align 4
  %".6" = mul i32 2, %".5"
  %".7" = load i32, i32* %"j", align 4
  %".8" = add i32 %".6", %".7"
  %".9" = sdiv i32 %".8", 2
  %".10" = srem i32 %".8", 2
  %".11" = getelementptr [3 x [2 x i32]], [3 x [2 x i32]]* %"v", i32 0, i32 %".9"
  %".12" = getelementptr [2 x i32], [2 x i32]* %".11", i32 0, i32 %".10"
  %".13" = load i32, i32* %".12", align 4
  store i32 %".13", i32* %"x", align 4
  store i32 0, i32* %"1", align 4
  br label %"exit"
exit:
  %".17" = load i32, i32* %"1", align 4
  ret i32 %".17"
}

declare void @"llvm.memcpy.p0i8.p0i8.i64"(i8* %".1", i8* %".2", i64 %".3", i1 %".4") 
```
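
The index recovery done by the `sdiv`/`srem` pair above can be checked with a short Python sketch. The helper name `split_index` is hypothetical; for non-negative indices, Python's `//` and `%` give the same results as signed division and remainder.

```python
def split_index(index, ncol):
    """Recover (row, column) from a row-major linear index."""
    return index // ncol, index % ncol


v = [[1, 2], [3, 4], [5, 6]]
ncol = len(v[0])
flat = [value for row in v for value in row]  # the linearized view used by uCIR

checks = []
for i in range(len(v)):
    for j in range(ncol):
        index = i * ncol + j                 # what uCIR computes
        row, col = split_index(index, ncol)  # what the sdiv/srem pair recovers
        checks.append((row, col) == (i, j) and v[row][col] == flat[index])

print(all(checks))
```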

### Pointer arithmetic

Reading the value pointed to by a pointer variable, which uCIR does with a single
```load_type_*``` instruction, requires two ```load``` instructions in LLVM IR: the first
loads the address stored in the pointer variable and the second loads the value at that
address. See the example below:

```
int main() {
    int i, *r;
    i = *r;
    return 0;
}
```
**uCIR** with opt on:
```
define int @main ()
entry:
  %1 = alloc int 
  %i = alloc int 
  %r = alloc int* 
  %2 = load int* %r 
  store int %2 %i 
  %3 = literal int 0 
  store int %3 %1 
  jump label %exit
exit:
  %4 = load int %1 
  return int %4
```
**llvm IR**:
```
; ModuleID = "/Users/marcio/PycharmProjects/mc921/uc/uc_llvm.py"
target triple = "x86_64-apple-darwin19.6.0"
target datalayout = ""

declare i32 @"printf"(i8* %".1", ...) 

declare i32 @"scanf"(i8* %".1", ...) 

define i32 @"main"() 
{
entry:
  %"1" = alloca i32, align 4
  %"i" = alloca i32, align 4
  %"r" = alloca i32*, align 8
  %".2" = load i32*, i32** %"r", align 8
  %".3" = load i32, i32* %".2", align 4
  store i32 %".3", i32* %"i", align 4
  store i32 0, i32* %"1", align 4
  br label %"exit"
exit:
  %".7" = load i32, i32* %"1", align 4
  ret i32 %".7"
}
```

Likewise, the ```store_type_*``` instruction may require a ```load``` followed by a ```store``` in LLVM IR:

```
int main() {
    int i, *r;
    *r = i;
    return 0;
}
```
**uCIR** (without opt):
```
define int @main ()
entry:
  %1 = alloc int 
  %i = alloc int 
  %r = alloc int* 
  %2 = load int %i 
  store int* %2 %r 
  %3 = literal int 0 
  store int %3 %1 
  jump label %exit
exit:
  %4 = load int %1 
  return int %4
```
**llvm IR**:
```
; ModuleID = "/Users/marcio/PycharmProjects/mc921/uc/uc_llvm.py"
target triple = "x86_64-apple-darwin19.6.0"
target datalayout = ""

declare i32 @"printf"(i8* %".1", ...) 

declare i32 @"scanf"(i8* %".1", ...) 

define i32 @"main"() 
{
entry:
  %"1" = alloca i32, align 4
  %"i" = alloca i32, align 4
  %"r" = alloca i32*, align 8
  %".2" = load i32, i32* %"i", align 4
  %".3" = load i32*, i32** %"r"
  store i32 %".2", i32* %".3"
  store i32 0, i32* %"1", align 4
  br label %"exit"
exit:
  %".7" = load i32, i32* %"1", align 4
  ret i32 %".7"
}
```
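
The extra `load` in both translations can be mimicked with a toy memory model. This is only a sketch: the `Memory` class is hypothetical, addresses are plain list indices, and, unlike the C examples above, the pointer `r` is initialized so that the code can actually run.

```python
class Memory:
    """Toy memory: an address is just an index into a list of cells."""

    def __init__(self):
        self.cells = []

    def alloca(self, initial=None):
        self.cells.append(initial)
        return len(self.cells) - 1

    def load(self, address):
        return self.cells[address]

    def store(self, value, address):
        self.cells[address] = value


mem = Memory()
target = mem.alloca(42)   # the int that r points to (added so the sketch can run)
i = mem.alloca()          # int i
r = mem.alloca(target)    # int *r, holding the address of target

# i = *r: two loads, then a store
pointer = mem.load(r)         # like: load i32*, i32** %r
value = mem.load(pointer)     # like: load i32, i32* %pointer
mem.store(value, i)
i_after_read = mem.load(i)

# *r = i: load the value, load the pointer, then store through the pointer
mem.store(7, i)
value = mem.load(i)
pointer = mem.load(r)
mem.store(value, pointer)
target_after_write = mem.load(target)

print(i_after_read, target_after_write)
```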

## Experimenting with LLVM optimizations (optional)

Once you have successfully generated the LLVM IR of the program, you can try the
optimizations of the LLVM compiler using the `opt` argument of the `execute_ir`
method. For example, you can compare the LLVM IR obtained with the dataflow
optimizations you implemented in the last project against the LLVM IR obtained with
the equivalent LLVM optimizations. First, compile the program with your optimizations
enabled, generate the LLVM IR without using LLVM optimizations, and save the
file. Then, recompile the same program without your optimizations, generate the
LLVM IR with LLVM optimizations enabled, and save the file. Finally, compare the
two files and check the differences. You could also compare the resulting CFGs.
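
One possible way to compare the two saved files is the standard `difflib` module. The sketch below is only an illustration: the helper `diff_ir`, the file names, and the two inline IR snippets are hypothetical placeholders for the files you saved.

```python
import difflib


def diff_ir(first_text, second_text, first_name="uc_opt.ll", second_name="llvm_opt.ll"):
    """Return the unified diff between two LLVM IR listings as a list of lines."""
    return list(difflib.unified_diff(
        first_text.splitlines(),
        second_text.splitlines(),
        fromfile=first_name,
        tofile=second_name,
        lineterm="",
    ))


# Hypothetical contents; in practice, read them from the files written by save_ir.
first = 'define i32 @"main"()\n{\nentry:\n  %".2" = add i32 1, 2\n  ret i32 %".2"\n}\n'
second = 'define i32 @"main"()\n{\nentry:\n  ret i32 3\n}\n'

diff = diff_ir(first, second)
print("\n".join(diff))
```

Lines starting with `-` appear only in the first listing and lines starting with `+` only in the second.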