Deadline: Fri, 31.03.2023 (Week 11), 22:00
This coursework already provides:
- I) Lowering from the choco AST (
choco.ast) to a flat version of the choco AST (choco.ir). - II) A skeleton for lowering the
choco.irto an SSA RISCV-V dialect (riscv_ssa). - III) Lowering from
riscv_ssato a dialect for RISC-V assembly (riscv). - IV) A printer for the
riscvdialect which emits assembly files.
- Core: Code generation
- Expert: Optimization
First, you will need to clone the repository. If you used GitHub before, you will already have either an HTTPS access token, or an SSH key. If you do not have either, you will need to create this. You can use either, just follow the guides below:
- To create an SSH key, use this
- To create an HTTPS token, use this At some point, you will be asked to specify what you want to do with the token: feel free to tick all the boxes.
At this point, you can clone the repository:
- if you used HTTPS above, use the command
git clone https://github.com/compiling-techniques/REPOSITORY_NAME.git - if you used SSH, use the command
git clone git@github.com:compiling-techniques/REPOSITORY_NAME.git
Now enter the repository directory: cd REPOSITORY_NAME. If this is your first time using Git, you will need to set up a bit of configuration:
git config --global user.name "your_github_username"
git config --global user.email "your_github_email"
This will set your username and email globally (for all repositories, unless they overwrite this), so all future GitHub repositories will already have this set up. If you do not wish to do that, just omit the --global.
You can verify the setup using git config -l, which will just tell you what settings you set.
You can create an isolated python environment using venv.
To set up venv for the assignment, follow the steps below (a summary is given below the bulleted list):
- Set up a virtual environment for the assignment with
python3 -m venv env. This creates a subdirectory calledenvin the current folder that creates an isolated version of Python. - Activate the virtual environment by
sourceing the activation file:source env/bin/activate - Confirm that you are in the virtual environment by running
which python. The output should be/path/to/coursework/env/bin/python. - Run
pip install -U -r requirements.txtto install dependencies within the virtual environment. - Run
pip install -e .to install the ChocoPy compiler as a package. - If you are using PyCharm, please configure PyCharm to work with the environment by following the instructions on this page of the PyCharm manual: Configure a virtual environment.
In summary, the process looks as follows:
/path/to/coursework$ /afs/inf.ed.ac.uk/group/teaching/ct/python/3.10/bin/python3.10 -m venv env # set up virtual environment called `env`
/path/to/coursework$ source env/bin/activate # run activation file
(env) /path/to/coursework$ # (env) shows up
(env) /path/to/coursework$ which python # confirm python path
/path/to/coursework/env/bin/python
(env) /path/to/coursework$ pip install -U -r requirements.txt # install dependencies
(env) /path/to/coursework$ pip install -e . # install ChocoPy as a package
(env) /path/to/coursework$ # get to hacking, and best of luck! :)It would be convenient for you, if you used a modern IDE for Python.
A popular choice, PyCharm, comes pre-installed in your DICE desktop environment.
If you decide to use PyCharm, in order to install the packages, you should open the embedded terminal (Alt+F12 by default)
and follow the previous instructions using pip install -U -r requirements.txt.
Instead of cloning the repository to your local machine, you can also use GitHub codespaces to do your coursework. This is a beta-test, so it is an optional offer that is delivered on a best-effort basis. TO use GitHub codespaces click on the green "Code" button at the top of this repository, select "code spaces" and create your personal codespace. Then enter the console and run:
$ export PATH=/home/codespace/.local/lib/python3.10/site-packages/bin/:$PATH
$ pip install -U -r requirements.txt # install dependencies
$ pip install -e . # install ChocoPy as a package
$ # get to hacking, and best of luck! :)You can use lit to automatically test your code, which is include in the requirements.txt.
To run it locally, do:
lit -v tests/end-to-endThis will examine recursively all the files with valid formats inside the above directory.
The -v flag adds a verbose output with more information in case some tests fail.
You can also leverage the --timeout <seconds> flag, in order to bound the time allowed for your test cases to run.
This way you can detect if your parser loops infinitely in some test cases.
For more details on the configuration of lit, see tests/lit.cfg.
For more info on lit check the online documentation.
The goal of task 1 is to write a code generator for the ChocoPy Language, targeting the RISC-V architecture. The output RISC-V program will be run on a RISC-V interpreter.
In particular, you need to implement the lowering from the choco.ir dialect to the riscv_ssa dialect.
The template already provides a partial implementation of the lowering.
You will have to implement the rest.
First read this README completely and carefully! It explains the lowering phases of the code generation, as well as how to run the assembly code. We have also added some hints that are useful for this coursework.
To understand how lowering from choco.ast to choco.ir works (step I), have a look at choco/choco_ast_to_choco_flat.py.
After you have a quick look, it would be particularly useful for understanding the next steps, if you get a good understanding of the following files:
choco/choco_flat_introduce_library_calls.pychoco/for_to_while.py
The choco.ir dialect is described in choco/dialects/choco_flat.py.
To understand how you should lower choco.ir to riscv_ssa (step II), you need look at what RISC-V instructions are available in the class RISCVSSA of riscv/ssa_dialect.py.
Each instruction subclasses a RISC-V Operation, e.g., Riscv1Rd1Rs1ImmOperation, which tells you how to use this particular instruction.
For example, the aforementioned operation uses one destination register (Rd), one source register (Rs) and one immediate (Imm).
A RISC-V instruction that corresponds to this encoding is AddIOp, which adds an immediate to the source register, and places the result on the destination register.
We give you the implementation of the runtime library, which is automatically added to your riscv programs. Some of the library function calls are already generated by the frontend, but you will need to generate some calls for these functions:
_malloc: Allocate in the heap the number of bytes given as argument, and return a pointer to it._error_len_none: Print an error and exit the program. This should be used whenlenis called on aNonevalue._list_index_oob: Print an error and exit the program. This should be used when an array is accessed out of bounds._list_index_none: Print an error and exit the program. This should be used whenNoneis indexed.
In order to transform the IRs, you will need to use the xDSL rewriter engine.
Here is for instance the given skeleton for the UnaryExpr rewrite pattern:
class UnaryExprPattern(RewritePattern):
@op_type_rewrite_pattern
def match_and_rewrite(self, unary_op: UnaryExpr,
rewriter: PatternRewriter):
raise NotImplementedError()
The match_and_rewrite method will be called for each UnaryExpr in the IR.
The rewriter argument has multiple methods that you will need to use throughout the coursework:
insert_op_before_matched_op: Inserts new operations given as inputs before the matched operation (unary_ophere).erase_matched_op: Erase the matched operation. If the matched operation had results that are still used, this will trigger an error.erase_op: Erase the given operation. The given operation should either be the matched operation, or an operation contained in the matched operation's regions and blocks. If the given operation had results that are still used, this will trigger an error.replace_matched_op: Replace the matched operation with new operations given as input. The SSA results of the matched operation will be replaced by the SSA results of the last given operation. Optionally, you can provide a list of SSA values to replace the results of the matched operation.inline_block_before_matched_op: Move the operations in a block right before the matched operation.
Note that insert_op_before_matched_op and inline_block_before_matched_op should not be used after replace_matched_op or erase_matched_op.
To get started coding look at the file choco/choco_flat_to_riscv_ssa.py which contains an incomplete implementation of the lowering to riscv_ssa.
The file contains:
- An example rewrite pattern for call expressions.
- A utility function to call the rewriters that lower
choco.irtoriscv_ssa.
You will need to complete the implementations of the other rewrite patterns.
- Literals (except strings)
- Unary expressions (arithmetic and logical)
- Arithmetic binary expressions
- Assign statements (alloca, load, store)
- If statements
- Logical and conditional expressions
- While statements
- List expressions
- Index operation
- String literals and string operations
The tests/end-to-end folder contains a directory for each section,
making it easier to check your progress.
Now, you can actually execute a ChocoPy program! First, you can get the RISC-V assembly, by doing:
cd /path/to/coursework
choco-opt -p all -t riscv tests/end-to-end/print-integer-literal.chocSecond, you can run the RISC-V assembly, using the riscv-interpreter tool:
riscv-interpreter tests/riscv/interpreter/hello_world.sFinally, you can combine the above tools in order to execute directly a ChocoPy program:
cd /path/to/coursework
choco-opt -p all -t riscv tests/end-to-end/print-integer-literal.choc >temp.s && riscv-interpreter temp.sAlso, take a look at how lit will run tests with filecheck, examining the first line of, e.g., tests/end-to-end/print-integer-literal.choc.
In your implementation, values will all be unboxed, meaning that they do not carry type information.
However, with unboxed, it is not possible to use any value of type object, since we
cannot recover its original type statically. To simplify your implementation, we will thus not test
any code that has object values, or heterogeneous lists. In particular, this include expressions
such as if True then 2 else "foo", or [0, True].
Similarly, we only support print with a bool, int, or str input.
Initially, you can check the correct implementation of Alloca, Store and Load, using the test cases inside end-to-end/var-defs.
Alloca and Store are necessary for variable definitions, Store is also necessary for assignments, and Load is necessary whenever you are accessing a variable.
The and, or, and if/else expressions use short-circuit evaluation. This
means that in the case of an and for instance, False and foo() will not lead
to the execution of foo, since and will return False for any value
returned by foo. This is important in the case where foo has a side-effect.
and, or, and if/else expressions use regions to separate both "branches".
Since both branches can have SSA variables, the "result" of a branch is given by
the yield operation. Here is for instance the code for 0 > 1 or 1 >= 2:
builtin.module() {
choco.ir.func_def() ["func_name" = "_main", "return_type" = !choco.ir.named_type<"<None>">] {
%0 : !choco.ir.named_type<"bool"> = choco.ir.effectful_binary_expr() ["op" = "or"] {
%1 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 0 : !i32]
%2 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 1 : !i32]
%3 : !choco.ir.named_type<"bool"> = choco.ir.binary_expr(%1 : !choco.ir.named_type<"int">, %2 : !choco.ir.named_type<"int">) ["op" = ">"]
choco.ir.yield(%3 : !choco.ir.named_type<"bool">)
} {
%4 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 1 : !i32]
%5 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 2 : !i32]
%6 : !choco.ir.named_type<"bool"> = choco.ir.binary_expr(%4 : !choco.ir.named_type<"int">, %5 : !choco.ir.named_type<"int">) ["op" = ">="]
choco.ir.yield(%6 : !choco.ir.named_type<"bool">)
}
}
}
Things will start to become more difficult with control flow.
Essentially, control flow constructs will be needed for if-else statements, while statements, but also for the logical binary expressions (and/or) and the conditional expression.
Assume the following program:
if True:
42
else:
17You can see what is the choco.ir form of this program by running:
$ choco-opt -p check-assign-target,name-analysis,type-checking,choco-ast-to-choco-flat,choco-flat-introduce-library-calls,for-to-while /path/to/program
builtin.module() {
choco.ir.func_def() ["func_name" = "_main", "return_type" = !choco.ir.named_type<"<None>">] {
%0 : !choco.ir.named_type<"bool"> = choco.ir.literal() ["value" = !choco.ir.bool<True>]
choco.ir.if(%0 : !choco.ir.named_type<"bool">) {
%1 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 42 : !i32]
} {
%2 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 17 : !i32]
}
}
}
Notice the SSA values created for the condition (%0), the value in the if-then block (%1), and the value in the if-else block (%2).
We sketch how you should approach control flow, when lowering from choco.ir to riscv_ssa:
- Assuming that you evaluate conditions by comparing them with zero, you need an SSA value for the constant
0. - You need a RISC-V branch instruction to compare your condition with zero.
- Since we have control flow, you need labels to represent locations in code, i.e., where is the code for the if-then block, for the if-else block, and for the code after the if statement.
- If your condition evaluates to false, then you need to jump to the if-else label.
- Otherwise, you continue with the if-then block, but you also need an extra instruction to jump to the code after the if statement.
The correct riscv_ssa output would be:
$ choco-opt -p check-assign-target,name-analysis,type-checking,choco-ast-to-choco-flat,choco-flat-introduce-library-calls,for-to-while,choco-flat-to-riscv-ssa /path/to/program
builtin.module() {
riscv_ssa.func() ["func_name" = "_main"] {
%0 : !riscv_ssa.reg = riscv_ssa.li() ["immediate" = 1 : !i32]
%1 : !riscv_ssa.reg = riscv_ssa.li() ["immediate" = 0 : !i32]
riscv_ssa.beq(%0 : !riscv_ssa.reg, %1 : !riscv_ssa.reg) ["offset" = !riscv.label<if_else_1>]
riscv_ssa.label() ["label" = !riscv.label<if_then_1>]
%2 : !riscv_ssa.reg = riscv_ssa.li() ["immediate" = 42 : !i32]
riscv_ssa.j() ["offset" = !riscv.label<if_after_1>]
riscv_ssa.label() ["label" = !riscv.label<if_else_1>]
%3 : !riscv_ssa.reg = riscv_ssa.li() ["immediate" = 17 : !i32]
riscv_ssa.label() ["label" = !riscv.label<if_after_1>]
}
}
There are many things to notice here.
%0is the SSA value ofTruein the condition.%1is the SSA value of zero.riscv_ssa.beqcompares the condition with zero. If the condition evaluates to zero (i.e., false), we jump to theif_else_1label.- Otherwise, we continue to the if-then block (
if_then_1label). %2is the only thing inside the if-then block.riscv_ssa.jjumps over the if-else block to the code after if (if_after_1label).%3is the only thing inside the if-else block.
In order to generate branch instructions in the riscv_ssa dialect, you can choose one of the BEQOp, BNEOp, etc.
See the ricsv/ssa_dialect for more details.
These instructions are of Riscv2Rs1OffOperation type, which means that they take two SSA values and one offset as an argument.
The offset should be a string representing the label name to which you are branching.
A final thing to keep in mind is that each label name should be unique.
If you use for the if_then_label variable the string "if_then", and your program contains multiple if statements,
you will end up with identical label names (which is wrong).
Use the counters provided in the rewrite patterns to get unique names for your labels.
The goal of this task is to optimize the code in terms of code size, that is to reduce the generated lines of RISC-V assembly.
We give examples of different forms below.
For this task, you can modify or add any pass you want, at any level (dialect) of the compilation pipeline. We only ask you to not modify nor remove the library calls, nor change the command line API. Be careful that your optimizations are correct, because otherwise you might lose points in task 1 as well!
The files in tests/end-to-end/code-size-optimization contain examples that have some optimization opportunities.
We recommend you to implement these optimizations (in order of difficulty):
- Constant folding (you can continue the implementation of
choco/constant_folding.py) - Some arithmetic rewrite rules (such as
x * 0 = 0, or(x + 3) + 5 = x + 8) - Removal of unnecessary
load/storefor temporary chocopy variables. - Improving the register allocator to not always spill registers on the stack. This can be done in a pass after the translation of riscv-ssa to riscv, or by modifying the register allocator (hard, but would yield better code reduction).
We include two minimal examples of rewriters, which operate on the choco.ir level.
So, you need to have a good understanding of the following files:
choco/constant_folding.pychoco/dead_code_elimination.py
We take as similar approach as in choco/constant_folding.py.
First, you need to create a similar file, e.g., choco/your_new_pass.py.
Then, this file can contain your rewriter pattern and a function that will invoke the PatternRewriteWalker with your rewriter.
For example, this function could be called choco_flat_your_new_pass, assuming that your new pass operates on choco.ir (choco flat).
Finally, you need to import this invocation function from choco-opt, by including:
from choco.your_new_pass import choco_flat_your_new_passand, also, you need to add the function into the passes_native list:
passes_native = [
# Semantic Analysis
check_assign_target,
# ...
# IR Optimization
# ...
choco_flat_your_new_pass,
# Code Generation
# ...
]Adding a new separate pass could help with development and debugging, as you can include only some of your passes in choco-opt invocation and see if one of them is causing an error.
In particular, instead of passing -p all to choco-opt, you can be explicit about the passes:
cd /path/to/coursework
choco-opt -p check-assign-target,name-analysis,type-checking,warn-dead-code,choco-ast-to-choco-flat,choco-flat-introduce-library-calls,choco-flat-constant-folding,choco-flat-dead-code-elimination,for-to-while,choco-flat-to-riscv-ssa,riscv-ssa-to-riscv,riscv-function-lowering -t riscv tests/end-to-end/print-integer-literal.chocNotice the choco-flat-constant-folding and choco-flat-dead-code-elimination passes.
You could remove or add more passes likewise.
Run:
choco-opt -hfor more info.
If you want to always use the -p all flag for brevity, you can also comment out the pass you want to remove from the passes_native list in choco-opt.
The code:
builtin.module() {
choco.ir.func_def() ["func_name" = "_main", "return_type" = !choco.ir.named_type<"<None>">] {
%0 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 42 : !i32]
%1 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 8 : !i32]
choco.ir.call_expr(%0 : !choco.ir.named_type<"int">) ["func_name" = "print"]
}
}
can be converted to:
builtin.module() {
choco.ir.func_def() ["func_name" = "_main", "return_type" = !choco.ir.named_type<"<None>">] {
%0 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 42 : !i32]
choco.ir.call_expr(%0 : !choco.ir.named_type<"int">) ["func_name" = "print"]
}
}
where the SSA value %1 was removed since it was not used later.
The code:
builtin.module() {
choco.ir.func_def() ["func_name" = "_main", "return_type" = !choco.ir.named_type<"<None>">] {
%0 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 42 : !i32]
%1 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 8 : !i32]
%2 : !choco.ir.named_type<"int"> = choco.ir.binary_expr(%0 : !choco.ir.named_type<"int">, %1 : !choco.ir.named_type<"int">) ["op" = "+"]
choco.ir.call_expr(%2 : !choco.ir.named_type<"int">) ["func_name" = "print"]
}
}
can be converted to:
builtin.module() {
choco.ir.func_def() ["func_name" = "_main", "return_type" = !choco.ir.named_type<"<None>">] {
%0 : !choco.ir.named_type<"int"> = choco.ir.literal() ["value" = 50 : !i32]
choco.ir.call_expr(%0 : !choco.ir.named_type<"int">) ["func_name" = "print"]
}
}
where the binary expression was replaced by the sum of the constants,
and the SSA values %1 and %2 were eliminated as dead.
The dead code elimination was enabled by the constant propagation.
You are encouraged to commit your changes regularly. This allows you to track the history of your changes so that you can revert to an earlier version of your code if you need to. It also protects you from losing any of your work in the case of a computer failure.
Furthermore, every time you commit and push your changes in the main branch, your points are updated,
giving you continuous feedback on your implementation.
This badge shows how many successful test cases you've had so far.
These points are provisional and are not related to the final grade. The number of successful test cases is only an indicator of how strong your implementation is, with respect to how many language features it covers. Your parser will be tested also on other test cases, so it is important that you write your own tests to ensure that your code is thoroughly tested.
Once you have pushed all of your changes to GitHub and you are happy with your code and your points, you're finished! We will grade your last submission. Submissions after the deadline, will result in penalties without an approved extension. See "Assessment" on Learn for details.
Follow our academic guidelines and take advantage of Piazza to discuss any open questions.
For the following courseworks, the full lexer and parser will be provided.
Please remember the good scholarly practice requirements of the University regarding work for credit.
The number of passing test cases is intended to give you an idea of the quality of the code. However, the actual grading of your coursework takes place after the deadline and takes into account public and hidden automatic tests, as well as potential manual reviews.
Submitted code will be checked for similarity with other submissions using the MOSS system. MOSS has been effective in the past at finding similarities and it is not fooled by name changes or reordering of code blocks. Courseworks are INDIVIDUAL, and we expect everyone to turn in their sole, independent work.
Extensions: Please refer to the "Assessment" page on Learn for information on extensions.
If you have questions, you should consult the lecture slides and recordings. If you have questions about the coursework, please start by checking existing discussions on Piazza. If you can't find the answer to your question, start a new discussion. It is quite possible that other students will have encountered and solved the same problem and will be able to help you. The TA will also monitor Piazza and clarify things as necessary, after allowing time for student discussion to take place.