Skip to content

My MiniDecaf compiler. Project for THU compiler course 2020

Notifications You must be signed in to change notification settings

yangdong02/mydecaf

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MyDecaf: A Simple MiniDecaf Compiler

MyDecaf is a compiler written in Rust from scratch. It can translate a programming language similar to C, called minidecaf, into RISCV assembly (runnable in QEMU).

The development process follows the tutorial from Tsinghua University 2020 Compiler course. The parser (src/mparser.rs) is generated from https://github.com/MashPlant/lalr1. The compiler passed all the testcases in the course material.

Testing

See https://github.com/decaf-lang/minidecaf-tests

Features

For more details about language specification, see https://decaf-lang.github.io/minidecaf-tutorial/

  • Unary, binary arithmetic and logical expressions
  • Local & Global variables
  • Scope, block statements
  • Assignment, if-else, for, while, do-while, break, continue
  • Functions, function calls
  • Arrays, pointers, arrays of pointers (but no pointer to arrays)
  • Lvalue & rvalue inference
  • Type checking

Besides the language specification, the compiler additionally implemented the optimization to evaluate constant expressions at compile time.

For example, it supports the following quicksort testing code:

int qsort(int *a, int l, int r) {
    int i = l;
    int j = r;
    int p = a[(l+r)/2];
    int flag = 1;
    while (i <= j) {
        while (a[i] < p) i = i + 1;
        while (a[j] > p) j = j - 1;
        if (i > j) break;
        int u = a[i]; a[i] = a[j]; a[j] = u;
        i = i + 1;
        j = j - 1;
    }
    if (i < r) qsort(a, i, r);
    if (j > l) qsort(a, l, j);
}

int rand(int *state) {
    *state = *state * 5000087 + 198250529;
    return *state % 1000;
}

int initArr(int n, int *a) {
    int state = 474230941;
    int i = 0;
    while (i < n) {
        a[i] = rand(&state);
        i = i + 1;
    }
}

int isSorted(int n, int *a) {
    int i = 0;
    while (i < n-1) {
        if ( *(a+i) > *(a+i+1) )
            return 0;
        i = i + 1;
    }
    return 1;
}

int n = 1000000;
int a[1000000];

int main() {
    int* ap = (int*) a;
    initArr(n, ap);
    int sorted_before = isSorted(n, ap);
    qsort(ap, 0, n-1);
    int sorted_after = isSorted(n, ap);
    if (!(sorted_before==0 && sorted_after==1))
        return 1;
    return 0;
}

Grammar Specification

program
    : (function | declaration)*

function
    : type Identifier '(' parameter_list ')' (compound_statement | ';')

type
    : 'int'
    | type '*'

parameter_list
    : (type Identifier (',' type Identifier)*)?

compound_statement
    : '{' block_item* '}'

block_item
    : statement
    | declaration

statement
    : 'return' expression ';'
    | expression? ';'
    | 'if' '(' expression ')' statement ('else' statement)?
    | compound_statement
    | 'for' '(' expression? ';' expression? ';' expression? ')' statement
    | 'for' '(' declaration expression? ';' expression? ')' statement
    | 'while' '(' expression ')' statement
    | 'do' statement 'while' '(' expression ')' ';'
    | 'break' ';'
    | 'continue' ';'

declaration
    : type Identifier ('[' Integer ']')* ('=' expression)? ';'

expression_list
    : (expression (',' expression)*)?

expression
    : assignment

assignment
    : conditional
    | unary '=' expression

conditional
    : logical_or
    | logical_or '?' expression ':' conditional

logical_or
    : logical_and
    | logical_or '||' logical_and

logical_and
    : equality
    | logical_and '&&' equality

equality
    : relational
    | equality ('=='|'!=') relational

relational
    : additive
    | relational ('<'|'>'|'<='|'>=') additive

additive
    : multiplicative
    | additive ('+'|'-') multiplicative

multiplicative
    : unary
    | multiplicative ('*'|'/'|'%') unary

unary
    : postfix
    | ('-'|'~'|'!'|'&'|'*') unary
    | '(' type ')' unary

postfix
    : primary
    | Identifier '(' expression_list ')'
    | postfix '[' expression ']'

primary
    : Integer
    | '(' expression ')'
    | Identifier

lvalue rule & type checking rule: See tutorial.

Setting up the environment

As environment configuration is not mentioned in the tutorial, and it is tedious and can be time-consuming, I will briefly summarize the steps below. I work on a Google Cloud server with 8-core CPU, 32 GB Memory, 100 GB SSD, Ubuntu 20.04.

Rust

Follow the guide from https://www.rust-lang.org

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

RISCV GNU Toolchain

Follow the instructions from https://github.com/riscv/riscv-gnu-toolchain . The following sets up the environment ( download source code & compile) on my Google Cloud server in ~1h. Make sure your internet connection is fast!!! ( preferably >=10MiB/s)

git clone https://github.com/riscv/riscv-gnu-toolchain
sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev
sudo mkdir -p /opt/risc
cd riscv/riscv-gnu-toolchain/
sudo ./configure --prefix=/opt/riscv --with-arch=rv32im --with-abi=ilp32
sudo make -j8

qemu-riscv32

The following worked for the Google Cloud server:

sudo apt install qemu-user

If you cannot install qemu, you can modify check.sh and use riscv32-unknown-elf-run compiled previously in RISCV GNU toolchain instead. However, it will be extremely slow in step 12, when dealing with large global arrays and lots of instructions!!!

Compilation, Running and debugging

Compile

cargo run <source_file> > <assembly_output>
riscv32-unknown-elf-gcc <assembly_output> -o <binary_output>

Run

qemu-riscv32 <binary_output>
echo $?

Debug

In one terminal:

qemu-riscv32 <binary_output> -p <port>

In another terminal:

riscv32-unknown-elf-gdb <binary_output>
(gdb) target remote localhost:<port>

An implementation pitfall in step12: assembly for global array int a[1048576] should be like the following. This issue together with the inefficient riscv32-unknown-elf-run wasted me ~5h more on step 12 and made me bought the Google cloud server and reconfigured the environment :(

  .globl a
  .align 2
a:
  .zero 4194304
  .text

About

My MiniDecaf compiler. Project for THU compiler course 2020

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages