I plan to build a web playground that will allow access to the compiler in the future. For now, the only way to use the compiler is to run it natively.
- Lexical analysis (Reading literals, keywords, operators, etc) {done}
- Parsing (Reading the logical structure of the code) {done}
- Semantic analysis (Type checking, variable validation, and the like) {in progress}
- Codegen (Creating the IR, linking the object files, creating a binary)
The goal of this project is to create a simple yet powerful compiler that takes a file written in Micro-C and outputs an executable file. I chose this subset of C in order to focus on core features such as loops and functions, rather than implementing every C language feature, so that the project stays achievable.
I have been programming for a few years now, and compilers always felt like an elusive black box of magic. I was extremely curious about how they work, as going from point A to B seemed so far beyond me. I stand by the belief that anyone can achieve nearly anything so long as they work hard at it, so I began doing research. I would like to give special thanks to Immo Landwerth on YouTube: watching his livestream series, in which he writes Minsk, gave me the inspiration to believe I could write my own compiler. I first wrote simpler projects, like the .bf interpreter and .bf compiler on my GitHub. Then, after doing more research, I decided to start writing this compiler. I chose Rust because I really enjoy the language and wanted more experience with it.
Compilers can be broken up into two main phases: the frontend and the backend. The frontend deals with high-level language concepts, such as the rules of the language. It is implemented from scratch, as its functionality is highly specific to this language. The frontend first tokenizes the source text, then represents the program as a tree and performs syntactic validation. The program is then lowered into a lower-level tree form, the mid-level intermediary representation or MLIR. This form is used to perform semantic analysis on the program, both to verify its integrity and to process it further. After this, the program is briefly represented as a graph in order to perform control flow analysis. If the program makes it to this point without being rejected, it is passed to the backend.

The backend of the compiler works at a much lower level and deals with tasks such as optimizing the binary output. There are many frameworks that make the backend much easier on the developer. The Rust compiler uses LLVM. The one I am using is Cranelift, a compiler backend framework designed for the Wasmtime runtime. I initially learned about it while studying the Saltwater compiler project. I chose it as my backend framework because LLVM is extremely complex and overkill for this project, while Cranelift is much easier to pick up and still provides the features this project needs. Once the backend has finished its work, it emits a binary executable that can be run by the computer.
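To make the first frontend step more concrete, here is a minimal sketch of what tokenizing a C-like source string can look like. It is not the lexer used in this project: the `Token` enum, the `KEYWORDS` list, and the `tokenize` function are hypothetical names chosen for illustration, and the set of supported symbols is deliberately tiny.

```rust
/// Hypothetical token kinds for a tiny C-like language (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Int(i64),
    Ident(String),
    Keyword(String),
    Symbol(char),
}

/// Assumed keyword set; a real Micro-C lexer would likely have more.
const KEYWORDS: [&str; 5] = ["int", "return", "if", "else", "while"];

/// Turns source text into a flat list of tokens, or an error message
/// describing the first character that could not be tokenized.
pub fn tokenize(source: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = source.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace only separates tokens; skip it.
            chars.next();
        } else if c.is_ascii_digit() {
            // Integer literal: consume consecutive digits.
            let mut text = String::new();
            while let Some(&d) = chars.peek() {
                if !d.is_ascii_digit() {
                    break;
                }
                text.push(d);
                chars.next();
            }
            let value = text
                .parse::<i64>()
                .map_err(|e| format!("bad integer literal {}: {}", text, e))?;
            tokens.push(Token::Int(value));
        } else if c.is_ascii_alphabetic() || c == '_' {
            // Identifier or keyword: consume letters, digits, underscores.
            let mut text = String::new();
            while let Some(&d) = chars.peek() {
                if !(d.is_ascii_alphanumeric() || d == '_') {
                    break;
                }
                text.push(d);
                chars.next();
            }
            if KEYWORDS.contains(&text.as_str()) {
                tokens.push(Token::Keyword(text));
            } else {
                tokens.push(Token::Ident(text));
            }
        } else if "+-*/=;(){}<>".contains(c) {
            // Single-character operators and punctuation.
            tokens.push(Token::Symbol(c));
            chars.next();
        } else {
            return Err(format!("unexpected character '{}'", c));
        }
    }

    Ok(tokens)
}
```

Calling `tokenize("int x = 42;")` on this sketch yields a keyword, an identifier, a symbol, an integer literal, and a final symbol. The parser would then consume a list like this to build the syntax tree.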
The compiler uses Rust's built-in testing support, with the #[cfg(test)] annotation for conditional test compilation and #[test] to mark test functions. As per Rust conventions, the tests live in the same file as the items they test. There is an array of unit and integration tests throughout the compiler to verify the integrity of the system. Only code that passes all of these tests is ever pushed to this repository. In src/_c_test_files you will find the integration tests, organized into directories according to the type of test.
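For readers unfamiliar with this layout, here is a small sketch of the convention. The function `is_identifier` is a hypothetical helper invented for this example, not an item from this repository. The point is only to show a `#[cfg(test)]` module sitting next to the code it tests.

```rust
/// Hypothetical helper: returns true if `text` is a valid C-style identifier
/// (a letter or underscore followed by letters, digits, or underscores).
pub fn is_identifier(text: &str) -> bool {
    let mut chars = text.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        // Empty strings and strings starting with any other character.
        _ => false,
    }
}

// This module is only compiled when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn accepts_plain_names() {
        assert!(is_identifier("main"));
        assert!(is_identifier("_tmp1"));
    }

    #[test]
    fn rejects_leading_digit_and_empty() {
        assert!(!is_identifier("1x"));
        assert!(!is_identifier(""));
    }
}
```

Running `cargo test` compiles the `tests` module and executes every function marked `#[test]`. A normal `cargo build` skips the module entirely.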