This repository contains a C++20 compiler project for the Compiler Systems Practice course. The compiler translates valid ToyC programs from standard input into RISC-V32 assembly on standard output.
ToyC is a small C-like language used by the course assignment. It supports integer variables and constants, functions, blocks, conditionals, loops, break, continue, return, expression evaluation, global declarations, and function calls.
The project now implements the correctness-oriented ToyC compiler pipeline:
- Parse valid ToyC programs into the AST.
- Run minimal semantic checks needed for correct code generation.
- Emit RISC-V32 assembly using stack slots and the basic integer calling convention.
The -opt flag is accepted for judge compatibility; optimization is intentionally limited until correctness is the stable baseline.
Build requirements:
- A C++20 compiler
- CMake
- Flex
- Bison 3.0 or newer
The course judge allows CMake or Make for C++ projects. This repository uses CMake and already includes Flex/Bison integration in CMakeLists.txt.
On macOS, /usr/bin/bison may be version 2.3, which is too old for this project. Install a newer Bison and make CMake find it before configuring:
```sh
brew install bison
cmake -S . -B build -DBISON_EXECUTABLE="$(brew --prefix bison)/bin/bison"
```

To configure and build (on macOS, replace the first command with the configure command above):

```sh
cmake -S . -B build
cmake --build build
```

The expected compiler executable is:

```text
build/compiler
```

The compiler reads ToyC source code from stdin and writes RISC-V32 assembly to stdout:

```sh
build/compiler < input.tc > output.s
```

The judge may pass -opt during performance tests. The compiler accepts this flag. Optimization behavior will be added only after the correct implementation is stable:

```sh
build/compiler -opt < input.tc > output.s
```

The implemented pipeline is:
```text
stdin ToyC source
-> Flex/Bison frontend
-> Abstract Syntax Tree
-> Minimal semantic analysis
-> Simple three-address IR
-> RISC-V32 assembly on stdout
```
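The AST and IR stages can be made concrete with a small example. The minimal sketch below defines an expression hierarchy that uses std::unique_ptr ownership and lowers itself into three-address instructions. The node classes, the lower method, and the t0/t1 temporary naming are illustrative assumptions, not the project's actual types.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST base class: a classic inheritance hierarchy.
struct Expr {
    virtual ~Expr() = default;
    // Appends three-address instructions to `code` and returns the operand
    // (temporary, variable, or literal) that holds this expression's value.
    virtual std::string lower(std::vector<std::string>& code, int& next_temp) const = 0;
};

struct IntLit : Expr {
    int value;
    explicit IntLit(int v) : value(v) {}
    std::string lower(std::vector<std::string>&, int&) const override {
        return std::to_string(value);  // literals need no instructions
    }
};

struct VarRef : Expr {
    std::string name;
    explicit VarRef(std::string n) : name(std::move(n)) {}
    std::string lower(std::vector<std::string>&, int&) const override {
        return name;  // variables are referenced by name in this IR
    }
};

struct Binary : Expr {
    char op;
    std::unique_ptr<Expr> lhs, rhs;  // children are owned by the parent node
    Binary(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    std::string lower(std::vector<std::string>& code, int& next_temp) const override {
        std::string a = lhs->lower(code, next_temp);  // left operand first
        std::string b = rhs->lower(code, next_temp);
        std::string t = "t" + std::to_string(next_temp++);
        code.push_back(t + " = " + a + " " + op + " " + b);
        return t;
    }
};
```

Lowering (a + 2) * b with this sketch produces t0 = a + 2 followed by t1 = t0 * b. In a stack-slot backend, each temporary then gets its own slot.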
Key design choices:
- Flex tokenizes ToyC source.
- Bison parses the grammar and constructs the AST directly.
- The AST uses a classic inheritance hierarchy with std::unique_ptr ownership.
- Semantic analysis records only the facts required for correct code generation on valid ToyC programs.
- Control flow is lowered into a simple three-address IR with labels and branches.
- The initial backend favors correctness over speed: it stores local variables, parameters, and temporaries in stack slots.
- Function calls follow the basic RISC-V integer calling convention: the first eight arguments are passed in a0-a7 and the return value in a0.
- Global constants are folded at compile time; global variables are emitted in the data section.
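A stack-slot backend mainly needs a consistent frame layout. The sketch below makes several assumptions:
- Each slot is 4 bytes and is addressed from s0.
- ra and s0 are saved in the top 8 bytes of the frame.
- Frames are rounded up to the 16-byte stack alignment that the RISC-V psABI requires.
- Frames are smaller than 2048 bytes, so every offset fits a 12-bit immediate.

align16, frame_size, slot_offset, prologue, and epilogue are hypothetical helpers, not the project's code.

```cpp
#include <string>
#include <vector>

// Rounds a byte count up to the 16-byte stack alignment of the RISC-V psABI.
int align16(int bytes) { return (bytes + 15) / 16 * 16; }

// Hypothetical layout: one 4-byte slot per local, parameter, or temporary,
// plus 8 bytes for the saved ra and s0.
int frame_size(int num_slots) { return align16(num_slots * 4 + 8); }

// Offset of slot i from s0 (which holds the caller's sp), for instructions
// such as `lw t0, -12(s0)`. ra sits at s0-4 and the old s0 at s0-8.
int slot_offset(int i) { return -12 - 4 * i; }

// Assumes the frame is smaller than 2048 bytes so every offset fits addi/sw.
std::vector<std::string> prologue(const std::string& name, int num_slots) {
    int size = frame_size(num_slots);
    return {
        name + ":",
        "  addi sp, sp, -" + std::to_string(size),
        "  sw ra, " + std::to_string(size - 4) + "(sp)",
        "  sw s0, " + std::to_string(size - 8) + "(sp)",
        "  addi s0, sp, " + std::to_string(size),
    };
}

// Restores ra and s0, releases the frame, and returns to the caller.
std::vector<std::string> epilogue(int num_slots) {
    int size = frame_size(num_slots);
    return {
        "  lw ra, " + std::to_string(size - 4) + "(sp)",
        "  lw s0, " + std::to_string(size - 8) + "(sp)",
        "  addi sp, sp, " + std::to_string(size),
        "  ret",
    };
}
```

For three slots, frame_size returns 32. The prologue therefore emits addi sp, sp, -32 and stores ra at 28(sp).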
Project documentation:
- Assignment: original course requirements.
- Design: compiler architecture and implementation decisions.
- Roadmap: staged implementation plan for solo development.
- Test Plan: local correctness strategy and coverage order.
- Development Log: record of development problems and resolutions.
- Context: project glossary.
- ADRs: concise records of important architecture decisions.
Local correctness tests compare program exit codes. A ToyC sample passes when the program built from this compiler's assembly exits with the expected code. It also passes if it exits with the same code as the equivalent C program compiled by gcc.
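Exit-code comparison has one catch when the generated program runs as an ordinary process on a POSIX system: the parent sees only the low 8 bits of the status. A ToyC main that returns 300 is therefore observed as 44, and one that returns -1 as 255. The sketch below applies that truncation and the pass rule described above. observed_exit_code and sample_passes are hypothetical names.

```cpp
#include <cstdint>
#include <optional>

// POSIX reports only the low 8 bits of a process exit status, so this maps a
// ToyC main's int result to the code a test script would actually observe.
int observed_exit_code(std::int64_t main_result) {
    return static_cast<int>(main_result & 0xFF);
}

// Pass rule: the RISC-V run must match the recorded expected code, or the code
// of the same program compiled by gcc. Either reference may be missing.
bool sample_passes(int actual, std::optional<int> expected,
                   std::optional<int> gcc_reference) {
    return (expected && actual == *expected) ||
           (gcc_reference && actual == *gcc_reference);
}
```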
Initial coverage focuses on:
- Integer expressions and return.
- Local variables and assignment.
- Branches and loops.
- Short-circuit logic.
- Function calls and recursion.
- Global variables and constants.
- Integrated ToyC programs.
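Short-circuit logic warrants dedicated tests because the right operand must not be evaluated when the left operand already decides the result. The minimal lowering sketch below assumes a textual three-address IR with if ... goto branches and labels. IrBuilder, RhsLowering, lower_logical, and the instruction syntax are hypothetical.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical builder for a textual three-address IR.
struct IrBuilder {
    std::vector<std::string> code;
    int next_temp = 0;
    int next_label = 0;
    std::string new_temp() { return "t" + std::to_string(next_temp++); }
    std::string new_label() { return "L" + std::to_string(next_label++); }
};

// Emits the right operand's instructions and returns the operand holding its value.
using RhsLowering = std::function<std::string(IrBuilder&)>;

// Lowers `lhs && rhs` (is_and) or `lhs || rhs` (!is_and) to a 0/1 result.
// The right operand is lowered between the branch and the end label, so its
// code (including any calls) runs only when lhs does not decide the result.
std::string lower_logical(IrBuilder& b, bool is_and, const std::string& lhs,
                          const RhsLowering& lower_rhs) {
    std::string result = b.new_temp();
    std::string end = b.new_label();
    // && defaults to 0 and skips rhs when lhs == 0;
    // || defaults to 1 and skips rhs when lhs != 0.
    b.code.push_back(result + (is_and ? " = 0" : " = 1"));
    b.code.push_back("if " + lhs + (is_and ? " == 0" : " != 0") + " goto " + end);
    std::string rhs = lower_rhs(b);
    b.code.push_back(result + " = " + rhs + " != 0");
    b.code.push_back(end + ":");
    return result;
}
```

Passing the right operand as a callback keeps its instructions between the branch and the label. As a result, a call such as f() in x && f() runs only when x is nonzero.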
Development proceeds in small vertical slices. Each slice should leave the compiler buildable and add one end-to-end behavior before expanding the next feature area.
Problems discovered during implementation should be recorded in docs/development-log.md with the failing case, root cause, resolution, and verification result.