A complete compiler implementation for the Fun programming language, written in Scala 3. This project demonstrates the full compilation pipeline from source code to executable LLVM IR.
This is a functional programming language compiler that translates Fun source code into LLVM Intermediate Representation (IR), which can then be compiled to native machine code using LLVM tools like clang.
- Lexical Analysis: Custom tokenizer using regular expressions
- Parsing: Parser combinator-based recursive descent parser
- Type System: Static type checking with support for
Int,Double, andVoidtypes - CPS Transformation: Continuation-Passing Style transformation for proper compilation
- LLVM Code Generation: Direct compilation to LLVM IR
- Built-in Functions: Support for I/O operations (
print_int,print_char,print_string, etc.)
The Fun language supports:
- Integers (
Int): 32-bit signed integers - Floating-point (
Double): Double-precision floating-point numbers - Void: For functions with no return value
- Character constants: Single characters in single quotes
- String literals: String constants in double quotes
- Function definitions with typed parameters and return types
- Conditional expressions (
if-then-else) - Arithmetic operations:
+,-,*,/,% - Comparison operations:
==,!=,<,<=,>,>= - Function calls with argument passing
- Variable bindings with
valdeclarations - Sequential execution with semicolon (
;) operator
print_int(x: Int): Print an integerprint_char(c: Int): Print a characterprint_string(s: String): Print a stringprint_star(): Print an asteriskprint_space(): Print a spacenew_line(): Print a newlineskip(): No-op function
cflresit2025-kabir505-main/
├── README.md # This file
├── resit/
│ ├── resit.sc # Main compiler implementation
│ ├── fun_parser.sc # Parser combinator implementation
│ ├── fun_tokens.sc # Lexical analysis and tokenization
│ ├── examples/ # Example Fun programs
│ │ ├── fact.fun # Factorial computation
│ │ ├── hanoi.fun # Towers of Hanoi
│ │ ├── mand.fun # Mandelbrot set (basic)
│ │ ├── mand2.fun # Mandelbrot set (with chars)
│ │ └── sqr.fun # Square numbers
│ └── *.ll # Generated LLVM IR files
- Uses regular expressions to define language tokens
- Implements Brzozowski derivatives for efficient tokenization
- Supports keywords, identifiers, numbers, operators, and punctuation
- Parser combinator library for building recursive descent parsers
- Grammar rules for expressions, statements, and declarations
- Handles operator precedence and associativity
- Static type checking with type inference
- Type environment management
- Support for function types and variable bindings
- Converts direct-style expressions to Continuation-Passing Style
- Enables proper compilation of complex control flow
- Handles function calls, conditionals, and sequencing
- Direct translation from CPS intermediate representation to LLVM IR
- Type-aware code generation for integers and floating-point
- Proper handling of function calls and control flow
def fact(n: Int) : Int =
if n == 0 then 1 else n * fact(n - 1);
def facT(n: Int, acc: Int) : Int =
if n == 0 then acc else facT(n - 1, n * acc);
def top() : Void = {
print_int(fact(6));
print_char('\n')
};
top()def hanoi(n: Int, a: Int, b: Int, c: Int) : Void =
if n != 0 then {
hanoi(n - 1, a, c, b);
print_int(a);
print_char('-'); print_char('>');
print_int(b);
print_char('\n');
hanoi(n - 1, c, b, a)
} else skip();
hanoi(4,1,2,3)val Ymin: Double = -1.3;
val Ymax: Double = 1.3;
val Ystep: Double = 0.05;
def m_iter(m: Int, x: Double, y: Double, zr: Double, zi: Double) : Void = {
if Maxiters <= m
then print_star()
else {
if 4.0 <= zi*zi+zr*zr then print_space()
else m_iter(m + 1, x, y, x+zr*zr-zi*zi, 2.0*zr*zi+y)
}
};
y_iter(Ymin)- Scala 3.x or Ammonite (amm)
- LLVM/Clang for compiling generated IR to executable code
-
Compile Fun source to LLVM IR:
amm resit.sc -- fact.fun # or scala-cli resit.sc -- fact.fun -
Compile LLVM IR to executable:
clang fact.ll -o fact
-
Run the executable:
./fact
amm resit.sc -- run fact.funThis will:
- Parse the Fun source code
- Generate LLVM IR
- Compile to native executable
- Execute the program
- Type Inference: Automatic type inference for expressions
- Type Checking: Static type checking with error reporting
- Type Environment: Maintains variable and function type information
The compiler uses Continuation-Passing Style transformation to:
- Convert direct-style function calls to continuation-based calls
- Handle complex control flow properly
- Enable efficient code generation
- Type-aware compilation: Different code generation for
IntvsDouble - Function calls: Proper parameter passing and return value handling
- Control flow: Conditional branches and function returns
- Built-in functions: Integration with C standard library via
printf
- Parse errors: Detailed error messages with location information
- Type errors: Type mismatch detection and reporting
- Runtime errors: Graceful handling of undefined variables/functions
For the factorial function, the compiler generates:
define i32 @fact(i32 %n) {
%tmp_1 = icmp eq i32 %n, 0
br i1 %tmp_1, label %then_5, label %else_6
then_5:
ret i32 1
else_6:
%tmp_2 = sub i32 %n, 1
%tmp_3 = call i32 @fact(i32 %tmp_2)
%tmp_4 = mul i32 %n, %tmp_3
ret i32 %tmp_4
}This compiler demonstrates several advanced compiler techniques:
- Parser combinators for clean, composable parsing
- Regular expression derivatives for efficient lexical analysis
- CPS transformation for proper compilation of functional programs
- Type inference and static type checking
- Direct LLVM code generation without intermediate representations
The implementation is modular and extensible, making it easy to add new language features or optimize the generated code.