A compact compiler pipeline built with Flex and Bison that processes a C/C++-like subset language. The compiler performs complete lexical/syntax/semantic analysis, emits intermediate representation (IR) as quadruples, applies basic optimizations, and generates 16-bit 8086 assembly code compatible with MASM/TASM and emulators like emu8086.
- Complete compilation pipeline: From source code to assembly
- Multi-stage optimization: Copy propagation, algebraic simplification, dead code elimination
- 8086 assembly generation: MASM/TASM-style output for DOS emulators
- Educational focus: Clean, readable implementation for learning compiler construction
The compiler processes input through several stages:
- Lexical Analysis (
lexical.l
) - Tokenization for identifiers, literals, operators, keywords, and stream operators - Syntax Analysis (
syntax.y
) - Bison grammar parsing with semantic context building - Semantic Analysis (
headers/ts.h
) - Symbol table management and type checking - IR Generation (
headers/quads.h
) - Quadruple intermediate representation - Optimization (
headers/quads.h
) - Multiple optimization passes - Assembly Generation (
headers/assembly.h
) - 8086 code generation
.
├── headers/
│ ├── ts.h # Symbol table + semantic helpers
│ ├── quads.h # Quadruple IR + optimizer passes
│ ├── stack.h # String/int stacks for parser actions
│ └── assembly.h # 8086 assembly backend
├── lexical.l # Flex lexer specification
├── syntax.y # Bison parser grammar
├── input/
│ └── code.cpp # Source program to compile
└── (generated files)
├── lex.yy.c
├── syntax.tab.c / syntax.tab.h
├── negative.asm # Final assembly output
└── temp files: data.txt, code.txt
int
,float
char*
/std::string
(handled as String internally)- Basic template-style tokens (
IDF::IDF
) - Arrays with bounds checking
int a;
float b = 2.5;
int arr[n];
const int c = 3;
- Arithmetic:
+
,-
,*
,/
- Parentheses for precedence
- Integer/float/string literals
- Array indexing
- Assignment operators
if (condition) { ... }
for (init; condition; increment) { ... }
// Post-increment in statements
variable++;
std::cout << expression << std::endl;
Note: Recognized for parsing but not fully implemented in backend
namespace::function(args)
object.method(args)
object.attribute
object.method<T>(args)
The compiler uses quadruples stored in the quad[]
array:
typedef struct qdr {
char oper[100]; // Operation: "=", "+", "-", "*", "/", "BR", "BLE", etc.
char op1[100]; // First operand
char op2[100]; // Second operand
char res[100]; // Result
} qdr;
("=", "x", "", "y") // y = x
("+", "a", "b", "t1") // t1 = a + b
("BR", "12", "", "") // Unconditional jump to quad 12
("BLE", "20", "x", "y") // if x <= y then jump to quad 20
("BOUNDS", "0", "n-1", "") // Array bounds declaration
("ADEC", "arr", "", "") // Array declaration
The optimizer performs multiple passes until convergence or maximum iterations:
- Copy Propagation (
propagationDeCopie
) - Replace variables with their copied values - Algebraic Simplification (
simplificationAlgebrique
) - Simplify arithmetic expressions - Redundant Expression Elimination (
eliminationDeExpRedondante
) - Remove duplicate calculations - Dead Code Elimination (
eliminationDeCodeInutile
) - Remove unreachable code
Branch targets are automatically updated after transformations.
Generates segmented MASM/TASM-style assembly:
- Int:
DW
(16-bit word) - Float:
DD
(32-bit doubleword) - String:
DB 100 DUP (?)
(100-byte buffer) - Arrays: Calculated from
BOUNDS
+ADEC
quadruples
DATA SEGMENT
; Variable declarations
DATA ENDS
PILE SEGMENT STACK
; Stack space
PILE ENDS
CODE SEGMENT
MAIN:
; Program logic
MOV AX, 4C00h
INT 21h ; DOS termination
CODE ENDS
BR
→JMP
BLE
,BGE
,BG
,BL
,BE
,BNE
,BZ
→Jxx
conditionals
- Flex (lexical analyzer generator)
- Bison (parser generator)
- GCC (C compiler)
flex lexical.l
bison -d syntax.y
gcc lex.yy.c syntax.tab.c -o prog.exe
-
Prepare source code: Place your program in
./input/code.cpp
-
Run compiler:
./prog.exe
Or execute
run.bat
-
Output:
- Console: Symbol tables, IR quadruples, and optimization results
- File: Final assembly code written to
negative.asm
, you can run it using emu8086
# Build the compiler
flex lexical.l && bison -d syntax.y && gcc lex.yy.c syntax.tab.c -o compiler
# Write your source code to input/code.cpp
echo "int main() { int x = 5; return x; }" > input/code.cpp
# Compile
./compiler
# View generated assembly
cat negative.asm