This repository contains two major components of system software:
- A Compiler for microC (a subset of the C language) that translates source code into machine-independent Three-Address Code (TAC).
- A Single-Pass Assembler for the SIC/XE (Simplified Instructional Computer / Extra Equipment) architecture that converts assembly language into an executable object file.
This project demonstrates core concepts in language translation, including lexical analysis, parsing, semantic actions, symbol table management, intermediate code generation, and assembler design.
- Language: Compiles a significant subset of C, including variables, pointers, arrays, functions, and control structures.
- Output: Generates machine-independent Three-Address Code (TAC), a common Intermediate Representation (IR).
- Technology: Built using Flex for lexical analysis and Bison for parsing.
- Symbol Table: Implements a multi-scope symbol table to manage variables, functions, and their attributes.
- Control Flow: Uses the backpatching technique to efficiently generate code for control flow statements (`if-else`, `while`, `do-while`) in a single pass.
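As a rough illustration of a multi-scope symbol table, the sketch below keeps a stack of per-scope maps and resolves a name from the innermost scope outward. The names `SymbolInfo` and `ScopedSymbolTable`, the stored fields, and the byte sizes are hypothetical; they are not taken from `compiler.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-symbol attributes; the project's actual fields may differ.
struct SymbolInfo {
    std::string type;  // e.g. "int", "int*"
    int size;          // size in bytes (illustrative values only)
    int offset;        // offset of the symbol within its scope
};

// A stack of scopes: the innermost scope is at the back of the vector.
class ScopedSymbolTable {
public:
    ScopedSymbolTable() { enterScope(); }  // scope 0 is the global scope

    void enterScope() {
        scopes_.push_back(std::map<std::string, SymbolInfo>());
        nextOffset_.push_back(0);
    }

    void exitScope() {
        assert(scopes_.size() > 1 && "cannot exit the global scope");
        scopes_.pop_back();
        nextOffset_.pop_back();
    }

    // Declares a name in the current scope; returns false on redeclaration.
    bool declare(const std::string& name, const std::string& type, int size) {
        std::map<std::string, SymbolInfo>& current = scopes_.back();
        if (current.count(name) != 0) return false;
        SymbolInfo info = {type, size, nextOffset_.back()};
        current[name] = info;
        nextOffset_.back() += size;
        return true;
    }

    // Searches from the innermost scope outward; returns nullptr if undeclared.
    const SymbolInfo* lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

    std::size_t depth() const { return scopes_.size(); }

private:
    std::vector<std::map<std::string, SymbolInfo>> scopes_;
    std::vector<int> nextOffset_;  // next free offset in each open scope
};
```

Shadowing follows from the lookup order: after `enterScope()`, a newly declared `a` hides the global `a`, and `exitScope()` makes the global one visible again.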
- Architecture: Targets the educational SIC/XE machine architecture.
- Design: Implemented as an efficient Single-Pass Assembler.
- Forward Reference Handling: Solves the forward reference problem without a second pass by using a generate-and-patch technique.
- Output: Produces a standard HTE (Header-Text-End) object file, ready to be loaded and executed by a SIC/XE loader.
```
.
├── compiler/
│   ├── lexer.l              # Flex file for the Lexical Analyzer
│   ├── parser.y             # Bison file for the Parser
│   ├── compiler.h           # Header file for data structures and functions
│   ├── compiler.cpp         # C++ implementation of backend logic
│   └── Makefile             # Build script for the compiler
├── assembler/
│   ├── 1_Pass_Assembler.cpp # Main source code for the single-pass assembler
│   ├── input.txt            # Example SIC/XE assembly source file
│   └── opcodes.txt          # Table of SIC/XE opcodes and their hex values
└── README.md
```
- Languages: C++, Flex, Bison
- Compiler/Tools: g++, Make
- Core Concepts: Compilers, Assemblers, Linkers & Loaders, Data Structures
The compiler translates a source file written in microC into an intermediate representation known as Three-Address Code (TAC). This TAC can then be used by a backend to generate machine code for a specific architecture.
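For a concrete picture of TAC: each instruction has at most one operator on its right-hand side, so a statement such as `a = a + i;` (from the example program) splits into a computation into a temporary followed by a copy. The sketch below stores each instruction as a quadruple (operator, two arguments, result); `Quad`, `TacBuilder`, `newTemp()`, and this `emit()` signature are illustrative assumptions, not the project's actual interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One three-address instruction: result = arg1 op arg2.
// Field names are illustrative, not taken from compiler.h.
struct Quad {
    std::string op;      // "+", "-", "=", ...
    std::string arg1;
    std::string arg2;    // empty for plain copies such as "a = t1"
    std::string result;
};

class TacBuilder {
public:
    // Appends a quad and returns its index (its position in the TAC listing).
    int emit(const std::string& op, const std::string& arg1,
             const std::string& arg2, const std::string& result) {
        Quad q = {op, arg1, arg2, result};
        quads_.push_back(q);
        return static_cast<int>(quads_.size()) - 1;
    }

    // Returns a fresh compiler temporary: t1, t2, ...
    std::string newTemp() { return "t" + std::to_string(++tempCount_); }

    // Renders quad i in textual TAC form.
    std::string toString(int i) const {
        assert(i >= 0 && i < static_cast<int>(quads_.size()));
        const Quad& q = quads_[i];
        if (q.op == "=") return q.result + " = " + q.arg1;
        return q.result + " = " + q.arg1 + " " + q.op + " " + q.arg2;
    }

    int size() const { return static_cast<int>(quads_.size()); }

private:
    std::vector<Quad> quads_;
    int tempCount_ = 0;
};

// Lowers the assignment "a = a + i" into two quads.
void lowerAccumulate(TacBuilder& tac) {
    std::string t = tac.newTemp();  // t1
    tac.emit("+", "a", "i", t);     // t1 = a + i
    tac.emit("=", t, "", "a");      // a = t1
}
```

Keeping quads in an indexed vector is what makes later jump targets expressible as plain integers.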
- Lexical Analysis (Lexing):
  The Flex file (`lexer.l`) defines rules to scan the source code and break it down into a stream of tokens (keywords, identifiers, operators).
- Syntax Analysis (Parsing):
  The Bison file (`parser.y`) defines the context-free grammar of microC. It takes the token stream and builds a parse tree, verifying syntactic correctness.
- Semantic Actions & Code Generation:
  Embedded within grammar rules are semantic actions that:
  - Manage the Symbol Table (track variables, types, sizes, and scopes).
  - Perform Type Checking (ensure valid operations).
  - Generate TAC using the `emit()` function to create quads.
  - Backpatch control flow by resolving jump targets later.
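The backpatching step can be sketched for a `while` loop: the conditional jump and the exit jump are emitted with empty targets, their indices are kept on lists, and each target is filled in once the corresponding code location is known. This is a hypothetical, simplified model (the loop body arrives as already-lowered strings and the condition is fixed to `i > 0`); `Quad`, `PatchList`, `makeList`, `lowerWhile`, and this `emit()` signature are illustrative names, not the project's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A quad kept as text plus an optional jump target.
// target == -1 means "not a jump" or "jump whose target is not known yet".
struct Quad {
    std::string text;
    int target;
};

typedef std::vector<int> PatchList;  // indices of jump quads awaiting a target

std::vector<Quad> quads;             // all code emitted so far, in order

int nextQuad() { return static_cast<int>(quads.size()); }

// Appends a quad and returns its index.
int emit(const std::string& text, int target = -1) {
    Quad q = {text, target};
    quads.push_back(q);
    return nextQuad() - 1;
}

PatchList makeList(int quadIndex) { return PatchList{quadIndex}; }

// Fills in the target of every jump on the list once that target is known.
void backpatch(const PatchList& list, int target) {
    for (int i : list) {
        assert(quads[i].target == -1 && "jump already patched");
        quads[i].target = target;
    }
}

// Lowers "while (i > 0) { body }" in a single pass and returns the loop's
// exit list: jumps that must reach whatever statement follows the loop.
PatchList lowerWhile(const std::vector<std::string>& body) {
    int testStart = nextQuad();                            // where the condition begins
    PatchList trueList = makeList(emit("if i > 0 goto"));  // target (body) unknown yet
    PatchList falseList = makeList(emit("goto"));          // target (loop exit) unknown yet
    int bodyStart = nextQuad();
    for (const std::string& stmt : body) emit(stmt);
    emit("goto", testStart);                               // backward jump: target already known
    backpatch(trueList, bodyStart);                        // the body's start is now known
    return falseList;
}
```

The exit list is returned unpatched because the quad after the loop does not exist yet; the caller patches it with `backpatch(exits, nextQuad())` once it gets there. Short-circuit `&&` and `||` would additionally need a list-merge operation, which this sketch leaves out.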
Install flex, bison, g++, and make.
```bash
cd compiler
make                   # builds the compiler
./compiler < test.mc
```

Example Input: `test.mc`

```c
int main() {
    int i = 10;
    int a = 0;
    while (i > 0) {
        a = a + i;
        i = i - 1;
    }
    return a;
}
```

Output: a file containing the generated TAC and the Symbol Table.
The assembler translates a SIC/XE assembly language program into a machine-readable object file.
Its single-pass design improves efficiency by avoiding a second read of the source code.
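Reading each line only once still requires a location counter (LOCCTR) that advances by the size of every statement as it is read. A minimal sketch, assuming the standard SIC/XE sizes (format 1, 2, 3 and `+`-prefixed format 4 instructions, plus the `WORD`, `RESW`, `RESB` and `BYTE` directives); `statementLength` is a hypothetical helper, and any mnemonic it does not recognize is assumed to be format 3.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>
#include <string>

// Hypothetical helper: number of bytes a statement occupies, so LOCCTR can
// advance as each line is read. Sizes follow the standard SIC/XE formats.
int statementLength(const std::string& mnemonic, const std::string& operand) {
    static const std::set<std::string> format1 = {"FIX", "FLOAT", "HIO", "NORM", "SIO", "TIO"};
    static const std::set<std::string> format2 = {"ADDR", "CLEAR", "COMPR", "DIVR", "MULR",
                                                  "RMO", "SHIFTL", "SHIFTR", "SUBR", "SVC", "TIXR"};
    if (!mnemonic.empty() && mnemonic[0] == '+') return 4;  // format 4 (extended)
    if (format1.count(mnemonic)) return 1;
    if (format2.count(mnemonic)) return 2;
    if (mnemonic == "WORD") return 3;
    if (mnemonic == "RESW") return 3 * std::atoi(operand.c_str());
    if (mnemonic == "RESB") return std::atoi(operand.c_str());
    if (mnemonic == "BYTE") {
        // C'EOF' -> one byte per character; X'F1' -> one byte per two hex digits.
        assert(operand.size() >= 3 && operand[1] == '\'');
        int inner = static_cast<int>(operand.size()) - 3;  // drop the prefix and both quotes
        return operand[0] == 'C' ? inner : (inner + 1) / 2;
    }
    if (mnemonic == "START" || mnemonic == "END" || mnemonic == "BASE") return 0;
    return 3;  // assumption: every remaining mnemonic is a format 3 instruction
}
```

For example, `LDA ALPHA`, `+JSUB RDREC` and `RESW 2` advance LOCCTR by 3, 4 and 6 bytes respectively.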
- Read and Generate
  - The assembler processes the source code line by line.
  - Object code is generated directly during this single pass.
- Handling Forward References (when an instruction refers to a label that is not yet defined):
  - A placeholder address (e.g., `0000`) is written into the object code.
  - The assembler records the label and the file locations of these placeholders in an `unknown_list`.
- Resolve and Patch
  - Once the label is defined, its address is added to the symbol table.
  - Using file I/O (`seekp`), the assembler revisits all placeholder positions in the object file and overwrites them with the correct address.
- Completion
  - When the `END` directive is encountered:
    - All unresolved references are patched.
    - The final program length is updated in the Header record.
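The generate-and-patch steps above can be modelled with an output stream: write `0000`, remember the stream position with `tellp()`, and later `seekp()` back to overwrite it. This sketch is a simplification: it writes each instruction as a 2-digit opcode followed by a 4-digit address (matching the `0000` placeholder) and ignores the SIC/XE `nixbpe` flags and format 4. `ForwardRefPatcher` is a hypothetical name, and its `unknown_` map only mirrors the role of `unknown_list`.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <sstream>  // std::ostream, std::ostringstream
#include <string>
#include <vector>

// Simplified generate-and-patch model: a placeholder is written for an
// undefined label, its stream position is remembered, and that position is
// overwritten via seekp() once the label is defined.
class ForwardRefPatcher {
public:
    explicit ForwardRefPatcher(std::ostream& out) : out_(out) {}

    // Writes object code: opcode (2 hex digits) + address (4 hex digits).
    void emitInstruction(const std::string& opcodeHex, const std::string& label) {
        out_ << opcodeHex;
        auto known = symtab_.find(label);
        if (known != symtab_.end()) {
            out_ << toHex4(known->second);            // backward reference: address known
        } else {
            unknown_[label].push_back(out_.tellp());  // remember where the hole is
            out_ << "0000";                           // placeholder address
        }
    }

    // Records a label's address and patches every placeholder waiting on it.
    void defineLabel(const std::string& label, int address) {
        symtab_[label] = address;
        auto pending = unknown_.find(label);
        if (pending == unknown_.end()) return;
        std::streampos end = out_.tellp();
        for (std::streampos pos : pending->second) {
            out_.seekp(pos);
            out_ << toHex4(address);  // overwrite the 4-character placeholder
        }
        out_.seekp(end);  // resume appending after the last written character
        unknown_.erase(pending);
    }

    bool hasUnresolved() const { return !unknown_.empty(); }

private:
    static std::string toHex4(int value) {
        assert(value >= 0 && value <= 0xFFFF);
        char buf[5];
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(value));
        return buf;
    }

    std::ostream& out_;
    std::map<std::string, int> symtab_;
    std::map<std::string, std::vector<std::streampos>> unknown_;  // label -> placeholder positions
};
```

With an `std::fstream` opened on the object file instead of an `std::ostringstream`, the same `tellp()`/`seekp()` calls patch the file on disk.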
```bash
cd assembler
g++ -std=c++11 -o assembler 1_Pass_Assembler.cpp
```

Run

```bash
./assembler
```

Input Files
- `input.txt` → The SIC/XE assembly source file (example provided).
- `opcodes.txt` → Mapping of SIC/XE mnemonics to their hexadecimal opcodes.

Output Files
- `output.o` → Final object file in HTE (Header-Text-End) format.
- `symtab.txt` → Generated Symbol Table showing all labels and their addresses.
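To show what the HTE format looks like, the sketch below formats the three record types in the fixed-column layout commonly used for SIC object programs. Header: `H`, a 6-character program name, a 6-digit start address, and a 6-digit program length. Text: `T`, a start address, a 2-digit byte count, and up to 30 bytes of object code. End: `E` plus the address of the first executable instruction. Whether `output.o` uses exactly this layout (or prints `^` separators) is an assumption, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Header record: columns 2-7 name, 8-13 start address, 14-19 length (hex).
std::string headerRecord(const std::string& name, unsigned start, unsigned length) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "H%-6.6s%06X%06X", name.c_str(), start, length);
    return buf;
}

// Text record: start address, byte count, then the object code itself.
std::string textRecord(unsigned start, const std::string& objectCodeHex) {
    // Two hex digits per byte; one text record holds at most 30 bytes.
    assert(objectCodeHex.size() % 2 == 0 && objectCodeHex.size() <= 60);
    char buf[16];
    std::snprintf(buf, sizeof buf, "T%06X%02X", start,
                  static_cast<unsigned>(objectCodeHex.size() / 2));
    return buf + objectCodeHex;
}

// End record: address of the first executable instruction.
std::string endRecord(unsigned firstExecutable) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "E%06X", firstExecutable);
    return buf;
}
```

Because the length field sits at fixed columns 14 to 19 of the Header record, it can be overwritten in place when `END` is reached, which is how a single-pass assembler can fill it in last.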