This project involved the development of an assembler for a specialized assembly language. Its primary aim was to convert human-readable assembly instructions into binary machine code, bridging the gap between high-level programming concepts and low-level execution on computers.
Implemented in ANSI C, this project demonstrates a strong understanding of foundational programming principles. It was part of the 20465 System Programming Laboratory course at The Open University of Israel, studied during the 2021-2022 academic year, and achieved a grade of 98.
- Preprocessing 🧹: The assembler supports preprocessing tasks, including macro expansion and line numbering.
- Syntax Checking ✅: The assembler ensures syntax accuracy, checking for valid opcodes and operands.
- Symbol Table 📚: The assembler generates a symbol table, computing label memory addresses.
- Machine Code Generation 💻: The assembler produces the machine code and data images.
- Output Files 📁: The assembler prints output files such as the machine code file, external data words file, and entry type symbols file.
- Error Handling 🚨: The assembler handles various syntax and semantic errors, providing descriptive error messages, including line numbers and error types with clickable links to the relevant code.
- Dynamic Memory Allocation 🧠: The assembler uses dynamic memory allocation to manage memory efficiently.
- Modular Design 🧩: The assembler is designed with a modular architecture, with each module responsible for a specific task.
- Coding Standards 📏: The assembler adheres to the project's coding standards, including naming conventions, indentation, and documentation.
- Testing 🧪: The assembler is thoroughly tested, with a test suite that covers all possible scenarios, including valgrind memory leak checks with no errors.
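To make the symbol table and dynamic memory allocation items more concrete, here is a minimal ANSI C sketch of a linked-list symbol table. It is not taken from the project's source: the type `symbol` and the functions `symbol_find`, `symbol_add`, and `symbol_free_all` are hypothetical names chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol record: a label name and the address it resolves to. */
typedef struct symbol {
    char *name;
    unsigned int address;
    struct symbol *next;
} symbol;

/* Look up a label; returns NULL when it is not in the table. */
symbol *symbol_find(symbol *head, const char *name)
{
    for (; head != NULL; head = head->next) {
        if (strcmp(head->name, name) == 0) {
            return head;
        }
    }
    return NULL;
}

/* Prepend a new label. Returns 1 on success, 0 on a duplicate name
   or an allocation failure. */
int symbol_add(symbol **head, const char *name, unsigned int address)
{
    symbol *node;

    if (symbol_find(*head, name) != NULL) {
        return 0;
    }
    node = (symbol *)malloc(sizeof(symbol));
    if (node == NULL) {
        return 0;
    }
    node->name = (char *)malloc(strlen(name) + 1);
    if (node->name == NULL) {
        free(node);
        return 0;
    }
    strcpy(node->name, name);
    node->address = address;
    node->next = *head;
    *head = node;
    return 1;
}

/* Release every node and its name so a leak checker finds nothing left over. */
void symbol_free_all(symbol *head)
{
    symbol *next;

    while (head != NULL) {
        next = head->next;
        free(head->name);
        free(head);
        head = next;
    }
}
```

Rejecting duplicates inside `symbol_add` keeps the "label defined twice" check in one place, and pairing every `malloc` with a `free` in `symbol_free_all` is what makes a clean valgrind run possible.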
The assembler now includes a new GUI that lets users assemble assembly code with a few clicks. The GUI is built with Gtk+. It is written in C++ but integrates with the assembler's C codebase using extern "C". This allows the assembler's main function to get services from the GUI, such as the input file path and output directory path, without having to change the assembler's codebase.
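The linkage pattern described here can be sketched from the C side. The function names `gui_get_input_path`, `gui_get_output_dir`, and `describe_job` are hypothetical, and the stand-in definitions exist only so the sketch builds on its own; in the arrangement described above they would be implemented in the C++ GUI.

```c
#include <stdio.h>

/* The guard lets the same declarations be included from both C and C++:
   a C++ compiler sees extern "C" and gives the functions C linkage
   (no name mangling), while a C compiler skips the guard entirely. */
#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical services the GUI could offer to the assembler. */
const char *gui_get_input_path(void);
const char *gui_get_output_dir(void);

#ifdef __cplusplus
}
#endif

/* Stand-in definitions so this sketch is self-contained. They are
   placeholders for code that would live in the C++ GUI source. */
const char *gui_get_input_path(void) { return "input_example"; }
const char *gui_get_output_dir(void) { return "."; }

/* C side: ask the GUI for both paths and format them into a buffer
   supplied by the caller. Returns the number of characters written. */
int describe_job(char *buffer)
{
    return sprintf(buffer, "%s -> %s",
                   gui_get_input_path(), gui_get_output_dir());
}
```

Because the C code depends only on the declarations, the GUI can change its internals freely as long as these signatures stay the same.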
Note: The GUI has so far been tested only on Ubuntu 22.04 and macOS Sonoma and is considered a work in progress. For stable usage, please use the command line interface or a previous version of the assembler.
Before running the assembler, make sure you have gcc and Gtk+ installed on your machine.
You can install Gtk+ on macOS using brew:
brew install gtk+3
Use the assembler by providing an input file with assembly code. The output includes several files: a machine code file, an external data words file, and an entry type symbols file.
make
./assembler {input - without .as extension. e.g. input_example}
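Since the command takes the input name without its extension, the program has to build the full file names itself. A minimal sketch of one way to do that follows; `make_file_name` is a hypothetical helper, not a function from the project.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: join a base name and an extension into a newly
   allocated string, e.g. "input_example" + ".as". The caller frees the
   result. Returns NULL when the allocation fails. */
char *make_file_name(const char *base, const char *extension)
{
    char *name = (char *)malloc(strlen(base) + strlen(extension) + 1);

    if (name == NULL) {
        return NULL;
    }
    strcpy(name, base);
    strcat(name, extension);
    return name;
}
```

The same helper could produce the names of the generated files by passing ".am", ".ob", ".ent", or ".ext" as the extension.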
The screenshots below demonstrate the successful output files generated by the assembler from the input_example.as file:
- Assembly Code Snippet (ps.am):
    ; Assembly code that defines data, strings, and contains various instructions
    ; including 'add', 'prn', 'lea', 'inc', 'mov', 'sub', 'bne', 'cmp', 'dec', and 'stop'.
    .entry LIST
    .extern W
    MAIN: add r3,LIST
    LOOP: prn #48
    macro m1 ; macro definition
    inc r6
    mov r3, W
    endm
    lea STR, r6
    m1 ; macro call
    sub r1, r4
    bne END
    cmp vall, #-6
    bne END[r15]
    dec K
    .entry MAIN
    sub LOOP[r10],r14
    END: stop
    STR: .string "abcd"
    LIST: .data 6,-9
    .data -100
    .entry K
    K: .data 31
    .extern va
  Note: the macro is expanded in the preprocessor stage:
    .entry LIST
    .extern W
    MAIN: add r3,LIST
    LOOP: prn #48
    lea STR, r6
    inc r6
    mov r3, W
    sub r1, r4
    bne END
    cmp vall, #-6
    bne END[r15]
    dec K
    .entry MAIN
    sub LOOP[r10],r14
    END: stop
    STR: .string "abcd"
    LIST: .data 6,-9
    .data -100
    .entry K
    K: .data 31
    .extern vall
- Entry Symbol Table (input_example.ent):
    ; List of entry symbols and their addresses
    K,0144,0005
    LIST,0144,0002
    MAIN,0096,0004
- External Symbol References (input_example.ext):
    ; External symbols and their references in the code
    vall BASE 0125
    vall OFFSET 0126
    W BASE 0115
    W OFFSET 0116
- Machine Code Output (input_example.ob):
    ; Binary representation of the assembly code
    41 9
    0100 A4-B0-C0-D0-E4
    ... (additional lines of machine code)
    0149 A4-B0-C0-D1-Ef
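The sample files above suggest two conventions. Each entry symbol is listed with a base and an offset that add up to its address (144 + 5 = 149 for K, 96 + 4 = 100 for MAIN), with the base being a multiple of 16. Each object line holds a four-digit decimal address followed by a word split into five hexadecimal digits tagged A through E. The sketch below reproduces both under stated assumptions: that the base is the address rounded down to a multiple of 16, and that a machine word is 20 bits wide. The function names are hypothetical.

```c
#include <stdio.h>

/* Assumption: base is the address rounded down to a multiple of 16 and
   offset is the remainder, which matches the sample (149 -> 144 and 5). */
void split_address(unsigned int address,
                   unsigned int *base, unsigned int *offset)
{
    *offset = address % 16;
    *base = address - *offset;
}

/* Assumption: a machine word is 20 bits wide, printed as five 4-bit groups
   from the most significant (A) to the least significant (E). Writes a
   line such as "0100 A4-B0-C0-D0-E4" into buffer, which must hold at
   least 20 characters for addresses up to 9999. */
void format_word(char *buffer, unsigned int address, unsigned long word)
{
    sprintf(buffer, "%04u A%lx-B%lx-C%lx-D%lx-E%lx",
            address,
            (word >> 16) & 0xFUL,
            (word >> 12) & 0xFUL,
            (word >> 8) & 0xFUL,
            (word >> 4) & 0xFUL,
            word & 0xFUL);
}
```

The word is passed as `unsigned long` because ANSI C only guarantees 16 bits for `int`, which would be too narrow for a 20-bit value.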
Below is a screenshot showing how the assembler handles various syntax and semantic errors from errors_example.as. Each error message is designed to be descriptive, guiding the user to identify and rectify the issues within the assembly code.
The error messages include issues like undefined operations, missing operands, invalid target registers, and failures to find symbols for direct addressing mode, showcasing the assembler's comprehensive error-checking capabilities.
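As an illustration of a message that carries the file name, line number, and error type, here is a small sketch. The "file:line:" prefix follows the convention used by compilers such as gcc, which many terminals and editors turn into a clickable location; whether the project uses this exact layout is an assumption, and `format_error` is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical error formatter. Writes a message of the form
   "file:line: kind error: message" into buffer and returns the number
   of characters written. The buffer must be large enough for the file
   name, the kind, the message, and roughly 30 extra characters. */
int format_error(char *buffer, const char *file, unsigned int line,
                 const char *kind, const char *message)
{
    return sprintf(buffer, "%s:%u: %s error: %s",
                   file, line, kind, message);
}
```

Called with "errors_example.as", 7, "syntax", and "missing operand", this would produce a single line beginning with `errors_example.as:7:`.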
The assembler includes several modules:
- 📝 pre.c: Manages preprocessing tasks, including macro expansion and line numbering.
- 🔎 syntax.c: Ensures syntax accuracy, checking for valid opcodes and operands.
- 🚦 first_pass.c: Conducts the first assembly pass, generating a symbol table and computing label memory addresses.
- 🚀 second_pass.c: Performs the second pass, producing the machine code and data images.
- 🖨️ print_output.c: Prints output files such as the machine code file, external data words file, and entry type symbols file.
- 🏁 main.c: Coordinates the other modules to produce the final output.
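The macro expansion handled by the preprocessing module can be sketched in a few dozen lines. This is a simplified illustration rather than the code in pre.c: it assumes the `macro <name>` / `endm` syntax shown in the example input, works on an array of lines already held in memory, and uses hypothetical names and fixed limits (`expand_macros`, `MAX_MACROS`, `MAX_NAME`).

```c
#include <stdio.h>
#include <string.h>

#define MAX_MACROS 8
#define MAX_NAME 32

/* Hypothetical macro record: the name and where its body sits in the input. */
typedef struct {
    char name[MAX_NAME];
    int first_line;   /* index of the first body line */
    int line_count;   /* number of body lines */
} macro;

/* Copies pointers to the input lines into out, dropping macro definitions
   and replacing each macro call with the macro's body. Returns the number
   of output lines; lines that do not fit in out are dropped. */
int expand_macros(const char *lines[], int n, const char *out[], int max_out)
{
    macro macros[MAX_MACROS];
    char first[MAX_NAME], second[MAX_NAME];
    int macro_count = 0, out_count = 0, in_macro = 0;
    int i, j, k, fields;

    for (i = 0; i < n; i++) {
        /* Read up to two whitespace-separated tokens from the line. */
        fields = sscanf(lines[i], "%31s %31s", first, second);

        if (in_macro) {
            /* Inside a definition: collect body lines until endm. */
            if (fields >= 1 && strcmp(first, "endm") == 0) {
                in_macro = 0;
            } else {
                macros[macro_count - 1].line_count++;
            }
            continue;
        }
        if (fields < 1) {
            continue; /* blank line outside a macro */
        }
        if (fields == 2 && strcmp(first, "macro") == 0
                && macro_count < MAX_MACROS) {
            strcpy(macros[macro_count].name, second);
            macros[macro_count].first_line = i + 1;
            macros[macro_count].line_count = 0;
            macro_count++;
            in_macro = 1;
            continue;
        }
        /* A line whose first token is a known macro name is a call. */
        for (j = 0; j < macro_count; j++) {
            if (strcmp(first, macros[j].name) == 0) {
                break;
            }
        }
        if (j < macro_count) {
            for (k = 0; k < macros[j].line_count && out_count < max_out; k++) {
                out[out_count++] = lines[macros[j].first_line + k];
            }
        } else if (out_count < max_out) {
            out[out_count++] = lines[i];
        }
    }
    return out_count;
}
```

Storing only the position and length of each macro body avoids copying text, at the cost of requiring the input lines to stay in memory until the expanded output has been written.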
Contributors are welcome! Fork the repository and submit a pull request with your changes. Please ensure your contributions are well-tested and adhere to the project's coding standards.
This project is licensed under the MIT License.