
Double-pass assembler, written in ANSI C90 for an imaginary 14-bit computer.


Shm00lik/C-Assembler-Project


⚜️ Assembler written in C



🎯 The Mission

The goal of the project is to write an assembler for an assembly language.
The assembler converts programs written in the assembly language defined below into the machine code, which is also defined below.


Assembler With Two Passes

First, the assembler expands the macros in the code. This stage is called the preProcessor.
After expanding the macros, the assembler converts the assembly language into machine code.
It does so by going through the code twice:

  • First Pass - The assembler goes through the code and builds a symbol table containing every label in the code and its address. It also converts the instructions into binary words, except for the words that refer to a label, since the label's address may not be known yet.

  • Second Pass - The assembler goes through the code again, fills in the missing label addresses, and converts the binary words into machine code.
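The two passes can be sketched in C as below. This is a simplified illustration, not the project's code: the `Line` struct, `first_pass`, `second_pass` and the start address passed in are hypothetical, every line is assumed to occupy exactly one word, and duplicate labels are not checked. The point is the forward reference: a line may use a label that is only defined further down, so its address can only be filled in after the first pass has recorded every label.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 64
#define NAME_LEN 32

/* Hypothetical, pre-tokenized source line. Each line is assumed to
   occupy exactly one memory word, which keeps the sketch short. */
typedef struct {
    const char *label;    /* label defined on this line, or NULL */
    const char *operand;  /* label referenced by this line, or NULL */
} Line;

typedef struct {
    char name[NAME_LEN];
    int address;
} Symbol;

static Symbol symbols[MAX_SYMBOLS];
static int symbol_count = 0;

/* Look a label up in the symbol table; -1 means "not defined". */
static int find_symbol(const char *name)
{
    int i;
    for (i = 0; i < symbol_count; i++)
        if (strcmp(symbols[i].name, name) == 0)
            return symbols[i].address;
    return -1;
}

/* First pass: assign an address to each line and record every label. */
static void first_pass(const Line *lines, int n, int start_address)
{
    int i;
    for (i = 0; i < n; i++) {
        if (lines[i].label != NULL) {
            assert(symbol_count < MAX_SYMBOLS);
            strncpy(symbols[symbol_count].name, lines[i].label, NAME_LEN - 1);
            symbols[symbol_count].name[NAME_LEN - 1] = '\0';
            symbols[symbol_count].address = start_address + i;
            symbol_count++;
        }
    }
}

/* Second pass: every label is now known, so fill in operand addresses.
   Returns 0 on success, -1 if a referenced label was never defined. */
static int second_pass(const Line *lines, int n, int *resolved)
{
    int i;
    for (i = 0; i < n; i++) {
        resolved[i] = -1;             /* -1: this line references no label */
        if (lines[i].operand != NULL) {
            resolved[i] = find_symbol(lines[i].operand);
            if (resolved[i] < 0)
                return -1;
        }
    }
    return 0;
}
```

For example, with three lines where line 0 references a label `END` that is only defined on line 2, and a start address of 100, `first_pass` records `END` at 102 and `second_pass` then resolves line 0's operand to 102.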


🖥️ The Imaginary Computer

The computer consists of a CPU, Registers, and RAM.
The CPU has 8 registers: r0, r1, r2, r3, r4, r5, r6, r7. The size of each register is 14 bits.
The RAM has 256 memory cells (addresses 0-255), and each memory cell is also 14 bits wide.

A cell in memory is also called a word. Each machine instruction is encoded into one or more memory words.
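A hypothetical way to model this machine state in C (the `Machine` struct and `store_word` are illustrative names, not taken from the project). C guarantees that an `unsigned int` holds at least 16 bits, so each 14-bit register or cell fits in one, and masking with `0x3FFF` keeps only the low 14 bits.

```c
#include <assert.h>

#define WORD_BITS 14
#define WORD_MASK ((1u << WORD_BITS) - 1u) /* 0x3FFF: the low 14 bits */
#define NUM_REGISTERS 8
#define MEMORY_SIZE 256

/* Hypothetical model of the imaginary computer's storage. */
typedef struct {
    unsigned int registers[NUM_REGISTERS]; /* r0 .. r7 */
    unsigned int memory[MEMORY_SIZE];      /* cells 0 .. 255 */
} Machine;

/* Store a value in a memory cell, truncating it to 14 bits. */
static void store_word(Machine *m, int address, unsigned int value)
{
    assert(address >= 0 && address < MEMORY_SIZE);
    m->memory[address] = value & WORD_MASK;
}
```

Any bits above bit 13 are simply dropped, so storing `0x4001` leaves `1` in the cell.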


📫 A Word

A word is a 14-bit number, which is divided into 6 parts, described in the following table:

| Bit(s) | Meaning |
|---|---|
| 13-12 | param1 |
| 11-10 | param2 |
| 9-6 | opcode |
| 5-4 | source operand addressing |
| 3-2 | target operand addressing |
| 1-0 | ERA |

⚠️ I'm not going to explain the meaning of each part.
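Whatever each field means, packing the six fields into a word is plain bit shifting. A minimal sketch, assuming the bit ranges in the word table (param1 in bits 13-12, param2 in 11-10, opcode in 9-6, source and target addressing in 5-4 and 3-2, ERA in 1-0); `encode_word` is a hypothetical helper, not the project's API.

```c
#include <assert.h>

/* Pack the six fields of a word into one 14-bit value.
   Field widths: 2 + 2 + 4 + 2 + 2 + 2 = 14 bits. */
static unsigned int encode_word(unsigned int param1, unsigned int param2,
                                unsigned int opcode, unsigned int src_addressing,
                                unsigned int dst_addressing, unsigned int era)
{
    /* Each field must fit in its bit range. */
    assert(param1 < 4 && param2 < 4 && opcode < 16);
    assert(src_addressing < 4 && dst_addressing < 4 && era < 4);

    return (param1 << 12)          /* bits 13-12 */
         | (param2 << 10)          /* bits 11-10 */
         | (opcode << 6)           /* bits 9-6   */
         | (src_addressing << 4)   /* bits 5-4   */
         | (dst_addressing << 2)   /* bits 3-2   */
         | era;                    /* bits 1-0   */
}
```

For instance, `encode_word(0, 0, 2, 0, 0, 0)` returns 128, since opcode 2 (`add`) shifted left by 6 sets bit 7.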



📜 The Assembly Language

Instructions:

The assembly language consists of 16 different instructions:

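| Instruction | Opcode |
|---|---|
| mov | 00 |
| cmp | 01 |
| add | 02 |
| sub | 03 |
| not | 04 |
| clr | 05 |
| lea | 06 |
| inc | 07 |
| dec | 08 |
| jmp | 09 |
| bne | 10 |
| red | 11 |
| prn | 12 |
| jsr | 13 |
| rts | 14 |
| stop | 15 |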
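Because the opcodes run from 0 to 15 in table order, a mnemonic's opcode can simply be its index in an array. A sketch with a hypothetical `opcode_of` helper; a linear search is plenty for 16 entries.

```c
#include <assert.h>
#include <string.h>

/* The 16 mnemonics in opcode order, so a mnemonic's index is its opcode. */
static const char *const MNEMONICS[16] = {
    "mov", "cmp", "add", "sub", "not", "clr", "lea", "inc",
    "dec", "jmp", "bne", "red", "prn", "jsr", "rts", "stop"
};

/* Return the opcode of a mnemonic, or -1 if it is not an instruction. */
static int opcode_of(const char *mnemonic)
{
    int i;
    for (i = 0; i < 16; i++)
        if (strcmp(MNEMONICS[i], mnemonic) == 0)
            return i;
    return -1;
}
```

`opcode_of("bne")` returns 10, and an unknown word such as `"foo"` returns -1, which an assembler could report as an error.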

Directives:

The language also has 4 directives:

  • .data - Defines a sequence of integers.
  • .string - Defines a sequence of characters.
  • .entry - Defines a label as an entry point, so it can be used in other assembly files (the counterpart of .extern).
  • .extern - Declares a label as external, telling the assembler that the label is defined in another assembly file (the counterpart of .entry).
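As an illustration of directive handling, here is a sketch that parses the operands of a `.data` line into words. The text above only says that `.data` defines a sequence of integers, so the details are assumptions: numbers are decimal and comma-separated, and negative numbers are stored in 14-bit two's complement (range -8192 to 8191). `parse_data` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

#define WORD_MASK 0x3FFFu

/* Parse the operand list of a .data directive, e.g. "7, -57, +17",
   into 14-bit words (assumed: decimal, comma-separated, two's complement).
   Returns the number of words written, or -1 on a syntax error. */
static int parse_data(const char *text, unsigned int *words, int max_words)
{
    int count = 0;
    char *end;
    long value;

    for (;;) {
        value = strtol(text, &end, 10);   /* skips leading whitespace */
        if (end == text || count >= max_words)
            return -1;                    /* no number, or no room */
        if (value < -8192 || value > 8191)
            return -1;                    /* does not fit in 14 bits */
        /* Converting to unsigned wraps negatives; the mask keeps 14 bits. */
        words[count++] = (unsigned int)value & WORD_MASK;

        while (*end == ' ' || *end == '\t')
            end++;
        if (*end == '\0')
            return count;                 /* end of list */
        if (*end != ',')
            return -1;                    /* junk between numbers */
        text = end + 1;                   /* skip the comma */
    }
}
```

With these assumptions, `"7, -57, +17"` yields the words 7, 16327 (that is, -57 in 14-bit two's complement) and 17, while `"7,"` or `"7 8"` is rejected.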

💢 The Machine Code

The machine code consists of only two characters, . and /, where . represents 0 and / represents 1.
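Converting a 14-bit word to this notation takes one loop over the bits. A sketch, assuming bits are written most significant first (bit 13 on the left), in the same order as the word table; `word_to_machine_code` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

#define WORD_BITS 14

/* Write a 14-bit word as machine code: '.' for 0 and '/' for 1,
   bit 13 first. out must have room for 15 chars (14 + '\0'). */
static void word_to_machine_code(unsigned int word, char *out)
{
    int bit;
    assert(word <= 0x3FFFu);  /* the word must fit in 14 bits */
    for (bit = WORD_BITS - 1; bit >= 0; bit--)
        out[WORD_BITS - 1 - bit] = ((word >> bit) & 1u) ? '/' : '.';
    out[WORD_BITS] = '\0';
}
```

Under this convention the value 0x3FFF prints as fourteen `/` characters and 0 as fourteen `.` characters.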


🎭 Types of Lines

There are 4 types of lines in the assembly language:

  • Empty Line - A line that contains nothing but whitespace characters (\t, space(s) or \n). The assembler ignores these lines.
  • Comment Line - A line that starts with a semicolon (;). The assembler ignores these lines.
  • Instruction Line - A line that contains an instruction. The assembler converts these lines into machine code.
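  • Directive Line - A line that contains a directive. The assembler converts .data and .string lines into data words; .entry and .extern lines produce no words of their own but mark labels as entry points or external.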
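Telling the four kinds of lines apart mostly comes down to the first significant character. A hypothetical `classify_line` sketch, assuming that a directive is a `.`-prefixed word (optionally after a `LABEL:` prefix) and that any other non-empty, non-comment line is an instruction to be validated later.

```c
#include <assert.h>
#include <ctype.h>

typedef enum { LINE_EMPTY, LINE_COMMENT, LINE_DIRECTIVE, LINE_INSTRUCTION } LineType;

/* Classify one source line into the four types above.
   A comment's ';' must be the very first character of the line. */
static LineType classify_line(const char *line)
{
    const char *p = line;
    const char *colon;

    if (line[0] == ';')
        return LINE_COMMENT;

    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0')
        return LINE_EMPTY;        /* only whitespace */

    /* Skip an optional "LABEL:" prefix on the first word. */
    colon = p;
    while (*colon != '\0' && !isspace((unsigned char)*colon) && *colon != ':')
        colon++;
    if (*colon == ':') {
        p = colon + 1;
        while (isspace((unsigned char)*p))
            p++;
    }

    /* Directives start with '.'; everything else is an instruction. */
    return (*p == '.') ? LINE_DIRECTIVE : LINE_INSTRUCTION;
}
```

Only the type is decided here; checking that the mnemonic and operands are valid is left to later stages.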

🏷️ Labels

An instruction or directive line can begin with a label: a name followed by a colon. A label is like a variable name; it stands for the address of a memory cell and can be used to reference that cell.

Example of using labels:

XYZ: mov r0, r1
bne LOOP(XYZ, r3)
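Recognizing a label definition at the start of a line could look like the sketch below. The naming rules are assumptions, since the text above does not spell them out: a label starts with a letter, continues with letters or digits, ends with a colon, and is at most 31 characters long (`MAX_LABEL_LEN` and `take_label` are hypothetical names).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_LABEL_LEN 31 /* assumed limit, not given in the text above */

/* If the line starts with "NAME:", copy NAME into label and return a
   pointer just past the colon; otherwise return NULL. */
static const char *take_label(const char *line, char label[MAX_LABEL_LEN + 1])
{
    size_t len = 0;

    if (!isalpha((unsigned char)line[0]))
        return NULL;                       /* must start with a letter */
    while (isalnum((unsigned char)line[len]))
        len++;
    if (line[len] != ':' || len > MAX_LABEL_LEN)
        return NULL;                       /* no colon, or name too long */

    memcpy(label, line, len);
    label[len] = '\0';
    return line + len + 1;
}
```

On the example above, `take_label` extracts `XYZ` from the first line and returns NULL for the second, where `LOOP` and `XYZ` appear only as operands; references like those are resolved during the second pass.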

🔬 Macros

The assembly language supports macros.
A macro is a named sequence of lines that can be inserted into the code by writing a single line.
A macro is defined with the following syntax:

mcr MACRO_NAME
    MACRO_CODE
endmcr

It can then be called by writing just the macro's name:

MACRO_NAME

Therefore, the end result is the same as if the macro's code had been written in place of the macro's name.
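The expansion can be sketched as a single pass over the lines: inside `mcr ... endmcr` the lines are recorded instead of emitted, and a line consisting of a macro's name is replaced by the recorded lines. This is an illustrative sketch, not the project's preProcessor; `expand_macros`, the `Macro` struct and the size limits are hypothetical, and lines are assumed to carry no trailing whitespace.

```c
#include <assert.h>
#include <string.h>

#define MAX_MACROS 16
#define MAX_BODY 32

/* A macro: its name and pointers to the lines of its body. */
typedef struct {
    const char *name;
    const char *body[MAX_BODY];
    int length;
} Macro;

static const Macro *find_macro(const Macro *macros, int count, const char *name)
{
    int i;
    for (i = 0; i < count; i++)
        if (strcmp(macros[i].name, name) == 0)
            return &macros[i];
    return NULL;
}

/* Expand macros in `in` (n lines) into `out`; returns the number of output
   lines. Assumes "mcr"/"endmcr" start at column 0 and a call is a line
   holding only the macro name. Extra body lines beyond MAX_BODY are dropped. */
static int expand_macros(const char **in, int n, const char **out, int max_out)
{
    Macro macros[MAX_MACROS];
    Macro *current = NULL;       /* macro being defined, if any */
    const Macro *m;
    int count = 0, written = 0, i, j;

    for (i = 0; i < n; i++) {
        if (current != NULL) {                     /* inside mcr ... endmcr */
            if (strcmp(in[i], "endmcr") == 0)
                current = NULL;
            else if (current->length < MAX_BODY)
                current->body[current->length++] = in[i];
            continue;
        }
        if (strncmp(in[i], "mcr ", 4) == 0 && count < MAX_MACROS) {
            current = &macros[count++];            /* start a new definition */
            current->name = in[i] + 4;
            current->length = 0;
            continue;
        }
        m = find_macro(macros, count, in[i]);
        if (m != NULL) {                           /* call: paste the body */
            for (j = 0; j < m->length && written < max_out; j++)
                out[written++] = m->body[j];
        } else if (written < max_out) {
            out[written++] = in[i];                /* ordinary line */
        }
    }
    return written;
}
```

For example, the lines `mcr m_inc`, `    inc r2`, `    inc r3`, `endmcr`, `m_inc`, `stop` (with `m_inc` a hypothetical macro name) expand to `    inc r2`, `    inc r3`, `stop`.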


That's it! 🎉
