📟 16-bit Machine Code Assembler for the Hack Assembly Language.
aalhour Updated README.md: add source of C-Instructions picture. All rights of the photo belong to nand2tetris.org and Coursera.
Latest commit 4be6585 Mar 31, 2016



Assembler.hack is an assembler for the 16-bit Hack Assembly Language. It was written as part of building a complete 16-bit computer from the ground up, following the book and MOOC The Elements of Computing Systems, informally known as nand2tetris. Hack is also the name of the computer.


Assembler.hack takes a .asm text file containing a program written in the Hack Assembly Language (see the intro section below) and assembles it into binary machine code (Hack Machine Language). The assembled program is then written to a new .hack text file with the same name.

The assembly process is implemented in two passes. The first pass scans the whole program and registers only the labels in the Symbol Table. The second pass scans the whole program again: it registers all variables in the Symbol Table, substitutes the symbols with their respective memory or instruction addresses from the Symbol Table, generates the binary machine code, and writes it to the new .hack text file.
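As a rough illustration of the first pass, here is a minimal sketch (not the code in Assembler.py). The function name first_pass and the sample program are hypothetical. It assumes that a label resolves to the address of the next A or C instruction and that labels occupy no instruction slot themselves.

```python
def first_pass(lines):
    """Map each (LABEL) to the address of the instruction that follows it."""
    labels = {}
    address = 0  # address of the next real (A or C) instruction
    for raw in lines:
        line = raw.split("//")[0].strip()  # drop comments and outer whitespace
        if not line:
            continue  # blank or comment-only line
        if line.startswith("(") and line.endswith(")"):
            labels[line[1:-1]] = address  # a label takes no instruction slot
        else:
            address += 1
    return labels


program = [
    "(LOOP)",
    "  @LOOP   // jump target",
    "  0;JMP",
    "(END)",
    "  @END",
    "  0;JMP",
]
labels = first_pass(program)
print(labels)
```

Because labels are collected before any code is generated, the second pass can resolve a jump to a label that is declared later in the file.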

Source code is organized into several components. Their names, interfaces and APIs follow those specified in the book, which act as a sort of specification-implementation contract. All components of the Assembler reside in the /Assembler directory, as follows:

  1. Assembler.py: Main module. Implements the two passes and glues the other components together.
  2. Parser.py: Simple Parser. Parses the instructions by looking ahead 1 or 2 characters to determine their types and structures.
  3. Lex.py: A simple Lexer which is used by the Parser to break an instruction into smaller parts and structure it in a way that makes it easy to convert to machine code.
  4. Code.py: Generates binary machine code for instructions. For C-Instructions, it generates machine code for each constituent part and then merges the parts back together.
  5. SymbolTable.py: Implements a lookup table which is used to register symbols (labels and variables) and look up their memory addresses.

How to Use:

$ python Assembler.py HelloWorld.asm


Requirements:

  • Python 3.5.1


Note: You might need to read the Intro to Hack Assembly Language section below to understand the instructions in the following Max.asm source code.


// Given two numbers stored in the registers R0 and R1,
// compute the maximum between them and store it in the R2 register.

  @R0
  D=M              // D = first number
  @R1
  D=D-M            // D = first number - second number
  @OUTPUT_FIRST
  D;JGT            // if D>0 (first is greater) goto output_first
  @R1
  D=M              // D = second number
  @OUTPUT_D
  0;JMP            // goto output_d
(OUTPUT_FIRST)
  @R0
  D=M              // D = first number
(OUTPUT_D)
  @R2
  M=D              // M[2] = D (greatest number)
(INFINITE_LOOP)
  @INFINITE_LOOP
  0;JMP            // infinite loop




Intro to Hack Assembly Language

The Hack Assembly Language is minimal: it consists mainly of 3 types of instructions. It ignores whitespace and allows programs to declare symbols with a single symbol declaration instruction. Symbols can be either labels or variables. It also allows the programmer to write comments in the source code, for example: // this is a single line comment.
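To make the whitespace and comment rules concrete, the sketch below shows one way a source line could be normalized before parsing. The helper name strip_line is hypothetical and is not taken from Parser.py or Lex.py. It assumes that all whitespace inside an instruction may be dropped.

```python
def strip_line(line):
    """Remove a trailing // comment and all whitespace from one source line."""
    code = line.split("//", 1)[0]  # keep only the text before the comment
    return "".join(code.split())   # drop spaces, tabs and the newline


print(strip_line("  D = M   // D = first number"))
print(repr(strip_line("// this is a single line comment")))
```

A line that comes back as an empty string carries no instruction and can be skipped.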

If you cannot contain your excitement, head over to the tests directory and check out the test programs. The .asm files contain programs written in the Hack Assembly Language, and the .hack files contain their equivalent binary machine code (Hack Machine Language).

Predefined Symbols

  • A: Address Register.
  • D: Data Register.
  • M: Refers to the register in Main Memory whose address is currently stored in A.
  • SP: RAM address 0.
  • LCL: RAM address 1.
  • ARG: RAM address 2.
  • THIS: RAM address 3.
  • THAT: RAM address 4.
  • R0-R15: Addresses of 16 RAM Registers, mapped from 0 to 15.
  • SCREEN: Base address of the Screen Map in Main Memory, which is equal to 16384.
  • KBD: Keyboard Register address in Main Memory, which is equal to 24576.
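The RAM-mapped symbols in this list could seed a symbol table along the following lines. This is only a sketch: the name PREDEFINED is hypothetical and is not necessarily how SymbolTable.py stores them. A, D and M are left out because they name registers rather than RAM addresses.

```python
# Predefined symbols that map to fixed RAM addresses.
PREDEFINED = {
    "SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
    "SCREEN": 16384, "KBD": 24576,
}
# R0..R15 map to RAM addresses 0..15.
PREDEFINED.update({"R{}".format(i): i for i in range(16)})

print(PREDEFINED["R15"], PREDEFINED["SCREEN"])
```

Note that several names share an address: SP and R0 both refer to RAM address 0, for example.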

Types of Instructions:

  1. A-Instruction: Addressing instructions.
  2. C-Instruction: Computation instructions.
  3. L-Instruction: Labels (Symbols) declaration instructions.
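The three types can be told apart by the first character of a line. The sketch below assumes the line has already been stripped of comments and whitespace and is not empty. The function name instruction_type is hypothetical.

```python
def instruction_type(line):
    """Classify a cleaned, non-empty source line by its first character."""
    if line.startswith("@"):
        return "A"  # addressing instruction, e.g. @21 or @R0
    if line.startswith("("):
        return "L"  # label declaration, e.g. (LOOP)
    return "C"      # everything else is a computation instruction


for line in ["@R0", "(LOOP)", "D=M", "0;JMP"]:
    print(line, instruction_type(line))
```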


A-Instruction

Symbolic Syntax:

@value, where value is either a non-negative decimal number or a Symbol.


  • @21
  • @R0
Binary Syntax:

0xxxxxxxxxxxxxxx, where x is a bit, either 0 or 1. A-Instructions always have their MSB set to 0.


  • 0000000000010101
  • 0111111111111111

Sets the contents of the A register to the specified value. The value is either a non-negative number (e.g. 21) or a Symbol. If the value is a Symbol, the A register is set to the address that the Symbol refers to, not to the data stored in that register or memory location.
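Once a value is known, translating an A-Instruction is a matter of writing it as a 16-bit binary string. The sketch below assumes the symbol has already been resolved to a number. The function name encode_a is hypothetical.

```python
def encode_a(value):
    """Encode a resolved A-Instruction value as 16 binary digits."""
    if not 0 <= value <= 32767:
        # Only 15 bits are available because the MSB must stay 0.
        raise ValueError("value does not fit in 15 bits: {}".format(value))
    return format(value, "016b")


print(encode_a(21))     # @21
print(encode_a(16384))  # @SCREEN
```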


Symbol Declarations

Symbols can be either variables or labels. Variables are symbolic names for memory addresses, which makes those addresses easier to remember. Labels are symbolic names for instruction addresses, which makes jumps in the program easier to handle. A symbol declaration is not a machine instruction, because machine code does not operate at the level of abstraction of labels and variables. It is therefore considered a pseudo-instruction.

Declaring Variables:

Declaring a variable is a straightforward A-Instruction. For example:

  @i
  M=0

The instruction @i declares a variable "i": the assembler automatically allocates a memory address for "i", and @i loads that address into the A register. The instruction M=0 then sets the memory location of "i" in Main Memory to 0.
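The automatic address allocation can be pictured with the following sketch. It assumes the Hack platform convention that variables receive consecutive RAM addresses starting at 16, right after R0-R15. The class name Variables is hypothetical and is not the interface of SymbolTable.py.

```python
class Variables:
    """Hand out RAM addresses to symbols the first time they are seen."""

    def __init__(self, known):
        self.table = dict(known)  # labels and predefined symbols
        self.next_free = 16       # assumed first free RAM address

    def address(self, name):
        if name not in self.table:
            self.table[name] = self.next_free  # first use declares the variable
            self.next_free += 1
        return self.table[name]


symbols = Variables({"R0": 0, "LOOP": 4})
print(symbols.address("i"), symbols.address("sum"), symbols.address("i"))
```

Labels have to be in the table before this runs, otherwise a reference such as @LOOP would be mistaken for a new variable. This is why the labels are registered in a separate earlier pass.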

Declaring Labels:

To declare a label, use the command (LABEL_NAME), where "LABEL_NAME" can be any name we choose, as long as it is wrapped in parentheses. For example:

(LOOP)
  // ...
  // instruction 1
  // instruction 2
  // instruction 3
  // ...
  @LOOP
  0;JMP

The instruction (LOOP) declares a new label called "LOOP". The assembler resolves this label to the address of the next instruction (A or C instruction) that follows it.

The instruction @LOOP is a straightforward A-Instruction that sets the contents of the A register to the instruction address the label refers to. The 0;JMP instruction then causes an unconditional jump to the address in the A register, so the program executes the instructions between (LOOP) and 0;JMP indefinitely.


C-Instruction

Symbolic Syntax:

dest = comp ; jmp, where either the dest or the jmp part may be omitted:

  1. dest: Destination register(s) or memory location in which the result of the computation will be stored.
  2. comp: Computation code.
  3. jmp: The jump directive.


  • D=0
  • M=1
  • D=D+1;JMP
  • M=M-D;JEQ
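The symbolic form can be split into its three fields with plain string operations. This is a sketch only: the function name split_c is hypothetical, and it assumes whitespace has already been removed from the instruction.

```python
def split_c(instruction):
    """Split 'dest=comp;jmp' into (dest, comp, jmp); absent parts become ''."""
    if "=" in instruction:
        dest, rest = instruction.split("=", 1)
    else:
        dest, rest = "", instruction        # no destination, e.g. 0;JMP
    comp, _, jmp = rest.partition(";")      # jmp is '' when there is no ';'
    return dest, comp, jmp


for text in ["D=0", "M=1", "D=D+1;JMP", "M=M-D;JEQ", "0;JMP"]:
    print(text, split_c(text))
```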
Binary Syntax:

1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3, where:

  • 111 bits: C-Instructions always begin with bits 111.
  • a bit: Selects whether the ALU takes its second operand from the A register (a=0) or from M, the Main Memory register addressed by A (a=1).
  • Bits c1 through c6: Control bits expected by the ALU to perform arithmetic or bit-wise logic operations.
  • Bits d1 through d3: Specify where to store the result of the ALU computation: any combination of A (d1), D (d2) and M (d3).
  • Bits j1 through j3: Specify which jump directive to execute (either conditional or unconditional).

Performs a computation on the CPU (arithmetic or bit-wise logic), stores the result in the destination register(s) or memory location, and then optionally jumps to the instruction whose address is held in the A register. That address is usually loaded beforehand by an A-Instruction with a value or a Symbol (label).
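Putting the bit fields together, a C-Instruction could be encoded as in the sketch below. The tables follow the Hack machine language specification from nand2tetris. The names COMP, JUMP and encode_c are hypothetical and are not the interface of Code.py. The sketch assumes the three fields have already been separated and that comp uses the canonical mnemonics (for example D+M rather than M+D).

```python
# c1..c6 bits for the a=0 mnemonics; the a=1 forms use M in place of A.
COMP = {
    "0": "101010", "1": "111111", "-1": "111010",
    "D": "001100", "A": "110000", "!D": "001101", "!A": "110001",
    "-D": "001111", "-A": "110011", "D+1": "011111", "A+1": "110111",
    "D-1": "001110", "A-1": "110010", "D+A": "000010", "D-A": "010011",
    "A-D": "000111", "D&A": "000000", "D|A": "010101",
}
# j1..j3 bits; the empty string means "no jump".
JUMP = {"": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}


def encode_c(dest, comp, jump):
    """Assemble the 16 bits 111 a c1..c6 d1..d3 j1..j3."""
    a = "1" if "M" in comp else "0"       # a=1 reads M instead of A
    c = COMP[comp.replace("M", "A")]      # look up the a=0 spelling
    d = "".join("1" if reg in dest else "0" for reg in "ADM")  # d1=A, d2=D, d3=M
    return "111" + a + c + d + JUMP[jump]


print(encode_c("D", "M", ""))       # D=M
print(encode_c("M", "M-D", "JEQ"))  # M=M-D;JEQ
print(encode_c("", "0", "JMP"))     # 0;JMP
```

Deriving the dest bits from the letters present in dest avoids a separate eight-entry table and accepts the letters in any order.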


C-Instructions Reference

This C-Instruction reference image is taken from the nand2tetris Coursera MOOC.


This project is licensed under the MIT License.