# A Toy Assembler

In the 2017 Princeton [Advanced Topics in Computer Science][1] seminar we were introduced to the [TOY][2] computer in [COS 126][3] lecture 18. Our evening's [assignment][5] was to implement [Hamming Codes][6] (that detect and correct single-bit errors) in the TOY machine language and I thought having an *assembler* would make the assignment so much easier. So, I wrote one.

[Lecture 18][4] makes the point that it is easy to use a computer to simulate a computer architecture and TOY has a [simple enough][7] instruction set (only 16 opcodes) that a Java version of the TOY simulator fits on one slide (`public class TOYlecture` code is embedded as a comment). I was able to adapt the Java version to Python over lunch, but completing a usable assembler by the evening's project time proved to be a longer endeavor!

## Assembler specification
Any assembler must have the capability to:
* Generate a valid file capable of being read by the [TOY][2] simulator.
* Specify an unambiguous lexicographical representation of all instructions.
* Parse and tokenize a valid input line.
* Assemble any valid instruction.
    * The basic 16 opcodes:
<pre>HALT
R[A] <- R[B] + R[C]
R[A] <- R[B] - R[C]
R[A] <- R[B] & R[C]
R[A] <- R[B] ^ R[C]
R[A] <- R[B] << R[C]
R[A] <- R[B] >> R[C]
R[A] <- LABEL
R[A] <- M[ LABEL ]
M[ LABEL ] <- R[A]
R[A] <- M[ R[B] ]
M[ R[A] ] <- R[B]
BRANCH=0 R[A] LABEL
BRANCH>0 R[A] LABEL
JUMP R[A]
LINK R[A] LABEL</pre>
    * Shorthand references to useful instructions:
<pre>
READ R[A]       # ≡ R[A] <- M[ 0xFF ]
WRITE R[A]      # ≡ M[ 0xFF ] <- R[A]
R[A] <- R[B]    # ≡ R[A] <- R[0] + R[B]
GOTO LABEL      # ≡ BRANCH=0 R[0] LABEL
</pre>
    * Control structures:
<pre>
 IF<>0 R[A]      # BRANCH=0 R[A] beyond ELSE or THEN
 WHILE<>0 R[A]   # BRANCH=0 R[A] beyond REPEAT
 IF<=0 R[A]      # BRANCH>0 R[A] beyond ELSE or THEN
 WHILE<=0 R[A]   # BRANCH>0 R[A] beyond REPEAT
 ELSE            # patch IF branch to here (.) and insert BRANCH=0 R[0] beyond THEN
 THEN            # patch IF or ELSE branch to here (.)
 REPEAT          # patch WHILE branch to here (.)
</pre>
    * Memory declaration and allocation:
<pre>
 : LABEL 12      # define LABEL to have value (e.g. 12) and patch forward references
 : LABEL         # define LABEL to have value here (.) and patch forward references
 ! LABEL 1234    # store value (e.g. 1234) at LABEL (must not be a forward reference)
 . 1234          # store value (e.g. 1234) at here (.) and increment here
 . LABEL 1234    # ≡ : LABEL followed by . 1234
</pre>
    * Comments:
<pre>
&#35; COMMENT
COMMENT
</pre>

* Handle named labels with forward references.
* Allow for structured code.
* Allow for comments, both line and end-of-line.
* Perform assemble-time error checking.

## Assembler design details
The TOY assembler was written hurriedly, so some features were left out in favor of simplicity. Any real TOY assembler would have to address these shortcomings. A list of [to-do](#todos)s is at the bottom of this cell.

### Generate a valid file capable of being read by the TOY simulator
The TOY simulator file format is simple. Any line matching the regular expression `^[0-9A-Fa-f]{2}:\s[0-9A-Fa-f]{4}` represents a TOY statement. All others are ignored. The assembler generates code of this form and includes the original assembler statements (and their comments) as comments.
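
As a rough illustration of that format (a sketch, not the simulator's actual loader), the regular expression above can be used to pull address / word pairs out of a listing. The name `load_toy` is hypothetical and is not part of the assembler.

```python
import re

# Pattern from the text: two hex digits, a colon, whitespace, four hex digits.
TOY_LINE = re.compile(r'^([0-9A-Fa-f]{2}):\s([0-9A-Fa-f]{4})')


def load_toy(text):
    """Return a 256-cell memory list filled from TOY-format text (hypothetical)."""
    memory = [0] * 256
    for line in text.splitlines():
        match = TOY_LINE.match(line)
        if match:                       # every other line is ignored
            addr, word = (int(group, 16) for group in match.groups())
            memory[addr] = word
    return memory


listing = """\
           GOTO START   # a source line kept as a comment, ignored on load
10: C040
40: 7101
"""
memory = load_toy(listing)
print(f'{memory[0x10]:04X} {memory[0x40]:04X}')   # C040 7101
```

Because the pattern is anchored at the start of the line, the indented source statements in a listing never match and are skipped.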

### Specify an unambiguous lexicographical representation of all instructions
Each of the assembler's 32 valid instructions has an unambiguous lexicographical representation. Some have many variations (4096 variations on instructions with three registers or a register and an address, 256 variations on instructions with only two registers or only a label, and 16 variations on instructions with only a register).
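
Those counts follow from the 16-bit instruction word being four 4-bit fields: three free fields give $16^3 = 4096$ encodings, two give $16^2 = 256$, and one gives $16$. A minimal sketch of the packing, assuming a hypothetical helper named `encode` (the assembler itself builds words inline):

```python
def encode(op, d=0, s=0, t=0, addr=None):
    """Pack a TOY instruction word (hypothetical helper).

    op, d, s, and t are 4-bit fields; addr, if given, replaces s and t.
    """
    assert all(0 <= field <= 0xF for field in (op, d, s, t))
    low = (s << 4 | t) if addr is None else addr
    assert 0 <= low <= 0xFF
    return op << 12 | d << 8 | low


print(f'{encode(1, 0xC, 0xA, 0xB):04X}')   # R[C] <- R[A] + R[B]  gives 1CAB
print(f'{encode(8, 9, addr=0x4D):04X}')    # R[9] <- M[ 4D ]      gives 894D
print(16 ** 3, 16 ** 2, 16)                # 4096 256 16
```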

### Parse and tokenize a valid input line
Parsing and tokenizing an input line can be complicated, depending on the syntax of the assembler. *This* assembler is kept simple by the following rules:
* Parse every line on `'#'` to split the `line` into `code` and `comment` parts (either of which can be empty) with:

   `line.split( '#' )[ 0 ].strip( )`.

* Tokenize `code` on spaces into an uppercase `ops` list with:

   `[ op.upper( ) for op in code.split( ' ' ) ]`.

* Any line that does not fit the lexicographical representation of an instruction is considered a comment and prepended with a `'#'` if one is not already there.

Making the parser / tokenizer split on spaces requires that instructions be expressed as space-separated tokens (for example, `R[2] <- M[ LABEL ]` is five tokens), but in exchange the parser / tokenizer becomes a single line of code in the assembler.
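
To make the trade-off concrete, here is a small sketch that applies the two expressions above. The wrapper name `tokenize` is hypothetical; the assembler performs these steps inline.

```python
def tokenize(line):
    """Return the uppercase token list for one source line (hypothetical wrapper)."""
    code = line.split('#')[0].strip()           # drop any end-of-line comment
    return [op.upper() for op in code.split(' ')]


print(tokenize('r[2] <- M[ count ]   # load the counter'))
# ['R[2]', '<-', 'M[', 'COUNT', ']']
print(tokenize('R[2] <- M[COUNT]'))             # spaces omitted: only 3 tokens
print(tokenize('# only a comment'))             # ['']
```

The second call shows the cost of the simple rule: without spaces inside the brackets the line yields three tokens rather than five, so it no longer has the shape of a load instruction.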

### Assemble any valid instruction
There are ≈20 methods of the form `self._isSomething( ops )` that are responsible for determining whether the `ops` list represents a valid instruction (`something`). They are tested in turn and, for the first one that returns `True`, that instruction is assembled at the location specified by the special label `PC`. The special label `PC` is often referred to as *here* or *.* (dot) and the memory declaration and allocation instructions all use it. Once a valid instruction is assembled at *here*, the value of the special label `PC` is incremented.

### Handle named labels with forward references
Useful assemblers offer the feature of named labels in code. Named labels obviate the need for code to be sprinkled with [magic numbers][8] *in lieu* of symbolic memory addresses. However, it is not always possible to know the *value* of a symbolic memory address before its *use* --- especially when such values depend on code. Such uses are [forward reference][9]s and, if they are not supported by the assembler, labeled symbolic memory addresses must be known *a priori* and become simply named magic numbers.

> In the assembler specification, any reference to `LABEL` can be replaced by a hexadecimal number on [0x00, 0xFF]. Labels are symbolic representations of memory addresses and must all be declared somewhere in the program to be assembled.

To implement forward references, any label referred to in the code is either (a) already declared and has a value, or (b) has yet to be declared and is a forward reference. The `self._labels` dictionary maps each label to either (a) an integer, in which case the label is declared and the integer is its value, or (b) a list, in which case the list holds all memory addresses that must be patched to refer to the label's value once it is declared. (The TOY opcodes are relatively consistent: every instruction for which a forward reference is valid has its lower byte patched with the value of the label once it is declared.)
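
The bookkeeping can be sketched in a few lines. This is a simplified stand-alone version, assuming module-level `memory` and `labels` and the hypothetical function names `address` and `declare`; the assembler's own methods differ in detail.

```python
memory = [0] * 256
labels = {}          # name -> int (declared) or list of addresses to patch


def address(name, here):
    """Return the label's value, or record a forward reference and return 0."""
    entry = labels.setdefault(name, [])
    if isinstance(entry, int):
        return entry
    entry.append(here)               # remember which word needs patching
    return 0x00


def declare(name, value):
    """Declare the label and patch the low byte of every pending reference."""
    pending = labels.get(name)
    if isinstance(pending, list):
        for addr in pending:
            memory[addr] = memory[addr] & 0xFF00 | value
    labels[name] = value


# R[9] <- M[ COUNT ] is assembled at 0x41 before COUNT is declared.
memory[0x41] = 0x8900 | address('COUNT', here=0x41)
print(f'{memory[0x41]:04X}', labels)   # 8900 {'COUNT': [65]}
declare('COUNT', 0x4D)                 # COUNT turns out to live at 0x4D
print(f'{memory[0x41]:04X}', labels)   # 894D {'COUNT': 77}
```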

### Allow for structured code
In 1968 [Edsger Dijkstra][10] ushered in the era of [structured programming][11] with a letter to CACM titled [*Go To Statement Considered Harmful*][12]. Since that time, it has been useful for computer languages to offer syntactic support for alternatives (`IF-ELSE-THEN`) and iteration (`WHILE`) with specific hierarchical nesting rules that avoid '[spaghetti code][13].'

To implement structured code:

* When `IF` or `WHILE` is encountered, a conditional branch instruction is assembled with an empty memory address and the memory address of that instruction is pushed on the `self._control` stack.
* When `THEN` or `REPEAT` is encountered, the address of the instruction to patch is popped from the `self._control` stack and that instruction is patched with the current value of *here*.
* When `ELSE` is encountered, the behavior is equivalent to `THEN` followed by `IF`, except that an unconditional branch instruction is assembled.
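
The stack discipline above can be sketched with a small stand-alone class. `ControlSketch` and its method names are hypothetical; only the push, pop, and patch behavior is taken from the description.

```python
class ControlSketch:
    """Hypothetical stand-in for the assembler's control-structure handling."""

    def __init__(self, here):
        self.memory = [0] * 256
        self.control = []            # addresses of branches awaiting a target
        self.here = here             # stand-in for the special label PC

    def emit(self, word):
        """Store word at here and advance here."""
        self.memory[self.here] = word
        self.here += 1

    def begin(self, opcode, reg):
        """IF or WHILE: emit a branch with an empty address and push its address."""
        self.control.append(self.here)
        self.emit(opcode << 12 | reg << 8)

    def then(self):
        """THEN: patch the pending branch to here; nothing is emitted."""
        self.memory[self.control.pop()] |= self.here

    def repeat(self):
        """REPEAT: branch back to the WHILE, then patch the WHILE past this branch."""
        branch = self.control.pop()
        self.emit(0xC000 | branch)
        self.memory[branch] |= self.here


sketch = ControlSketch(here=0x45)
sketch.begin(0xC, 9)                 # WHILE<>0 R[9] at 0x45
for word in (0x1CAB, 0x1AB0, 0x1BC0, 0x9BFF, 0x2991):
    sketch.emit(word)                # loop body at 0x46 to 0x4A
sketch.repeat()                      # REPEAT at 0x4B
print(f'{sketch.memory[0x45]:04X} {sketch.memory[0x4B]:04X}')   # C94C C045
```

The two words printed match the `WHILE<>0` and `REPEAT` lines of the Fibonacci listing.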

### Allow for comments, both line and end-of-line
If all of the methods of the form `self._isSomething( ops )` return `False` then the statement is saved as a comment at the memory location of the current program counter.

### Perform assemble-time error checking
As with most compilers, most assemblers perform assemble-time error checking. The error checking of this assembler is minimal.

* Invalid statements are converted to comments.
* The values of numbers, addresses, and labels must represent hexadecimal numbers on [0x00, 0xFF].
* In a `!` store label instruction, `LABEL` must be declared.
* After input is exhausted, all labels must be declared.
* After input is exhausted, control structures must be balanced.
> No check is made whether `ELSE` matches with `IF`, `THEN` matches with `IF` or `ELSE`, or `REPEAT` matches with `WHILE`.
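
The two end-of-input checks can be sketched as follows, assuming the label table maps declared labels to integers and undeclared ones to lists of pending addresses. The function `check_finished` is hypothetical and only illustrates the idea; it is not the assembler's code.

```python
def check_finished(labels, control):
    """Return a list of problems left after the last input line (hypothetical)."""
    problems = [f'undeclared label: {name}'
                for name, value in labels.items() if isinstance(value, list)]
    if control:
        problems.append(f'{len(control)} unclosed control structure(s)')
    return problems


print(check_finished({'START': 0x40}, []))
# []
print(check_finished({'START': 0x40, 'COUNT': [0x41]}, [0x45]))
# ['undeclared label: COUNT', '1 unclosed control structure(s)']
```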

<a id='todos'>
<h2>To do</h2>
</a>

* Improve the parser / tokenizer to not require spaces between tokens.
* Allow values to be saved in forward declared memory locations.
* Make it a syntax error if control structures do not match.
* Handle out-of-memory errors.
* Improve error reporting on syntax errors.

[1]:http://advanced-topics.cs.princeton.edu/
[2]:http://lift.cs.princeton.edu/xtoy/
[3]:http://www.cs.princeton.edu/courses/archive/spring17/cos126/
[4]:http://www.cs.princeton.edu/courses/archive/spring17/cos126/lectures/CS.18.MachineII.pdf
[5]:http://www.cs.princeton.edu/courses/archive/spring17/cos126/assignments/hamming.html
[6]:https://en.wikipedia.org/wiki/Hamming_code
[7]:http://introcs.cs.princeton.edu/java/60machine/reference.txt
[8]:https://en.wikipedia.org/wiki/Magic_number_(programming)
[9]:https://en.wikipedia.org/wiki/Forward_declaration#Forward_reference
[10]:https://en.wikipedia.org/wiki/Edsger_W._Dijkstra
[11]:https://en.wikipedia.org/wiki/Structured_programming
[12]:http://homepages.cwi.nl/~storm/teaching/reader/Dijkstra68.pdf
[13]:http://catb.org/jargon/html/S/spaghetti-code.html

In [1]:
#!/usr/bin/env python3
#
# Mount Google Drive, download remote files, and set local values.
#
# https://github.com/dcpetty/google-colaboratory/blob/main/remote-files/remote-files.ipynb
#
from os.path import join, realpath
dot_path = realpath('.')
gdrive = join(dot_path, 'gdrive')
notebooks = join(gdrive, 'My Drive/Colab Notebooks')
repo = 'toy'        # repo path within the Drive/Colab Notebooks directory
drive_path = join(notebooks, repo)

# https://colab.research.google.com/github/AllenDowney/ThinkPython/blob/v3/chapters/chap04.ipynb
from os.path import basename, exists
def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print("Downloaded " + str(local))
    return filename

# Mount Google Drive for accessing files in the same directory as this notebook.
# https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
from google.colab import drive
drive.mount(gdrive)
# %cd drive_path   # cd so . is repo_path

# Download remote files for accessing in the runtime directory of this notebook.
download('https://raw.githubusercontent.com/dcpetty/google-colaboratory/refs/heads/main/toy/fibonacci.asm')
download('https://raw.githubusercontent.com/dcpetty/google-colaboratory/refs/heads/main/toy/test.asm')

# Set global variables
verbose = False # whether to print data

Mounted at /content/gdrive
Downloaded fibonacci.asm
Downloaded test.asm


The `fibonacci.asm` program assembler input is:

```
#################################################################
#
# Fibonacci program, loaded at location 0x40
#
GOTO START              # TOY simulator requires instruction at 0x10
: PC 40
: START
R[1] <- 01              # R1 <- decrement
R[9] <- M[ COUNT ]      # R9 <- COUNT (forward reference)
R[A] <- 00              # RA <- previous Fibonacci number
R[B] <- 01              # RB <- current Fibonacci number
WRITE R[B]              # send initial Fibonacci number to stdout
WHILE<>0 R[9]
    R[C] <- R[A] + R[B] # calculate next Fibonacci number
    R[A] <- R[B]        # shift previous Fibonacci number
    R[B] <- R[C]        # shift current Fibonacci number
    WRITE R[B]          # send current Fibonacci number to stdout
    R[9] <- R[9] - R[1] # decrement counter
REPEAT
: STOP
HALT                    # the HALT statement
. COUNT 0017
```

The `fibonacci.toy` assembler output is:

```
           #################################################################
           #
           # Fibonacci program, loaded at location 0x40
           #
           GOTO START              # TOY simulator requires instruction at 0x10
10: C040
           : PC 40
           : START
           R[1] <- 01              # R1 <- decrement
40: 7101
           R[9] <- M[ COUNT ]      # R9 <- COUNT (forward reference)
41: 894D
           R[A] <- 00              # RA <- previous Fibonacci number
42: 7A00
           R[B] <- 01              # RB <- current Fibonacci number
43: 7B01
           WRITE R[B]              # send initial Fibonacci number to stdout
44: 9BFF
           WHILE<>0 R[9]
45: C94C
               R[C] <- R[A] + R[B] # calculate next Fibonacci number
46: 1CAB
               R[A] <- R[B]        # shift previous Fibonacci number
47: 1AB0
               R[B] <- R[C]        # shift current Fibonacci number
48: 1BC0
               WRITE R[B]          # send current Fibonacci number to stdout
49: 9BFF
               R[9] <- R[9] - R[1] # decrement counter
4A: 2991
           REPEAT
4B: C045
           : STOP
           HALT                    # the HALT statement
4C: 0000
           . COUNT 0017
4D: 0017
```

TOY memory consists of 256 16-bit [cell](https://forth-standard.org/standard/notation)s and TOY arithmetic is [two's complement](https://en.wikipedia.org/wiki/Two's_complement). The `WRITE` operation writes values to `stdout` as hexadecimal and as signed 16-bit [two's complement](https://en.wikipedia.org/wiki/Two's_complement) numbers. Negative values are also written in their *unsigned* 16-bit representation.

The greatest value in the [Fibonacci sequence](https://en.wikipedia.org/wiki/Fibonacci_sequence) $< 2^{16}$ is $46368$ &mdash; which is the *signed* 16-bit [two's complement](https://en.wikipedia.org/wiki/Two's_complement) number $-19168$.
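
The conversion between the two readings of a 16-bit word is a one-liner. A minimal sketch, with `to_signed16` as a hypothetical helper name:

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


print(f'0x{46368:04x}')      # 0xb520
print(to_signed16(46368))    # -19168
print(to_signed16(28657))    # 28657
```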

The Princeton [TOY simulator](https://lift.cs.princeton.edu/xtoy/) has an error check for overflow. On the 23rd calculation, it generates the following:

```
A runtime error has occurred at address 45:
The result of an arithmatic operation was not between between -32768 and 32767.
```

Changing line 22 of `fibonacci.asm` to `. COUNT 0016` will calculate the [Fibonacci sequence](https://en.wikipedia.org/wiki/Fibonacci_sequence) only up to $28657$ with no overflow.
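
Both limits are easy to confirm in plain Python. The helper `largest_fibonacci_below` is hypothetical and is written only for this check:

```python
def largest_fibonacci_below(limit):
    """Return the largest Fibonacci number strictly less than limit."""
    a, b = 0, 1
    while b < limit:
        a, b = b, a + b
    return a


print(largest_fibonacci_below(2 ** 15))   # 28657 (largest that fits in signed 16 bits)
print(largest_fibonacci_below(2 ** 16))   # 46368 (largest that fits in unsigned 16 bits)
```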

## [`Toy`](http://lift.cs.princeton.edu/xtoy/) API

| Function | Description |
| --- | --- |
| `Toy()` | `Toy` constructor |
| `run(pc=0x10)` | Run the simulator |
| `asm(program, pc=0x10)` | Assemble the program text (where `program` is a `Path`, a `str`, or a `list`) |
| `listing(path=None)` | List the assembled program to `path` or `print` listing if `None` |

In [2]:
#!/usr/bin/env python3
#
# toy.py
#

"""Simulator and assembler for the Princeton COS 126 TOY computer."""

import re

__author__ = "David C. Petty"
__copyright__ = "Copyright 2017, David C. Petty"
__license__ = "https://creativecommons.org/licenses/by-nc-sa/4.0/"
__version__ = "0.0.2"
__maintainer__ = "David C. Petty"
__email__ = "dpetty@winchsterps.org"
__status__ = "Hack"


class TOY:
    """Simulator and assembler for the Princeton COS 126 TOY computer."""

    def __init__(self):
        """Initialize a Toy program."""
        # Symbolic constants
        self._register = {
            'R[0]': 0x0, 'R[1]': 0x1, 'R[2]': 0x2, 'R[3]': 0x3,
            'R[4]': 0x4, 'R[5]': 0x5, 'R[6]': 0x6, 'R[7]': 0x7,
            'R[8]': 0x8, 'R[9]': 0x9, 'R[A]': 0xA, 'R[B]': 0xB,
            'R[C]': 0xC, 'R[D]': 0xD, 'R[E]': 0xE, 'R[F]': 0xF, }
        self._operation = {
            '+': 1, '-': 2, '&': 3, '^': 4, '<<': 5, '>>': 6, }
        self._memory = ['M[', ]
        self._noop = ['<-', ']', ]
        self._jump = {
            'BRANCH=0': 0xC, 'BRANCH>0': 0xD, 'JUMP': 0xE, 'LINK': 0xF, }
        self._branch = [
            'IF<>0', 'IF<=0', 'ELSE', 'THEN', 'WHILE<>0', 'WHILE<=0', 'REPEAT',
        ]
        self._init()

    def _init(self):
        """Initialize assembly variables."""
        self.memory = [0, ] * 256                   # TOY memory
        self.registers = [0, ] * 16                 # TOY registers
        self.code = ''                              # TOY code listing
        self._statements = [[] for _ in range(256)]  # list of statement lists
        self._labels = {}                           # dict of labels
        self._control = []                          # control structure stack

    # ///////////////////////////// SIMULATOR //////////////////////////////

    # // http://www.cs.princeton.edu/courses/archive/spring17/cos126/lectures/CS.18.MachineII.pdf
    # public class TOYlecture
    # {
    #    // TOY simulator
    #    public static void main(String[] args)
    #    {
    #        int pc = 0x10; // program counter
    #        int[] R = new int[16]; // registers
    #        int[] M = new int[256]; // main memory
    #        In in = new In(args[0]);
    #        for (int i = 0x10; i < 0xFF && !in.isEmpty(); i++)
    #            M[i] = Integer.parseInt(in.readString(), 16);
    #        while (true)
    #        {
    #            int ir = M[pc++]; // fetch and increment
    #            int op = (ir >> 12) & 0xF; // opcode (bits 12-15)
    #            int d = (ir >> 8) & 0xF; // dest d (bits 08-11)
    #            int s = (ir >> 4) & 0xF; // source s (bits 04-07)
    #            int t = (ir >> 0) & 0xF; // source t (bits 00-03)
    #            int addr = (ir >> 0) & 0xFF; // addr (bits 00-07)
    #            if (op == 0) break; // halt
    #            switch (op)
    #            {
    #                case 1: R[d] = R[s] + R[t]; break;
    #                case 2: R[d] = R[s] - R[t]; break;
    #                case 3: R[d] = R[s] & R[t]; break;
    #                case 4: R[d] = R[s] ^ R[t]; break;
    #                case 5: R[d] = R[s] << R[t]; break;
    #                case 6: R[d] = R[s] >> R[t]; break;
    #                case 7: R[d] = addr; break;
    #                case 8: R[d] = M[addr]; break;
    #                case 9: M[addr] = R[d]; break;
    #                case 10: R[d] = M[R[t]]; break;
    #                case 11: M[R[t]] = R[d]; break;
    #                case 12: if (R[d] == 0) pc = addr; break;
    #                case 13: if (R[d] > 0) pc = addr; break;
    #                case 14: pc = R[d]; break;
    #                case 15: R[d] = pc; pc = addr; break;
    #            }
    #        }
    #    }
    # }

    # Toy simulator (after COS 126 'Toy simulator in Java' from Lecture 18).
    def _store(self, M, addr, value):
        """Store value at M[ addr ] and handle memory-mapped output at 0xFF."""
        M[addr] = value
        if addr == 0xFF:
            f = '0x{v:04x}' + ('{n:7d} ({v:5d})' if value & 1<<15 else '{v:7d}')
            print(f.format(v=value, n=-(~value + 1 & 0xffff)))

    def _load(self, M, addr):
        """Return value at M[ addr ] and handle memory-mapped input at 0xFF."""
        if addr == 0xFF:
            M[addr] = int(input('input? '))
        return M[addr]

    def run(self, pc=0x10):
        """Run program in memory starting at pc."""
        M, R, PC = self.memory, self.registers, pc
        while True:
            ir = M[PC]
            PC += 1
            op = (ir >> 12) & 0xF   # opcode (bits 12-15)
            d = (ir >> 8) & 0xF     # dest d (bits 08-11)
            s = (ir >> 4) & 0xF     # source s (bits 04-07)
            t = (ir >> 0) & 0xF     # source t (bits 00-03)
            addr = (ir >> 0) & 0xFF # addr (bits 00-07)
            if (op == 0):
                break   # halt
            elif op == 1:
                R[d] = R[s] + R[t]
            elif op == 2:
                R[d] = R[s] - R[t]
            elif op == 3:
                R[d] = R[s] & R[t]
            elif op == 4:
                R[d] = R[s] ^ R[t]
            elif op == 5:
                R[d] = R[s] << R[t]
            elif op == 6:
                R[d] = R[s] >> R[t]
            elif op == 7:
                R[d] = addr
            elif op == 8:
                # R[d] = M[addr]
                R[d] = self._load(M, addr)
            elif op == 9:
                # M[addr] = R[d]
                self._store(M, addr, R[d])
            elif op == 10:
                # R[d] = M[R[t]]
                R[d] = self._load(M, R[t])
            elif op == 11:
                # M[R[t]] = R[d]
                self._store(M, R[t], R[d])
            elif op == 12:
                if (R[d] == 0):
                    PC = addr
            elif op == 13:
                if (R[d] > 0):
                    PC = addr
            elif op == 14:
                PC = R[d]
            elif op == 15:
                R[d] = PC
                PC = addr

    # ///////////////////////////// ASSEMBLER //////////////////////////////

    def _isNumber(self, op):
        """Return True if op is (signed) hexadecimal, otherwise False."""
        return bool(re.match('[-+0-9A-F]+$', op))

    def _isLabel(self, op):
        """Return True if op not a register symbol, otherwise False."""
        return op not in self._register

    def _isHalt(self, ops):
        """HALT"""
        return len(ops) == 1 and ops[0] == 'HALT'

    def _isOperation(self, ops):
        """R[A] <- R[B] + R[C]
        R[A] <- R[B] - R[C]
        R[A] <- R[B] & R[C]
        R[A] <- R[B] ^ R[C]
        R[A] <- R[B] << R[C]
        R[A] <- R[B] >> R[C]"""
        return (len(ops) == 5 and
                all(ops[i] in self._register for i in [0, 2, 4, ]) and
                ops[3] in self._operation)

    def _isMove(self, ops):
        """R[A] <- R[B]"""
        return (len(ops) == 3 and
                all(ops[i] in self._register for i in [0, 2, ]))

    def _isLoadAddress(self, ops):
        """R[A] <- LABEL"""
        return (len(ops) == 3 and
                ops[0] in self._register and
                (self._isNumber(ops[2]) or self._isLabel(ops[2])))

    def _isLoad(self, ops):
        """R[A] <- M[ LABEL ]"""
        return (len(ops) == 5 and
                ops[0] in self._register and
                ops[2] in self._memory and
                (self._isNumber(ops[3]) or self._isLabel(ops[3])))

    def _isStore(self, ops):
        """M[ LABEL ] <- R[A]"""
        return (len(ops) == 5 and
                ops[0] in self._memory and
                (self._isNumber(ops[1]) or self._isLabel(ops[1])) and
                ops[4] in self._register)

    def _isRead(self, ops):
        """READ R[A]"""
        return (len(ops) == 2 and
                ops[0] == 'READ' and ops[1] in self._register)

    def _isWrite(self, ops):
        """WRITE R[A]"""
        return (len(ops) == 2 and
                ops[0] == 'WRITE' and ops[1] in self._register)

    def _isLoadIndirect(self, ops):
        """R[A] <- M[ R[B] ]"""
        return (len(ops) == 5 and
                ops[0] in self._register and
                ops[2] in self._memory and ops[3] in self._register)

    def _isStoreIndirect(self, ops):
        """M[ R[A] ] <- R[B]"""
        return (len(ops) == 5 and
                ops[0] in self._memory and
                ops[1] in self._register and ops[4] in self._register)

    def _isJump(self, ops):
        """JUMP R[A]"""
        return (len(ops) == 2 and
                ops[0] == 'JUMP' and ops[1] in self._register)

    def _isJumpLink(self, ops):
        """BRANCH=0 R[A] LABEL
        BRANCH>0 R[A] LABEL
        LINK R[A] LABEL"""
        return (len(ops) == 3 and
                ops[0] in self._jump and ops[1] in self._register and
                self._isLabel(ops[2]))

    def _isGoto(self, ops):
        """GOTO LABEL"""
        return (len(ops) == 2 and
                ops[0] == 'GOTO' and self._isLabel(ops[1]))

    def _isLabelRelative(self, ops):
        """: LABEL"""
        return len(ops) == 2 and ops[0] == ':'

    def _isLabelAbsolute(self, ops):
        """: LABEL 12"""
        return (len(ops) == 3 and
                ops[0] == ':' and   # RED_FLAG: requires a number
                self._isNumber(ops[2]))

    def _isStoreLabel(self, ops):
        """! LABEL 1234"""
        return (len(ops) == 3 and
                ops[0] == '!' and   # RED_FLAG: requires a number
                self._isLabel(ops[1]) and self._isNumber(ops[2]))

    def _isAllocate(self, ops):
        """. 1234"""
        return (len(ops) == 2 and
                ops[0] == '.' and   # RED_FLAG: requires a number
                self._isNumber(ops[1]))

    def _isAllocateLabel(self, ops):
        """. LABEL 1234"""
        return (len(ops) == 3 and
                ops[0] == '.' and   # RED_FLAG: requires a number
                self._isLabel(ops[1]) and self._isNumber(ops[2]))

    def _isIfWhileNotEqual(self, ops):
        """IF<>0 R[A]
        WHILE<>0 R[A]"""
        return (len(ops) == 2 and
                ops[0] in ['IF<>0', 'WHILE<>0', ] and
                ops[1] in self._register)

    def _isIfWhileLessThan(self, ops):
        """IF<=0 R[A]
        WHILE<=0 R[A]"""
        return (len(ops) == 2 and
                ops[0] in ['IF<=0', 'WHILE<=0', ] and
                ops[1] in self._register)

    def _isWord(self, ops, word):
        """WORD"""
        return (len(ops) == 1 and ops[0] == word)

    def _address(self, op):
        """Return op if number, value if defined label, or add forward
        reference and return zero."""
        if self._isNumber(op):
            value = int(op, 16)
            assert value == value & 0xFF
            return value
        if op in self._labels and isinstance(self._labels[op], int):
            return self._labels[op]
        if op not in self._labels:
            self._labels[op] = []
        self._labels[op].append(self._labels['PC'])
        return 0x00

    def _patch(self, label, value):
        """Return value after updating label and resolving forward references."""
        assert value == value & 0xFF
        if label in self._labels and isinstance(self._labels[label], list):
            for addr in self._labels[label]:
                self.memory[addr] = self.memory[addr] & 0xFF00 | value
        self._labels[label] = value
        return value

    def _increment(self):
        """Increment PC."""
        self._labels['PC'] += 1

    def _saveStatement(self, address, line):
        """Save statement line at address."""
        self._statements[address].append(line.rstrip())

    def _assemble(self, op, line):
        """Assemble op into memory, increment PC, and save statement line."""
        pc = self._labels['PC']
        self.memory[pc] = op
        self._saveStatement(pc, line)
        self._increment()

    # http://introcs.cs.princeton.edu/java/60machine/reference.txt
    # http://introcs.cs.princeton.edu/java/62toy/
    def asm(self, program, pc=0x10):
        """Assemble program (Path or str or list) into memory at pc."""
        self._init()
        self._labels['PC'] = pc
        # Process lines of program.
        from pathlib import Path
        if isinstance(program, Path):
            with open(program, 'r') as f:
                lines = f.readlines()
        else: lines = \
            program.splitlines() if isinstance(program, str) else program
        for line in lines:
            # Remove the comment after # and tokenize the code on spaces.
            code = line.split('#')[0].strip()
            ops = [op.upper() for op in code.split(' ')]
            # Check each type of instruction.
            # HALT
            if self._isHalt(ops):
                op = 0
                self._assemble(op * 4096, line)
            # R[A] <- R[B] + R[C]
            # R[A] <- R[B] - R[C]
            # R[A] <- R[B] & R[C]
            # R[A] <- R[B] ^ R[C]
            # R[A] <- R[B] << R[C]
            # R[A] <- R[B] >> R[C]
            elif self._isOperation(ops):
                rD, arrow, rS, op, rT = (self._register[ops[0]], ops[1],
                                         self._register[ops[2]],
                                         self._operation[ops[3]],
                                         self._register[ops[4]])
                self._assemble(op * 4096 + rD * 256 + rS * 16 + rT * 1, line)
            # R[A] <- R[B]
            elif self._isMove(ops):
                op, rD, arrow, rS = (1, self._register[ops[0]], ops[1],
                                     self._register[ops[2]])
                self._assemble(op * 4096 + rD * 256 + rS * 16, line)
            # R[A] <- LABEL
            elif self._isLoadAddress(ops):
                op, rD, arrow, addr = (7, self._register[ops[0]],
                                       ops[1], self._address(ops[2]))
                self._assemble(op * 4096 + rD * 256 + addr * 1, line)
            # R[A] <- M[ LABEL ]
            elif self._isLoad(ops):
                op, rD, arrow, mem, addr, bracket = (8, self._register[ops[0]],
                                                     ops[1], ops[2],
                                                     self._address(ops[3]),
                                                     ops[4])
                self._assemble(op * 4096 + rD * 256 + addr * 1, line)
            # M[ LABEL ] <- R[A]
            elif self._isStore(ops):
                op, mem, addr, bracket, arrow, rD = (9, ops[0],
                                                     self._address(ops[1]),
                                                     ops[2], ops[3],
                                                     self._register[ops[4]])
                self._assemble(op * 4096 + rD * 256 + addr * 1, line)
            # READ R[A]
            elif self._isRead(ops):
                op, name, rD, addr = 8, ops[0], self._register[ops[1]], 0xFF
                self._assemble(op * 4096 + rD * 256 + addr * 1, line)
            # WRITE R[A]
            elif self._isWrite(ops):
                op, name, rD, addr = 9, ops[0], self._register[ops[1]], 0xFF
                self._assemble(op * 4096 + rD * 256 + addr * 1, line)
            # R[A] <- M[ R[B] ]
            elif self._isLoadIndirect(ops):
                op, rD, arrow, mem, rT, bracket = (0xA, self._register[ops[0]],
                                                   ops[1], ops[2],
                                                   self._register[ops[3]],
                                                   ops[4])
                self._assemble(op * 4096 + rD * 256 + rT * 1, line)
            # M[ R[A] ] <- R[B]
            elif self._isStoreIndirect(ops):
                op, mem, rT, bracket, arrow, rD = (0xB, ops[0],
                                                   self._register[ops[1]],
                                                   ops[2], ops[3],
                                                   self._register[ops[4]])
                self._assemble(op * 4096 + rD * 256 + rT * 1, line)
            # JUMP R[A]
            elif self._isJump(ops):
                op, jump, rD = 0xE, ops[0], self._register[ops[1]]
                self._assemble(op * 4096 + rD * 256, line)
            # BRANCH=0 R[A] LABEL
            # BRANCH>0 R[A] LABEL
            # LINK R[A] LABEL
            elif self._isJumpLink(ops):
                jump, rD, addr = (ops[0], self._register[ops[1]],
                                  self._address(ops[2]))
                op = self._jump[jump]
                self._assemble(op * 4096 + rD * 256 + addr * 1, line)
            # GOTO LABEL
            elif self._isGoto(ops):
                op, rD, addr = (0xC, self._register['R[0]'],
                                self._address(ops[1]))
                self._assemble(op * 4096 + rD * 256 + addr, line)
            # IF<>0 R[A]
            # WHILE<>0 R[A]
            elif self._isIfWhileNotEqual(ops):
                op, rD = 0xC, self._register[ops[1]]
                self._control.append(self._labels['PC'])
                self._assemble(op * 4096 + rD * 256, line)
            # IF<=0 R[A]
            # WHILE<=0 R[A]
            elif self._isIfWhileLessThan(ops):
                op, rD = 0xD, self._register[ops[1]]
                self._control.append(self._labels['PC'])
                self._assemble(op * 4096 + rD * 256, line)
            # ELSE
            elif self._isWord(ops, 'ELSE'):
                branch, here = self._control.pop(), self._labels['PC']
                op, rD = 0xC, self._register['R[0]']
                self._control.append(here)
                self._assemble(op * 4096 + rD * 256, line)
                self.memory[branch] = self.memory[branch] + here + 1
            # THEN
            elif self._isWord(ops, 'THEN'):
                branch, here = self._control.pop(), self._labels['PC']
                self.memory[branch] = self.memory[branch] + here
                self._saveStatement(here, line)
            # REPEAT
            elif self._isWord(ops, 'REPEAT'):
                branch, here = self._control.pop(), self._labels['PC']
                op, rD = 0xC, self._register['R[0]']
                self._assemble(op * 4096 + rD * 256 + branch, line)
                self.memory[branch] = self.memory[branch] + here + 1
            # : LABEL
            elif self._isLabelRelative(ops):
                colon, label, pc = ops[0], ops[1], self._labels['PC']
                self._patch(label, pc)
                self._saveStatement(pc, line)
            # : LABEL 12
            elif self._isLabelAbsolute(ops):
                colon, label, value = ops[0], ops[1], int(ops[2], 16)
                self._patch(label, value)
                self._saveStatement(self._labels['PC'], line)
            # ! LABEL 1234
            elif self._isStoreLabel(ops):
                bang, label, value = ops[0], ops[1], int(ops[2], 16) & 0xFFFF
                assert label in self._labels, '! missing label: ' + label
                self.memory[self._labels[label]] = value
                self._saveStatement(self._labels[label], line)
            # . 1234
            elif self._isAllocate(ops):
                dot, value, pc = (ops[0], int(ops[1], 16) & 0xFFFF,
                                  self._labels['PC'])
                self.memory[pc] = value
                self._saveStatement(pc, line)
                self._increment()
            # . LABEL 1234
            elif self._isAllocateLabel(ops):
                dot, label, value, pc = (ops[0], ops[1],
                                         int(ops[2], 16) & 0xFFFF,
                                         self._labels['PC'])
                self.memory[self._patch(label, pc)] = value
                self._saveStatement(pc, line)
                self._increment()
            # # COMMENT
            elif line:
                if not re.match(r'\s*#', line):
                    line = '# {}'.format(line)
                self._saveStatement(self._labels['PC'], line)
        # Make sure all labels are defined.
        for label in self._labels:
            assert not isinstance(self._labels[label], list), \
                'undefined label: ' + label
        # Make sure all control structures are balanced.
        assert len(self._control) == 0, 'control structures not balanced'

    def listing(self, filename=None):
        """Save listing of assembled program in filename (or print if None)."""
        self.code = ''
        for i in range(256):
            if self.memory[i] or self._statements[i]:
                for statement in self._statements[i]:
                    if statement is not None:
                        self.code += '{}{}\n'.format(' ' * 11, statement)
                self.code += '{:02X}: {:04X}\n'.format(i, self.memory[i])
        # Write or print self.code.
        if filename:
            with open(filename, 'w') as outFile:
                outFile.write(self.code)
        else:
            print(self.code)
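
Each branch of the assembler above builds its machine word with arithmetic such as `op * 4096 + rD * 256 + rT`, which is equivalent to shifting 4-bit fields into a 16-bit word. The sketch below restates that packing with shifts and also shows the `& 0xFFFF` masking used for `.` and `!` values. The helper names (`pack_rr`, `pack_addr`, `to_word`) are hypothetical and are not part of `toy.py`.

```python
def pack_rr(op, d, s, t):
    """Pack a register-format word: 4-bit opcode and three 4-bit register fields."""
    return (op << 12) | (d << 8) | (s << 4) | t


def pack_addr(op, d, addr):
    """Pack an address-format word: 4-bit opcode, 4-bit register, 8-bit address."""
    return (op << 12) | (d << 8) | (addr & 0xFF)


def to_word(text):
    """Parse a hex literal (optionally negative) into a 16-bit two's-complement word."""
    return int(text, 16) & 0xFFFF


# R[A] <- R[B] + R[C] is opcode 1 with d=A, s=B, t=C.
add_word = pack_rr(0x1, 0xA, 0xB, 0xC)
# R[D] <- M[ R[D] ] is opcode A with d=D and t=D (the s field is unused).
load_word = pack_rr(0xA, 0xD, 0x0, 0xD)
# R[1] <- 1 is opcode 7 with d=1 and the constant in the address field.
const_word = pack_addr(0x7, 0x1, 0x01)

print('{:04X} {:04X} {:04X}'.format(add_word, load_word, const_word))
# prints: 1ABC AD0D 7101
print('{:04X}'.format(to_word('-3F22')))
# prints: C0DE
```

Multiplying by 4096 and 256 and shifting by 12 and 8 give the same result as long as every field stays within its width, so the choice between them is a matter of readability.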


# Test

The `test.asm` program tests the tokenizer and parser, each of the 24 statement types, control structures, forward references, and comments. The assembler input is:

```
#################################################################
#
# Test of random other code...
#
NOT A VALID STATEMENT, SO IT IS A COMMENT
: start                             # can be upper / lower case
R[1] <- 1
IF<>0 R[D]
    M[ COUNT ] <- R[D]  # test
    R[D] <- M[ R[D] ]   # test
    M[ R[D] ] <- R[D]   # test
    R[D] <- M[ COUNT ]  # test
    R[2] NOOP_TOKENS M[ COUNT ARE_NOT_CHECKED   # test
ELSE
    R[A] <- R[B] + R[C]
    R[A] <- R[B] - R[C]
    R[A] <- R[B] & R[C]
    R[A] <- R[B] ^ R[C]
    R[A] <- R[B] << R[C]
    R[A] <- R[B] >> R[C]
    BRANCH=0 R[D] START
    BRANCH>0 R[D] END
    JUMP R[D]
    LINK R[D] START
    R[E] <- M[ R[F] ]
    M[ R[E] ] <- R[F]
    WHILE<>0 R[E]
        R[E] <- R[E] - R[1]
    REPEAT
THEN
: END
GOTO START
. 1234
. COUNT 5678
. THIS abcd                         # Where is this comment
: THAT                              # Where is that comment
! THAT -3F22                        # And this?
```
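
The `IF<>0 ... ELSE ... THEN` and `WHILE<>0 ... REPEAT` statements in this input are forward branches: the target address is not yet known when the branch word is emitted. The assembler handles this by pushing the address of the branch onto a stack and adding the target to the stored word later. The sketch below isolates that idea. `MiniControl`, its method names, and the starting address of `0x10` are assumptions made for illustration; it models only the stack discipline, not the full `toy.py` class.

```python
class MiniControl:
    """Model of stack-based back-patching for IF/ELSE/THEN and WHILE/REPEAT."""

    def __init__(self, origin=0x10):
        self.memory = [0] * 256
        self.pc = origin      # next address to assemble into (assumed origin)
        self.control = []     # addresses of branches still awaiting a target

    def emit(self, word):
        """Store one word at the current address and advance."""
        self.memory[self.pc] = word
        self.pc += 1

    def begin(self, reg, op=0xC):
        """IF<>0 / WHILE<>0: emit a branch with a zero target and remember it."""
        self.control.append(self.pc)
        self.emit(op * 4096 + reg * 256)

    def else_(self):
        """ELSE: emit an unconditional branch and point the IF just past it."""
        branch, here = self.control.pop(), self.pc
        self.control.append(here)
        self.emit(0xC000)
        self.memory[branch] += here + 1

    def then(self):
        """THEN: point the pending IF or ELSE branch at the current address."""
        self.memory[self.control.pop()] += self.pc

    def repeat(self):
        """REPEAT: branch back to the WHILE and point the WHILE just past here."""
        branch, here = self.control.pop(), self.pc
        self.emit(0xC000 + branch)
        self.memory[branch] += here + 1


asm = MiniControl()
asm.begin(0xD)      # IF<>0 R[D]            at 0x10
asm.emit(0x1ABC)    # R[A] <- R[B] + R[C]   at 0x11
asm.else_()         # ELSE                  at 0x12
asm.begin(0xE)      # WHILE<>0 R[E]         at 0x13
asm.emit(0x2EE1)    # R[E] <- R[E] - R[1]   at 0x14
asm.repeat()        # REPEAT                at 0x15
asm.then()          # THEN                  (next address is 0x16)

for address in range(0x10, 0x16):
    print('{:02X}: {:04X}'.format(address, asm.memory[address]))
assert not asm.control, 'control structures not balanced'
```

Because the branch word is emitted with a zero address field, adding the target afterwards is enough to complete it, and an empty stack at the end confirms that every `IF` and `WHILE` was closed.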

The `test.py` program assembles and lists `test.asm`, then assembles, lists, and runs `fibonacci.asm`.

#!/usr/bin/env python3
#
# test.py
#

"""Minimal test code for toy.py."""

# from toy import TOY
from os.path import join
from pathlib import Path

toy = TOY()

print('################################ TEST ################################')
print()

print('#' * 10, 'Assemble test.asm\n')
toy.asm(Path(join(dot_path, 'test.asm')))
toy.listing()

print('############################## FIBONACCI #############################')
print()

print('#' * 10, 'Assemble fibonacci.asm\n')
toy.asm(Path(join(dot_path, 'fibonacci.asm')))
toy.listing(join(drive_path, 'fibonacci.toy'))
toy.listing()

print('#' * 10, 'Run fibonacci.toy\n')
toy.run(0x40)


################################ TEST ################################

########## Assemble test.asm

           #################################################################
           #
           # Test of random other code...
           #
           # NOT A VALID STATEMENT, SO IT IS A COMMENT
           : start                             # can be upper / lower case
           R[1] <- 1
10: 7101
           IF<>0 R[D]
11: CD18
               M[ COUNT ] <- R[D]  # test
12: 9D29
               R[D] <- M[ R[D] ]   # test
13: AD0D
               M[ R[D] ] <- R[D]   # test
14: BD0D
               R[D] <- M[ COUNT ]  # test
15: 8D29
               R[2] NOOP_TOKENS M[ COUNT ARE_NOT_CHECKED   # test
16: 8229
           ELSE
17: C027
               R[A] <- R[B] + R[C]
18: 1ABC
               R[A] <- R[B] - R[C]
19: 2ABC
               R[A] <- R[B] & R[C]
1A: 3ABC
               R[A] <- R[B] ^ R[C]
1B: 4ABC
               R[A] <- R[B] << R[C]
1C: 5ABC
               R[A] <- R[B] >> R[C]
1

## Note

The [repository](https://github.com/dcpetty/google-colaboratory/tree/main/toy) for this [Google Colab Notebook](https://colab.research.google.com/) contains files copied from within this notebook (`fibonacci.asm`, `test.asm`, `test.py`, `test.txt`, and `toy.py`) and one file generated by this notebook (`fibonacci.toy`).
<pre>--
David C. Petty / Winchester High School Computer Science / Mathematics / STEM
<a href="mailto:dpetty@winchesterps.org">&lt;dpetty@winchesterps.org&gt;</a> / <a href="http://j.mp/wpsdpetty">http://j.mp/wpsdpetty</a> / +1.781.629.9778 / @wpsdpetty</pre>